First time using 1P CLI (0.5.4) - why is this so slow?

Hi,

I've just heard of 1P CLI and it sounds like it could resolve major pains for me, so thanks for making it!

However, I'm experiencing serious slowness: I timed op list items at 3.9 seconds (real), op get item <blah> at 4.4 seconds (real). For my use case, I would need this to be closer to 100-200ms at most. Is there a reason for this slowness? Is it specific to me or is everyone dealing with these speeds?

Thanks Ahead!


1Password Version: 0.5.4 (CLI)
Extension Version: Not Provided
OS Version: 10.14.1 Beta (18B50c)
Sync Type: Not Provided

Comments

  • cohixcohix

    Team Member

    @motak Thanks for writing in. This is indeed something we've seen from several users.

    This is something we're actively working on, keep an eye out for upcoming releases.

  • This is something we're actively working on, keep an eye out for upcoming releases.

    I've seen this response on several threads; is there an op announcement mailing list or RSS feed or anything like that I could sign up for to be notified of new releases? Thanks!

  • cohixcohix

    Team Member

    @c__,

    We do not have a mailing list at this time.

  • I'm using version 0.5.7 and it is still SUPER slow, a simple get item requires 30s to complete. btw. my all vaults contains 2000 items. Will it be faster if I specific the vault?

    λ date; .\op.exe get item 'gqw2xx4uxq***************; date
    Mon Jul 15 17:56:02 CST 2019
    {"uuid":"gqw2xx4uxqaizlleu3kt4icl*************************************************************

    Mon Jul 15 17:56:34 CST 2019

  • cohixcohix

    Team Member

    @tangyouze yes, adding the --vault flag will indeed speed that up a bit.

    We have identified the performance bottlenecks and we have a plan to address them, it's just a matter of getting the bandwidth to implement those plans, which is proving difficult at the moment, but I promise we have not forgotten about this.

  • ARM77ARM77
    edited November 2019

    Any progress on this issue? I'm using 1P to get credentials ("op get item") and it takes at least one second per credential:

    $ time ./one_password_test.sh 3
    real 0m3.176s
    $ op --version
    0.8.0

  • felix_1pfelix_1p

    Team Member

    @ARM77 , how are you fetching the item? If you regularly fetch a specific item (e.g. in a script) I suggest to fetch the item by UUID. That should be the fasted way.

  • @felix_1p - It's only a marginal improvement. Database access (which is what 1Password is essentially) should be much faster:

    $ echo $NAME : $UUID
    DockerHub : wsdfnr73tzekdslkdtrb7h2kxyae

    $ time op get item $NAME --vault=$VAULT > /dev/null

    real 0m1.139s
    user 0m0.112s
    sys 0m0.049s

    $ time op get item $UUID --vault=$VAULT > /dev/null

    real 0m0.973s
    user 0m0.111s
    sys 0m0.049s

  • cohixcohix

    Team Member

    @ARM77 have you updated to the 0.8 release yet?

  • $ op --version
    0.8.0

  • felix_1pfelix_1p

    Team Member

    @ARM77, thank you for the numbers! I will do some investigating but there is also a limit to our ability to improve performance here. While you are right that it's basically a database lookup, there are a number of things happening:

    Fetching an item by UUID is more effective compared to fetching by name if you have many items in your vault.

  • @Arm77 I discovered and reported to the support team that the CLI performance is direclty impacted in a huge manner by the number of certificates you have installed locally (From 0.5s to 7-8s per request).

    My development workflow generates a lot of self-signed certificates that I trust in the keychain. I have to cleanup those certificates regularly to keep a decent CLI performance.

  • Just to add my 2ct; I came here searching for a solution to the general slowness of "op" as well.

    I would like to use the CLI in my command-line workflow, especially for setting secret token variables when a new shell is started, but with it taking 2 seconds per request, that's too big of a disruption to my workflow.

    I tried using both names and uuids, but for me on average, both take about 2 seconds per request. This is using op 0.8.0.

    I've used other secret stores such as Vault by Hashicorp (https://www.vaultproject.io), and pass (https://www.passwordstore.org) which performed any request sub-second, usually in the 100ms range. I know it's not exactly comparing apples to apples, but especially the former definitely has similar characteristics to 1Password (encrypted data, encrypted traffic, database with secrets, remote data access).

    I hope Agilebits can find the time needed to improve the performance of the CLI, even though I realise probably only a small part of their user-base actually uses the tool. Unfortunately the "This is something we're actively working on" comment from 15 months ago doesn't exactly inspire confidence in a timely solution, but one can hope.

  • c__c__
    edited January 13

    Ditto with op 0.8.0, listing items takes 1.17s real (but only 0.34s user time (CPU) and ~0 sys time). So it seems like the program is sleeping for an extra 0.8 seconds for some reason. Fetching a single item by UUID takes 1.3s real (longer than grabbing the entire list!) and still much less CPU time (0.3 user, 0.1 sys). So it also looks like 900ms are spent just sleeping. These numbers are repeatable; it's not like I have a stale disk cache or something.

    DETAILS

    truss (similar to Linux strace) shows Go startup is littered with 20µs sleeps (and sometimes clear binary backoff) but these have negligible effect on cumulative runtime and mostly seem to be an implementation detail of Go channels.

    It looks like the tool initially reads some keys and data from disk, taking negligible time. Then it does a DNS lookup for .1password.com. So far we're still at 0.027s real time — comparatively reasonable.

    It then attempts to "connect" UDP sockets to (?)every DNS result it found? In my trace it attempts 3 IPv6 addresses (connect(2) fails for all as I don't have v6 routes), then attempts to connect to four IPv4 addresses (which all succeed). All of these are on the oddball port "9." This is UDP and no packets are actually exchanged, AFAICT. It's purely a test of what, routing tables? This may be a Golang implementation detail.

    Finally, op opens a TCP connection to :443 (HTTPS) on the first IPv4 address that was routable. The connect(2)-wait accounts for 73ms (0.073s) of our sleep time. If op had started this connection in parallel with the rest of the work it did prior, it would "only" be uselessly sleeping for 0.046s here. Still, we have something like 830ms of sleep unaccounted for. We do a bunch of useless local crap (OpenSSL? or does Go use its own TLS library?) that somehow takes 2 full milliseconds before writing our initial TLS frame to the agilebits server. 70ms later we get an event wakeup that the TLS socket has a response; again, inexplicably it takes us 2-3 ms just to read the response out of the kernel.

    Only after the initial TLS req/response sequence do we open and read /etc/ssl/cert.pem, which takes about 1ms. Then we proceed to spawn a ton of new threads, which do no IO (no filesystem or net interaction) for another 20ms. We re-open, read, and presumably parse any certificate / CA bundles we can find. This is only another 2ms on my system but reportedly a long time on systems with tons of custom certificates installed. All of this work could have been done in parallel at the beginning, while we were waiting on the initial connection!

    More new threads, more nothing. Notably it looks like threads wake each other in a daisy chain of 3 threads, each of which has some latency before the kernel schedules it; 110ms goes by doing nothing except inter-thread wakeups and sleeps. We create two more threads, because Go keeps telling people Goroutines are cheap.

    At ~326ms, we send our next response to the HTTPS server. 150 ms has elapsed since we last received a message from it. ~398ms we get a response; ~400ms we send a 93 byte request, a 233 byte request, ~401ms a 171 byte request, a 38 byte request, another 38 byte request. 71ms later, we get our wakeup for a response; we get a tiny 38 byte response; another 28ms goes by and we get two bigger ~6000 byte responses 2ms apart. We send 42 bytes and get a 7k response 70ms later. We send another 42 bytes, wait 10ms for no obvious reason, then send 100, 171, and 38 bytes in quick succession.

    247ms goes by before we get a response event wakeup and read another ~7k. 2ms pass; we send another 42 bytes. 9ms later we send another 131, 171, 132, 38, 131, 171, 38, 171, and 38 bytes (all individual syscalls over ~5ms). 90ms later, we get our recv event wakeup and read a ~5k response.

    We write another 42 bytes. 8ms later we get another read wakeup and read ~8k. We send another 42. 40ms pass before our event wakeup; we pull another 8k out of the kernel. 2ms later, another 7k; we send 42 again. 17ms pass before our next recv wakeup; another 7kB response. 2ms later, a 4kB response. We send 42 again. 1ms later, we read 2800 & write 42.

    10ms go by; we read another 5k, then 500µs later, 12k and then 4k. We respond with 42 bytes again three more times (must be a TLS ack). 40ms goes by before our read event wakeup. We read 20k, 8k; another 20ms goes by and we get an event wakeup. We read 9k, 4k. 5ms passes and we get another wakeup; we read 3k, 17k, another 4k. 33ms goes by. We read 32k, 20k, 2k, 6k, 2k. 20ms elapses. We read 8k, 3k; 8ms elapses; we read 2k, 19k, 4k. 37ms elapses. We read 35k, 33k, 9k, 3k, 18k, 6k, 6k; 19ms elapses; we read 2k, 3k, 3k; 5ms goes by; we read 5k, another 4ms, another 11k, 5k, 3k, 4k. 34ms elapses. We read 20k, 33k, 33k, 25k, 8k, 32k, 19k, 4k, 3k, 6k, 600B.

    17ms elapses. We read 3k, 7ms elapses, we read 6k. 3 more ms, another 6k, 8 bytes, 17k, 34k, 33k, 33k, 11k. 33ms elapses. We read 18k, 3k, 34k, 33k, 33k, 2k, 3k, 28k, 1k.

    We spawn several more threads...

    336ms passes with no IO/network activity. We send a/some requests to the server: 136, 171, 38 bytes. 93 ms elapses. We read 4k. 5ms elapses doing nothing. We send 143, 171, and 38 bytes. 92ms elapses. We read 2k. 2ms later we write() the JSON item contents to stdout. It takes another 5ms for all 33 threads to exit and the process to terminate.

    So in summary: we do some ridiculously inefficient net protocol, with tons of round trips, large amounts of bytes transferred, etc, all to fetch a single 700B JSON blob with a unique key identifier and a precached authentication token (op signin).

    SUMMARY

    I count a total of 1.28 seconds purely waiting for responses from the Agilebits HTTPS server; another 627 ms doing no IO activity at all; seven full round trips across the network; 758 kB recieved from the HTTPS server at an anemic average rate of 600 kB/s to fetch a single 700 B JSON blob.

    (Edit: This all may be a property of 1Pass's vault security design, which is fine; but every other 1Pass vault client (1PasswordX included) clearly caches encrypted vault contents locally and only requires signin once to access the local cache for some period of time. Access time for items, once signed in, is measured in 10s of ms or less. It is not absurd that the CLI needs to communicate with the server for 1-2 seconds once, but it is absurd that the CLI needs to communicate with the server for 1-2 seconds for every single vault-related action.)

  • I assumed that all data was saved locally and just synced with the server. Is that not true? Why is the CLI making any network requests?

  • That’s a great question @mwean . 758 kB could plausibly be the entire compressed contents of my vaults.

  • felix_1pfelix_1p

    Team Member

    @c__ : That was an amazing analysis, thank you! I'm afraid that a lot of the behavior seems to be internal to Go's standard library. We are not doing anything out of the ordinary. However, I have some ideas how we could provide people with a workaround if they have a lot of certs in the standard locations, without having to delete them. And of course reducing the number of network requests would generally be a good idea.

    @mwean: Since the CLI is rather a short living process, it works differently than other 1Password clients. As such, we don't plan to implement full syncing capabilities. However that is no excuse for not implementing some form of caching to reduce the number of network requests. It's on our road map.

  • invictus26invictus26 Junior Member

    @felix_1p I don't mean syncing with the CLI, I'm saying just pull the data from the local 1Password vault (which should already by synced). Then you wouldn't have any issue with network requests and latency.

  • felix_1pfelix_1p

    Team Member

    @invictus26: You mean if there is another 1Password client running on the device? We generally cannot assume that this data exists (op might run on a machine without another 1Password client). My first impression is also that the complexity of managing the interaction with the CLI and other local clients/vaults would be higher than just implementing some form of caching directly into the CLI. But I will definitely pass this idea along to the rest of the team, thank you!

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file