First time using 1P CLI (0.5.4) - why is this so slow?

motak · October 2018

Hi,

I've just heard of 1P CLI and it sounds like it could resolve major pains for me, so thanks for making it!

However, I'm experiencing serious slowness: I timed op list items at 3.9 seconds (real), op get item <blah> at 4.4 seconds (real). For my use case, I would need this to be closer to 100-200ms at most. Is there a reason for this slowness? Is it specific to me or is everyone dealing with these speeds?

Thanks Ahead!

1Password Version: 0.5.4 (CLI)
Extension Version: Not Provided
OS Version: 10.14.1 Beta (18B50c)
Sync Type: Not Provided

cohix · October 2018

@motak Thanks for writing in. This is indeed something we've seen from several users.

This is something we're actively working on, keep an eye out for upcoming releases.

c__ · December 2018

This is something we're actively working on, keep an eye out for upcoming releases.

I've seen this response on several threads; is there an op announcement mailing list or RSS feed or anything like that I could sign up for to be notified of new releases? Thanks!

cohix · January 2019

@c__,

We do not have a mailing list at this time.

tangyouze · July 2019

I'm using version 0.5.7 and it is still SUPER slow, a simple get item requires 30s to complete. btw. my all vaults contains 2000 items. Will it be faster if I specific the vault?

λ date; .\op.exe get item 'gqw2xx4uxq***************; date
Mon Jul 15 17:56:02 CST 2019
{"uuid":"gqw2xx4uxqaizlleu3kt4icl*************************************************************

Mon Jul 15 17:56:34 CST 2019

cohix · July 2019

@tangyouze yes, adding the --vault flag will indeed speed that up a bit.

We have identified the performance bottlenecks and we have a plan to address them, it's just a matter of getting the bandwidth to implement those plans, which is proving difficult at the moment, but I promise we have not forgotten about this.

ARM77 · November 2019

Any progress on this issue? I'm using 1P to get credentials ("op get item") and it takes at least one second per credential:

$ time ./one_password_test.sh 3
real 0m3.176s
$ op --version
0.8.0

felix_1p · November 2019

@ARM77 , how are you fetching the item? If you regularly fetch a specific item (e.g. in a script) I suggest to fetch the item by UUID. That should be the fasted way.

ARM77 · November 2019

@felix_1p - It's only a marginal improvement. Database access (which is what 1Password is essentially) should be much faster:

$ echo $NAME : $UUID
DockerHub : wsdfnr73tzekdslkdtrb7h2kxyae

$ time op get item $NAME --vault=$VAULT > /dev/null

real 0m1.139s
user 0m0.112s
sys 0m0.049s

$ time op get item $UUID --vault=$VAULT > /dev/null

real 0m0.973s
user 0m0.111s
sys 0m0.049s

cohix · November 2019

@ARM77 have you updated to the 0.8 release yet?

ARM77 · November 2019

$ op --version
0.8.0

felix_1p · November 2019

@ARM77, thank you for the numbers! I will do some investigating but there is also a limit to our ability to improve performance here. While you are right that it's basically a database lookup, there are a number of things happening:

The CLI verifies that your current session is valid
The (encrypted) item is fetched from 1Password's servers
The item is decrypted
All network communication is additionally encrypted via SRP (https://support.1password.com/secure-remote-password/)

Fetching an item by UUID is more effective compared to fetching by name if you have many items in your vault.

mickael · January 2020

@Arm77 I discovered and reported to the support team that the CLI performance is direclty impacted in a huge manner by the number of certificates you have installed locally (From 0.5s to 7-8s per request).

My development workflow generates a lot of self-signed certificates that I trust in the keychain. I have to cleanup those certificates regularly to keep a decent CLI performance.

JeanMertz · January 2020

Just to add my 2ct; I came here searching for a solution to the general slowness of "op" as well.

I would like to use the CLI in my command-line workflow, especially for setting secret token variables when a new shell is started, but with it taking 2 seconds per request, that's too big of a disruption to my workflow.

I tried using both names and uuids, but for me on average, both take about 2 seconds per request. This is using op 0.8.0.

I've used other secret stores such as Vault by Hashicorp (https://www.vaultproject.io), and pass (https://www.passwordstore.org) which performed any request sub-second, usually in the 100ms range. I know it's not exactly comparing apples to apples, but especially the former definitely has similar characteristics to 1Password (encrypted data, encrypted traffic, database with secrets, remote data access).

I hope Agilebits can find the time needed to improve the performance of the CLI, even though I realise probably only a small part of their user-base actually uses the tool. Unfortunately the "This is something we're actively working on" comment from 15 months ago doesn't exactly inspire confidence in a timely solution, but one can hope.

c__ · January 2020

Ditto with op 0.8.0, listing items takes 1.17s real (but only 0.34s user time (CPU) and ~0 sys time). So it seems like the program is sleeping for an extra 0.8 seconds for some reason. Fetching a single item by UUID takes 1.3s real (longer than grabbing the entire list!) and still much less CPU time (0.3 user, 0.1 sys). So it also looks like 900ms are spent just sleeping. These numbers are repeatable; it's not like I have a stale disk cache or something.

DETAILS

truss (similar to Linux strace) shows Go startup is littered with 20µs sleeps (and sometimes clear binary backoff) but these have negligible effect on cumulative runtime and mostly seem to be an implementation detail of Go channels.

It looks like the tool initially reads some keys and data from disk, taking negligible time. Then it does a DNS lookup for .1password.com. So far we're still at 0.027s real time — comparatively reasonable.

It then attempts to "connect" UDP sockets to (?)every DNS result it found? In my trace it attempts 3 IPv6 addresses (connect(2) fails for all as I don't have v6 routes), then attempts to connect to four IPv4 addresses (which all succeed). All of these are on the oddball port "9." This is UDP and no packets are actually exchanged, AFAICT. It's purely a test of what, routing tables? This may be a Golang implementation detail.

Finally, op opens a TCP connection to :443 (HTTPS) on the first IPv4 address that was routable. The connect(2)-wait accounts for 73ms (0.073s) of our sleep time. If op had started this connection in parallel with the rest of the work it did prior, it would "only" be uselessly sleeping for 0.046s here. Still, we have something like 830ms of sleep unaccounted for. We do a bunch of useless local crap (OpenSSL? or does Go use its own TLS library?) that somehow takes 2 full milliseconds before writing our initial TLS frame to the agilebits server. 70ms later we get an event wakeup that the TLS socket has a response; again, inexplicably it takes us 2-3 ms just to read the response out of the kernel.

Only after the initial TLS req/response sequence do we open and read /etc/ssl/cert.pem, which takes about 1ms. Then we proceed to spawn a ton of new threads, which do no IO (no filesystem or net interaction) for another 20ms. We re-open, read, and presumably parse any certificate / CA bundles we can find. This is only another 2ms on my system but reportedly a long time on systems with tons of custom certificates installed. All of this work could have been done in parallel at the beginning, while we were waiting on the initial connection!

More new threads, more nothing. Notably it looks like threads wake each other in a daisy chain of 3 threads, each of which has some latency before the kernel schedules it; 110ms goes by doing nothing except inter-thread wakeups and sleeps. We create two more threads, because Go keeps telling people Goroutines are cheap.

At ~326ms, we send our next response to the HTTPS server. 150 ms has elapsed since we last received a message from it. ~398ms we get a response; ~400ms we send a 93 byte request, a 233 byte request, ~401ms a 171 byte request, a 38 byte request, another 38 byte request. 71ms later, we get our wakeup for a response; we get a tiny 38 byte response; another 28ms goes by and we get two bigger ~6000 byte responses 2ms apart. We send 42 bytes and get a 7k response 70ms later. We send another 42 bytes, wait 10ms for no obvious reason, then send 100, 171, and 38 bytes in quick succession.

247ms goes by before we get a response event wakeup and read another ~7k. 2ms pass; we send another 42 bytes. 9ms later we send another 131, 171, 132, 38, 131, 171, 38, 171, and 38 bytes (all individual syscalls over ~5ms). 90ms later, we get our recv event wakeup and read a ~5k response.

We write another 42 bytes. 8ms later we get another read wakeup and read ~8k. We send another 42. 40ms pass before our event wakeup; we pull another 8k out of the kernel. 2ms later, another 7k; we send 42 again. 17ms pass before our next recv wakeup; another 7kB response. 2ms later, a 4kB response. We send 42 again. 1ms later, we read 2800 & write 42.

10ms go by; we read another 5k, then 500µs later, 12k and then 4k. We respond with 42 bytes again three more times (must be a TLS ack). 40ms goes by before our read event wakeup. We read 20k, 8k; another 20ms goes by and we get an event wakeup. We read 9k, 4k. 5ms passes and we get another wakeup; we read 3k, 17k, another 4k. 33ms goes by. We read 32k, 20k, 2k, 6k, 2k. 20ms elapses. We read 8k, 3k; 8ms elapses; we read 2k, 19k, 4k. 37ms elapses. We read 35k, 33k, 9k, 3k, 18k, 6k, 6k; 19ms elapses; we read 2k, 3k, 3k; 5ms goes by; we read 5k, another 4ms, another 11k, 5k, 3k, 4k. 34ms elapses. We read 20k, 33k, 33k, 25k, 8k, 32k, 19k, 4k, 3k, 6k, 600B.

17ms elapses. We read 3k, 7ms elapses, we read 6k. 3 more ms, another 6k, 8 bytes, 17k, 34k, 33k, 33k, 11k. 33ms elapses. We read 18k, 3k, 34k, 33k, 33k, 2k, 3k, 28k, 1k.

We spawn several more threads...

336ms passes with no IO/network activity. We send a/some requests to the server: 136, 171, 38 bytes. 93 ms elapses. We read 4k. 5ms elapses doing nothing. We send 143, 171, and 38 bytes. 92ms elapses. We read 2k. 2ms later we write() the JSON item contents to stdout. It takes another 5ms for all 33 threads to exit and the process to terminate.

So in summary: we do some ridiculously inefficient net protocol, with tons of round trips, large amounts of bytes transferred, etc, all to fetch a single 700B JSON blob with a unique key identifier and a precached authentication token (op signin).

SUMMARY

I count a total of 1.28 seconds purely waiting for responses from the Agilebits HTTPS server; another 627 ms doing no IO activity at all; seven full round trips across the network; 758 kB recieved from the HTTPS server at an anemic average rate of 600 kB/s to fetch a single 700 B JSON blob.

(Edit: This all may be a property of 1Pass's vault security design, which is fine; but every other 1Pass vault client (1PasswordX included) clearly caches encrypted vault contents locally and only requires signin once to access the local cache for some period of time. Access time for items, once signed in, is measured in 10s of ms or less. It is not absurd that the CLI needs to communicate with the server for 1-2 seconds once, but it is absurd that the CLI needs to communicate with the server for 1-2 seconds for every single vault-related action.)

mwean · January 2020

I assumed that all data was saved locally and just synced with the server. Is that not true? Why is the CLI making any network requests?

c__ · January 2020

That’s a great question @mwean . 758 kB could plausibly be the entire compressed contents of my vaults.

felix_1p · January 2020

@c__ : That was an amazing analysis, thank you! I'm afraid that a lot of the behavior seems to be internal to Go's standard library. We are not doing anything out of the ordinary. However, I have some ideas how we could provide people with a workaround if they have a lot of certs in the standard locations, without having to delete them. And of course reducing the number of network requests would generally be a good idea.

@mwean: Since the CLI is rather a short living process, it works differently than other 1Password clients. As such, we don't plan to implement full syncing capabilities. However that is no excuse for not implementing some form of caching to reduce the number of network requests. It's on our road map.

invictus26 · January 2020

@felix_1p I don't mean syncing with the CLI, I'm saying just pull the data from the local 1Password vault (which should already by synced). Then you wouldn't have any issue with network requests and latency.

felix_1p · January 2020

@invictus26: You mean if there is another 1Password client running on the device? We generally cannot assume that this data exists (op might run on a machine without another 1Password client). My first impression is also that the complexity of managing the interaction with the CLI and other local clients/vaults would be higher than just implementing some form of caching directly into the CLI. But I will definitely pass this idea along to the rest of the team, thank you!

felix_1p · June 2020

Just as a follow up here for anyone running into this issue: At least on *NIX systems (from what I can tell not on macOS), you can overwrite were Go reads TLS certificates from via the SSL_CERT_DIR and SSL_CERT_FILE variables.

If you have a lot of (irrelevant) certificates the default cert location but you also have a single .pem file with all root CAs, then you can do something like this:

SSL_CERT_DIR=/dev/null SSL_CERT_FILE=/etc/ssl/cert.pm op ...

Or of course you can point SSL_CERT_DIR to a directory that contains root CAs but not self-signed CAs.

First time using 1P CLI (0.5.4) - why is this so slow?

Comments

DETAILS

SUMMARY