Are passwords made of real words (like Diceware) really safe?

khad · August 2013

Having an "avoid L and M" checkbox in a diceware-ish generator sure sounds odd, but if it were there I might well use it!!

I really like this idea. I can't imagine it having much more of an effect on entropy than the "Avoid ambiguous characters" option.

benfdc · August 2013

"Avoid ambiguous characters" has little effect on random alphanumeric passwords, but I suspect that striking L and M would make a noticeable dent in the Diceware word list (L is one of the most common consonants in English). A decent Perl programmer (which I am not) could probably give you the exact figure in a minute or two.
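A rough sketch of that count, written in Python rather than Perl. It assumes a plain-text copy of the list with one word per line and the five-digit dice numbers already stripped; `load_words` and `filter_stats` are hypothetical helper names, not part of any Diceware tool. It reports how many words survive the filter and the per-word entropy before and after, so no figure is claimed here.

```python
import math

def load_words(path):
    """Read one word per line (hypothetical file layout, dice numbers stripped)."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

def filter_stats(words, banned="lm"):
    """Drop every word containing a banned letter (case-insensitive) and
    return (kept, total, bits_per_word_before, bits_per_word_after)."""
    banned_set = set(banned.lower())
    kept = [w for w in words if not banned_set & set(w.lower())]
    before = math.log2(len(words))
    after = math.log2(len(kept)) if kept else 0.0
    return len(kept), len(words), before, after
```

Usage would be something like `filter_stats(load_words("words.txt"))`, where `words.txt` is a placeholder filename.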

jpgoldberg · August 2013

I definitely see the benefits of an "avoid 'L' or 'M'" option for exactly the reasons you mention, @benfdc. I would probably use it myself.

But that needs to be balanced out by an effort to keep options to a minimum. It's easy to say, "well, it's just one more advanced option, people can ignore it if it isn't meaningful to them", but unfortunately it is so easy to say that, we could end up with a page full of scores of options. So we tend to set the bar fairly high for including options.

It is a cool idea and nice observation. I'm certainly not ruling it out, but I guess I'm trying to "manage expectations". Lots of great suggestions for useful options don't get incorporated.

One thing that is sort of connected to this is the choice of keyboard layout. Right now our "pronounceable passwords" are not localized. No matter what language you use 1Password with, "pronounceable" means pronounceable in English.

Building up a phonotactic database for a bunch of different languages, particularly when we also want to keep these to 7-bit ASCII, is something that some of us would enjoy (well, I know that I would, but my background is in Linguistics). But it's not the kind of thing that is a particularly high priority. Our English one is ad hoc, but if we were to do this systematically, we would need to devise an overall scheme for this. Markov models might initially seem like a useful way to go, but I suspect that they would fail badly for languages that involve vowel harmony (Finnish, Hungarian, Turkish, etc.), where the vowels you have toward the end of a word depend on the kinds of vowels you've had at the beginning, while a low-order Markov model only looks back a letter or two.

OK, I'm rambling. Time to get back to trying to write usefully about entropy and passwords.

Cheers,

-j

--
Jeffrey Goldberg
Chief Defender Against the Dark Arts @ AgileBits
http://agilebits.com

poyntesm · August 2013

One thing about Diceware-based passwords that confuses me: some words are different lengths, i.e. some are 3 characters and some are 6. Doesn't this impact the entropy? I often see the entropy quoted for 4, 5 or 6 words in a Diceware scheme. But since different words are different lengths, and each letter adds extra bits of entropy, aren't the shorter words less secure?

I personally discarded 2 words from my Diceware selection because they were short. Was this needed or misguided?

khad · August 2013

Good question! The entropy is calculated based on the entire word. Each word has 12.925 bits of entropy regardless of its length since if an attacker knows the system (which we always assume he does) he will not be trying every letter but every word in the Diceware word list.
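To make that concrete, here is a small sketch assuming words are drawn uniformly and independently from the standard 7776-word list. Word length never enters the formula; only the size of the list and the number of words do. `passphrase_bits` is just an illustrative name.

```python
import math

DICEWARE_LIST_SIZE = 6 ** 5  # 7776 words, one per outcome of five six-sided dice

def passphrase_bits(num_words, list_size=DICEWARE_LIST_SIZE):
    """Entropy in bits of num_words drawn uniformly and independently
    from a list of list_size words. The attacker guesses whole words,
    not letters, so word length does not appear anywhere."""
    return num_words * math.log2(list_size)

for n in (4, 5, 6):
    print(f"{n} words: {passphrase_bits(n):.2f} bits")
```

That works out to roughly 51.7, 64.6 and 77.5 bits for 4, 5 and 6 words.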

It's best to randomly choose words and use the ones that are randomly chosen regardless of their length, but as long as you randomly choose new ones (the randomness is the important part) it shouldn't really do any harm.

I hope that helps. Please let me know. :)

Cheers!

poyntesm · August 2013

Ah, a penny (or a die) just dropped when working through the maths (I think). The reason I dropped a 3-character word was that it had a character not on the default iPhone keyboard, i.e. a numerical character.

This was obviously to ease entry on my iPhone, but it means the entropy per word has changed, as in reality the list I chose from has shrunk! Did not think about that. I wonder how many words from the list are ONLY alpha characters. Then I could compare that entropy to a brute force of only alpha chars, and work out how many words from that list you would need to match the entropy of the full list.

Maybe a list of the same length, with only alpha-character words of the same lengths as today's list, would be good for iOS users. Or maybe there are not enough short words to easily do that before it's effectively just a string of the alphabet.

In which case some people might need, let's say, a 7- or 8-word password from the shorter list to match 5 or 6 words from the full list, but entry would be much easier on iOS. I HATE switching keyboard layouts on iOS. Bring back webOS ;-)

poyntesm · August 2013

Or another interesting idea: replace all the words containing non-alpha characters with other alpha-only words, but this time with one letter capitalised. Can we then keep the same overall entropy bits, since the list length is the same? Hmm, I guess though the possible inputs per character are now 52 vs, what is it, 97 for the full keyboard?

But overall I think some guidelines or a system for iOS users who don't want to switch keyboard views would be a useful reference, i.e. what bits could be achieved if we used the existing list with only pure-alpha words and rolled dice to randomly capitalise one character per word. Sorry for the second post right after the first.
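One way to estimate that, as a sketch under stated assumptions: words are picked uniformly from an alpha-only list, and the capitalised position is then picked uniformly among the word's letters (with a d6 that means rerolling any number larger than the word length, and longer words need a different scheme). For a word of length L that choice is worth log2(L) bits, so the per-word total is log2(list size) plus the average of log2(L). The function names are hypothetical.

```python
import math

def capitalization_bits(words):
    """Average extra bits from capitalising exactly one letter per word,
    with the position chosen uniformly among that word's letters."""
    return sum(math.log2(len(w)) for w in words) / len(words)

def bits_per_word_with_caps(words):
    # Uniform choice of word, then an independent uniform choice of position.
    return math.log2(len(words)) + capitalization_bits(words)
```

Note that one-letter words contribute nothing here, since there is only one position to capitalise.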

poyntesm · August 2013

$ egrep '[[:punct:]]|[[:digit:]]' words.txt | wc -l
365

It seems only 365 words of the full Diceware list contain any non-alpha character (from http://world.std.com/~reinhold/diceware.wordlist.asc), so we then have a list of 7411. Again my maths could easily be wrong, but a character chosen from [a-zA-Z] has 5.7 bits, and I am struggling to link the two to give me the overall strength of this list.

log base 2 of 7411 is about 12.86, so rounding up, do we still get 13 bits per word?
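A quick check of those numbers, assuming uniform random selection from each list. The full list gives about 12.925 bits per word and the 7411-word alpha-only list about 12.855, so the smaller list costs roughly 0.07 bits per word (neither is really 13). `words_needed` is an illustrative helper, not an established formula name.

```python
import math

def bits_per_word(list_size):
    return math.log2(list_size)

def words_needed(target_bits, list_size):
    """Fewest words from a list of list_size that reach target_bits.
    The small epsilon guards against float round-off pushing an exact
    multiple just over an integer."""
    return math.ceil(target_bits / bits_per_word(list_size) - 1e-9)

full = 7776                       # the standard Diceware list
alpha_only = full - 365           # 7411 words left after dropping digits/punctuation
target = 5 * bits_per_word(full)  # what five full-list words give you

print(f"full list:  {bits_per_word(full):.3f} bits/word")
print(f"alpha-only: {bits_per_word(alpha_only):.3f} bits/word")
print("alpha-only words to match five full-list words:",
      words_needed(target, alpha_only))
```

Strictly matching five full-list words (about 64.6 bits) takes a sixth alpha-only word, although five alpha-only words still give about 64.3 bits.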

poyntesm · August 2013

I'd love to get @jpgoldberg's comments on the above...

jpgoldberg · August 2013

Hi,

I've got this side project of producing new, alpha-only, diceware lists. Unzip the zipfile and read the instructions.

Anyway, you are correct that as long as you are picking words by rolling dice or using a good random number generator, then the entropy per word will be log_2 (number of possible words), and so your calculations are correct.

And yes, each character drawn (uniformly) from [a-zA-Z] will add about 5.7 bits. Note that randomly mixed case is not going to be great on an iPhone keyboard.

There is nothing saying that you have to use the "official" Diceware list. You can use a word list of your own devising. The Diceware list was constructed for people who didn't want to use any software or computer for picking, so its size was set to a number that works well with typical (six-sided) dice: 6^5 = 7776 words, one for each outcome of five rolls.

If you've got some 8-sided dice, then you can construct a list of 32768 words (many will be long and unfamiliar) that you pick from with 5 rolls, giving you exactly 15 bits per word. Or you could use 4 rolls per word, giving you exactly 12 bits per word, but each word will be short (drawn from a list of only 4096 words). Using five such words gives you a total of 60 bits, which seems like a good target for a Master Password.

The calculation for using a system with other dice (D12, D20, etc) is left as an exercise to the reader.
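One way to do the exercise, as a sketch: with an s-sided die and r rolls per word, the list has s^r words (one per outcome), so each uniformly chosen word is worth r × log2(s) bits. The die/roll combinations below are just examples.

```python
import math

def bits_per_word(sides, rolls):
    """A list with sides**rolls words, one per outcome, gives
    rolls * log2(sides) bits per uniformly chosen word."""
    return rolls * math.log2(sides)

for sides, rolls in [(6, 5), (8, 5), (8, 4), (12, 5), (20, 5)]:
    size = sides ** rolls
    print(f"D{sides} x {rolls} rolls: {size:>9,} words, "
          f"{bits_per_word(sides, rolls):6.2f} bits/word")
```

Total passphrase strength is then this figure times the number of words.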

poyntesm · August 2013

So a D12 would give a 248832-word list, with 5 rolls giving you about 17.92 bits per word. As such, 4 words chosen with 5 rolls per word from that list would give you about 71.7 bits.
So a D20 would give a 3200000-word list, with 5 rolls giving you about 21.61 bits per word.

Right?

RodD · September 2013

If a non-math guy can butt in here... I follow the logic of calc-ing how long it will take, but sometimes the implication is that every attempt to crack a password will take that long. Isn't that the max time it would take? Isn't it altogether possible for a cracker to hit the right combination very quickly by chance? If so, then aren't we just trying to decrease the probability of such an event? I wonder if there is a false sense of absolute security if that's not noted appropriately. Or am I missing the point?

Second, I follow the reasoning for your recommended system, and I realize that you acknowledge there are other systems that may work. One that I've never heard you evaluate (though the brief allusion to Shakespeare above might be similar, which is why I am posting to this thread) runs like this. Take a lengthy passage of text that one has memorized and use the first letter of each word. Combine that with some coded reference to the text (e.g., book and chapter number in the Tolkien corpus, or Shakespeare, etc.), which enables adding digits and symbols. If the text is 30 words long, one ends up with a 30+ character string of unrelated characters, but one can spin out the length as long as needed and it's still easy to remember, which is the goal of a 1P master password. I suppose the first objection is that it is not as random as dice, but is this sort of pattern really used by crackers? (I honestly have no clue in that regard.) Given the almost unlimited array of texts from around the world (even just in English), is there any credibility to the approach? And how would various lengths compare with the entropy (is that the right term here?) of the Diceware system?

Perhaps a foolish question, but I've learned a lot from the blog posts on the subject. Having used 1P for the last year has transformed my (formerly bad) password habits. Thanks.

Rod (the non-math guy)

jpgoldberg · September 2013

If a non-math guy can butt in here... I follow the logic of calc-ing how long it will take, but sometimes the implication is that every attempt to crack a password will take that long. Isn't that the max time it would take? Isn't it altogether possible for a cracker to hit the right combination very quickly by chance?

Hi @RodD, you sound like a math guy to me.

When people report estimates of crack time, they should present the average crack time. But when we talk about strength, we tend to talk about the total number of possibilities. So you will often see people shift back and forth. You should look for phrases like "go through all of the possibilities" or "mean crack time".

For the moment, let's just stick to cases where each possible password is no more and no less likely than any of the others. Suppose we have a four-word Diceware password. There are 7776^4 possibilities (which I'm going to approximate as 2^51, or "51 bits"). Each of these (if this were done properly, like with dice) will be just as likely as any other.

After going through half of them, 2^50, you've got a 50% chance of getting the right one. Now let's see how many you have to go through to get a 1% chance. That would be about 2^45, or 36 trillion. For the 50% case, I just divided the total number of possibilities by 2. For the 1% case, I just divided by 100.

So yes, we are going for the probabilities. But they are easy to calculate. We can easily calculate the probability of a "hit" after any given number of guesses. We can also calculate how many guesses are needed to achieve any given probability of a hit.
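Both calculations fit in a few lines. This sketch assumes every password in the space is equally likely and the attacker never repeats a guess; the function names are illustrative.

```python
import math

def hit_probability(guesses, space):
    """Chance of having found a uniformly chosen password after
    trying `guesses` distinct candidates out of `space`."""
    return min(guesses, space) / space

def guesses_for(probability, space):
    """Distinct guesses needed to reach the given chance of a hit."""
    return math.ceil(probability * space)

space = 7776 ** 4  # four-word Diceware, roughly 2**51.7
print(hit_probability(space // 2, space))
print(f"{guesses_for(0.01, space):.3e}")
```

The second figure is about 3.7 × 10^13, i.e. the "36 trillion" for a 1% chance.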

When some passwords are more likely than others

When some passwords are more likely than others, then we can't do that simple division. Imagine a system where people flip a coin. If it comes up heads they use "password" as their password. If it comes up tails they use the 1Password Strong Password Generator to create a 23 character password.

An attacker is going to guess the password on the first try in 50% of the cases. Note that here the "average" number of guesses that attacker needs is actually really high; half the time, they will need a really huge number of guesses. So the average number of guesses really isn't a good measure of the strength of this. (Also, the "entropy" of that system isn't a good measure either.) I actually have a far too technical talk on this stuff, which you can watch if you are really bored.
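A sketch of the coin-flip system's numbers. It assumes the generator draws 23 characters uniformly from 94 printable ASCII characters (an assumption, not necessarily what the 1Password generator uses) and that the attacker tries "password" first, then enumerates generated passwords. Half the accounts fall on the first guess, yet both the mean guess count and the Shannon entropy come out large.

```python
import math

CHARSET = 94   # assumption: printable ASCII without space
LENGTH = 23
N = CHARSET ** LENGTH  # number of possible generated passwords

# Heads: "password". Tails: one of N generated passwords, each equally likely.
p_first_guess = 0.5

# Attacker tries "password" first, then enumerates the N generated passwords;
# a generated one is found after 1 + (N + 1) / 2 guesses on average.
mean_guesses = 0.5 * 1 + 0.5 * (1 + (N + 1) / 2)

# Shannon entropy: 0.5*log2(2) + N * (0.5/N) * log2(2N) = 1 + 0.5*log2(N)
entropy_bits = 1 + 0.5 * math.log2(N)

print(f"P(cracked on first guess) = {p_first_guess}")
print(f"mean guesses ~ {mean_guesses:.2e}")
print(f"Shannon entropy ~ {entropy_bits:.1f} bits")
```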

Basically, once an attacker has some way to figure out which sorts of things to try first, the system is enormously weaker. This is why I so strongly recommend doing something like rolling dice.

Take a lengthy passage of text that one has memorized and use the first letter of each word. [...] but is this sort of pattern really used by crackers?

Crackers have developed such systems. Some have built datasets based on dictionaries of quotations. Others have just collected enormous quantities of textual data to build these sorts of things.

So if people started using such a system, it would be exactly the sort of pattern that crackers would use. As I said, they have already developed rule sets for cracking such passwords, but at the moment they rarely waste time on such guesses, because not very many people use such systems yet.

One point I tried to make in the article is that the cracking tools are adaptable to whatever system people are using. So we have to build our systems to be strong against tools that are designed to beat our system.

And one thing to consider about the system that you described is not all letters are equally likely to appear in it. "y" doesn't show up as a first letter of a word nearly as often as "t" does. So an attacker can use this as part of her system to figure out what sorts of things to try first.
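To see that unevenness for yourself, here is a rough sketch that estimates the Shannon entropy of first letters from a sample text you supply and compares it with the log2(26) ≈ 4.70 bits a uniformly random letter would give. It is only a frequency estimate from one sample (small samples are noisy), and a real attacker would also exploit context such as which words follow which, so the effective figure is likely lower still. The function name is hypothetical.

```python
import math
import re
from collections import Counter

def first_letter_entropy(text):
    """Shannon entropy (bits) of the first letters of the words in text,
    estimated from their observed frequencies in that text."""
    firsts = [w[0].lower() for w in re.findall(r"[A-Za-z]+", text)]
    counts = Counter(firsts)
    total = len(firsts)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

UNIFORM = math.log2(26)  # bits per letter if all 26 were equally likely

sample = "To be, or not to be, that is the question"
print(f"{first_letter_entropy(sample):.2f} vs {UNIFORM:.2f} bits per letter")
```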

Your scheme is probably going to work well for you for some time. But it is going to get worse the more people use it.

Attacks built for Diceware

It's also true that pretty much every tool kit out there has looked at Diceware. Those rulesets exist as well and are ready to go as soon as crackers figure that enough people use Diceware for it to be worth the effort.

But unlike every single other one of these schemes, Diceware is designed with that in mind. Its security does not hugely degrade if everyone starts using it. Pretty much all other schemes that people have proposed do fall apart if a substantial portion of people use them.

I like to give advice that remains good advice even on the off chance that people follow it.

Cheers,

-j

poyntesm · September 2013

@jpgoldberg .. you say about Diceware "...It's security does hugely degrade if everyone starts using it..." ?? Is that correct, or do you really mean it does not? I thought the key difference here is that it makes no difference if the attacker knows what we are using; the security of Diceware is in the true randomness and the math behind it.

khad · September 2013

@poyntesm: You are absolutely correct. I've edited Jeff's post above to fix that. ;)

Diceware does not hugely degrade if everyone starts using it. That's one of the reasons we recommend it.

Thanks for catching that.

jpgoldberg · September 2013

Thanks, @poyntesm and @khad.

Leaving out a "n't" in a crucial sentence can hugely degrade the quality of my advice.

dougsko · October 2013

If anyone is looking for an easy way to use Diceware without having to carry around the word list with them, check out the FOSS Diceware app on Google Play. The source code is available on GitHub.

jpgoldberg · October 2013

There is also https://www.xkpasswd.net/c/index.cgi

Obviously you shouldn't use a password generated over the web, but you can download the perl module source. Note, the last time I looked at the source (a while back) it did not use a cryptographically appropriate random number generator. Personally, I actually enjoy rolling the dice. But I'm weird that way.
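For anyone who wants software dice, a minimal sketch of the point about random number generators: it uses Python's `secrets` module (backed by the operating system's CSPRNG) rather than the `random` module. The helper names and the assumed word-list format (lines like `11111<TAB>a`, as in the Diceware list) are hypothetical.

```python
import secrets

def roll_dice(n=5, sides=6):
    """Simulate n dice with the OS CSPRNG via secrets, not the random module."""
    return "".join(str(secrets.randbelow(sides) + 1) for _ in range(n))

def load_wordlist(path):
    """Hypothetical loader for lines like '11111<TAB>a'; anything that is not
    a five-dice key followed by a word (e.g. a PGP signature) is skipped."""
    table = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) == 2 and len(parts[0]) == 5 and set(parts[0]) <= set("123456"):
                table[parts[0]] = parts[1]
    return table

def pick_words(wordlist, count=5):
    """Look up one word per simulated five-dice roll."""
    return [wordlist[roll_dice()] for _ in range(count)]
```

Usage would be along the lines of `pick_words(load_wordlist("diceware.wordlist.asc"))`. Physical dice remain the option that needs no trust in software at all.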

Uno_Lavoz · November 2013

@jpgoldberg You're not weird for enjoying the satisfying "rrrrrrrrrrr...pop...clunk..." of rolling dice. You might in fact be a re-incarnated pirate. A well-known fact of history that I just made up is that pirates used to enjoy rolling dice so much that they'd carry several with them in pouches at all times. Just before a raid on some unsuspecting merchant vessel, they would all stuff their mouths full of dice and grin widely, letting their dice spill out onto the ship's deck. Whoever got the highest dice roll got to pick their loot before everyone else! It's an astonishing fact of forgotten history... and that made up fact is why I think you may in fact be a pirate. The more you know. The end.

benfdc · November 2013

Probably a Pastafarian too.

jpgoldberg · November 2013

A well-known fact of history that I just made up is that pirates used to enjoy rolling dice so much that they'd carry several with them in pouches at all times. Just before a raid on some unsuspecting merchant vessel, they would all stuff their mouths full of dice and grin widely, letting their dice spill out onto the ship's deck. Whoever got the highest dice roll got to pick their loot before everyone else! It's an astonishing fact of forgotten history... and that made up fact is why I think you may in fact be a pirate.

I have been outed!

Nancy G · November 2013

I'm not a geek, so I'm warning you that this is not a geek question/comment. So don't laugh!

I was trying to explain diceware to a friend when it occurred to me that the reason diceware passphrases can't be easily cracked is because once you figure out the method used to create a password, you're half way there. In other words, the reason diceware passphrases can't easily be broken is that randomness isn't a method.

Does that sum it up? Did I get it right?

You may now laugh at me.

benfdc · November 2013

I’m an amateur geek. The way I would put it is that figuring out the method used to create a password usually gets you half way there, but that methods which incorporate a sufficient amount of randomness (which in the case of diceware means a sufficient number of words) are exceptions to the rule.

To make that sentence geekier, substitute entropy for randomness. To go completely off the rails, toss in a reference to Kerckhoffs’s principle.

But if your way of putting it gets the idea across to the people you are talking to, I say stick with it!

jpgoldberg · November 2013

I think that you've got the right intuition, Nancy.

The strength of diceware comes from the fact that it doesn't use people as a source of randomness. It uses dice.

The weakness of alternatives is as you say. Once the attacker knows what (human) system you used, then the attacker can exploit the non-random stuff that people do.

As Ben has pointed out, diceware doesn't get weaker if people know what system you used.

One way that I like to put it is that the advice we offer should remain good advice even if people actually follow it.

benfdc · November 2013

One way that I like to put it is that the advice we offer should remain good advice even if people actually follow it.

And even if the bad guys know that people actually follow it.

jpgoldberg · November 2013

The second half is simply that people are really bad at being random, even when they think they are.

One thing is that if you ask people to pick words at random, they will tend to pick (concrete) nouns more often than they really should. (Yes, nouns are the most common word category, but people seem to draw nearly exclusively from nouns when you ask for "random words".) There are other things that are characteristic of human choices, some of which may not have been discovered yet. So this is why for the best passwords we need to remove the human, as much as possible, from the part of the process that is supposed to be "random".

pervel · April 2014

I think Diceware is a brilliant idea... in theory... but somewhat less so in practice. The problem is related to the predictability and "dishonesty" of humans. How many of you who have used Diceware actually settled for the first 4 or 5 words that your dice-rolls selected? Honestly? Really honestly? ;)

I suspect many people will roll the dice a few extra times if they get a word they don't "like" so to speak. If a lot of people do this, they have now introduced predictability that a hacker can exploit and the passphrase could be quite a lot easier to crack.
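A rough way to size that effect, under two labelled assumptions: rerolled words end up uniform over whatever remains acceptable, and an attacker can predict which words people reject. If a fraction f of the list is never accepted, each word is worth log2((1 - f) × N) bits, i.e. a loss of -log2(1 - f) bits per word. Rejecting 10% of words costs about 0.15 bits per word; rejecting half costs a full bit. The function names are hypothetical.

```python
import math

def bits_after_rejection(list_size, rejected_fraction):
    """Per-word entropy when the user rerolls any word in a rejected set
    covering rejected_fraction (< 1) of the list, assuming the accepted
    words stay equally likely and the attacker can predict the rejections."""
    return math.log2(list_size * (1 - rejected_fraction))

def loss_per_word(rejected_fraction):
    """Bits lost per word relative to the full list."""
    return -math.log2(1 - rejected_fraction)

for f in (0.01, 0.1, 0.5):
    print(f"reject {f:.0%}: {bits_after_rejection(7776, f):.3f} bits/word "
          f"(loss {loss_per_word(f):.3f})")
```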

The problem is exacerbated by the fact that the original Diceware word list contains many strings that aren't actually words, such as aa, aaa, aaaa, etc. I think that is a really, really bad choice and I don't understand why the original creator of the list didn't make sure to include only proper words. Surely, there are enough to choose from.

Now some would perhaps reply that people who don't follow the "rules" only cheat themselves. That's definitely true. But the entire idea with a system like Diceware is to assist us fallible and very predictable humans in making strong, memorable passwords.

There is no easy way to solve this problem. But improving the word list would go a long way. And of course, so would improving people's understanding of why they should follow the rules of the game precisely.

benfdc · May 2014

@pervel wrote:

I suspect many people will roll the dice a few extra times if they get a word they don't "like" so to speak.

This issue is not unique to Diceware-style passwords. I think of myself as more security-minded than most, but I can and do hit the "regenerate" button for pronounceable passwords and even random passwords (e.g. looking for something that will be easier to type).

FWIW, if I roll a Diceware passphrase that contains a word or two that I don't like, I like to substitute a longer related word, usually by appending a few extra letters. I figure this is just as likely to make the passphrase stronger as to make it weaker.

This could be formalized by overloading the word list—giving the user a choice of several words for each roll of the dice.

—Ben F

DavidB · May 2014

jpgoldberg wrote:

I definitely see the benefits of an "avoid 'L' or 'M'" option for exactly the reasons you mention, @benfdc. I would probably use it myself.

Has @benfdc's post been deleted? I am curious to know his ideas about an "avoid 'L' or 'M'" option but can't seem to find what he said.

David

DavidB · May 2014

@pervel,

So far I haven't had a problem using the exact words selected by the dice. Also, the non-word strings haven't bothered me. I have found it easy enough to incorporate them into my mnemonic story line.

As far as rejecting a word and rolling the dice again for a new one, since the reasons for rejection could be manifold, as long as it wasn't overdone--to produce, say, only grammatically correct sentences--would it necessarily affect the entropy significantly?

David
