Decreasing randomness from Diceware

nopenotme · March 2014

Has anyone looked at how much the entropy of Diceware phrases is degraded by modifying the result of the dice throws? That is, if someone decides not to use a word because it is obscure, or if the words are rearranged to make them more memorable? I'm guessing that doing anything other than throwing the dice and using the indicated words introduces an element of human cognition (and hence predictability). I know that if I were working on cracking Diceware passphrases, those would be among the first rules I'd build in, assuming that most people can't be bothered to use words they don't know and that they tend to put verbs after nouns, etc. Have I got this right?

khad · March 2014

It would probably be hard to estimate the actual degradation in entropy, but, yeah, the idea is to "surrender your pride and roll the die".

If the crackers are not already listing "easy" words first, they certainly ought to be if they want to be most efficient. I suspect @jpgoldberg would have a more complete answer. It may not be until next week, as he is tied up at the moment (thankfully not literally and not by any three-letter agency), but I'll ping him on this. :)

jpgoldberg · March 2014

Hi @nopenotme!

One thing we can encourage is that, if the rolled word is obscure, people take a couple of minutes to familiarize themselves with it. I've actually created one diceware-like passphrase from word lists of a number of languages, including languages I don't know. That meant that I had to spend a minute or so becoming familiar with some words that I didn't know.

For people rejecting words, we can calculate the strength easily. If, say, you reject 5% of the 7776 words on the diceware list, then you are in effect using a list of 7387 words. And so your entropy per (abridged) diceware word will be approximately 12.85 bits, compared with about 12.92 bits for the full list.
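To make that arithmetic concrete, here is a minimal Python sketch. It assumes the standard 7776-word list (five six-sided dice) and that every rejected roll is simply re-rolled, so the result stays uniform over the remaining words. The function names and the five-word phrase length are illustrative choices, not part of Diceware itself.

```python
import math

# Assumption: the standard Diceware list, indexed by five six-sided dice.
DICEWARE_WORDS = 6 ** 5  # 7776


def effective_list_size(reject_fraction, full_size=DICEWARE_WORDS):
    """Number of words left if a fixed fraction of the list is always re-rolled."""
    return int(full_size * (1 - reject_fraction))


def bits_per_word(list_size):
    """Entropy of one uniformly chosen word from a list of the given size."""
    return math.log2(list_size)


full_bits = bits_per_word(DICEWARE_WORDS)
abridged_size = effective_list_size(0.05)
abridged_bits = bits_per_word(abridged_size)

# Hypothetical phrase length, chosen only for illustration.
WORDS_IN_PHRASE = 5

print(f"full list:     {DICEWARE_WORDS} words, {full_bits:.2f} bits/word")
print(f"abridged list: {abridged_size} words, {abridged_bits:.2f} bits/word")
print(f"loss over {WORDS_IN_PHRASE} words: "
      f"{WORDS_IN_PHRASE * (full_bits - abridged_bits):.2f} bits")
```

Under these assumptions the cost of rejecting 5% of the list is well under one bit for a five-word phrase, which is why re-rolling is the benign case.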

That assumes that when you roll a word you don't wish to use, you re-roll. If, on the other hand, you simply go down the list and pick the next acceptable word, then it gets much more complicated. The complication doesn't (only) come from calculating the Shannon entropy of the password distribution we get from such behavior, but it comes from the fact that the distribution is no longer a uniform distribution. Some possible passwords become more likely than others. Once this happens, entropy is no longer an appropriate measure of password strength.

So for that sort of behavior, calculating the Shannon entropy is messy but tractable. But actually calculating password strength becomes much, much harder. I've proposed a way that this could be done, but there is hardly consensus on it.
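A rough Python sketch of the "go down the list to the next acceptable word" behavior shows why the distribution stops being uniform. Everything here is an assumption made for illustration: the model wraps around at the end of the list, the set of rejected words is picked at random, and min-entropy (based on the single most likely word) is used only as a common conservative comparison, not as the strength measure proposed above.

```python
import math
import random
from typing import List, Sequence


def next_acceptable_distribution(acceptable: Sequence[bool]) -> List[float]:
    """Probability of landing on each acceptable word when a rejected roll is
    replaced by the next acceptable word down the list (wrapping at the end)."""
    n = len(acceptable)
    if not any(acceptable):
        raise ValueError("at least one word must be acceptable")
    counts = [0] * n
    for rolled in range(n):
        chosen = rolled
        while not acceptable[chosen]:
            chosen = (chosen + 1) % n
        counts[chosen] += 1
    # Only acceptable words can be chosen, so only they have nonzero counts.
    return [c / n for c in counts if c > 0]


def shannon_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs)


def min_entropy(probs: Sequence[float]) -> float:
    """Bits implied by the single most likely outcome."""
    return -math.log2(max(probs))


# Tiny example: four words, the second one is rejected.
tiny = next_acceptable_distribution([True, False, True, True])
print(tiny)                   # [0.25, 0.5, 0.25]
print(shannon_entropy(tiny))  # 1.5
print(min_entropy(tiny))      # 1.0
print(math.log2(len(tiny)))   # uniform over 3 words, about 1.585

# Diceware-sized example with a hypothetical, randomly chosen 5% rejected.
rng = random.Random(0)
size = 6 ** 5
rejected = set(rng.sample(range(size), size * 5 // 100))
probs = next_acceptable_distribution([i not in rejected for i in range(size)])
print(f"re-roll (uniform): {math.log2(len(probs)):.3f} bits/word")
print(f"next-word Shannon: {shannon_entropy(probs):.3f} bits/word")
print(f"next-word min-ent: {min_entropy(probs):.3f} bits/word")
```

In the tiny example, the word right after the rejected one is twice as likely as its neighbors, so an attacker who knows the habit would try it first. The Shannon figure (1.5 bits) hides that; the most-likely-word figure (1 bit) does not, which is one way to see why a single entropy number stops being a good strength measure once the distribution is skewed.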

As you see, I haven't even touched on the re-ordering question. I'm not sure how to model that behavior, and thus even begin to see how to calculate the weakening that would go with it.
