Feature Request: Diceware Password Generator


Comments

  • jpgoldberg
    1Password Alumni

    Thanks all. It's only been four years! As you can imagine I am really happy to see this in place.

    How many words?

    The word list, at present, has 18436 items, giving you 14.17 bits per word. So a four word pass phrase is about 56.7 bits.

    The reason that I say "at present" is that it will probably lose a few words as certain taboo words that we missed in our first pass get culled. So far people have been amused by some of the peculiar combinations we've generated, but I expect that there will be some complaints that will require the removal of some words.

    The other thing that we are undecided about is words of one or two characters in length. We might want to get rid of those just to make sure that we never generate a password that is so short that its hash already exists in rainbow tables.

    But I expect the list to stay above 18000, so still just a tad over 14 bits per word.
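
    As a rough check of the figures above, here is a minimal Python sketch. It assumes every word is drawn uniformly and independently from the list; the function name bits_per_word is illustrative only and is not taken from 1Password.

```python
import math

def bits_per_word(list_size: int) -> float:
    """Entropy in bits of one word drawn uniformly at random from the list."""
    return math.log2(list_size)

current = bits_per_word(18436)  # list size quoted above
floor = bits_per_word(18000)    # the expected lower bound on the list size

print(f"{current:.2f} bits per word")            # 14.17 bits per word
print(f"{4 * current:.1f} bits for four words")  # 56.7 bits for four words
print(f"{floor:.2f} bits per word at 18000")     # 14.14 bits per word at 18000
```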

    If you want to take a look at the list itself, you will find it on the Mac version (on which we still haven't wired this into any user interface) in 1Password 5.app/Contents/Frameworks/AgileLibrary-Mac.framework/Versions/A/Resources/AgileWords.txt

    Developing the list

    There were a number of decisions that went into putting this list together, and we experimented with a variety of these. This is not to say that our decision is final.

    • Limit to 8 characters

      Nothing over eight characters. Otherwise the pass phrases get too long.

    • Don't worry about prefixing
      You will find the words "white", "blue", "cap", and "whitecap" on the list. But you will not find "bluecap". This means that if there is no separator between words there are two ways to get "whitecap" but only one way to get "bluecap".

      This means that the full strength of the system depends on people using some sort of separators between words.

    • Don't worry about phonetic distance
      For things that may be spoken, it is important to make sure that the things on your list don't sound alike. You wouldn't want "free" and "three" to both be used for something designed to be spoken. Or more extreme cases of things like "raise" and "raze".

      Again, we chose not to try to eliminate those. Yes, some people will use these passwords for things that may have to be spoken, but we didn't want to substantially shorten the list just for a relatively rare case.

    • Don't include inflectional morphology
      If we have "dog" (we do) should we also have "dogs"? This was tough. We decided not to, even though this decision shortened the list substantially.

      We did not want to wait to do usability studies on this, but the suspicion was that differently inflected forms of the same word would lead to error in remembering passwords. Ideally, we would like to see real studies of this and possibly revisit this decision if it turns out that people don't misremember "walks" and "walk" and the like.

    • Don't fill the list with unfamiliar words.

      One of the base word lists that we played with was the Summer Institute of Linguistics (SIL) 1991 list of about 100,000 English words. When limited to no more than 8 characters, and with some taboo words removed, this produced an impressive list of 58782 words. (That would be a nice 15.84 bits per word.)

      But as soon as we started playing with that list, we found that a substantial number of the words on the list were unfamiliar and difficult to spell. It's fine to have a passphrase that teaches you one new word, but do you really want stuff like accidies anear balladic xxiii?

    • Letters only

      The Diceware™ list contains digits and punctuation. We wanted a list that (other than the separator) could be typed on a mobile keyboard without having to switch keyboards.

    • No taboo words

      I wouldn't mind having dirty words show up when I generate passwords. And I expect that a lot of people would also find that amusing. But remember that taboo language also includes racial or ethnic insults. So we tried to eliminate those by first getting a list of taboo words. We found a list of words that are used to flag comments on web forums, and it is a deeply unpleasant list to read.
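
    The prefixing point in the list above can be made concrete by counting how many ways a separator-free string can be split into list words. This is only a sketch: count_segmentations is a hypothetical helper, and the four-word set is a tiny stand-in for the real list.

```python
def count_segmentations(phrase: str, words: set[str]) -> int:
    """Count the ways `phrase` can be split into a sequence of list words."""
    # ways[i] = number of ways to build phrase[:i] from list words
    ways = [1] + [0] * len(phrase)
    for end in range(1, len(phrase) + 1):
        for start in range(end):
            if phrase[start:end] in words:
                ways[end] += ways[start]
    return ways[len(phrase)]

# Tiny illustrative subset; the real list is far larger.
sample = {"white", "blue", "cap", "whitecap"}

print(count_segmentations("whitecap", sample))  # 2: "white" + "cap", or "whitecap"
print(count_segmentations("bluecap", sample))   # 1: "blue" + "cap" only
```

    With a separator between words, each generated sequence maps to exactly one string, so the duplicate route to "whitecap" disappears.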

    Now if only I could remember what list of words we actually started with. I believe it was the John the Ripper English dictionary, and so this should be credited somewhere, but I've been playing with word lists for so long that I haven't always kept the best track of their original source. So we've had various lists floating around for a while.
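
    The mechanical criteria above (length limit, letters only, no taboo words) could be sketched as a simple filter. This is an illustration under stated assumptions, not the process actually used: keep_word is a hypothetical name, "letters only" is taken to mean ASCII letters, and the taboo set here is a harmless placeholder. The morphology and familiarity criteria need human judgment and are not covered.

```python
def keep_word(word: str, taboo: set[str], max_len: int = 8) -> bool:
    """Apply the length, letters-only, and taboo criteria to one candidate."""
    return (
        len(word) <= max_len   # nothing over eight characters
        and word.isascii()     # assumption: plain ASCII letters only
        and word.isalpha()     # no digits or punctuation
        and word not in taboo  # drop anything on the taboo list
    )

# Placeholder data for illustration only.
candidates = ["dog", "whitecap", "passphrase", "r2d2", "o'clock", "badword"]
taboo = {"badword"}

kept = [w for w in candidates if keep_word(w, taboo)]
print(kept)  # ['dog', 'whitecap']
```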

    Polyglossia

    As you probably know, 1Password for Windows has been doing this for a while. But other than our English language list, the other lists are from the Diceware™ lists for those languages. It would be nice to get this done for a range of languages, but particularly the step of removing taboo words will take considerable effort if lists of such terms aren't readily available.

  • benfdc
    Community Member
    edited September 2015

    Thanks for the info, Jeff, as well as for the pointer to the actual word list.

    I guess word lists are like PBKDF2 iterations—the bang for the buck diminishes as the numbers get larger. The AgileWords lexicon is well over twice the size of the Diceware™ lexicon, but the added length buys little in terms of passphrase strength—a tad over 14 bits of entropy per word versus a tad under 13 bits per word.

    Your discussion of taboo words reminds me of the situation in the Scrabble® world. The Official Scrabble Player's Dictionary that you can pick up in your local bookstore (if you still have one) is bowdlerized because it is used in schools. An unexpurgated word list is used by serious North American Scrabble players, and I think you can find a list of the excluded words. If similar taboo word lists exist for foreign language Scrabble lexicons, that might help with localization efforts.

    I'm looking forward to 1P6/Mac, although perhaps a test UI will show up sooner in the 1P5.x betas. As a bridge, I've filled a Secure Note with a few dozen AgileWords generated on my iPhone.

  • jpgoldberg
    1Password Alumni

    @benfdc was spot on when saying:

    I guess word lists are like PBKDF2 iterations—the bang for the buck diminishes as the numbers get larger. The AgileWords lexicon is well over twice the size of the Diceware™ lexicon, but the added length buys little in terms of passphrase strength.

    Exactly. You have to double the number of items on the list to get a 1 bit improvement.
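
    A short Python sketch of that relationship, for anyone who wants to check the numbers. It assumes the standard Diceware™ list size of 6^5 = 7776 words (five dice rolls per word), which is not stated in this thread; the variable names are illustrative.

```python
import math

agile = 18436      # AgileWords size quoted earlier in the thread
diceware = 6 ** 5  # 7776: assumes the standard five-dice Diceware list

print(f"{math.log2(diceware):.2f}")  # 12.92 bits per word
print(f"{math.log2(agile):.2f}")     # 14.17 bits per word

# Doubling any list adds exactly one bit per word: log2(2n) = log2(n) + 1.
gain = math.log2(2 * agile) - math.log2(agile)
print(f"{gain:.2f}")                 # 1.00
```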

  • rob
    edited September 2015

    The entropy bang for effort buck diminishes exponentially in the sense that it's harder and harder to add one bit of entropy, but remember the strength bang for entropy buck is increasing exponentially. A five-word AgileWords passphrase at almost 71 bits is approximately 75 times as strong as a five-word Diceware™ passphrase at almost 65 bits.

    So, computer abilities remaining the same, the strength to effort ratio is actually linear – if you double the size of the pool (word list), you double the average time it takes to guess a random selection from that pool. The problem is that computer speeds are increasing exponentially, which unfortunately tips the balance back toward the crackers and means that we will indeed have to add more words to passphrases as the years go by.
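
    The five-word comparison above can be reproduced with a few lines of Python. This is a sketch assuming uniform random selection and the standard 7776-word Diceware™ list (a figure not given in this thread); the variable names are illustrative.

```python
import math

agile = 18436      # AgileWords size quoted earlier in the thread
diceware = 6 ** 5  # 7776: assumes the standard five-dice Diceware list
words = 5

agile_bits = words * math.log2(agile)
diceware_bits = words * math.log2(diceware)
print(f"{agile_bits:.1f}")     # 70.9
print(f"{diceware_bits:.1f}")  # 64.6

# Ratio of the two search spaces, i.e. how many times more guesses are needed.
ratio = (agile / diceware) ** words
print(round(ratio))            # 75
```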

This discussion has been closed.