Diceware: Ordinary dice vs casino dice

DavidB
Community Member

I am a big fan of Diceware for choosing a 1Password master password, as mentioned here: blog.agilebits.com/2011/06/21/toward-better-master-passwords/.

On the Diceware Passphrase Home Page, the author recommends using inexpensive, ordinary dice and says that casino dice are "overkill for this purpose."
world.std.com/~reinhold/diceware.html

Comments?

Thank you,

David

Comments

  • khad
    1Password Alumni

    I agree with what Reinhold says in the FAQ (emphasis added):

    Casino dice are precision made, translucent dice for use in gambling establishments. The added uniformity over toy dice is probably not significant for creating passphrases, but might be important if you want to use dice to directly generate random numbers for statistical purposes.

    There is some more interesting information about casino dice in the FAQ, but that's the useful part. :)

  • DavidB
    Community Member

    khad wrote:

    I agree with what Reinhold says in the FAQ (emphasis added)

    Yes--I read that, but how can you know that the bias of toy dice is not significant if you don't know what it is?

  • hawkmoth
    Community Member

    I'm going to speculate without a lot of background here, but I would assume that if toy dice are not quite as uniform as casino dice, the non-uniformity will vary from one sample to the next. If the cracker of Diceware passwords doesn't have the actual die you used to construct your master password, s/he will not know what bias, if any, went into your creation. And averaged over many toy dice, each with its own peculiar irregularities, the net result will be random over many databases. So the cracker can't do any better than random anyway.

    Makes sense to me, anyway. I'll be happy to be enlightened if I've gotten lost.

  • DavidB
    Community Member

    @hawkmoth,

    I thought it might be something like that too, although I am still trying to get my head around the concepts involved. I trust everything Arnold Reinhold says about this, but would just like to understand the reasoning behind it.

    Some further questions I have are whether it matters how many dice you use and whether the typical manufacturing process results in, as you say, each die having its own peculiar irregularities or the same irregularities as others in the same lot.

  • [Deleted User]
    Community Member

    You should always assume the attacker has access to your dice, or exact copies of them. Therefore it is very important that they can produce the largest number of dicewords. And remember to buy them in a store that you have randomly selected on a map, or else an attacker can intercept the delivery and replace the real dice with fake ones.

  • hawkmoth
    Community Member

    @Xe997, good one! I like that.

    I am sure thieves do not have access to the die I used to generate my master password. I don't know where it is myself.

  • jpgoldberg
    1Password Alumni

    Hmm. Let's assume that the attacker does not have physical access to the dice that you used. Let us also assume that you used one biased die. This is the situation in which the bias is easiest to exploit. (After all, if you used five dice, there is no particular reason to believe that the die that contributes the first digit for one word will do so for the others.)

    So then we have to ask: with an unknown bias in the one die used, how biased does it need to be to reduce the search space by, say, 1 bit?

    Let's also assume that the bias is pair based. That is, a bias toward rolling a 6 is matched by an equal and opposite bias against rolling the opposite side, in this case a 1. So 1 v 6, 2 v 5, and 3 v 4.

    OK. That at least defines the problem. Now the question is setting it up formally and solving it. Not something I'm going to do before dinner. So perhaps I'll be back later with some way to advance on this.
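
    One rough way to put numbers on the setup above, offered as a sketch and not as the formal treatment: use Shannon entropy per roll as a stand-in for search space, and assume the worst case where the attacker knows the bias exactly. Both are simplifying assumptions. The names die_entropy and eps are made up for the example, and only the 1 v 6 pair is biased here.

    ```shell
    # die_entropy EPS: Shannon entropy, in bits, of one roll of a die where
    # face 6 has probability 1/6 + EPS and the opposite face 1 has 1/6 - EPS.
    # The other four faces keep probability 1/6 (an assumption of this sketch).
    die_entropy() {
        awk -v e="$1" 'BEGIN {
            p[1] = 1/6 - e; p[6] = 1/6 + e
            p[2] = p[3] = p[4] = p[5] = 1/6
            h = 0
            for (i = 1; i <= 6; i++)
                if (p[i] > 0) h -= p[i] * log(p[i]) / log(2)
            printf "%.4f\n", h
        }'
    }

    # Print bits per roll for a fair die and for two sample biases.
    for eps in 0 0.01 0.05; do
        echo "eps=$eps bits_per_roll=$(die_entropy "$eps")"
    done
    ```

    A fair die gives log2(6), about 2.585 bits per roll, and a five-word passphrase uses 25 rolls. So losing 1 bit overall would mean losing 0.04 bits per roll under this measure.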

  • hawkmoth
    Community Member
    edited January 2014

    So, when you come back from dinner, I'm going to bet that unless the thief knows the exact bias of whatever die or dice I used, the difference in the chances of breaking into my data is so small as to be negligible. Otherwise stated, the difference between using casino dice and toy dice is trivial and not worth worrying about. Fun to think about, though.

  • jpgoldberg
    1Password Alumni

    Oh, I agree that the difference in practice will be too small to be of use to an attacker, but for other kinds of things even small biases in random number generation can have a real impact.

    (I still haven't done the math on this yet)

  • hawkmoth
    Community Member

    @jpgoldberg, I've done no math either, but avoiding such biases is a reason to avoid random number generators provided by computers. The computer generators of random numbers are properly referred to as pseudorandom number generators for that reason. But even so, modern pseudorandom number generators are actually very good. Just not perfectly random.

  • jpgoldberg
    1Password Alumni

    avoiding such biases is a reason to avoid random number generators provided by computers. The computer generators of random numbers are properly referred to as pseudorandom number generators for that reason

    Pseudo-random number generators (PRNGs) are not called "pseudo" because of bias. Indeed, most genuine sources of random numbers have more bias than PRNGs. Genuine sources (radioactive decay, for example) tend toward a normal distribution, which has a strong bias toward its mean.

    Instead PRNGs are called "pseudo" because they are computed. And computation alone can never produce genuine random sequences.

    PRNGs can be "cryptographically appropriate". Pretty much every cryptographic system, including 1Password, relies on some CSPRNG (Cryptographically Secure PRNG).

    Defining "Cryptographically Secure"

    Roughly speaking, a random sequence of bits is cryptographically secure iff there is no "efficient" way for an attacker who knows every bit in the sequence except one of them to be able to guess at that one remaining bit with a better than 50/50 chance of getting it right.

    Deterministic PRNGs

    There are two kinds of CSPRNG. There is the "Continuously reseeded" kind and the "Deterministic" kind. The deterministic kind is set up so that when it is given the same key (or "seed") it will produce the same pseudo-random sequence each time. A different seed (or key) is needed to get a different PR sequence. In cryptography Deterministic PRNGs are used for stream ciphers. 1Password does not use these.

    Dual_EC_DRBG

    The now infamous Dual_EC_DRBG is a deterministic PRNG (the "D" in "DRBG" stands for "Deterministic"). It was constructed in such a way that if you knew a particular relationship between two parameters in its design, then once you had about 32 bytes of the stream, you could predict the rest, even without knowing the key. There is no "efficient" way to calculate the secret relation. But if you have the power to select those two parameters, you can pick one of them along with your secret and use them to derive the second (public) parameter.

    So when the parameters for Dual_EC_DRBG were picked, it would have been possible to construct them so as to provide a back door. That this was possible was shown in 2007. That it had indeed happened was revealed in 2013.

    Of course there are other Deterministic RNGs that are not vulnerable to such construction.

    Reseeded PRNGs

    The continuously reseeded PRNGs also take seeds, but fresh seeds are pumped into them all of the time. The seeds are derived from a variety of different things, including clock jitter, disk seek times, timing of network events, and in some modern computer chips specially designed circuits that produce noise from quantum effects. These are constantly being added to an "entropy pool", but because of the bias (they are not uniformly distributed) a cryptographic hash function is used to smooth them out. So that smoothed out entropy gets fed into the PRNG to mix with the current state of the PRNG.

    Because there is no purely computational relationship between one part of the sequence and another part of the sequence, these are considered to be much stronger, even if the attacker knows or controls a portion of the information that went into the seeding.

    This system is what is used on most modern operating systems to provide cryptographically secure PRNGs. So everything from cryptographic keys and initialization vectors to the random numbers behind the Strong Password Generator comes from those sources using a Continuously reSeeded PRNG.
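
    On a Mac or a Linux machine you can look at the output of this system directly by reading from /dev/urandom. A minimal sketch, assuming a readable /dev/urandom and the standard od and tr tools (random_hex16 is a name made up for the example):

    ```shell
    # random_hex16: read 16 bytes from the operating system's random
    # device and print them as 32 hexadecimal characters.
    random_hex16() {
        od -An -N16 -tx1 /dev/urandom | tr -d ' \n'
        echo
    }

    random_hex16
    ```

    Each run should print a different 32-character string.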

  • hawkmoth
    Community Member

    Wow! Thanks for that discussion @jpgoldberg.

    A long time ago in my past I used random numbers generated by a desktop computer to do some biological simulations. Each time I started a run, I would draw a seed from a little container full of cells cut out from a table of random numbers from a book of statistical tables. That was to be sure I didn't bias the chosen seed. Now I am not sure if that actually helped. But in any case, security wasn't my issue. And I clearly had a very superficial understanding of what pseudo-random numbers represent. I doubt that mattered in my application, but thanks for correcting me here, where there is more at stake.

  • jpgoldberg
    1Password Alumni

    @hawkmoth, you are absolutely right that different purposes require different properties of PRNGs. Non-cryptographic uses don't require cryptographically secure PRNGs.

    For a Monte Carlo simulation you do still need some important statistical properties. Also, being able to pick your seed in those cases gives you the ability to publish "reproducible" runs. One known problem is that some PRNGs that ran on desktop PCs alternated between even and odd numbers. So if you used them for flipping a coin, you would just get a sequence of "...HTHTHTHTHTHTHTHTHT...", which would make them unsuitable for simulations (or games). So even though you don't need a CSPRNG for those purposes, you do still need a reasonable one. I can't recall which deployed systems had that problem.
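
    To see how the even/odd alternation can arise, here is a toy linear congruential generator in plain shell. The constants are illustrative only and are not claimed to match any particular deployed system; lcg_next and flips are hypothetical helper names. With an odd multiplier, an odd increment, and a power-of-two modulus, the lowest bit flips on every step.

    ```shell
    # lcg_next X: next state of a toy linear congruential generator.
    # Multiplier, increment, and modulus (2^31) are illustrative choices.
    lcg_next() {
        echo $(( (1103515245 * $1 + 12345) % 2147483648 ))
    }

    # flips SEED COUNT: print COUNT coin flips taken from the lowest bit
    # of each successive state (even = H, odd = T).
    flips() {
        x=$1
        n=$2
        out=""
        while [ "$n" -gt 0 ]; do
            x=$(lcg_next "$x")
            if [ $(( x % 2 )) -eq 0 ]; then out="${out}H"; else out="${out}T"; fi
            n=$(( n - 1 ))
        done
        echo "$out"
    }

    flips 42 10    # prints THTHTHTHTH
    flips 42 10    # same seed, so the same sequence again
    ```

    Both calls print the same string, which is the reproducibility that a fixed seed buys you, and the strict alternation is exactly the coin-flip problem described above.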

    A lot of early crypto systems (and some modern ones) were badly broken because people used inappropriate PRNGs. In other cases, people used good CSPRNGs but used a very narrow range of possible seeds (for example, limiting seeds to numbers between 0 and 32767). Quite simply, there are lots of ways to get this stuff wrong. But again, if you aren't designing cryptographic systems, you are off the hook in having to worry about most of those.

  • ethansisson
    Community Member

    What about using this as an alternative to physical dice? https://www.random.org/dice/?num=6 Is RANDOM.ORG "random" enough for the purposes of Diceware?

  • hawkmoth
    Community Member

    @ethansisson - I guess I'd be interested in the answer to your question too, but the directions for using Diceware explicitly say to use real dice, not a computer. Besides, think of all the retro fun you have sitting down with a die and a piece of paper to generate a highly secure, high tech, modern, strong password for use in a digital application! :D

  • jpgoldberg
    1Password Alumni

    @ethansisson,

    random.org is emphatically not appropriate for creating passwords or secret keys.

    The rest of this comment kind of got away from me. But do pay attention to the line above.

    What is random.org good for

    You can use a computer random number generator if you wish. Diceware was originally meant for people who didn't want to trust any software or hardware.

    random.org is fine for random numbers used for non-secret stuff. So in cryptography it might be appropriate for things like salts or initialization vectors. This is because an attacker may have knowledge of what numbers you are getting from random.org.

    However, I wouldn't recommend it for those purposes either because even if those numbers aren't secret, we don't want an attacker to be in a position to select those numbers for you. Quite simply your communication with random.org is neither private nor tamper-proof.

    Picking random numbers on your computer

    If you would like to use your own computer, and you've got the diceware list, you can just pick numbers between 1 and 7776 to get the right line number.

    It is actually harder to pick a random number in a particular range than a lot of people imagine. (There are a surprising number of bad uses of random number generators out there.) So even if you have a good random number generator, it is easy to use it poorly.

    Everyone using a Mac has access to good random numbers. I'm going to work with openssl rand instead of reading from /dev/random because it's easier to convert its output into a decimal number in the shell. (I'm not saying that there isn't a way to do this on Windows; I just don't know how.)

    So much bad advice

    The mistake that people make is illustrated by this example. Suppose you wanted a random number between 1 and 100. You could use something that gives you a random byte of data, which will be a number between 0 and 255. Because you only want from 1 through 100, you might think that you can take your random byte, divide it by 100, and look at the remainder. So you might say

    myrand = random_byte % 100
    

    where "%" is the modulus operator. This will get you a random number between 0 and 99, so you add one to it.

    myrand = myrand + 1
    

    So now you have a random number between 1 and 100. You think everything is fine. If you Google for getting a random number in a range, that is the advice you will get. It is wrong.

    The problem with this is that the numbers between 1 and 56 will show up more often than the numbers between 57 and 100.

    Let's look at the number 50. There are three ways of getting a random byte that will yield 50 with our system. The random byte could be 49, 149, or 249. For example, 249 mod 100 will equal 49, and when we add one to it we get 50.

    Now let's look at, say, 75. There are only two random bytes that will get us 75 as a result: 74 and 174.

    So even though the random bytes we are getting may be high quality, uniformly distributed random numbers, our (far too frequently recommended) scheme will produce some numbers more commonly than others.
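
    You can check this by brute force: run all 256 possible byte values through the scheme and count how often a given result comes up. A quick sketch in plain shell (count_hits is a name invented for the example):

    ```shell
    # count_hits TARGET: how many of the 256 possible byte values map to
    # TARGET under the flawed scheme (byte % 100) + 1.
    count_hits() {
        hits=0
        byte=0
        while [ "$byte" -le 255 ]; do
            if [ $(( byte % 100 + 1 )) -eq "$1" ]; then
                hits=$(( hits + 1 ))
            fi
            byte=$(( byte + 1 ))
        done
        echo "$hits"
    }

    echo "50 is produced by $(count_hits 50) byte values"   # 3
    echo "75 is produced by $(count_hits 75) byte values"   # 2
    ```

    So 50 comes up with probability 3/256 while 75 comes up with probability 2/256, even though the underlying bytes are perfectly uniform.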

    One way to do it right

    To actually use any of the stuff described below, you need to be very comfortable with the Unix-command line. You can still read it, but don't expect to try to turn this into something that works for you without some background in a Unix shell.

    So the way that we fix this is that when we get a random byte, we check to see if it is 200 or greater. If it is, we throw it out and get another random byte. We do this until we get one that is less than 200.
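
    For the 1 to 100 case, that fix might look like the sketch below. It reads its byte from /dev/urandom with od, which is a substitution made here so that it needs nothing beyond standard tools; rand_1_to_100 is a made-up name.

    ```shell
    # rand_1_to_100: print an integer chosen uniformly from 1..100.
    rand_1_to_100() {
        byte=255    # start with a value that forces the first draw
        # Redraw until the byte is below 200, the largest multiple of 100
        # that fits in the 256 possible byte values.
        while [ "$byte" -ge 200 ]; do
            byte=$(od -An -N1 -tu1 /dev/urandom | tr -d ' \n')
        done
        echo $(( byte % 100 + 1 ))
    }

    rand_1_to_100
    ```

    Every result from 1 to 100 now has exactly two byte values behind it (for example, 50 comes from 49 or 149), so each is equally likely.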

    Anyway, here is a little shell script that will pick a number randomly and uniformly between 1 and 7776.

    #!/bin/bash
    # Copyright 2014, Jeffrey Goldberg of AgileBits. BSD license.
    
    # The trick with picking a number uniformly from a range is that our
    # sources of random numbers return bytes and the ranges we want aren't
    # always even divisors of powers of 256.
    
    # The smallest power of 256 greater than or equal to 7776 is 256^2
    # (65536). So we will need to ask openssl for 2 bytes of random data.
    # 7776 goes into 65536 eight times and some change. The "change" is the
    # problem. So we have to throw out numbers that are equal to or greater
    # than 8*7776 (62208).
    
    ((dec = 65536)) # start out with a value that will cause loop to 
                    # continue because do ... while constructions are
                    # unpopular.
    
    while (( dec >= 62208 ))
    do
        hex=$(openssl rand -hex 2 | tr a-f A-F) # bc needs uppercase hex
        dec=$(echo "ibase=16; $hex" | bc)       # convert from hex to decimal
    done
    # dec now holds a random number between 0 and (7776 * 8) - 1
    
    (( a = dec % 7776 ))
    # a will be between 0 and 7775, uniformly chosen.
    
    (( ++a ))   # increment a
    echo $a     # print it.
    
    exit
    

    Anyway, this is hardly the most efficient thing in the world. And it could easily be generalized to work for other ranges of numbers. It would just need to calculate the number of bytes needed and where it should be throwing out numbers.
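
    A generalized version could look something like this sketch. It is written in plain POSIX shell and reads from /dev/urandom with od instead of calling openssl, both of which are choices made for the example; uniform_int is a hypothetical name. It assumes the upper bound is no larger than 2^56 so the arithmetic fits in 64-bit shell integers.

    ```shell
    # uniform_int MAX: print an integer chosen uniformly from 1..MAX.
    # Assumes 1 <= MAX <= 2^56 and a readable /dev/urandom.
    uniform_int() {
        max=$1
        nbytes=1
        space=256

        # Use the smallest number of bytes whose range covers MAX.
        while [ "$space" -lt "$max" ]; do
            space=$(( space * 256 ))
            nbytes=$(( nbytes + 1 ))
        done

        # Largest multiple of MAX that fits; draws at or above it are rejected.
        limit=$(( space / max * max ))

        while :; do
            hex=$(od -An -N"$nbytes" -tx1 /dev/urandom | tr -d ' \n')
            dec=$(( 0x$hex ))
            if [ "$dec" -lt "$limit" ]; then
                break
            fi
        done

        echo $(( dec % max + 1 ))
    }

    uniform_int 7776    # one Diceware line number
    ```

    For 7776 this works out to 2 bytes and a cutoff of 62208, the same numbers used in the script above.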

    Getting specific lines from diceware list

    OK. So now we have a way of getting high quality random numbers between 1 and 7776. Suppose the number we get is N. How do we get the Nth line from the diceware list?

    Suppose that you have the diceware list and you've made sure that it is trimmed of other stuff and so that line number 1 contains the first word and line number 7776 contains the 7776th word. Suppose that file is called diceware.txt. In a Terminal window in the directory where diceware.txt resides you can use

    sed -n Np diceware.txt
    

    Where the "N" of "Np" is the line number you want. So if you want line 1856, you would use

    sed -n 1856p diceware.txt
    

    and get the 1856th line.
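
    If the line number is sitting in a shell variable, it has to be expanded before sed sees it, which means double quotes. A small sketch, with nth_line as a hypothetical helper and a three-word stand-in list in place of the real diceware.txt:

    ```shell
    # nth_line N FILE: print line N of FILE.
    nth_line() {
        sed -n "${1}p" "$2"
    }

    # Demonstrate on a tiny stand-in list (not the real Diceware list).
    list=$(mktemp)
    printf '%s\n' alpha bravo charlie > "$list"
    n=2
    nth_line "$n" "$list"    # prints bravo
    rm -f "$list"
    ```

    With the real list you would call it as nth_line 1856 diceware.txt, or pass in whatever number your random number script printed.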

This discussion has been closed.