snippetMajor
How does an operating system create entropy for random seeds?
Viewed 0 times
randomcreateseedssystemoperatingentropyfordoeshow
Problem
On Linux, the files
They can be read as normal files:
Many other unix variants provide
The Windows equivalent is the
How does the operating system generate pseudo-randomness?
/dev/random and /dev/urandom files are the blocking and non-blocking (respectively) sources of pseudo-random bytes. They can be read as normal files:
$ hexdump /dev/random
0000000 28eb d9e7 44bb 1ac9 d06f b943 f904 8ffa
0000010 5652 1f08 ccb8 9ee2 d85c 7c6b ddb2 bcbe
0000020 f841 bd90 9e7c 5be2 eecc e395 5971 ab7f
0000030 864f d402 74dd 1aa8 925d 8a80 de75 a0e3
0000040 cb64 4422 02f7 0c50 6174 f725 0653 2444
...Many other unix variants provide
/dev/random and /dev/urandom as well, without the blocking/non-blocking distinction.The Windows equivalent is the
CryptGenRandom() function.How does the operating system generate pseudo-randomness?
Solution
The title and the body of your question ask two different questions: how the OS creates entropy (this should really be obtains entropy), and how it generates pseudo-randomness from this entropy. I'll start by explaining the difference.
Where does randomness come from?
Random number generators (RNG) come in two types:
Some applications, such as simulations of physical phenomena, can be content with random numbers that pass statistical tests. Other applications, such as the generation of cryptographic keys, require a stronger property: unpredictability. Unpredictability is a security property, not (only) a statistical property: it means that an adversary cannot guess the output of the random number generator. (More precisely, you can measure the quality of the RNG by measuring the probability for an adversary to guess each bit of RNG output. If the probability is measurably different from 1/2, the RNG is bad.)
There are physical phenomena that produce random data with good statistical properties — for example, radioactive decay, or some astronomical observations of background noise, or stock market fluctuations. Such physical measurements need conditioning (whitening), to turn biased probability distributions into a uniform probability distribution. A physical measurement that is known to everyone isn't good for cryptography: stock market fluctuations might be good for geohashing, but you can't use them to generate secret keys.
Cryptography requires secrecy: an adversary must not be able to find out the data that went into conditioning. There are cryptographically secure pseudo-random number generators (CSPRNG): PRNG algorithms whose output is suitable for use in cryptographic applications, in addition to having good statistical properties. One of the properties that make a CSPRNG cryptographically secure is that its output does not allow an adversary to reconstruct the internal state (knowing all the bits but one produced by a CSPRNG does not help to find the missing bit). I won't go into how to make a CSPRNG because that's the easy bit — you can follow recipes given by professional cryptographers (use a standard algorithm, such as Hash_DRBG, HMAC_DRBG or CTR_DRBG from NIST SP 800-90A) or the ANSI X9.31 PRNG. The CSPRNG requires two properties of its state in order to be secure:
Architecture of a random number generator
In practice, almost all good random number generators combine a CSPRNG with one or more entropy sources. To put it succintly, entropy is a measure of the unpredictability of a source of data. Basing a random number generator purely on a hardware RNG is difficult:
Thus the RNG in an operating system almost always works like this:
A random number generation service is part of the job of an operating system, because entropy gathering requires access to hardware, and entropy sources constitute a shared resource: the operating system must assemble them and derive output from them that will suit applications. Pseudo-random conditioning of the entropy sources is required in the operating system; it might as well be cryptographically secure, because this isn't fundamentally harder (and it is required on operating systems where applications do not trust each other; on fully cooperative systems, each application would have to run its own CSPRNG internally if the operating system didn't provide one anyway).
Most systems with persistent storage will load an RNG seed from disk (I'll use “disk” as an abbreviation for any kind of persistent storage) when they boot, and overwrite
Where does randomness come from?
Random number generators (RNG) come in two types:
- Pseudo-random number generators (PRNG), also called deterministic random bit generators (DRBG) or combinations thereof, are deterministic algorithms which maintain a fixed-size variable internal state and compute their output from that state.
- Hardware random number generator (HRNG), also called “true” random number generators, are based on physical phenomena. “True” is a bit of a misnomer, because there are no sources of information that are known to be truly random, only sources of information that are not known to be predictable.
Some applications, such as simulations of physical phenomena, can be content with random numbers that pass statistical tests. Other applications, such as the generation of cryptographic keys, require a stronger property: unpredictability. Unpredictability is a security property, not (only) a statistical property: it means that an adversary cannot guess the output of the random number generator. (More precisely, you can measure the quality of the RNG by measuring the probability for an adversary to guess each bit of RNG output. If the probability is measurably different from 1/2, the RNG is bad.)
There are physical phenomena that produce random data with good statistical properties — for example, radioactive decay, or some astronomical observations of background noise, or stock market fluctuations. Such physical measurements need conditioning (whitening), to turn biased probability distributions into a uniform probability distribution. A physical measurement that is known to everyone isn't good for cryptography: stock market fluctuations might be good for geohashing, but you can't use them to generate secret keys.
Cryptography requires secrecy: an adversary must not be able to find out the data that went into conditioning. There are cryptographically secure pseudo-random number generators (CSPRNG): PRNG algorithms whose output is suitable for use in cryptographic applications, in addition to having good statistical properties. One of the properties that make a CSPRNG cryptographically secure is that its output does not allow an adversary to reconstruct the internal state (knowing all the bits but one produced by a CSPRNG does not help to find the missing bit). I won't go into how to make a CSPRNG because that's the easy bit — you can follow recipes given by professional cryptographers (use a standard algorithm, such as Hash_DRBG, HMAC_DRBG or CTR_DRBG from NIST SP 800-90A) or the ANSI X9.31 PRNG. The CSPRNG requires two properties of its state in order to be secure:
- The state must be kept secret from the start and at all times (though exposure of the state will not reveal past outputs).
- The state must be linear: the RNG must never be started twice from the same state.
Architecture of a random number generator
In practice, almost all good random number generators combine a CSPRNG with one or more entropy sources. To put it succintly, entropy is a measure of the unpredictability of a source of data. Basing a random number generator purely on a hardware RNG is difficult:
- The raw physical data is likely to need conditioning anyway, to turn probabilistic data into a uniform distribution.
- The output from the source of randomness must be kept secret.
- Entropy sources are often slow compared with the demand.
Thus the RNG in an operating system almost always works like this:
- Accumulate sufficient entropy to build an unpredictable internal state.
- Run a CSPRNG, using the accumulated entropy as the seed, i.e. as the initial value of the internal state.
- Optionally, periodically mix additional entropy into the internal state. (This is not strictly necessary, since entropy is not “consumed” at any measurable rate. It helps against certain threats that leak the RNG state without compromising the whole system.)
A random number generation service is part of the job of an operating system, because entropy gathering requires access to hardware, and entropy sources constitute a shared resource: the operating system must assemble them and derive output from them that will suit applications. Pseudo-random conditioning of the entropy sources is required in the operating system; it might as well be cryptographically secure, because this isn't fundamentally harder (and it is required on operating systems where applications do not trust each other; on fully cooperative systems, each application would have to run its own CSPRNG internally if the operating system didn't provide one anyway).
Most systems with persistent storage will load an RNG seed from disk (I'll use “disk” as an abbreviation for any kind of persistent storage) when they boot, and overwrite
Context
StackExchange Computer Science Q#29880, answer score: 34
Revisions (0)
No revisions yet.