Random Number Generators (RNG)



Pseudo Random Number Generators (PRNG)

CSC 692 Cryptography

Tim Jacobson

Nathan Sachs

Introduction

The goal of this paper is to explain the process of artificially generating random numbers using algorithmic techniques that can be performed on a computer. There are many known sources of random numbers in the real world; good examples include electrical and thermal noise from semiconductors and radioactive decay. However, it is difficult to access and gather random numbers from such sources. Some sources, like radioactive decay, change too slowly to be of use. They may also require measurements at a distance, which could be observed by others. Electronic monitoring is subject to malfunction and may have known biases that are undesirable. There are hardware devices that can generate random numbers, such as the Intel RNG, but this paper will not address such devices. One alternative is to use software algorithms that try to produce output that is as random as possible. Such an algorithm is known as a Pseudo Random Number Generator (PRNG).

Linear Generators

Among the multitude of PRNGs that exist, many can be classified as Linear Congruential Generators (LCG). These algorithms all follow the same general formula:

$x_{n+1} \equiv a \cdot x_n + b \pmod{p}$

In this formula p may be a prime or an irreducible polynomial. Another observation that can be made is that b provides no benefit to the randomness of the algorithm, so it is common to have b set to zero. Many authors describe an LCG using the notation LCG(p, a, b, x_0), where x_0 is the seed. LCGs are not very good for cryptographic use because they are very predictable. However, they are good for generating huge samples of random numbers quickly. The rand() function in C is typically a generator of this type.
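As a concrete illustration, an LCG takes only a few lines of Python. This is a minimal sketch; the multiplier 16807 with modulus $2^{31} - 1$ is the classic "minimal standard" parameter choice, used here purely for demonstration.

    def lcg(p, a, b, x0):
        """Yield the sequence x_{n+1} = (a*x_n + b) mod p."""
        x = x0
        while True:
            x = (a * x + b) % p
            yield x

    # Example run with the illustrative parameters described above.
    gen = lcg(p=2**31 - 1, a=16807, b=0, x0=42)
    print([next(gen) for _ in range(5)])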

Nonlinear Generators

Nonlinear generators are less predictable than linear ones, but this comes at a computational cost: nonlinear generators tend to be 5 to 10 times slower. Among the nonlinear generators is the Inversive Congruential Generator (ICG). The general formula for an ICG is:

$x_{n+1} \equiv a \cdot x_n^{-1} + b \pmod{p}$

Here $x_n^{-1}$ denotes the multiplicative inverse of $x_n$ modulo p (with $0^{-1}$ taken to be 0). The notation used is ICG(p, a, b, x_0), and again it is assumed that p is a prime or irreducible polynomial. Also, b commonly has a value of zero. ICGs take much more time to calculate because of the inverse operation, which must be obtained from the extended Euclidean algorithm. ICGs also have undesirable traits that will be explained later in testing.
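A minimal ICG sketch follows. The parameters are hypothetical, and Python 3.8+ is assumed, since pow(x, -1, p) computes the modular inverse directly from that version on.

    def icg(p, a, b, x0):
        """Yield x_{n+1} = (a * x_n^{-1} + b) mod p, with 0^{-1} taken as 0."""
        x = x0
        while True:
            inv = pow(x, -1, p) if x != 0 else 0  # modular inverse of x_n
            x = (a * inv + b) % p
            yield x

    gen = icg(p=2**31 - 1, a=12345, b=6789, x0=1)  # hypothetical parameters
    print([next(gen) for _ in range(5)])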

A close relative of the ICG is the Explicit Inversive Congruential Generator (EICG). The EICG has the general formula:

$x_n \equiv (a \cdot (n_0 + n) + b)^{-1} \pmod{p}, \quad n = 0, 1, 2, \ldots$

The notation is similar to the previous two: EICG(p, a, b, n_0). The EICG produces less predictable results than the ICG. ICGs and EICGs can be combined to produce a compound EICG. It is claimed that a compound EICG can obtain long periods more easily than the regular ICG or EICG types. It also allows for operations with small moduli, which is faster. Of course, all of this comes at the cost of performance due to the added calculations.
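A matching EICG sketch, again with hypothetical parameters. Note that the generator's state is simply the counter n, which is part of what makes splitting the sequence across parallel streams straightforward.

    def eicg(p, a, b, n0=0):
        """Yield x_n = (a*(n0 + n) + b)^{-1} mod p for n = 0, 1, 2, ..."""
        n = n0
        while True:
            t = (a * n + b) % p
            yield pow(t, -1, p) if t != 0 else 0  # 0 maps to 0 by convention
            n += 1

    gen = eicg(p=2**31 - 1, a=12345, b=6789)  # hypothetical parameters
    print([next(gen) for _ in range(5)])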

With the linear and nonlinear congruential generators described above, selection of the parameters p, a, b, and the seed is a task that must be done with careful consideration. Subtle changes in these parameters have dramatic effects on the randomness of the algorithm.

Randomness

One of the problems with any of these generators is knowing whether a generated sequence is actually random. Random numbers should be arbitrary, unknowable, and unpredictable, but testing for this is not an exact science. Arbitrary means that all numbers have an equal chance of being the one generated. Any bias in the generator needs to be removed, if possible, so that the results are arbitrary. Unknowable means that generated numbers will not reveal the parameters of the algorithm used: given a generated number, it should be next to impossible to determine what values were used in the generator to create it. The PRNG should be viewed as a black box by outside observers. Unpredictable means that the next number to be generated cannot be predicted from previous numbers: given a series of generated numbers, there should not be any way to use them to calculate the next number to be generated. All generators have regularities, which may become deficiencies. Assessing generators can be performed in two ways: theoretical analysis and empirical analysis.

Theoretical Analysis

Theoretical Analysis is performed by using mathematical principles to study the algorithm used. Theoretical analysis asks five primary questions:

1. What is the maximal period length of the given type of generator?

2. How are the parameters chosen to obtain the maximal period length?

3. Which algorithms let us obtain such parameters?

4. What are the regularities of this type of generator?

5. What correlations can be drawn between theoretical analysis and the results of empirical distribution samples?

All generators have a maximal period length. It is desirable for a generator to have as long a period length as possible. For congruential generators, this period length is controlled by the value of p. However, just picking a large prime number does not ensure that the algorithm will generate the maximal period. Care must also be given to choose the other parameters so this can be achieved. This addresses questions 1, 2 and 3.

For question 4, the spectral test can be used to look for regularities. The spectral test takes a set of overlapping vectors of random numbers, which form lattice structures, and measures the maximal distance between adjacent parallel hyperplanes. These lattice structures can be seen in 2D and 3D plots on the Cartesian coordinate system. LCGs show very clear lattice structures. Nonlinear generators avoid the lattice structures completely, which makes them less predictable. ICGs show symmetric similarity along the diagonals, which could make parts of them predictable. EICGs show no recognizable patterns under the spectral test. In addition, studies have shown that EICGs exhibit properties that allow them to be implemented in parallel, which may be useful for parallel applications. Another test that can be used is called the discrepancy test; it samples the whole period of a generator and determines the order of magnitude of the discrepancies between the numbers.
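The lattice structure of a poor LCG is easy to see with a short script. The sketch below plots overlapping pairs $(x_n, x_{n+1})$; the deliberately small, poorly chosen parameters are hypothetical and exaggerate the effect so the parallel lines are visible to the naked eye.

    import matplotlib.pyplot as plt

    def lcg_pairs(p, a, b, x0, count):
        """Collect overlapping pairs (x_n, x_{n+1}) from an LCG."""
        xs, x = [], x0
        for _ in range(count + 1):
            x = (a * x + b) % p
            xs.append(x)
        return xs[:-1], xs[1:]

    # Hypothetical toy parameters chosen to make the lattice obvious.
    u, v = lcg_pairs(p=256, a=137, b=187, x0=1, count=256)
    plt.scatter(u, v, s=4)
    plt.title("Overlapping pairs of a weak LCG fall on parallel lines")
    plt.show()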

Question 5 is the most difficult to answer. With all the mathematical analysis that can be performed on an algorithm, there is still no guarantee that it will behave randomly in real samples. For this reason, testing must also include empirical (statistical) analysis of the algorithm. Unfortunately, no direct correlation between theoretical and empirical testing has been established to date, which implies that both must be performed for a rigorous analysis.

Empirical Analysis

There are many different statistical tests that may be performed on a string of numbers generated by a PRNG to see if that string appears to be truly random. We will look at a few of these tests here. For this discussion, however, we will talk about testing pseudo random bit generators (PRBGs), which simply generate random strings of 1’s and 0’s. A PRBG may easily be mapped to a PRNG, and vice versa, by converting a collection of bits into a number or a number into its representative bit string. Therefore, this discussion does not leave us lacking in ways to test a random number generator: the method would simply be to test the bit strings that represent the collection of numbers output by the PRNG.

Although unlikely, especially for long strings, a true random number generator could conceivably come up with the string of all zeros. However, this looks incredibly suspicious. The statistical tests we want to perform are based on hypothesis tests -- a method in statistics in which the statistician hypothesizes that some statement is true, collects data, and computes the likelihood that the observed data could have arisen if the stated hypothesis were true. If this likelihood is less than some predefined threshold (the significance level), then the hypothesis is rejected. Due to limited space we will not go into the details of hypothesis testing here, but sources are widely available that describe these methods much more clearly. Suffice it to say that the tests we describe all involve creating a string of bits with a proposed algorithm, hypothesizing that those bits came from a random number generator, and computing statistics that indicate how random the string appears.

One simple test, the “frequency test,” checks that the numbers of 0’s and 1’s are approximately the same, as would be expected for a random sequence. Another is called the “serial test,” or “two-bit test”; it checks that the numbers of occurrences of 00, 01, 10, and 11 are approximately the same. A third test, called the “poker test,” checks that all possible subsequences of a given length appear approximately the same number of times, as would be expected for a random sequence of sufficient length. A sketch of the frequency test follows.
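This sketch implements the frequency test as given in the Handbook of Applied Cryptography: the statistic $(n_0 - n_1)^2 / n$ approximately follows a chi-square distribution with one degree of freedom, so at a 5% significance level the string passes when the statistic is below 3.8415.

    def frequency_test(bits, threshold=3.8415):
        """Return the frequency-test statistic and whether the string passes."""
        n = len(bits)
        n1 = sum(bits)                  # number of 1's
        n0 = n - n1                     # number of 0's
        stat = (n0 - n1) ** 2 / n
        return stat, stat < threshold

    bits = [1, 0, 1, 1, 0, 0, 1, 0] * 125   # hypothetical 1000-bit input
    print(frequency_test(bits))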

The runs test takes a slightly different approach: count the blocks and gaps of a given length and see if that count matches the expected results for a random sequence. A block is a subsequence of 1’s stopped on each end by 0’s (or the end of the sequence); a gap is the analogous subsequence of 0’s. For a given length i, the number of blocks (or gaps) that would be expected in a random sequence of length n is $e_i = (n - i + 3)/2^{i+2}$. The runs test statistic gives a measure of how likely it would be to see the observed number of runs in the generated bit string if the string were a true random sequence.
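Here is a sketch of the counting at the heart of the runs test, using the expected-count formula above; the helper names are our own.

    def expected_runs(n, i):
        """Expected number of blocks (or gaps) of length i: (n-i+3)/2^(i+2)."""
        return (n - i + 3) / 2 ** (i + 2)

    def count_runs(bits, value, length):
        """Count maximal runs of `value` having exactly the given length."""
        count, run = 0, 0
        for b in bits + [None]:         # sentinel closes the final run
            if b == value:
                run += 1
            else:
                if run == length:
                    count += 1
                run = 0
        return count

    bits = [1, 0, 1, 1, 0, 0, 1, 0] * 125
    print(count_runs(bits, 1, 2), expected_runs(len(bits), 2))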

The “autocorrelation test” uses the method of autocorrelation[1] to see if there is some correlation between every kth bit, for some stride size k. Note that any bit string generated by a PRBG will eventually be periodic; this test checks for correlation between evenly spaced bits with a stride size less than the period, which would not be expected in a random sequence.
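A sketch of the autocorrelation test statistic, following the form given in the Handbook of Applied Cryptography: XOR the string with itself shifted by d; a random sequence should disagree with its shift in about half of the (n − d) positions, and the normalized statistic is approximately standard normal.

    import math

    def autocorrelation_test(bits, d):
        """Return the statistic and whether it passes at the 5% level."""
        n = len(bits)
        disagreements = sum(bits[i] ^ bits[i + d] for i in range(n - d))
        stat = 2 * (disagreements - (n - d) / 2) / math.sqrt(n - d)
        return stat, abs(stat) < 1.96   # two-sided 5% significance level

    bits = [1, 0, 1, 1, 0, 0, 1, 0] * 125
    print(autocorrelation_test(bits, d=3))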

A bit string that appears to be truly random will pass all of these tests and others. Although it is perfectly possible for a truly random sequence to fail one or more of these tests, it is very unlikely, so such sequences are rejected as non-random.

Seeding

All methods of pseudo random number generation must begin with an initial value, commonly referred to as the seed. For cryptographic use, calculating the seed must be as secure a process as the encryption algorithm it will be used for. For example, a common technique for seeding a PRNG is to use the internal clock on the system. If an attacker knows the general time frame or day in which the key was generated, the attacker could try every possible time increment and recover the key for the encryption. A better solution is to use a truly random or unpredictable value for the seed.
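The attack just described is easy to simulate. The sketch below is hypothetical: it uses Python's random module in place of a victim's generator and assumes the attacker saw four raw outputs and knows the seed fell within a one-day window of timestamps.

    import random

    def recover_seed(known_output, window_start, window_end):
        """Try every timestamp in the window as the seed until one matches."""
        for t in range(window_start, window_end + 1):
            rng = random.Random(t)      # candidate PRNG seeded with time t
            if [rng.getrandbits(32) for _ in range(4)] == known_output:
                return t                # seed recovered: keys are now exposed
        return None

    victim = random.Random(1_000_042)   # seeded from a "clock" value
    leak = [victim.getrandbits(32) for _ in range(4)]
    print(recover_seed(leak, 1_000_000, 1_086_400))   # prints 1000042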

Good seed material has both high quality and high quantity. The idea is that seed material may be collected and stored in what is called a seed pool; out of this collection of material a PRNG may be “rebooted” every now and then. It has been suggested that the majority of this material be discarded, such that only one bit out of each byte of seed material is actually used. It is imperative that a PRNG be reseeded before it reaches the end of its period. If this is not done, then the random numbers that are generated will begin to repeat themselves. Cryptographically speaking, this is horrible: the PRNG becomes completely predictable until it is reseeded. Clearly it is important to maintain a high quantity of seed material. If many random numbers are being used, the seed pool may become depleted, which would cause the PRNG to enter the state of predictability just mentioned.

Entropy

In addition to having a sufficient quantity of seed material, it is important to maintain high quality material. The relevant concept here is entropy, defined as a measure of uncertainty. One article uses the following example: the gender of a member of a male soccer team has zero entropy, while the next time the author will get a date has high entropy, partly because there are many variables involved and partly because there are external influences involved as well. In general, system-unique sources, such as configuration files and environment settings, have some entropy. Variable sources such as screen contents, system time, and log file statistics have medium entropy. External random sources, however, provide the highest entropy; some of these include mouse click timing, video input, and cursor position over time. A combination of many of these sources can provide the highest levels of entropy. Good schemes combine a few external sources with several medium level sources.

It is important, when selecting sources of entropy, to avoid those that may be easily monitored by possible attackers. System configuration files, for example, are poor sources of entropy not only because they are rather static, but also because they are open to reading by attackers. Even keyboard and mouse statistics may be monitored by attackers if the machine is connected to a network, as is the case with workstations connected to a remote server. Bias in entropy sources should also be taken into consideration. For example, if one source of entropy is the last key pressed on the keyboard, then there could be a bias toward the Enter key. Also, certain letters of the English alphabet have higher frequency, which makes them vulnerable to attack: it is certainly more likely that the last key pressed was the ‘E’ key than the ‘Q’ key. These biases can be compensated for using known methods (one simple example is sketched below); however, if the programmer is unaware of these biases, then such compensation will not likely be made.
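One such known method is von Neumann's corrector, sketched here: examine non-overlapping pairs of input bits, emit a bit for each mixed pair, and discard the rest. Assuming the input bits are independent, the output is unbiased, at the cost of throwing away most of the material.

    def von_neumann(bits):
        """Debias a bit stream: 01 -> 0, 10 -> 1, 00 and 11 are discarded."""
        out = []
        for a, b in zip(bits[0::2], bits[1::2]):   # non-overlapping pairs
            if a != b:
                out.append(a)
        return out

    biased = [1, 1, 1, 0, 1, 1, 0, 1, 1, 0]        # roughly 70% ones
    print(von_neumann(biased))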

Cryptographically secure

What makes an RNG cryptographically secure? There are several equivalent ways of saying it. If there is no polynomial-time method for determining what the next bit in a sequence will be, given all preceding bits, with a confidence level significantly greater than 50%, then the RNG is considered to be cryptographically secure. This is equivalent to saying that there is no polynomial-time method for determining whether a given string was generated by a true RNG or a PRNG with a confidence level significantly greater than 50%. Equivalent, still, is the statement that there is no polynomial-time algorithm for compressing the string. However it is phrased, such a generator is considered cryptographically secure.

There are several approaches to creating such a sequence. One is to use a well-known cipher. The following algorithm is taken from the Handbook of Applied Cryptography, chapter 5, and makes use of RSA encryption to generate a pseudorandom bit string.

Algorithm: RSA pseudorandom bit generator

SUMMARY: a pseudorandom bit sequence $z_1, z_2, \ldots, z_k$ of length k is generated.

1. Setup. Generate two secret RSA-like primes p and q, and compute n = pq and φ = (p − 1)(q − 1). Select a random integer e, 1 < e < φ, such that gcd(e, φ) = 1.

2. Select a random integer $x_0$ (the seed) in the interval [1, n − 1].

3. For i from 1 to k do the following:

a. $x_i \leftarrow x_{i-1}^{\,e} \bmod n$

b. $z_i \leftarrow$ the least significant bit of $x_i$

4. The output sequence is $z_1, z_2, \ldots, z_k$.

This is simply an application of the RSA algorithm one time for each bit to be entered into the pseudorandom bit string. Clearly, this algorithm is slow. Modifications exist on this basic premise such as the Micali-Schnorr PRBG, which can produce many bits per exponentiation.
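Below is a direct transcription of the algorithm into Python, with toy-sized primes for readability (real use requires large secret primes).

    import math, random

    def rsa_prbg(k, p, q, seed):
        """Generate k pseudorandom bits via repeated RSA exponentiation."""
        n, phi = p * q, (p - 1) * (q - 1)
        e = random.randrange(2, phi)
        while math.gcd(e, phi) != 1:    # ensure gcd(e, phi) = 1
            e = random.randrange(2, phi)
        x, bits = seed, []
        for _ in range(k):
            x = pow(x, e, n)            # x_i = x_{i-1}^e mod n
            bits.append(x & 1)          # z_i = least significant bit of x_i
        return bits

    print(rsa_prbg(16, p=347, q=359, seed=1234))   # toy primes only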

A similar algorithm, discussed in the text for the class, is the Blum-Blum-Shub (BBS) algorithm. It works like the generator above, except that instead of applying the RSA operation at each iteration of step 3, we square the previous value mod n (where n is the product of two large primes, each congruent to 3 mod 4). Once again, the least significant bit is used as a bit in the generated bit string.
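A matching BBS sketch, again with toy parameters: both primes below are congruent to 3 mod 4 as the construction requires, but they are far too small for real use.

    def bbs(k, p=499, q=547, seed=12345):
        """Generate k bits by repeated squaring mod n = p*q."""
        n = p * q                       # seed must be coprime to n
        x = seed * seed % n             # x_0 = seed^2 mod n
        bits = []
        for _ in range(k):
            x = x * x % n               # square the previous value mod n
            bits.append(x & 1)          # keep the least significant bit
        return bits

    print(bbs(16))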

The RSA algorithm says to use a pair of large random prime numbers to generate the modulus n and the exponents e and d used for encryption and decryption. Suppose an unauthorized person, Eve, wanted to decrypt the message. She could easily do this if she knew the values of these two primes. Even if Eve could only narrow one of these primes down to a relatively small range of values, she might be able to quickly factor n, which would be catastrophic to the encryption scheme. Despite the difficulty of breaking the RSA algorithm itself, Eve may be able to decrypt the message simply because the RNG used to create the RSA keys was not secure. In fact, one author says that such an attack is actually more common than the cryptanalysis attacks we have been discussing in class, and points to Ian Goldberg and David Wagner, CS students who cracked Netscape’s SSL implementation in 1995 after discovering a weakness in the PRNG used by Netscape (this was, of course, fixed in Netscape 2.0).

Conclusion

Truly random number generation is a difficult thing to perfect on a computer, and there are many algorithms that attempt to do so. Surprisingly, many programmers use poorly chosen PRNGs or blindly trust the ones provided with a software package. For cryptographic use this can be catastrophic, since many attacks focus on the PRNG rather than on breaking the encryption algorithm. Determining the true randomness of an algorithm is not an exact science. Testing for randomness can be performed by theoretical analysis, which uses mathematical principles, or by empirical analysis, which uses statistical testing; good testing of any algorithm should include both. Creating the seed is also important for cryptographic use. Oftentimes the seed will be a combination of entropy values, which are drawn from random and semi-random events that happen on the computer. Samples from these events can be combined to create a seed with very high entropy. Ironically, some of the best known ways to create random values for cryptographic use are built on encryption algorithms themselves, which suggests that encryption algorithms can serve as PRNGs.

References

Atreya, Mohan. Pseudo Random Number Generators (PRNGs). matreya@ .

Callas, Jon. Using and Creating Cryptographic-Quality Random Numbers. jon/usingrandom.html, June 3, 1996.

Hellekalek, Peter. Random Number Generators. University of Salzburg, Mathematics Dept.

Knuth, D. E. The Art of Computer Programming, Vol. 2. Addison-Wesley, Reading, Mass., 1981.

Menezes, A. J., van Oorschot, P. C., and Vanstone, S. A. Handbook of Applied Cryptography. CRC Press, 1996. Available online at hac.

Testing Random Number Generators for Crypto-Security. National Institute of Standards and Technology (NIST), itl.div898/pubs/ar/ar1998/node6.html .

Trappe, W., and Washington, L. Introduction to Cryptography with Coding Theory. Prentice Hall Inc., 2002.

Using Random Number Generators. National HPCC Software Exchange (NHSE), npac.syr.edu/projects/random/brief/html .

-----------------------

[1] A method that involves shifting a sequence and seeing how well it correlates with the unshifted sequence; each shift yields a correlation value. This is the same method used to determine the key size when attempting to break the Vigenère cipher.
