Randomness (no, really!)

Linear congruential generators

Courtney
Randomness (no, really!)

--

(Originally published on LinkedIn)

Randomness has always been a crucial concept in computing. But, with random forests on the rise in machine learning, there is perhaps a greater need now than ever for an understanding of pseudorandom number generators (PRNGs).

Pseudorandom number generators are designed to produce numbers that look random according to statistical tests. Good PRNGs generate numbers that are essentially “random” for any practical purpose. However, the degree to which the generated numbers approximate true randomness depends on the PRNG algorithm—resultant sequences will never be truly random since they are generated using relatively small sets of initial values.

The simplest and most common kind of PRNG is the linear congruential generator (LCG), perhaps best explained through a manual example:

a_{i+1} = ((1664525)a_i + 1013904223) % 2^32

To generate a pseudorandom number using the equation above, input the previously generated number a_i for an output of the next generated number, a_{i+1}. The initial value (“the seed”) used in such computations is often based on the current time (usually the number of seconds since January 1, 1970, or epoch time). Then, do “x % 2^32" (dividing the first part of the second half the equation by 2^32 is super fast since you only have to consider the final 32 bits and can ignore the rest!) Take the remainder of this calculation, and you have your next pseudorandom number!

The current Unix epoch time is 1402606601, so we will use this value as the seed:

a_1 = ((1664525)a_0 + 1013904223) % 2^32

a_1 = ((1664525)(1402606601) + 1013904223) % 2^32

a_1 = 2334674766433748 % 2^32

a_1 = 3558772180

This is essentially how a linear congruential generator works. There are, however, some statistical issues with LCGs; “if an LCG is used to choose points in an n-dimensional space, the points will lie on, at most, (n!m)1/n hyperplanes”. In other words, if you randomly choose points in a high dimensional space with it, they will tend to not be evenly distributed through the space, but fall into certain subspaces:

(You can view an animated .gif version of LCG hyperplanes here!)

Additionally, some LCGs repeat their lower order bits frequently; for example, there was a buggy implementation of rand() that alternated between even and odd, which is obviously not very random. Good LCGs can avoid these issues by generating more bits than they need and discarding some of those bits at the end.

Hopefully this post has been a useful high-level explanation of linear congruential generators!

--

--

Courtney
Randomness (no, really!)

Former SpaceX hardware engineer turned Microsoft software engineer. Done some other things along the way.