Seeding Random Number Generators

Despite what administrators, software engineers, most IT professionals and your family members tell you, computers actually do exactly what you tell them to do. This leads to one of the oddest practical problems in computer science: generating truly random data. You might have heard about some of the unusual ways people have fed extra bits of entropy into random number generators. Pointing cameras at lava lamps, measuring mouse movements, sampling microphone input, and reading internal component temperatures and CPU frequencies have all been used as sources of randomness for seeding. What I’m going to talk about is how some popular programming languages seed their random number generators by default.

Python

Python has a pretty easy-to-use random module. The documentation is very explicit about how the random number generator is seeded. If no seed is provided, the operating system’s randomness source is used (/dev/urandom on Unix-like systems). On systems that don’t provide a source of randomness, the system time is used. It’s not kidding. It literally just calls time.time().
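
To make that fallback order concrete, here is a minimal sketch of the same idea written in Go, the language used for the other code in this post. It is not Python's implementation: seedFrom, defaultSeed and noEntropy are hypothetical names invented for illustration, and the 8-byte seed size is an assumption.

```go
package main

import (
	crand "crypto/rand"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
	"time"
)

// seedFrom reads 8 bytes from src and uses them as the seed. If src cannot
// supply them, it falls back to the wall clock, which is much easier to guess.
func seedFrom(src io.Reader) (seed int64, fromEntropy bool) {
	var buf [8]byte
	if _, err := io.ReadFull(src, buf[:]); err != nil {
		return time.Now().UnixNano(), false
	}
	return int64(binary.LittleEndian.Uint64(buf[:])), true
}

// defaultSeed asks the operating system first, via crypto/rand.
func defaultSeed() (int64, bool) {
	return seedFrom(crand.Reader)
}

// noEntropy simulates a platform with no randomness source (hypothetical).
type noEntropy struct{}

func (noEntropy) Read([]byte) (int, error) {
	return 0, errors.New("no entropy source available")
}

func main() {
	seed, ok := defaultSeed()
	fmt.Println("seed:", seed, "from OS entropy:", ok)

	seed, ok = seedFrom(noEntropy{})
	fmt.Println("seed:", seed, "from OS entropy:", ok)
}
```

The second call shows why the fallback is weak: anyone who knows roughly when the program started can narrow the seed down to a small range of timestamps.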

Java

Java mixes the system time (System.nanoTime()) with a “seed uniquifier” value that is intended to make each seed differ from the seeds of other generators created at nearly the same moment. I’ve since found a really good StackOverflow question with the relevant code, references and speculation on why those magic numbers were chosen.
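
The mixing idea can be sketched in Go as follows. This is only an illustration of the technique, not Java's code: the starting value and the multiplier are arbitrary placeholders chosen for this sketch (they are not Java's magic numbers), and nextUniquifier and mixSeed are hypothetical names.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// multiplier is an arbitrary odd constant picked for this sketch.
const multiplier = 0x9E3779B97F4A7C15

// uniquifier is advanced on every call, so two generators created within the
// same clock tick still get different seeds. The starting value is arbitrary.
var uniquifier uint64 = 0x123456789ABCDEF1

// nextUniquifier atomically multiplies the shared value and returns the result.
func nextUniquifier() uint64 {
	for {
		cur := atomic.LoadUint64(&uniquifier)
		next := cur * multiplier // wraps around modulo 2^64
		if atomic.CompareAndSwapUint64(&uniquifier, cur, next) {
			return next
		}
	}
}

// mixSeed combines the changing uniquifier with a timestamp using XOR.
func mixSeed(now int64) int64 {
	return int64(nextUniquifier()) ^ now
}

func main() {
	now := time.Now().UnixNano()
	a := mixSeed(now)
	b := mixSeed(now) // same timestamp, different uniquifier
	fmt.Println("seeds differ:", a != b)
}
```

Without the uniquifier, two generators created in the same nanosecond reading would start from identical seeds and produce identical sequences.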

Go

Go has two different random number generators: one to power simple cases where naive randomness is enough, and one to support the fairly extensive crypto packages in Go’s standard library. Let’s look at math/rand (the non-crypto package) first.

math/rand

math/rand provides a mutex-gated global which enables naive usage without the need to set up a new instance. This is what powers each function that math/rand provides. The code below shows the initialization of this global defined in math/rand/rand.go.

var globalRand = New(&lockedSource{src: NewSource(1)})

This random source defaults to using 1 as the seed. It doesn’t even try to seed itself with entropy; that’s the caller’s responsibility. (Go 1.20 and later seed the global generator with a random value at startup, so the line above reflects earlier releases.) math/rand also allows you to create your own random source instances, each with its own seed. You’ll want one per goroutine if you need random data from multiple goroutines, because every call to the global generator contends for the same mutex. Remember: It’s far too common for benchmarks to unintentionally measure the performance of random number generators instead of the intended target.
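
As a minimal sketch of the per-goroutine approach, each goroutine below builds its own generator with rand.New(rand.NewSource(seed)) and never touches the global one. The draw helper is a hypothetical name, and the seeds are fixed small integers purely for illustration; real code would derive them from an entropy source.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// draw returns n values in [0, 100) from a private generator seeded with seed.
// The generator is local to the caller, so no mutex is shared between callers.
func draw(seed int64, n int) []int {
	r := rand.New(rand.NewSource(seed))
	out := make([]int, n)
	for i := range out {
		out[i] = r.Intn(100)
	}
	return out
}

func main() {
	var wg sync.WaitGroup
	results := make([][]int, 3)
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Each goroutine writes only to its own slot in results.
			results[i] = draw(int64(i+1), 5)
		}(i)
	}
	wg.Wait()
	for i, r := range results {
		fmt.Println("goroutine", i, "->", r)
	}
}
```

Because the seeds are fixed, every run prints the same numbers. That is exactly the behavior an unseeded generator gives you, which is why the seed has to come from somewhere less predictable.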

crypto/rand

When I saw how the math/rand random number generator was seeded by default, I figured that the crypto libraries must use something else or seed it explicitly. That turns out to be the case. The crypto random number generator is much simpler. It just consists of an io.Reader which produces random data. That’s shown in crypto/rand/rand.go. You might notice that there’s no implementation of io.Reader in that file. The reason is that the implementation of the io.Reader interface is platform-specific. For Unix machines, /dev/urandom is used.
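
From the caller's side, using that reader looks roughly like the sketch below. rand.Reader and io.ReadFull are part of the standard library; randomHex is a hypothetical helper written for this example.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"io"
)

// randomHex fills n bytes from the platform's random source and returns them
// hex-encoded. There is no seed to manage: the operating system supplies the data.
func randomHex(n int) (string, error) {
	buf := make([]byte, n)
	if _, err := io.ReadFull(rand.Reader, buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

func main() {
	token, err := randomHex(16)
	if err != nil {
		fmt.Println("could not read random data:", err)
		return
	}
	fmt.Println(token)
}
```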

const urandomDevice = "/dev/urandom"
...
func (r *devReader) Read(b []byte) (n int, err error) {
    if altGetRandom != nil && r.name == urandomDevice && altGetRandom(b) {
        return len(b), nil
    }
    r.mu.Lock()
    defer r.mu.Unlock()
    if r.f == nil {
        f, err := os.Open(r.name)
        ...

The mention of altGetRandom looks a bit strange. Well, the rabbit hole gets just slightly deeper as far as Unix systems are concerned. For Linux, there’s an even more specific implementation which decides whether it can use the getrandom system call instead of reading /dev/urandom like a file.
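
The shape of that "try the fast path, otherwise read the device" logic can be sketched on its own. This is not Go's implementation: fastRandom is a hypothetical hook standing in for altGetRandom, readRandom is a made-up name, and the device path assumes a Unix-like system.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// fastRandom is a hypothetical platform hook. When it is set and reports
// success, the buffer has already been filled and the device file is skipped.
var fastRandom func(b []byte) bool

// readRandom fills b using the hook if possible, otherwise from /dev/urandom.
func readRandom(b []byte) (int, error) {
	if fastRandom != nil && fastRandom(b) {
		return len(b), nil
	}
	f, err := os.Open("/dev/urandom") // assumes a Unix-like system
	if err != nil {
		return 0, err
	}
	defer f.Close()
	return io.ReadFull(f, b)
}

func main() {
	buf := make([]byte, 8)
	n, err := readRandom(buf) // no hook installed, so this reads the device
	fmt.Println("bytes read:", n, "error:", err)
}
```

Keeping the hook as a nil-able function variable lets a platform-specific file install it at startup, while every other platform falls through to the file read without any extra code.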

Windows is obviously an altogether different beast from Unix. On Windows, Go uses CryptoAPI 2.0, as shown in crypto/rand/rand_windows.go.

What I Learned

For trivial uses, random number generators are usually seeded with the current timestamp, or in the case of Go’s math/rand as shown above, a fixed constant. Random number generators intended for cryptography usually defer to the operating system in order to generate randomness. Operating system kernels are in a better place to maintain a common pool of entropy for many different programs. In Unix-based systems, this is typically done through a number of kernel modules which pull from many of the kinds of sources I mentioned at the beginning.