TIFU by using Math.random()
By Betable CTO, Mike Malone
Like most good TIFUs this didn’t happen today. It actually happened about two years ago. It is, however, still relevant. More significantly, it’s not just me who screwed up. Math.random() in the V8 Javascript engine is screwed up, too.
“Many random number generators in use today are not very good. There is a tendency for people to avoid learning anything about such subroutines; quite often we find that some old method that is comparatively unsatisfactory has blindly been passed down from one programmer to another, and today’s users have no understanding of its limitations.”
— Donald Knuth; The Art of Computer Programming, Volume 2.
By the end of this post I’m hoping we’ll all agree on two things, one of which may be slightly more controversial than the other:
- We were stupid not to understand the limitations of V8’s PRNG before using it, and CSPRNGs are a safer option if you’re feeling lazy.
- The implementation of Math.random() in V8 should be replaced. The current algorithm, which appears to have been passed down from one programmer to another, is comparatively unsatisfactory (and arguably completely broken) due to subtle, non-intuitive degenerate behavior that is likely to be encountered under realistic circumstances.
For the record, I do want to say that I think V8 is a very impressive piece of software and the people who work on it are clearly very talented. This isn’t an indictment of any of them. Rather, it’s an illustration of how subtle some aspects of software development can be.
Now that we all know where we’re headed, let’s begin at the beginning.
Betable is built on random numbers. Aside from other more obvious uses we like using randomly generated identifiers. Our architecture is distributed and microservice-y, and random identifiers are easier to implement than sequential identifiers in this sort of system.
For example, we generate random request identifiers whenever we receive an API request. We thread these identifiers through to sub-requests in headers, log them, and use them to collate and correlate all of the things that happened, across all of our services, as a result of a single request.
Generating random identifiers isn’t rocket science. There’s only one requirement —
It must be really, really unlikely that the same identifier will ever be generated twice, causing a collision.
And there are just two factors that impact the likelihood of a collision —
- The size of the identifier space — the number of unique identifiers that are possible
- The method of identifier generation — how an identifier is selected from the space of all possible identifiers
Ideally we want a big identifier space from which identifiers are selected at random from a uniform distribution (henceforth, we’ll assume that anything done “at random” uses a uniform distribution).
We did the birthday paradox math and settled on making our request identifiers 22 character words with each character drawn from a 64 character alphabet. They look like EB5iGydiUL0h4bRu1ZyRIi or HV2ZKGVJJklN0eH35IgNaB. Each character in the word has 64 possible values, there are 22 characters, so there are 64²² such words. That makes the size of our identifier space 64²² or ~2¹³².
With 2¹³² possible values, if identifiers were randomly generated at the rate of one million per second for the next 300 years the chance of a collision would be roughly 1 in six billion.
So we’ve got a big enough identifier space, but how do we generate identifiers at random? The answer is a decent pseudo-random number generator (PRNG), a common feature of many standard libraries. The top of our API stack is a Node.js service (we also use a lot of Go, but that’s another blog post). Node.js, in turn, uses the V8 Javascript engine that Google built for its Chrome web browser. All compliant ECMAScript (Javascript) implementations must implement Math.random(), which takes no arguments and returns a random number between 0 and 1. Good start.
So, given a sequence of pseudo-random numbers between 0 and 1 we need to generate a random word with characters from our 64 character alphabet. This is a pretty common problem, here’s the pretty standard solution we chose —
Before you start picking it apart, there’s nothing wrong with this code — it does exactly what it’s supposed to do.
We’re in business. A procedure for producing random identifiers that is extremely unlikely to produce a collision, even if we’re producing a million a second for 300 years. Test, commit, push, test, deploy.
The above code hit production and we forgot about it until a rather alarming email came through from a colleague, Nick Forte, telling us that the impossible had happened —
“Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin.” John von Neumann was stating the obvious here — it’s a simple tautology that deterministic methods, like arithmetic, cannot produce random bits. It’s a contradiction in terms. So what is a PRNG?
What is a PRNG?
Let’s ground our discussion by looking at a simple PRNG and the “random numbers” it produces —
This should make von Neumann’s point clear — the sequence of numbers generated by this algorithm is obviously not random. For most purposes, this non-randomness doesn’t really matter. What we really need is an algorithm that can generate a sequence of numbers that appear to be random (technically, they should appear to be independent and identically distributed random variables having a uniform distribution over the generator’s range). In layman’s terms, we should be able to safely pretend that the pseudo-random numbers are truly random numbers.
If it’s hard to distinguish a generator’s output from truly random sequences we call it a high quality generator. If it’s easy, we call it a low quality generator.
For the most part, quality is determined empirically by pulling a bunch of numbers from the generator and running some statistical tests for randomness (e.g., by checking that there are an equal number of 0 bits and 1 bits, figuring out how many collisions there are, doing a Monte-Carlo estimation of pi, etc). Another, more pragmatic measure of PRNG quality is how well the algorithm actually works in practice as a stand-in for truly random numbers.
Apart from its non-randomness, our simple algorithm demonstrates another important characteristic of all PRNGs. If you keep pulling numbers, it will eventually start repeating the same sequence over again. This is called periodicity, and all PRNGs are periodic.
The period or cycle length of a PRNG is the length of the sequence of numbers that the PRNG generates before repeating.
You can think of a PRNG as a highly compressed codebook containing a sequence of numbers. The kind a spy might use as a one-time pad. The seed is the starting position in the book. Eventually, you’ll loop around the end of the book and get back to where you started. That’s a cycle.
A long cycle length doesn’t guarantee high quality, but it helps. Often, cycle length can be guaranteed by mathematical proof. Even when we can’t calculate cycle length exactly, we can determine an upper bound. Since a PRNG’s next state and output are both deterministic functions of its current state, the cycle length cannot be larger than the number of possible states. To achieve this maximum cycle length the generator must enter every possible state before returning, again, to its current state.
If a PRNG’s state has a k-bit representation, the cycle length is less than or equal to 2ᵏ. A PRNG that actually achieves this maximum cycle length is called a full-cycle generator.
Good PRNGs are designed so that their cycle length is close to this upper bound. Otherwise you’re wasting memory.
Let’s go one step further and analyze the number of distinct random values that a PRNG can produce through some deterministic transformation on its output. For instance, consider the problem of generating triples of random values between 0 and 15, like (2, 13, 4) or (5, 12, 15). There are 16³ or 4096 such triples, but our simple PRNG can only produce 16 of them.
It turns out that this is another general characteristic of PRNGs —
The number of distinct values that can be generated from a pseudo-random sequence is bounded by the sequence’s cycle length.
The same holds regardless of the sort of value we’re producing — we can only produce 16 distinct tuples of four values (or any other length), 16 distinct array shuffles, 16 random fuzz test values, etc. We will always only be able to generate 16 distinct values of any type. Ever.
Remember our algorithm for producing random identifiers? We randomly generated words consisting of 22 characters drawn from a 64 character alphabet. In other words, we generated tuples of 22 numbers between 0 and 63. It’s the same problem, and the number of distinct identifiers we can produce is bounded by the size of the PRNG’s internal state, and its cycle length, in the same way.
Math.random()
Back to our problem. In response to Nick’s email about identifier collisions we quickly reviewed our birthday paradox math and checked our scaling code. We couldn’t find anything wrong, so the problem had to be deeper. Armed with our general knowledge of PRNGs, let’s start digging.
The ECMAScript standard says the following about Math.random() —
Returns a Number value with positive sign, greater than or equal to 0 but less than 1, chosen randomly or pseudo randomly with approximately uniform distribution over that range, using an implementation-dependent algorithm or strategy.
The specification leaves a lot to be desired. First, it doesn’t mention anything about precision. Since ECMAScript Numbers are IEEE 754 binary64 double-precision floats, we might expect 53-bit precision (i.e., random values taking the form x/2⁵³ for all x in 0..2⁵³-1). Mozilla’s SpiderMonkey engine seems to agree but, as we’ll soon find out, V8’s Math.random() only has 32-bit precision (i.e., values taking the form x/2³² for all x in 0..2³²-1). Good to know, but no matter, we only need six bits to produce a random letter from our 64 character alphabet.
What does matter for us is that the specification leaves the actual algorithm up to implementors. It has no minimum cycle length requirement, and is hand-wavy about the PRNG’s quality — the distribution just needs to be “approximately uniform.” To do our analysis we need to know what algorithm V8 uses. It’s nowhere to be found in any documentation, so let’s go to the source.
The V8 PRNG
“Iwould have gone with mersenne twister since it is what everyone else uses (python, ruby, etc).” This short critique, left by Dean McNamee, is the only substantive feedback on the code review of V8’s PRNG when it was first committed on June 15, 2009. Dean’s recommendation is the same one I’ll eventually get around to making in this post.
V8’s PRNG code has been tweaked and moved around over the past six years. It used to be native code, now it’s in user-space, but the algorithm has remained essentially the same. The actual implementation uses internal API and is a bit obfuscated, so let’s look at a more readable implementation of the same algorithm —
Well, that’s still pretty opaque, but let’s slog through.
There is one more clue. A comment in older versions of the V8 source stated simply: “random number generator using George Marsaglia’s MWC algorithm.” A few minutes with Google teaches the following —
- George Marsaglia was a mathematician who spent much of his career studying PRNGs. He also created the original Diehard battery of statistical tests for measuring the quality of a random number generator.
- MWC stands for multiply-with-carry, a class of PRNGs that Marsaglia invented. MWC generators are very similar to classic linear congruential generators (LCGs) like our simple example from earlier (in fact, there is a one-to-one correspondence between MWCs and LCGs, see section 3.6 of this paper for details). Their advantage over standard LCGs is that they can produce sequences with longer cycle lengths with about the same number of CPU cycles.
So if you’re going to crib a PRNG off of someone, Marsaglia seems like a good choice, and MWC seems like a reasonable algorithm.
The V8 algorithm doesn’t look like a typical MWC generator though. It turns out that’s because the V8 algorithm is not an MWC generator. It’s two MWC sub-generators — one on line 5, the other on line 6 — combined to produce one random number on line 9. I’ll spare you the math, but each of the sub-generators have prime cycle lengths of about 2³⁰, making the combined cycle length of the generated sequence about 2⁶⁰.
If you’ll recall, we have 2¹³² possible identifiers, but now we know that V8’s Math.random() can only produce 2⁶⁰ of them. Still, assuming a uniform distribution, the probability of collision after randomly generating 100,000,000 identifiers should be less than 0.4%. We started seeing collisions after generating far fewer identifiers than that. Something must be wrong with our analysis. The cycle length estimate is provably correct, so we must not have a uniform distribution — there must be some additional structure to the sequence being generated.
A Tale of Two Generators
Before returning to the V8 PRNG, let’s look one more time at our random identifier generation code —
The scaling method on line 6 is important. This is the method that MDN recommends for scaling a random number, and it’s in widespread use in the wild. It’s called the multiply-and-floor method, because that’s what it does. It’s also called the take-from-top method, because the lower bits of the random number are truncated, leaving the left-most or top bits as our scaled integer result. (Quick note: it’s subtle, but in general this method is slightly biased if your scaled range doesn’t evenly divide your PRNG’s output range. A general solution should use rejection sampling like this, which is part of the standard library in other languages.)
Do you see the problem yet? What’s weird about the V8 algorithm is how the two generators are mixed. It doesn’t xor the numbers from the two streams together. Instead, it simply concatenates the lower 16 bits of output from the two sub-generators. This turns out to be a critical flaw. When we multiply Math.random() by 64 and floor it we end up with the left-most, or top 6 bits of the number. These top 6 bits come exclusively from the first of the two MWC sub-generators.
If we analyze the first sub-generator independently we see that it has 32 bits of internal state. It’s not a full-cycle generator — its actual cycle length is about 590 million (18,030*2¹⁵-1, the math is tricky but it’s explained here and here, or you can just trust me). So we can only produce a maximum of 590 million distinct request identifiers with this generator. If they were randomly selected there would be a 50% chance of collision after generating just 30,000 identifiers.
If that were true, we should have started seeing collisions almost immediately. To understand why we didn’t, recall our simple example where we pulled triples from a 4 bit LCG. Birthday paradox math doesn’t apply — for this application the sequence is nowhere near random, so we can’t pretend it is. It’s clear that we won’t produce a duplicate until the 17th triple. The same thing is happening with the V8 PRNG and our random identifiers — under certain conditions, the PRNG’s lack of randomness is making it less likely that we’ll see a collision.
In this case the generator’s determinism worked in our favor, but that’s not always true. The general lesson here is that, even for a high quality PRNG, you can’t assume a random distribution unless the generator’s cycle length is much larger than the number of random values you’re generating. A good general heuristic is —
If you need to use n random values you need a PRNG with a cycle length of at least n².
The reason is that, within a PRNGs period, excessive regularity can cause poor performance on some important statistical tests (in particular, collision tests). To perform well, the sample size n must be proportional to the square root of the period length. Page 22 of Pierre L’Ecuyer’s excellent chapter on random number generation has more detail.
For a use case like ours, where we’re trying to generate unique values using multiple independent sequences from the same generator, we’re less concerned about statistical randomness and more concerned that the sequences not overlap. If we have n sequences of length l from a generator with period p, the probability of an overlap is [1-(nl)/(p-1)]ⁿ⁻ ¹, or approximately ln²/p for a big enough p (see here and here for details). The point is we need a long cycle length. Otherwise we’re making a mistake pretending our sequence is random.
Long story short, if you’re using Math.random() in V8 and you need a sequence of random numbers that’s reasonably high quality, you shouldn’t use more than about 24,000 numbers. If you’re generating multiple streams of any substantial size and don’t want any overlap, you shouldn’t use Math.random() at all.
If the algorithm that V8’s Math.random() uses is poor quality, you might be wondering how it was chosen at all. Let’s see if we can find out.
A Brief History of MWC1616
“The MWC generator concatenates two 16-bit multiply-with-carry generators […] has period about 2⁶⁰ and seems to pass all tests of randomness. A favorite stand-alone generator — faster than KISS, which contains it.” That’s the extent of Marsaglia’s analysis of MWC1616, which is the name of the algorithm that powers V8’s Math.random(). If you take him at his word, the algorithm ticks the box for most of the important criteria you’d consider in choosing a PRNG.
MWC1616 was first introduced by Marsaglia in 1997 as a simple general purpose generator that, in his words, “seems to pass all tests of randomness put to it,” a comment that betrays Marsaglia’s largely empirical methodology. He seems to have trusted an algorithm if it passed his Diehard tests. Unfortunately, the Diehard tests he was using in the late 1990s weren’t that good, at least by today’s standards. If you run MWC1616 through a more modern empirical testing framework like TestU01's SmallCrush it fails catastrophically (it does even worse than the MINSTD generator, which was outdated even in the 1990s, but Marsaglia’s Diehard tests probably didn’t have the granularity to tell him that).
As far as I know, there’s no mathematical basis for combining sub-generators the way MWC1616 does — concatenating subsets of the generated bits. It’s more typical to see bits from sub-generators mixed using some form of modulo arithmetic (e.g., addition modulo 2³², or xor). It appears that Marsaglia, himself, became aware of this deficiency shortly after posting MWC1616 on Usenet as a component of one version of his KISS generator. On January 12, 1999, Marsaglia posted the version of MWC1616 used in V8. Eight days later, on January 20, he posted a different version of the algorithm. It’s subtle, but in the updated version, the upper bits of the second generator are not masked away, mixing bits from the two sequences more thoroughly.
Both versions of the algorithm appear in other places, adding to the confusion. The January 20 version of MWC1616 (i.e., the better version) is in Numerical Recipes, labeled MWC with Base b = 2¹⁶, under the heading When You Have Only 32-Bit Arithmetic, and only after first advising that, rather than implementing one of the algorithms listed, you should “get a better compiler!” Pretty discouraging words for an algorithm that’s better than what V8 has powering Math.random(). Rather inexplicably (because it’s so obscure) the January 20 version of MWC1616 is also given as an example computational method in Wikipedia’s article on random number generation. Implementations of the older January 12 version are included in TestU01 twice, once labeled MWC1616 and a second time labeled MWC97R. It’s also one of the generators available in R (apparently it used to be the default).
So there are lots of places the algorithm can be found. It was obscure to me, but given the bona fides listed above I guess it’s not surprising it was chosen. Hopefully this article will serve as a warning, strengthening and confirming Knuth’s observation that kicked off this post —
- In general, PRNGs are subtle and you should do your own analysis and understand the limitations of any algorithm you’re implementing or using
- Specifically, don’t use MWC1616, it’s not very good
There are lots of better options. Let’s look at a couple of them.
The CSPRNG Workaround
To fix our identifier code we needed a replacement for Math.random() and we needed it fast. Lots of alternative PRNG implementations exist for Javascript, but we were looking for something that —
- Has a long enough period to generate all of our 2¹³² identifiers
- Is well supported and battle tested
Luckily, the Node.js standard library has another PRNG that meets both requirements: crypto.randomBytes(), a cryptographically secure PRNG (CSPRNG) that calls OpenSSL’s RAND_bytes (which, according to the docs, produces a random number by generating the SHA-1 hash of 8184 bits of internal state, which it regularly reseeds from various entropy sources). If you’re in a web browser crypto.getRandomValues() should do the same job.
This isn’t a perfect general solution for three reasons —
- CSPRNGs almost always use non-linear transformations and are generally slower than non-cryptographic alternatives
- Many CSPRNG systems are not seed-able, which makes it impossible to create a reproducible sequence of values (e.g., for testing)
- CSPRNGs emphasize unpredictability over all other measures of quality, some of which might be more important for your use case
However —
- Speed is relative, and CSPRNGs are fast enough for most use cases (I can get about 100MB/s of random data from crypto.getRandomValues() in Chrome on my machine)
- In the limit, unpredictability implies an inability to distinguish the generator’s output from true randomness, which implies everything else we want out of a pseudo-random sequence
- It’s likely that a generator advertised as “cryptographically secure” has been carefully code reviewed and subjected to many empirical tests of randomness
We’re still making some assumptions, but they are evidence-based and pragmatic. If you’re unsure about the quality of your non-cryptographic alternatives, and unless you need deterministic seeding or require rigorous proofs of quality measures, using a CSPRNG is your best option. If you don’t trust your standard library’s CSPRNG (and you shouldn’t for cryptographic purposes) the right solution is to use urandom, which is managed by the kernel (Linux uses a scheme similar to OpenSSL’s, OS X uses Bruce Schneier’s Yarrow generator).
I can’t tell you the exact cycle length of crypto.randomBytes() because as far as I know there’s no closed form solution to that problem (i.e., no one knows). All I can say is that with a large state space and a continuous stream of new entropy coming in, it should be safe. If you trust OpenSSL to generate your public/private key pairs then it doesn’t make much sense not to trust it here. Empirically, once we swapped our call to Math.random() with a call to crypto.randomBytes() our collision problem went away.
In fact, Chrome could just have Math.random() call the same CSPRNG they’re using for crypto.randomBytes(), which appears to be what Webkit is doing. That said, there are lots of fast, high quality non-cryptographic alternatives, too. Let’s put a final nail in the MWC1616 coffin and take a look at some other options.
V8’s PRNG is Comparatively Unsatisfactory
My goal was to convince you that V8’s Math.random() is broken, and should be replaced. So far we’ve found obvious structural patterns in its output bits, catastrophic failure on empirical tests, and poor performance in the real world. If you still want more evidence, here are some pretty pictures that might sway you —
Hopefully you’ll agree at this point that V8’s Math.random() is comparatively unsatisfactory and should be fixed. The question is how should it be fixed? A one line patch would improve the bit-mixing, but I can’t see any reason to keep MWC1616 at all. There are better options.
A detailed comparison of the myriad existing methods for producing pseudo-random bits is going to have to wait for another post. Roughly, though, the requirements we’re looking for are —
- A large state space, and a large seed — ideally at least 1024 bits, since this will be an upper bound on other qualities of the generator. A state space of 2¹⁰²⁴ is enough for 99.9% of use cases, with a significant safety factor.
- Speed, let’s make it at least as fast as the current implementation which produces around 25 million numbers per second on my machine.
- Memory efficiency — we’ll probably need at least 256 bytes for a generator with 1024 bits of state in user-space Javascript (we can only use 32 bits per 64 bit Number), if that’s infeasible there are workarounds, but I’m going to assume we can afford this.
- A very long period, a full cycle generator is great but anything over 2⁵⁰ should be sufficient to avoid cycling. Anything over 2¹⁰⁰ should let us safely pull 2⁵⁰ values while continuing to pretend we’ve got a random sequence.
- At a minimum, passing marks on empirical tests of randomness like TestU01's SmallCrush — extra credit for passing marks on BigCrush and good equidistribution (which, unfortunately, I don’t have room to explain). TestU01 is more rigorous than Dieharder, and I don’t know much about the NIST tests or rngtest, but those might work too.
There are many PRNG algorithms that meet or exceed these requirements. Xorshift generators (also discovered by Marsaglia) are fast and do very well on statistical tests (much better than MWC1616). An xorshift variant called xorgens4096 has been implemented in Javascript by David Bau. It has a 4096-bit state space, a cycle length of ~2⁴⁰⁹⁶, and it runs faster than MWC1616 in Chrome on my machine. Moreover, it has no systematic failures on BigCrush.
Recently it’s been shown that taking the output of an xorshift generator and multiplying by a constant is a sufficient non-linear transformation for the generator to pass BigCrush. This class of generators, called xorshift*, is very fast, easy to implement, and memory efficient. The xorshift1024* generator meets or exceeds all of our requirements. If the memory premium turns out to be a real problem, the xorshift64* generator has the same memory footprint, a longer cycle length, and is faster than MWC1616, beating it on all counts. Another new family of linear/non-linear hybrid generator called PCG claims similar performance and quality characteristics.
So there are lots of good algorithms to chose from. That said, the safest choice is probably a standard Mersenne Twister. The most popular variant, MT19937, was introduced in the late 90s. Since then it’s become the standard generator in dozens of software packages. It’s not perfect, but it has been battle tested and thoroughly analyzed. Its properties are well understood, and it does well on empirical tests. With an ostentatiously long cycle length of 2¹⁹⁹³⁷-1 it’s hard to misuse, but it does have an imposing 2KB state space and is criticized for its memory footprint and relatively poor performance. A quick search uncovered an existing Javascript implementation, by Sean McCullough. Inexplicably, it’s as fast as the existing Math.random() implementation in Chrome on my machine.
So my advice is that V8 reconsider Dean McNamee’s comment from six years ago and use the Mersenne Twister algorithm. It’s fast enough, and robust enough to be safely used by developers who don’t have a deep understanding of how PRNGs work. A more exotic alternative is fine too. Just get rid of MWC1616, please!
In Summary
This was a long post, so let me re-cap the important bits —
- The PRNG algorithm behind V8’s Math.random() is called MWC1616. If you’re only using the most significant 16 bits it has a very short effective cycle length (less than 2³⁰). In general, it does poorly on empirical tests of quality. For most non-trivial use cases it’s not safe to pretend that it’s output is truly random. Be careful using it for anything you care about.
- Cryptographically secure PRNGs are a better option if you don’t have time to do proper diligence on non-cryptographic alternatives. The safest option (and the right solution for crypto) is to use urandom. In browser you can use crypto.getRandomValues().
- There are options for non-cryptographic PRNG algorithms that are faster and higher quality than MWC1616. V8 should replace its Math.random() implementation with one of them. There are no losers. Mersenne Twister (MT19937) is the most popular, and probably the safest choice.
I’ll note, in passing, that Mozilla’s use of the LCG from Java’s util.Random package isn’t much better than MWC1616. So SpiderMonkey should probably go ahead and upgrade too.
In the meantime, the browser continues to be a confusing and dangerous place. Be safe out there!
Special thanks to Nick Forte, who discovered the collision bug in our production code, Wade Simmons, who originally tracked the problem down to V8’s Math.random() implementation, and the entire Betable Engineering team, who put up with me ranting about random numbers for two weeks while I wrote this post.