Simulation

How to generate Gaussian samples

Part 2: Box-Muller transform

Khanh Nguyen
MTI Technology

--

  • To see the code I wrote for this project, you can check out its GitHub repo
  • For other parts of the project: part 1, part 2, part 3

Background

In part 1 of this project, I showed how to generate Gaussian samples using the common technique of inverse transform sampling:

  1. First, we sample from the uniform distribution between 0 and 1 — the green points in the animation below. These uniform samples represent the cumulative probabilities of a Gaussian distribution, i.e. the area under the distribution to the left of some point.
  2. Next, we apply the inverse Gaussian cumulative distribution function (CDF) to these uniform samples. This transforms them into our desired Gaussian samples — the blue points in the animation below.

The bad news: the Gaussian inverse CDF has no closed-form expression, so we had to approximate it; in part 1, a simple Taylor series was used. However, this approximation is only accurate near the Gaussian mean, and under-samples the more extreme values at both tails of the distribution.

A: left-side area (uniformly sampled). Tᵢ: the iᵗʰ-degree Taylor series approximation of the inverse Gaussian CDF, √2·Erfinv(2A−1)

Therefore, in this part of the project, we will investigate a more “direct” sampling method that does not depend on approximating the Gaussian inverse CDF. This method is called the Box-Muller transform, after the two mathematicians who published it in 1958: the Briton George E. P. Box and the American Mervin E. Muller.

Left: George E. P. Box (1919–2013). Right: Mervin E. Muller (1928–2018)

How does the Box-Muller transform work?

For this project, my goal is to generate Gaussian samples in two dimensions, i.e. samples whose x and y coordinates are independent standard normals (Gaussians with zero mean and a standard deviation of 1). In part 1, I used the inverse Gaussian CDF to generate the x and y coordinates separately from their respective uniform samples (U₁ and U₂):

Generate x and y coordinates from uniform samples (U₁ and U₂) using inverse transform sampling

For the Box-Muller transform, I will also start with the same two uniform samples. However, I will transform these uniform samples into the x and y coordinates using much simpler formulas:
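Specifically, with U₁ and U₂ both sampled uniformly between 0 and 1:

    x = √(−2 ln U₁) · cos(2πU₂)
    y = √(−2 ln U₁) · sin(2πU₂)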

Generate x and y coordinates from uniform samples (U₁ and U₂) using Box-Muller transform

Despite the strong coupling between U₁ and U₂ in each of the two formulas above, the generated x and y coordinates, which are both standard Gaussians, are still surprisingly independent of each other! In the derivation of the Box-Muller transform that follows, I will demonstrate why this is indeed the case.

Derivation of the Box-Muller transform

We know that for any two independent random variables x and y, the joint probability density f(x, y) is simply the product of the individual density functions f(x) and f(y). Furthermore, the Pythagorean theorem allows us to combine the x² and y² terms from the two Gaussian density functions. This results in the −r²/2 term in the exponential of the joint distribution, where r is the distance from the origin to the 2-D Gaussian sample.

To simplify, we then define a variable s equal to r²/2. In other words, s is simply half of the squared distance from the origin to our Gaussian sample. Written this way, the joint PDF is simply the product of a constant (1/2π) and an exponential (e⁻ˢ).
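In symbols, with r² = x² + y² and s = r²/2:

    f(x, y) = f(x) · f(y)
            = (1/√(2π)) exp(−x²/2) · (1/√(2π)) exp(−y²/2)
            = (1/2π) exp(−(x² + y²)/2)
            = (1/2π) exp(−r²/2)
            = (1/2π) e⁻ˢ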

This is where it gets interesting, since:

  • We can think of the 1/2π constant as the density of a uniform distribution between 0 and 2π: Unif(0, 2π).
  • On the other hand, the exponential term e⁻ˢ is nothing but the density of an exponential distribution with parameter λ=1: Expo(1).
  • Furthermore, since the joint distribution factors into the product of these two terms, the uniform and the exponential random variables are also independent of each other.

Geometric interpretation

Geometrically, the uniform distribution between 0 and 2π represents the angle θ that our 2-D Gaussian sample makes with the positive x-axis. This aligns with our intuition about Gaussian samples in 2-D: they are sampled with equal chance at any angle between 0 and 360 degrees, hence the characteristic round “blob” that they often form.

In contrast, these blobs are always concentrated near the origin, which fits well with the fact that s (half of the squared distance from the origin) follows an exponential distribution: you are more likely to encounter a sample the closer you move to the origin.

This leads us to the underlying principle of the Box-Muller transform:

Instead of sampling independent Gaussians for the x and y coordinates of the samples, we sample their independent uniform angles and exponential half-of-squared distances to the origin.
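As a quick numerical sanity check of this principle, here is a minimal sketch (assuming NumPy ≥ 1.17 for the default_rng Generator API) that samples the angle and the half-of-squared distance directly, then converts them to x and y coordinates:

    import numpy as np

    rng = np.random.default_rng(42)
    n = 100_000

    # theta ~ Unif(0, 2*pi): the angle of each sample
    thetas = rng.uniform(0, 2 * np.pi, size=n)
    # s ~ Expo(1): half of the squared distance from the origin
    ss = rng.exponential(scale=1.0, size=n)

    # Recover the distance r from s = r^2 / 2, then convert to coordinates
    rs = np.sqrt(2 * ss)
    xs = rs * np.cos(thetas)
    ys = rs * np.sin(thetas)

    # Both coordinates should look like standard normals, uncorrelated
    print(xs.mean(), xs.std())        # approximately 0 and 1
    print(ys.mean(), ys.std())        # approximately 0 and 1
    print(np.corrcoef(xs, ys)[0, 1])  # approximately 0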

Caveat for changing variables

Technically, when changing the random variables of a joint distribution to different ones, we need to multiply the new distribution by the Jacobian of the variable transformation (see here for more details). This ensures that the new distribution is still a valid probability density function, i.e. that the “area” under the distribution is still 1. Thankfully, the Jacobian of the transformation from {s, θ} to {x, y} is 1, which greatly simplifies our problem.
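Concretely, the transformation back to Cartesian coordinates is x = √(2s)·cos θ and y = √(2s)·sin θ, whose Jacobian determinant works out to:

    |det J| = (∂x/∂s)(∂y/∂θ) − (∂x/∂θ)(∂y/∂s)
            = (cos θ/√(2s)) · (√(2s)·cos θ) − (−√(2s)·sin θ) · (sin θ/√(2s))
            = cos²θ + sin²θ = 1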

Sampling 2-D Gaussians with Box-Muller transform

Once we transform the original problem into sampling a uniform angle and an exponential half-of-squared distance, the remaining steps are much easier. This is because both the uniform and the exponential distributions have very simple inverse CDFs, as outlined in the table below. As a result, we can apply these inverse CDFs to any uniform sample between 0 and 1 to transform it into our desired uniform and exponential samples (see part 1 of the project for an explanation of inverse transform sampling).
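    Distribution    CDF F              Inverse CDF F⁻¹(U)
    Unif(0, 2π)     F(θ) = θ/(2π)      2π·U
    Expo(1)         F(s) = 1 − e⁻ˢ     −ln(1 − U)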

The animation below walks through all the steps needed to generate the Gaussian samples in 2-D:

  1. Sample from two separate uniform sample generators (U₁ and U₂).
  2. Apply the inverse CDF of the exponential distribution with λ=1 to U₁ to get the sample’s half of squared distance from the origin (s). For simplicity, the inverse CDF is modified from −ln(1−U₁) to −ln(U₁). This modified function is technically no longer the inverse CDF of the exponential, but it still outputs exponentially distributed samples, because U₁ and 1−U₁ are both uniform samples between 0 and 1.
  3. Apply the inverse CDF of the uniform distribution between 0 and 2π to U₂ to get the angle of the sample (θ).
  4. Calculate each sample’s distance r from the origin from its half of squared distance: r = √(2s).
  5. Lastly, the x and y coordinates are found by simple trigonometry: r·cos(θ) and r·sin(θ), respectively. Combining the formulas from steps 2 to 5 gives us the formulas seen earlier for transforming U₁ and U₂ into x and y.

Coding-wise, generating Gaussian samples with the Box-Muller transform could hardly be easier, as seen in this Python implementation of the 5 steps above, which generates 1000 Gaussian samples in 2-D:
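(A minimal sketch of those 5 steps; the array names u1s, u2s, x1s, and y1s match the note that follows.)

    import numpy as np

    n = 1000

    # Step 1: two separate uniform samples in [0, 1)
    u1s = np.random.uniform(size=n)
    u2s = np.random.uniform(size=n)

    # Step 2: half of squared distance from the origin, s ~ Expo(1)
    ss = -np.log(u1s)

    # Step 3: angle theta ~ Unif(0, 2*pi)
    thetas = 2 * np.pi * u2s

    # Step 4: distance from the origin, r = sqrt(2s)
    rs = np.sqrt(2 * ss)

    # Step 5: x and y coordinates by simple trigonometry
    x1s = rs * np.cos(thetas)
    y1s = rs * np.sin(thetas)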

Note that the relevant NumPy math functions — np.log, np.sqrt, np.sin, and np.cos — are all vectorized to work on NumPy arrays (such as u1s and u2s). As a result, the 1000 Gaussian samples, whose coordinates are stored in x1s and y1s, can all be generated at once.

Box-Muller transform in action: NumPy’s random.standard_normal

Looking into NumPy’s source code, it appears that the np.random.standard_normal function for generating standard Gaussians indeed uses the Box-Muller transform: it refers to the C function legacy_gauss, which uses a method based on this type of transform:

numpy/numpy/random/src/legacy/legacy-distributions.c

There are 3 things to note about this implementation:

  1. It implements the polar form of the Box-Muller transform, also called the Marsaglia polar method. This method uses some clever geometry to avoid computing the sine and cosine of the traditional Box-Muller transform, thus making it faster (see the sketch after this list). I’m planning to implement this polar method in the near future, and will add it to this blog post.
  2. NumPy’s implementation returns only one of the two Gaussians generated by the transform, hence the return f * x2 line in legacy_gauss. Meanwhile, the other Gaussian sample, f * x1, is saved for the next run of the function.
  3. NumPy’s newer random number generator class, the appropriately-named Generator, does not use the Box-Muller transform anymore. Instead, according to this changelog, it uses the Ziggurat algorithm (wiki, implementation) to generate the samples, which is much faster than both the Box-Muller transform and the Marsaglia polar method.
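For illustration, here is a minimal Python sketch of the Marsaglia polar method mentioned in point 1 (a translation of the idea behind legacy_gauss, not NumPy’s actual C code):

    import numpy as np

    def polar_gauss_pair():
        """Return two independent standard Gaussians via the Marsaglia polar method."""
        while True:
            # Sample a point uniformly from the square [-1, 1] x [-1, 1]
            x1 = 2.0 * np.random.uniform() - 1.0
            x2 = 2.0 * np.random.uniform() - 1.0
            r2 = x1 * x1 + x2 * x2
            # Reject points outside the open unit disk (and the origin itself)
            if 0.0 < r2 < 1.0:
                # For a uniform point in the disk, r2 is uniform on (0, 1), so
                # -ln(r2) ~ Expo(1); the accepted point's direction stands in
                # for the explicit sine and cosine of the classic transform
                f = np.sqrt(-2.0 * np.log(r2) / r2)
                return f * x1, f * x2

Each call yields a pair of independent Gaussians, which matches the caching behavior described in point 2: one sample can be returned immediately while the other is saved for the next call.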

Comparison to R’s rnorm

As detailed in part 1, R uses inverse transform sampling to generate its Gaussian samples, as opposed to the Box-Muller transform used by Python’s NumPy. I have no authority to comment on the relative accuracy or computational efficiency of either approach. However, in terms of elegance and readability, Python’s approach is clearly superior to R’s hard-coded mess from the previous part. This is yet another proof that Python is clearly the better programming language #kidding #notreally.

Top: implementation of R’s qnorm. Bottom: implementation of NumPy’s random.standard_normal

Nevertheless, in the next part of the project, I will demonstrate another technique for generating Gaussian samples that is even simpler than the Box-Muller transform, and one that every statistics student is already familiar with! Please stay tuned.

Reference

All the derivations of the Box-Muller transform in this blog post are taken from the book “Introduction to Probability” by Blitzstein and Hwang (available for free online, pages 373–374).
