The Origins of the Normal Distribution

William Sundstrom
7 min read · Aug 18, 2019


I have always enjoyed learning about the history of math and science as well as the material itself. The types of people who come up with truly groundbreaking original thoughts are also often people with all sorts of interesting quirks, and hearing their stories alongside (sometimes quite dry) lectures about theory has really helped me remember these characters. For example, the Erdős–Ko–Rado theorem is very important but, on its surface, not particularly exciting; however, any mention of it reminds me of the stories about the prolific, itinerant, amphetamine-fueled Paul Erdős. In particular, the following story (among many others from The Man Who Loved Only Numbers, by Paul Hoffman) has always stuck with me:

In 1979, Graham bet Erdős $500 that he couldn’t stop taking amphetamines for a month. Erdős accepted the challenge, and went cold turkey for thirty days. After Graham paid up — and wrote the $500 off as a business expense — Erdős said, “You’ve showed me I’m not an addict. But I didn’t get any work done. I’d get up in the morning and stare at a blank piece of paper. I’d have no ideas, just like an ordinary person. You’ve set mathematics back a month.”

So now that I’m reviewing the foundations of probability and statistics, I thought it would be fun to look into the history of the field and try to put everything in context.

The Beginning

The formal study of probability started in the seventeenth century in France. Antoine Gombaud, aka Chevalier de Méré, was a writer and a prominent thinker in the salon scene of that time. An avid gambler, de Méré had come upon a problem that had been known since medieval times, the problem of points:

Suppose two players agree to play a certain number of games, say a best-of-seven series, and are interrupted before they can finish. How should the stake be divided among them if, say, one has won three games and the other has won one? Source
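For concreteness, the position described above (a best-of-seven series, so first to four wins, interrupted with the score at three games to one) can be settled with a short recursion, under the standard reading of the problem that each remaining game is a fair coin flip. The function name below is mine, not anything from the original sources:

```python
from fractions import Fraction

def win_probability(a_needs: int, b_needs: int) -> Fraction:
    """Probability that player A wins the series, assuming each
    remaining game is an independent 50/50 coin flip."""
    if a_needs == 0:
        return Fraction(1)
    if b_needs == 0:
        return Fraction(0)
    # A either wins the next game or loses it, each with probability 1/2.
    return (win_probability(a_needs - 1, b_needs)
            + win_probability(a_needs, b_needs - 1)) / 2

# Best-of-seven, first to 4: A leads 3-1, so A needs 1 more win, B needs 3.
print(win_probability(1, 3))  # 7/8 -> split the stake 7:1 in A's favor
```

This "divide by probability of winning" idea is exactly the notion of expected value that came out of the Pascal–Fermat correspondence.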

De Méré reached out to his fellow salon intellectuals for help, and in doing so kicked off the study of probability.

To help him solve this problem, de Méré enlisted Blaise Pascal and Pierre de Fermat, who were already well-known mathematicians. In their discussions of the problem, Pascal came up with what we now call the combination function and a triangle to help easily compute its values:

Pascal’s Triangle from Wikipedia

Pascal also understood that in addition to its usefulness in settling gambling disputes, the triangle could enable one to calculate coefficients for the binomial formula, the uses of which were just starting to be understood. (Speaking of characters, after creating the foundations of the formal study of probability and coming up with the idea of expected value at the age of thirty-one, Pascal had a religious experience and gave up most of his work in mathematics.)
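As a quick illustration of that connection between the triangle and the binomial formula, here is a minimal sketch (the function name is my own): each row of the triangle is built by pairwise sums, and its entries are exactly the binomial coefficients.

```python
from math import comb

def pascal_row(n: int) -> list[int]:
    """Row n of Pascal's triangle, built the way the triangle is drawn:
    each entry is the sum of the two entries above it."""
    row = [1]
    for _ in range(n):
        row = [a + b for a, b in zip([0] + row, row + [0])]
    return row

print(pascal_row(4))  # [1, 4, 6, 4, 1]
# These match the coefficients of (x + y)^4, i.e. C(4, k) for k = 0..4:
assert pascal_row(4) == [comb(4, k) for k in range(5)]
```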

Around fifty years later, Willem ’s Gravesande, a Dutch mathematician, was working on questions about death rates in London and came up with the following answer to one particular question:

From https://www.maa.org/sites/default/files/pdf/upload_library/22/Allendoerfer/stahl96.pdf

Computing this number by hand would clearly be a monumental task, and as the use and understanding of the binomial distribution became more widespread, these computationally difficult problems became more common. The advent of computers was still quite a way off, so how could anyone compute these values? Enter Abraham de Moivre.

De Moivre’s Approximation

In 1733 Abraham de Moivre came up with an easier-to-compute approximation that is still taught in introductory statistics courses as the normal approximation of the binomial distribution.

Source.

So here we have the familiar equation for the cumulative distribution function (CDF) of the normal distribution (except for some constants). The really amazing thing about this is that the normal distribution would not be truly discovered for approximately another seventy years! Monsieur de Moivre’s equation did not have any theoretical significance; it was simply a way to approximate troublesome binomial coefficients.
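De Moivre's own statement looked quite different on the page, but the spirit of the result is easy to check numerically. Below is a sketch in the modern textbook form, including a continuity correction that is a later refinement rather than de Moivre's; the function names and the example numbers are my own:

```python
from math import comb, erf, sqrt

def binom_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p): the kind of sum that was
    hopeless to compute by hand for large n."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_approx_cdf(k: int, n: int, p: float) -> float:
    """Normal approximation of the binomial CDF, with a continuity
    correction: Phi((k + 0.5 - np) / sqrt(np(1-p)))."""
    z = (k + 0.5 - n * p) / sqrt(n * p * (1 - p))
    return 0.5 * (1 + erf(z / sqrt(2)))  # Phi(z) via the error function

n, p, k = 1000, 0.5, 520
print(binom_cdf(k, n, p))          # the exact, laborious sum
print(normal_approx_cdf(k, n, p))  # the cheap approximation, very close
```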

The Gaussian

A ten Deutsche mark note featuring Gauss and the Gaussian distribution. Source.

In 1801 astronomer Giuseppe Piazzi discovered the dwarf planet Ceres, which at the time was thought to be a possible new planet. Unfortunately, after around a month, Ceres passed behind the sun. This presented a problem, as there hadn’t been enough data collected in that month of observation to suggest where Ceres would reappear. Indeed, after the day Ceres was supposed to reappear, no one could locate it. Twenty-four-year-old Carl Friedrich Gauss came to the rescue with a solution to the problem. His approach involved considering the errors in the measurements of Ceres’ position as follows:

  1. Small errors are more likely than large ones.
  2. For any number E, having error +E is as likely as having error -E.
  3. If multiple measurements were taken, the most likely value for the real answer would be the mean of the measurements.

From these simple assumptions, Gauss came up with the probability density function (PDF) of the normal distribution. In his proof, Gauss proposed a new function:

Source

Relying on a method that we now know as the least squares approximation, he used the fact that the mean would be the most likely outcome, along with the conditions imposed by the first two assumptions (ϕ′(x) = 0 at x = 0, and ϕ(−x) = ϕ(x)), to do some clever algebra and conclude that f(x) must have the following form:

The full proof is in this wonderful article: Source.

A simple integral of this equation gives us the PDF of the normal distribution, with the normalizing constants coming from some other simple calculations:

Source.
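In modern notation, that last chain of steps can be sketched as follows. This is a reconstruction in my own symbols (k, A, σ), not necessarily the notation of the linked article: the symmetry condition and the maximum at zero force the logarithmic derivative of ϕ to be linear, and integrating gives the bell curve.

```latex
% The two conditions force phi'(x)/phi(x) = k x, with k < 0
% so that large errors are less likely than small ones:
\[
\frac{\phi'(x)}{\phi(x)} = kx
\quad\Longrightarrow\quad
\ln \phi(x) = \frac{k}{2}x^{2} + c
\quad\Longrightarrow\quad
\phi(x) = A\, e^{kx^{2}/2}.
\]
% Writing k = -1/sigma^2 and choosing A so that the density
% integrates to 1 gives the familiar PDF:
\[
\phi(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^{2}/(2\sigma^{2})}.
\]
```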

Laplace and the Central Limit Theorem

https://library.si.edu/image-gallery/72851

Pierre Simon Laplace was another French mathematician working with probability at around the same time. Laplace was interested in the concept of a mean and, specifically, trying to calculate error when undertaking repeated experiments. His research into this topic had brought him close to discovering the equation of the normal distribution. Once Gauss’ results became known to Laplace, he went back to this work and showed that the distribution of the error when one repeatedly sampled and calculated the mean was indeed a Gaussian distribution. Laplace’s further explorations and proofs of this fact, combined with his earlier writings about an inductively reasoned theory of probability, are the foundation of what we now call Bayesian reasoning.

Bringing It All Together

An animation showing how repeated errors (or Bernoulli trials with p=1/2) will converge to a normal distribution. Source: https://www.vivaxsolutions.com/maths/statslab.aspx
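The convergence the animation shows can be reproduced in a few lines. This is an illustrative sketch, not the animation's actual code; the trial counts and seed are arbitrary choices of mine:

```python
import random
from collections import Counter

# Simulate many runs of n fair Bernoulli trials (coin flips) and tally
# the number of successes in each run; as n grows, the histogram of
# these tallies approaches the normal bell curve.
random.seed(0)
n, runs = 100, 20_000
counts = Counter(sum(random.random() < 0.5 for _ in range(n))
                 for _ in range(runs))

# Crude text histogram centered on the mean n*p = 50
for k in range(40, 61, 2):
    print(f"{k:3d} | {'#' * (counts[k] // 100)}")
```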

To sum things up, the normal, or Gaussian, distribution that we know today came into existence from three completely different directions: Laplace’s investigations of error when sampling the mean, Gauss’ observations about measurement error, and de Moivre’s attempt to approximate the binomial distribution with a very large N. These different questions all converged to the same answer: the normal distribution. I love these stories in math and science in which people from various specialties make discoveries about inherent parts of our universe that all come together.

Gauss: https://en.wikipedia.org/wiki/Carl_Friedrich_Gauss
Source: https://www.stereogum.com/2025203/the-number-ones-the-beatles-come-together/franchises/the-number-ones/
