Normal Distribution: Probability Density Function Derivation

Curious and Learning
6 min read · Aug 3, 2020


Introduction

We learned about the basic characteristics of the normal distribution in Part I of this series. You can find the video tutorial for this article here:

Part II: Normal Distribution

In this article, we look at the probability density function (PDF) of the distribution and derive it. We denote the PDF of a normal distribution given μ and σ as p(x|μ, σ), or sometimes as p(x) for brevity.

Probability Density Function

For one variable, it looks as follows:

p(x|μ, σ) = (1/(σ√(2π))) · exp(-(x-μ)²/(2σ²))

For two independent variables, it is simply the product of the two individual densities:

p(x, y) = p(x|μ₁, σ₁) · p(y|μ₂, σ₂)
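
As a quick sanity check, the one-variable PDF can be evaluated directly in Python and compared against the standard library's `statistics.NormalDist` (a minimal sketch; the function name `normal_pdf` and the test points are my own choices):

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """PDF of the normal distribution, p(x | mu, sigma)."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Cross-check against the standard library at a few points.
for x in (-2.0, -0.5, 0.0, 1.0, 3.0):
    assert math.isclose(normal_pdf(x, 1.0, 2.0),
                        NormalDist(mu=1.0, sigma=2.0).pdf(x), rel_tol=1e-12)
```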

Derivation

Setup

We begin with a thought experiment: on a 2-D Cartesian plane, we aim darts at the origin (0,0). While trying for perfect aim, random errors occur and the darts strike away from the origin. We want to find a function that emulates such behavior.
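
This thought experiment is easy to simulate. The sketch below uses Gaussian noise per axis (anticipating the result of the derivation; σ = 1 and the seed are arbitrary choices) and checks that the darts scatter around the target:

```python
import math
import random

random.seed(42)

# Each dart lands at the target (0, 0) plus independent random error on
# each axis; Gaussian noise here anticipates the result of the derivation.
sigma = 1.0
darts = [(random.gauss(0.0, sigma), random.gauss(0.0, sigma))
         for _ in range(10_000)]

mean_x = sum(x for x, _ in darts) / len(darts)
mean_y = sum(y for _, y in darts) / len(darts)

# The errors are centered on the target: the average landing point is near (0, 0).
assert math.hypot(mean_x, mean_y) < 0.05
```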

With this setup, we can proceed to think about our density function in terms of:

(1) r and θ: radial distance from the origin and orientation in 2-D space.

(2) Two orthogonal dimensions: we can take the x-axis and y-axis as the two orthogonal dimensions.

Denoting a point in polar and cartesian coordinates

Assumptions

(1) The errors do not depend on the orientation of the 2-D space, i.e., the density function is rotationally invariant. Therefore, we can say that the density at a point depends only on its radial distance r:

p(x, y) = w(r)

for some function w.

(2) The two orthogonal directions are independent of each other, i.e., the coordinate along x-axis gives no information about the coordinate in y-axis and vice-versa for the position of the dart.

(3) The errors are more likely to occur close to the origin than far away, i.e., the darts are more likely to land close to origin than far from it.

Errors are more likely to occur close to the mean (center) than far away

Proof

Since both assumptions (1) and (2) define the same density function, we have:

f(x) · f(y) = w(r)

We also know that:

r = √(x² + y²)

Therefore, we can rewrite:

f(x) · f(y) = w(√(x² + y²)) … (Eq. 1)

Substituting x = 0 in (Eq. 1) and denoting f(0) as λ, we obtain:

λ · f(y) = w(y) … (Eq. 2)

Using (Eq. 1) and (Eq. 2), we can write:

f(x) · f(y) = λ · f(√(x² + y²)) … (Eq. 3)

Dividing (Eq. 3) by λ², we can write:

(f(x)/λ) · (f(y)/λ) = f(√(x² + y²))/λ … (Eq. 4)

Now we define a function g(t) such that f(t) = λ · g(t), and thus we have:

g(x) · g(y) = g(√(x² + y²)) … (Eq. 5)

Stepping aside and inspecting function g(t)

We take a moment over here and search our toolbox to see which family of functions might satisfy (Eq. 5).

We wonder: if instead of (Eq. 5) we had something like h(x + y) = h(x) · h(y), we could have worked out h(t) = exp(At), because then LHS = exp(Ax + Ay) and RHS = exp(Ax) · exp(Ay) = exp(Ax + Ay), so LHS = RHS.

But right now, we have something that needs a little more work. After examining (Eq. 5) carefully, we notice that we could, in fact, assign:

g(t) = exp(A · t²) … (Eq. 6)

This works out well and satisfies (Eq. 5), since g(x) · g(y) = exp(Ax²) · exp(Ay²) = exp(A(x² + y²)) = g(√(x² + y²)). We now proceed to work on the constant A. Here we use assumption (3), which states that the darts are more likely to land close to the origin than far from it. For the assumption to hold, A must be a negative number, which makes the function g taper off as we move away from the origin. Thus we can replace A with -h².
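
We can also check numerically that g(t) = exp(-h²t²) satisfies (Eq. 5), i.e., g(x) · g(y) = g(√(x² + y²)) (a quick sketch; the value h = 0.7 and the test points are arbitrary choices):

```python
import math

h = 0.7  # arbitrary positive constant for the check

def g(t):
    """g(t) = exp(-h^2 * t^2), our candidate solution to (Eq. 5)."""
    return math.exp(-(h ** 2) * (t ** 2))

# g(x) * g(y) should equal g(sqrt(x^2 + y^2)) for any x, y.
for x, y in [(0.0, 0.0), (1.0, 2.0), (-0.5, 3.0), (2.5, -1.5)]:
    r = math.hypot(x, y)
    assert math.isclose(g(x) * g(y), g(r), rel_tol=1e-12)
```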

You can also see this in the following plots of g(t) for A = 1 and A = -1, which show why A needs to be a negative number.

[Plot: g(t) = exp(t²) for A = 1, growing without bound]
[Plot: g(t) = exp(-t²) for A = -1, tapering off away from the origin]

So, now we have:

f(x) = λ · exp(-h²x²) … (Eq. 7)

The next part is to determine the values of λ and h.

Part I: Determine the value of h

Since f(x) is a probability density function, the area under its curve must be 1, i.e. (all integrals here run from -∞ to ∞),

∫ λ · exp(-h²x²) dx = 1 … (Eq. 8)

We substitute u = hx (so that dx = du/h) and thus rewrite (Eq. 8) as:

(λ/h) · ∫ exp(-u²) du = 1 … (Eq. 9)

The integral in (Eq. 9) is the well-known Gaussian integral, which requires multivariable calculus to solve, and we will directly substitute its value, √π, here. Thus (Eq. 9) reduces to:

λ · √π / h = 1, i.e., λ = h/√π … (Eq. 10)
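
If you want to convince yourself of the value √π without the multivariable calculus, a crude midpoint-rule approximation gets very close (a sketch; the grid size and the truncation range [-10, 10] are arbitrary choices):

```python
import math

# Midpoint-rule approximation of the Gaussian integral of exp(-u^2).
# The integrand is negligible outside [-10, 10], so we truncate there.
n = 100_000
a, b = -10.0, 10.0
du = (b - a) / n
total = sum(math.exp(-(a + (i + 0.5) * du) ** 2) for i in range(n)) * du

# The exact value is sqrt(pi).
assert math.isclose(total, math.sqrt(math.pi), rel_tol=1e-6)
```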

Therefore, now we have:

f(x) = (h/√π) · exp(-h²x²) … (Eq. 11)

Part II: Determine the value of λ

To determine the value of λ, we use the definition of variance for the distribution. We know that in our case E[x] = μ = 0, so:

σ² = E[(x - μ)²] = ∫ x² · p(x) dx

where p(x) is the probability density function for x and thus, in our case, p(x) = f(x). So, we have:

σ² = ∫ x² · (h/√π) · exp(-h²x²) dx … (Eq. 12)

The integral in (Eq. 12) can be computed by parts, and therefore we define:

u = x and dv = x · exp(-h²x²) dx, so that du = dx and v = -exp(-h²x²)/(2h²)

Using the above definitions, we obtain:

σ² = (h/√π) · [-x · exp(-h²x²)/(2h²)] (evaluated from -∞ to ∞) + (1/(2h²)) · ∫ (h/√π) · exp(-h²x²) dx … (Eq. 13)

We observe that the first term in (Eq. 13) involves the odd function x · exp(-h²x²) evaluated at symmetric limits, so its value is 0. All that is left is to compute the integral in the second term of (Eq. 13). Rearranging a little, we find that this integral is in fact the integral of the PDF of the distribution, which has an area of 1, and thus we have:

σ² = 1/(2h²), i.e., h = 1/(σ√2)
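
We can verify σ² = 1/(2h²) numerically by integrating x² · f(x) with f(x) = (h/√π) · exp(-h²x²) (a sketch; the value h = 0.8 and the integration grid are arbitrary choices):

```python
import math

h = 0.8  # arbitrary positive constant for the check
lam = h / math.sqrt(math.pi)  # lambda = h / sqrt(pi), from (Eq. 10)

def f(x):
    return lam * math.exp(-(h ** 2) * (x ** 2))

# Midpoint-rule approximation of the variance integral of x^2 * f(x).
n = 100_000
a, b = -12.0, 12.0
dx = (b - a) / n
var = sum(((a + (i + 0.5) * dx) ** 2) * f(a + (i + 0.5) * dx)
          for i in range(n)) * dx

assert math.isclose(var, 1.0 / (2.0 * h ** 2), rel_tol=1e-5)
```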

This helps us obtain the value of λ in terms of σ:

λ = h/√π = 1/(σ√(2π))

This further helps us get our probability density function as:

f(x) = (1/(σ√(2π))) · exp(-x²/(2σ²))

Replacing x with x - μ to obtain a more generalized PDF:

p(x|μ, σ) = (1/(σ√(2π))) · exp(-(x - μ)²/(2σ²))
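
Finally, we can sanity-check the generalized PDF: it should integrate to 1 and have mean μ (a sketch with arbitrarily chosen μ = 1.5 and σ = 2.0):

```python
import math

mu, sigma = 1.5, 2.0  # arbitrary parameters for the check

def p(x):
    """The derived PDF p(x | mu, sigma)."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Midpoint-rule integration over a wide symmetric range around mu.
n = 100_000
a, b = mu - 12.0 * sigma, mu + 12.0 * sigma
dx = (b - a) / n
area = 0.0
mean = 0.0
for i in range(n):
    x = a + (i + 0.5) * dx
    area += p(x) * dx
    mean += x * p(x) * dx

assert math.isclose(area, 1.0, rel_tol=1e-5)  # total probability is 1
assert math.isclose(mean, mu, rel_tol=1e-5)   # E[x] = mu
```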

You can visit the previous article on Normal Distribution: Part I here and find its video tutorial here.

Part-I: Normal Distribution

References:

  1. Wikipedia: Normal Distribution — Development
  2. Wikipedia: Gaussian Integral
