Pattern Recognition — Chapter 2: Normal distribution

Nhut Hai Huynh · Aug 28, 2017

--

1. Normal distribution

The normal distribution, also known as the Gaussian distribution, is the most well-known distribution and is applicable to many problems involving other types of distributions. According to the Central Limit Theorem, the sum of many independently generated random variables is approximately normally distributed (bell-shaped), even when the original variables are not normally distributed.

It is denoted as:

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\,\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

where $\mu$ is the mean and $\sigma^2$ is the variance.

(*Source: https://en.wikipedia.org/wiki/Normal_distribution)

We focus on the exponent term of the above equation:

$$-\frac{(x-\mu)^2}{2\sigma^2}$$

The numerator contains the squared Mahalanobis distance, which measures how far a point lies from the distribution; intuitively, the closer the point is to the center of mass, the more likely it is to belong to this distribution. The denominator contains the variance, which normalizes the Mahalanobis distance.
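As a minimal sketch, the density and its exponent can be computed directly in Python (the function name is my own):

```python
import math

def normal_pdf(x, mu, sigma):
    """Univariate normal density N(x; mu, sigma^2)."""
    z = (x - mu) / sigma  # 1-D Mahalanobis distance: deviation in units of sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Points closer to the mean receive higher density.
print(normal_pdf(0.0, 0.0, 1.0))  # peak of the standard normal, 1/sqrt(2*pi)
```

The peak value $1/(\sigma\sqrt{2\pi})$ is reached exactly at the mean, where the Mahalanobis distance is zero.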

2. Multivariate normal densities

The above distribution is the univariate normal distribution. Now we consider more than one random variable:

$$p(\mathbf{x})$$

or

$$p(x_1, x_2, \dots, x_n)$$

We will consider two cases: dependent and independent random variables. First, independent random variables (see Chapter 1):

$$p(x_1, x_2, \dots, x_n) = p(x_1)\,p(x_2)\cdots p(x_n)$$

As in Chapter 1, the joint probability of independent random variables $p(x_1, x_2, \dots, x_n)$ equals the product of the probabilities of the individual variables $p(x_i)$. If each $p(x_i)$ is normally distributed, then:

$$p(x_1, \dots, x_n) = \prod_{i=1}^{n} \frac{1}{\sigma_i\sqrt{2\pi}}\,\exp\!\left(-\frac{(x_i-\mu_i)^2}{2\sigma_i^2}\right)$$

The figures below illustrate the normal distribution of two independent random variables:

(*Source: http://ballistipedia.com/index.php?title=Closed_Form_Precision)
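This factorization is easy to check numerically; here is a small sketch (helper names are mine):

```python
import math

def normal_pdf(x, mu, sigma):
    """Univariate normal density N(x; mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def joint_pdf_independent(xs, mus, sigmas):
    """Joint density of independent normals: the product of the marginals."""
    p = 1.0
    for x, mu, sigma in zip(xs, mus, sigmas):
        p *= normal_pdf(x, mu, sigma)
    return p

# At the joint mean of two standard normals the density is 1/(2*pi).
print(joint_pdf_independent([0.0, 0.0], [0.0, 0.0], [1.0, 1.0]))
```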

In the case of dependent random variables, we cannot apply the above factorization.

To match the contour of the fitted distribution to the scatter of the data points, we need to account for the degree of statistical dependence, which is captured by the covariance matrix $\Sigma$. The multivariate normal density is:

$$p(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}}\,\exp\!\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\top}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)$$

As a result, the covariance matrix shapes the contours of the fitted distribution:

(*Source: https://blogs.sas.com/content/iml/files/2012/07/mvnormalpdf.png)

In fact, independence is a special case in which the covariance matrix is diagonal.
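The multivariate density with a full covariance matrix can be sketched with NumPy (assuming `numpy` is available; the function name is mine). With a diagonal covariance it reduces to a product of univariate densities:

```python
import numpy as np

def mvn_pdf(x, mu, cov):
    """Multivariate normal density with a full covariance matrix."""
    x, mu, cov = np.asarray(x, float), np.asarray(mu, float), np.asarray(cov, float)
    diff = x - mu
    maha = diff @ np.linalg.solve(cov, diff)  # squared Mahalanobis distance
    norm = np.sqrt((2 * np.pi) ** len(mu) * np.linalg.det(cov))
    return np.exp(-0.5 * maha) / norm

# Diagonal covariance: variances 1 and 4, no correlation.
print(mvn_pdf([1.0, 2.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 4.0]]))
```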

3. Mixture densities

We have reviewed the multivariate distribution, but it is unimodal. The question is how to handle two or more modes.

We need to combine unimodal distributions into a mixture density. A unimodal distribution has only a single local maximum, while a multimodal distribution, or mixture density, has several local maxima. The mixture density equals the combination of the distributions of the individual modes, i.e., the sum over the joint probabilities of the features $x_i$ and the mode $k$:

$$p(\mathbf{x}) = \sum_{k=1}^{K} p(\mathbf{x}, k)$$

Because the distribution of the features $x_i$ depends on the mode $k$, the joint probability factorizes into a conditional probability times a prior:

$$p(\mathbf{x}) = \sum_{k=1}^{K} p(\mathbf{x} \mid k)\,P(k)$$

This means the mixture is a weighted sum of the conditional distributions (the probability of the features given the mode $k$). We can assume that the conditional distributions are normal. K-means is an example of a method built on mixture densities.
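A two-mode mixture can be sketched as a weighted sum of normal densities (an illustrative Python sketch; the weights and parameters are arbitrary):

```python
import math

def normal_pdf(x, mu, sigma):
    """Univariate normal density N(x; mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(x, weights, mus, sigmas):
    """p(x) = sum_k P(k) * p(x | k), with the weights P(k) summing to 1."""
    return sum(w * normal_pdf(x, mu, sigma)
               for w, mu, sigma in zip(weights, mus, sigmas))

# Two well-separated components produce a bimodal density.
params = ([0.5, 0.5], [-3.0, 3.0], [1.0, 1.0])
print(mixture_pdf(-3.0, *params), mixture_pdf(0.0, *params))
```

The density has local maxima near $-3$ and $3$ and a valley between them, which a single normal distribution cannot represent.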

4. The product of Gaussian distributions

The product of Gaussian densities is, up to a scaling factor, a Gaussian density. We will consider the product of univariate Gaussian distributions.

4.1. The product of two uni-variate Gaussian distributions

We have two Gaussian densities $f(x)$ and $g(x)$:

$$f(x) = \frac{1}{\sigma_f\sqrt{2\pi}}\,\exp\!\left(-\frac{(x-\mu_f)^2}{2\sigma_f^2}\right)$$

and

$$g(x) = \frac{1}{\sigma_g\sqrt{2\pi}}\,\exp\!\left(-\frac{(x-\mu_g)^2}{2\sigma_g^2}\right)$$

Their product is:

$$f(x)\,g(x) = \frac{1}{2\pi\sigma_f\sigma_g}\,\exp\!\left(-\left[\frac{(x-\mu_f)^2}{2\sigma_f^2} + \frac{(x-\mu_g)^2}{2\sigma_g^2}\right]\right) \quad (1)$$

We will focus on the exponent and expand it:

$$-\frac{\sigma_g^2(x-\mu_f)^2 + \sigma_f^2(x-\mu_g)^2}{2\sigma_f^2\sigma_g^2} = -\frac{(\sigma_f^2+\sigma_g^2)x^2 - 2(\mu_f\sigma_g^2 + \mu_g\sigma_f^2)x + \mu_f^2\sigma_g^2 + \mu_g^2\sigma_f^2}{2\sigma_f^2\sigma_g^2}$$

In the general Gaussian distribution equation, the quadratic term in $x$ appears as:

$$-\frac{(x-\mu)^2}{2\sigma^2} = -\frac{x^2 - 2\mu x + \mu^2}{2\sigma^2} \quad (2)$$

Because the coefficient of $x^2$ in (2) is 1, we modify the coefficient in the exponent of (1) by dividing both numerator and denominator by $\sigma_f^2 + \sigma_g^2$.

The exponent of (1) becomes:

$$-\frac{x^2 - 2\dfrac{\mu_f\sigma_g^2 + \mu_g\sigma_f^2}{\sigma_f^2+\sigma_g^2}\,x + \dfrac{\mu_f^2\sigma_g^2 + \mu_g^2\sigma_f^2}{\sigma_f^2+\sigma_g^2}}{2\dfrac{\sigma_f^2\sigma_g^2}{\sigma_f^2+\sigma_g^2}}$$

Now, we compare the two exponents of (1) and (2). Matching the corresponding terms, the candidate mean and variance of the product are:

$$\mu = \frac{\mu_f\sigma_g^2 + \mu_g\sigma_f^2}{\sigma_f^2+\sigma_g^2}, \qquad \sigma^2 = \frac{\sigma_f^2\sigma_g^2}{\sigma_f^2+\sigma_g^2}$$

But this is not yet a normal distribution, for two reasons:

1. The constant term in the exponent of (1), $\frac{\mu_f^2\sigma_g^2 + \mu_g^2\sigma_f^2}{\sigma_f^2+\sigma_g^2}$, is not the square of the candidate mean $\mu^2$, as (2) requires.
2. The scaling term $\frac{1}{2\pi\sigma_f\sigma_g}$ does not equal $\frac{1}{\sigma\sqrt{2\pi}}$ for the candidate variance, so the normalization does not guarantee that the density integrates to 1.

That is why it is not yet a PDF (probability density function).

To address the first problem, we add and subtract the square of the candidate mean $\mu^2$ in the numerator of the exponent. Because $\mu^2 - \mu^2 = 0$, this completes the square without changing the value of (1).

After simplifying, the leftover constant term in the exponent is:

$$-\frac{(\mu_f-\mu_g)^2}{2(\sigma_f^2+\sigma_g^2)}$$

As a result, (1) becomes:

$$f(x)\,g(x) = \frac{1}{2\pi\sigma_f\sigma_g}\,\exp\!\left(-\frac{(\mu_f-\mu_g)^2}{2(\sigma_f^2+\sigma_g^2)}\right)\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \quad (3)$$

The second problem can be solved by rewriting the scaling factor using $\sigma = \frac{\sigma_f\sigma_g}{\sqrt{\sigma_f^2+\sigma_g^2}}$:

$$\frac{1}{2\pi\sigma_f\sigma_g} = \frac{1}{\sigma\sqrt{2\pi}} \cdot \frac{1}{\sqrt{2\pi(\sigma_f^2+\sigma_g^2)}}$$

Inserting this into (3):

$$f(x)\,g(x) = \frac{1}{\sqrt{2\pi(\sigma_f^2+\sigma_g^2)}}\,\exp\!\left(-\frac{(\mu_f-\mu_g)^2}{2(\sigma_f^2+\sigma_g^2)}\right) \cdot \frac{1}{\sigma\sqrt{2\pi}}\,\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

The first factor is the scaling factor, or normalization term. We can abbreviate it as a factor $S$:

$$S = \frac{1}{\sqrt{2\pi(\sigma_f^2+\sigma_g^2)}}\,\exp\!\left(-\frac{(\mu_f-\mu_g)^2}{2(\sigma_f^2+\sigma_g^2)}\right)$$

The product of the two distributions is therefore a scaled Gaussian PDF. For convenience, we rewrite the mean, variance, and normalization factor as:

$$f(x)\,g(x) = S\,\mathcal{N}(x;\,\mu,\,\sigma^2), \qquad \mu = \frac{\mu_f\sigma_g^2 + \mu_g\sigma_f^2}{\sigma_f^2+\sigma_g^2}, \qquad \sigma^2 = \frac{\sigma_f^2\sigma_g^2}{\sigma_f^2+\sigma_g^2}$$
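The closed-form mean, variance, and normalization factor can be checked numerically; here is a sketch with my own function names:

```python
import math

def normal_pdf(x, mu, sigma):
    """Univariate normal density N(x; mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def gaussian_product(mu_f, var_f, mu_g, var_g):
    """N(mu_f, var_f) * N(mu_g, var_g) = S * N(mu, var); returns (mu, var, S)."""
    var = var_f * var_g / (var_f + var_g)
    mu = (mu_f * var_g + mu_g * var_f) / (var_f + var_g)
    s = (math.exp(-((mu_f - mu_g) ** 2) / (2 * (var_f + var_g)))
         / math.sqrt(2 * math.pi * (var_f + var_g)))
    return mu, var, s

# Pointwise, f(x)*g(x) must equal S * N(x; mu, var) for every x.
mu, var, s = gaussian_product(1.0, 2.0, 3.0, 0.5)
print(mu, var, s)
```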

4.2. The product of n uni-variate Gaussian distributions

To take the product of n distributions, we can multiply pairs of distributions iteratively. For example, the product of 4 distributions:

$$f_1 f_2 f_3 f_4 = S_{12}\,\mathcal{N}_{12}\,f_3 f_4 = S_{12} S_{(12)3}\,\mathcal{N}_{123}\,f_4 = S_{12} S_{(12)3} S_{(123)4}\,\mathcal{N}_{1234}$$

where $\mathcal{N}(\cdot,\cdot)$ is a normal distribution, $S_{12}$ is the normalization factor of the product of the first and second distributions, and $S_{(12)3}$ is the normalization factor of the product of the first two and the third.

As the above equation shows, we can apply the product of two distributions repeatedly to obtain the product of n distributions. Again, the result is a (scaled) Gaussian distribution. The mean and variance of the resulting distribution are:

$$\mu = \frac{\sum_{i=1}^{n} \mu_i/\sigma_i^2}{\sum_{i=1}^{n} 1/\sigma_i^2}$$

and

$$\sigma^2 = \frac{1}{\sum_{i=1}^{n} 1/\sigma_i^2}$$
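As a sketch, folding the pairwise product rule over n distributions reproduces these closed-form precision-weighted expressions (function names are mine):

```python
def product_of_gaussians(mus, variances):
    """Fold the two-Gaussian product rule over lists of means and variances."""
    mu, var = mus[0], variances[0]
    for m, v in zip(mus[1:], variances[1:]):
        mu = (mu * v + m * var) / (var + v)  # uses the running var before updating it
        var = var * v / (var + v)
    return mu, var

# Three Gaussians combined pairwise.
print(product_of_gaussians([0.0, 1.0, 4.0], [1.0, 2.0, 4.0]))
```

Each pairwise step adds precisions ($1/\sigma^2$) and precision-weights the means, which is why the order of multiplication does not matter.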
