Completing the Square
A useful technique when dealing with multivariate Gaussian distributions
This post assumes some familiarity with the Gaussian (also called Normal) distribution and matrix operations.
In my article Maths Behind Machine Learning, I briefly touched on the idea of Gaussian distributions. The Gaussian distribution is so ubiquitous in applications of statistics that it is very useful to have tools for recognising when it is hiding inside an expression. One of these tools is completing the square.
What is the problem statement?
Let’s take the density function p(y) below, where y is an n × 1 vector.
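Up to a normalising constant, p(y) is the exponential of a quadratic in y. Writing the quadratic with a symmetric, positive-definite n × n matrix A and an n × 1 vector b (the notation used in the rest of this post), we have

p(y) \propto \exp\!\left( -\tfrac{1}{2}\, y^{\top} A\, y + b^{\top} y \right)    (1)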
We can show that (1) is a multivariate Gaussian distribution with mean as shown in (2) and covariance as shown in (3).
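In this notation, the mean (2) and covariance (3) are

\mu = A^{-1} b    (2)

\Sigma = A^{-1}    (3)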
How do we show this?
We do this in two steps.
First, we will write out what a Gaussian probability density function with mean as per expression (2) and covariance as per expression (3) looks like.
Second, we will show what to do to p(y) in (1) to demonstrate that it is indeed this Gaussian distribution.
Executing Step 1
To recap, in step 1 we will write out what a Gaussian probability density function with mean as per expression (2) and covariance as per expression (3) looks like.
For our purposes, we can disregard the normalising factor. What really matters is the expression inside the exponential function (up to the factor of −1/2, this is the squared Mahalanobis distance), as shown in (4).
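With mean \mu = A^{-1} b and covariance \Sigma = A^{-1} (so that the precision \Sigma^{-1} is simply A), this expression is

-\tfrac{1}{2}\, (y - A^{-1} b)^{\top} A\, (y - A^{-1} b)    (4)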
We expand (4) and get the following:
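Multiplying out the quadratic, and using the symmetry of A so that (A^{-1} b)^{\top} A\, y = b^{\top} y, gives

-\tfrac{1}{2}\, (y - A^{-1} b)^{\top} A\, (y - A^{-1} b) = -\tfrac{1}{2} \left( y^{\top} A\, y - 2\, b^{\top} y + b^{\top} A^{-1} b \right)    (5)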
If we look closely at (5), we can see that the first two terms inside the brackets are exactly what appears in the exponent of expression (1); we just have to cancel the factor of 2 against the leading −1/2. This is spelled out below.
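Distributing the leading −1/2 over the first two terms of (5),

-\tfrac{1}{2} \left( y^{\top} A\, y - 2\, b^{\top} y \right) = -\tfrac{1}{2}\, y^{\top} A\, y + b^{\top} y,

which is exactly the exponent of (1). The remaining term, -\tfrac{1}{2}\, b^{\top} A^{-1} b, does not depend on y at all.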
Executing Step 2
With step 1 in mind, we now show what to do to p(y) to prove that it is indeed a Gaussian distribution.
Let’s look at (1) again, or at least just the expression inside the exponential function.
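That is, we focus on the quadratic

-\tfrac{1}{2}\, y^{\top} A\, y + b^{\top} y.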
We add and subtract a factor that emerged in (5) in order to get the following calculation.
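In the notation used here, that factor is \tfrac{1}{2}\, b^{\top} A^{-1} b, the same b^{\top} A^{-1} b term that appeared in (5). Adding and subtracting it, and then folding the first three terms back into a single quadratic, gives

-\tfrac{1}{2}\, y^{\top} A\, y + b^{\top} y
    = -\tfrac{1}{2}\, y^{\top} A\, y + b^{\top} y - \tfrac{1}{2}\, b^{\top} A^{-1} b + \tfrac{1}{2}\, b^{\top} A^{-1} b
    = -\tfrac{1}{2} \left( y^{\top} A\, y - 2\, b^{\top} y + b^{\top} A^{-1} b \right) + \tfrac{1}{2}\, b^{\top} A^{-1} b
    = -\tfrac{1}{2}\, (y - A^{-1} b)^{\top} A\, (y - A^{-1} b) + \tfrac{1}{2}\, b^{\top} A^{-1} b    (10)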
Comparing expression (10) with (4), it can be seen that this is a Gaussian distribution. The last term, which involves only b, is a constant (i.e. it does not depend on y), so it does not matter when working out the form of the probability density function; it simply gets absorbed into the normalising factor.
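As a quick numerical sanity check of this, here is a minimal sketch in NumPy/SciPy (the particular A and b are arbitrary choices for illustration, not from the derivation above): it confirms that the exponent of (1) and the log-density of a Gaussian with mean A^{-1} b and covariance A^{-1} differ only by a constant that does not depend on y.

```python
# Sanity check of completing the square: the exponent of (1) and the
# Gaussian log-density should differ only by a y-independent constant.
# A and b below are arbitrary illustrative choices.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

n = 3
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)      # symmetric, positive-definite "precision" matrix
b = rng.standard_normal(n)

mu = np.linalg.solve(A, b)       # mean A^{-1} b, as in (2)
Sigma = np.linalg.inv(A)         # covariance A^{-1}, as in (3)

gauss = multivariate_normal(mean=mu, cov=Sigma)

for _ in range(5):
    y = rng.standard_normal(n)
    exponent = -0.5 * y @ A @ y + b @ y        # exponent of (1)
    print(round(exponent - gauss.logpdf(y), 8))  # the same constant every time
```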
So what is the point?
It is very useful to recognise a probability density function of the form in (1), because it can then immediately be identified as a Gaussian distribution with the mean in (2) and the covariance in (3).
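In symbols: whenever an exponent can be massaged into the form

-\tfrac{1}{2}\, y^{\top} A\, y + b^{\top} y + \text{const},

with A symmetric and positive definite, we can read off that y follows a Gaussian distribution with mean \mu = A^{-1} b and covariance \Sigma = A^{-1}.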
We also like Gaussian distributions because of the great advantage that many calculations involving them become easier.
So there we have it, an added item in our maths toolkit!