PCA Part 1: The Lagrange Multipliers
The PCA algorithm is one of the most important techniques for dimensionality reduction, but really understanding how it works requires some mathematical preparation, so I thought it best to address the topic in a series of articles; this is part 1.
What you are expected to know for this topic:
- A foundation in multivariable calculus
The first tool we will focus on is Lagrange multipliers.
Lagrange multipliers address the following question: how can we find a local maximum or minimum of a function subject to some constraints? To make this concrete, let’s work through an example:
Suppose that we want to maximize the perimeter of a rectangle, with the constraint that the corners of that rectangle lie on the unit circle, i.e.
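Assuming the rectangle is centered at the origin with corners at (±x, ±y) (my reconstruction of the setup, since the original equations are not shown), the unit-circle condition on the corners reads:

$$x^2 + y^2 = 1$$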
Let’s define a function that represents the objective we want to optimize:
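Under the corner convention above, the rectangle has width 2x and height 2y, so the perimeter is:

$$f(x, y) = 2(2x) + 2(2y) = 4x + 4y$$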
We can write our constraint in the form of a function:
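$$g(x, y) = x^2 + y^2 = 1$$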
Now that we have stated the problem properly, we can think about how to optimize it.
The intuition behind Lagrange multipliers is simple yet very powerful. Given a function f that we want to optimize and a constraint function g, we need to find the points where the gradient of f and the gradient of g are aligned (these points are the candidates for a local maximum or minimum), and the scaling factor between the two gradients is called the Lagrange multiplier.
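In symbols, the alignment condition reads:

$$\nabla f = \lambda \nabla g$$

where the scalar λ is the Lagrange multiplier.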
*Disclaimer*: For this part, I heavily encourage you to look for some videos (Khan Academy or 3Blue1Brown, for example) about this graphical interpretation, because I admit it is quite difficult to understand from a graph alone if you have never seen any of this before.
Applying this concept to our example, we need to solve this set of equations:
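Writing out ∇f = λ∇g component by component for our f and g, together with the constraint itself (again, under my assumed setup), the system is:

$$4 = 2\lambda x$$
$$4 = 2\lambda y$$
$$x^2 + y^2 = 1$$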
Before actually solving these equations, it’s possible to write them all down in a more compact, vectorial form (this is not helpful when solving the equations by hand, but computers deal nicely with equations in vector format).
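A standard way to do this is to bundle the objective and the constraint into a single function, the Lagrangian (I assume this is the form the original presented):

$$\mathcal{L}(x, y, \lambda) = f(x, y) - \lambda \, (g(x, y) - b)$$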
Where b is the constant that g is constrained to equal (b = 1 in our example).
Why is this definition useful? The answer is in the equation below:
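$$\nabla \mathcal{L}(x, y, \lambda) = 0$$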
If we unpack this equation, we see that it represents the same thing as the three equations above, but in a much more elegant, compact, and machine-friendly form. Calculating the gradient, we have:
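$$\frac{\partial \mathcal{L}}{\partial x} = 4 - 2\lambda x = 0$$
$$\frac{\partial \mathcal{L}}{\partial y} = 4 - 2\lambda y = 0$$
$$\frac{\partial \mathcal{L}}{\partial \lambda} = -(x^2 + y^2 - 1) = 0$$

The first two equations give x = y = 2/λ; substituting into the third gives λ = 2√2 and x = y = √2/2 ≈ 0.707. The optimal rectangle is therefore a square, with perimeter f = 4√2.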
And that’s it, we have found the solution to the constrained optimization problem. Although this example is very simple, it holds the essential ideas behind Lagrange multipliers and their usage.
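If you want to check this on a computer, here is a minimal sketch using SymPy (my own illustration, not from the original article) that solves ∇𝓛 = 0 symbolically:

```python
import sympy as sp

# Unknowns: the corner coordinates (x, y) and the multiplier lambda.
# positive=True keeps only the first-quadrant corner and the maximizing branch.
x, y, lam = sp.symbols("x y lambda", positive=True)

f = 4 * x + 4 * y      # objective: perimeter of the rectangle
g = x**2 + y**2        # constraint function; g = 1 on the unit circle

# Lagrangian with b = 1
L = f - lam * (g - 1)

# Stationary points: solve grad L = 0 for x, y, and lambda
solutions = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)
print(solutions)  # expected: [{x: sqrt(2)/2, y: sqrt(2)/2, lambda: 2*sqrt(2)}]
```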
A little bit of spoiler for what will come next:
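As a preview (the exact expression shown here is my best guess at what comes next), the same machinery applied to PCA ends up producing an eigenvalue equation for the covariance matrix:

$$S u = \lambda u$$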
*Disclaimer*: Don’t worry too much about this last part; it will become very clear as we advance through the next topics.
In the context of PCA, Lagrange multipliers are very useful when we try to maximize the variance of projections onto vectors while limiting the solutions to a set of specific constraints.
For PCA, the Lagrange multiplier technique takes on the job of finding the local maximum of:
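In the standard PCA derivation (the notation here is my assumption), that quantity is the variance of the data projected onto a unit vector:

$$u^{T} S u \quad \text{subject to} \quad u^{T} u = 1$$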
Where S is the covariance matrix of the data and u is the vector we need to optimize over.