Mathematical Underpinnings: SVMs + Lagrange Multipliers
This article is dedicated to Lagrange multipliers. I'll try to give an intuitive, geometrical understanding of them, and in a follow-up article we'll see how this ties in to SVMs.
We want to maximise the margin (equation 1, let’s call it f), subject to all of the points being correctly classified (equation 2, let’s call it g).
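For concreteness, the hard-margin formulation those two equations refer to can be written as follows (this is the standard textbook form, assuming Bishop's notation with weights w, bias b and labels t_n ∈ {−1, +1}):

```latex
% f: maximise the margin (equivalently, minimise the squared norm of w)
\arg\max_{\mathbf{w},\,b} \; \frac{1}{\|\mathbf{w}\|}
\;\;\Longleftrightarrow\;\;
\arg\min_{\mathbf{w},\,b} \; \frac{1}{2}\|\mathbf{w}\|^{2}

% g: every training point sits on the correct side of the margin
t_n \bigl(\mathbf{w}^{\top}\mathbf{x}_n + b\bigr) \ge 1,
\qquad n = 1,\dots,N
```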
Whilst we generalise, let's consider generic functions f(x) and g(x); we'll bring it back to the SVM optimisation problem later.
We can re-arrange our constraint inequality into the form g(x) ≥ 0.
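So the generic problem the rest of this derivation works with is:

```latex
\max_{\mathbf{x}} \; f(\mathbf{x})
\qquad \text{subject to} \qquad
g(\mathbf{x}) \ge 0
```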
Written like this, there are two scenarios: one in which g(x) = 0 and one in which g(x) > 0. The latter will influence the former, which we'll dive into later, but first we'll derive the Lagrange function using a geometric interpretation. Let us start with the case where g(x) = 0.
CASE 1: g(x) = 0
Consider a vector x plotted in D-dimensional space, lying on the constraint surface where g(x) = 0 (the red contour line in the figure). We want to find a point on this constraint surface which maximises f(x). Note that ∇g(x) is always orthogonal to the constraint surface; at the constrained maximum, ∇f(x) must be orthogonal to it as well, otherwise we could move a short distance along the surface and increase f. What we will show here is that the two vectors ∇f(x) and ∇g(x) are anti-parallel,
such that there exists a multiplier λ > 0 that satisfies ∇f(x) + λ∇g(x) = 0.
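In symbols (a sketch of the standard condition, in the notation of Bishop, Appendix E):

```latex
\nabla f(\mathbf{x}) + \lambda\, \nabla g(\mathbf{x}) = 0,
\qquad \lambda > 0
```

The sign matters: λ is strictly positive here because ∇f(x) must point away from the region g(x) > 0; if it pointed into that region, we could increase f by moving to a feasible interior point, contradicting the maximum lying on the boundary. This condition is exactly stationarity (with respect to x) of the Lagrange function:

```latex
L(\mathbf{x}, \lambda) \;=\; f(\mathbf{x}) + \lambda\, g(\mathbf{x}),
\qquad
\nabla_{\mathbf{x}} L(\mathbf{x}, \lambda) = 0 .
```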
CASE 2: g(x) > 0

Now suppose the maximum lies strictly inside the region where g(x) > 0. The constraint is inactive and plays no role: the maximum is simply a stationary point of f, so ∇f(x) = 0, which corresponds to optimising the Lagrange function with λ = 0. Combining the two cases, either the constraint is active (g(x) = 0 and λ > 0) or it is inactive (g(x) > 0 and λ = 0), and in both cases λ g(x) = 0.
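Putting this together, maximising f(x) subject to g(x) ≥ 0 amounts to optimising L(x, λ) = f(x) + λ g(x) subject to the three conditions below (the standard statements, as in Bishop, Appendix E):

```latex
g(\mathbf{x}) \ge 0, \qquad
\lambda \ge 0, \qquad
\lambda\, g(\mathbf{x}) = 0 .
```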
These final three equations are known as the Karush-Kuhn-Tucker (KKT) conditions and will be pivotal for understanding the generalisability of SVMs.
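To make this concrete, here is a small numerical sanity check on a toy problem (the problem, the variable names and the use of scipy are my own illustration, not taken from the article): maximise f(x) = 1 − x₁² − x₂² subject to g(x) = x₁ + x₂ − 1 ≥ 0. The unconstrained maximum at the origin is infeasible, so the constraint should be active at the solution and the KKT conditions should hold with λ > 0.

```python
import numpy as np
from scipy.optimize import minimize

# Toy problem (my own illustration, not from the article):
#   maximise f(x) = 1 - x1^2 - x2^2   subject to   g(x) = x1 + x2 - 1 >= 0
# SLSQP minimises, so we minimise -f(x) instead.

f      = lambda x: 1.0 - x[0]**2 - x[1]**2
grad_f = lambda x: np.array([-2.0 * x[0], -2.0 * x[1]])
g      = lambda x: x[0] + x[1] - 1.0
grad_g = lambda x: np.array([1.0, 1.0])

res = minimize(lambda x: -f(x), x0=[2.0, 2.0], method="SLSQP",
               constraints=[{"type": "ineq", "fun": g}])
x_star = res.x                                  # expected: [0.5, 0.5]

# The constraint is active here (g(x*) ~ 0), so grad f and grad g should be
# anti-parallel: grad_f(x*) = -lambda * grad_g(x*) for some lambda > 0.
lam = -grad_f(x_star)[0] / grad_g(x_star)[0]    # expected: lambda ~ 1

print("x* =", x_star, " g(x*) =", g(x_star), " lambda =", lam)
print("stationarity check:", grad_f(x_star) + lam * grad_g(x_star))  # ~ [0, 0]
```

The expected output is x* ≈ (0.5, 0.5), g(x*) ≈ 0 and λ ≈ 1, with the stationarity residual ≈ 0, i.e. all three KKT conditions hold with the constraint active.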
Reference: Bishop — Pattern Recognition and Machine Learning, Appendix E.