SVM from scratch using Quadratic Programming

Randeep Ahlawat
4 min read · May 24, 2020


Introduction

This article focuses on implementing SVMs for binary classification rather than on the mathematics involved; readers interested in the theory can find the derivations in a separate article on the math of SVMs.

Optimization Problem — Hard Margin

The hard-margin optimization problem (no misclassifications are allowed) is shown below, where w is the weight vector and b is the bias to be learned. This could be handed directly to any quadratic programming solver, but we will first transform the constrained problem into its dual using Lagrange multipliers:
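In standard notation (n training points x_i with labels y_i ∈ {−1, +1}), the primal and its dual are:

$$\min_{w,\,b}\ \frac{1}{2}\lVert w\rVert^2 \quad \text{subject to}\quad y_i\,(w^\top x_i + b) \ge 1,\quad i = 1,\dots,n$$

$$\max_{\lambda}\ \sum_{i=1}^{n}\lambda_i \;-\; \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\lambda_i\lambda_j\,y_i y_j\,x_i^\top x_j \quad \text{subject to}\quad \lambda_i \ge 0,\quad \sum_{i=1}^{n}\lambda_i y_i = 0$$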

Here the λ_i are the Lagrange multipliers. We have thus transformed our optimization problem from one in terms of w into one in terms of λ.

Optimization Problem — Soft Margin

For the soft margin case (a certain amount of misclassification is allowed, controlled by slack variables ξ_i and the regularization parameter C), the optimization problem becomes the primal below. We again transform this problem using Lagrange multipliers and obtain its dual:
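In standard form:

$$\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}\xi_i \quad \text{subject to}\quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0$$

$$\max_{\lambda}\ \sum_{i=1}^{n}\lambda_i \;-\; \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\lambda_i\lambda_j\,y_i y_j\,x_i^\top x_j \quad \text{subject to}\quad 0 \le \lambda_i \le C,\quad \sum_{i=1}^{n}\lambda_i y_i = 0$$

The only change from the hard-margin dual is the upper bound C on each λ_i.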

The soft margin problem can also be solved with gradient descent by minimizing the loss function below:
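This is the regularized hinge loss:

$$L(w, b) = \frac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}\max\bigl(0,\ 1 - y_i\,(w^\top x_i + b)\bigr)$$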

Quadratic Programming using CVXOPT

CVXOPT is an optimization library for Python. We can use its qp solver to solve quadratic programs like our SVM dual. We just need to build the matrices P, q, G, h and A, and initialize a value for b.
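cvxopt.solvers.qp expects problems in the standard form

$$\min_{x}\ \frac{1}{2}x^\top P x + q^\top x \quad \text{subject to}\quad Gx \le h,\quad Ax = b$$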

Our duals are maximization problems, so we negate them; comparing the negated duals to this standard form, we can easily read off the values of these matrices.
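Concretely, with x = λ:

$$P_{ij} = y_i y_j\,x_i^\top x_j,\qquad q = -\mathbf{1},\qquad A = y^\top,\qquad b = 0$$

with G = −I and h = 0 for the hard margin, and G = [−I; I] stacked with h = [0; C·1] for the soft margin.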

Kernels

If solving the soft margin problem with gradient descent is so simple, what is the point of a more involved method like quadratic programming? Kernels hold the answer, specifically the kernel trick. To apply it, we just compute a kernel matrix and replace the inner products x_i·x_j in the dual with its entries, and voilà: we have moved our data into a new, more expressive feature space without ever transforming it explicitly.

Below are examples of linear, polynomial and Gaussian kernels:
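A minimal sketch of these three kernels in NumPy (the hyperparameter names degree, c and sigma are illustrative, not taken from the original code):

```python
import numpy as np

def linear_kernel(x1, x2):
    # K(x1, x2) = x1 . x2
    return np.dot(x1, x2)

def polynomial_kernel(x1, x2, degree=3, c=1.0):
    # K(x1, x2) = (x1 . x2 + c) ** degree
    return (np.dot(x1, x2) + c) ** degree

def gaussian_kernel(x1, x2, sigma=1.0):
    # K(x1, x2) = exp(-||x1 - x2||^2 / (2 * sigma^2))
    return np.exp(-np.linalg.norm(x1 - x2) ** 2 / (2 * sigma ** 2))
```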

Let's get coding!

It is imperative that the target labels be -1 and 1 for the code to work. Below are the libraries required.
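Assuming NumPy and CVXOPT are the only dependencies, the imports look like this:

```python
import numpy as np
import cvxopt
import cvxopt.solvers

# optional: silence the solver's per-iteration output
cvxopt.solvers.options['show_progress'] = False
```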

First, let's look at the hard margin problem:
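A sketch of the hard-margin setup, assuming X is an (n_samples, n_features) NumPy array and y holds -1/+1 labels; the toy data and variable names are mine, not necessarily the author's:

```python
# toy data for illustration: two blobs labelled -1 and +1
rng = np.random.default_rng(0)
X = np.vstack((rng.normal(-2, 1, (20, 2)), rng.normal(2, 1, (20, 2))))
y = np.hstack((-np.ones(20), np.ones(20)))
n_samples = X.shape[0]

# Gram (kernel) matrix; swap linear_kernel for any kernel defined above
K = np.zeros((n_samples, n_samples))
for i in range(n_samples):
    for j in range(n_samples):
        K[i, j] = linear_kernel(X[i], X[j])

P = cvxopt.matrix(np.outer(y, y) * K)               # P_ij = y_i * y_j * K(x_i, x_j)
q = cvxopt.matrix(-np.ones(n_samples))              # q = -1
A = cvxopt.matrix(y.astype(float).reshape(1, -1))   # equality constraint y^T lambda = 0
b = cvxopt.matrix(0.0)

# hard margin: lambda_i >= 0, i.e. -I @ lambda <= 0
G = cvxopt.matrix(-np.eye(n_samples))
h = cvxopt.matrix(np.zeros(n_samples))
```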

For the soft margin:
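Only the inequality constraints change; a sketch with an illustrative C = 1.0:

```python
# soft margin: 0 <= lambda_i <= C, i.e. stack -I (lambda >= 0) and I (lambda <= C)
C = 1.0  # regularization strength, chosen arbitrarily here
G = cvxopt.matrix(np.vstack((-np.eye(n_samples), np.eye(n_samples))))
h = cvxopt.matrix(np.hstack((np.zeros(n_samples), C * np.ones(n_samples))))
```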

Let's now solve the quadratic program and extract the support vectors.
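Continuing with the matrices built above, a sketch of the solve step; multipliers above a small tolerance identify the support vectors:

```python
# solve the dual QP
solution = cvxopt.solvers.qp(P, q, G, h, A, b)
lambdas = np.ravel(solution['x'])

# support vectors have (numerically) non-zero multipliers
sv = lambdas > 1e-5
sv_lambdas = lambdas[sv]
sv_X = X[sv]
sv_y = y[sv]
```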

Since we are using a kernel, we can calculate the bias b, but we cannot recover the weight vector explicitly, because that would require the data in the transformed feature space. Let's calculate the bias as given below:
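One common way to compute it is to average y_k − Σ_i λ_i y_i K(x_i, x_k) over the support vectors; a sketch using the names from the previous snippets:

```python
# bias: average of y_k - sum_i lambda_i * y_i * K(x_i, x_k) over support vectors
sv_K = K[np.ix_(sv, sv)]  # kernel values between support vectors
bias = 0.0
for k in range(len(sv_lambdas)):
    bias += sv_y[k]
    bias -= np.sum(sv_lambdas * sv_y * sv_K[:, k])
bias /= len(sv_lambdas)
```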

Surprisingly, to make predictions we only need the support vectors, the Lagrange multipliers and the bias, not the weights. Here is the code for making predictions:
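A sketch of a predict function built from the support vectors, multipliers and bias above:

```python
def predict(X_test):
    # decision(x) = sum_i lambda_i * y_i * K(x_i, x) + bias, over support vectors
    y_pred = np.zeros(len(X_test))
    for k, x in enumerate(X_test):
        s = 0.0
        for lam_i, y_i, x_i in zip(sv_lambdas, sv_y, sv_X):
            s += lam_i * y_i * linear_kernel(x_i, x)
        y_pred[k] = s
    return np.sign(y_pred + bias)
```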

The code above uses the linear kernel, but it works with any type of kernel.

Conclusion

From the code we can get a few interesting insights. The qp solver of CVXOPT is blazing fast, which makes this SVM fast as well. SVMs only require the support vectors and their corresponding Lagrange multipliers to make predictions, which makes them very memory efficient. I encourage readers to compare sklearn's support vector classifier with this one and to experiment with the code to make it faster.
