Support Vector Machine (SVM)

Rishabh Jain
5 min read · Jun 17, 2020


The support vector machine is one of the most important algorithms in supervised learning. It is based on finding the optimal line, or more generally hyperplane, that separates the classes in a dataset.

What is Support Vector Machine?

A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples. In two-dimensional space, this hyperplane is a line dividing the plane into two parts, with each class lying on either side.

We tune several parameters in an SVM to optimize the separating line. In essence, we search for the line that best divides the two categories: we construct a parallel hyperplane on each side of it and pick the line whose margin between those hyperplanes is widest, optionally allowing a few points to fall inside the margin (a soft margin). The margin boundaries are determined by the points at minimum distance from the separating line; these points are the support vectors.

SVM also classifies non-linear data easily: using the kernel trick, the data is implicitly expanded into a higher-dimensional space where it becomes linearly separable, without the extra work of computing that transformation, and the resulting boundary maps back to the original space as a non-linear curve.

SVM on Non-Linear Data
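
To make that concrete, here is a minimal, hypothetical sketch (the data and mapping are illustrative, not from the article): 1-D points that cannot be split by a single threshold become linearly separable once we add a new dimension $x^2$.

```python
import numpy as np

# 1-D points that are not linearly separable: class +1 sits between class -1.
x = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0])
y = np.array([-1, -1, 1, 1, 1, -1, -1])

# Map each point to two dimensions: phi(x) = (x, x^2).
phi = np.column_stack([x, x ** 2])

# In the new space the classes are separated by a horizontal line:
# class +1 has x^2 <= 0.25, class -1 has x^2 >= 4.
print(phi[y == 1, 1].max(), phi[y == -1, 1].min())  # 0.25 4.0
```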

Mathematical Implementation

  • Equation of a line (more generally, a hyperplane): $w \cdot x + b = 0$.
  • To classify an unknown sample, this means taking the dot product of our vectors u and w and checking whether it is greater than or equal to some constant, c; equivalently, decide "+" if $w \cdot u + b \ge 0$.
  • For the positive category and negative category we require $w \cdot x_+ + b \ge 1$ and $w \cdot x_- + b \le -1$.
  • For convenience, we can introduce a new variable y such that y = 1 for positive samples and y = -1 for negative samples; both constraints then collapse into $y_i(w \cdot x_i + b) - 1 \ge 0$.
  • Since our initial goal was to establish a margin that is as wide as possible, we must determine a way to express the distance between the boundaries of the margin: $\text{width} = (x_+ - x_-) \cdot \frac{w}{\|w\|}$.
  • Because $y_i(w \cdot x_i + b) = 1$ for samples on the boundaries, some algebraic reduction gives $\text{width} = \frac{2}{\|w\|}$ (equation 8).
  • To maximize equation 8, a function with constraints, we must use Lagrange multipliers; it is convenient to instead minimize $\frac{1}{2}\|w\|^2$, giving $L = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i \left[y_i(w \cdot x_i + b) - 1\right]$ (equation 9).
  • First, we differentiate L with respect to w and find that the vector w is a linear sum of all or some of the samples: $w = \sum_i \alpha_i y_i x_i$ (equation 10).
  • Differentiating L with respect to b gives: $\sum_i \alpha_i y_i = 0$ (equation 11).
  • Plugging our value for w in equation 10 into equation 9, we end up with equation 12: $L = \sum_i \alpha_i - \frac{1}{2}\sum_i \sum_j \alpha_i \alpha_j y_i y_j\,(x_i \cdot x_j)$.
  • Further reduction gives us equation 13, which shows that the optimization depends on the samples only through their pairwise dot products $x_i \cdot x_j$.
  • The decision rule for a new sample u is $\sum_i \alpha_i y_i\,(x_i \cdot u) + b$ (equation 14); if the result of equation 14 is ≥ 0, our sample is in the + class. A small numeric check of this rule follows below.
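
As a sanity check on equations 10 and 14, here is a hypothetical two-point example (the data and multipliers are illustrative, not from the article) in which both training points are support vectors:

```python
import numpy as np

# Toy training set: one support vector per class.
X = np.array([[2.0, 2.0],    # positive support vector
              [0.0, 0.0]])   # negative support vector
y = np.array([1.0, -1.0])
alpha = np.array([0.25, 0.25])  # satisfies equation 11: sum(alpha * y) = 0

# Equation 10: w = sum_i alpha_i * y_i * x_i
w = (alpha * y) @ X                    # -> [0.5, 0.5]
# Pick b so the support vectors lie exactly on the margin: y_i (w.x_i + b) = 1
b = y[0] - w @ X[0]                    # -> -1.0

def decide(u):
    # Equation 14: sum_i alpha_i * y_i * (x_i . u) + b
    return "+" if (alpha * y) @ (X @ u) + b >= 0 else "-"

print(decide(np.array([3.0, 3.0])))    # "+", on the positive side
print(decide(np.array([-1.0, -1.0])))  # "-", on the negative side
```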

Kernel functions can provide us with the dot product of two vectors in a new space without us needing to know the transformation into that space. The simplest kernel function is the linear kernel, shown as equation 15: $K(x_i, x_j) = x_i \cdot x_j$.
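
For instance (a hypothetical check, not from the article), the degree-2 polynomial kernel $(x \cdot z)^2$ returns exactly the dot product of the explicitly mapped vectors:

```python
import numpy as np

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

# Explicit mapping for the degree-2 polynomial kernel in 2-D:
# phi(v) = (v1^2, v2^2, sqrt(2) * v1 * v2)
def phi(v):
    return np.array([v[0] ** 2, v[1] ** 2, np.sqrt(2) * v[0] * v[1]])

print(phi(x) @ phi(z))  # 121.0, dot product computed in the new space
print((x @ z) ** 2)     # 121.0, same value from the kernel alone
```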

Code

Much of the code shown in figure 6 is housekeeping to correctly plot our result, but some areas of interest exist in lines 11 and 26. Line 11 invokes Scikit-learn’s SVM tool, which takes a kernel type and penalty parameter C of the error term as parameters. Line 26 feeds our sample data to the SVM decision function.
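
Since figure 6 is an image, here is a minimal sketch in the same spirit (the data and variable names are assumptions, and the line positions will not match the figure):

```python
import numpy as np
from sklearn import svm

# Toy two-class training data (illustrative only).
X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]])
y = np.array([0, 0, 0, 1, 1, 1])

# Scikit-learn's SVM tool, taking a kernel type and the penalty
# parameter C of the error term as parameters.
clf = svm.SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# Feed sample data to the SVM decision function: the sign gives the
# class, the magnitude the distance from the separating hyperplane.
print(clf.decision_function(X))
print(clf.support_vectors_)  # the points that define the margin
```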

Figure 8 shows the result of running the code with the radial basis function as the kernel. The radial basis function separates the two classes of data by setting the samples that represent the outer bounds of each class as the support vectors.
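
Reproducing that variant only requires changing the kernel argument; assuming the sketch above, something like:

```python
# Radial basis function kernel; gamma controls how tightly the
# decision boundary wraps around each class ("scale" is the default).
clf_rbf = svm.SVC(kernel="rbf", C=1.0, gamma="scale")
clf_rbf.fit(X, y)
print(clf_rbf.support_vectors_)  # samples on the outer bounds of each class
```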

THANK YOU
