Support Vector Machine (SVM) With Python

Abhijeet Pujara
Analytics Vidhya
6 min readApr 27, 2020


This article covers eight parts:

  1. What is the Support Vector Machine?
  2. About the hyperplane
  3. How it works in the linearly separable case
  4. How it works in the linearly non-separable case
  5. Advantages of Support Vector Machine (SVM)
  6. Disadvantages of Support Vector Machine (SVM)
  7. Applications of Support Vector Machine (SVM)
  8. Support Vector Machine with Python (with code)

What is the Support Vector Machine algorithm?

Support Vector Machine (SVM) is a supervised machine learning algorithm that can be employed for both classification and regression purposes.

It uses a technique called the kernel trick to transform your data and then, based on these transformations, finds an optimal boundary between the possible outputs. Simply put, it performs some extremely complex data transformations, then figures out how to separate your data based on the labels or outputs you’ve defined.
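As a toy illustration of such a transformation (a made-up 1-D example, not code from this article): points that no single threshold on a line can separate become separable once each x is mapped to (x, x²).

import numpy as np

# Hypothetical 1-D data: class 0 sits between the two class-1 clusters,
# so no single threshold on x can separate the classes.
x = np.array([-3.0, -2.5, -0.5, 0.0, 0.5, 2.5, 3.0])
y = np.array([1, 1, 0, 0, 0, 1, 1])

# Map each point to 2-D as (x, x^2). In this new space the classes can
# be split by a horizontal line (e.g., x^2 = 2), i.e., a hyperplane.
features = np.column_stack([x, x ** 2])
print(features)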

The learning of the hyperplane in linear SVM is done by transforming the problem using some linear algebra, which is out of the scope of this introduction to SVM.

A powerful insight is that the linear SVM can be rephrased using the inner product of any two given observations, rather than the observations themselves. The inner product between two vectors is the sum of the products of each pair of input values.
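For example (a plain NumPy illustration):

import numpy as np

a = np.array([2.0, 3.0])
b = np.array([4.0, 1.0])

# Sum of the element-wise products: 2*4 + 3*1 = 11
print(np.dot(a, b))    # 11.0
print(np.sum(a * b))   # same result, written out explicitly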

They are especially effective at classification, numeric prediction, and pattern recognition tasks.

HYPERPLANE

A hyperplane in an n-dimensional Euclidean space is a flat, n-1 dimensional subset of that space that divides the space into two disconnected parts.

[Figure: hyperplane in 2D & 3D]

Example

The line is our one-dimensional Euclidean space (i.e., let’s say our dataset lies on a line). Now pick a point on the line. This point divides the line into two parts. The line has one dimension, while the point has zero dimensions, so a point is a hyperplane of the line.

For two dimensions, we saw that the separating line was the hyperplane. Similarly, for three dimensions, a plane with two dimensions divides the 3D space into two parts and thus acts as a hyperplane. Thus, for a space of n dimensions, we have a hyperplane of n-1 dimensions separating it into two parts.

Mathematically, a hyperplane can be written as wᵀx + b = 0. For a 2-dimensional space, this is simply a line: w₁x₁ + w₂x₂ + b = 0.

Linearly separable case

We assume a binary classification problem. The intuition of SVM is to place a hyperplane in the middle of the two classes, so that the distance to the nearest positive or negative example is maximized.

The SVM discriminant function has the form

f(x) = wᵀx + b
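As a quick illustration, here is how this decision function classifies points once a weight vector w and bias b have been learned (the values below are made up):

import numpy as np

w = np.array([1.0, -1.0])   # hypothetical learned weight vector
b = -0.5                    # hypothetical learned bias

def discriminant(x):
    # f(x) = w^T x + b; the sign gives the predicted class
    return np.dot(w, x) + b

print(np.sign(discriminant(np.array([2.0, 0.5]))))   # +1: positive side
print(np.sign(discriminant(np.array([0.0, 1.0]))))   # -1: negative side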

[Figure: linearly separable case]

Linearly non-separable case

In the linearly separable case, SVM looks for the hyperplane that maximizes the margin, under the condition that both classes are classified correctly. But in practice, real datasets are almost never perfectly linearly separable, so the condition of 100% correct classification by a hyperplane can rarely be met.

[Figure: linearly non-separable case]

SVM addresses non-linearly separable cases by introducing two concepts, both illustrated in the sketch after the list below: the soft margin and the kernel trick.

  • Soft margin: still try to find a separating line, but tolerate a few misclassified points
  • Kernel trick: try to find a non-linear decision boundary
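In scikit-learn, both ideas correspond to parameters of the SVC estimator: C controls how many misclassified points the soft margin tolerates, and kernel selects a non-linear boundary. A minimal sketch on toy two-moons data (the dataset choice here is illustrative, not from the original article):

from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Toy, non-linearly separable data (illustrative choice)
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# Soft margin: smaller C tolerates more misclassified points.
# Kernel trick: kernel='rbf' yields a non-linear decision boundary.
model = SVC(C=1.0, kernel='rbf', gamma='scale')
model.fit(X, y)
print(model.score(X, y))   # training accuracy of the fitted model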

Recall that in the linearly separable (or soft-margin) case, the SVM algorithm works by finding a separation boundary that maximizes the margin, which is the distance between the boundary and the points closest to it. The distance here is the usual straight-line distance between the boundary and the closest point(s), called the Euclidean distance in honor of the great geometer of antiquity. The point to note is that this process results in a separation boundary that is a straight line, which, as Figure 5 illustrates, does not always work. In fact, in most cases it won’t.
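That straight-line distance has a simple closed form: for a boundary wᵀx + b = 0, the Euclidean distance from a point x to the boundary is |wᵀx + b| / ‖w‖. A small sketch with made-up values for w and b:

import numpy as np

w = np.array([1.0, -1.0])   # hypothetical boundary weights: w^T x + b = 0
b = -0.5                    # hypothetical bias

def distance_to_boundary(x):
    # Euclidean (straight-line) distance from x to the hyperplane
    return abs(np.dot(w, x) + b) / np.linalg.norm(w)

print(distance_to_boundary(np.array([2.0, 0.5])))   # about 0.707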

Advantages of Support Vector Machine (SVM)

  1. SVM can be used to solve both classification and regression problems: SVM is used for classification, while SVR (Support Vector Regression) is used for regression.
  2. Handles non-linear data efficiently: SVM can efficiently handle non-linear data using the kernel trick.
  3. It uses a subset of the training points in the decision function (the support vectors), so it is also memory efficient.
  4. SVM is well suited for extreme-case binary classification.
  5. A small change in the data does not greatly affect the hyperplane, and hence the SVM, so the model is stable.
  6. SVM is more effective in high-dimensional spaces.
  7. SVMs are a good choice when we have little prior knowledge about the data, and they can also work well with unbalanced data.
  8. It is effective in cases where the number of dimensions is greater than the number of samples.
  9. The hyperplane is affected only by the support vectors, so outliers have less impact.
  10. It is among the best algorithms when the classes are separable.

Disadvantages of Support Vector Machine (SVM)

  1. Choosing an appropriate kernel function (to handle the non-linear data) is not an easy task. It can be tricky and complex, and with a high-dimensional kernel you might generate too many support vectors, which reduces the training speed drastically.
  2. For larger datasets, it requires a large amount of time to process.
  3. The algorithmic complexity and memory requirements of SVM are very high: you have to store all the support vectors in memory, and their number can grow sharply with the size of the training dataset.
  4. Because the support vector classifier works by placing data points above and below the classifying hyperplane, there is no direct probabilistic interpretation of the classification.
  5. It does not perform well when the classes overlap, i.e., when the dataset is noisy.
  6. The SVM model is difficult for humans to understand and interpret, unlike decision trees.

Applications of Support Vector Machine (SVM)

  1. Almost all applications where artificial neural networks (ANNs) are used.
  2. Handwriting Recognition.
  3. Breast Cancer Diagnosis.
  4. Text and hypertext categorization.
  5. Detecting Steganography in digital images.

SVM With Python

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Load the iris dataset into a DataFrame
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target

df[df.target==1].head()
df[df.target==2].head()

df['flower_name'] = df.target.apply(lambda x: iris.target_names[x])
df.head()

df0 = df[df.target==0]   # setosa
df1 = df[df.target==1]   # versicolor

plt.xlabel('Sepal Length')
plt.ylabel('Sepal Width')
plt.scatter(df0['sepal length (cm)'], df0['sepal width (cm)'], color='green', marker='+')
plt.scatter(df1['sepal length (cm)'], df1['sepal width (cm)'], color='blue', marker='.')
plt.show()
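The snippet above only loads and visualizes the data. The classifier code itself does not appear here, so the following is a minimal sketch (assuming the df built above and scikit-learn's SVC) of how the model might be trained and evaluated:

from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Features are the four measurement columns; labels are the species ids
X = df.drop(['target', 'flower_name'], axis=1)
y = df.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = SVC(kernel='rbf', C=1.0)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))   # accuracy on the held-out test set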

Happy Learning !!!

Happy coding :)

And don’t forget to clap clap clap…
