SVM AND RANDOM FOREST: A Case Study

Pragati21
Jan 27, 2020


Libraries such as scikit-learn make it easy to apply scientific methods, algorithms and systems to structured or unstructured data and extract knowledge from it.

Here we discuss two of the most popular algorithms: Support Vector Machines (SVMs) and Random Forests.

SUPPORT VECTOR MACHINES

A Support Vector Machine is a supervised learning model that can be used for both classification and regression, although it is mostly used for classification. It works especially well when the classes are clearly separated (easy to classify). Classification is performed by finding the hyperplane that separates the two classes with the widest possible margin.
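To make the hyperplane idea concrete, here is a minimal sketch using scikit-learn's `SVC` with a linear kernel on six made-up points (toy data, not part of this case study). With a linear kernel, `coef_` and `intercept_` hold the w and b of the hyperplane w·x + b = 0, and 2/||w|| is the width of the margin. The large `C` value is an assumption chosen to approximate a hard margin.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data (assumption): two small, clearly separated clusters.
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
              [5.0, 5.0], [5.5, 6.0], [6.0, 5.5]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel makes the separating hyperplane w.x + b = 0 explicit;
# a large C approximates a hard (no-violation) margin.
clf = SVC(kernel="linear", C=1000.0)
clf.fit(X, y)

w = clf.coef_[0]                  # normal vector of the hyperplane
b = clf.intercept_[0]             # offset of the hyperplane
margin = 2.0 / np.linalg.norm(w)  # width of the gap between the classes

print("w =", w, "b =", b)
print("margin width:", margin)
# Only the points closest to the hyperplane define it.
print("support vectors:\n", clf.support_vectors_)
print("predictions for [3, 3] and [5, 4]:", clf.predict([[3.0, 3.0], [5.0, 4.0]]))
```

Points far from the hyperplane could be removed without changing the model; only the support vectors matter.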

RANDOM FOREST

Random Forest is also one of the most widely used algorithms in machine learning, and it too handles both classification and regression tasks. The "forest" it builds is an ensemble of decision trees, usually trained with the "bagging" (bootstrap aggregating) method: each tree is trained on a random sample of the data drawn with replacement. The general idea of bagging is that combining several learning models improves the overall result. In short, a random forest builds multiple decision trees and merges their predictions to get a more accurate and stable result.
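The bagging idea can be sketched by hand with scikit-learn's `DecisionTreeClassifier`: each tree is fit on a bootstrap sample, and the forest predicts by majority vote. This is an illustrative sketch, not how `RandomForestClassifier` is implemented internally. The number of trees, the seed and the helper name `forest_predict` are hypothetical choices, and scikit-learn's own forest averages the trees' predicted probabilities rather than taking a hard vote.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

n_trees = 25  # hypothetical ensemble size
trees = []
for _ in range(n_trees):
    # Bagging: each tree sees a bootstrap sample drawn with replacement.
    idx = rng.integers(0, len(X), size=len(X))
    # max_features="sqrt" adds a random subset of features at each split,
    # which is what distinguishes a random forest from plain bagged trees.
    tree = DecisionTreeClassifier(max_features="sqrt",
                                  random_state=int(rng.integers(1_000_000)))
    tree.fit(X[idx], y[idx])
    trees.append(tree)


def forest_predict(samples):
    """Hypothetical helper: majority vote over all trees."""
    votes = np.array([t.predict(samples) for t in trees])  # (n_trees, n_samples)
    return np.array([np.bincount(col, minlength=3).argmax() for col in votes.T])


print("training accuracy of the hand-built forest:", (forest_predict(X) == y).mean())
```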

In this article we compare SVMs and random forests on the Iris dataset, which contains measurements of 150 iris flowers from three species. The task is to predict the species of a flower from four features: sepal length, sepal width, petal length and petal width.
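The dataset ships with scikit-learn, so a quick way to inspect its features and species is:

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.feature_names)         # the four measurements, in centimetres
print(iris.target_names)          # the three species to predict
print(iris.data.shape)            # (150, 4): 150 flowers x 4 features
print(np.bincount(iris.target))   # [50 50 50]: flowers per species
```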

[Figure: the Iris dataset]

After training both models, I found that the Random Forest classifier reaches 96% accuracy, while the SVM reaches 97% on the same dataset.

[Figure: model accuracy of the Random Forest classifier]
[Figure: model accuracy of the SVM classifier]
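The article does not show its training code, so the following is only a sketch of one way such a comparison could be run with scikit-learn. The 70/30 stratified split, `random_state=42`, feature standardization for the SVM and the default hyperparameters are all assumptions; the exact accuracies depend on them and need not reproduce the 96% and 97% reported above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
# Assumed split: 30% held out, stratified by species, fixed seed for repeatability.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# SVMs are sensitive to feature scale, so standardize inside a pipeline.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
# Tree-based models such as random forests do not need scaling.
rf = RandomForestClassifier(n_estimators=100, random_state=42)

results = {}
for name, model in [("SVM", svm), ("Random Forest", rf)]:
    model.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name} accuracy: {results[name]:.2%}")
```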

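A likely reason is that the classes in this dataset are well separated and easy to classify, which suits an SVM: it only has to find a good separating hyperplane, so it trains quickly and gives slightly better results. The random forest also performs well but does not quite match the SVM on this particular dataset. Note that the gap is only one percentage point, and with only 150 flowers it may not hold for a different train/test split.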
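One way to check whether a one-point gap is meaningful is to score both models over several folds instead of a single split. A minimal sketch with 10-fold stratified cross-validation (the fold count, seeds and default hyperparameters are assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
# Shuffled, stratified folds so each fold keeps the 1:1:1 species ratio.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

models = {
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
scores = {name: cross_val_score(m, X, y, cv=cv) for name, m in models.items()}
for name, s in scores.items():
    # The spread across folds shows how much a single split can move the result.
    print(f"{name}: mean {s.mean():.3f}, std {s.std():.3f}")
```

If the fold-to-fold spread is larger than the gap between the two means, the difference between the models on this dataset should not be over-interpreted.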

Here I have plotted a scatter matrix of the data points:
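A scatter matrix like this can be produced with pandas' `scatter_matrix`; the sketch below colours points by species. The figure size, the histogram diagonal and the output filename `iris_scatter_matrix.png` are assumptions, since the original plotting code isn't shown.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.data  # DataFrame with the four measurement columns

# One panel per pair of features, histograms on the diagonal,
# points coloured by species (0, 1, 2).
axes = scatter_matrix(df, c=iris.target, figsize=(8, 8), diagonal="hist", alpha=0.8)
plt.savefig("iris_scatter_matrix.png")
```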

Here, the black line represents the hyperplane.
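The article doesn't say which features or classes the hyperplane plot uses. As a hedged sketch, assuming petal length and petal width and only the setosa and versicolor species (which a straight line can separate), a linear SVM's hyperplane can be drawn as a black line like this. The output filename `iris_svm_hyperplane.png` is also an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris()
# Assumption: petal length/width only, setosa (0) vs versicolor (1).
mask = iris.target < 2
X = iris.data[mask][:, 2:4]
y = iris.target[mask]

# Large C approximates a hard margin on this separable pair of classes.
clf = SVC(kernel="linear", C=1000.0).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

# Hyperplane w0*x + w1*y + b = 0, solved for y so it can be drawn as a line.
xs = np.linspace(X[:, 0].min() - 0.5, X[:, 0].max() + 0.5, 100)
ys = -(w[0] * xs + b) / w[1]

plt.scatter(X[:, 0], X[:, 1], c=y, cmap="coolwarm", edgecolors="k")
plt.plot(xs, ys, color="black", label="hyperplane")
plt.xlabel("petal length (cm)")
plt.ylabel("petal width (cm)")
plt.legend()
plt.savefig("iris_svm_hyperplane.png")
```

With all three species or a non-linear kernel, the decision boundary is no longer a single straight line, so this two-class, linear setup is a simplification.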

The choice of algorithm depends on the desired outcome. Both models work well in their place, but an algorithm's performance depends heavily on the quality of the data.
