Support Vector Machine (SVM) Practical Implementation

Rodrigo Dutcosky · Published in Analytics Vidhya · 9 min read · May 18, 2020

Hello there!

This is the second post in my Machine Learning Practical Implementation series. The objective of this series is to show that you can still benefit from these popular algorithms in your daily analyses even if you haven't mastered all the math behind them.

If you haven't checked out the first post of this series yet, I invite you to do so at the link below.

In this post I'll keep the same pipeline as the last one. I'll first explain a little bit about the algorithm itself, and then we'll do a practical implementation using some data from Kaggle.

Support Vector Machines

SVM is a supervised machine learning algorithm commonly used for classification and numeric prediction. Like many algorithms, SVM iterates multiple times with the purpose of finding the best values for something.

In this case, that something is the position of a line that classifies linearly separable data with the best accuracy. The line's position depends on where the support vectors are located, and that's where the name comes from.

These support vectors are the data points that sit closest to the boundary between the classes, and the algorithm positions the line so that the margin around them is as wide as possible. Here's a picture representing the calculated support vectors, followed by the line that best separates classes A and B:

You'll notice the image uses the concept of a hyperplane instead of a line. That's because a line only applies to two-dimensional data. In real life, you'll probably use SVMs to make predictions based on multiple input features (instead of only two). In that case, the classification of your data is based on hyperplanes.

But let's get back to two-dimensional data to talk about another cool thing among SVM's parameters.

The Kernel Trick

If you're using SVMs on data points that are not linearly separable, you can take advantage of kernel functions to map your data into a higher-dimensional space, where a hyperplane can be found that separates the data correctly.

You can see that in the picture on the left there's no way to place a line that classifies the data points with high accuracy. A kernel function maps the data points into a three-dimensional space, where a hyperplane can now be placed so the data is correctly classified. This is known as the Kernel Trick.

I've seen people refer to Kernel Functions as a Mathematical Miracle, so it's pretty cool to understand at least what we're doing during the algorithm execution.

I'm showing you the Kernel Trick because there are multiple types of kernels that can be used in SVM implementations. They're used as parameters passed to the model creation in Python.
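
Just to illustrate, the kernel is literally a single argument of sklearn's SVC constructor:

```python
from sklearn.svm import SVC

# The kernel is just a parameter of the model constructor.
# Common options are 'linear', 'poly', 'rbf', and 'sigmoid'.
linear_model = SVC(kernel='linear')
rbf_model = SVC(kernel='rbf')
```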

Practical Implementation

Knowing what's going on while training your model is extremely important and will help you pass the best parameters to the functions. But at the end of the day, this represents only a couple lines of code.

What I intend to show you with this practical implementation is that preprocessing the data you're going to feed into your predictions is also crucial, and takes a lot more coding than the model training itself.

To implement SVM, here's what we're going to do:

  • Choose a raw set of data.
  • Set an objective.
  • Explore our data.
  • Prepare the data for the Model.
  • Train a couple different Models.
  • Make new predictions with the highest performance Model.

Let's get to work!

Choose a raw set of data

For this project, I chose a dataset from Kaggle that contains student grades on three different exams, along with features that describe the students themselves.

This dataset can be found in the following link:

Here's a look at the dataset features:

Set an Objective

As you can see, there's no target feature to provide to our model. So we have to create one. The model we're going to train will predict if the student will pass or fail the math test. The minimum score I'm setting to pass is 70.

In the following lines of code, I'm importing this data into a pandas DataFrame, changing the names of the features, and creating a new one called math_passed based on each student's math score.

Note: the only reason I'm changing the feature names is that I don't like working with names that contain spaces. Not having those spaces makes coding easier and is good practice, in my view.
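
Roughly, that step looks like this (a sketch; the CSV file name and the new column names are my own picks):

```python
import pandas as pd

# Load the raw Kaggle file (file name is an assumption).
df = pd.read_csv('StudentsPerformance.csv')

# Rename the features so the column names have no spaces.
df.columns = ['gender', 'race_ethnicity', 'parental_education',
              'lunch', 'test_preparation', 'math_score',
              'reading_score', 'writing_score']

# Target feature: 1 if the student scored at least 70 on the math test, 0 otherwise.
df['math_passed'] = (df['math_score'] >= 70).astype(int)
```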

All set. Now our data is ready to roll and our target feature is created.

Explore our Data

This DataFrame has 1,000 rows and 8 features (not counting our target).

Some common things to do in the preprocessing stage are checking for missing values and outliers, and exploring how each feature's values relate to our target.

I didn't find any NaN values. Outliers aren't really a concern here, since the only numeric values we have are the students' grades. To double-check, I could use the max() and min() functions on each grade feature, but finding any grade outside the 0 to 100 range would be unusual.
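
If you want to reproduce those checks, something like this does it (using the column names from the sketch above):

```python
# Count missing values per feature.
print(df.isna().sum())

# Sanity-check that every grade stays inside the 0 to 100 range.
for col in ['math_score', 'reading_score', 'writing_score']:
    print(col, df[col].min(), df[col].max())
```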

When it comes to exploratory analysis, I always like to visualize my data. I strongly recommend plots to help you at that stage.

To understand how each feature relates to our target, I created the block of code below:
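
A sketch of the idea: pick a feature, store its name in a variable F, and plot how many students passed or failed the math test for each of its values.

```python
import matplotlib.pyplot as plt

# Change F to whichever feature you want to inspect.
F = 'gender'

# Count passed/failed students for each value of F and plot them side by side.
(df.groupby([F, 'math_passed'])
   .size()
   .unstack('math_passed')
   .rename(columns={0: 'failed', 1: 'passed'})
   .plot(kind='bar'))

plt.title(f'math_passed by {F}')
plt.show()
```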

This way I can simply change the value of the variable F to each individual feature in the dataset and get a cleaner understanding of its values. Here are a couple examples of the output plot:

Well, we can see that female students have a higher fail rate on math tests. Group E is the only group where more students pass the test than fail. You get the idea.

Maybe the most important thing to take from this initial analysis is that 409 students passed the test, while 591 didn't.

That's not great for the students, but it's a green flag for our model, since we're working with a reasonably balanced target feature. We won't have to worry about our model learning more from one class than the other!

Prepare the Data for the Model

After some exploratory analysis I decided to do the following:

  1. I'll bring the results from the writing test in as a new boolean feature (built the same way as our target). When analyzing the correlation between these features, I found out that a lot of students who failed the math test also failed the others.
  2. I'll turn all my categorical features into numeric values. We know our model expects numeric data. I'll do that with the LabelEncoder() function from the sklearn package.

Let me wrap everything done so far into the same block of code:
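
A sketch of those new steps, picking up from the df built earlier:

```python
from sklearn.preprocessing import LabelEncoder

# New boolean feature from the writing test, built the same way as the target.
df['writing_passed'] = (df['writing_score'] >= 70).astype(int)

# Encode every categorical (object) column as integers.
for col in df.select_dtypes(include='object').columns:
    df[col] = LabelEncoder().fit_transform(df[col])
```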

Great! Now everything's numeric. Notice that female turned into zero while male turned into one, and so on…

The first batch of preprocessing is ready to be modeled. I say first batch because we can't guarantee this dataset will make our model perform well. We may need to go back to the preprocessing stage and start all over.

Preprocessing is a cycle!

Train the Model

Finally we got to the Model Training stage. As I told you before, this part will take us only a couple lines of code. First let's split our DataFrame into train and test.

No need to do that manually in any way! The train_test_split() function will do the heavy lifting for us. Let's make that split 70/30.
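
A sketch of the split (exactly which columns to drop from the features is my own guess):

```python
from sklearn.model_selection import train_test_split

# Features and target. The raw scores used to build the boolean features are dropped.
X = df.drop(columns=['math_passed', 'math_score', 'reading_score', 'writing_score'])
y = df['math_passed']

# 70/30 split, with a fixed random_state so the results are reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
```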

With our train & test datasets ready, I'll finally train the first version of my model. If you want to see all the parameters of sklearn's SVM function, you can find them in the official documentation right here:

I'll pass the kernel type as linear. Then I'll train my model with the fit() function and make predictions with predict().
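
It boils down to a few lines like these:

```python
from sklearn.svm import SVC
from sklearn.metrics import classification_report, accuracy_score

# First model version: linear kernel.
model = SVC(kernel='linear')
model.fit(X_train, y_train)

predictions = model.predict(X_test)

# The metric functions imported above evaluate this first model.
print(classification_report(y_test, predictions))
print('Accuracy:', accuracy_score(y_test, predictions))
```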

That's it. All that "support vector creation, hyperplane setting, bla, bla…" from the beginning just worked out in a second or two. Notice I imported some extra functions in the code so I can compute the performance of this first model.

Precision means: of the students the model labeled as passing, how many actually passed.

Recall measures how many of the actual positives the model captured by labeling them as positive (the true positives).

The closer Precision and Recall are to each other, the better.

Accuracy can simply be read as: 76 out of 100 predictions were correct.
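
If those definitions feel abstract, here's the same thing written out from a confusion matrix, treating "passed" as the positive class:

```python
from sklearn.metrics import confusion_matrix

# Rows are actual classes, columns are predicted classes (0 = failed, 1 = passed).
tn, fp, fn, tp = confusion_matrix(y_test, predictions).ravel()

precision = tp / (tp + fp)                  # of the predicted "passed", how many really passed
recall = tp / (tp + fn)                     # of the actual "passed", how many we caught
accuracy = (tp + tn) / (tp + tn + fp + fn)  # correct predictions overall
```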

So 76% accurate, huh? Let's train another model using the RBF kernel function and check whether it works better.
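
Same pipeline, just swapping the kernel parameter:

```python
# Second model version: RBF kernel.
rbf_model = SVC(kernel='rbf')
rbf_model.fit(X_train, y_train)

rbf_predictions = rbf_model.predict(X_test)
print('Accuracy:', accuracy_score(y_test, rbf_predictions))
```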

Alright. So we raised our model accuracy by 1.3% just by changing the SVM's kernel parameter.

If you want to learn more about SVM Kernel Functions, here's a good article for that:

Remember that the Preprocessing Stage is a Cycle?

Well, I'm not satisfied with 77.3% accuracy. So I'll go all the way back to my initial dataset and do the following:

  1. Remember the actual writing/reading scores? Last time I kept only a boolean feature representing whether the student passed or failed that test. Now I'll bring both actual scores into play (the ones on the 0 to 100 range).
  2. I'll still encode all my categorical values. But since we now have features with very different ranges of values, I'll scale all my data using the StandardScaler() function.

Very Important Note: the data scaling happens after I split my data into train & test. Pay close attention to that, because the scaler is fitted on whatever data it sees; scaling before the split would leak information from the test set into training!

Below you can check the whole code with comments as it goes:
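
(This is a sketch of it, continuing from the encoded df; the dropped columns and random_state are my choices.)

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# This time the actual reading and writing scores stay in as features.
X = df.drop(columns=['math_passed', 'math_score'])
y = df['math_passed']

# Split first...
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# ...then scale: fit the scaler on the training data only and
# apply the same transformation to the test data.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Third model version: RBF kernel on scaled data.
model = SVC(kernel='rbf')
model.fit(X_train_scaled, y_train)

predictions = model.predict(X_test_scaled)
print('Accuracy:', accuracy_score(y_test, predictions))
```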

Our final Model version had an accuracy of 88.6%.

I've found that SVMs seem to perform better with scaled data… which makes sense: the RBF kernel works on distances between points, so features with larger ranges would otherwise dominate.

Making New Predictions with the Trained Model

Model number three had the best result, so we're going with it to make new predictions.

Let's say we have a new student taking the math test: she's female, from group B, has a bachelor's degree, takes the standard lunch, didn't take the test preparation course, and scored 72 on the reading test and 74 on the writing test.

Now, pay careful attention to what I'm going to say next:

All new input data has to be given to the trained model in the same form as the data it was trained on in the first place!

There's no reason to think we can feed raw data to our model and have it perform the way we tested, when the data used to train it went through a bunch of transformations.

This means… female has to be turned into zero, group B has to become number one, and so on. Besides that, this array of values has to be scaled as well.

To correctly make a new prediction, this is how we would do it:
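
(Again a sketch, reusing the scaler and model from above; the integer codes for the categorical values are illustrative and in practice come from the LabelEncoder fitted earlier.)

```python
import pandas as pd

# New student, encoded the same way the training data was.
new_student = pd.DataFrame([{
    'gender': 0,               # female
    'race_ethnicity': 1,       # group B
    'parental_education': 1,   # bachelor's degree
    'lunch': 1,                # standard
    'test_preparation': 1,     # didn't take the preparation course
    'reading_score': 72,
    'writing_score': 74,
    'writing_passed': 1,
}])[X.columns]  # same columns, same order as the training data

# Scale with the scaler fitted on the training set, then predict.
new_student_scaled = scaler.transform(new_student)
print(model.predict(new_student_scaled))   # 0 -> predicted to fail the math test
```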

The model's prediction is that this student is not going to pass the math test… too bad.

Before you go

I stopped trying to improve performance when the model accuracy was at 88%. Does that mean this is the best SVM model that can be created with this data? NO!

If you keep trying to improve your model and chase better results, you'll simply never stop preprocessing and training.

I didn't get the chance to give this advice in the K-Means post, because it's an unsupervised machine learning method. But it's good to create more than one version of the model, especially to have them as a basis for comparison. Don't waste your whole week/month trying to reach the highest performance on a model that doesn't need high-performance predictions (like students passing/failing math tests).

Hope you enjoyed the reading.

Rodrigo Deboni Dutcosky.
