Machine Learning Train, Test & Model Evaluation Techniques — Easy way!

Adarsh Verma · Published in Deep Data Science · 5 min read · Mar 5, 2020

In this post we will discuss the train-test split, k-fold cross-validation, accuracy, precision, recall and F1-score in a simplified way!

Train and Test Techniques Overview:

Three main techniques are commonly used for training and selecting models for a classification problem.

1. Train-Test Split — Train-test split is a simple and fast technique that divides the dataset into 2 parts: training data and testing data. Common splits are 80%–20%, 75%–25% and 70%–30%, where the first portion is used for training the models and the second portion for testing. Consider an 80%–20% split: 80% of the data will be used for training and 20% for testing. One issue with the train-test split is that models can give different accuracies for different random splits of the dataset into training and testing portions. This technique is available through scikit-learn’s train_test_split method, as shown in the sketch below.
Python Code for train-test split
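A minimal sketch, assuming scikit-learn and using the iris dataset as a stand-in for your own data:

```python
# An 80%-20% train-test split with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # placeholder dataset

# test_size=0.2 gives the 80%-20% split; random_state fixes the shuffle
# so the split is reproducible across runs.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
```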

2. Train-Validation-Test Split — This technique is very similar to the train-test split and is generally used in deep learning, where we do not want the model to see the test data during training. The model is trained on the training data and validated on the validation dataset; the model’s hyperparameters are tuned according to the results on the validation data. Only then do we finally test our model on the test data. A sketch of such a split follows.
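A minimal sketch of a 60%–20%–20% train-validation-test split, again with scikit-learn (the exact ratios are an illustrative choice):

```python
# Two successive calls to train_test_split produce train/validation/test sets.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # placeholder dataset

# First carve off the 20% test set; it stays untouched until the end.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Then split the remaining 80% into training and validation sets;
# 0.25 of the remaining 80% equals 20% of the original data.
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=42
)

print(len(X_train), len(X_val), len(X_test))  # 90 30 30
```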

3. K-fold Cross-Validation — Cross-validation overcomes the main weakness of the train-test split and gives a more generalized accuracy of a model trained on a dataset. In k-fold cross-validation, the dataset is divided into k sets, and there are k iterations of training and testing. In each iteration, k−1 sets are used for training and 1 set is used for testing; at the end, the average of all the results is taken to provide a more general result. For example, with 10-fold cross-validation the dataset is divided into 10 sets, models are trained for 10 iterations, and in each iteration 9 sets (90% of the data) are used for training and 1 set (10%) for testing. One issue with this technique is cost: the training data repeats heavily across iterations, and the model has to be trained k separate times. A sketch follows.
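A sketch of 10-fold cross-validation using scikit-learn’s KFold and cross_val_score; the logistic regression model here is just a placeholder for whatever classifier you are evaluating:

```python
# 10-fold cross-validation: train and score the model 10 times,
# each time holding out a different 10% of the data for testing.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)  # placeholder dataset
model = LogisticRegression(max_iter=1000)

cv = KFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv)

print(scores)         # one accuracy score per fold
print(scores.mean())  # the averaged, more general result
```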

Model Evaluation Techniques: understanding in a very easy way!


Let’s understand it in a way that will make it hard to forget.

Imagine there’s a bucket which has 10 diamonds and 10 stones. You built a robot to find diamonds in the bucket. You asked the robot to get the diamonds out of the bucket and leave the stones. The robot digs its arm into the bucket, tries to separate the diamonds from the stones and comes up with 11 items it thinks are diamonds. When you examine them, you find that of those 11 “diamonds”, 9 are actually diamonds and 2 are stones. Also, of what it left in the bucket thinking they were stones, 8 are actually stones but 1 is a diamond. (Mind the numbers!)

You’re guessing correctly that in this scenario there are 2 classes: diamond, our positive class, and stone, our negative class; the robot is our classification model, and finding diamonds in the bucket is our classification task.

You knew that the robot made some misclassifications, but it was correct about 9 diamonds and 8 stones out of 20 (total). Hence you calculated the accuracy of the robot = (number of correct predictions / total number of predictions):

(9+8) / 20 = 17/20 = 0.85 or 85%

Now, you want to know how precise the robot is. In other words, you want to find out: out of all the predictions for diamonds, how many are correct predictions for diamonds?

Let’s see: the robot predicted 11 diamonds in total, but only 9 of them were actually diamonds; hence the robot’s precision = 9/(9+2) = 9/11 ≈ 0.82

Furthermore, you wanted to find out: out of all the diamonds present, how many were correctly predicted as diamonds?

Let’s see: the robot correctly predicted 9 diamonds, but the total number of diamonds was 10; hence recall = 9/(9+1) = 9/10 = 0.90
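If you want to verify these numbers in code, here is the robot example expressed with scikit-learn’s metric functions (the 0/1 label encoding is my own choice, with 1 = diamond as the positive class):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# 9 true positives, 2 false positives, 1 false negative, 8 true negatives.
y_true = [1] * 9 + [0] * 2 + [1] * 1 + [0] * 8  # what the items actually are
y_pred = [1] * 9 + [1] * 2 + [0] * 1 + [0] * 8  # what the robot claimed

print(accuracy_score(y_true, y_pred))   # 0.85
print(precision_score(y_true, y_pred))  # 0.8181... ~ 0.82
print(recall_score(y_true, y_pred))     # 0.9
```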

More formal definitions:

Accuracy — Ratio of correctly predicted instances to the total instances. Accuracy answers the question: how many instances are correctly classified?

Accuracy = Number of Correct Predictions / Total Number of Predictions

Our aim is to achieve high accuracy.

Precision — Ratio of correct predictions for a class (true positives) to total predictions for that class (the sum of true positives and false positives). Precision answers the question:

Out of all the instances which were predicted as the positive class, how many of them actually belonged to the positive class?

Our aim is to achieve high precision.

Recall — Ratio of correct predictions for a class (true positives) to all the actual instances of that class (the sum of true positives and false negatives). Recall answers the question:

Out of all the instances that actually belong to the positive class, how many are predicted as the positive class?

Our aim is to achieve high recall.

F1-Score — It’s a bit complicated to keep track of both precision and recall, especially when we are dealing with many classes (more than 2); we need a single measure. That’s where the F1-score comes in: it is the harmonic mean of precision and recall.

F1-Score = 2 × (Precision × Recall) / (Precision + Recall)

We can also use a factor beta to emphasize either precision or recall more than the other. More here
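A quick check on the same robot example, using scikit-learn’s f1_score and fbeta_score (beta=2 is an arbitrary choice here; beta > 1 weights recall more, beta < 1 weights precision more, and beta = 1 recovers the plain F1-score):

```python
from sklearn.metrics import f1_score, fbeta_score

y_true = [1] * 9 + [0] * 2 + [1] * 1 + [0] * 8
y_pred = [1] * 9 + [1] * 2 + [0] * 1 + [0] * 8

print(f1_score(y_true, y_pred))             # 2*(0.818*0.9)/(0.818+0.9) ~ 0.857
print(fbeta_score(y_true, y_pred, beta=2))  # ~ 0.882, leans toward recall
```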

We always aim for a high F1-score.

More on machine learning projects here.
