Accuracy, Precision, and Recall in Machine Learning Classification

Asiri Amal Karunanayake
Nov 1 · 3 min read

Old-school accuracy is, most of the time, no longer a good fit for machine learning classification problems. But why? Here is the explanation.


Skewed Data Sets

A skewed data set is one in which some classes have far more records than others, and in practice most data sets are unbalanced in this way. Imagine you have a snake-classification data set. Some snake species have larger populations than others, so the available data will be biased toward the more common species.

A Skewed Data Set

Accuracy

Accuracy counts correct predictions across all labels (targets): Accuracy = correct predictions / total predictions. Imagine the classification has three targets named "A", "B", and "C", skewed with 200, 30, and 20 records. If the model predicts 180, 20, and 10 of them correctly, the overall accuracy is 210/250 = 84%. But that single number gives no picture of how bad the "B" and "C" predictions are: their individual accuracies are only about 67% and 50%. You might think a model with 84% accuracy is well suited to the task, but it is not.
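A minimal sketch of the scenario above, using the article's hypothetical counts, shows how overall accuracy hides the weak per-class results:

```python
# Per-label totals and correct-prediction counts from the "A"/"B"/"C" example.
totals  = {"A": 200, "B": 30, "C": 20}
correct = {"A": 180, "B": 20, "C": 10}

# Overall accuracy pools every label together.
overall = sum(correct.values()) / sum(totals.values())
print(f"overall accuracy: {overall:.0%}")          # 84%

# Per-class accuracy exposes the weak labels.
for label in totals:
    print(f"accuracy for {label}: {correct[label] / totals[label]:.1%}")
```

Running this prints 90.0% for "A" but only 66.7% and 50.0% for "B" and "C", even though the pooled figure looks respectable.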

Confusion Matrix

Don’t be confused: the confusion matrix actually reduces the confusion about how well the model behaves 😊.

From here onwards, each label type is considered a separate part of the problem. Before talking about the confusion matrix, there are some keywords you should understand: "True", "False", "Negative", and "Positive".

Let’s work through it with an example. Take the accuracy scenario above and focus on "A" (as I said earlier, each label type is evaluated individually). When the model predicts "A", that is a "Positive" outcome; when it predicts "not A", that is a "Negative" outcome. If an "A" is recognized as "not A", or vice versa, the outcome is "False"; if the prediction matches reality, it is "True". Combining these gives the four classes below.

Formation of four Classes (FP, TP, TN, FN)
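The four classes can be counted directly by treating one label ("A") against everything else. The `y_true`/`y_pred` sequences here are made-up illustrative values, not data from the article:

```python
# One-vs-rest confusion counts for a single label ("A").
# These label sequences are invented purely for illustration.
y_true = ["A", "A", "B", "C", "A", "B"]
y_pred = ["A", "B", "B", "A", "A", "C"]

label = "A"
pairs = list(zip(y_true, y_pred))
tp = sum(t == label and p == label for t, p in pairs)  # real A, predicted A
fp = sum(t != label and p == label for t, p in pairs)  # not A, predicted A
fn = sum(t == label and p != label for t, p in pairs)  # real A, missed
tn = sum(t != label and p != label for t, p in pairs)  # not A, correctly rejected

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
```

For these six predictions the counts come out as TP=2, FP=1, FN=1, TN=2, and the four numbers always sum to the total number of predictions.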

Precision and Recall

Precision is the accuracy of the positive predictions for a label: Precision = TP / (TP + FP). Recall is the true positive rate of the label: Recall = TP / (TP + FN).
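Both metrics follow straight from the confusion-matrix counts. The `tp`, `fp`, and `fn` values below are illustrative placeholders:

```python
# Precision and recall from confusion-matrix counts (illustrative values).
tp, fp, fn = 2, 1, 1

# Of everything predicted as the label, how much really was the label?
precision = tp / (tp + fp)
# Of all true instances of the label, how many did the model catch?
recall = tp / (tp + fn)

print(f"precision={precision:.3f} recall={recall:.3f}")
```

With these counts both come out to about 0.667; in a skewed data set the two numbers for a rare label can differ sharply even when overall accuracy looks fine.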

Precision and recall trade off against each other, so combined metrics like the F1 score are often calculated as well: F1 = 2 · Precision · Recall / (Precision + Recall). Most of the time we have to figure out where to set that balance. If anyone asks "I want this precision value", you should ask back "At what recall value?". That trade-off is a topic to be discussed later.
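The F1 formula above can be sketched as a small helper. Because it is a harmonic mean, F1 stays low unless both precision and recall are high, which is exactly why it is a useful single summary of the trade-off:

```python
# F1 score: harmonic mean of precision and recall.
def f1(precision: float, recall: float) -> float:
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)

print(f1(0.9, 0.9))  # balanced model: 0.9
print(f1(0.9, 0.1))  # high precision but poor recall: 0.18
```

Note how the second model's F1 (0.18) is dragged far below the arithmetic mean of 0.5, penalizing the weak recall.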


Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com
