Published in

Evaluating classification models simplified.

Photo by Giorgio Trovato on Unsplash


Photo by Mufid Majnun on Unsplash


Positive Class- Example: If a group of patients are being tested for COVID-19 and the doctor says that 70% of those tested the test result came out as positive, what does this mean? It simply means that the patients have been confirmed to have COVID-19.

Confusion Matrix

A confusion matrix is a table that describes the performance of a classification model. It displays results of True Positives, True Negatives, False Positives, and False Negatives. Therefore you can simply say that it is a table that shows how good our model is at predicting examples of various classes.

source — Author
source- Author


Accuracy means you want to calculate the ratio of the samples you predicted correctly to the entire samples you have.

Accuracy: Source- Author


It is stated as, out of all the positive samples (Class Label 1), predicted by a model, how many are actually positive. example: out of all the people we predicted to be sick(both TP and FP), how many are actually sick(TP).

Precision: Source- Author


For Recall, we observe both sides, but with a focus on those that are actually sick on either side(TP and FN). We include observations from both the sick side[1] and not sick side[0] where we have both TP and FN.

source — Author

F1- Score

If precision increases, recall decreases and vice versa. Therefore, a new measure was introduced called F1-score. F1-Score calculates both Precision and Recall and gives a single score that conveys the balance between Precision and Recall. The best score is 1.0, whereas the worst score is 0.0.

What metrics should someone focus on?

The answer to this is highly dependent on your business objective and it’s always good to consult a domain expert to guide you with this. Both Precision and Recall are important depending on the business problem at hand.

Example 1:

A model predicting diabetes in patients as positive (present) or negative (not present), is expected to detect diabetic patients so they can be treated.

  • Will the objective be to reduce False Positives or False Negatives? False Negatives are risky since many people might be left untreated and eventually cause severe health issues.

Example 2:

A model predicting whether an email is a spam or not spam.

  • Will the objective be to reduce False Positives or False Negatives? False Positives are riskier to have since you could have important genuine emails that will go to spam and therefore miss out on important information. False Negatives are more acceptable in this case since a spam email going to the inbox might not cause any loss of important information. In this case, we want to reduce False Positives and hence look at Precision

Example 3:

Detecting whether a transaction is fraudulent or not.

  • Will the objective be to reduce False Positives or False Negatives?FP (You predicted they are fraudulent and are not fraudulent).FN(You predicted they are not fraudulent and are actually fraudulent).

Confusion Matrix

My confusion matrix is computed by comparing what I have predicted(y_pred) and what was the actual value(y_test)

Accuracy, Precision, Recall, and F1-score

We now get the accuracy score, precision recall, and f1-score as shown in the image below.



Data Scientists must think like an artist when finding a solution when creating a piece of code. ⚪️ Artists enjoy working on interesting problems, even if there is no obvious answer ⚪️ 🔵 Follow to join our 28K+ Unique DAILY Readers 🟠

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Joan Ngugi

Big Data & Analytics, Data Science, Machine Learning, Data Engineering |