Explain It To Me: Confusion Matrix

Remove any confusion from the confusion matrix and understand all metrics important for classification

Jonathan Kristanto
Bina Nusantara IT Division
5 min read · Dec 21, 2021


Illustration by LinkedIn

You’ve just started doing Machine Learning and you’re building your first classification model. You’ve cleaned your data, trained the model, and finally got your predictions. Great! Now it’s time to measure your model’s performance.

But just staring at lots of numbers and labels is confusing and tiring. So you wonder: how do we make our results visually friendly and easy to understand? If you’ve reached this point, then it’s time for you to learn about the Confusion Matrix.

A confusion matrix plots the predicted and true values of a classification problem and shows the number of correct and incorrect predictions.

When dealing with a confusion matrix, there are some terms that you need to understand: True Negative (TN), False Negative (FN), False Positive (FP), and True Positive (TP). I’ll explain these terms using a hotdog analogy.

A hotdog, it turns out, can come in many different types. Illustration by Author

Note: Most confusion matrices out there have the TP value in the upper-left corner and the TN value in the lower-right corner. Each version is equally valid; it’s just a matter of how you plot your labels. I chose this version since it’s the one used by scikit-learn, a Python library that offers a very convenient function for generating a confusion matrix. If you want to learn more, go to their documentation.
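As a minimal sketch (with made-up hotdog/not-hotdog labels, where 1 = hotdog is the positive class), this is what scikit-learn’s confusion_matrix produces. Rows are true labels, columns are predicted labels, so with binary labels [0, 1] the TN count sits in the upper-left and the TP count in the lower-right.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 1 = hotdog (positive), 0 = not a hotdog (negative)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows = true labels, columns = predicted labels
print(confusion_matrix(y_true, y_pred))
# [[3 1]     TN  FP
#  [1 3]]    FN  TP
```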

True Negative (TN)

You predicted negative ⛔ and it’s true✅

Analogy: You predicted not a hotdog, and True, since it is a pizza.

False Negative (FN)

You predicted negative ⛔ but it’s false❌

Analogy: You predicted not a hotdog, and False, since it is actually a Chicago-style hotdog.

False Positive (FP)

You predicted positive ➕ but it’s false❌

Analogy: You predicted a hotdog, and False, since it is a corgi wearing a hotdog costume.

True Positive (TP)

You predicted positive ➕ and it’s true✅

Analogy: You predicted a hotdog, and True, since it really is a hotdog.
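If you want the four counts themselves, scikit-learn can hand them to you directly. A small sketch, reusing the hypothetical labels from above:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 1 = hotdog (positive), 0 = not a hotdog (negative)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# ravel() flattens the 2x2 matrix in scikit-learn's order: TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 3 1 1 3
```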

Metrics Derived from Confusion Matrix

Accuracy

Accuracy shows how many correct predictions your model generates: out of all classes (positive and negative), how many did we predict correctly?

Accuracy is a good basic metric and works great on a balanced dataset. However, on an unbalanced dataset accuracy can become misleading: if only 1% of samples are positive, a model that always predicts negative is 99% accurate while never finding a single positive.
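In terms of the four counts, accuracy is (TP + TN) / (TP + TN + FP + FN). A quick sketch computing it both by hand and with scikit-learn’s accuracy_score, using the hypothetical counts from earlier:

```python
from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, tn, fp, fn = 3, 3, 1, 1  # counts from the confusion matrix above
print((tp + tn) / (tp + tn + fp + fn))  # 0.75
print(accuracy_score(y_true, y_pred))   # 0.75
```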

Recall

Recall, simply explained, is: out of all the actual positive examples, how many were we able to predict? That’s why this metric is also called sensitivity.

Cases where we want a high-recall model are when false negatives can’t be tolerated. For example, suppose we develop a cancer detection model. We want the model to flag everything that could potentially be cancer and let the doctor check it. If we mislabel a potential cancer as benign, the effect could be fatal.
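In terms of the counts, recall is TP / (TP + FN). A small sketch with scikit-learn’s recall_score, again on the hypothetical labels:

```python
from sklearn.metrics import recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fn = 3, 1  # counts from the confusion matrix above
print(tp / (tp + fn))                # 0.75
print(recall_score(y_true, y_pred))  # 0.75
```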

Precision

Precision measures the correctness of our positive predictions. In simple terms: if we label something as positive, how sure can we be that it is actually positive?

Cases where we want a high-precision model are when false positives can’t be tolerated (and false negatives can be). For example, suppose we develop a spam detection model. It’s okay if some spam (a False Negative) slips into our main inbox, but we wouldn’t want an important email to be sent to the spam folder (a False Positive).
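Precision is TP / (TP + FP). A small sketch with scikit-learn’s precision_score:

```python
from sklearn.metrics import precision_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp = 3, 1  # counts from the confusion matrix above
print(tp / (tp + fp))                   # 0.75
print(precision_score(y_true, y_pred))  # 0.75
```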

F1-Score

In an ideal world, we would like a model with both high precision and high recall. However, this usually isn’t possible, since precision and recall trade off against one another.

Using the cancer detection analogy: if we push for a model with super high recall, we might end up giving cancer treatment to patients who don’t actually have it. Similarly, if we aim for a model with super high precision, we would end up not treating patients who do have cancer, because we’re afraid to make any mistakes.

F1-Score offers a measure that indicates when a model has both good precision and good recall. This way, we can achieve harmony between the two metrics (fun fact: the F1-score is the harmonic mean of precision and recall).
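Written out, F1 = 2 · (precision · recall) / (precision + recall). A small sketch with scikit-learn’s f1_score, plugging in the values computed above:

```python
from sklearn.metrics import f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

precision, recall = 0.75, 0.75  # values from the sketches above
print(2 * precision * recall / (precision + recall))  # 0.75
print(f1_score(y_true, y_pred))                       # 0.75
```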

Generating the metrics with Python

Now that you understand the basics, let’s look at how we can get these metrics using scikit-learn. Scikit-learn has a metrics module that contains functions for all of these metrics, and you just need to call them! Even more powerful, you can get a concise summary of all 4 metrics with just a single function 🤩
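A minimal sketch of that one-call summary, again on the hypothetical hotdog labels (the class names passed to target_names are just illustrative):

```python
from sklearn.metrics import classification_report

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# One function prints precision, recall, F1-score, and accuracy per class
print(classification_report(y_true, y_pred, target_names=["not hotdog", "hotdog"]))
```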

You can refer to the documentation if you want to learn more.

Summary

Through this article you’ve learned about:

  • What a confusion matrix is
  • True Positive (TP), False Positive (FP), False Negative (FN), and True Negative (TN)
  • 4 main metrics of classification: Accuracy, Recall, Precision, and F1-Score
  • Using Python to calculate all these metrics

I hope you’ve gained a basic understanding of the confusion matrix and the important metrics for classification tasks. Remember, never stop learning & stay awesome!
