Metrics for Classification: Confusion Matrix

Adnan Khan
Published in Geek Culture
4 min read · Jun 18, 2021


All you should know about the confusion matrix to evaluate your classification model.

True Positive, False Positive, True Negative, False Negative? Tired of looking them up again and again and still not remembering them? No problem, that happens to most beginners. Once you understand the intuition behind them, you'll get comfortable with these terms.

So let's get started with the Confusion Matrix, which is the foundation. The better you understand the Confusion Matrix, the easier it will be to understand all the other metrics. So what actually is a Confusion Matrix?

Confusion Matrix

As its name suggests, it confuses people :D (That was a joke.) I will try to keep it simple instead of quoting the formal definition.

A confusion matrix is basically a table used to evaluate your classification ML model by comparing the actual classes against the predicted ones. It's a square matrix whose dimensions depend entirely on the number of classes.

Let's suppose you're classifying animals as cat vs dog. In that case the matrix will be 2×2, and if you extend the categories to cat vs dog vs monkey, it becomes 3×3. As the number of classes in the target variable increases, the dimensions increase accordingly.
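To see how the size follows the number of classes, here is a small sketch in plain Python. The helper `build_confusion_matrix` and the label lists are made up for illustration (they are not from any library), and the sketch assumes rows are actual classes and columns are predicted classes.

```python
from collections import Counter


def build_confusion_matrix(y_true, y_pred, labels):
    """Hypothetical helper: rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, predicted)] for predicted in labels] for actual in labels]


# Made-up labels for a 2-class problem (cat vs dog).
two_class = build_confusion_matrix(
    ["cat", "dog", "dog"],
    ["cat", "cat", "dog"],
    labels=["cat", "dog"],
)

# Made-up labels for a 3-class problem (cat vs dog vs monkey).
three_class = build_confusion_matrix(
    ["cat", "dog", "monkey", "cat", "dog", "monkey"],
    ["cat", "dog", "cat", "cat", "monkey", "monkey"],
    labels=["cat", "dog", "monkey"],
)

print(len(two_class), "x", len(two_class[0]))      # 2 x 2
print(len(three_class), "x", len(three_class[0]))  # 3 x 3
for row in three_class:
    print(row)
```

Each new class adds one row and one column, so the matrix always stays square.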


So what actually are True Positive, False Positive, True Negative, and False Negative? Let's understand them through the cat vs dog example.

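A True Positive is when the actual class is positive and the model classified it as positive. A True Negative is when the actual class is negative and the model classified it as negative. Similarly, a negative class classified by the model as positive is a False Positive, and a positive class classified by the model as negative is a False Negative. In the cat vs dog example, if "cat" is taken as the positive class, a dog classified as a cat is a False Positive and a cat classified as a dog is a False Negative. I hope you understand these terms by now. If not, go through them again, as they are the foundation for everything that follows.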
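If the four terms still feel slippery, counting them by hand can help. The sketch below assumes "cat" is the positive class and uses made-up actual and predicted labels; none of this data comes from a real model.

```python
# Made-up data; treating "cat" as the positive class is an assumption.
POSITIVE = "cat"
y_true = ["cat", "cat", "cat", "cat", "dog", "dog", "dog", "dog", "dog", "dog"]
y_pred = ["cat", "cat", "cat", "dog", "cat", "cat", "dog", "dog", "dog", "dog"]

tp = fp = tn = fn = 0
for actual, predicted in zip(y_true, y_pred):
    if predicted == POSITIVE and actual == POSITIVE:
        tp += 1  # predicted cat, really a cat
    elif predicted == POSITIVE and actual != POSITIVE:
        fp += 1  # predicted cat, really a dog
    elif predicted != POSITIVE and actual != POSITIVE:
        tn += 1  # predicted dog, really a dog
    else:
        fn += 1  # predicted dog, really a cat

print(f"TP={tp} FP={fp} TN={tn} FN={fn}")  # TP=3 FP=2 TN=4 FN=1
```

The first word (True/False) says whether the model was right, and the second word (Positive/Negative) says what the model predicted.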

In Python, you can create a confusion matrix with Scikit-Learn's confusion_matrix function.

Confusion Matrix implementation in Python
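A minimal sketch using Scikit-Learn's `confusion_matrix`, with made-up labels where 1 is the positive class and 0 the negative class (the label lists are assumptions for illustration). For binary 0/1 labels, Scikit-Learn puts actual classes on the rows and predicted classes on the columns, so flattening the matrix yields TN, FP, FN, TP in that order.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[4 2]
#  [1 3]]

# Rows are actual classes, columns are predicted classes (sorted label order).
tn, fp, fn, tp = cm.ravel()
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
```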


Accuracy

Accuracy is the ratio of correctly classified samples to the total number of samples.

The formula for Accuracy is:

Accuracy = (TP + TN) / (TP + TN + FP + FN)


Using Scikit-Learn, you can compute the accuracy of the model like this:

Accuracy implementation in Python
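A minimal sketch with Scikit-Learn's `accuracy_score`, again on made-up labels (1 = positive, 0 = negative). The label lists are assumptions for illustration only.

```python
from sklearn.metrics import accuracy_score

# Made-up labels: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# 7 of the 10 predictions match the actual labels.
accuracy = accuracy_score(y_true, y_pred)
print(f"Accuracy: {accuracy:.2f}")  # Accuracy: 0.70
```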

But the question is: is accuracy a good metric for evaluating a model? Yes, as long as your classes are reasonably balanced. With imbalanced classes, however, the accuracy score is not a good option, because it leads to a problem known as the High Accuracy Paradox.

High Accuracy Paradox?

Accuracy is misleading when dealing with imbalanced classes.

Consider a medical scenario this time: suppose that out of 100 patients, 5 have the disease and 95 don't. The model's predictions are shown below.

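So what is the accuracy in that case? Accuracy = 96%, which looks excellent but says very little about how well the model finds the 5 patients who actually have the disease. That is where accuracy causes trouble. Don't rely on it when the classes are imbalanced.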
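To make the paradox concrete, here is a sketch with a deliberately useless hypothetical model that predicts "no disease" for every patient. This is not the model from the scenario above, just an assumed extreme case on the same 5 vs 95 split, and it still scores 95% accuracy.

```python
# 5 patients with the disease (1) and 95 without (0), as in the scenario.
y_true = [1] * 5 + [0] * 95

# Hypothetical "model" that always predicts "no disease" (an assumption).
y_pred = [0] * 100

correct = sum(actual == predicted for actual, predicted in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# How many of the sick patients did the model actually find?
detected = sum(1 for actual, predicted in zip(y_true, y_pred)
               if actual == 1 and predicted == 1)
recall = detected / sum(y_true)

print(f"Accuracy: {accuracy:.2f}")  # Accuracy: 0.95
print(f"Recall:   {recall:.2f}")    # Recall:   0.00
```

The model never finds a single sick patient, yet accuracy alone would make it look great.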


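Precision

Precision is the accuracy of the predicted positive outcomes: of all the samples the model labeled as positive, how many are actually positive.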

It completely ignores the samples the model predicted as negative (True Negatives and False Negatives) in the confusion matrix.

The formula for calculating precision is:

Precision = TP / (TP + FP)

A precision value close to 0 indicates a poorly performing model (most of its positive predictions are wrong), and a value close to 1 indicates a better-performing model.

Python implementation of precision

Precision implementation in Python
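A minimal sketch with Scikit-Learn's `precision_score`, on made-up labels (1 = positive, 0 = negative). The label lists are assumptions for illustration only.

```python
from sklearn.metrics import precision_score

# Made-up labels: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# The model predicted positive 5 times; 3 of those were truly positive.
precision = precision_score(y_true, y_pred)
print(f"Precision: {precision:.2f}")  # Precision: 0.60
```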



Recall

Let's try to understand what Recall actually is: the proportion of actual positives that the model correctly identified. It's also referred to as Sensitivity.

Mathematical formula:

Recall = TP / (TP + FN)

Using Scikit-Learn you can find recall:

Recall implementation in Python
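A minimal sketch with Scikit-Learn's `recall_score`, on made-up labels (1 = positive, 0 = negative). The label lists are assumptions for illustration only.

```python
from sklearn.metrics import recall_score

# Made-up labels: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# There are 4 actual positives; the model found 3 of them.
recall = recall_score(y_true, y_pred)
print(f"Recall: {recall:.2f}")  # Recall: 0.75
```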

High recall means the model misses few of the actual positives, while low recall means it misses many of them.


F1-Score

It's simply the harmonic mean of Precision and Recall:

F1 = 2 * (Precision * Recall) / (Precision + Recall)


When both precision and recall are high, the F1-Score is high; when either or both of them are low, the F1-Score is low.
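A quick sketch of that behavior in plain Python. The helper `f1_from` and the precision/recall values are made up for illustration; in practice Scikit-Learn's `f1_score` computes the same quantity directly from the labels.

```python
def f1_from(precision, recall):
    """Hypothetical helper: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are 0
    return 2 * precision * recall / (precision + recall)


# Made-up precision/recall pairs.
both_high = f1_from(0.9, 0.9)
one_low = f1_from(0.9, 0.1)
both_low = f1_from(0.1, 0.1)

print(f"both high: {both_high:.2f}")  # both high: 0.90
print(f"one low:   {one_low:.2f}")    # one low:   0.18
print(f"both low:  {both_low:.2f}")   # both low:  0.10
```

Note how a single low value drags the score down to 0.18, whereas a plain average of 0.9 and 0.1 would be 0.5. That is the point of using the harmonic mean.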


These are a few of the metrics that help in evaluating an ML classification model. Other metrics are available as well, but they aren't covered here.

We have discussed what a confusion matrix really is and how we can use it to derive other metrics such as Accuracy, Precision, Recall, and F1-Score.


