Confusion Matrix in Machine Learning

Shubhanshi singh
7 min read · Dec 14, 2018

A confusion matrix is a tabular representation that describes the performance of a classification model on a set of test data for which the actual values are known.
It allows easy identification of confusion between classes, i.e., whether one class is commonly mislabeled as another.

1. Confusion Matrix For Binary Classification

Binary classification is the task of classifying the elements of a given set into two classes. Assume the class labels are Class 1 (the positive class) and Class 0 (the negative class). The confusion matrix for 2 classes has dimension 2 × 2 (as shown below).

  • a = Number of points such that Actual class = 0 and Predicted class = 0
  • b = Number of points such that Actual class = 1 and Predicted class = 0
  • c = Number of points such that Actual class = 0 and Predicted class = 1
  • d = Number of points such that Actual class = 1 and Predicted class = 1
  • a + c = Total number of negative points (class 0).
  • b + d = Total number of positive points (class 1).
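From the definitions above, the 2 × 2 matrix looks like this, taking rows as the actual labels and columns as the predicted labels (the convention used throughout this article; the TN/FP/FN/TP names are defined just below):

                Predicted 0    Predicted 1
  Actual 0       a (TN)         c (FP)
  Actual 1       b (FN)         d (TP)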
Confusion Matrix for Binary Classification

Terminologies

  • TN: Actual class is negative, and is predicted to be ‘negative’. The predicted class label is correct, thus ‘True Negative’.
  • TP: Actual class is positive, and is predicted to be ‘positive’. The predicted class label is correct, thus ‘True Positive’.
  • FN: Actual class is positive, but is predicted ‘negative’. The predicted class label is wrong, thus ‘False Negative’.
  • FP: Actual class is negative, but is predicted ‘positive’. The predicted class label is wrong, thus ‘False Positive’.
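As a quick sketch of these four quantities in code (the labels are made up for illustration), scikit-learn's confusion_matrix lays the binary matrix out as [[TN, FP], [FN, TP]]:

```python
from sklearn.metrics import confusion_matrix

# Made-up ground-truth and predicted labels (0 = negative, 1 = positive).
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 0, 1, 1, 0]

# For binary labels [0, 1], scikit-learn returns the matrix as
# [[TN, FP],
#  [FN, TP]], so ravel() unpacks it in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(tn, fp, fn, tp)  # 3 1 1 3
```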

2. Confusion Matrix For Multiclass Classification

Multiclass classification is the problem of classifying instances into one of three or more classes. Suppose we have N classes; then the confusion matrix has dimension N × N.

How to build the Confusion Matrix for Multiclass Classification?

Here is a step-by-step procedure to build the confusion matrix for multiclass classification:

  1. Create an N × N table with one row per actual class label and one column per predicted class label, and initialize every cell to 0.
  2. For each point in the test data, look up its actual class label and its predicted class label, and add 1 to the corresponding cell.

In the first case of step 2, the model predicts class label A and the actual class label is also A. The point is counted in the diagonal cell (A, A), so it becomes a True Positive for class A.

In the second case of step 2, the model predicts class label B but the actual class label is A. The point is counted in the off-diagonal cell (A, B), so it becomes a False Negative for class A.

After filling in the values in this way, we get the confusion matrix for our model.
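A minimal sketch of this procedure in Python (the labels and the helper name build_confusion_matrix are made up for illustration):

```python
import numpy as np

def build_confusion_matrix(y_true, y_pred, classes):
    """Rows = actual class, columns = predicted class."""
    index = {label: i for i, label in enumerate(classes)}
    matrix = np.zeros((len(classes), len(classes)), dtype=int)
    for actual, predicted in zip(y_true, y_pred):
        # Step 2: add 1 to the cell at (actual row, predicted column).
        matrix[index[actual], index[predicted]] += 1
    return matrix

# Made-up labels for three classes.
y_true = ["A", "A", "B", "B", "C", "C", "A"]
y_pred = ["A", "B", "B", "B", "C", "A", "A"]
print(build_confusion_matrix(y_true, y_pred, ["A", "B", "C"]))
```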

Performance Measures computed from the Confusion Matrix

  1. True Positive for each class in multiclass classification: the True Positive for class i is the principal-diagonal cell M[i][i], i.e., the points of class i that were also predicted as class i.
  2. True Negative in multiclass classification: the True Negative for class i is the sum of all cells lying outside both row i and column i.
  3. False Positive in multiclass classification: the False Positive for class i is the sum of column i excluding M[i][i], i.e., the points of other classes that were predicted as class i.
  4. False Negative in multiclass classification: the False Negative for class i is the sum of row i excluding M[i][i], i.e., the points of class i that were predicted as some other class.

True Positives should be high for a model to be good. Therefore, high values on the principal diagonal and low values off the principal diagonal tell us that the model is good.
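These four quantities fall out of a few array operations; here is a sketch with a hypothetical 3 × 3 matrix (rows = actual, columns = predicted):

```python
import numpy as np

# Hypothetical 3 x 3 confusion matrix (rows = actual, columns = predicted).
M = np.array([[9, 1, 0],
              [2, 15, 3],
              [0, 1, 12]])

TP = np.diag(M)                # the principal-diagonal cell of each class
FN = M.sum(axis=1) - TP        # the rest of each class's row
FP = M.sum(axis=0) - TP        # the rest of each class's column
TN = M.sum() - (TP + FN + FP)  # everything outside the row and column
print(TP, FN, FP, TN)
```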

Accuracy

It is one metric for evaluating classification models. It can be calculated as the total number of correct predictions divided by the total number of data points. Formally, accuracy has the following definition:

Accuracy = Number of correct predictions / Total number of predictions

Accuracy can also be calculated in terms of positives and negatives as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Recall:
Recall can be defined as the ratio of the total number of correctly classified positive points to the total number of positive points. A high recall indicates the class is correctly recognized (a small number of FN).

Recall is given by:

Recall = TP / (TP + FN)

Precision:
To get the value of precision, we divide the total number of correctly classified positive points by the total number of predicted positive points. A high precision indicates that a point labeled as positive is indeed positive (a small number of FP).
Precision is given by the relation:

Precision = TP / (TP + FP)

High recall, low precision: This means that most of the positive points are correctly recognized (low FN) but there are a lot of false positives.

Low recall, high precision: This shows that we miss a lot of positive points (high FN) but those we predict as positive are indeed positive (low FP).

F-measure (or F1-score):
Since we have two measures (Precision and Recall), it helps to have a single measurement that represents both of them. We calculate the F-measure, which uses the harmonic mean in place of the arithmetic mean, as it punishes extreme values more:

F1 = 2 × (Precision × Recall) / (Precision + Recall)

The F-measure (or F1-score) will always be nearer to the smaller of Precision and Recall.
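A sketch with made-up binary labels (scikit-learn provides all three metrics directly):

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Made-up binary labels: TP = 3, FP = 1, FN = 2, TN = 2.
y_true = [0, 0, 1, 1, 1, 1, 0, 1]
y_pred = [0, 1, 1, 1, 0, 1, 0, 0]

print(precision_score(y_true, y_pred))  # TP / (TP + FP) = 3/4 = 0.75
print(recall_score(y_true, y_pred))     # TP / (TP + FN) = 3/5 = 0.6
# Harmonic mean of 0.75 and 0.6 is about 0.667 -- nearer to the smaller
# value (0.6) than the arithmetic mean (0.675) would be.
print(f1_score(y_true, y_pred))
```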

Let’s consider an example:

Suppose we have two models. The confusion matrices for the models are:

MODEL 1 CONFUSION MATRIX
MODEL 2 CONFUSION MATRIX

Accuracy for Model 1:

Accuracy for Model 2:

If we compare the accuracy of Model 1 and Model 2, Model 1 has the lower accuracy while Model 2 has the higher accuracy. This is due to the imbalanced dataset: Model 2, being biased towards class A, gives many correct predictions for class A but fails for the others.

Thus, accuracy alone doesn't tell the full story when you're working with a class-imbalanced dataset like this one, where there is a significant disparity between the number of points for each class label.
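To see why, here is a sketch with made-up labels: on a 90/10 imbalanced set, a model that blindly predicts the majority class still scores 90% accuracy, while its macro-averaged F1-score exposes the failure on the minority class:

```python
from sklearn.metrics import accuracy_score, f1_score

# Made-up imbalanced labels: 90 points of class 0, 10 points of class 1.
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100  # a model that always predicts the majority class

print(accuracy_score(y_true, y_pred))                              # 0.9
print(f1_score(y_true, y_pred, average="macro", zero_division=0))  # ~0.47
```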

However, if we use the F1-score, then:

Precision for Model 1:

PRECISION FOR MODEL 1

Recall for Model 1:

RECALL FOR MODEL 1

F1-SCORE for Model 1:

After calculation we get:

We can see that Model 1 is better for multiclass classification on the given imbalanced data, and the F1-score is the metric that quantifies its performance.
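The two models' matrices are not reproduced here, but given any confusion matrix, a macro-averaged F1-score can be computed along these lines (a sketch; it assumes every class has at least one actual and one predicted point, so no division by zero occurs):

```python
import numpy as np

def macro_f1(M):
    """Macro-averaged F1-score from an N x N confusion matrix
    (rows = actual class, columns = predicted class)."""
    TP = np.diag(M).astype(float)
    FP = M.sum(axis=0) - TP
    FN = M.sum(axis=1) - TP
    precision = TP / (TP + FP)
    recall = TP / (TP + FN)
    f1 = 2 * precision * recall / (precision + recall)
    return f1.mean()

# Comparing macro_f1(M1) against macro_f1(M2) reproduces the comparison above.
```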

These are the main terminologies related to the confusion matrix.

Implementation of Confusion Matrix

Let's get started with the coding part. Guidelines to implement the confusion matrix:

  1. Import the libraries to be used: pandas to create the data frame, and seaborn and matplotlib.pyplot for visualization.
  2. Create an array to store the test-data predictions, assuming all predictions are correct for this particular example. Each row of this array represents the actual class labels (input to the model) while each column represents the predicted class labels (predictions by the model).
  3. Create the dataframe containing the values of the array using the pandas library.
  4. Visualize the confusion matrix with the heatmap() function of the seaborn library. It is a great tool to visualize the confusion matrix because it gives darker colors to cells with lower values and brighter colors to cells with higher values.

Basic implementation of confusion matrix without any misclassifications:

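A minimal sketch following the guidelines above (the class labels and counts are made up for illustration):

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Made-up counts with every prediction correct, so only the
# principal diagonal is non-zero.
data = [[30, 0, 0],
        [0, 25, 0],
        [0, 0, 20]]
labels = ["A", "B", "C"]

# Rows = actual class labels, columns = predicted class labels.
df = pd.DataFrame(data, index=labels, columns=labels)

sns.heatmap(df, annot=True, fmt="d", cmap="viridis")
plt.xlabel("Predicted class")
plt.ylabel("Actual class")
plt.show()
```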

Observation: the principal diagonal has the brighter colors as it holds the higher values, and all the zeros take the darkest color.

Basic implementation of Confusion Matrix with some misclassifications:

Let's see a slightly more realistic example of a confusion matrix. Each row of this matrix represents the actual class labels (input to the model) while each column represents the predicted class labels (predictions by the model).
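A sketch using the counts discussed below (class A: 10 points, 9 correct; class B: 20 points, 15 correct):

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Counts from the discussion below: class A has 10 points (9 correct),
# class B has 20 points (15 correct).
data = [[9, 1],
        [5, 15]]
labels = ["A", "B"]

df = pd.DataFrame(data, index=labels, columns=labels)
sns.heatmap(df, annot=True, fmt="d", cmap="viridis")
plt.xlabel("Predicted class")
plt.ylabel("Actual class")
plt.show()
```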

This is a confusion matrix with some confusion (misclassifications). We can see brighter colors off the principal diagonal, which signal the errors made by the model.

Limitation of a confusion matrix without normalization: by looking at the matrix above, we cannot tell which class label the model is good at predicting, because each class may contain a different number of data points. In class A there are 10 data points and the model predicted 9 correctly, while in class B there are 20 data points and the model predicted 15 correctly. So 5 data points were misclassified in class B but only 1 in class A, yet the raw counts alone don't justify saying for which class the model's predictions are better. To remove this limitation we can use a confusion matrix with normalization.

Confusion Matrix with normalization:

As stated above, this limitation of the confusion matrix can be removed by normalization: divide each cell by the total number of actual points in its row, i.e., in its class. A confusion matrix with normalization provides better visualization.
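Continuing the sketch above, pandas can normalize each row in one line:

```python
# Divide each cell by its row total (the number of actual points in
# that class), turning counts into per-class fractions.
df_norm = df.div(df.sum(axis=1), axis=0)

sns.heatmap(df_norm, annot=True, fmt=".2f", cmap="viridis")
plt.xlabel("Predicted class")
plt.ylabel("Actual class")
plt.show()
```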

Observation: the normalized TP for class A is 0.9 while the normalized TP for class B is 0.75. This clearly shows that the model is better at predicting class A than class B.

This kind of normalization is very useful in the case of class imbalance, giving a more visual interpretation of which class is being misclassified.

Thanks for reading 😊
