Confusion Matrix Is No Longer Confusing.

r.aruna devi
Published in Analytics Vidhya
Jul 15, 2020

Before we jump into the topic, let's understand why we need to consider the confusion matrix and its metrics.

Metrics play a major role in evaluating the performance of a model.

Metrics from the Confusion Matrix:

  • Confusion Matrix (Precision, Recall, F score, Accuracy)

Confusion Matrix Is No Longer Confusing.

Consider a dataset with two classes, say Class A and Class B. There are two cases: your dataset can be balanced or imbalanced. A balanced dataset means the records for Class A and Class B are roughly even, say a 50-50 or 55-45 split. An imbalanced dataset has splits like 90-10, 80-20, or 70-30 between Class A and Class B.

The metrics to consider are different for balanced and imbalanced datasets.
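A quick way to see which case you are in is to look at the class proportions of the label column. Below is a minimal sketch, assuming a pandas DataFrame with a hypothetical label column named 'class':

import pandas as pd

# Hypothetical labels; 'class' is a placeholder column name
df_labels = pd.DataFrame({'class': ['A', 'A', 'A', 'B', 'A', 'A', 'B', 'A', 'A', 'A']})

# Roughly 50-50 means balanced; 90-10, 80-20 or 70-30 means imbalanced
print(df_labels['class'].value_counts(normalize=True))  # A: 0.8, B: 0.2 -> imbalanced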

A confusion matrix has rows and columns for the actual and predicted values. The terms used are True Positive, True Negative, False Positive, and False Negative.

Let's split each term into its two parts, for example 'True' and 'Positive'.

Positive : Class A ; Negative : not Class A (i.e. Class B)

True : the prediction is right ; False : the prediction is wrong


True Positive : Positive means the model predicted Class A; True means the prediction is correct. In short: the actual class is A, and the model predicted Class A.

True Negative : Negative means the model predicted Class B; True means the prediction is correct.

In short: the actual class is B, and the model predicted Class B.

False Positive (Type 1 Error) : Positive means the model predicted Class A; False means the prediction is wrong. In short: the actual class is B, but the model predicted Class A.

False Negative (Type 2 Error) : Negative means the model predicted Class B; False means the prediction is wrong. In short: the actual class is A, but the model predicted Class B.
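To make the four terms concrete, here is a small sketch that counts them by hand. The actual and predicted lists are made up for illustration, and Class A is treated as the positive class:

# Made-up labels, with Class A as the positive class
actual    = ['A', 'A', 'B', 'B', 'A', 'B', 'A', 'B']
predicted = ['A', 'B', 'B', 'A', 'A', 'B', 'A', 'B']

tp = sum(a == 'A' and p == 'A' for a, p in zip(actual, predicted))  # actual A, predicted A
tn = sum(a == 'B' and p == 'B' for a, p in zip(actual, predicted))  # actual B, predicted B
fp = sum(a == 'B' and p == 'A' for a, p in zip(actual, predicted))  # Type 1 error
fn = sum(a == 'A' and p == 'B' for a, p in zip(actual, predicted))  # Type 2 error

print(tp, tn, fp, fn)  # 3 3 1 1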

  • In the confusion matrix below, the actual values are the rows and the model's predictions are the columns.
  • The diagonal cells represent the True Positives and True Negatives.
  • FP and FN are the Type 1 and Type 2 errors, which means the actual and predicted values disagree.
  • Train a model with any classification or ensemble algorithm.
  • Predict on the test dataset with model.predict(X_test).
  • Print the confusion matrix.
import pandas as pd

# Actual and predicted labels for 12 records
data = {'y_Actual':    [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
        'y_Predicted': [1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0]}
df = pd.DataFrame(data, columns=['y_Actual', 'y_Predicted'])

# Cross-tabulate actual vs. predicted to build the confusion matrix
conf_matrix = pd.crosstab(df['y_Actual'], df['y_Predicted'],
                          rownames=['Actual'], colnames=['Predicted'], margins=True)
print(conf_matrix)
----------------------------------------------------------
Output:
Predicted  0  1  All
Actual
0          6  1    7
1          2  3    5
All        8  4   12

Print the classification report to see the precision, recall, F1 score, and accuracy.

from sklearn.metrics import classification_report
print("-- Classification Report --\n\n",
      classification_report(df['y_Actual'], df['y_Predicted']))
----------------------------------------------------------
Output:
-- Classification Report --

              precision    recall  f1-score   support

           0       0.75      0.86      0.80         7
           1       0.75      0.60      0.67         5

    accuracy                           0.75        12
   macro avg       0.75      0.73      0.73        12
weighted avg       0.75      0.75      0.74        12

Let's dive into the results:

As mentioned, the diagonal cells are the TP and TN (here class 0 is treated as the positive class).

TP : the actual value is 0 and the model predicted 0, which is the cell with value 6. The model predicted 8 records as 0, and 6 of those 8 are correct.

TN : the actual value is 1 and the model predicted 1, which is the cell with value 3. The model predicted 4 records as 1, and 3 of those 4 are correct.

Let's look at the Type 1 and Type 2 errors:

FP : predicted as 0 but actually 1, which is the cell with value 2.

FN : predicted as 1 but actually 0, which is the cell with value 1.

TP = 6 ; TN = 3 ; FP = 2 ; FN = 1
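If you want these four counts directly in code, sklearn's confusion_matrix can be unpacked with ravel(). Note that scikit-learn treats class 1 as the positive class by default, while the walkthrough above treats class 0 as positive, so the names come out mirrored:

from sklearn.metrics import confusion_matrix

y_actual    = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0]
y_predicted = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0]

# Rows are actual, columns are predicted, ordered by label (0, then 1);
# ravel() flattens the 2x2 matrix row by row
tn, fp, fn, tp = confusion_matrix(y_actual, y_predicted).ravel()
print(tn, fp, fn, tp)  # 6 1 2 3 -> with class 0 as positive (as above): TP=6, FN=1, FP=2, TN=3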

If the confusion matrix is no longer confusing to you, we can proceed. From the confusion matrix, we can derive metrics like precision, recall, F score, and accuracy.

Accuracy is how many predictions are correct (TP + TN) out of all the predicted records (the total number of records).

Accuracy = (6+3)/(6+3+2+1)
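That works out to 9/12 = 0.75, matching the classification report. The same value can be checked with sklearn's accuracy_score:

from sklearn.metrics import accuracy_score

y_actual    = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0]
y_predicted = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0]

# (TP + TN) / total records = (6 + 3) / 12
print(accuracy_score(y_actual, y_predicted))  # 0.75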

If accuracy is more than 90%, be suspicious of your dataset: check whether it is balanced or imbalanced.


Precision : out of the records the model predicted as a class, how many are actually that class. Precision = TP / (TP + FP).

Precision for 0 = 6/(6+2)

Precision for 1 = 3/(3+1)

Recall : out of all the actual records of a class, how many did the model predict correctly. Recall = TP / (TP + FN).

Recall for 0 = 6/(6+1)

Recall for 1 = 3/(3+2)
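These per-class values are what sklearn's precision_score and recall_score return when pos_label is set accordingly; a quick check with the same lists:

from sklearn.metrics import precision_score, recall_score

y_actual    = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0]
y_predicted = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0]

# Precision = TP / (TP + FP), Recall = TP / (TP + FN), per class via pos_label
print(precision_score(y_actual, y_predicted, pos_label=0))  # 6/(6+2) = 0.75
print(recall_score(y_actual, y_predicted, pos_label=0))     # 6/(6+1) ≈ 0.86
print(precision_score(y_actual, y_predicted, pos_label=1))  # 3/(3+1) = 0.75
print(recall_score(y_actual, y_predicted, pos_label=1))     # 3/(3+2) = 0.60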

The F-beta score is used when the relative importance of FP and FN matters.

The β value can range from 0 upward, depending on the relative importance of FP and FN.

  • When FP and FN are equally important, set β = 1; this is the F1 score.
  • When FP is more important than FN, set β = 0.5; this is the F0.5 score.
  • When FN is more important than FP, set β = 2; this is the F2 score.
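For reference, the formula is Fβ = (1 + β²) · Precision · Recall / (β² · Precision + Recall). sklearn implements it as fbeta_score (with f1_score as the β = 1 special case); a quick sketch on class 1 of the same example:

from sklearn.metrics import f1_score, fbeta_score

y_actual    = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0]
y_predicted = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0]

# Precision = 0.75 and Recall = 0.60 for class 1
print(f1_score(y_actual, y_predicted, pos_label=1))               # ≈ 0.67 (FP and FN equally important)
print(fbeta_score(y_actual, y_predicted, beta=0.5, pos_label=1))  # ≈ 0.71 (weights precision more)
print(fbeta_score(y_actual, y_predicted, beta=2, pos_label=1))    # ≈ 0.63 (weights recall more)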

If your dataset has highly imbalanced classes, don't rely on accuracy alone. Say your data has a 95-5 split between Class A and Class B. The model might predict 98% of the records as Class A, and accuracy will still be 97%, yet only 2% of the data is correctly predicted as Class B.

For an imbalanced dataset, look at precision, recall, and especially the F score.
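A tiny made-up example of that trap, using sklearn metrics: with a 95-5 split, a model that predicts the majority class for almost everything still scores very high accuracy while recovering little of the minority class.

from sklearn.metrics import accuracy_score, recall_score

# Made-up imbalanced data: 95 records of Class A (0) and 5 of Class B (1)
y_true = [0] * 95 + [1] * 5
# A lazy model that predicts Class A for 98% of the records
y_pred = [0] * 95 + [0, 0, 0, 1, 1]

print(accuracy_score(y_true, y_pred))             # 0.97 -> looks impressive
print(recall_score(y_true, y_pred, pos_label=1))  # 0.4  -> only 2 of 5 Class B records found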

Click to know more about how to handle an imbalanced dataset.

A simple trick to remember precision and recall:

  • Precision : starts with the letter P, so think Predicted: out of the predicted records, how many are correct. Recall : the opposite of precision: out of all the actual records of Class A, how many are correct.
  • Say you made a list of 10 things to buy from the shop, but you forgot the list when you entered the shop and ended up buying only 7 things.

Precision is: out of the 7 things you bought, how many match your list. Recall is: out of the 10 things on your list, how many did you actually buy.

When to prioritize precision over recall and vice versa?

It depends on the business use case.

Recall should be optimized over precision when there is a high cost associated with a false negative, e.g. a system predicting a tumor is benign when it is in fact malignant.

Precision should be optimized over recall when there is a high cost associated with a false positive, e.g. spam detection, where flagging a legitimate email as spam is costly.

Thanks for Reading. Continue Learning.
