Accuracy and Error Rate from CONFUSION MATRIX?

Himanshu Mehndiratta · Published in Analytics Vidhya · Feb 20, 2021 · 3 min read

So, hello there! Welcome back. Today we will understand what a confusion matrix is and why we need it.

If you have worked on a classification problem before, then you have probably asked yourself how to check whether your model is performing up to the mark.

But how do you decide whether your model is a good one or is just classifying without any sense? This is where the confusion matrix comes to the rescue.

Let’s take a binary classification problem. A binary classification problem is one in which we are trying to classify observations into only two classes.

Let’s understand this with the help of an example.

Example Case: We are trying to predict whether a given word is spam or not. The four possible cases are:

  1. The actual value is Spam and the predicted value is Spam.
  2. The actual value is Spam and the predicted value is Non-Spam.
  3. The actual value is Non-Spam and the predicted value is Non-Spam.
  4. The actual value is Non-Spam and the predicted value is Spam.

So, to record this information in an easily readable format, we arrange it into a 2×2 matrix (2×2 because this is a binary classification problem).

This is the confusion matrix: on the left side we have the actual values, and along the top we have the predicted values (the two axes can also be interchanged). For our spam example it looks like this:

|                 | Predicted Spam | Predicted Non-Spam |
|-----------------|----------------|--------------------|
| Actual Spam     | TP = 45        | FN = 20            |
| Actual Non-Spam | FP = 5         | TN = 30            |

Just look at the matrix once and you will see that it captures how many of our predictions were correct and how many of them were wrong.

In cell (1,1), or (Actual Spam, Predicted Spam), the value of TP is 45. First of all, what do we mean by TP? TP stands for True Positive.

True Positive — You predicted that something is positive and it actually is.

Here, positive refers to Spam, and negative refers to Non-Spam.

In cell (1,2), or (Actual Spam, Predicted Non-Spam), the value of FN is 20. What is FN? FN stands for False Negative.

False Negative — You predicted that something is negative, but your prediction is false (it is actually positive).

In cell (2,1), or (Actual Non-Spam, Predicted Spam), the value of FP is 5. What is FP? FP stands for False Positive.

False Positive — You predicted that something is positive, but your prediction is false (it is actually negative).

In cell (2,2), or (Actual Non-Spam, Predicted Non-Spam), the value of TN is 30. What is TN? TN stands for True Negative.

True Negative — You predicted that something is negative, and your prediction is true (it actually is negative).
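To make this concrete in code, here is a minimal sketch using scikit-learn’s confusion_matrix (assuming scikit-learn is installed; the y_actual and y_predicted lists below are made-up toy labels, not the data from our example):

```python
from sklearn.metrics import confusion_matrix

# 1 = Spam (the positive class), 0 = Non-Spam (the negative class);
# these labels are toy data for illustration only.
y_actual    = [1, 1, 0, 0, 1, 0, 0, 1]
y_predicted = [1, 0, 0, 1, 1, 0, 0, 0]

# scikit-learn also puts actual values on the rows and predicted values
# on the columns, but sorts the classes, so class 0 (Non-Spam) comes first.
cm = confusion_matrix(y_actual, y_predicted)
print(cm)
# [[3 1]
#  [2 2]]

# For a binary problem, the four cells can be unpacked directly:
tn, fp, fn, tp = cm.ravel()
print(f"TP={tp}, FN={fn}, FP={fp}, TN={tn}")  # TP=2, FN=2, FP=1, TN=3
```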

A False Positive is a Type 1 error and a False Negative is a Type 2 error. To understand what type 1 and type 2 errors really are, we need to go into the mathematics, which we will cover in a separate statistics article.

So that is all about the confusion matrix itself. Now we will see why we need it.

There are several metrics that can be calculated from a confusion matrix, e.g.:

  1. Error Rate
  2. Accuracy
  3. Precision
  4. Recall (Sensitivity)
  5. Specificity
  6. F score, etc.

Let’s focus on the first two metrics.

Error Rate — The percentage of our predictions that are wrong: Error Rate = (FP + FN) / (TP + TN + FP + FN).

Accuracy — The percentage of our predictions that are right: Accuracy = (TP + TN) / (TP + TN + FP + FN).

Here, from our confusion matrix: TP = 45, TN = 30, FP = 5, FN = 20.

Accuracy = (45 + 30) / (45 + 30 + 5 + 20) = 75/100 = 0.75

Error Rate = 1 - Accuracy = 1 - 0.75 = 0.25
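If you want to double-check this arithmetic, here is a tiny Python snippet that plugs in the four counts from our matrix:

```python
# The four counts from the confusion matrix above.
TP, TN, FP, FN = 45, 30, 5, 20

total = TP + TN + FP + FN            # 100 predictions in all
accuracy = (TP + TN) / total         # fraction of correct predictions
error_rate = (FP + FN) / total       # fraction of wrong predictions

print(accuracy)      # 0.75
print(error_rate)    # 0.25
print(1 - accuracy)  # 0.25, the same as error_rate
```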

I think the other metrics deserve an article of their own. So, that’s it for this article. I hope you enjoyed reading it :)



So, I am Himanshu, currently an NLP Research Intern @ Engati. I like working on challenging projects, mostly related to data.