**Simplifying The Confusion Matrix**

Machine learning is a vast field with many concepts to understand, and among them sits a deceptively simple tool for statistical classification problems: the confusion matrix, also known as the error matrix.

This article aims to explain the confusion matrix at a very basic level, in the simplest possible terms.

A confusion matrix is used to summarize, describe, and evaluate the performance of a binary classification model.

The key idea is that it counts the **number of correct and incorrect predictions**, with the counts broken down **by class**.

In doing so, it shows exactly where the classification model gets "confused" when it makes predictions.

A more formal definition of the confusion matrix:

*A confusion matrix is a table that outlines different predictions and test results and contrasts them with real-world values. Confusion matrices are used in statistics, **data mining**, **machine learning** models and other artificial intelligence (**AI**) applications. A confusion matrix can also be called an error matrix.*

*— Margaret Rouse*

Confusion matrices make in-depth analysis of statistical data faster, and they make the results easier to read through clear data visualization.

Below is a simple example of a confusion matrix:

It has 2 rows and 2 columns, holding four kinds of counts:

|                         | **Actual: Positive** | **Actual: Negative** |
| ----------------------- | -------------------- | -------------------- |
| **Predicted: Positive** | True Positive (TP)   | False Positive (FP)  |
| **Predicted: Negative** | False Negative (FN)  | True Negative (TN)   |

Here we have kept the predictions as rows and the actual values as columns.

Diving a little deeper into each of the terms:

• Positive (P) : Actual is positive (for example: is an apple).

• Negative (N): Actual is not positive (for example: is not an apple).

• True Positive (TP): Actual is positive, and it is predicted to be positive.

• False Negative (FN): Actual is positive, but it is predicted to be negative.

• True Negative (TN): Actual is negative, and it is predicted to be negative.

• False Positive (FP): Actual is negative, but it is predicted to be positive.
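The four counts above can be sketched in plain Python. The `actual` and `predicted` lists below are made-up illustrative labels (1 = positive, 0 = negative), not data from the article:

```python
# Counting TP, FN, TN, FP by hand for a binary task.
actual    = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)

print(tp, fn, tn, fp)  # 3 1 3 1
```

Note that the four counts always add up to the total number of examples.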

**Let’s understand it with the example of an HIV test:**

In the context of the test, we can say:

• **True Positive (TP):** we tested positive (will have) and they actually have the disease.

• **True Negative (TN):** we tested negative (will not have) and they actually don’t have the disease.

• **False Positive (FP):** we tested positive (will have) but they actually don’t have the disease (also known as a “Type I error”).

• **False Negative (FN):** we tested negative (will not have) but they actually have the disease (also known as a “Type II error”).

Now let’s take the numerical example from dataschool.io and walk through the confusion matrix and the list of rates that can be calculated from it for a binary classifier.

A few other terms to know:

**High recall, low precision:** indicates that most of the positive examples are correctly recognized (low FN), but there are a lot of false positives (high FP).

**Low recall, high precision:** indicates that we miss a lot of positive examples (high FN), but those we do predict as positive are indeed positive (low FP).
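Precision and recall follow directly from the four counts. As a sketch, with made-up counts:

```python
# Precision and recall from hypothetical counts of a binary classifier.
tp, fp, fn = 30, 10, 5

precision = tp / (tp + fp)   # of all predicted positives, how many were right
recall    = tp / (tp + fn)   # of all actual positives, how many we found

print(round(precision, 3), round(recall, 3))  # 0.75 0.857
```

Here the classifier finds most of the actual positives (high recall) at the cost of some false alarms (lower precision).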

**F-score**

It is the harmonic mean of the two values we already have, i.e. precision and recall.

It considers both the precision and the recall of the procedure to compute the score.

The higher the F-score, the better the predictive power of the classification procedure.

A score of 1 means the classification procedure is perfect; the lowest possible F-score is 0.
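The harmonic mean can be computed directly; the precision and recall values below are illustrative:

```python
# F-score as the harmonic mean of precision and recall.
precision, recall = 0.75, 0.6

f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # 0.667
```

Because it is a harmonic mean, the F-score is pulled toward the smaller of the two values, so a classifier cannot score well by excelling at only one of precision or recall.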

*Python Code implementation Example:*
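Here is a minimal sketch using scikit-learn (assumed to be installed); the label lists are made up for illustration. Note that scikit-learn's `confusion_matrix` puts the actual classes on the rows and the predicted classes on the columns, which is the transpose of the layout used earlier in this article:

```python
from sklearn.metrics import (confusion_matrix, precision_score,
                             recall_score, f1_score)

# Illustrative labels: 1 = positive, 0 = negative.
actual    = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(actual, predicted)
print(cm)

# For a binary task the four counts can be read off with ravel().
tn, fp, fn, tp = cm.ravel()

print(precision_score(actual, predicted))  # 0.75
print(recall_score(actual, predicted))     # 0.75
print(f1_score(actual, predicted))         # 0.75
```

With precision and recall both at 0.75, the F-score (their harmonic mean) is also 0.75, matching the hand-computed counts from earlier.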

Thank you for Reading…