Confusion Matrix in Machine Learning and Statistics

Shivam Mishra · Analytics Vidhya · Jul 27, 2020

Type 1 and Type 2 Errors, Recall, Precision, and the F-Beta Measure

This is one of the most confusing topics for beginners.

So, let's start learning it in a very easy way.

Table of Contents:

1. What is a confusion matrix in machine learning and statistics?

   a. Type 1 Error

   b. Type 2 Error

2. Precision

3. Recall

4. F-Beta Measure

5. When to use Precision, Recall, and the F-Beta Measure?

1. What is a confusion matrix in machine learning and statistics?

In the field of machine learning, and specifically the problem of statistical classification, a confusion matrix, also known as an error matrix, is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one (in unsupervised learning it is usually called a matching matrix). Each row of the matrix represents the instances in a predicted class, while each column represents the instances in an actual class (or vice versa). The name stems from the fact that it makes it easy to see whether the system is confusing two classes (i.e., commonly mislabeling one as another). ~ Thanks to Wikipedia for the amazing definition.

[Figure: Confusion matrix]

Here TP represents True Positive, FP represents False Positive, FN represents False Negative, and TN represents True Negative. Let's dive in deeper.

True Positive (TP):-

The predicted value matches the actual value, i.e., the actual value was positive and the model predicted a positive value.

True Negative (TN):-

The predicted value matches the actual value, i.e., the actual value was negative and the model predicted a negative value.

False Positive (FP), Type 1 Error:-

The predicted value does not match the actual value, i.e., the actual value was negative but the model predicted a positive value.

False Negative (FN), Type 2 Error:-

The predicted value does not match the actual value, i.e., the actual value was positive but the model predicted a negative value.
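
To make these four counts concrete, here is a minimal sketch in Python. It assumes scikit-learn is available, and the labels are made-up toy data (not from any real dataset); the same toy labels are reused in the later examples.

```python
from sklearn.metrics import confusion_matrix

# Made-up toy labels: 1 = positive, 0 = negative
y_actual    = [1, 1, 1, 1, 0, 0, 0, 0]
y_predicted = [1, 1, 0, 0, 1, 0, 0, 0]

# In scikit-learn, rows are actual classes and columns are predicted
# classes, so for binary labels ravel() returns TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_actual, y_predicted).ravel()

print(f"TP={tp}, TN={tn}, FP={fp} (Type 1), FN={fn} (Type 2)")
# TP=2, TN=3, FP=1 (Type 1), FN=2 (Type 2)
```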

Let's understand this in a very easy way through a statistical hypothesis test.

Let’s frame a Hypothesis:-

H0: She loves you

H1: She doesn’t love you

From this matrix:

TP:- The positive actual value matches the positive predicted value, so we don't reject the null hypothesis. The conclusion is that she loves you. So enjoy.

TN:- The negative actual value matches the negative predicted value, so we reject the null hypothesis. The conclusion is that she doesn't love you.

FP:- Here we reject the null hypothesis when it is true. Hence it creates a Type 1 error.

FN:- Here we don't reject the null hypothesis when it is false. Hence it creates a Type 2 error.

2. Precision:-

Precision is a metric that quantifies the number of correct positive predictions made.

It is calculated as the number of correctly predicted positive examples divided by the total number of examples predicted as positive.

  • Precision = TruePositives / (TruePositives + FalsePositives)

The result is a value between 0.0 for no precision and 1.0 for full or perfect precision.
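
As a quick sketch (reusing the made-up toy labels from the confusion-matrix example above, and assuming scikit-learn), precision can be computed by hand from the formula or with precision_score:

```python
from sklearn.metrics import precision_score

y_actual    = [1, 1, 1, 1, 0, 0, 0, 0]
y_predicted = [1, 1, 0, 0, 1, 0, 0, 0]

# From the confusion matrix above: TP = 2, FP = 1
manual = 2 / (2 + 1)
library = precision_score(y_actual, y_predicted)
print(manual, library)  # 0.666... for both
```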

Example:-

Let’s frame a hypothesis:-

H0: The person has cancer

H1: The person does not have cancer

Here, the error that matters for precision is the Type 1 error (FP): the model predicted that the person has cancer but in reality the person does not, so he/she will go through unnecessary further medical procedures.

3. Recall:-

Recall is a metric that quantifies the number of correct positive predictions made out of all positive predictions that could have been made.

It is calculated as the number of correctly predicted positive examples divided by the total number of actual positive examples.

  • Recall = TruePositives / (TruePositives + FalseNegatives)

The result is a value between 0.0 for no recall and 1.0 for full or perfect recall.
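
Again as a sketch on the same made-up toy labels, recall can be computed by hand or with scikit-learn's recall_score:

```python
from sklearn.metrics import recall_score

y_actual    = [1, 1, 1, 1, 0, 0, 0, 0]
y_predicted = [1, 1, 0, 0, 1, 0, 0, 0]

# From the confusion matrix above: TP = 2, FN = 2
manual = 2 / (2 + 2)
library = recall_score(y_actual, y_predicted)
print(manual, library)  # 0.5 for both
```

Note that recall (0.5) is lower than precision (0.667) on these toy labels because the model missed two actual positives but raised only one false alarm.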

Example:-

Let’s frame a hypothesis:-

H0: The person has cancer

H1: The person does not have cancer

Here, the error that matters for recall is the Type 2 error (FN): the model predicted that the person does not have cancer but in reality the person does, so he/she will not go for the further medical treatment they need.

4. F-Beta Measure:-

Precision and recall measure the two types of errors that could be made for the positive class. The F-measure combines them both into a single score:

  • F-Measure = (2 * Precision * Recall) / (Precision + Recall)

This is the harmonic mean of the two fractions. ~ Thanks to Jason.

The result is a value between 0.0 for the worst F-measure and 1.0 for a perfect F-measure.

The F-beta measure is a generalization of the F-measure, where the balance of precision and recall in the harmonic mean is controlled by a coefficient called beta.

  • Fbeta = ((1 + beta²) * Precision * Recall) / (beta² * Precision + Recall)

The choice of the beta parameter is reflected in the name of the F-beta measure (for example, F0.5, F1, or F2). The value of beta can vary: beta < 1 weights precision more heavily, beta = 1 gives the balanced F1 score, and beta > 1 weights recall more heavily.
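
To see how beta shifts the balance, here is a sketch on the same made-up toy labels (precision = 0.667, recall = 0.5), assuming scikit-learn's f1_score and fbeta_score:

```python
from sklearn.metrics import f1_score, fbeta_score

y_actual    = [1, 1, 1, 1, 0, 0, 0, 0]
y_predicted = [1, 1, 0, 0, 1, 0, 0, 0]

# beta = 1 is the balanced F1 score (plain harmonic mean).
print(f1_score(y_actual, y_predicted))               # 0.571...
# beta = 0.5 weights precision more heavily.
print(fbeta_score(y_actual, y_predicted, beta=0.5))  # 0.625
# beta = 2 weights recall more heavily.
print(fbeta_score(y_actual, y_predicted, beta=2))    # 0.526...
```

Because recall is the weaker of the two here, F2 (which emphasizes recall) comes out lowest and F0.5 (which emphasizes precision) comes out highest.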

5. When to use Precision, Recall, and the F-Beta Measure?

When false positives are the more costly error, we should use precision.

When false negatives are the more costly error, we should use recall.

When precision and recall are both important, we should use the F-beta measure.
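
In practice, this choice is often expressed through a scoring argument. Here is a small illustrative sketch (assuming scikit-learn; the dataset is synthetic, generated only for this example) using cross_val_score:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic binary classification data, just for illustration.
X, y = make_classification(n_samples=200, random_state=0)
model = LogisticRegression(max_iter=1000)

# Pick the scorer that matches the costlier error.
for scoring in ("precision", "recall", "f1"):
    scores = cross_val_score(model, X, y, cv=5, scoring=scoring)
    print(scoring, scores.mean())
```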

Contact me through:-

LinkedIn:- https://www.linkedin.com/in/shivam-mishra-a03815185/

Email:- shivammishra2186@yahoo.com

Twitter:- https://twitter.com/ishivammishra17


Shivam Mishra · Analytics Vidhya

I am a master's student. I like to support our data science community.