# Evaluation Metrics for Classification Machine Learning Models (Intermediate Level)

This topic is for readers who already have some basic knowledge of machine learning.

Before we get into the topic, a quick note: the metrics below are discussed from the perspective of binary classification. Try to get a crystal-clear understanding of these concepts first, so that you can follow my next blogs on **evaluation metrics for multi-label and multi-class classification** (I'll explain what multi-label and multi-class classification mean when we get there).

**Binary classification**:- Binary classification means the labels contain only 2 classes ({0 or 1}, {Male or Female}, {yes or no}, ...).

Before diving into the metrics, you should be able to build your own classification model. For now, I'll build one for you.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# build a toy binary-classification dataset
X, y = make_classification(n_samples=200, n_features=10, random_state=20)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=43)

# fit a decision tree and generate predictions for the test set
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
```

Now that we have a model, let's learn some classification metrics.

To understand the classification metrics, you need to understand the basic thing called **Confusion Matrix**

**Confusion Matrix**:- A confusion matrix is a matrix that summarizes your model's right and wrong predictions on the data.

Let's understand what these "positive" and "negative" terms mean.

Consider the problem of predicting whether a given image is of an **Apple** or an **Orange**. Let's take Apple as the +ve class and Orange as the -ve class (you could also choose the reverse). Suppose we have 40 Apple (+ve) samples and 10 Orange (-ve) samples, and our model predicted 35 of the 40 +ve samples correctly and 5 of the 10 -ve samples correctly.

For a binary problem, the four cells of the confusion matrix are:

- **TP (True Positives)**:- positives that are correctly predicted as positives.
- **FP (False Positives)**:- negatives that are falsely predicted as positives.
- **FN (False Negatives)**:- positives that are falsely predicted as negatives.
- **TN (True Negatives)**:- negatives that are correctly predicted as negatives.

So here, TP (True Positives) = 35, FP (False Positives) = 5, FN (False Negatives) = 5, and TN (True Negatives) = 5. (Make sure you're clear about these numbers first.)

Read again if you didn’t get it.
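To make those numbers concrete, here is a small sketch (the labels are made up to match the Apple/Orange counts above) that recovers TP, FP, FN, and TN using sklearn:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# hypothetical labels matching the Apple/Orange example:
# 40 actual +ve (Apple = 1) samples, then 10 actual -ve (Orange = 0) samples
y_true = np.array([1] * 40 + [0] * 10)
# the model gets 35 of the 40 positives and 5 of the 10 negatives right
y_pred = np.array([1] * 35 + [0] * 5 + [0] * 5 + [1] * 5)

# sklearn lays the binary matrix out as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, fp, fn, tn)  # 35 5 5 5
```

Note the ordering: sklearn puts TN in the top-left cell, so `.ravel()` returns the counts as TN, FP, FN, TP.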

So far, you have seen the confusion matrix for binary classification.

Let's see the case of multi-class classification, with dog breeds as classes: **Greyhound**, **Mastiff**, and a third breed (denoted by s). The confusion matrix now has one row and one column per class:

- Pgg is the number of **Greyhound** samples our model correctly predicted as **Greyhound**.
- Pmg is the number of **Greyhound** samples our model predicted as **Mastiff**.
- Pgm is the number of **Mastiff** samples our model predicted as **Greyhound**.

(Now try to work out the other terms: Psg, Pmm, Psm, Pgs, Pms, Pss.)

We need to build our model in such a way that the diagonal elements (the correct predictions) are as high as possible.
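As a sketch, here is what a 3-class confusion matrix looks like with sklearn; the breed names and predictions below are made up for illustration (rows are actual classes, columns are predicted classes):

```python
from sklearn.metrics import confusion_matrix

# hypothetical predictions for three breeds (names assumed for illustration)
y_true = ["greyhound", "greyhound", "mastiff", "mastiff", "saluki", "saluki"]
y_pred = ["greyhound", "mastiff",   "mastiff", "mastiff", "saluki", "greyhound"]

labels = ["greyhound", "mastiff", "saluki"]
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)
# the diagonal (1, 2, 1) holds the correct predictions
```

The `labels` argument fixes the row/column order so you know which class each row refers to.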

```python
from sklearn.metrics import confusion_matrix

print(confusion_matrix(y_test, y_pred))
```

Now, you can confidently say that “**I know Confusion Matrix😎**”.

Now let’s dive into the metrics😉.

Before that, make sure you are comfortable with fitting a model and making predictions (go through the code snippet above).

1. **accuracy_score**:- This metric is suitable when you have roughly equal numbers of +ve and -ve samples. It tells you what fraction of samples in the test dataset are correctly classified, and it can be calculated directly from the confusion matrix.

accuracy_score = (TP + TN) / (TP + TN + FP + FN)
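Plugging in the Apple/Orange counts from earlier as a quick sanity check:

```python
tp, fp, fn, tn = 35, 5, 5, 5  # counts from the Apple/Orange example

# (35 + 5) correct predictions out of 50 total samples
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)  # 0.8
```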

```python
from sklearn.metrics import accuracy_score

print(accuracy_score(y_test, y_pred))
```

2. **Precision**:- This metric is useful when we have an unequal number of +ve and -ve samples. It tells you, out of all the samples the model predicted as +ve, how many are actually +ve (read that again to let it sink in). It can be calculated from the confusion matrix. The catch is that precision only takes care of false positives: it measures performance with respect to false positives without considering false negatives.

- Let's denote precision by **P** (for later use).

Precision(P) = TP / (TP + FP)
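With the Apple/Orange counts, precision works out as:

```python
tp, fp = 35, 5  # counts from the Apple/Orange example

# of the 40 samples predicted as +ve, 35 really are +ve
precision = tp / (tp + fp)
print(precision)  # 0.875
```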

```python
from sklearn.metrics import precision_score

print(precision_score(y_test, y_pred))
```

3. **Recall**:- This metric tells you, out of all the actual +ve samples, how many the model correctly predicted as +ve (read that again). It can also be calculated from the confusion matrix. The catch with recall is that it only takes care of false negatives: it measures performance with respect to false negatives without considering false positives.

- Let's denote recall by **R** (for later use).

Recall(R) = TP / (TP + FN)
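Again with the Apple/Orange counts:

```python
tp, fn = 35, 5  # counts from the Apple/Orange example

# of the 40 actual +ve samples, 35 were predicted as +ve
recall = tp / (tp + fn)
print(recall)  # 0.875
```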

```python
from sklearn.metrics import recall_score

print(round(recall_score(y_test, y_pred), 2))
```

4. **F1 Score**:- The F1 score is calculated from precision and recall: it is defined as "**the harmonic mean of Precision (P) and Recall (R)**", so it measures performance by taking both false positives and false negatives into account.

F1 Score = 2*P*R / (P + R)
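Using the precision and recall values from the Apple/Orange example:

```python
p, r = 0.875, 0.875  # precision and recall from the Apple/Orange example

# harmonic mean of P and R
f1 = 2 * p * r / (p + r)
print(round(f1, 3))  # 0.875
```

(When P and R are equal, their harmonic mean equals them; the F1 score only drops below the arithmetic mean when P and R differ.)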

```python
from sklearn.metrics import f1_score

print(round(f1_score(y_test, y_pred), 2))
```

**Note**:- Instead of looking at precision and recall separately, we can look at the F1 score, which summarizes both. Whenever we have an imbalanced dataset, we should look at the F1 score instead of accuracy_score.

We'll discuss the ROC_AUC score in another blog, because it requires some background concepts first; we'll build a deep understanding of ROC_AUC there.

# Conclusion:-

- What we have studied so far are the core concepts; they will also help you understand multi-class and multi-label classification, where the same ideas of precision, recall, and so on reappear with slightly different implementations.
- Look at accuracy when you have a balanced dataset; look at the F1 score when the dataset is imbalanced.

___________________________________________________________________

As a reminder, my next 2 blogs will discuss **ROC_AUC** and **evaluation metrics for multi-label and multi-class classification**.

Follow me here:- http://medium.com/@iamvishnu.varapally

Happy Learning!😁