Evaluating Classification Models, Confusion Matrix

Barış Cengiz · Published in CodeX · Apr 24, 2022 · 3 min read

In classification problems, accuracy is often the main performance indicator for a model. However, in some cases accuracy alone is not enough to evaluate a model's performance.

When the data you are working with is unbalanced, the accuracy score can be misleading. The confusion matrix is useful in these cases.

Confusion Matrix


The confusion matrix displays four cases: TP, FP, FN, and TN.

True Positive (T.P): Positive (1) values correctly predicted as Positive.

False Positive (F.P): Negative (0) values predicted as Positive (1).

False Negative (F.N): Positive (1) values predicted as Negative (0).

True Negative (T.N): Negative (0) values correctly predicted as Negative.
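For a concrete look at these four counts, here is a minimal sketch assuming scikit-learn is installed; the label arrays below are made up purely for illustration.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical ground-truth labels and model predictions (1 = Positive, 0 = Negative)
y_true = [1, 0, 0, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 1, 0, 0]

# Rows are actual classes, columns are predicted classes.
# With labels=[0, 1], ravel() unpacks the cells as TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(f"TP={tp}, FP={fp}, FN={fn}, TN={tn}")  # TP=3, FP=1, FN=1, TN=5
```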

As can be seen, there are two types of mistakes in classification: False Positives and False Negatives.

False Positive (Type-I error): usually less critical. For example, the model predicts a transaction as fraud, but it is not fraud. A simple apology should suffice in this case.

False Negative (Type-II error): critical. For example, the model predicts a transaction as not fraud, but it actually is fraud. In this case your company or bank can lose money, credibility, etc.

Evaluation Scores

Accuracy: Ratio of correctly predicted values in the data.

(T.P + T.N) / (T.P + T.N + F.P + F.N)

Precision: The ratio of correct predictions among the values predicted as Positive. It measures how good the model is when its prediction is positive.

(T.P) / (T.P + F.P)

Recall: The ratio of correctly predicted values among the actual Positive class. It measures how good the model is at finding the positive class.

(T.P) / (T.P + F.N)

F1-Score: The harmonic mean of Precision and Recall.

2*(Precision*Recall) / (Precision + Recall)
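As a sanity check, all four scores can be computed directly from the confusion-matrix counts; a minimal sketch with hypothetical counts:

```python
# Hypothetical confusion-matrix counts
tp, fp, fn, tn = 40, 10, 5, 945

accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1        = 2 * precision * recall / (precision + recall)

print(accuracy, precision, round(recall, 3), round(f1, 3))
# 0.985 0.8 0.889 0.842
```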

When the data is unbalanced (e.g., 900 Negative, 50 Positive), the accuracy score can be misleading, because the model might be good at predicting Negative values but not so good at predicting Positive values.

It is also recommended to check the precision, recall, and F1-scores of every classification model, regardless of whether the data is balanced or unbalanced.
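In practice, scikit-learn's classification_report prints precision, recall, and F1 for every class at once; a minimal sketch, again with made-up labels:

```python
from sklearn.metrics import classification_report

# Hypothetical labels for illustration
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 1]

# Per-class precision, recall, F1-score and support, plus overall accuracy
print(classification_report(y_true, y_pred))
```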

Example:

A dataset contains 1000 patients' information. The goal of the model is to predict whether a patient has cancer. 990 of them don't have cancer and 10 of them have cancer.

Let’s imagine the output looks like below:

Example confusion matrix: TP = 5, FP = 90, FN = 5, TN = 900

As can be seen above, the model classified 95 patients as having cancer and 905 as not having cancer.

Accuracy: (5 + 900) / (1000) = 0.905

The accuracy score is pretty high. The model looks accurate, but this is an unbalanced dataset. Let's investigate further:

Precision: T.P / (T.P + F.P) = 5 / 95 ≈ 0.053

Recall: T.P / (T.P + F.N) = 5 / 10 = 0.5

It looks like the model was not really good at predicting Positive values. It’s important to check all the metrics.
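To double-check the arithmetic, the same numbers fall out of the counts implied by the example (TP = 5, FP = 90, FN = 5, TN = 900):

```python
# Counts implied by the cancer example above
tp, fp, fn, tn = 5, 90, 5, 900

accuracy  = (tp + tn) / (tp + tn + fp + fn)   # (5 + 900) / 1000 = 0.905
precision = tp / (tp + fp)                    # 5 / 95  ≈ 0.053
recall    = tp / (tp + fn)                    # 5 / 10  = 0.5

print(round(accuracy, 3), round(precision, 3), recall)
# 0.905 0.053 0.5
```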

Thanks for reading.

Barış Cengiz


Avionics System engineer learning and sharing about data science, machine learning and artificial intelligence.