Classification and Regression Evaluation Metrics — Part 1

Balamurali M
3 min read · Aug 10, 2018


We need to evaluate our machine learning algorithms with the help of various metrics. There are some commonly used metrics for regression and classification problems, and we will cover some of these evaluation metrics here.

In this Part 1, we will see some of the classification evaluation metrics. (I will post a Part 2 article later, where I will explain the regression metrics.)

The best way to understand any key concept or problem in machine learning is to code it up and analyse the results. I have written the classification example below in Python. We will analyse the results and, along the way, go through the key concepts.

To summarize this code:

  1. Generate a random matrix with 100 rows and 20 columns, where each value is either 0 or 1. The first 19 columns are the explanatory variables and the 20th column is the response variable
  2. Split the matrix into training and testing data sets: the first 80 rows for training and the last 20 rows for testing
  3. Perform the classification with a support vector machine
  4. Use the confusion matrix and accuracy score to evaluate the results. (I have used the metrics module from scikit-learn.)
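
A minimal sketch of these four steps might look like the following. The exact listing can differ from the original; the variable names and the default SVC kernel here are my own assumptions.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, accuracy_score

# 1. Random 100 x 20 matrix of 0s and 1s:
#    first 19 columns are explanatory variables, the 20th is the response
data = np.random.randint(0, 2, size=(100, 20))
X, y = data[:, :19], data[:, 19]

# 2. First 80 rows for training, last 20 rows for testing
X_train, y_train = X[:80], y[:80]
X_test, y_test = X[80:], y[80:]

# 3. Classify with a support vector machine
clf = SVC()
clf.fit(X_train, y_train)
predicted = clf.predict(X_test)

# 4. Evaluate with the confusion matrix and accuracy score
print("Actual Class values:", y_test)
print("Predicted Class values:", predicted)
print("Confusion Matrix:")
print(confusion_matrix(y_test, predicted))
print("Accuracy Score:", accuracy_score(y_test, predicted))
```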

Since the matrix is randomly generated, the values and the results will change on every program run. The data we analyse here comes from one specific run.

I ran the code and got the results below:

  1. Actual Class values: [1 0 0 0 1 1 1 1 1 0 0 0 1 1 0 1 1 0 1 0]
  2. Predicted Class values: [1 0 0 0 0 1 0 0 0 1 0 1 1 0 0 1 1 0 1 1]
  3. Confusion Matrix:

     [[6 3]
      [5 6]]

  4. True Negatives, False Positives, False Negatives, True Positives: 6, 3, 5, 6 respectively
  5. Accuracy Score: 0.6

We will now try to understand some key concepts and interpret the above results.

a) True Negatives are actual negatives that are correctly classified as negative.

In our example, the second, third, fourth, tenth, eleventh, twelfth, fifteenth, eighteenth and twentieth elements are actually zero. Of these, the second, third, fourth, eleventh, fifteenth and eighteenth are correctly predicted as zeros. There are 6 true negatives.

b) False Positives are actual negatives that are incorrectly classified as positive.

In our example, the tenth, twelfth and twentieth elements are actually zero but predicted as ones. There are 3 false positives. A false positive is a Type I error.

c) False Negatives are actual positives that are incorrectly classified as negative.

In our example, the fifth, seventh, eighth, ninth and fourteenth elements are actually one but predicted as zeros. There are 5 false negatives. A false negative is a Type II error.

d) True Positives are actual positives that are correctly classified as positive.

In our example, the first, sixth, thirteenth, sixteenth, seventeenth and nineteenth elements are actually one and predicted as one. There are 6 true positives.
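
In scikit-learn, these four counts can be read straight off the confusion matrix: for a binary problem, `confusion_matrix(...).ravel()` returns them in the order TN, FP, FN, TP. A quick check against the arrays from our run:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

actual    = np.array([1,0,0,0,1,1,1,1,1,0,0,0,1,1,0,1,1,0,1,0])
predicted = np.array([1,0,0,0,0,1,0,0,0,1,0,1,1,0,0,1,1,0,1,1])

tn, fp, fn, tp = confusion_matrix(actual, predicted).ravel()
print(tn, fp, fn, tp)  # -> 6 3 5 6
```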

The Confusion Matrix is a matrix in which each row represents the instances of an actual class and each column represents the instances of a predicted class (or vice versa; scikit-learn puts actual classes in rows and predicted classes in columns).

The result we got earlier was:

[[6 3]
 [5 6]]

We will put these values into the Confusion Matrix as shown below:

                 Predicted 0   Predicted 1
    Actual 0     TN = 6        FP = 3
    Actual 1     FN = 5        TP = 6
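
If you want this labelled layout in code, one option (not part of the original example; pandas is an extra dependency I am assuming here) is to wrap the matrix in a DataFrame:

```python
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix

actual    = np.array([1,0,0,0,1,1,1,1,1,0,0,0,1,1,0,1,1,0,1,0])
predicted = np.array([1,0,0,0,0,1,0,0,0,1,0,1,1,0,0,1,1,0,1,1])

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(actual, predicted)
print(pd.DataFrame(cm, index=["Actual 0", "Actual 1"],
                   columns=["Predicted 0", "Predicted 1"]))
```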

Accuracy is calculated as (TP + TN)/(TP + TN + FP + FN)

In our example, this is (6 + 6)/(6 + 6 + 3 + 5) = 12/20 = 0.6

This is exactly the result we got earlier.

You will also hear the terms sensitivity and specificity very frequently.

Sensitivity, or True Positive Rate: TP/(TP + FN). In our example, 6/(6 + 5) ≈ 0.55. This is the proportion of actual positives that are correctly identified as such.

Specificity, or True Negative Rate: TN/(TN + FP). In our example, 6/(6 + 3) ≈ 0.67. This is the proportion of actual negatives that are correctly identified as such.
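
Both rates, and the accuracy above, follow directly from the four counts, so we can verify the numbers from our run with plain arithmetic:

```python
tn, fp, fn, tp = 6, 3, 5, 6  # counts from our run

accuracy    = (tp + tn) / (tp + tn + fp + fn)  # 12/20 = 0.6
sensitivity = tp / (tp + fn)                   # 6/11 ≈ 0.55
specificity = tn / (tn + fp)                   # 6/9  ≈ 0.67
print(accuracy, round(sensitivity, 2), round(specificity, 2))
```

For reference, scikit-learn's `recall_score(actual, predicted)` computes sensitivity, and passing `pos_label=0` computes specificity.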

I hope this article was helpful to you. Thank you.
