Regression and performance metrics — Accuracy, precision, RMSE and what not!

Ullas Kakanadan · Published in Analytics Vidhya · 7 min read · Mar 13, 2020

In the previous blog, we discussed Linear and Logistic Regression.

Well, well, I know that it’s been ages since my last blog…

Go and revise! (Click here)

On a personal level, I am sorry. I was busy with a few other commitments.

Snap! Back to the business…

How do I check the performance of a model?

You must have heard, for sure, that no model is perfectly correct. The better the model, the better it fits the data. It’s as simple as that. There are many ways to measure that fit. This post will be periodically updated (at least, that’s the plan).

Linear/Multiple Regression

First, let’s discuss Linear/Multiple Linear Regression.

Regression equation: y = b0 + b1·x1 + b2·x2 + E

In the above regression equation, ‘E’ signifies the error that is introduced when the model does not actually fit the data. The objective of a good model is to minimize this error.

Here is how we compare different models:

1. R-Squared

2. Adjusted R-Squared

3. MSE

4. RMSE

R-Squared

R-Squared is the proportion of variance in the dependent variable that is explained by the independent variables. The value ranges from 0 to 1.

A value of 1 means the regression explains the variability perfectly; 0 means it explains none of it.

If R2 (read it as R-Squared) = 0.43 for the above regression equation, it means that 43% of the variability in y is explained by the variables x1 and x2.

But there is a flaw. As the number of predictors increases, R2 can only stay constant or increase (there are statistical proofs of this, not discussed here). This happens even if the added variables have no real relationship with the dependent variable.
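To see this on actual numbers, here is a minimal sketch (assuming NumPy and scikit-learn are available; the values are made up) of R2 computed as 1 minus the ratio of the residual sum of squares to the total sum of squares:

```python
# A minimal sketch of how R-Squared is computed (made-up values).
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])   # observed values
y_pred = np.array([2.8, 5.3, 6.9, 9.4])   # values predicted by the model

ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, r2_score(y_true, y_pred))  # both give the same R-Squared
```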

So now what? Adjust it!

Adjusted R-Squared

Adjusted R-Squared accounts for the number of predictors in the model: it rises only when a new variable genuinely improves the model and falls when a variable adds no explanatory power.

It’s as simple as that: if you have more variables, go for adj-R2.

Ok?
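If a formula helps, here is a small sketch of the standard adjusted R-Squared expression, adj-R2 = 1 - (1 - R2)(n - 1)/(n - p - 1), with made-up numbers showing how piling on junk predictors drags the score down:

```python
# A minimal sketch of adjusted R-Squared (made-up numbers).
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """r2: plain R-Squared, n: number of observations, p: number of predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Adding useless predictors may nudge R2 up a little,
# but the adjustment pushes the score back down.
print(adjusted_r2(r2=0.43, n=100, p=2))   # two useful predictors  -> ~0.418
print(adjusted_r2(r2=0.44, n=100, p=10))  # eight junk predictors added -> ~0.377
```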

RMSE

Mean Squared Error is nothing but the average of the squares of the differences between the observed and predicted values.

Mean Squared Error: MSE = (1/n) · Σ (yᵢ - ŷᵢ)²

n is the number of observations, yᵢ is the observed value for the ith observation, and ŷᵢ is the predicted value for that observation. The difference between them is the error term. You sum up the squares of all the error terms and divide by n.

Why square, buddy? So that positive and negative errors accumulate instead of cancelling each other out.

Ok! But what does this have to do with RMSE? RMSE is simply the square root of the MSE.

Root Mean Squared Error: RMSE = √MSE = √((1/n) · Σ (yᵢ - ŷᵢ)²)

Ohoho… wait!! You first square it and then take the root? You must be crazy! Fine, listen: after squaring, you need to scale the error back down to the original units.

Squared dollars don’t make sense. Dollars do.
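A minimal sketch (assuming NumPy and scikit-learn; values are made up) that computes MSE by hand, checks it against sklearn, and takes the root to get RMSE back in the original units:

```python
# A minimal sketch of MSE and RMSE (made-up values).
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([10.0, 12.0, 15.0, 20.0])  # observed values (say, dollars)
y_pred = np.array([11.0, 11.5, 16.0, 18.5])  # model predictions

mse = np.mean((y_true - y_pred) ** 2)  # average of squared errors (dollars squared)
rmse = np.sqrt(mse)                    # back in the original units (dollars)

print(mse, mean_squared_error(y_true, y_pred))  # same MSE either way
print(rmse)
```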

From all of these, you should now have some idea of how models can be compared in regression.

What about Logistic Regression?

To test the performance of a classification model, a confusion matrix can be used.

Yeah!! Don’t be confused

In simple words, it is a matrix that tabulates predicted events against actual events.

Representation of the same:

Confusion Matrix:

                     Predicted Positive      Predicted Negative
Actual Positive      True Positive (TP)      False Negative (FN)
Actual Negative      False Positive (FP)     True Negative (TN)

The following four terms are the major ones:

True Positives — Any positive event that has been correctly predicted. E.g.: Passing an examination (positive event) and it was predicted correctly.

True Negatives — Any negative event that has been correctly predicted. E.g.: Not passing an examination (negative event) and it was predicted correctly as ‘failed’.

False Positives — Any negative event that has been predicted positive. E.g.: The patient does not have any disease, but the test results said (predicted) the person is infected. This can have the ill effect of the person unnecessarily undergoing unwanted medical treatment.

False Negatives — Any positive event that has been predicted as negative. E.g.: There is a threat to a public place and the intelligence team fails to identify it, i.e. reporting it as not a threat. This can be a serious problem.

The terms explicitly describe the prediction and implicitly describe the actual event.

(Truly/Falsely) predicted the event as (Positive/Negative).

If it is ‘True’, the actual event matches the prediction; if it is ‘False’, the actual event is the opposite of what was predicted.
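Here is a minimal sketch (assuming scikit-learn; the labels are made up) of building a confusion matrix and pulling out the four terms:

```python
# A minimal sketch of a confusion matrix from actual and predicted classes.
from sklearn.metrics import confusion_matrix

y_actual    = [1, 0, 1, 1, 0, 0, 1, 0]  # 1 = positive event, 0 = negative event
y_predicted = [1, 0, 0, 1, 0, 1, 1, 0]  # what the model said

# scikit-learn's convention: rows = actual class, columns = predicted class
cm = confusion_matrix(y_actual, y_predicted)
tn, fp, fn, tp = cm.ravel()
print(cm)
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")  # TP=3, TN=3, FP=1, FN=1
```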

There are other commonly used terms (often asked in interviews):

Type I Error — False Positives.

Type II Error — False Negatives.

Sensitivity = Recall = True Positive Rate = how often the model predicted the positive events correctly. The ratio of correctly predicted positive events to the total actual positive events.

In science, sensitivity is the ability to correctly identify positive cases.

Specificity = True Negative Rate = how often the model predicted the negative events correctly. The ratio of correctly predicted negative events to the total actual negative events.

False Positive Rate = 1 - Specificity = how often the model classified negative events as positive. The ratio of incorrectly predicted positive events to the total actual negative events.

Precision = how often an event the model predicted as positive actually turned out to be positive. The ratio of True Positives to all cases predicted positive.

Accuracy = how often the model predicted correctly overall. The ratio of correct predictions (True Positives plus True Negatives) to all cases.

Are you still confused? Maybe between precision and sensitivity.

Just remember:

Sensitivity: out of all actual positive events, how many did you predict as positive?

Precision: out of all events you predicted as positive, how many actually were positive?

Accuracy: out of all events, how many did you predict correctly?

Specificity: out of all actual negative events, how many did you predict as negative?
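Reusing the made-up counts from the confusion matrix sketch above, the four metrics fall straight out of their definitions:

```python
# A minimal sketch of the four metrics (counts carried over from the sketch above).
tp, tn, fp, fn = 3, 3, 1, 1

sensitivity = tp / (tp + fn)                    # recall / true positive rate
specificity = tn / (tn + fp)                    # true negative rate
precision   = tp / (tp + fp)                    # of predicted positives, how many were right
accuracy    = (tp + tn) / (tp + tn + fp + fn)   # of all cases, how many were right

print(sensitivity, specificity, precision, accuracy)  # all 0.75 for this toy example
```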

Now what?

How do we know which one to go for? Well, if False Positives are the bigger worry, go for precision. If False Negatives are what you care about, recall is a good measure.

Accuracy works fine when the classes are balanced. If the data is imbalanced, Precision and Recall are more reliable.

F1 score would help if you are bothered about both values. F1 — Formula One??

No!! It’s the harmonic mean of precision and recall: F1 = 2 · Precision · Recall / (Precision + Recall).
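Continuing the made-up numbers from above, a tiny sketch of the F1 computation:

```python
# A minimal sketch of F1 as the harmonic mean of precision and recall
# (values carried over from the toy confusion matrix above).
precision, recall = 0.75, 0.75

f1 = 2 * precision * recall / (precision + recall)
print(f1)  # 0.75 here, but F1 drops sharply if either precision or recall is low
```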

Remember, logistic regression outputs the probability of the class. There has to be some deciding factor (or value) that decides, “Hey buddy! You are less than 0.5, you go there.” This deciding factor, 0.5 by default, is the threshold value. The threshold is what lets logistic regression classify. After classification, the four major terms are calculated and the confusion matrix is constructed.
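Here is a minimal sketch (assuming NumPy; the probabilities are made up) of turning predicted probabilities into classes with a threshold:

```python
# A minimal sketch of thresholding logistic-regression probabilities (made-up values).
import numpy as np

probabilities = np.array([0.10, 0.45, 0.55, 0.80, 0.95])  # P(positive class)
threshold = 0.5                                            # the default deciding factor

predicted_class = (probabilities >= threshold).astype(int)
print(predicted_class)  # [0 0 1 1 1] -- lowering the threshold flags more positives
```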

The Receiver Operating Characteristic Curve helps in deciding the best threshold value. Let me say a more familiar name for it: the ROC Curve. Oh yeah!

The ROC curve is basically a graph of the True Positive Rate (y-axis) against the False Positive Rate (x-axis). So, it is a graph between how often the model predicts positive events as positive and how often it predicts negative events as positive.

ROC Curve

The model along the dashed diagonal is no better than random guessing; it cannot discriminate between the classes. The Area Under the Curve (AUC) would be 0.5 in this case. The model along the green line parallel to the x-axis at the top is the best model. Its AUC would be 1: it perfectly separates positive and negative events.

Any model (curve) between these two will have an AUC greater than 0.5 and less than 1. This means the classes overlap to some extent, and hence Type I and Type II errors are introduced.

For different thresholds, Sensitivity and FPR (1 - Specificity) are calculated and the curve is plotted. A low False Positive count means a higher True Negative count. Depending on how many False Positives you are willing to accept, the threshold is selected.

When comparing models, the one with the larger AUC is the better one. In the above figure, the red curve is better than the blue.
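To tie it together, a minimal sketch (assuming scikit-learn; labels and scores are made up) of computing the ROC points and the AUC:

```python
# A minimal sketch of ROC and AUC computation (made-up labels and scores).
from sklearn.metrics import roc_curve, roc_auc_score

y_actual = [0, 0, 1, 1, 0, 1, 0, 1]                     # true classes
y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.55, 0.7]   # predicted probabilities

fpr, tpr, thresholds = roc_curve(y_actual, y_scores)  # one (FPR, TPR) point per threshold
auc = roc_auc_score(y_actual, y_scores)

print(auc)                               # closer to 1 means a better classifier
print(list(zip(thresholds, fpr, tpr)))   # pick the threshold with an acceptable FPR
```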

Now is that all?

Well, in some cases, for example when you have far too many negative cases, you may go for Precision if False Positives are your concern. You must know what your aim is first.

That’s all folks for now!
