R Applications — Part 5: Quantile Regression

Burak Dilber
Published in Data Science Earth · Feb 23, 2021 · 6 min read

The quantile regression method was proposed by Koenker and Bassett in 1978. Because linear regression models are not robust to extreme values, quantile regression models are preferred for data sets that contain them. In addition, since the quantile regression model is resistant to outliers, it does not require the assumptions of linear regression. You can find detailed information about linear regression in the links below.

So how does quantile regression work? As is well known, linear regression is also called least squares regression and uses the mean as its measure of location when estimating. Quantile regression is likewise referred to as least absolute value regression, and the measure of location it uses is the median. The model for quantile regression is shown below.
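
In its common linear form, the τ-th conditional quantile of the dependent variable y is modeled as

$$Q_y(\tau \mid x) = \beta_0(\tau) + \beta_1(\tau) x_1 + \dots + \beta_k(\tau) x_k, \qquad 0 < \tau < 1,$$

where the coefficients β(τ) are allowed to change with the quantile level τ.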

As can be seen, it is very similar to the linear regression model. The difference from linear regression is that the dependent variable is estimated at a chosen quantile level. So what is this quantile level? As we said before, quantile regression uses the median, and here the quantile level can be set to 0.5 so that the dependent variable is estimated at the median.

One of the differences between quantile regression and linear regression is that it can produce predictions of the dependent variable at different quantile levels. Especially when it is important to estimate not only the mean but also other quantiles of the dependent variable, quantile regression is very useful in fields such as engineering.

When linear regression is used on a data set with extreme values, it can be seen that these values inflate the error. Quantile regression is robust to extreme values, and a researcher can adapt the model to them (for example, by working with different quantile levels such as 0.1, 0.5, and 0.9).
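
Concretely, for a chosen quantile level τ the coefficients are estimated by minimizing asymmetrically weighted absolute residuals, the so-called check (or pinball) loss; this is the standard formulation from Koenker and Bassett (1978):

$$\hat{\beta}(\tau) = \arg\min_{\beta} \sum_{i=1}^{n} \rho_\tau\!\left(y_i - x_i^{\top}\beta\right), \qquad \rho_\tau(u) = u\left(\tau - \mathbb{1}\{u < 0\}\right).$$

For τ = 0.5 this reduces to minimizing the sum of absolute residuals, which is exactly the median (least absolute value) regression mentioned above.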

APPLICATION

The package and function used in R for quantile regression are shown below.

library(quantreg)  # package that implements quantile regression
rq()               # function used to fit a quantile regression model

A quantile regression model can be created with the “rq” function in the “quantreg” package. Let’s model a data set with both linear regression and quantile regression.

The data set we will use is the “anscombe” data set in the “datasets” package, and the code is shown below.

library(datasets)
data("anscombe")

Thus, we loaded the data set into R. The contents of the data set can be seen below.

Anscombe Data Set

The Anscombe data set was created by Anscombe in 1973 and is a small data set, consisting of 11 observations and 8 variables. Of course, not all of the variables will be used in the analysis here: to show the difference between linear regression and quantile regression, we will model variables that contain extreme values.
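
Before modeling, the structure of the data can be checked with the usual base R helpers (a quick sketch):

# 11 observations of 8 numeric variables (x1-x4, y1-y4)
str(anscombe)
# First rows of the data set
head(anscombe)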

Let’s look at the relationship between “x1” and “y1”:

plot(anscombe$x1,anscombe$y1)
Relationship Between Variables

It seems that there is a linear relationship between the variables. Let’s model this relationship using linear regression and quantile regression.

model_lm<-lm(y1~x1,data = anscombe)          # linear regression
model_qr<-rq(y1~x1,data = anscombe,tau=0.5)  # quantile regression at the median

As can be seen, the two functions are used similarly in R. What is different in quantile regression is the “tau” parameter, which specifies the quantile level to be used. Here it is set to 0.5 (the median). Let’s examine the regression lines on the plot:

orderx<-order(anscombe$x1)
lines(anscombe$x1[orderx],model_lm$fitted.values[orderx])
lines(anscombe$x1[orderx],model_qr$fitted.values[orderx],col="blue")
Comparisons of Quantile Regression and Linear Regression

The blue regression line in the graph represents quantile regression; the other belongs to linear regression. As seen in the figure, the two regression lines are very close to each other. Let’s examine the MSE values:

mean((anscombe$y1-model_lm$fitted.values)^2)
[1] 1.251154
mean((anscombe$y1-model_qr$fitted.values)^2)
[1] 1.258682

We see that MSE values are very close to each other.
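since quantile regression at tau = 0.5 minimizes absolute rather than squared errors, it can also be instructive to compare the mean absolute errors of the two fits (a small additional check):

# Mean absolute error of the linear regression fit
mean(abs(anscombe$y1-model_lm$fitted.values))
# Mean absolute error of the median regression fit
mean(abs(anscombe$y1-model_qr$fitted.values))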

We have said that quantile regression is resistant to outliers. Now let’s continue the application by modeling a relationship between variables that contains an outlier:

plot(anscombe$x1,anscombe$y3)
Relationship Between Variables

The figure shows the relationship between the “x1” and “y3” variables of the anscombe data set. As can be seen, there is one outlying observation. Let’s model this relationship with linear regression and quantile regression:

model_lm<-lm(y3~x1,data = anscombe)
model_qr<-rq(y3~x1,data = anscombe,tau=0.5)
orderx<-order(anscombe$x1)
lines(anscombe$x1[orderx],model_lm$fitted.values[orderx])
lines(anscombe$x1[orderx],model_qr$fitted.values[orderx],col="blue")
Comparisons of Quantile Regression and Linear Regression

As can be seen in the figure, the linear regression line is affected by the outlier, while the quantile regression line is not.
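
One way to see this numerically is to compare the estimated coefficients of the two models (a short sketch):

# Intercept and slope of the linear regression (pulled by the outlier)
coef(model_lm)
# Intercept and slope of the median (tau = 0.5) regression
coef(model_qr)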

We have also stated that quantile regression can be used with different quantile levels. Let’s look at another example of this:

plot(anscombe$x1,anscombe$y1)
model_qr_01<-rq(y1~x1,data = anscombe,tau=0.1)
lines(anscombe$x1[orderx],model_qr_01$fitted.values[orderx],col="red")
model_qr_05<-rq(y1~x1,data = anscombe,tau=0.5)
lines(anscombe$x1[orderx],model_qr_05$fitted.values[orderx],col="green")
model_qr_09<-rq(y1~x1,data = anscombe,tau=0.9)
lines(anscombe$x1[orderx],model_qr_09$fitted.values[orderx],col="blue")
Quantile Regression (0.1, 0.5 and 0.9 quantile levels)

Here, the quantile regression lines for different quantile levels are shown. The regression line drawn in red corresponds to the 0.1 quantile and can be seen to follow the lower part of the data. The 0.5 quantile is drawn in green and corresponds to the median of the data set. Finally, the 0.9 quantile is shown in blue and can be said to follow the upper part of the data.
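
The same three fits can also be obtained in a single call by passing a vector of quantile levels to the “tau” parameter (a minimal sketch; the object name “model_qr_all” is only for illustration):

# Fit the 0.1, 0.5 and 0.9 quantile regressions at once
model_qr_all<-rq(y1~x1,data = anscombe,tau = c(0.1,0.5,0.9))
# One column of coefficients per quantile level
coef(model_qr_all)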

Finally, let’s look at how a quantile regression model is created for the median:

summary(model_qr_05)
Summary for Quantile Regression (for Median)

As can be seen, we obtain the summary output of the quantile regression model with the “summary()” function in R. Here the “coefficients” column shows the estimated values of the parameters. We can write out our model using these estimated parameter values.
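
Once the coefficients are estimated, predictions for new observations can be obtained with the “predict()” function (a minimal sketch; the x1 values below are chosen only for illustration):

# Predicted conditional median of y1 at two example x1 values
predict(model_qr_05, newdata = data.frame(x1 = c(8, 12)))

If standard errors are needed instead of the default confidence intervals, “summary()” also accepts an “se” argument (for example, se = "boot" for bootstrap standard errors).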

Thus, we modeled the relationship between the variables at the median using quantile regression.

CONCLUSION

Here, we examined quantile regression, an alternative approach to linear regression. Quantile regression is a preferred method because it is robust to extreme values and does not require the assumptions of linear regression. In addition, we have seen how to fit the quantile regression model at different quantile levels using the data set. See you in my next article.

Have a nice day :)

REFERENCES

  • Anscombe, F. J. (1973), Graphs in statistical analysis. The American Statistician, 27, 17–21. doi: 10.2307/2682899.
  • Çınar, U. K. (2019), En Küçük Kareler Regresyonuna Alternatif Bir Yöntem: Kantil Regresyon [An Alternative Method to Least Squares Regression: Quantile Regression], Avrasya Uluslararası Araştırmalar Dergisi, 7, 57–71.
  • Furno, M. and Vistocco, D. (2014), Quantile Regression: Estimation and Simulation. Wiley.
  • Koenker, R. and Bassett, G. (1978), Regression Quantiles, Econometrica, 46, 33–50.
