OLS, median, and quantile regression: modelling tourism expenditures

OLS regression analysis is the workhorse of statistical modelling in research and data science. In this post, I present median and quantile regression as alternatives to the plain-vanilla OLS regression model. Our example case is how the total expenditures of a tourism trip, y, may be thought of, and hence modelled, as the result or “function” of two x-variables: trip length in days (i.e., length of stay) and destination choice for the trip. There are two alternatives for the destination variable: a Nordic destination or a destination beyond the Nordic countries. We have data on 444 Norwegian students, and you can download the data from here (under the heading “Support Material”). As usual, I eschew equations, formulas, and abstract reasoning any way I can.

Modelling total trip expenditures by OLS regression

OLS regression is all about how the mean (average) of a y-variable changes with one-unit increases in the x-variables. In our case, this implies how the mean of total trip expenditures changes when we compare, say, a four-day trip with a five-day trip, or a Nordic trip (coded 0) with a beyond Nordic trip (coded 1). Figure 1 shows how trip expenditures and trip length are associated without taking the destination choice variable into account. The slope (or regression coefficient, which is the term I prefer) for the trip length variable is almost 34: on average, a five-day trip incurs 34 Euros more in total expenditures than a four-day trip. (If you are rusty on OLS regression analysis, please see my primer here.)

Figure 1.
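
To make the idea of a slope concrete, here is a minimal Python sketch that computes a simple OLS slope by hand. The five data points are made up for illustration (they are not the student data), and the names `days` and `spend` are my own.

```python
# Toy data, invented for illustration only (not the student data).
days = [1, 2, 3, 4, 5]              # trip length in days
spend = [110, 130, 180, 190, 240]   # total trip expenditures in Euros

mean_days = sum(days) / len(days)
mean_spend = sum(spend) / len(spend)

# OLS slope: how x and y vary together, divided by how much x varies.
sxy = sum((x - mean_days) * (y - mean_spend) for x, y in zip(days, spend))
sxx = sum((x - mean_days) ** 2 for x in days)

slope = sxy / sxx
intercept = mean_spend - slope * mean_days

print(slope)      # 32.0: one more day goes with 32 Euros more, on average
print(intercept)  # 74.0
```

In these toy data, each extra day goes together with 32 Euros more in mean expenditures, which is the same kind of statement as the "almost 34" reported for the real data.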

Figure 2, in contrast, is based on a multiple OLS regression including both the trip length and the destination choice variable (see also Table 1 below). The slope or regression coefficient for the trip length variable is 29 (i.e., down from 34 in Figure 1), whereas the regression coefficient for the destination choice variable is 437: on average, and for trips of the same length, trips to beyond Nordic destinations incur 437 Euros more than trips to Nordic destinations. In short, longer stays and trips beyond the Nordic countries cost more than shorter stays and Nordic trips.

Figure 2.
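
The two coefficients can be combined to compare any two trips. The sketch below uses the coefficients reported in the text (29 and 437) and assumes the model is purely additive, with no interaction between trip length and destination. Since the intercept is not reported here, the sketch only computes differences between trips, not expenditure levels. The function name `expected_difference` is hypothetical.

```python
# Coefficients reported in the text for the multiple OLS regression.
EUROS_PER_DAY = 29          # trip length coefficient
EUROS_BEYOND_NORDIC = 437   # destination coefficient (0 = Nordic, 1 = beyond Nordic)


def expected_difference(extra_days, destination_change):
    """Expected difference in mean expenditures (Euros) between two trips.

    extra_days: how many days longer the second trip is.
    destination_change: 1 if the second trip goes beyond the Nordic
    countries and the first does not, 0 if both have the same
    destination type, -1 for the reverse comparison.
    """
    return EUROS_PER_DAY * extra_days + EUROS_BEYOND_NORDIC * destination_change


print(expected_difference(1, 0))  # 29: one more day, same destination type
print(expected_difference(0, 1))  # 437: same length, beyond Nordic vs. Nordic
print(expected_difference(3, 1))  # 524: three more days and beyond Nordic
```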

Modelling total trip expenditures by median regression

Median regression is all about how the median of a y-variable changes with one-unit increases in the x-variables. In our case, this implies how the median of trip expenditures changes when we compare, say, a four-day trip with a five-day trip or a Nordic trip with a beyond Nordic trip. Table 1 shows the median regression coefficients in question along with the OLS regression coefficients mentioned above. Median regression coefficients are interpreted just like OLS coefficients, except that they refer to the median rather than the mean. We note that the median regression coefficients are somewhat smaller in magnitude than their OLS counterparts. That is, trip length and destination choice appear to matter a bit less for the median of expenditures than for the mean of expenditures. This pattern is also shown in Figure 3, where we note that the median regression lines are less steep.

Table 1.
Figure 3.
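
Why might a median coefficient come out smaller than its OLS counterpart? One common reason is a few very costly trips that pull the mean upwards but leave the median alone. The sketch below uses made-up numbers (not the student data) and a simplification: with only a single 0/1 x-variable, the OLS coefficient equals the difference between the two group means, and the median regression coefficient equals the difference between the two group medians.

```python
from statistics import mean, median

# Made-up expenditures in Euros, for illustration only.
nordic = [300, 350, 400, 450, 500]
beyond_nordic = [700, 750, 800, 850, 1900]  # one very costly trip

# With a single 0/1 predictor, these gaps are what the two regressions report.
mean_gap = mean(beyond_nordic) - mean(nordic)        # OLS-style coefficient
median_gap = median(beyond_nordic) - median(nordic)  # median-regression-style coefficient

print(mean_gap)    # 600 Euros: pulled up by the 1900 Euro trip
print(median_gap)  # 400 Euros: unaffected by how extreme that trip is
```

Whether this is what drives the difference in Table 1 is something only the data can tell; the sketch merely shows the mechanism.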

Modelling total trip expenditures by quantile regression

The median splits the total trip expenditure variable at the middle of its distribution, that is, at the 50th percentile (the 0.5 quantile). Yet we can split the expenditure variable at any other quantile, such as the 10th or the 90th percentile, and quantile regression, in essence, shows how each of these quantiles changes with the x-variables. The quantile regression variant of the former regressions appears in Figure 4.

Figure 4.
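
For the curious, the principle behind quantile regression can be sketched in a few lines of Python. Quantile regression works by minimising a lopsided penalty, often called the check (or pinball) loss, in which values above and below the candidate quantile are weighted differently. The sketch below has no x-variables at all: it only shows that the value minimising this loss is the quantile itself. The data are made up, and the function names `check_loss` and `best_quantile` are my own; this is not the software behind Figure 4.

```python
def check_loss(candidate, values, tau):
    """Total check ("pinball") loss of using `candidate` as the tau-quantile."""
    total = 0.0
    for y in values:
        if y >= candidate:
            total += tau * (y - candidate)        # y at or above: weight tau
        else:
            total += (1 - tau) * (candidate - y)  # y below: weight 1 - tau
    return total


def best_quantile(values, tau):
    """Observed value with the smallest check loss.

    The loss is piecewise linear, so searching the observed values is enough.
    """
    return min(values, key=lambda c: check_loss(c, values, tau))


# Made-up expenditures in Euros, for illustration only.
spend = [100, 200, 300, 400, 500, 600, 700, 800, 900]

for tau in (0.25, 0.5, 0.75):
    print(tau, best_quantile(spend, tau))
# 0.25 300
# 0.5 500
# 0.75 700
```

With tau set to 0.5 the two weights are equal and we are back at the median. A full quantile regression applies the same penalty to the distances between the observed y-values and a regression line, once for each quantile of interest.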

Five things are noteworthy with respect to Figure 4. First, the horizontal and solid lines are the OLS regression coefficients: 29 (trip length) and 437 (destination choice). Second, the horizontal and dashed lines are the 95% confidence intervals for these OLS coefficients. Third, the curved lines show the various quantile regression coefficients across the distribution of the trip expenditures variable. Fourth, and most importantly, the impact of the trip length variable and the destination choice variable appears to get larger as we move from the lower quantiles toward the higher quantiles. That is, trip length and destination choice seem to matter less as x-variables for less costly trips (i.e., on the left-hand side of the plots) than they do for more costly trips (i.e., on the right-hand side of the plots). Fifth, the red areas mark the 95% confidence intervals for the quantile regression coefficients. In other words, when there is (much) overlap between the red areas and the dashed lines, we should be cautious in interpreting the quantile coefficients as different from the OLS coefficients.

Takeaways

OLS regression answers what happens to the (conditional) mean of y when x increases by one unit. Median regression answers what happens to the (conditional) median of y when x increases by one unit. Quantile regression finds the corresponding regression coefficients for quantiles across the whole distribution of y. Sometimes these three techniques tell the same story about the data. At other times, as above, they tell somewhat different stories.

About me

I’m Christer Thrane, a sociologist and professor at Inland University College, Norway. I have written two textbooks on applied regression modeling and applied statistical modeling. Both are published by Routledge, and you can find them here and here. I am on ResearchGate here, and you can also reach me at christer.thrane@inn.no
