ML19: The “Linear” in Linear Regression

Does this “linear” represent linear function or linear map?

Yu-Cheng (Morton) Kuo
Analytics Vidhya
6 min readJan 6, 2021

--

Keywords: linear regression, linear function, Calculus, linear map, linear transformation, Linear Algebra

A linear regression with higher-degree terms (degree>1), interaction terms, regularization, and stepwise process is cheap, timesaving, interpretable and pretty performant. It’s a great starting point and baseline model for all ML/DS projects.

Outline
(1)
Introduction
(2)
Critical Evidence
(3)
Answer: Linear Map
(4)
Linear Regression: A Cheap, Timesaving, and Performant Model
(5)
Books Misunderstanding the “Linear” in Linear Regression
(6)
Conclusion
(7)
References

(1) Introduction

1. Linear Function

  • A concept in Calculus.
  • Referring to polynomial of degree 1 or 0, e.g., y = ax + b.
  • Note that in some contexts, a linear map is also called a linear function [1] though it’s rare actually.
Figure 1: “Linear function” from Wikipedia. [2]

2. Linear Map

  • A concept in Linear Algebra.
  • A linear map (also called a linear mapping, linear transformation or, in some contexts, linear function) is a mapping V → W between two modules (for example, two vector spaces) that preserves the operations of addition and scalar multiplication. If a linear map is a bijection then it is called a linear isomorphism. [1]
  • We see from figure 1 that the “linear” has dual meanings in Mathematics. Then, what does the “linear” in linear regression represent for?
Figure 2: “Linear map” from Wikipedia. [1]

(2) Critical Evidence

Let’s look up the notable textbook for graduates in Statistics department all around the world — Applied Linear Regression (4th ed.) [3]— for answer. It’s a bummer that even this textbook doesn’t provide explicit explanation to whether the “linear” in linear regression is linear function or linear map; however, we could find some clues in the textbook.

1. Evidence No.1

Figure 3: Applied Linear Regression (4th ed.), P vii. [3]

In the table of contents, the title of chapter 2 is “simple linear regression.”

2. Evidence No.2

Figure 4: Applied Linear Regression (4th ed.), P 51. [3]

Here we see the “multiple regression” is actually “multiple linear regression.” Observe that there’s no such term called “multiple linear function” in Calculus.

3. Evidence No.3

Figure 5: Applied Linear Regression (4th ed.), P 55. [3]

The description here is exactly the same concept as linear map.

(3) Answer: Linear Map

Consequently, we come by the answer that the “linear” in linear regression is precisely the linear map in Linear Algebra!

Additionally, linear regression has two branches — simple linear regression & multiple linear regression.

(4) Linear Regression: A Cheap, Timesaving, and Performant Model

1. Linear Regression with Higher-Degree Terms (degree>1) and Interaction Terms

Quite a few ML/DS books and articles on the Internet misunderstand linear regression and take it as a straight line, i.e., a polynomial of degree 1 or 0; therefore, they miss out on the power of linear regression.

In fact, linear regression can have higher-degree terms (degree>1) and interaction terms, which help fit the data more precisely than a simple straight line does.

2. Cheap, Timesaving, Interpretable and Pretty Performant

The linear regression with higher-degree terms (degree>1) and interaction terms, is a cheap, timesaving, interpretable and pretty performant model. Linear regression is the most basic and the best model to start with in a ML/DS project.

3. Starting Point & Baseline Model

Taking linear regression as a starting point, we can discover the characteristics of the data and choose crucial features before building up more complex models (e.g., SVM, RF, XGBT, ANN, CNN, RNN), which may cost far more than linear regression.

Moreover, we can take this complicated linear regression model above as the baseline model to evaluate the performance of each complex models (e.g., SVM, RF, XGBT, ANN, CNN, RNN). After all, according to Occam’s razor (law of parsimony), why bother employing complex and time-consuming models with accuracies close to the vanilla model — linear regression?

Plus, leveraging regularization (lasso, ridge, elastic net) can help us mitigate overfitting of linear regression and yield better linear regression model.

(5) Books Misunderstanding the “Linear” in Linear Regression

It’s a shame that most of the ML/DS books and articles on the Internet only discuss simple linear regression. Among them, there are few books “explicitly” misunderstanding linear regression (I highly suspects many authors simply though of linear regression as a straight line but I don’t have enough evidence) and their descriptions are as follows:

On the other hand, there are books “explicitly” realize the true power of linear regression, mentioning the higher-degree terms (degree>1) or interaction terms in linear regression:

(6) Conclusion

  1. The “linear” in linear regression refers to linear map in linear algebra rather than linear function (polynomial of degree 1 or 0) in Calculus.
  2. A linear regression with higher-degree terms (degree>1), interaction terms, regularization, and stepwise process would definitely outperforms linear regression such as z = ax + by + c.
  3. A linear regression with higher-degree terms (degree>1), interaction terms, regularization, and stepwise process is cheap, timesaving, interpretable and pretty performant. It’s a great starting point for all ML/DS projects for discovering the characteristics of the data and choose crucial features before building up more complex models (e.g., SVM, RF, XGBT, ANN, CNN, RNN).
  4. Moreover, we can take this complicated linear regression model above as the baseline model to evaluate the performance of each complex models (e.g., SVM, RF, XGBT, ANN, CNN, RNN). The so-called baseline model is not supposed to be like z = ax + by + c or y= ax + b, which are too vanilla.
  5. The reader may check ML20 & ML21 for hands-on linear regression implementation using R & Python respectively.

(7) References

[1] Wikipedia (Unidentified). Linear map. Retrieved from https://en.wikipedia.org/wiki/Linear_map

[2] Wikipedia (Unidentified). Linear function. Retrieved from https://en.wikipedia.org/wiki/Linear_function

[3] Weisberg, S. (2014). Applied Linear Regression (4th ed.). New Jersey, NJ: John Wiley & Sons.

[4] Albon, C. (2018). Machine Learning with Python Cookbook: Practical Solutions from Preprocessing to Deep Learning. California, CA: O’Reilly Media.

[5] VanderPlas, J. (2017). Python Data Science Handbook: Essential Tools for Working with Data. California, CA: O’Reilly Media.

[6] Hackeling, G. (2017). Mastering Machine Learning with scikit-learn (2nd ed.). Birmingham, UK: Packt Publishing.

[7] Kane, F. (2017). Hands-on Data Science and Python Machine Learning. Birmingham, UK: Packt Publishing.

[8] Joshi, P. (2016). Python Machine Learning Cookbook. Birmingham, UK: Packt Publishing.

--

--

Yu-Cheng (Morton) Kuo
Analytics Vidhya

CS/DS blog with C/C++/Embedded Systems/Python. Embedded Software Engineer. Email: yc.kuo.28@gmail.com