Published in Analytics Vidhya

ML19: The “Linear” in Linear Regression

Does this “linear” refer to a linear function or a linear map?

(1) Introduction

1. Linear Function

  • A concept in Calculus.
  • Refers to a polynomial of degree 1 or 0, e.g., y = ax + b.
  • Note that in some contexts a linear map is also called a linear function [1], though this usage is rare.
Figure 1: “Linear function” from Wikipedia. [2]

2. Linear Map

  • A concept in Linear Algebra.
  • A linear map (also called a linear mapping, linear transformation or, in some contexts, linear function) is a mapping V → W between two modules (for example, two vector spaces) that preserves the operations of addition and scalar multiplication. If a linear map is a bijection then it is called a linear isomorphism. [1]
  • We see from Figure 1 that “linear” has two meanings in mathematics. So which one does the “linear” in linear regression stand for?
Figure 2: “Linear map” from Wikipedia. [1]
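
The difference between the two definitions can be checked numerically. The sketch below is only an illustration: the helper name `is_linear_map` and the sample values are made up for this example, and passing the check on a finite sample is necessary but not sufficient for being a linear map. It tests additivity and homogeneity for y = ax and for y = ax + b with b ≠ 0.

```python
# A linear map f must satisfy, for all u, v and every scalar c:
#   additivity:  f(u + v) == f(u) + f(v)
#   homogeneity: f(c * u) == c * f(u)

def is_linear_map(f, samples, scalars, tol=1e-9):
    """Return True if f passes both checks on the given sample points.

    Hypothetical helper for illustration only: a finite check can refute
    linearity, but it cannot prove it.
    """
    for u in samples:
        for v in samples:
            if abs(f(u + v) - (f(u) + f(v))) > tol:
                return False
        for c in scalars:
            if abs(f(c * u) - c * f(u)) > tol:
                return False
    return True

def through_origin(x):
    return 3.0 * x          # y = ax with a = 3

def with_intercept(x):
    return 3.0 * x + 2.0    # y = ax + b with a = 3, b = 2

samples = [-2.0, -0.5, 0.0, 1.0, 4.0]
scalars = [-1.0, 0.0, 2.5]

print(is_linear_map(through_origin, samples, scalars))   # True
print(is_linear_map(with_intercept, samples, scalars))   # False
```

So y = ax + b with b ≠ 0 is a linear function in the Calculus sense but not a linear map in the Linear Algebra sense, because f(u + v) and f(u) + f(v) differ by b.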

(2) Critical Evidence

Let’s consult Applied Linear Regression (4th ed.) [3], a well-known textbook for graduate students in statistics departments around the world, for an answer. Unfortunately, even this textbook does not state explicitly whether the “linear” in linear regression means linear function or linear map; however, we can find some clues in it.

1. Evidence No.1

Figure 3: Applied Linear Regression (4th ed.), p. vii. [3]

2. Evidence No.2

Figure 4: Applied Linear Regression (4th ed.), p. 51. [3]

3. Evidence No.3

Figure 5: Applied Linear Regression (4th ed.), p. 55. [3]

(3) Answer: Linear Map

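Consequently, we arrive at the answer: the “linear” in linear regression is precisely the linear map of Linear Algebra! In other words, the model is linear in its coefficients, even when its terms are nonlinear functions of the predictors.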
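
One way to write this down is sketched below. The notation (β for the coefficient vector, X for the matrix whose rows hold the terms of each observation) is an assumption of this sketch and is not taken from the textbook figures. A model with a squared term is still a linear map of the coefficients:

```latex
\hat{y} = \beta_0 + \beta_1 x + \beta_2 x^2
        = \begin{pmatrix} 1 & x & x^2 \end{pmatrix}
          \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix},
\qquad
\hat{\mathbf{y}} = X\boldsymbol{\beta}.

% The map \boldsymbol{\beta} \mapsto X\boldsymbol{\beta} preserves
% addition and scalar multiplication:
X(\boldsymbol{\beta} + \boldsymbol{\gamma}) = X\boldsymbol{\beta} + X\boldsymbol{\gamma},
\qquad
X(c\,\boldsymbol{\beta}) = c\,X\boldsymbol{\beta}.
```

The curve traced by ŷ as a function of x is a parabola, yet the dependence on β is linear, which is the sense of “linear” argued for here.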

(4) Linear Regression: A Cheap, Timesaving, and Performant Model

1. Linear Regression with Higher-Degree Terms (degree>1) and Interaction Terms

Quite a few ML/DS books and articles on the Internet misunderstand linear regression and take it as a straight line, i.e., a polynomial of degree 1 or 0; therefore, they miss out on the power of linear regression.
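
As a rough illustration of this point, the sketch below fits the same made-up, noise-free data twice by ordinary least squares: once with a straight line and once with an added squared term. It uses only the standard library and solves the normal equations directly; the helper names are hypothetical, and in practice a numerical library would normally do this step.

```python
# Ordinary least squares via the normal equations (X^T X) beta = X^T y.

def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_least_squares(design, y):
    """Return the coefficients that minimize the sum of squared residuals."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def sum_squared_residuals(design, y, beta):
    preds = [sum(b * v for b, v in zip(beta, row)) for row in design]
    return sum((yi - p) ** 2 for yi, p in zip(y, preds))

# Made-up data generated from y = 1 + 2x + 3x^2 (no noise).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [1.0 + 2.0 * x + 3.0 * x ** 2 for x in xs]

straight_line = [[1.0, x] for x in xs]          # columns: 1, x
with_square = [[1.0, x, x ** 2] for x in xs]    # columns: 1, x, x^2

beta_line = fit_least_squares(straight_line, ys)
beta_quad = fit_least_squares(with_square, ys)

sse_line = sum_squared_residuals(straight_line, ys, beta_line)
sse_quad = sum_squared_residuals(with_square, ys, beta_quad)

print("quadratic-term coefficients:", [round(b, 6) for b in beta_quad])
print("straight line fits worse:", sse_line > sse_quad)
```

Both fits are linear regressions: only the columns of the design matrix change, while the fitting procedure stays the same. The version with the x² column recovers the generating coefficients (up to floating-point error), whereas the straight line leaves a large residual.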

2. Cheap, Timesaving, Interpretable and Pretty Performant

Linear regression with higher-degree terms (degree>1) and interaction terms is a cheap, timesaving, interpretable and pretty performant model. Linear regression is the most basic model and the best one to start with in an ML/DS project.

3. Starting Point & Baseline Model

Taking linear regression as a starting point, we can discover the characteristics of the data and choose crucial features before building up more complex models (e.g., SVM, RF, XGBT, ANN, CNN, RNN), which may cost far more than linear regression.

(5) Books Misunderstanding the “Linear” in Linear Regression

It’s a shame that most ML/DS books and articles on the Internet only discuss simple linear regression. Among them, a few books “explicitly” misunderstand linear regression (I strongly suspect that many authors simply thought of linear regression as a straight line, but I don’t have enough evidence). Their descriptions are as follows:

1. Kane, F. (2017). Hands-on Data Science and Python Machine Learning. Birmingham, UK: Packt Publishing.
"All it (linear regression) is, is fitting a straight line to a set of data points."
2. Joshi, P. (2016). Python Machine Learning Cookbook. Birmingham, UK: Packt Publishing.
"You might say that there might be a curvy line out there that fits these points better, but linear regression doesn't allow this."
1. Albon, C. (2018). Machine Learning with Python Cookbook: Practical Solutions from Preprocessing to Deep Learning. Sebastopol, CA: O’Reilly Media.
2. VanderPlas, J. (2017). Python Data Science Handbook: Essential Tools for Working with Data. Sebastopol, CA: O’Reilly Media.
3. Hackeling, G. (2017). Mastering Machine Learning with scikit-learn (2nd ed.). Birmingham, UK: Packt Publishing.

(6) Conclusion

  1. The “linear” in linear regression refers to the linear map of Linear Algebra rather than the linear function (polynomial of degree 1 or 0) of Calculus.
  2. A linear regression with higher-degree terms (degree>1), interaction terms, regularization, and stepwise selection can substantially outperform a plain linear regression such as z = ax + by + c.
  3. A linear regression with higher-degree terms (degree>1), interaction terms, regularization, and stepwise selection is cheap, timesaving, interpretable and pretty performant. It’s a great starting point for any ML/DS project: it helps discover the characteristics of the data and choose crucial features before building more complex models (e.g., SVM, RF, XGBT, ANN, CNN, RNN).
  4. Moreover, we can take the richer linear regression model above as the baseline against which to evaluate each complex model (e.g., SVM, RF, XGBT, ANN, CNN, RNN). The baseline model is not supposed to be something like z = ax + by + c or y = ax + b, which are too vanilla.
  5. The reader may check ML20 & ML21 for hands-on linear regression implementation using R & Python respectively.

(7) References

[1] Wikipedia (n.d.). Linear map. Retrieved from https://en.wikipedia.org/wiki/Linear_map

Yu-Cheng Kuo

ML/DS using Python & R. A Taiwanese earned MBA from NCCU and BS from NTHU with MATH major & ECON minor. Email: yc.kuo.28@gmail.com