# ML19: The “Linear” in Linear Regression

## Does this “linear” represent linear function or linear map?

**Keywords**: linear regression, linear function, Calculus, linear map, linear transformation, Linear Algebra

A linear regression with *higher-degree terms (degree > 1), interaction terms, regularization, and a stepwise process* is **cheap, timesaving, interpretable, and pretty performant**. It’s a great **starting point** and **baseline model** for any ML/DS project.

Outline

(1) Introduction

(2) Critical Evidence

(3) Answer: Linear Map

(4) Linear Regression: A Cheap, Timesaving, and Performant Model

(5) Books Misunderstanding the “Linear” in Linear Regression

(6) Conclusion

(7) References

# (1) Introduction

## 1. Linear Function

- A concept in *Calculus*.
- Refers to a polynomial of degree 1 or 0, e.g., y = ax + b.
- Note that in some contexts, a *linear map* is also called a *linear function* [1], though this usage is actually rare.

## 2. Linear Map

- A concept in *Linear Algebra*.
- A *linear map* (also called a *linear mapping*, *linear transformation*, or, in some contexts, *linear function*) is a mapping V → W between two modules (for example, two vector spaces) that preserves the operations of addition and scalar multiplication. If a *linear map* is a bijection, then it is called a linear isomorphism. [1]
- We see from figure 1 that “linear” has dual meanings in mathematics. So, what does the “linear” in linear regression stand for?
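To make the definition concrete, here is a minimal numpy sketch (my own illustration, not taken from [1]) checking the two defining axioms for the map T(v) = Av, where the matrix A is an arbitrary example:

```python
import numpy as np

# Any matrix A defines a linear map T(v) = A @ v, here from R^3 to R^2.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 3.0]])

def T(v):
    return A @ v

u = np.array([1.0, 0.5, -2.0])
v = np.array([3.0, -1.0, 4.0])
c = 2.5

# Axiom 1 (additivity): T(u + v) == T(u) + T(v)
assert np.allclose(T(u + v), T(u) + T(v))
# Axiom 2 (homogeneity): T(c * u) == c * T(u)
assert np.allclose(T(c * u), c * T(u))
print("both linear-map axioms hold")
```

A linear function y = ax + b with b ≠ 0 fails these axioms (T(0) = b ≠ 0), which is one way to see that the two notions of “linear” do not coincide.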

# (2) Critical Evidence

Let’s look up *Applied Linear Regression* (4th ed.) [3], the notable textbook for Statistics graduate students all around the world, for an answer. It’s a bummer that even this textbook doesn’t explicitly explain whether the “linear” in linear regression means linear function or linear map; however, we can find some clues in it.

## 1. Evidence No.1

In the table of contents, the title of chapter 2 is “simple linear regression.”

## 2. Evidence No.2

Here we see that “multiple regression” is actually short for “multiple linear regression.” Note that there is no such term as “multiple linear function” in Calculus.

## 3. Evidence No.3

The description here matches exactly the concept of a linear map.

# (3) *Answer: Linear Map*

Consequently, we arrive at the answer: the “linear” in linear regression is precisely the linear map of Linear Algebra!

Additionally, linear regression has two branches — simple linear regression & multiple linear regression.

# (4) *Linear Regression: A Cheap, Timesaving, and Performant Model*

## 1. Linear Regression with Higher-Degree Terms (degree>1) and Interaction Terms

Quite a few ML/DS books and articles on the Internet misunderstand linear regression, taking it to be a straight line, i.e., a polynomial of degree 1 or 0; they therefore miss out on the power of linear regression.

In fact, linear regression *can have higher-degree terms (degree>1) and interaction terms*, which help fit the data more precisely than a simple straight line does.
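As a sketch of this point (the data-generating coefficients below are made up for illustration), the following numpy code fits a regression with a squared term and an interaction term. The key observation: even with x₁² and x₁x₂ as features, the model y ≈ Xβ is still *linear in the coefficients* β, i.e., a linear map applied to the parameter vector:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data from y = 2 + 3*x1 - x2 + 0.5*x1^2 + 1.5*x1*x2 + noise.
n = 200
x1 = rng.uniform(-2, 2, n)
x2 = rng.uniform(-2, 2, n)
y = 2 + 3*x1 - x2 + 0.5*x1**2 + 1.5*x1*x2 + rng.normal(0, 0.1, n)

# Design matrix with an intercept, a higher-degree term (x1^2),
# and an interaction term (x1*x2). The model stays linear in beta.
X = np.column_stack([np.ones(n), x1, x2, x1**2, x1*x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # close to [2, 3, -1, 0.5, 1.5]
```

A straight-line fit y = a + b·x1 + c·x2 could not recover this surface; adding the two extra columns to X does, at essentially no extra cost.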

## 2. Cheap, Timesaving, Interpretable and Pretty Performant

A linear regression with higher-degree terms (degree > 1) and interaction terms is a *cheap, timesaving, interpretable, and pretty performant model*. Linear regression is the most basic model and the best one to start with in an ML/DS project.

## 3. Starting Point & Baseline Model

Taking linear regression as a starting point, we can *discover the characteristics of the data and choose crucial features* before building up more complex models (e.g., SVM, RF, XGBT, ANN, CNN, RNN), which may cost far more than linear regression.

Moreover, we can *take this richer linear regression model as the baseline model to evaluate the performance of each complex model* (e.g., SVM, RF, XGBT, ANN, CNN, RNN). After all, according to Occam’s razor (the law of parsimony), why bother employing complex and time-consuming models whose accuracies are close to those of the vanilla model, linear regression?

Plus, leveraging regularization (lasso, ridge, elastic net) can help us mitigate overfitting and yield a better linear regression model.
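As an illustrative sketch (the nearly collinear data and the penalty λ = 1 are my own assumptions), ridge regression’s closed form β = (XᵀX + λI)⁻¹Xᵀy shows how regularization stabilizes coefficients that ordinary least squares cannot pin down:

```python
import numpy as np

rng = np.random.default_rng(1)

# Ill-conditioned design: two nearly collinear features.
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=1e-3, size=n)   # almost a copy of x1
y = x1 + x2 + rng.normal(scale=0.1, size=n)

X = np.column_stack([x1, x2])

# Ordinary least squares: coefficients are unstable under collinearity.
ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Ridge: beta = (X'X + lam*I)^{-1} X'y -> shrunken, stable coefficients.
lam = 1.0
ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

print("OLS:  ", ols)
print("Ridge:", ridge)  # close to [1, 1]
```

The lasso and elastic net have no such closed form, but the idea is the same: a penalty on the coefficient norm trades a little bias for much lower variance.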

# (5) Books Misunderstanding the “Linear” in Linear Regression

It’s a shame that most ML/DS books and articles on the Internet only discuss simple linear regression. Among them, a few books explicitly misunderstand linear regression (I highly suspect many authors simply thought of linear regression as a straight line, but I don’t have enough evidence), and their descriptions are as follows:

1. Kane, F. (2017). Hands-on Data Science and Python Machine Learning. Birmingham, UK: Packt Publishing.

    "All it (linear regression) is, is fitting a straight line to a set of data points."

2. Joshi, P. (2016). Python Machine Learning Cookbook. Birmingham, UK: Packt Publishing.

    "You might say that there might be a curvy line out there that fits these points better, but linear regression doesn't allow this."

On the other hand, there are books that explicitly recognize the true power of linear regression, mentioning higher-degree terms (degree > 1) or interaction terms:

1. Albon, C. (2018). Machine Learning with Python Cookbook: Practical Solutions from Preprocessing to Deep Learning. Sebastopol, CA: O’Reilly Media.

2. VanderPlas, J. (2017). Python Data Science Handbook: Essential Tools for Working with Data. Sebastopol, CA: O’Reilly Media.

3. Hackeling, G. (2017). Mastering Machine Learning with scikit-learn (2nd ed.). Birmingham, UK: Packt Publishing.

# (6) Conclusion

- The “linear” in linear regression refers to the *linear map* in *Linear Algebra* rather than the *linear function* (a polynomial of degree 1 or 0) in *Calculus*.
- A linear regression with *higher-degree terms (degree > 1), interaction terms, regularization, and a stepwise process* will **definitely outperform** a plain linear regression such as z = ax + by + c.
- A linear regression with *higher-degree terms (degree > 1), interaction terms, regularization, and a stepwise process* is **cheap, timesaving, interpretable, and pretty performant**. It’s a great **starting point** for any ML/DS project for *discovering the characteristics of the data and choosing crucial features* before building more complex models (e.g., SVM, RF, XGBT, ANN, CNN, RNN).
- Moreover, we can *take this richer linear regression model as the* **baseline model** *to evaluate the performance of each complex model* (e.g., SVM, RF, XGBT, ANN, CNN, RNN). The baseline model is not supposed to be something as vanilla as z = ax + by + c or y = ax + b.
- The reader may check ML20 & ML21 for hands-on linear regression implementations in R & Python respectively.

# (7) References

[1] Wikipedia (n.d.). Linear map. Retrieved from https://en.wikipedia.org/wiki/Linear_map

[2] Wikipedia (n.d.). Linear function. Retrieved from https://en.wikipedia.org/wiki/Linear_function

[3] Weisberg, S. (2014). Applied Linear Regression (4th ed.). Hoboken, NJ: John Wiley & Sons.

[4] Albon, C. (2018). Machine Learning with Python Cookbook: Practical Solutions from Preprocessing to Deep Learning. Sebastopol, CA: O’Reilly Media.

[5] VanderPlas, J. (2017). Python Data Science Handbook: Essential Tools for Working with Data. Sebastopol, CA: O’Reilly Media.

[6] Hackeling, G. (2017). Mastering Machine Learning with scikit-learn (2nd ed.). Birmingham, UK: Packt Publishing.

[7] Kane, F. (2017). Hands-on Data Science and Python Machine Learning. Birmingham, UK: Packt Publishing.

[8] Joshi, P. (2016). Python Machine Learning Cookbook. Birmingham, UK: Packt Publishing.