Moving Beyond Linearity — ISLR Series Chapter 7
Linear regression is a great and simple model, especially when adding in some regularizers. However, one of its limitations is the assumption that the relationship between the features and the target variable is close to linear. This is a big assumption, and it is not always the case. For these scenarios, Chapter 7 goes over some extensions of linear and logistic regression we can use to tackle situations where the relationship is not linear.
A simple linear model looks like this:

y = β0 + β1x + ε
What polynomial regression does is add terms to this model by raising the feature to successive powers. Each term gets its own coefficient. The model with the additional polynomial terms looks like this:

y = β0 + β1x + β2x^2 + … + βd x^d + ε
If we look at the gray circles in the figure above, the relationship between age (the feature) and wage (the target) is clearly not linear. But if we use a polynomial regression with a degree of 4 (d = 4), we get the blue solid line (the blue dashed lines are the 95% confidence interval). The equation of the blue line becomes (with x = age):

wage = β0 + β1(age) + β2(age)^2 + β3(age)^3 + β4(age)^4
Since d = 4, the model adds a term for each power from 1 to 4, and each term gets its own coefficient. Now we just treat this polynomial model the same way as any other regression model.
The concept behind step functions is to divide the range of the feature into bins and fit a different constant to each bin.
The feature X is divided into K bins using cutpoints c1 < c2 < … < cK. I(·) is an indicator function that returns 1 if the feature falls past the cutpoint and 0 otherwise, so each bin gets its own dummy variable:

Ck(X) = I(ck ≤ X < ck+1)
For step functions, the model looks like:

y = β0 + β1C1(x) + β2C2(x) + … + βKCK(x) + ε
Reading the step-function graph is straightforward. If Age (the X value) is less than about 35, the predicted wage is ~100. If Age is between about 35 and 65, the predicted wage is ~125. If Age is greater than 65, the predicted wage is around 110.
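A minimal sketch of this fit: because each bin gets a single constant, least squares reduces to taking the mean of the target within each bin. The cutpoints and wage levels below echo the figure's rough values but are otherwise made up.

```python
# Step-function sketch: bin the feature, fit one constant (the mean) per bin.
import numpy as np

rng = np.random.default_rng(1)
age = rng.uniform(18, 80, size=300)
# Synthetic wages with three flat levels, mimicking the figure's shape.
wage = np.where(age < 35, 100.0,
                np.where(age < 65, 125.0, 110.0)) + rng.normal(0, 2, 300)

cutpoints = [35, 65]                   # knots chosen by hand for this sketch
bins = np.digitize(age, cutpoints)     # 0, 1, or 2: which bin each age falls in
# Least squares with only bin indicators = the per-bin mean of the target.
bin_means = np.array([wage[bins == k].mean() for k in range(3)])

# Predict by looking up the constant for each new age's bin.
predicted = bin_means[np.digitize([30, 50, 70], cutpoints)]
```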
Another alternative is to use piecewise polynomials. We can think of piecewise polynomials as a combination of step functions and polynomial regression: they divide the feature into regions (like step functions) and then fit a different polynomial in each region (like polynomial regression). This is different from polynomial regression, which applies one polynomial to the ENTIRE range of the data, and different from a step function, which fits only a constant in each region. An example of a piecewise cubic that separates the feature into two regions at a cutpoint c is below:

y = β01 + β11x + β21x^2 + β31x^3 + ε   if x < c
y = β02 + β12x + β22x^2 + β32x^3 + ε   if x ≥ c
The model uses a different polynomial when the feature xi is less than the cutpoint c than when xi is greater than or equal to c. To make piecewise polynomials work well, constraints such as continuity can be imposed at the knots, and the number and placement of the knots (cutpoints) must be chosen. As usual, cross-validation can help figure out which model is best for the data.
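An unconstrained version of this is easy to sketch: fit a separate cubic on each side of the cutpoint. Note that without a continuity constraint the two fits are free to jump at c, which is exactly why splines add those constraints. The data and knot below are illustrative assumptions.

```python
# Piecewise-polynomial sketch: a separate cubic fit on each side of cutpoint c.
import numpy as np

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 10, size=200))
y = np.sin(x) + rng.normal(0, 0.1, 200)   # smooth nonlinear target + noise

c = 5.0                                   # single knot / cutpoint, chosen by hand
left, right = x < c, x >= c
coef_left = np.polyfit(x[left], y[left], deg=3)     # cubic for x < c
coef_right = np.polyfit(x[right], y[right], deg=3)  # cubic for x >= c

def predict(x0):
    # Pick the polynomial for the region x0 falls in.
    return np.polyval(coef_left if x0 < c else coef_right, x0)
```

Each region contributes four coefficients, so two regions cost eight degrees of freedom; continuity constraints at the knot would buy some of those back.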
Generalized Additive Models
Generalized Additive Models, or GAMs, sum (pun intended) up all these functions into one family of models. Previously we were discussing single features, but what if we had multiple features? GAMs handle this by maintaining the additivity of linear models. As a reminder, additivity just states that the target is a sum of separate contributions from each feature. The following is a typical model format that a GAM follows:

y = β0 + f1(x1) + f2(x2) + … + fp(xp) + ε
For each feature in a multiple regression model, a GAM applies a different function fj while maintaining additivity.
Using GAMs extends the family of linear models to include relationships that are not linear in nature. One drawback GAMs do not fix is interaction effects: because the model is additive, interactions between features are missed unless interaction terms are added manually.