Polynomial Regression in R

At first glance, polynomial fits would appear to involve nonlinear regression. In fact, polynomial fits are just linear fits involving predictors of the form x, x^2, …, x^d.
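To make that concrete, here's a tiny sketch (with toy values of my own, separate from the examples below) of the design matrix R builds for a quadratic fit. Each column is a fixed transformation of the input, and the fit estimates one coefficient per column:

z <- 0:3
model.matrix(~ 1 + z + I(z^2))  # columns: intercept, z, z^2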

Example 1: Polynomial fit

First, let’s generate a data set:

set.seed(16)
x <- 0:50
y <- 2.3 - 15.1*x + 1.2*x^2 + rnorm(length(x), 20, 50)
plot(x, y)

(rnorm just generates normally distributed noise; here, length(x) draws with mean 20 and standard deviation 50.)

Now let’s fit a model using linear regression:

fit <- lm(y ~ 1 + x + I(x^2))
points(x, predict(fit), type="l")

Here’s a summary of the fit:

> summary(fit)

Call:
lm(formula = y ~ 1 + x + I(x^2))

Residuals:
    Min      1Q  Median      3Q     Max
-92.173 -28.968   3.673  24.953  97.269

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)   33.84216   18.36178   1.843   0.0715 .
x            -15.28705    1.69836  -9.001 7.07e-12 ***
I(x^2)         1.20126    0.03285  36.569  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 45.44 on 48 degrees of freedom
Multiple R-squared:  0.996,  Adjusted R-squared:  0.9959
F-statistic:  6034 on 2 and 48 DF,  p-value: < 2.2e-16
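Incidentally, the same model can be written more compactly with poly(); here's a quick equivalence check, reusing x, y, and fit from above (raw = TRUE requests plain powers of x rather than orthogonal polynomials):

fit2 <- lm(y ~ poly(x, 2, raw = TRUE))  # same model as y ~ 1 + x + I(x^2)
all.equal(fitted(fit), fitted(fit2))    # TRUE: identical fitted values

Dropping raw = TRUE fits the same curve via orthogonal polynomials, which changes the individual coefficients but not the fitted values.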

This example shows that we can apply linear regression outside of cases involving linear data-generating mechanisms. Indeed, the word linear in linear regression says nothing about the regressors themselves, which here include the decidedly nonlinear term x^2. Instead it means the model is linear in its coefficients: the predicted variable is a linear combination of the regressors, whatever transformations of x those regressors happen to be.
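To drive that point home, here's a minimal sketch (reusing x and y from Example 1) that solves the least-squares problem by hand: stack the regressors into a design matrix, and the coefficients fall out of an ordinary linear solve.

X <- cbind(1, x, x^2)                         # design matrix: intercept, x, x^2
beta <- solve(crossprod(X), crossprod(X, y))  # normal equations: (X'X) beta = X'y
beta                                          # matches coef(fit) above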

Example 2: Fractional exponents

We can even generalize this beyond polynomials: any fixed, known transformation of x can serve as a regressor, including (say) fractional powers.

set.seed(16)
x <- 0:100
y <- 5 + 5*x^0.5 - 1.2*x^0.7 + rnorm(length(x), 0, 1)
plot(x, y)
fit <- lm(y ~ 1 + I(x^0.5) + I(x^0.7))
points(x, predict(fit), type="l")

Summary:

> summary(fit)

Call:
lm(formula = y ~ 1 + I(x^0.5) + I(x^0.7))

Residuals:
    Min      1Q  Median      3Q     Max
-2.3337 -0.7138  0.0494  0.7518  3.2104

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.3829     0.6394   8.418 3.23e-13 ***
I(x^0.5)      4.8859     0.4149  11.775  < 2e-16 ***
I(x^0.7)     -1.1694     0.1477  -7.917 3.81e-12 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.024 on 98 degrees of freedom
Multiple R-squared:  0.9396,  Adjusted R-squared:  0.9384
F-statistic: 762.6 on 2 and 98 DF,  p-value: < 2.2e-16
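And because this is still an ordinary linear model, the usual lm machinery applies unchanged; for instance, run against the fit above:

confint(fit)                                 # 95% intervals; should bracket the true 5, 5, -1.2
head(predict(fit, interval = "confidence"))  # pointwise intervals for the fitted curve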

Looking ahead

The models above are either polynomial or polynomial-like, so linear regression works: every unknown parameter enters as a coefficient on a known regressor. In some cases, though, a parameter appears somewhere other than as a coefficient, such as in an exponent. An example would be

|r| ~ a * L^b

where a and b are the parameters to be estimated. That calls for actual nonlinear regression, which I'll write about in a follow-up post.
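As a preview, base R's nls() handles that kind of model. Here's a minimal sketch with simulated data, made-up true parameter values, and arbitrary starting values, just to show the shape of the call:

set.seed(16)
L <- 1:50
r <- 2.0 * L^0.75 + rnorm(length(L), 0, 1)            # true a = 2, true b = 0.75
nfit <- nls(r ~ a * L^b, start = list(a = 1, b = 1))  # a and b estimated jointly
coef(nfit)                                            # estimates near 2 and 0.75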

Originally published on my old blog on July 6, 2016.