Polynomial Regression in R

Willie Wheeler
Aug 1, 2017 · 3 min read

At first glance, polynomial fits would appear to involve nonlinear regression. In fact, polynomial fits are just linear fits involving predictors of the form x, x^2, …, x^d.

Example 1: Polynomial fit

First, let’s generate a data set:

set.seed(16)
x <- 0:50
y <- 2.3 - 15.1*x + 1.2*x^2 + rnorm(length(x), 20, 50)
plot(x, y)

(rnorm just generates normally distributed random noise, here with mean 20 and standard deviation 50.)

Now let’s fit a model using linear regression:

fit <- lm(y ~ 1 + x + I(x^2))
points(x, predict(fit), type="l")

Here’s a summary of the fit:

> summary(fit)

Call:
lm(formula = y ~ 1 + x + I(x^2))

Residuals:
    Min      1Q  Median      3Q     Max
-92.173 -28.968   3.673  24.953  97.269

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  33.84216   18.36178   1.843   0.0715 .
x           -15.28705    1.69836  -9.001 7.07e-12 ***
I(x^2)        1.20126    0.03285  36.569  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 45.44 on 48 degrees of freedom
Multiple R-squared:  0.996, Adjusted R-squared:  0.9959
F-statistic:  6034 on 2 and 48 DF,  p-value: < 2.2e-16

This example shows that we can apply linear regression outside of cases involving linear data-generating mechanisms. Indeed, the word linear in linear regression doesn't describe the relationship between the response and the original variable x. Instead it means the model is linear in its coefficients: the predicted variable is a linear combination of the regressors, whatever transformed forms those regressors happen to take.
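To make the linear-in-coefficients point concrete, here is a quick sketch (reusing the data-generating code from above): fitting with explicit I() terms and fitting with poly(x, 2, raw = TRUE), which builds the columns x and x^2 for you, give exactly the same coefficient estimates.

```r
set.seed(16)
x <- 0:50
y <- 2.3 - 15.1*x + 1.2*x^2 + rnorm(length(x), 20, 50)

# Fit via explicit I() terms, as in the example above
fit1 <- lm(y ~ 1 + x + I(x^2))

# Equivalent fit: poly(..., raw = TRUE) expands x into the columns x and x^2
fit2 <- lm(y ~ poly(x, 2, raw = TRUE))

# Same estimates; only the coefficient names differ
all.equal(unname(coef(fit1)), unname(coef(fit2)))
```

Either way, lm sees an ordinary design matrix with columns 1, x, and x^2, and solves the same least-squares problem.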

Example 2: Fractional exponents

We can even generalize this beyond polynomials to include non-polynomial expressions involving (say) fractional exponents.

set.seed(16)
x <- 0:100
y <- 5 + 5*x^0.5 - 1.2*x^0.7 + rnorm(length(x), 0, 1)
plot(x, y)
fit <- lm(y ~ 1 + I(x^0.5) + I(x^0.7))
points(x, predict(fit), type="l")

Summary:

> summary(fit)

Call:
lm(formula = y ~ 1 + I(x^0.5) + I(x^0.7))

Residuals:
    Min      1Q  Median      3Q     Max
-2.3337 -0.7138  0.0494  0.7518  3.2104

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.3829     0.6394   8.418 3.23e-13 ***
I(x^0.5)      4.8859     0.4149  11.775  < 2e-16 ***
I(x^0.7)     -1.1694     0.1477  -7.917 3.81e-12 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.024 on 98 degrees of freedom
Multiple R-squared:  0.9396, Adjusted R-squared:  0.9384
F-statistic: 762.6 on 2 and 98 DF,  p-value: < 2.2e-16
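A fitted model like this works with the usual lm machinery. As a sketch (reusing the second data set), predict() applies the I(x^0.5) and I(x^0.7) transformations automatically when you pass new x values:

```r
set.seed(16)
x <- 0:100
y <- 5 + 5*x^0.5 - 1.2*x^0.7 + rnorm(length(x), 0, 1)
fit <- lm(y ~ 1 + I(x^0.5) + I(x^0.7))

# Predict at a few new x values; the formula's transformations are
# reapplied to newdata, so we only need to supply x itself
newdata <- data.frame(x = c(25, 50, 75))
pred <- predict(fit, newdata, interval = "confidence")
pred
```

The result is a matrix with one row per new x value and columns for the fitted value and the lower and upper confidence bounds.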

Looking ahead

The models above are either polynomial or polynomial-like, and in every case the unknown parameters enter only as coefficients, so linear regression works. There are however some cases where one or more exponents is itself a model parameter. An example would be

|r| ~ a * L^b

where a and b are the parameters to be estimated. There we'll need actual nonlinear regression, and I'll write about that in a follow-up post.
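As a small preview, and purely as a sketch on simulated data (the variable names and starting values here are illustrative, not from any real data set), R's nls() can fit a model where the exponent b is itself a parameter:

```r
set.seed(16)
L <- 1:50
r <- 2 * L^0.8 + rnorm(length(L), 0, 1)

# Nonlinear least squares: both a and b are estimated directly.
# Unlike lm, nls needs starting values to begin its iterative search.
fit <- nls(r ~ a * L^b, start = list(a = 1, b = 1))
coef(fit)
```

With this simulated data the estimates land close to the true values a = 2 and b = 0.8. More on nls in the follow-up.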

Originally published on my old blog on July 6, 2016.

wwblog: Willie Wheeler's personal blog
