# Polynomial Regression in R

At first glance, polynomial fits would appear to involve nonlinear regression. In fact, polynomial fits are just linear fits involving predictors of the form *x*1, *x*2, …, *xd*.

### Example 1: Polynomial fit

First, let’s generate a data set:

`set.seed(16)`

x <- 0:50

y <- 2.3 - 15.1*x + 1.2*x^2 + rnorm(length(x), 20, 50)

plot(x, y)

(`rnorm`

just generates random, normally distributed noise.)

Now let’s fit a model using linear regression:

`fit <- lm(y ~ 1 + x + I(x^2))`

points(x, predict(fit), type="l")

Here’s a summary of the fit:

`> summary(fit)`

`Call:`

lm(formula = y ~ 1 + x + I(x^2))

`Residuals:`

Min 1Q Median 3Q Max

-92.173 -28.968 3.673 24.953 97.269

`Coefficients:`

Estimate Std. Error t value Pr(>|t|)

(Intercept) 33.84216 18.36178 1.843 0.0715 .

x -15.28705 1.69836 -9.001 7.07e-12 ***

I(x^2) 1.20126 0.03285 36.569 < 2e-16 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

`Residual standard error: 45.44 on 48 degrees of freedom`

Multiple R-squared: 0.996, Adjusted R-squared: 0.9959

F-statistic: 6034 on 2 and 48 DF, p-value: < 2.2e-16

This example shows that we can apply linear regression outside of cases involving linear data-generating mechanisms. Indeed the word *linear* in linear regression doesn’t have anything to do with the nature of or relationship between the regressors themselves. Instead it refers to the relationship between the predicted variable and the regressors.

### Example 2: Fractional exponents

We can even generalize this beyond polynomials to include non-polynomial expressions involving (say) fractional exponents.

`set.seed(16)`

x <- 0:100

y <- 5 + 5*x^0.5 - 1.2*x^0.7 + rnorm(length(x), 0, 1)

plot(x, y)

fit <- lm(y ~ 1 + I(x^0.5) + I(x^0.7))

points(x, predict(fit), type="l")

Summary:

`> summary(fit)`

`Call:`

lm(formula = y ~ 1 + I(x^0.5) + I(x^0.7))

`Residuals:`

Min 1Q Median 3Q Max

-2.3337 -0.7138 0.0494 0.7518 3.2104

`Coefficients:`

Estimate Std. Error t value Pr(>|t|)

(Intercept) 5.3829 0.6394 8.418 3.23e-13 ***

I(x^0.5) 4.8859 0.4149 11.775 < 2e-16 ***

I(x^0.7) -1.1694 0.1477 -7.917 3.81e-12 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

`Residual standard error: 1.024 on 98 degrees of freedom`

Multiple R-squared: 0.9396, Adjusted R-squared: 0.9384

F-statistic: 762.6 on 2 and 98 DF, p-value: < 2.2e-16

### Looking ahead

The models above are either polynomial or polynomial-like, and so linear regression works. There are however some cases where instead of dealing with models having coefficient parameters, we have models where one or more exponents is itself a model parameter. An example would be

`|r| ~ a * L^b`

where `a`

and `b`

are the parameters to be estimated. There we'll need actual nonlinear regression and I'll write about that in a follow-up post.

*Originally published on my old blog on July 6, 2016.*