# Polynomial Regression in R

Aug 1, 2017

At first glance, polynomial fits would appear to involve nonlinear regression. In fact, a polynomial fit is just a linear fit involving predictors of the form x, x^2, …, x^d.

## Example 1: Polynomial fit

First, let’s generate a data set:

```r
set.seed(16)
x <- 0:50
y <- 2.3 - 15.1*x + 1.2*x^2 + rnorm(length(x), 20, 50)
plot(x, y)
```

(`rnorm` generates normally distributed random noise, here with mean 20 and standard deviation 50.)

Now let’s fit a model using linear regression:

```r
fit <- lm(y ~ 1 + x + I(x^2))
points(x, predict(fit), type="l")
```

Here’s a summary of the fit:

```
> summary(fit)

Call:
lm(formula = y ~ 1 + x + I(x^2))

Residuals:
    Min      1Q  Median      3Q     Max 
-92.173 -28.968   3.673  24.953  97.269 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  33.84216   18.36178   1.843   0.0715 .  
x           -15.28705    1.69836  -9.001 7.07e-12 ***
I(x^2)        1.20126    0.03285  36.569  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 45.44 on 48 degrees of freedom
Multiple R-squared:  0.996,     Adjusted R-squared:  0.9959 
F-statistic:  6034 on 2 and 48 DF,  p-value: < 2.2e-16
```

This example shows that we can apply linear regression well beyond cases where the data-generating mechanism is itself linear. Indeed, the word linear in linear regression has nothing to do with the nature of the regressors, which can be arbitrary transformations of the raw inputs. Instead it means the model is linear in its coefficients: the predicted variable is a linear combination of the regressors.
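To make this concrete, here's a sketch showing that the polynomial fit above is nothing more than ordinary least squares on a design matrix whose columns are 1, x, and x^2. Solving the normal equations by hand reproduces `lm()`'s coefficients (variable names here are illustrative):

```r
set.seed(16)
x <- 0:50
y <- 2.3 - 15.1*x + 1.2*x^2 + rnorm(length(x), 20, 50)

X <- cbind(1, x, x^2)                  # design matrix: columns 1, x, x^2
beta <- solve(t(X) %*% X, t(X) %*% y)  # ordinary least squares via the normal equations

# Same coefficients as lm():
fit <- lm(y ~ 1 + x + I(x^2))
cbind(normal_eq = as.numeric(beta), lm = as.numeric(coef(fit)))
```

(In practice `lm()` uses a QR decomposition rather than the normal equations, which is numerically more stable, but the estimates agree.)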

## Example 2: Fractional exponents

We can even generalize this beyond polynomials to include non-polynomial expressions involving (say) fractional exponents.

```r
set.seed(16)
x <- 0:100
y <- 5 + 5*x^0.5 - 1.2*x^0.7 + rnorm(length(x), 0, 1)
plot(x, y)
fit <- lm(y ~ 1 + I(x^0.5) + I(x^0.7))
points(x, predict(fit), type="l")
```

Summary:

```
> summary(fit)

Call:
lm(formula = y ~ 1 + I(x^0.5) + I(x^0.7))

Residuals:
    Min      1Q  Median      3Q     Max 
-2.3337 -0.7138  0.0494  0.7518  3.2104 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   5.3829     0.6394   8.418 3.23e-13 ***
I(x^0.5)      4.8859     0.4149  11.775  < 2e-16 ***
I(x^0.7)     -1.1694     0.1477  -7.917 3.81e-12 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.024 on 98 degrees of freedom
Multiple R-squared:  0.9396,    Adjusted R-squared:  0.9384 
F-statistic: 762.6 on 2 and 98 DF,  p-value: < 2.2e-16
```
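As a quick sanity check, the estimates land close to the true coefficients (5, 5, and -1.2) used to generate the data; `confint()` makes this concrete:

```r
set.seed(16)
x <- 0:100
y <- 5 + 5*x^0.5 - 1.2*x^0.7 + rnorm(length(x), 0, 1)
fit <- lm(y ~ 1 + I(x^0.5) + I(x^0.7))
confint(fit)  # 95% intervals; each brackets the corresponding true coefficient
```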

The models above are either polynomial or polynomial-like, and so linear regression works. There are, however, cases where a parameter appears not as a coefficient but as an exponent. An example would be

`|r| ~ a * L^b`

where `a` and `b` are the parameters to be estimated. There we'll need actual nonlinear regression and I'll write about that in a follow-up post.
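As a preview, here's a minimal sketch of fitting such a model with R's built-in `nls()` function. The synthetic data and the true values a = 2, b = 0.75 are made up for illustration:

```r
# Synthetic data from a power law with made-up parameters a = 2, b = 0.75
set.seed(16)
L <- 1:50
r <- 2 * L^0.75 + rnorm(length(L), 0, 1)

# Nonlinear least squares: b is now estimated as an exponent,
# so we must supply starting values for the optimizer
fit_nls <- nls(r ~ a * L^b, start = list(a = 1, b = 1))
coef(fit_nls)  # estimates of a and b
```

Unlike `lm()`, `nls()` fits iteratively and needs starting values, and a poor starting point can prevent convergence.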

Originally published on my old blog on July 6, 2016.
