Let’s Regress with R using Multiple Regression

Jennifer Williams
Published in Human Systems Data
Mar 28, 2017

Multiple regression has been around since the early 1900s. It describes the relationship between several predictor (independent) variables and a single criterion (dependent) variable, and it is widely used for research in the social and natural sciences. This week we are exploring multiple regression with R. I will admit that I have trouble picking up R code, but I will attempt to walk us through it.

Base R ships with a “datasets” package that gives the user data to practice with. Today we are using the mtcars dataset and following the tutorial from tutorialspoint.com. We start with the lm() function in R, which stands for linear model and takes a formula describing the model (ex: lm(y ~ x1 + x2 + x3, data = …)). The mtcars dataset describes different car models along with some measurements for comparing them. Its columns are:

mpg cyl disp hp drat wt qsec vs am gear carb

Here we will look at only some of the variables for our model: mpg (miles per gallon), disp (displacement), hp (horsepower) and wt (weight) of each car.

R code — input <- mtcars[, c("mpg", "disp", "hp", "wt")]

This keeps only these four columns of mtcars and stores them as input, the data for our model. We now need to create the model that relates mpg to the other variables through their coefficients.
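Before fitting anything, it can help to check that the subset looks the way we expect. This is a minimal sketch using only base R and the built-in mtcars data; the calls to dim() and head() are my addition and are not part of the tutorial.

```r
# Select the four columns used in this walkthrough from the built-in mtcars data
input <- mtcars[, c("mpg", "disp", "hp", "wt")]

# Number of rows (cars) and columns (variables) in the subset
dim(input)

# First three cars, to confirm the right columns were kept
head(input, 3)
```

The subset should have one row per car and exactly the four columns named above.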

R code — model <- lm(mpg ~ disp + hp + wt, data = input)

After running this line with Ctrl + Enter, we can print the model we have fitted for mpg.

R code — print(model)

Coefficients

(Intercept) disp hp wt

37.105505 -0.000937 -0.031157 -3.800891

This shows us the intercept and the coefficients for disp, hp, and wt. To build our regression equation we need the intercept and the slope for each predictor, so we will pull these values out of the model.

R code — a <- coef(model) [1]

print (a)

(Intercept) 37.10551

Here we store the intercept of the equation we are building as a. Now we will get the coefficients for the three predictors.

R code — Xdisp <- coef(model) [2]

Xhp <- coef(model) [3]

Xwt <- coef(model) [4]

Let’s look at these values now:

R code — print(Xdisp)

print(Xhp)

print(Xwt)

disp -0.0009370091

hp -0.03115655

wt -3.800891
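The coefficients above are picked out by position ([1], [2], and so on). As a hedged alternative, coef() returns a named vector, so the same values can be pulled out by name, which keeps working if the order of the predictors in the formula changes. The variable coefs is a name I introduced for illustration.

```r
# Rebuild the data and model so this block runs on its own
input <- mtcars[, c("mpg", "disp", "hp", "wt")]
model <- lm(mpg ~ disp + hp + wt, data = input)

# coef() returns a named numeric vector: (Intercept), disp, hp, wt
coefs <- coef(model)

# Extract each value by name instead of by position
a     <- coefs["(Intercept)"]
Xdisp <- coefs["disp"]
Xhp   <- coefs["hp"]
Xwt   <- coefs["wt"]

# Show all four values rounded to six decimal places
round(coefs, 6)
```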

Now that we have the values of the coefficients, we run the model formula in R once more to display them together.

R code — lm(formula = mpg ~ disp + hp + wt, data = input)

Coefficients:

(Intercept) disp hp wt

37.105505 -0.000937 -0.031157 -3.800891

When we run this code in R we see the intercept and coefficient values for our equation. So we haven’t done this work for nothing: it leads us to our regression equation. The general form of the equation is:

y = a + Xdisp*x1 + Xhp*x2 + Xwt*x3

So let’s input the values we received from the R code we ran earlier, with x1, x2 and x3 standing for disp, hp and wt.

y = 37.105505 + (-0.000937)*x1 + (-0.031157)*x2 + (-3.800891)*x3

We have set up a regression equation that can predict miles per gallon when we know the displacement, horsepower and weight of a car. So now we can input data from mtcars to predict the value of y.

Let’s see the mpg for the Datsun 710 by replacing x1, x2 and x3 with its values from the mtcars dataset (disp = 108, hp = 93, wt = 2.320). Then we can get a prediction for mpg.

y = 37.105505 + (-0.000937)*108 + (-0.031157)*93 + (-3.800891)*2.320

mpg = 25.28864088
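The hand calculation can be cross-checked in R. The sketch below computes the prediction for the Datsun 710 twice, once from the coefficient vector and once with predict(), and also looks up the observed mpg for comparison. The names datsun, b, manual and predicted are mine, and this is one possible way to do the check rather than the tutorial’s method.

```r
# Fit the same model as above
model <- lm(mpg ~ disp + hp + wt, data = mtcars)

# Predictor values for the Datsun 710, taken from its row in mtcars
datsun <- mtcars["Datsun 710", c("disp", "hp", "wt")]

# Manual calculation: intercept plus each coefficient times its predictor
b <- coef(model)
manual <- b["(Intercept)"] +
  b["disp"] * datsun$disp +
  b["hp"]   * datsun$hp +
  b["wt"]   * datsun$wt

# The same prediction using predict() on a one-row data frame
predicted <- predict(model, newdata = datsun)

unname(manual)               # prediction from the equation
unname(predicted)            # prediction from predict()
mtcars["Datsun 710", "mpg"]  # observed mpg, for comparison
```

Because predict() uses the unrounded coefficients, its result can differ from a hand calculation in the later decimal places.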

So now I have decided to go one step further and run an analysis of variance on the same model, again using mpg, disp, hp and wt from the mtcars dataset.

data(mtcars)

lm(formula = mpg ~ disp + hp + wt, data = mtcars)

Here we store the fitted model as bestfit so that we can pass it to anova().

bestfit <- lm(mpg ~ disp + hp + wt, data = mtcars)

anova(bestfit)

Analysis of Variance Table

Response: mpg

          Df Sum Sq Mean Sq  F value    Pr(>F)
disp       1 808.89  808.89 116.1536 1.797e-11 ***
hp         1  33.67   33.67   4.8342  0.036329 *
wt         1  88.50   88.50  12.7087  0.001331 **
Residuals 28 194.99    6.96
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
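To see where the F values in this table come from, each term’s Mean Sq can be divided by the residual Mean Sq. The sketch below does that arithmetic and puts it next to the values anova() reports. The names tab, terms, ms_resid and f_manual are mine, for illustration only.

```r
# Fit the model and compute the ANOVA table
bestfit <- lm(mpg ~ disp + hp + wt, data = mtcars)
tab <- anova(bestfit)

# Residual mean square: the denominator of every F value in the table
ms_resid <- tab["Residuals", "Mean Sq"]

# F value for each term = that term's mean square / residual mean square
terms <- c("disp", "hp", "wt")
f_manual <- tab[terms, "Mean Sq"] / ms_resid

# Compare the hand-computed F values with the ones anova() reports
cbind(manual = f_manual, reported = tab[terms, "F value"])
```

Note that anova() on a single lm fit adds the terms in the order they appear in the formula, so each row is tested after the rows above it.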

Now let’s plot bestfit for our analysis.

plot(bestfit)

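Calling plot() on a fitted model does not draw a single line of best fit. Instead it gives us a series of diagnostic plots for our model of mpg on disp, hp and wt: residuals vs. fitted values, a normal Q-Q plot, a scale-location plot, and residuals vs. leverage. This week’s R assignment on multiple regression was informative and I hope I was able to explain it well.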
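By default plot() on an lm object shows its diagnostic panels one after another. As a small sketch, the panels can be arranged in a single 2 x 2 grid with par(mfrow = ...). Writing to a temporary PDF is my own choice so the block also runs in a non-interactive session; in an interactive session the pdf() and dev.off() lines can be dropped.

```r
# Fit the model used for the ANOVA
bestfit <- lm(mpg ~ disp + hp + wt, data = mtcars)

# Draw to a temporary PDF file (assumption: used here only so the sketch
# runs without an open graphics window)
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

# Arrange the diagnostic plots in a 2 x 2 grid, then restore the old layout
old_par <- par(mfrow = c(2, 2))
plot(bestfit)
par(old_par)

# Close the PDF device so the file is written
dev.off()
```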

References:

https://www.tutorialspoint.com/r/r_multiple_regression.htm
