Multiple Regression Venture

Sainjeev Srikantha
Published in Human Systems Data
Mar 29, 2017

A linear regression shows how one variable affects another. As the name suggests, it fits a straight line describing how the y value changes as the x value changes. If this is starting to sound like math, it’s because it is. Remember slope? Y = mX + b, where Y is the value we wish to find, m is the slope (the change in Y for a one-unit change in X), X is our variable, and b is the intercept, the value of Y when X is zero. Now let’s switch gears to data analysis. In research, X is our independent (influencing) variable and Y is our dependent variable, or measure. With this in mind, we can use the slope equation, which is our linear regression equation, to estimate how the independent variable influences the dependent variable, and even to predict the dependent variable from a given value of the independent variable (Linear Regression, 2017).
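As a minimal sketch of the slope equation in R, the block below fits a simple linear regression with lm() on the built-in mtcars data set, using mpg as Y and car weight wt as X. The choice of variables is an illustrative assumption, not part of the original analysis; the block also checks that plugging a value into Y = mX + b by hand gives the same answer as predict().

```r
# Simple linear regression Y = mX + b on R's built-in mtcars data.
# mpg (Y) and wt (X, in 1000 lbs) are an illustrative choice only.
fit <- lm(mpg ~ wt, data = mtcars)

b <- unname(coef(fit)["(Intercept)"])  # intercept: predicted mpg when wt = 0
m <- unname(coef(fit)["wt"])           # slope: change in mpg per extra 1000 lbs

# Predict mpg for a hypothetical 3,000 lb car, by hand and with predict()
x_new <- 3
by_hand <- m * x_new + b
by_predict <- unname(predict(fit, newdata = data.frame(wt = x_new)))

print(c(intercept = b, slope = m))
print(c(by_hand = by_hand, by_predict = by_predict))
```

For mtcars this gives an intercept of roughly 37.29 and a slope of roughly -5.34, so each extra 1,000 lbs of weight is associated with about 5.3 fewer miles per gallon.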

Now let’s get a little more complicated. A multiple regression is similar to a linear regression, except that instead of one independent variable affecting a dependent variable, multiple independent variables influence it. With two independent variables, the equation for a multiple regression looks like this: Y = a + b1X1 + b2X2, where Y is our dependent variable, a is the intercept (the b of the simple equation above), b1 and b2 are the coefficients, and X1 and X2 are the independent variables (R Multiple Regression, 2017).
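The same idea with two predictors can be sketched with lm() on mtcars. Using wt and hp as X1 and X2 is an illustrative assumption, not the tutorial's exact code; the point is that computing a + b1X1 + b2X2 by hand matches predict().

```r
# Multiple regression Y = a + b1*X1 + b2*X2 on R's built-in mtcars data.
# wt (X1) and hp (X2) are an illustrative choice of predictors.
fit2 <- lm(mpg ~ wt + hp, data = mtcars)

a  <- unname(coef(fit2)["(Intercept)"])
b1 <- unname(coef(fit2)["wt"])  # change in mpg per 1000 lbs, holding hp fixed
b2 <- unname(coef(fit2)["hp"])  # change in mpg per horsepower, holding wt fixed

# Plug a hypothetical car (wt = 3, hp = 150) into the equation by hand
x1 <- 3
x2 <- 150
by_hand <- a + b1 * x1 + b2 * x2
by_predict <- unname(predict(fit2, newdata = data.frame(wt = x1, hp = x2)))

print(c(a = a, b1 = b1, b2 = b2))
print(c(by_hand = by_hand, by_predict = by_predict))
```

Each coefficient is read "holding the other predictor fixed", which is what distinguishes it from the slope of a one-variable regression.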

After reading through the R Multiple Regression tutorial and following along as it performed a multiple regression on the mtcars data set in R, I was ready (I think) to do my own. I decided to use the esoph data set in R, which comes from a case-control study of esophageal cancer. It has 88 rows, each describing an age, alcohol, and tobacco group rather than an individual subject, and records the age group, alcohol consumption group, tobacco consumption group, number of cancer cases, and number of controls for that group (Crawley, 2015). My goal is to see how alcohol consumption and tobacco consumption influence the number of cancer cases.
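A quick way to confirm what esoph contains before modelling is sketched below. It uses only base R and assumes the standard datasets package is attached, as it is by default.

```r
# Inspect the built-in esoph data frame (from R's datasets package).
str(esoph)

# Each of the 88 rows is an age/alcohol/tobacco group, not one person;
# ncases and ncontrols count the people in that group.
print(nrow(esoph))
print(levels(esoph$alcgp))      # four ordered alcohol-consumption groups
print(levels(esoph$tobgp))      # four ordered tobacco-consumption groups
print(is.ordered(esoph$alcgp))  # TRUE: the levels have a natural order
print(sum(esoph$ncases))        # total cancer cases across all groups
```

Because each row is a group, ncases is a count per group, so any prediction from the model is a predicted number of cases for a group, not a risk for one person.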

Using RStudio, I will attempt a multiple regression.

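data(esoph)  # esoph ships with R's datasets package; load() is only for .RData files
View(esoph)  # open the data in RStudio's data viewer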

These commands load the esoph data set and open it for viewing.

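Data <- esoph[, c("ncases", "alcgp", "tobgp")]  # keep cases, alcohol group, tobacco group
print(head(Data))  # first six rows
print(Data)        # all 88 rows
View(Data)         # open the subset in RStudio's viewer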

From the esoph data set I created a new data frame containing only the number of cancer cases and the alcohol and tobacco consumption groups.

Now that the variables are isolated, I fitted a model relating them in order to find how alcohol consumption and tobacco consumption affect the number of cancer cases.

relationship <- glm(ncases ~ factor(alcgp) + factor(tobgp), data = Data)  # gaussian family by default, i.e. least squares
print(relationship)

Call: glm(formula = ncases ~ factor(alcgp) + factor(tobgp), data = Data)

Coefficients:
(Intercept) factor(alcgp).L factor(alcgp).Q factor(alcgp).C factor(tobgp).L factor(tobgp).Q factor(tobgp).C
2.2158 0.3504 -1.1429 0.7824 -1.2943 0.4095 0.1733

Degrees of Freedom: 87 Total (i.e. Null); 81 Residual
Null Deviance: 659.5
Residual Deviance: 571 AIC: 430.3
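The .L, .Q and .C suffixes come from how R encodes ordered factors. The sketch below, which assumes R's default contrasts options, prints the contrast matrix R uses for alcgp and also checks that glm() with its default gaussian family returns the same coefficients as lm().

```r
# alcgp and tobgp are ordered factors, and R's default contrasts for ordered
# factors are polynomial (contr.poly): linear (.L), quadratic (.Q) and
# cubic (.C) trends across the 4 levels.
Data <- esoph[, c("ncases", "alcgp", "tobgp")]
print(getOption("contrasts"))   # unordered vs ordered contrast defaults
print(contrasts(Data$alcgp))    # the 4 x 3 matrix behind the .L/.Q/.C columns

# glm() with the default gaussian family is ordinary least squares,
# so lm() should return the same coefficients.
relationship <- glm(ncases ~ factor(alcgp) + factor(tobgp), data = Data)
ols <- lm(ncases ~ factor(alcgp) + factor(tobgp), data = Data)
print(all.equal(coef(relationship), coef(ols)))
```

Note that factor() applied to an ordered factor keeps it ordered, which is why the polynomial coding survives the factor() calls in the formula.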

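The fitted model gives the intercept and coefficients for a multiple regression equation of alcohol and tobacco consumption on the number of cancer cases. Because alcgp and tobgp are ordered factors, R codes each one with polynomial contrasts: .L (linear), .Q (quadratic), and .C (cubic) trends across the four levels. I used the coefficients ending in .L, such as factor(alcgp).L, because they describe the linear trend; leaving out the .Q and .C terms makes the equation below a simplification of the full model.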

The equation I came up with is:

Y = 2.2158 + 0.3504x1 - 1.2943x2

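Now, using this equation, I can predict the number of cancer cases, where x1 stands for the alcohol group and x2 for the tobacco group. I realized that the independent variables in this data set are categorical: each has four ordered levels, 1–4, that translate to an amount of alcohol or tobacco consumed. Because R coded these ordered factors with polynomial contrasts, though, x1 and x2 are not the raw level numbers 1–4 but the linear contrast scores for those levels, roughly -0.671, -0.224, 0.224, and 0.671, and those scores are what should be plugged in to predict the dependent variable.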
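A sketch of that prediction in R, assuming the model fitted above: it evaluates the simplified .L-only equation for one hypothetical group (alcohol level 3, tobacco level 2) using the linear contrast scores from contr.poly(4), and compares it with predict() on the full model, which also includes the .Q and .C terms. The group chosen is illustrative only.

```r
# Refit the post's model so this block is self-contained.
Data <- esoph[, c("ncases", "alcgp", "tobgp")]
relationship <- glm(ncases ~ factor(alcgp) + factor(tobgp), data = Data)
cf <- coef(relationship)

# Linear contrast scores for 4 ordered levels: (-3, -1, 1, 3) / sqrt(20)
lin <- unname(contr.poly(4)[, ".L"])

# Hypothetical example group: alcohol level 3, tobacco level 2
alc <- 3
tob <- 2

# Simplified equation: intercept plus the linear (.L) terms only
simplified <- unname(cf["(Intercept)"] +
  cf["factor(alcgp).L"] * lin[alc] +
  cf["factor(tobgp).L"] * lin[tob])

# Full-model prediction for the same group (uses .L, .Q and .C terms)
newgroup <- data.frame(
  alcgp = factor(levels(Data$alcgp)[alc], levels = levels(Data$alcgp), ordered = TRUE),
  tobgp = factor(levels(Data$tobgp)[tob], levels = levels(Data$tobgp), ordered = TRUE)
)
full <- unname(predict(relationship, newdata = newgroup))

print(c(simplified = simplified, full = full))
```

The two numbers generally differ; the gap is the contribution of the quadratic and cubic terms that the simplified equation leaves out, so predict() on the full model is the safer way to get a predicted count for a group.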

Works Cited

Crawley, M. J. (2015). The R book. Chichester: Wiley.

Linear Regression. (n.d.). Retrieved March 29, 2017, from http://www.stat.yale.edu/Courses/1997-98/101/linreg.htm

T. (n.d.). R Multiple Regression. Retrieved March 29, 2017, from https://www.tutorialspoint.com/r/r_multiple_regression.htm
