Regression showdown: R vs SPSS

Jacob Willinger
Published in Human Systems Data
Mar 29, 2017 · 5 min read

Straying from a conventional reading this week, we were instructed to work through an exercise on how to run a multiple regression in R, found here: https://www.tutorialspoint.com/r/r_multiple_regression.htm.

Per our instructor's suggestion, I used this as an opportunity to compare and contrast the regression process between R and SPSS, focusing exclusively on the utility of the outputs.

I chose not to focus on comparing the process leading up to the output because of the fundamental differences between the two programs. R requires active input manipulation and an understanding of its core functionality (as well as a predisposition for patience). If you know R well, you can manipulate your data and outputs in ways that SPSS never could. SPSS, on the other hand, takes a much more straightforward approach, so much so that to run a regression, the user literally selects “Regression” from the analysis menu. This makes it much friendlier to the user, especially for quickly entering and analyzing data.
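
For contrast, the entire R “process” for the model used in this post boils down to a few lines. A minimal sketch, using the built-in mtcars data the same way the tutorial does:

# subset the built-in mtcars data to the variables of interest
input <- mtcars[, c("mpg", "disp", "hp", "wt")]
# fit a multiple regression of mpg on disp, hp, and wt
mtmodel <- lm(mpg ~ disp + hp + wt, data = input)
print(mtmodel)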

These key differences are why I felt it would be best to focus solely on the output of a basic regression in R versus the output of a basic regression in SPSS, since this is the first real common ground the two share.

R regression

Following the instructions in the exercise, the output they show is slightly different from the output that R actually produces.

Exercise output:

Call:
lm(formula = mpg ~ disp + hp + wt, data = input)

Coefficients:
(Intercept)         disp           hp           wt
  37.105505    -0.000937    -0.031157    -3.800891

# # # # The Coefficient Values # # #
(Intercept)
37.10551
disp
-0.0009370091
hp
-0.03115655
wt
-3.800891

Actual R output:

Call:
lm(formula = mpg ~ disp + hp + wt, data = mtcars12)

Coefficients:
(Intercept)         disp           hp           wt
  37.105505    -0.000937    -0.031157    -3.800891
>
> cat("# # # # The Coefficient Values # # # ","\n")
# # # # The Coefficient Values # # #
>
> mtmodel2 <- coef(mtmodel)[1]
> print(mtmodel2)
(Intercept)
37.10551
>
> Xdisp <- coef(mtmodel)[2]
> Xhp <- coef(mtmodel)[3]
> Xwt <- coef(mtmodel)[4]
>
> print(Xdisp)
disp
-0.0009370091
> print(Xhp)
hp
-0.03115655
> print(Xwt)
wt
-3.800891

While the output values are the same, this console output includes the actual commands from the original code. This may seem minor, and it can easily be cleaned up with a bit of manual cutting, but in terms of readability it certainly makes the output harder to scan. Fortunately, there is a very easy way to avoid the issue in RStudio: instead of Ctrl + Shift + Enter (Source with Echo), the appropriate command is Ctrl + Shift + S (Source), which produces the output without the command lines.
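
Those shortcuts are RStudio's; the script-level equivalent is source(), which by default does not echo the commands it runs. A sketch, assuming a hypothetical script file regression.R:

# source() prints only explicit output (print(), cat(), etc.),
# not the commands themselves
source("regression.R")
# echo = TRUE reproduces the console-style transcript instead
source("regression.R", echo = TRUE)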

Since we can easily mitigate the above problem, let's again consider the original exercise output. I'm curious what utility is gained by listing the coefficients a second time, vertically, under “# # # # The Coefficient Values # # #”. My guess is that the tutorial is simply demonstrating how to extract and format the results, so this is probably a non-issue as well.

The biggest problem, then, seems to be that the default output contains only baseline information based exclusively on what you put in. You have your coefficient values and your intercept, but that's it. Since the running theme is that we need context for our data, unless you have the R skills to elicit the other information, a shortcoming of a basic R regression is that there just isn't enough information to be truly representative.
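
In fairness, that context is one call away if you know to ask for it. A minimal sketch, reusing the mtmodel object fitted above:

# summary() adds standard errors, t-statistics, p-values,
# residual statistics, and R-squared to the bare coefficients
summary(mtmodel)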

SPSS Regression

A basic SPSS regression is able to answer this problem. It is easy enough to set up the regression and add in some additional statistics. And although I said I wouldn't harp on the process, the Statistics dialog shows how easily additional information can be added to your output:

[Screenshot: the SPSS Linear Regression Statistics dialog]

After running the regression, the main part of your output, the coefficients table, will look like this:

[Screenshot: the SPSS coefficients table]

Needless to say, the output that comes with a stock SPSS regression is handily more informative than R's. You have your regression coefficients, their standard errors, the t-statistics, significance levels, and confidence intervals, among others. This gives some much-needed context to your data and helps you better interpret and understand it.
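
(For what it's worth, even the confidence intervals, which summary() leaves out, are a single call in R. A sketch, again assuming the mtmodel object from above:)

# 95% confidence intervals for the coefficients (the default level)
confint(mtmodel)
# other levels are one argument away
confint(mtmodel, level = 0.99)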

On a different note, I immediately noticed that the coefficient values were different for the SPSS output compared to the R output:

R: 37.105505, -0.000937, -0.031157, -3.800891

SPSS: 35.157, -0.014, -0.020, -2.866

My first thought was that there was an error in the values when transferring the data into SPSS. Then I looked at the standard error values: combining the B and Std. Error values in SPSS would put you almost exactly at the values from R, which briefly made me suspect that R's output doesn't account for standard error in a way that is useful to the user. But both R's lm() and SPSS fit a regression by ordinary least squares, so identical data must produce identical coefficients; the standard error is reported alongside an estimate, never added into it. The near-match is most likely coincidence, and the real culprit is probably my first thought after all: some difference in the data the two programs actually received.
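
To convince myself, a quick sketch (assuming the full built-in mtcars data) shows the coefficients fall straight out of the least-squares normal equations, with no standard-error adjustment anywhere:

# compute the OLS coefficients by hand via the normal equations
X <- cbind(1, as.matrix(mtcars[, c("disp", "hp", "wt")]))  # design matrix with intercept column
y <- mtcars$mpg
solve(t(X) %*% X, t(X) %*% y)
# matches coef(lm(mpg ~ disp + hp + wt, data = mtcars)) exactly:
# 37.105505, -0.000937, -0.031157, -3.800891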

All of this considered, it's not to say the SPSS output is ideal. There is a significant amount of output to sort through, and unless you know where to look and how to read the tables correctly, you may be confused about how to extract the regression model from it.

Overall, and on a very basic level, it seems the immediate benefit of the SPSS output is easily accessed, much-needed context. The R output has the benefit of simplicity. It tells you what your intercept is; it tells you what your coefficients are. In fact, the way it prints them is pretty much writing your equation for you. Just add in Y = and the +'s:

Coefficients:
(Intercept)         disp           hp           wt
  37.105505    -0.000937    -0.031157    -3.800891
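
(If you'd rather not type the equation out, a small sketch can assemble it from the fitted model, again assuming the mtmodel object from earlier:)

# build the regression equation string from the coefficients
b <- coef(mtmodel)
cat("mpg =", round(b[1], 4),
    paste("+", round(b[-1], 6), "*", names(b)[-1], collapse = " "), "\n")
# mpg = 37.1055 + -0.000937 * disp + -0.031157 * hp + -3.800891 * wt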

I will note once again how different these tools are. While I will likely keep SPSS as my go-to, I would be remiss if I did not mention that if I had the prowess and understanding for R, nothing could ever tear me away from it.

References

R — Multiple Regression. Retrieved from Tutorials Point: Simple Learning: https://www.tutorialspoint.com/r/r_multiple_regression.htm
