ANOVA with MTCARS

Hello again Human Systems Students!

Last week we discussed multiple regression in R using the mtcars dataset. The mtcars data were extracted from the 1974 Motor Trend US magazine and comprise fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models). This week’s reading is on ANOVA, or analysis of variance, a statistical method for testing whether the means of two or more groups differ.

A data frame with 32 observations on 11 variables.

[, 1] mpg Miles/(US) gallon 
[, 2] cyl Number of cylinders
[, 3] disp Displacement (cu.in.)
[, 4] hp Gross horsepower
[, 5] drat Rear axle ratio
[, 6] wt Weight (1000 lbs)
[, 7] qsec 1/4 mile time
[, 8] vs Engine (0 = V-shaped, 1 = straight)
[, 9] am Transmission (0 = automatic, 1 = manual)
[,10] gear Number of forward gears
[,11] carb Number of carburetors
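
To check this description against the data itself, here is a minimal sketch using only base R. It assumes nothing beyond the built-in mtcars data frame; the name dims is just an illustrative variable.

```r
# Structure: variable names, types, and the first few values of each column
str(mtcars)

# First few rows, with the car models as row names
head(mtcars)

# Dimensions: number of rows (cars) and columns (variables)
dims <- dim(mtcars)
dims
```

The dimensions should agree with the 32 observations and 11 variables listed above.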

Let us look at a graphic or plot of our mtcars dataset by running the following R code:

require(graphics)
# Scatterplot matrix of all 11 variables
pairs(mtcars, main = "mtcars data")
# mpg against displacement, one panel per cylinder count
coplot(mpg ~ disp | as.factor(cyl), data = mtcars,
       panel = panel.smooth, rows = 1)

The coplot shows that mpg is higher for cars with smaller engines (disp) and fewer cylinders (cyl). For 8-cylinder cars, mpg stays low across the whole range of displacement.
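
The same pattern can be checked numerically. This is a rough sketch, not part of the original analysis, that averages mpg and disp within each cylinder count; by_cyl is a name chosen here for illustration.

```r
# Mean mpg and mean displacement for each cylinder count (4, 6, 8)
by_cyl <- aggregate(cbind(mpg, disp) ~ cyl, data = mtcars, FUN = mean)
by_cyl
```

Reading down the rows, mean mpg should fall as the cylinder count and mean displacement rise.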

Let’s load the datasets package in R (it is attached by default, so this line is only a safeguard):

library(datasets)

Now let’s start by running a one-way ANOVA with mpg (miles per gallon) as the dependent variable and am (transmission type: 0 = automatic, 1 = manual) as the factor.

fit <- aov(mpg ~ am, data = mtcars)
summary(fit)

Here we have modeled mpg as a function of am:

            Df Sum Sq Mean Sq F value   Pr(>F)
am           1  405.2   405.2   16.86 0.000285 ***
Residuals   30  720.9    24.0
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
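
To see where the numbers in this table come from, the F value can be rebuilt by hand from the between-group and within-group sums of squares. This is a sketch in base R; the variable names (n_g, mean_g, ss_between and so on) are illustrative, not from the original post.

```r
# Group sizes and group means of mpg for each transmission type
n_g    <- tapply(mtcars$mpg, mtcars$am, length)
mean_g <- tapply(mtcars$mpg, mtcars$am, mean)
grand  <- mean(mtcars$mpg)

# Between-group SS: squared distance of each group mean from the grand mean
ss_between <- sum(n_g * (mean_g - grand)^2)

# Within-group SS: squared distance of each car from its own group mean
ss_within <- sum((mtcars$mpg - mean_g[as.character(mtcars$am)])^2)

# Degrees of freedom: (groups - 1) and (observations - groups)
df_between <- length(n_g) - 1
df_within  <- nrow(mtcars) - length(n_g)

# F statistic and its upper-tail p-value
f_value <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)
round(c(ss_between, ss_within, f_value), 2)
```

The three rounded values should line up with the Sum Sq and F value columns of the table.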

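With df = 1 for am, F = 16.86 and p = 0.000285, mean mpg differs significantly between automatic and manual cars. Adding disp as a second term, aov(mpg ~ am + disp, data = mtcars), gives a two-way ANOVA in which df = 1 for both am and disp and the F-values are 39.13 and 40.62, so both terms are significant for mpg.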
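
The fit behind the F-values 39.13 and 40.62 is not printed in this post. Here is a sketch of it, assuming the model was mpg on am plus disp with no interaction term; fit2 is a name chosen here so the earlier fit is not overwritten.

```r
# Two-way additive model: transmission type plus displacement
fit2 <- aov(mpg ~ am + disp, data = mtcars)
summary(fit2)
```

If the assumption is right, the F value column should show the two values quoted in the text.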

We can now add a third variable and look at a three-way ANOVA of mpg on am, disp and wt.

fit <- aov(mpg ~ am + disp + wt, data = mtcars)
summary(fit)

            Df Sum Sq Mean Sq F value   Pr(>F)
am           1  405.2   405.2  46.011 2.29e-07 ***
disp         1  420.6   420.6  47.767 1.64e-07 ***
wt           1   53.7    53.7   6.101   0.0199 *
Residuals   28  246.6     8.8
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Here we see df = 1 for each of am, disp and wt, and all three terms are significant for mpg. Now let’s look at a different pair of variables, cyl and gear, using an interaction plot.
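
One thing worth knowing when reading this table: aov() reports sequential sums of squares, so each term is adjusted only for the terms listed before it. The sketch below refits the same three predictors in a different order to show the effect. The names fit_a, fit_b, tab_a and tab_b are illustrative.

```r
# Same three predictors, entered in two different orders
fit_a <- aov(mpg ~ am + disp + wt, data = mtcars)
fit_b <- aov(mpg ~ wt + disp + am, data = mtcars)

# Sequential sums of squares: each term is adjusted only for terms before it
tab_a <- summary(fit_a)[[1]]
tab_b <- summary(fit_b)[[1]]
tab_a
tab_b

# The residual row is the same in both tables because the fitted model is the same
resid_a <- tab_a[["Sum Sq"]][4]
resid_b <- tab_b[["Sum Sq"]][4]
```

The per-term Sum Sq and F values may look quite different between the two tables, so the significance of a term in this kind of output depends on where it appears in the formula.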

attach(mtcars)
cyl <- factor(cyl)    # treat cylinder count as categorical
gear <- factor(gear)  # treat gear count as categorical
interaction.plot(cyl, gear, mpg, type = "b", col = c(1:3), leg.bty = "o",
                 leg.bg = "beige", lwd = 2, pch = c(18, 24, 22),
                 xlab = "Number of Cylinders", ylab = "Mean Miles Per Gallon",
                 main = "Interaction Plot")

This graph shows mean mpg against the number of cylinders, with a separate line for each number of gears (see the legend on the right). Lines that are not parallel suggest a two-way interaction between cyl and gear.
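
Each point on the plot is a cell mean, and some cells hold very few cars. The sketch below counts the cars in each cylinder-by-gear combination and recomputes the cell means. It uses with() rather than the attached copy, and cell_n and cell_mean are illustrative names.

```r
# How many cars fall into each cylinder-by-gear combination
cell_n <- with(mtcars, table(cyl, gear))
cell_n

# Mean mpg in each cell; NA marks a combination with no cars
cell_mean <- with(mtcars, tapply(mpg, list(cyl = cyl, gear = gear), mean))
round(cell_mean, 1)
```

Cells with only one or two cars, or none at all, are worth keeping in mind before reading much into the shape of the lines.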

library(gplots)  # provides plotmeans()
cyl <- factor(cyl)
plotmeans(mpg ~ cyl, xlab = "Number of Cylinders",
          ylab = "Miles Per Gallon", main = "Mean Plot\nwith 95% CI")

Here we plot mean miles per gallon for each number of cylinders, with error bars showing the 95% confidence intervals. There are n = 11 four-cylinder cars averaging about 26.7 mpg, n = 7 six-cylinder cars averaging about 19.7 mpg and n = 14 eight-cylinder cars averaging about 15.1 mpg.
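
The group sizes, means and error bars can also be tabulated directly in base R. This sketch assumes the intervals are ordinary t-based 95% confidence intervals computed separately for each group; the names n, m, s, half and ci are illustrative.

```r
# Per-group sample size, mean, and standard deviation of mpg
n <- tapply(mtcars$mpg, mtcars$cyl, length)
m <- tapply(mtcars$mpg, mtcars$cyl, mean)
s <- tapply(mtcars$mpg, mtcars$cyl, sd)

# Half-width of a 95% t-based confidence interval for each group mean
half <- qt(0.975, df = n - 1) * s / sqrt(n)

# One row per cylinder count
ci <- data.frame(cyl = names(m), n = as.vector(n), mean = as.vector(m),
                 lower = as.vector(m - half), upper = as.vector(m + half))
ci
```

The smallest group has the fewest degrees of freedom, which is one reason its interval can be wider than the others.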

library(ggplot2)  # provides qplot()
qplot(x = am, y = mpg, data = mtcars, geom = "point")

Here we see mpg plotted against am. Manual cars (am = 1) tend to have higher mpg than automatic cars (am = 0), reaching almost 34 mpg.
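
Since am has only two levels, the comparison in this plot ties back to the one-way ANOVA from earlier. Here is a sketch of the group means and of a pooled-variance t-test on the same grouping; am_means, tt and t_squared are illustrative names.

```r
# Mean mpg for automatic (am = 0) and manual (am = 1) cars
am_means <- tapply(mtcars$mpg, mtcars$am, mean)
round(am_means, 2)

# Pooled-variance two-sample t-test on the same grouping
tt <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)

# With two groups, the squared t statistic equals the one-way ANOVA F value
t_squared <- unname(tt$statistic)^2
t_squared
```

The squared t statistic should match the F value of 16.86 from the one-way table, which is a quick way to see that a two-group ANOVA and an equal-variance t-test are the same test.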

Now let’s look for outliers in the mtcars dataset.

library(mvoutlier)
outliers <- aq.plot(mtcars[c("mpg", "disp", "hp", "drat", "wt", "qsec")])
outliers

This prints a list of the outliers in the mtcars dataset. The graph is extremely busy and needs a close look, but you can see that the Maserati Bora, Ford Pantera L, Camaro Z28 and Duster 360 sit apart from the other data points.
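
For a simpler cross-check that needs no extra package, here is a sketch using classical Mahalanobis distances from base R. This is a swapped-in technique, not what aq.plot() computes internally, and the 97.5% chi-square cutoff is an assumption made for illustration, so the flagged cars may differ from the list above.

```r
# Same six variables used in the aq.plot() call
vars <- mtcars[c("mpg", "disp", "hp", "drat", "wt", "qsec")]

# Squared Mahalanobis distance of each car from the multivariate mean
d2 <- mahalanobis(vars, center = colMeans(vars), cov = cov(vars))

# Assumed cutoff: 97.5% quantile of a chi-square with one df per variable
cutoff <- qchisq(0.975, df = ncol(vars))

# Cars sorted from most to least distant, then those beyond the cutoff
head(sort(d2, decreasing = TRUE))
names(d2)[d2 > cutoff]
```

Classical distances use the ordinary mean and covariance, which the outliers themselves can distort, so this is best read as a rough first look.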

So, I have given examples of a few different types of ANOVA and shown some of the results visually with a plot or graph. I am still trying to understand statistics, especially in R, which makes sense since numbers are not my forte. I am grateful that Google, and others who have uploaded their code online, can be my friends during this journey of learning R.

References:

Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411.

Kabacoff, R. (2012). Quick-R. http://www.statmethods.net/stats/anova.html

Kabacoff, R. (2012). Quick-R. http://www.statmethods.net/stats/anovaAssumptions.html

By Jennifer Williams