ANOVA in R: Basic & Practical Math Wizardry

Mikey B.
Published in Human Systems Data
Apr 4, 2017

The ANOVA, or ANalysis Of VAriance, is one of the more common statistical tests used in research. Despite what the name suggests, the ANOVA compares the means of several groups (typically three or more) to determine whether at least one of them differs from the others. It does this by analyzing the variance between and within those groups. ANOVA is comparable to the t-test, which compares the means of two groups. In fact, you could substitute a bunch of pairwise t-tests for an ANOVA, but every extra test raises the chance of a false positive (a Type I error), so it’s best to stick with the ANOVA.
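To put a rough number on that risk: if each t-test uses a significance level of 0.05 and we (simplistically) treat the tests as independent, the chance of at least one false positive across m tests is 1 - (1 - 0.05)^m. A quick R sketch for three groups, which need three pairwise t-tests:

```r
# Approximate familywise error rate for m pairwise t-tests,
# assuming (simplistically) that the tests are independent
alpha  <- 0.05
groups <- 3
m      <- choose(groups, 2)   # number of pairwise comparisons: 3
fwer   <- 1 - (1 - alpha)^m
print(m)
print(fwer)                   # roughly 0.14, not 0.05
```

With five groups (ten comparisons), the same formula gives about 0.40.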

Now, wouldn’t you know it? There is a whole variety of ways to run an ANOVA. If you want to exercise your math wizardry with this fancy new spell, the first thing you will need to identify is what type of ANOVA fits your study. The best place to start is to count the number of factors you are working with. You can think of factors as the categorical independent variables of the study. For a single factor, you run a one-way ANOVA. For two factors, a two-way ANOVA. If you want to compare groups while statistically controlling for a continuous variable (a covariate), you can run an ANCOVA, or ANalysis Of CoVAriance. If you are running a within-subjects study, where each participant is measured under every condition, you can run a one- or two-way within-subjects ANOVA. You can make things really complicated with a mixed design that combines between- and within-subjects factors, such as a two-way between AND two-way within ANOVA. So many options!
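As a rough map from those designs to R code, here is a sketch on made-up data. The data frames, column names (A, B, covariate, subject, W, G), and numbers are all hypothetical. In base R's aov(), putting the covariate before the factor gives the factor's effect adjusted for the covariate, and Error(subject/W) tells R that W varies within subjects.

```r
set.seed(42)

# Hypothetical between-subjects data: one row per participant
between <- data.frame(
  A         = factor(rep(c("a1", "a2", "a3"), each = 8)),
  B         = factor(rep(c("b1", "b2"), times = 12)),
  covariate = rnorm(24),
  y         = rnorm(24, mean = 10)
)

one_way <- aov(y ~ A, data = between)              # one factor
two_way <- aov(y ~ A * B, data = between)          # two factors plus interaction
ancova  <- aov(y ~ covariate + A, data = between)  # factor A, adjusting for the covariate

# Hypothetical repeated-measures data: each subject is measured under every level of W
repeated <- expand.grid(subject = factor(1:10), W = factor(c("w1", "w2", "w3")))
repeated$G <- factor(ifelse(as.integer(repeated$subject) <= 5, "g1", "g2"))  # between factor
repeated$y <- rnorm(nrow(repeated), mean = 10)

within1 <- aov(y ~ W + Error(subject/W), data = repeated)      # one-way within-subjects
mixed   <- aov(y ~ G * W + Error(subject/W), data = repeated)  # between G crossed with within W

summary(ancova)
summary(mixed)
```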

This is the basic model function for running a one-way ANOVA in R:

fit <- aov(y ~ A, data=name)

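R just makes it seem so easy, right? Here, “fit” is the name of the object that stores the fitted model. Fit is a very common name for this because you are fitting a model here, but you can really name your analysis whatever you want. “aov” is exactly what you think: Analysis Of Variance. “y” represents the dependent variable, “A” is the factor, and “name” is the name of your dataset. A should be stored as a factor (categorical) variable. If its groups are coded as numbers, convert it with factor() first, or R will treat it as a continuous predictor. And since the names are up to you, this works just as well: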

huckleberryfin <- aov(dependent ~ factor, data=mynumbersandstuffs)
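For a complete, runnable one-way example, here is a sketch with simulated data. The data frame mydata and its columns group and score are hypothetical. It also shows one way to pull the F statistic and p-value out of the summary table.

```r
set.seed(123)

# Hypothetical data: three groups of 10 observations each
mydata <- data.frame(
  group = rep(c("control", "low", "high"), each = 10),
  score = c(rnorm(10, mean = 50), rnorm(10, mean = 52), rnorm(10, mean = 55))
)
mydata$group <- factor(mydata$group)  # make sure the predictor is a factor

fit <- aov(score ~ group, data = mydata)
summary(fit)

# The summary is a list holding one table; row 1 is the group effect
tab     <- summary(fit)[[1]]
f_value <- tab[["F value"]][1]
p_value <- tab[["Pr(>F)"]][1]
```

With 3 groups and 30 observations, the table should show 2 degrees of freedom for group and 27 for the residuals.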

If you have two factors, the formula will look something like this, where A*B is shorthand for A + B + A:B (both main effects plus their interaction):

fit <- aov(y ~ A*B, data=name)
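A small sketch of the two-way case on a hypothetical 2 x 3 design with simulated data (the data frame d and its columns are made up for illustration). It checks that A*B and the written-out A + B + A:B give the same model.

```r
set.seed(7)

# Hypothetical 2 x 3 design with 5 observations per cell
d <- expand.grid(rep = 1:5,
                 A   = factor(c("a1", "a2")),
                 B   = factor(c("b1", "b2", "b3")))
d$y <- rnorm(nrow(d), mean = 20)

fit  <- aov(y ~ A * B, data = d)
fit2 <- aov(y ~ A + B + A:B, data = d)  # the same model, written out in full

attr(terms(fit), "term.labels")         # "A" "B" "A:B"
summary(fit)
```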

Let’s draw on an example I’ve used in other posts. It is a look at data from the 2014–15 NBA season. Here we are looking at the success or failure of shots as our “y” with the distance of the closest defender as our factor. So this is our formula:

BBallAoV <- aov(success ~ CLOSE_DEF_DIST, data = subset.df)
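One caveat: aov() only treats a predictor as a grouping factor if it is stored as a factor. If CLOSE_DEF_DIST is a numeric distance in feet (an assumption, since the dataset isn't shown here), aov() fits it as a single continuous predictor with 1 degree of freedom. Below is a sketch of binning it into groups first with cut(). The simulated stand-in for subset.df, the 0/1 coding of success, and the bin break points and labels are all hypothetical.

```r
set.seed(2015)

# Simulated stand-in for subset.df; the real column contents are assumed, not known
n <- 300
subset.df <- data.frame(CLOSE_DEF_DIST = runif(n, min = 0, max = 15))
subset.df$success <- rbinom(n, size = 1,
                            prob = 0.35 + 0.02 * subset.df$CLOSE_DEF_DIST)

# Treated as numeric: one slope term with 1 degree of freedom
numeric_fit <- aov(success ~ CLOSE_DEF_DIST, data = subset.df)

# Binned into hypothetical distance groups (in feet), then analyzed as a factor
subset.df$def_group <- cut(subset.df$CLOSE_DEF_DIST,
                           breaks = c(0, 2, 4, 6, Inf),
                           labels = c("0-2", "2-4", "4-6", "6+"),
                           include.lowest = TRUE)
BBallAoV <- aov(success ~ def_group, data = subset.df)
summary(BBallAoV)
```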

From here, all we need to do is ask for a summary:

summary(BBallAoV)

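And we get an ANOVA table listing the degrees of freedom (Df), sum of squares (Sum Sq), mean square (Mean Sq), F value, and p-value (Pr(>F)) for the factor and for the residuals.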

Interpreting these numbers is a whole different blog post. For now, congratulations! You’ve just learned how to do some very practical math wizardry!
