An ANOVA, or analysis of variance, is a statistical test used to compare the means of groups within data (One-way Analysis of Variance (ANOVA), 2011). This sounds a little more complicated than it really is, so let's break it down. Why is an analysis of variance a test about means? It compares how much the group averages vary from one another against how much the observations vary inside each group. When the between-group variation is large relative to the within-group variation, the group means probably differ. How is this useful in statistics? ANOVAs are used to test a null hypothesis against an alternative hypothesis. The null hypothesis states that all the group means are equal, and the alternative hypothesis states that at least one group mean differs from the others (What is ANOVA?, 2017). This might sound complicated, but the example below will clear everything up.
There are many different types of ANOVA, including one-way, balanced, and general linear model. Each type corresponds to a different way a data set may be designed. A one-way ANOVA contains one fixed factor, which can have either an unbalanced or a balanced number of observations per treatment. A treatment is one of the groups being observed. A balanced ANOVA can have any number of fixed and random factors, including crossed and nested factors, but it requires the same number of observations in every treatment. A general linear model ANOVA is similar to a balanced ANOVA but also allows unbalanced designs and continuous variables (What is ANOVA?, 2017). With these different devices in our statistical toolbox, let's look at an example.
ANOVA Example: Let's say that you were asked to find out which running shoe makes an athlete run faster. The null hypothesis would be that all the shoes yield the same mean mile time, and the alternative hypothesis would be that at least one shoe yields a different mean mile time. You measure the mile times of athletes wearing four different shoes and compile a data set. To test your hypothesis, you run a one-way ANOVA and get a small p-value, which indicates that the shoes do not all yield the same mile time. Knowing this, we can reject the null hypothesis and dive further into the data to see which shoe is the fastest.
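To make the shoe example concrete, here is a minimal sketch in R. The data are simulated, and the shoe names, group means, and sample sizes are all made up for illustration; only aov() and summary() are standard R functions.

```r
# Hypothetical data: mile times (in minutes) for four made-up shoes,
# ten runs per shoe. None of these numbers come from a real study.
set.seed(42)  # make the simulated numbers reproducible
shoes <- data.frame(
  shoe = factor(rep(c("A", "B", "C", "D"), each = 10)),
  mile_time = rnorm(40, mean = rep(c(7.0, 7.0, 6.5, 7.4), each = 10), sd = 0.3)
)

# One-way ANOVA: does the mean mile time differ by shoe?
shoe_fit <- aov(mile_time ~ shoe, data = shoes)

# The Pr(>F) column holds the p-value for the shoe factor.
summary(shoe_fit)
```

A small value in the Pr(>F) column would be the evidence for rejecting the null hypothesis that all four shoes share the same mean mile time.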
Did you feel let down by the example? Maybe it's because it's quite underwhelming. We find that there is a difference in mile times among the different shoes, but we do not find out which shoe is the fastest. To get this more detailed information about the differences between shoes, a multiple comparison method needs to be used. One example is Tukey's method. Tukey's method is used in ANOVA to create confidence intervals for all pairwise differences between the factor level means (What is Tukey's, 2017). A confidence interval gives a range of values that is likely to contain an unknown population value, such as a mean or a difference in means (What are confidence intervals?, 2017). Using Tukey's method, we can compare the confidence intervals for the differences between each pair of shoes' mean mile times and determine the fastest shoe.
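As a rough sketch of how that follow-up could look in R, the snippet below runs TukeyHSD() on simulated shoe data. The data frame, shoe labels, and means are hypothetical and chosen only for illustration.

```r
# Hypothetical mile times (in minutes) for four made-up shoes, ten runs each.
set.seed(1)
shoes <- data.frame(
  shoe = factor(rep(c("A", "B", "C", "D"), each = 10)),
  mile_time = rnorm(40, mean = rep(c(7.0, 7.0, 6.5, 7.4), each = 10), sd = 0.3)
)

fit_shoes <- aov(mile_time ~ shoe, data = shoes)

# Pairwise differences in mean mile time with 95% family-wise intervals.
# An interval (lwr to upr) that excludes 0 suggests the two shoes differ.
shoe_tukey <- TukeyHSD(fit_shoes, conf.level = 0.95)
shoe_tukey

# Group means sorted so the lowest (fastest) mean time comes first.
sort(tapply(shoes$mile_time, shoes$shoe, mean))
```

Reading the sorted means together with the Tukey intervals shows whether the shoe with the lowest mean time is significantly faster than the others or just slightly ahead by chance.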
An ANOVA may seem like the clear path, but there are some limitations. To run one, you must have a continuous response variable (the dependent variable) and at least one categorical factor with two or more levels (an independent variable with multiple groups). ANOVAs also require data from approximately normally distributed populations with roughly equal variances across the groups. Another thing to keep in mind is that an ANOVA only tells you whether the group means differ. Further analysis, such as a multiple comparison method, is required to get a concrete answer about which groups differ (What is ANOVA?, 2017).
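These requirements can be checked in R before trusting the result. The sketch below is one possible approach, not the only one: it uses the built-in InsectSprays data set analyzed later in this post, with a Shapiro-Wilk test on the residuals for normality and Bartlett's test for equal variances. The choice of these two tests is my assumption; the sources cited here do not prescribe them.

```r
# Fit the one-way ANOVA whose assumptions we want to check.
check_fit <- aov(count ~ spray, data = InsectSprays)

# Normality: Shapiro-Wilk test on the model residuals.
# A small p-value suggests the residuals are not normally distributed.
normality <- shapiro.test(residuals(check_fit))
normality

# Equal variances: Bartlett's test across the spray groups.
# A small p-value suggests the group variances are not equal.
equal_var <- bartlett.test(count ~ spray, data = InsectSprays)
equal_var
```

If either test raises concerns, a transformation of the response or a non-parametric alternative such as kruskal.test() may be worth considering.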
After researching what an ANOVA is, I tested my skills by running a one-way ANOVA in RStudio. The data set I used is called InsectSprays and ships with R in the datasets package (The R Datasets Package, 2017). It records the number of insects counted on plots treated with different insect sprays. I am using the data set to compare the sprays and see which ones leave the fewest insects.
> str(InsectSprays)
'data.frame':	72 obs. of  2 variables:
 $ count: num  10 7 20 14 14 12 10 23 17 20 ...
 $ spray: Factor w/ 6 levels "A","B","C","D",..: 1 1 1 1 1 1 1 1 1 1 ...
Looking at the data set, we can see a numeric variable, count, which serves as the continuous dependent variable, and an independent variable, spray, which is a factor with six levels, one for each type of insect spray.
> fit <- aov(count ~ spray, data = InsectSprays)
> summary(fit)
            Df Sum Sq Mean Sq F value Pr(>F)
spray        5   2669   533.8    34.7 <2e-16 ***
Residuals   66   1015    15.4
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
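Running the ANOVA gives a p-value, shown in the Pr(>F) column, of less than 2e-16. This is far below the usual 0.05 threshold, so we reject the null hypothesis that all six sprays have the same mean insect count.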
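To see where the numbers in the ANOVA table come from, the sketch below pulls them out of the fitted model. The variable names (anova_table, f_by_hand, p_value) are my own, and the 0.05 cutoff is the conventional threshold rather than anything required by R.

```r
fit <- aov(count ~ spray, data = InsectSprays)

# summary() returns a list; its first element is the ANOVA table.
anova_table <- summary(fit)[[1]]

# The F value is the mean square for spray divided by the residual mean square.
f_by_hand <- anova_table[["Mean Sq"]][1] / anova_table[["Mean Sq"]][2]
f_by_hand

# The p-value for the spray factor, compared against the 0.05 threshold.
p_value <- anova_table[["Pr(>F)"]][1]
p_value < 0.05
```

With the rounded values printed in the table, 533.8 / 15.4 is about 34.7, which matches the F value column.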
> TukeyHSD(fit)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = count ~ spray, data = InsectSprays)

$spray
           diff        lwr       upr     p adj
B-A   0.8333333  -3.866075  5.532742 0.9951810
C-A -12.4166667 -17.116075 -7.717258 0.0000000
D-A  -9.5833333 -14.282742 -4.883925 0.0000014
E-A -11.0000000 -15.699409 -6.300591 0.0000000
F-A   2.1666667  -2.532742  6.866075 0.7542147
C-B -13.2500000 -17.949409 -8.550591 0.0000000
D-B -10.4166667 -15.116075 -5.717258 0.0000002
E-B -11.8333333 -16.532742 -7.133925 0.0000000
F-B   1.3333333  -3.366075  6.032742 0.9603075
D-C   2.8333333  -1.866075  7.532742 0.4920707
E-C   1.4166667  -3.282742  6.116075 0.9488669
F-C  14.5833333   9.883925 19.282742 0.0000000
E-D  -1.4166667  -6.116075  3.282742 0.9488669
F-D  11.7500000   7.050591 16.449409 0.0000000
F-E  13.1666667   8.467258 17.866075 0.0000000
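The Tukey test compares every pair of insect sprays. A pair differs significantly when its adjusted p-value (p adj) is below 0.05, which is the same as its 95% interval (lwr to upr) excluding zero. Here sprays C, D, and E each have a significantly lower mean insect count than sprays A, B, and F, while the differences within each of those two groups (for example B-A or D-C) are not significant. If a lower count means a more effective spray, the data separate the sprays into a more effective group (C, D, E) and a less effective group (A, B, F), but they do not single out one best spray.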
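With fifteen comparisons, it can help to let R pick out the significant pairs. The following is a small sketch of one way to do that; the variable names are my own and 0.05 is the conventional cutoff.

```r
fit <- aov(count ~ spray, data = InsectSprays)

# TukeyHSD() returns a list with one matrix per factor; convert it to a data frame.
tukey <- as.data.frame(TukeyHSD(fit)$spray)

# Keep only the pairs whose adjusted p-value is below 0.05.
significant <- tukey[tukey[["p adj"]] < 0.05, ]
rownames(significant)

# Mean insect count per spray, for reading the direction of each difference.
spray_means <- tapply(InsectSprays$count, InsectSprays$spray, mean)
spray_means
```

The row names of significant list the pairs that differ, and the group means show which spray in each pair has the lower count.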
7.1.4. What are confidence intervals? (n.d.). Retrieved April 05, 2017, from http://www.itl.nist.gov/div898/handbook/prc/section1/prc14.htm
One-way Analysis of Variance (ANOVA). (2011, May 11). Retrieved April 05, 2017, from https://www.r-bloggers.com/one-way-analysis-of-variance-anova/
The R Datasets Package. (n.d.). Retrieved April 05, 2017, from https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html
What is ANOVA? (n.d.). Retrieved April 05, 2017, from http://support.minitab.com/en-us/minitab/17/topic-library/modeling-statistics/anova/basics/what-is-anova/
What is Tukey’s method for multiple comparisons? (n.d.). Retrieved April 05, 2017, from http://support.minitab.com/en-us/minitab/17/topic-library/modeling-statistics/anova/multiple-comparisons/what-is-tukey-s-method/