Overview of ANOVA Analysis
Assumptions of ANOVA
Before running an ANOVA, several assumptions need to be met.
1) The populations from which the samples are drawn need to be normally distributed.
a) The normality assumption is checked by inspecting outliers, skewness, and kurtosis. These statistics can be selected in SPSS.
i) Outliers: In SPSS the Kolmogorov-Smirnov and the Shapiro-Wilk tests assess whether the data depart from a normal distribution; they do not identify outliers directly. To find outliers, graph the data (e.g., with a box plot) and inspect it visually. If the data contain outliers, investigate them first and remove them only when there is a justifiable reason (e.g., a data-entry error).
ii) Skewness: As a rule of thumb, the data distribution is not considered skewed if the skewness statistic reported by SPSS is between -1.0 and 1.0.
iii) Kurtosis: As a rule of thumb, the data distribution does not show problematic kurtosis if the kurtosis statistic reported by SPSS is between -2.0 and 2.0.
2) The populations from which the samples are drawn need to have equal variances (homogeneity of variance).
a) Homogeneity = the variance is equal across the populations that are being compared.
i) The homogeneity of variance is tested with Levene's test.
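The checks listed above can also be run in R rather than SPSS. The following is a minimal sketch on simulated data; the data frame `scores` and the helper functions `skewness` and `excess_kurtosis` are hypothetical names made up for illustration, not part of my study. Levene's test is written out here as a one-way ANOVA on the absolute deviations from each group mean, which is what the mean-centered version of the test computes.

```r
# Hypothetical data: three groups of 30 simulated scores (not the study data).
set.seed(42)
scores <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 30)),
  value = c(rnorm(30, mean = 50, sd = 10),
            rnorm(30, mean = 55, sd = 10),
            rnorm(30, mean = 60, sd = 10))
)
by_group <- split(scores$value, scores$group)

# Normality: Shapiro-Wilk test within each group
# (p > .05 means no detectable departure from normality).
shapiro_p <- sapply(by_group, function(x) shapiro.test(x)$p.value)

# Moment-based skewness and excess kurtosis. SPSS applies small-sample
# adjustments, so its reported values can differ slightly from these.
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}
excess_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}
shape_stats <- sapply(by_group, function(x) {
  c(skewness = skewness(x), kurtosis = excess_kurtosis(x))
})

# Outliers: boxplot.stats() flags points beyond 1.5 * IQR from the hinges.
outliers <- lapply(by_group, function(x) boxplot.stats(x)$out)

# Levene's test (mean-centered): one-way ANOVA on the absolute deviations
# of each score from its own group mean.
scores$abs_dev <- abs(scores$value - ave(scores$value, scores$group))
levene_table <- anova(lm(abs_dev ~ group, data = scores))
levene_p <- levene_table[["Pr(>F)"]][1]

print(shapiro_p)
print(shape_stats)
print(levene_p)
```

A Levene p-value above .05 is usually read as no evidence against equal variances. If the `car` package is installed, its `leveneTest()` function offers a packaged version of this test.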
Some factors offset violations of the ANOVA assumptions. For example, ANOVA is reasonably robust to violations of normality when the sample size is large. Additionally, ANOVA is reasonably robust to violations of homogeneity of variance if the sample sizes in the groups are equal.
However, if the data are not normally distributed, then one can try the following:
· Remove outliers.
· Log transform the data.
· Apply a correction to the data.
· Run a different test (e.g., the Kruskal-Wallis test for three or more groups, or the Mann-Whitney U test for two groups).
· Inspect whether the data have a bimodal distribution (e.g., male/female, young/old). If so, the analysis can be run separately for each group.
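Several of these remedies are one-liners in R. The sketch below uses simulated right-skewed data; the data frame `skewed` and its columns are hypothetical. Note that "apply a correction" is interpreted here as Welch's correction for unequal variances, which is only one possible reading of that bullet.

```r
# Hypothetical right-skewed data: three groups of 25 simulated values.
set.seed(1)
skewed <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 25)),
  amount = c(rlnorm(25, meanlog = 3.0, sdlog = 0.6),
             rlnorm(25, meanlog = 3.3, sdlog = 0.6),
             rlnorm(25, meanlog = 3.5, sdlog = 0.6))
)

# Log transform the outcome, then run the usual one-way ANOVA.
skewed$log_amount <- log(skewed$amount)
log_fit <- aov(log_amount ~ group, data = skewed)
log_p <- summary(log_fit)[[1]][["Pr(>F)"]][1]

# Kruskal-Wallis rank-based test for three or more groups.
kw <- kruskal.test(amount ~ group, data = skewed)

# Mann-Whitney U test for two groups (R calls it wilcox.test).
two_groups <- droplevels(subset(skewed, group != "B"))
mw <- wilcox.test(amount ~ group, data = two_groups)

# Welch's correction: one-way test that does not assume equal variances.
welch <- oneway.test(amount ~ group, data = skewed, var.equal = FALSE)

print(c(log_anova = log_p,
        kruskal_wallis = kw$p.value,
        mann_whitney = mw$p.value,
        welch = welch$p.value))
```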
Which test to run?
1) ANOVA versus t-test?
a) When comparing 2 groups with a continuous dependent variable use a t-test.
2) One-way Between Subjects ANOVA
a) When comparing more than 2 groups with a continuous dependent variable, use a one-way ANOVA.
3) One-way Repeated Measures ANOVA
a) Use when each participant completes every condition (e.g., each participant took both the placebo and the actual drug).
4) Factorial ANOVA
a) This analysis is used when there is more than 1 independent variable.
i) The terms 2-way or 3-way refer to how many independent variables there are in the analysis.
b) Additionally, with factorial designs there are additional sources of variability to consider.
i) A main effect is the mean difference among the levels of one factor, averaged across the levels of the other factors.
ii) An interaction occurs when the effect of one factor depends on the level of another.
5) MANOVA
a) ANOVA with two or more dependent variables.
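To make the choices above concrete, here is a rough sketch of how each design maps onto a call in base R. All data are simulated, and the names `design`, `rm_data`, `score`, and `recall` are hypothetical. The repeated measures model uses the `Error()` term of `aov()`, which is one common way to specify it.

```r
set.seed(7)

# Hypothetical between-subjects design: 2 drugs x 2 doses, 15 people per cell.
design <- expand.grid(id = 1:15,
                      drug = c("placebo", "active"),
                      dose = c("low", "high"))
design$score <- rnorm(nrow(design), mean = 50, sd = 8) +
  ifelse(design$drug == "active", 5, 0)
design$recall <- rnorm(nrow(design), mean = 20, sd = 4)

# Two groups, continuous dependent variable: t-test.
tt <- t.test(score ~ drug, data = design)

# Factorial (2-way) ANOVA: drug * dose expands to both main effects
# plus the drug:dose interaction.
fact_fit <- aov(score ~ drug * dose, data = design)
print(summary(fact_fit))

# Repeated measures: hypothetical 12 participants, each measured in
# all 3 conditions. Error() declares subject as the repeated unit.
rm_data <- expand.grid(subject = factor(1:12),
                       condition = c("placebo", "low", "high"))
rm_data$score <- rnorm(nrow(rm_data), mean = 50, sd = 8)
rm_fit <- aov(score ~ condition + Error(subject / condition), data = rm_data)
print(summary(rm_fit))

# MANOVA: two dependent variables bound together with cbind().
man_fit <- manova(cbind(score, recall) ~ drug, data = design)
print(summary(man_fit))
```

In the factorial summary table, the `drug` and `dose` rows are the main effects and the `drug:dose` row is the interaction.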
An Example in R: Running a One-way Between Subjects ANOVA
I used data from my movie study to run the one-way between subjects ANOVA. Here, participants used a movie database to predict the “Opening Gross Movie Earnings” of the movie RoboCop. Participants either worked individually, in groups, or in teams. The analysis tests whether the three conditions (Individual/Group/Team) differ in their predictions of the opening gross earnings amount for the movie RoboCop.
setwd("C:/Users/rvbuc_000/Documents/R/win-library/3.3") # set working directory (the folder that contains ANOVA.csv)
ANOVA.df <- read.csv("ANOVA.csv", header = TRUE, stringsAsFactors = FALSE) # read the data
ANOVA.df$Factor <- factor(ANOVA.df$Factor) # treat the condition column as a grouping factor
aov.ex1 <- aov(Prediction ~ Factor, data = ANOVA.df) # do the analysis of variance
summary(aov.ex1) # show the summary table
print(model.tables(aov.ex1, "means"), digits = 3) # report the means and the number of subjects/cell
boxplot(Prediction ~ Factor, data = ANOVA.df) # graphical summary
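The box plot depicts the results of the One-way Between Subjects ANOVA.
The analysis revealed a significant difference in prediction outcome among the three conditions. The group means and the box plot indicate that both groups and teams outperformed individuals in this prediction: groups and teams were closest to the actual “Opening Weekend Gross” earnings amount for the movie ‘RoboCop’. Because a significant F-test only shows that at least one condition differs from the others, a post hoc test (e.g., Tukey's HSD) is needed to confirm which pairs of conditions differ.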
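As a possible follow-up, pairwise comparisons can be obtained with Tukey's HSD. The sketch below does not use my actual data: it simulates a stand-in data frame (`sim`, a hypothetical name) with the same column names as ANOVA.csv, and the simulated values are arbitrary.

```r
# Hypothetical stand-in for ANOVA.csv: arbitrary simulated predictions,
# 20 per condition. These are NOT the study results.
set.seed(2014)
sim <- data.frame(
  Factor = factor(rep(c("Individual", "Group", "Team"), each = 20)),
  Prediction = c(rnorm(20, mean = 35, sd = 8),
                 rnorm(20, mean = 27, sd = 8),
                 rnorm(20, mean = 26, sd = 8))
)

# Fit the same one-way model as above, then compare every pair of conditions.
fit <- aov(Prediction ~ Factor, data = sim)
tukey <- TukeyHSD(fit)
print(tukey)

# Each row is one pairwise comparison; "p adj" is the adjusted p-value.
pairwise_p <- tukey$Factor[, "p adj"]
```

With the real data, the same two calls would be `TukeyHSD(aov.ex1)` followed by reading the `p adj` column for each pair.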