ANOVA

Problem

You want to compare multiple groups using an ANOVA.

Solution

Suppose this is your data:

  1. data <- read.table(header=TRUE, text='
  2. subject sex age before after
  3. 1 F old 9.5 7.1
  4. 2 M old 10.3 11.0
  5. 3 M old 7.5 5.8
  6. 4 F old 12.4 8.8
  7. 5 M old 10.2 8.6
  8. 6 M old 11.0 8.0
  9. 7 M young 9.1 3.0
  10. 8 F young 7.9 5.2
  11. 9 F old 6.6 3.4
  12. 10 M young 7.7 4.0
  13. 11 M young 9.4 5.3
  14. 12 M old 11.6 11.3
  15. 13 M young 9.9 4.6
  16. 14 F young 8.6 6.4
  17. 15 F young 14.3 13.5
  18. 16 F old 9.2 4.7
  19. 17 M young 9.8 5.1
  20. 18 F old 9.9 7.3
  21. 19 F young 13.0 9.5
  22. 20 M young 10.2 5.4
  23. 21 M young 9.0 3.7
  24. 22 F young 7.9 6.2
  25. 23 M old 10.1 10.0
  26. 24 M young 9.0 1.7
  27. 25 M young 8.6 2.9
  28. 26 M young 9.4 3.2
  29. 27 M young 9.7 4.7
  30. 28 M young 9.3 4.9
  31. 29 F young 10.7 9.8
  32. 30 M old 9.3 9.4
  33. ')
  34. # Make sure subject column is a factor, so that it's not treated as a continuous
  35. # variable.
  36. data$subject <- factor(data$subject)

One way between ANOVA

  1. # One way between:
  2. # IV: sex
  3. # DV: before
  4. aov1 <- aov(before ~ sex, data=data)
  5. summary(aov1)
  6. #> Df Sum Sq Mean Sq F value Pr(>F)
  7. #> sex 1 1.53 1.529 0.573 0.455
  8. #> Residuals 28 74.70 2.668
  9. # Show the means
  10. model.tables(aov1, "means")
  11. #> Tables of means
  12. #> Grand mean
  13. #>
  14. #> 9.703333
  15. #>
  16. #> sex
  17. #> F M
  18. #> 10 9.532
  19. #> rep 11 19.000

Two way between ANOVA

  1. # 2x2 between:
  2. # IV: sex
  3. # IV: age
  4. # DV: after
  5. # These two calls are equivalent
  6. aov2 <- aov(after ~ sex*age, data=data)
  7. aov2 <- aov(after ~ sex + age + sex:age, data=data)
  8. summary(aov2)
  9. #> Df Sum Sq Mean Sq F value Pr(>F)
  10. #> sex 1 16.08 16.08 4.038 0.0550 .
  11. #> age 1 38.96 38.96 9.786 0.0043 **
  12. #> sex:age 1 89.61 89.61 22.509 6.6e-05 ***
  13. #> Residuals 26 103.51 3.98
  14. #> ---
  15. #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  16. # Show the means
  17. model.tables(aov2, "means")
  18. #> Tables of means
  19. #> Grand mean
  20. #>
  21. #> 6.483333
  22. #>
  23. #> sex
  24. #> F M
  25. #> 7.445 5.926
  26. #> rep 11.000 19.000
  27. #>
  28. #> age
  29. #> old young
  30. #> 7.874 5.556
  31. #> rep 12.000 18.000
  32. #>
  33. #> sex:age
  34. #> age
  35. #> sex old young
  36. #> F 6.260 8.433
  37. #> rep 5.000 6.000
  38. #> M 9.157 4.042
  39. #> rep 7.000 12.000

Tukey HSD post-hoc test

  1. TukeyHSD(aov2)
  2. #> Tukey multiple comparisons of means
  3. #> 95% family-wise confidence level
  4. #>
  5. #> Fit: aov(formula = after ~ sex + age + sex:age, data = data)
  6. #>
  7. #> $sex
  8. #> diff lwr upr p adj
  9. #> M-F -1.519139 -3.073025 0.03474709 0.0549625
  10. #>
  11. #> $age
  12. #> diff lwr upr p adj
  13. #> young-old -2.31785 -3.846349 -0.7893498 0.0044215
  14. #>
  15. #> $`sex:age`
  16. #> diff lwr upr p adj
  17. #> M:old-F:old 2.8971429 -0.3079526 6.1022384 0.0869856
  18. #> F:young-F:old 2.1733333 -1.1411824 5.4878491 0.2966111
  19. #> M:young-F:old -2.2183333 -5.1319553 0.6952887 0.1832890
  20. #> F:young-M:old -0.7238095 -3.7691188 2.3214997 0.9138789
  21. #> M:young-M:old -5.1154762 -7.7187601 -2.5121923 0.0000676
  22. #> M:young-F:young -4.3916667 -7.1285380 -1.6547953 0.0008841

ANOVAs with within-subjects variables

For ANOVAs with within-subjects variables, the data must be in long format. The data supplied above is in wide format, so we have to convert it first. (See ../../Manipulating data/Converting data between wide and long format for more information.)

Also, for ANOVAs with a within-subjects variable, there must be an identifier column. In this case, it is subject. This identifier variable must be a factor. If it is a numeric type, the function will interpret it incorrectly and it won’t work properly.

  1. library(tidyr)
  2. # This is what the original (wide) data looks like
  3. # subject sex age before after
  4. # 1 F old 9.5 7.1
  5. # 2 M old 10.3 11.0
  6. # 3 M old 7.5 5.8
  7. # Convert it to long format
  8. data_long <- gather(data, time, value, before:after)
  9. # Look at first few rows
  10. head(data_long)
  11. #> subject sex age time value
  12. #> 1 1 F old before 9.5
  13. #> 2 2 M old before 10.3
  14. #> 3 3 M old before 7.5
  15. #> 4 4 F old before 12.4
  16. #> 5 5 M old before 10.2
  17. #> 6 6 M old before 11.0
  18. # Make sure subject column is a factor
  19. data_long$subject <- factor(data_long$subject)

One-way within ANOVA

First, convert the data to long format and make sure subject is a factor, as shown above. If subject is a numeric column, and not a factor, your results will be wrong!

  1. # IV (within): time
  2. # DV: value
  3. aov_time <- aov(value ~ time + Error(subject/time), data=data_long)
  4. summary(aov_time)
  5. #>
  6. #> Error: subject
  7. #> Df Sum Sq Mean Sq F value Pr(>F)
  8. #> Residuals 29 261.2 9.009
  9. #>
  10. #> Error: subject:time
  11. #> Df Sum Sq Mean Sq F value Pr(>F)
  12. #> time 1 155.53 155.53 71.43 2.59e-09 ***
  13. #> Residuals 29 63.14 2.18
  14. #> ---
  15. #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  16. # This won't work here for some reason (?)
  17. model.tables(aov_time, "means")
  18. #> Tables of means
  19. #> Grand mean
  20. #>
  21. #> 8.093333
  22. #>
  23. #> time
  24. #> time
  25. #> after before
  26. #> 6.483 9.703

Mixed design ANOVA

First, convert the data to long format and make sure subject is a factor, as shown above.

  1. # 2x2 mixed:
  2. # IV between: age
  3. # IV within: time
  4. # DV: value
  5. aov_age_time <- aov(value ~ age*time + Error(subject/time), data=data_long)
  6. summary(aov_age_time)
  7. #>
  8. #> Error: subject
  9. #> Df Sum Sq Mean Sq F value Pr(>F)
  10. #> age 1 24.44 24.440 2.89 0.1
  11. #> Residuals 28 236.81 8.457
  12. #>
  13. #> Error: subject:time
  14. #> Df Sum Sq Mean Sq F value Pr(>F)
  15. #> time 1 155.53 155.53 98.14 1.18e-10 ***
  16. #> age:time 1 18.77 18.77 11.84 0.00184 **
  17. #> Residuals 28 44.37 1.58
  18. #> ---
  19. #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  20. # This won't work here because the data is unbalanced
  21. model.tables(aov_age_time, "means")
  22. #> Tables of means
  23. #> Grand mean
  24. #>
  25. #> 8.093333
  26. #>
  27. #> age
  28. #> old young
  29. #> 8.875 7.572
  30. #> rep 24.000 36.000
  31. #>
  32. #> time
  33. #> after before
  34. #> 6.483 9.703
  35. #> rep 30.000 30.000
  36. #>
  37. #> age:time
  38. #> time
  39. #> age after before
  40. #> old 7.950 9.800
  41. #> rep 12.000 12.000
  42. #> young 5.506 9.639
  43. #> rep 18.000 18.000

More ANOVAs with within-subjects variables

These examples don’t operate on the data above, but they should illustrate how to do things.First, convert the data to long format and make sure subject is a factor, as shown above.

  1. # Two within variables
  2. aov.ww <- aov(y ~ w1*w2 + Error(subject/(w1*w2)), data=data_long)
  3. # One between variable and two within variables
  4. aov.bww <- aov(y ~ b1*w1*w2 + Error(subject/(w1*w2)) + b1, data=data_long)
  5. # Two between variables and one within variables
  6. aov.bww <- aov(y ~ b1*b2*w1 + Error(subject/(w1)) + b1*b2, data=data_long)

For more information on ANOVA in R, see: