Converting between data frames and contingency tables

Problem

You want to convert between a data frame of cases, a data frame of counts of each type of case, and a contingency table.

Solution

These three data structures represent the same information, but in different formats:

  • cases: A data frame where each row represents one case.
  • ctable: A contingency table.
  • counts: A data frame of counts, where each row holds the count for one combination.

    # Each row represents one case
    cases <- data.frame(
        Sex=c("M", "M", "F", "F", "F"),
        Color=c("brown", "blue", "brown", "brown", "brown")
    )
    cases
    #>   Sex Color
    #> 1   M brown
    #> 2   M  blue
    #> 3   F brown
    #> 4   F brown
    #> 5   F brown

    # A contingency table
    ctable <- table(cases)
    ctable
    #>    Color
    #> Sex blue brown
    #>   F    0     3
    #>   M    1     1

    # A table with counts of each combination
    counts <- data.frame(
        Sex=c("F", "M", "F", "M"),
        Color=c("blue", "blue", "brown", "brown"),
        Freq=c(0, 1, 3, 1)
    )
    counts
    #>   Sex Color Freq
    #> 1   F  blue    0
    #> 2   M  blue    1
    #> 3   F brown    3
    #> 4   M brown    1

Cases to contingency table

To convert from cases to a contingency table, use table() (this is already shown above):

    # Cases to Table
    ctable <- table(cases)
    ctable
    #>    Color
    #> Sex blue brown
    #>   F    0     3
    #>   M    1     1

    # If you call table using two vectors, it will not add names (Sex and Color) to
    # the dimensions.
    table(cases$Sex, cases$Color)
    #>
    #>     blue brown
    #>   F    0     3
    #>   M    1     1

    # The dimension names can be specified manually with `dnn`, or by using a subset
    # of the data frame that contains only the desired columns.
    table(cases$Sex, cases$Color, dnn=c("Sex","Color"))
    #>    Color
    #> Sex blue brown
    #>   F    0     3
    #>   M    1     1

    table(cases[,c("Sex","Color")])
    #>    Color
    #> Sex blue brown
    #>   F    0     3
    #>   M    1     1

Cases to counts

The cases can also be represented as a data frame of counts of each combination. Here the result is stored in countdf, which is reused in later examples:

    # Cases to Counts
    countdf <- as.data.frame(table(cases))
    countdf
    #>   Sex Color Freq
    #> 1   F  blue    0
    #> 2   M  blue    1
    #> 3   F brown    3
    #> 4   M brown    1
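
The count column is called Freq by default. The as.data.frame() method for tables also takes a responseName argument to choose a different column name, and stringsAsFactors to control whether the category columns come back as factors. A short sketch, assuming you want the counts in a column named n (an arbitrary name picked for this example):

```r
# Re-create the example data so this block is self-contained
cases <- data.frame(
    Sex   = c("M", "M", "F", "F", "F"),
    Color = c("brown", "blue", "brown", "brown", "brown")
)

# Name the count column "n" and keep Sex and Color as character vectors
countdf_n <- as.data.frame(
    table(cases),
    responseName = "n",
    stringsAsFactors = FALSE
)

names(countdf_n)
```

If you rename the count column this way, pass the same name to any later step that needs it, such as the left-hand side of an xtabs() formula.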

Contingency table to cases

    countsToCases(as.data.frame(ctable))
    #>     Sex Color
    #> 2     M  blue
    #> 3     F brown
    #> 3.1   F brown
    #> 3.2   F brown
    #> 4     M brown

Note that the countsToCases function is defined below.

Contingency table to counts

    as.data.frame(ctable)
    #>   Sex Color Freq
    #> 1   F  blue    0
    #> 2   M  blue    1
    #> 3   F brown    3
    #> 4   M brown    1

Counts to cases

    countsToCases(countdf)
    #>     Sex Color
    #> 2     M  blue
    #> 3     F brown
    #> 3.1   F brown
    #> 3.2   F brown
    #> 4     M brown

Note that the countsToCases function is defined below.

Counts to contingency table

    xtabs(Freq ~ Sex+Color, data=countdf)
    #>    Color
    #> Sex blue brown
    #>   F    0     3
    #>   M    1     1
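
Because xtabs() sums the left-hand column within each combination, the counts data frame does not need a row for every combination. The sketch below uses a hypothetical sparse_counts data frame (not part of the examples above) in which the zero-count row for F/blue has been left out; the missing cell should still appear in the table, filled with 0.

```r
# Counts with the zero-count combination (F, blue) left out
sparse_counts <- data.frame(
    Sex   = c("M", "F", "M"),
    Color = c("blue", "brown", "brown"),
    Freq  = c(1, 3, 1)
)

# xtabs() sums Freq within each Sex/Color combination;
# combinations that have no rows are filled with 0
xt_sparse <- xtabs(Freq ~ Sex + Color, data = sparse_counts)

xt_sparse["F", "blue"]
```

This only works when every level still appears somewhere in the data. A level that is absent from all rows will not get a row or column in the table unless the columns are factors that list it explicitly.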

countsToCases() function

This function is used in the examples above:

    # Convert from data frame of counts to data frame of cases.
    # `countcol` is the name of the column containing the counts
    countsToCases <- function(x, countcol = "Freq") {
        # Get the row indices to pull from x
        idx <- rep.int(seq_len(nrow(x)), x[[countcol]])

        # Drop count column
        x[[countcol]] <- NULL

        # Get the rows from x
        x[idx, ]
    }
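
The countcol argument is useful when the counts are not stored in a column named Freq. The sketch below repeats the function definition so that it runs on its own, and uses a hypothetical counts_n data frame whose count column is named n. It also shows one optional way to replace row names such as 3.1 and 3.2 with a plain sequence.

```r
# Definition repeated from above so this block is self-contained
countsToCases <- function(x, countcol = "Freq") {
    idx <- rep.int(seq_len(nrow(x)), x[[countcol]])
    x[[countcol]] <- NULL
    x[idx, ]
}

# Hypothetical counts data frame whose count column is named "n"
counts_n <- data.frame(
    Sex   = c("F", "M", "F", "M"),
    Color = c("blue", "blue", "brown", "brown"),
    n     = c(0, 1, 3, 1)
)

# Expand to one row per case, reading the counts from column "n"
expanded <- countsToCases(counts_n, countcol = "n")

# Optionally reset the row names to a plain 1, 2, 3, ... sequence
rownames(expanded) <- NULL

nrow(expanded)
```

The number of rows in the result equals the sum of the count column, and rows with a count of 0 are dropped.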