Indexing into a data structure

Problem

You want to get part of a data structure.

Solution

Elements from a vector, matrix, or data frame can be extracted using numeric indexing, by name, or by using a boolean (logical) vector of the appropriate length.

In many of the examples below, there are multiple ways of doing the same thing.

Indexing with numbers and names

With a vector:

  # A sample vector
  v <- c(1,4,4,3,2,2,3)
  v[c(2,3,4)]
  #> [1] 4 4 3
  v[2:4]
  #> [1] 4 4 3
  v[c(2,4,3)]
  #> [1] 4 3 4
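
The vector example above uses only numeric positions. As a minimal sketch of indexing by name, assuming a small named vector nv (the vector and its names are made up for illustration):

```r
# A named vector (hypothetical; not part of the examples above)
nv <- c(a=1, b=4, c=4, d=3)
# Get the elements named "b" and "d"
nv[c("b","d")]
#> b d 
#> 4 3 
# The same elements, selected by position
nv[c(2,4)]
#> b d 
#> 4 3 
```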

With a data frame:

  # Create a sample data frame
  # stringsAsFactors=TRUE makes "sex" a factor, matching the output below
  data <- read.table(header=TRUE, stringsAsFactors=TRUE, text='
  subject sex size
  1 M 7
  2 F 6
  3 F 9
  4 M 11
  ')
  # Get the element at row 1, column 3
  data[1,3]
  #> [1] 7
  data[1,"size"]
  #> [1] 7
  # Get rows 1 and 2, and all columns
  data[1:2, ]
  #>   subject sex size
  #> 1       1   M    7
  #> 2       2   F    6
  data[c(1,2), ]
  #>   subject sex size
  #> 1       1   M    7
  #> 2       2   F    6
  # Get rows 1 and 2, and only column 2
  data[1:2, 2]
  #> [1] M F
  #> Levels: F M
  data[c(1,2), 2]
  #> [1] M F
  #> Levels: F M
  # Get rows 1 and 2, and only the columns named "sex" and "size"
  data[1:2, c("sex","size")]
  #>   sex size
  #> 1   M    7
  #> 2   F    6
  data[c(1,2), c(2,3)]
  #>   sex size
  #> 1   M    7
  #> 2   F    6
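
Selecting a single column, as in the "only column 2" example above, returns a vector rather than a data frame. The following sketch shows one way to keep the data frame structure with drop=FALSE, and two other ways of pulling out a column. The data frame is rebuilt here with data.frame() so that the block runs on its own; that construction is an assumption and is not how the data was created above.

```r
# Same values as the sample data frame, rebuilt so this block is self-contained
data <- data.frame(subject=1:4,
                   sex=factor(c("M","F","F","M")),
                   size=c(7,6,9,11))
# A single column comes back as a vector by default
data[1:2, "size"]
#> [1] 7 6
# drop=FALSE keeps the result as a data frame
data[1:2, "size", drop=FALSE]
#>   size
#> 1    7
#> 2    6
# $ and [[ ]] also extract one column as a vector
data$size[1:2]
#> [1] 7 6
data[["size"]][1:2]
#> [1] 7 6
```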

Indexing with a boolean vector

With the vector v from above:

  v > 2
  #> [1] FALSE  TRUE  TRUE  TRUE FALSE FALSE  TRUE
  v[v>2]
  #> [1] 4 4 3 3
  v[c(FALSE,TRUE,TRUE,TRUE,FALSE,FALSE,TRUE)]
  #> [1] 4 4 3 3

With the data frame from above:

  # A boolean vector
  data$subject < 3
  #> [1]  TRUE  TRUE FALSE FALSE
  data[data$subject < 3, ]
  #>   subject sex size
  #> 1       1   M    7
  #> 2       2   F    6
  data[c(TRUE,TRUE,FALSE,FALSE), ]
  #>   subject sex size
  #> 1       1   M    7
  #> 2       2   F    6
  # It is also possible to get the numeric indices of the TRUEs
  which(data$subject < 3)
  #> [1] 1 2
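
Boolean vectors can also be combined with & (and) and | (or) before indexing. This is a small sketch using the same values as the sample data frame, rebuilt with data.frame() (an assumption, made so the block runs on its own):

```r
# Same values as the sample data frame, rebuilt so this block is self-contained
data <- data.frame(subject=1:4,
                   sex=factor(c("M","F","F","M")),
                   size=c(7,6,9,11))
# Rows where sex is "F" and size is greater than 6
data[data$sex == "F" & data$size > 6, ]
#>   subject sex size
#> 3       3   F    9
# Rows where subject is 1 or size is at least 9
data[data$subject == 1 | data$size >= 9, ]
#>   subject sex size
#> 1       1   M    7
#> 3       3   F    9
#> 4       4   M   11
```

Note that the original row names (3, or 1, 3, and 4) are kept in the result.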

Negative indexing

Unlike in some other programming languages, negative numbers used for indexing in R do not count backward from the end. Instead, they drop the elements at those positions, counting the usual way, from the beginning.

  # Here's the vector again.
  v
  #> [1] 1 4 4 3 2 2 3
  # Drop the first element
  v[-1]
  #> [1] 4 4 3 2 2 3
  # Drop first three
  v[-1:-3]
  #> [1] 3 2 2 3
  # Drop just the last element
  v[-length(v)]
  #> [1] 1 4 4 3 2 2
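
Negative indices work the same way on the rows and columns of a data frame. The sketch below rebuilds the vector and data frame with the same values so that it runs on its own (the data.frame() construction is an assumption), and also shows head() with a negative count as another way to drop elements from the end.

```r
v <- c(1,4,4,3,2,2,3)
# Drop the second and fourth elements
v[-c(2,4)]
#> [1] 1 4 2 2 3
# Drop the last two elements
head(v, -2)
#> [1] 1 4 4 3 2

# Same values as the sample data frame, rebuilt so this block is self-contained
data <- data.frame(subject=1:4,
                   sex=factor(c("M","F","F","M")),
                   size=c(7,6,9,11))
# Drop the first row
data[-1, ]
#>   subject sex size
#> 2       2   F    6
#> 3       3   F    9
#> 4       4   M   11
# Drop the second column
data[, -2]
#>   subject size
#> 1       1    7
#> 2       2    6
#> 3       3    9
#> 4       4   11
```

Positive and negative indices cannot be mixed in a single index vector; something like v[c(-1, 2)] is an error.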

Notes

See also: Getting a subset of a data structure.