Getting a subset of a data structure

Problem

You want to get a subset of the elements of a vector, matrix, or data frame.

Solution

To get a subset based on a condition, use the subset() function or index with square brackets. The examples here show both ways side by side.

    # A sample vector
    v <- c(1,4,4,3,2,2,3)
    subset(v, v<3)
    #> [1] 1 2 2
    v[v<3]
    #> [1] 1 2 2
    # Another vector
    t <- c("small", "small", "large", "medium")
    # Remove "small" entries
    subset(t, t!="small")
    #> [1] "large"  "medium"
    t[t!="small"]
    #> [1] "large"  "medium"
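
The two forms agree on the vectors above. One case where they give different results is a condition that evaluates to NA. The following is a minimal sketch, assuming a made-up vector v_na that is not part of the original examples: subset() treats an NA condition as FALSE, while square brackets return an NA at that position.

```r
# Hypothetical vector with a missing value (illustration only)
v_na <- c(1, NA, 4, 2)

# subset() drops elements whose condition is NA
from_subset <- subset(v_na, v_na < 3)

# Bracket indexing keeps an NA where the condition is NA
from_brackets <- v_na[v_na < 3]

# Wrapping the condition in which() keeps only the TRUE positions
from_which <- v_na[which(v_na < 3)]

print(from_subset)
#> [1] 1 2
print(from_brackets)
#> [1]  1 NA  2
print(from_which)
#> [1] 1 2
```

If the data may contain missing values, which() is one way to make bracket indexing behave like subset().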

One important difference between the two methods is that you can assign values to elements with square bracket indexing, but you cannot with subset().

    v[v<3] <- 9
    subset(v, v<3) <- 9
    #> Error in subset(v, v < 3) <- 9: could not find function "subset<-"
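
To make the effect of the bracket assignment visible, here is a small sketch that starts again from the original sample vector. It also uses base R's replace() as one possible way to get the same result as a copy, leaving the original untouched. The name v_copy is an assumption introduced for illustration.

```r
# Start from the original sample vector
v <- c(1,4,4,3,2,2,3)

# replace() returns a modified copy; v itself is not changed
v_copy <- replace(v, v < 3, 9)
print(v_copy)
#> [1] 9 4 4 3 9 9 3
print(v)
#> [1] 1 4 4 3 2 2 3

# Bracket assignment modifies v in place
v[v < 3] <- 9
print(v)
#> [1] 9 4 4 3 9 9 3
```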

With data frames:

    # A sample data frame
    data <- read.table(header=TRUE, text='
     subject sex size
           1   M    7
           2   F    6
           3   F    9
           4   M   11
     ')

    subset(data, subject < 3)
    #>   subject sex size
    #> 1       1   M    7
    #> 2       2   F    6
    data[data$subject < 3, ]
    #>   subject sex size
    #> 1       1   M    7
    #> 2       2   F    6

    # Subset of particular rows and columns
    subset(data, subject < 3, select = -subject)
    #>   sex size
    #> 1   M    7
    #> 2   F    6
    subset(data, subject < 3, select = c(sex,size))
    #>   sex size
    #> 1   M    7
    #> 2   F    6
    subset(data, subject < 3, select = sex:size)
    #>   sex size
    #> 1   M    7
    #> 2   F    6
    data[data$subject < 3, c("sex","size")]
    #>   sex size
    #> 1   M    7
    #> 2   F    6

    # Logical AND of two conditions
    subset(data, subject < 3 & sex=="M")
    #>   subject sex size
    #> 1       1   M    7
    data[data$subject < 3 & data$sex=="M", ]
    #>   subject sex size
    #> 1       1   M    7

    # Logical OR of two conditions
    subset(data, subject < 3 | sex=="M")
    #>   subject sex size
    #> 1       1   M    7
    #> 2       2   F    6
    #> 4       4   M   11
    data[data$subject < 3 | data$sex=="M", ]
    #>   subject sex size
    #> 1       1   M    7
    #> 2       2   F    6
    #> 4       4   M   11

    # Condition based on transformed data
    subset(data, log2(size) > 3 )
    #>   subject sex size
    #> 3       3   F    9
    #> 4       4   M   11
    data[log2(data$size) > 3, ]
    #>   subject sex size
    #> 3       3   F    9
    #> 4       4   M   11

    # Subset if elements are in another vector
    subset(data, subject %in% c(1,3))
    #>   subject sex size
    #> 1       1   M    7
    #> 3       3   F    9
    data[data$subject %in% c(1,3), ]
    #>   subject sex size
    #> 1       1   M    7
    #> 3       3   F    9
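
The problem statement also mentions matrices, which the examples above do not cover. The following is a rough sketch of the same two approaches on a matrix. The matrix m and its column names are made up for illustration. Unlike the data frame case, the condition passed to subset() has to index the matrix explicitly, because a matrix has no columns that can be referred to by bare name in the condition.

```r
# Hypothetical 3x4 matrix with named columns (illustration only)
m <- matrix(1:12, nrow = 3, dimnames = list(NULL, c("a", "b", "c", "d")))

# Rows where column "a" is greater than 1, using subset()
rows_subset <- subset(m, m[, "a"] > 1)

# The same rows using square brackets
rows_brackets <- m[m[, "a"] > 1, ]

# Particular rows and columns
cols_subset <- subset(m, m[, "a"] > 1, select = c("a", "c"))
cols_brackets <- m[m[, "a"] > 1, c("a", "c")]

print(rows_subset)
print(cols_subset)
```

When only one row or column matches, square brackets return a plain vector unless drop=FALSE is given, whereas subset() on a matrix keeps the result as a matrix by default.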

Notes

See also: Indexing into a data structure.