Indexing into a data structure

Problem

You want to get part of a data structure.

Solution

Elements of a vector, matrix, or data frame can be extracted using numeric indices, names, or a boolean vector of the appropriate length.

Many of the examples below show multiple ways of doing the same thing.

Indexing with numbers and names

With a vector:

# A sample vector
v <- c(1,4,4,3,2,2,3)
v[c(2,3,4)]
#> [1] 4 4 3
v[2:4]
#> [1] 4 4 3
v[c(2,4,3)]
#> [1] 4 3 4
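
The heading mentions names, so here is a minimal sketch of indexing a vector by name. The vector nv and its element names are made up for illustration and are not part of the original examples.

```r
# A hypothetical named vector (the names are illustrative only)
nv <- c(a=1, b=4, c=4, d=3)
# Get the element named "b"
nv["b"]
#> b
#> 4
# Get several elements by name, in the order requested
nv[c("d","a")]
#> d a
#> 3 1
# Double brackets return the value without its name
nv[["b"]]
#> [1] 4
```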

With a data frame:

# Create a sample data frame
data <- read.table(header=TRUE, stringsAsFactors=TRUE, text='
 subject sex size
       1   M    7
       2   F    6
       3   F    9
       4   M   11
 ')
# Get the element at row 1, column 3
data[1,3]
#> [1] 7
data[1,"size"]
#> [1] 7
# Get rows 1 and 2, and all columns
data[1:2, ]   
#>   subject sex size
#> 1       1   M    7
#> 2       2   F    6
data[c(1,2), ]
#>   subject sex size
#> 1       1   M    7
#> 2       2   F    6
# Get rows 1 and 2, and only column 2
data[1:2, 2]
#> [1] M F
#> Levels: F M
data[c(1,2), 2]
#> [1] M F
#> Levels: F M
# Get rows 1 and 2, and only the columns named "sex" and "size"
data[1:2, c("sex","size")]
#>   sex size
#> 1   M    7
#> 2   F    6
data[c(1,2), c(2,3)]
#>   sex size
#> 1   M    7
#> 2   F    6
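
The same row-and-column notation applies to a matrix, which the Solution mentions. A minimal sketch, assuming a small made-up matrix m with illustrative row and column names:

```r
# A hypothetical 2x3 matrix, filled column by column
m <- matrix(1:6, nrow=2, dimnames=list(c("r1","r2"), c("a","b","c")))
m
#>    a b c
#> r1 1 3 5
#> r2 2 4 6
# Get the element at row 2, column 3
m[2,3]
#> [1] 6
m["r2","c"]
#> [1] 6
# Get row 1 and all columns; the result is simplified to a named vector
m[1, ]
#> a b c
#> 1 3 5
# Use drop=FALSE to keep the result as a 1x3 matrix
m[1, , drop=FALSE]
#>    a b c
#> r1 1 3 5
```

The drop=FALSE argument works the same way with a data frame, where it keeps a single selected column as a data frame instead of a vector.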

Indexing with a boolean vector

With the vector v from above:

v > 2
#> [1] FALSE  TRUE  TRUE  TRUE FALSE FALSE  TRUE
v[v>2]
#> [1] 4 4 3 3
v[c(FALSE,TRUE,TRUE,TRUE,FALSE,FALSE,TRUE)]
#> [1] 4 4 3 3

With the data frame from above:

# A boolean vector   
data$subject < 3
#> [1]  TRUE  TRUE FALSE FALSE
data[data$subject < 3, ]
#>   subject sex size
#> 1       1   M    7
#> 2       2   F    6
data[c(TRUE,TRUE,FALSE,FALSE), ]
#>   subject sex size
#> 1       1   M    7
#> 2       2   F    6
# It is also possible to get the numeric indices of the TRUEs
which(data$subject < 3)
#> [1] 1 2
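
Boolean vectors can also be combined before indexing. The sketch below recreates the sample data frame so that it runs on its own; the particular conditions are chosen only for illustration.

```r
# Recreate the sample data frame so this block is self-contained
data <- read.table(header=TRUE, text='
 subject sex size
       1   M    7
       2   F    6
       3   F    9
       4   M   11
 ')
# Combine conditions with & (and) or | (or)
data$sex == "F" & data$size > 6
#> [1] FALSE FALSE  TRUE FALSE
data[data$sex == "F" & data$size > 6, ]
#>   subject sex size
#> 3       3   F    9
# The numeric indices returned by which() can be used for indexing too
data[which(data$size > 8), ]
#>   subject sex size
#> 3       3   F    9
#> 4       4   M   11
```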

Negative indexing

Unlike in some other programming languages, negative indices in R do not count backward from the end. Instead, a negative index drops the element at that position, counted the usual way from the beginning.

# Here's the vector again.
v
#> [1] 1 4 4 3 2 2 3
# Drop the first element
v[-1]
#> [1] 4 4 3 2 2 3
# Drop the first three elements
v[-1:-3]
#> [1] 3 2 2 3
# Drop just the last element
v[-length(v)]
#> [1] 1 4 4 3 2 2
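
A few more cases, sketched with the same vector v: dropping several positions at once, and working from the end of the vector by computing indices from length(v). The use of head() here is one possible approach, not the only one.

```r
v <- c(1,4,4,3,2,2,3)
# Drop the first and the last element
v[-c(1, length(v))]
#> [1] 4 4 3 2 2
# Drop the last two elements; head() with a negative n is one way
head(v, -2)
#> [1] 1 4 4 3 2
# Get the last element by computing its index
v[length(v)]
#> [1] 3
# Positive and negative indices cannot be mixed:
# v[c(-1, 2)] raises an error
```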

Notes

See also: Getting a subset of a data structure.