Re-computing the levels of all factor columns in a data frame
Problem
You want to re-compute factor levels of all factor columns in a data frame.
Solution
Sometimes after reading in data and cleaning it, you will end up with factor columns that have levels that should no longer be there.
For example, d below has one blank row. When it’s read in, the factor columns have a level "", which shouldn’t be part of the data.
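# stringsAsFactors = TRUE is needed in R >= 4.0, where read.csv() no longer
# converts strings to factors by default
d <- read.csv(header = TRUE, stringsAsFactors = TRUE, text='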
x,y,value
a,one,1
,,5
b,two,4
c,three,10
')
d
#>   x     y value
#> 1 a   one     1
#> 2             5
#> 3 b   two     4
#> 4 c three    10
str(d)
#> 'data.frame': 4 obs. of 3 variables:
#> $ x : Factor w/ 4 levels "","a","b","c": 2 1 3 4
#> $ y : Factor w/ 4 levels "","one","three",..: 2 1 4 3
#> $ value: int 1 5 4 10
Even after removing the empty row, the factors still have the blank string "" as a level:
# Remove second row
d <- d[-2,]
d
#>   x     y value
#> 1 a   one     1
#> 3 b   two     4
#> 4 c three    10
str(d)
#> 'data.frame': 3 obs. of 3 variables:
#> $ x : Factor w/ 4 levels "","a","b","c": 2 3 4
#> $ y : Factor w/ 4 levels "","one","three",..: 2 4 3
#> $ value: int 1 4 10
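One way to confirm that a level is declared but no longer used is to compare levels() against the values actually present. The following is a minimal sketch; d_demo and unused_x are illustrative names that are not part of the original example, and the data frame is rebuilt here so the snippet runs on its own.

```r
# Illustrative copy of the example data (d_demo is a hypothetical name).
# stringsAsFactors = TRUE makes the character columns factors in R >= 4.0.
d_demo <- data.frame(
  x = c("a", "", "b", "c"),
  y = c("one", "", "two", "three"),
  value = c(1L, 5L, 4L, 10L),
  stringsAsFactors = TRUE
)

# Remove the blank second row, as in the example above
d_demo <- d_demo[-2, ]

# Levels that are declared but not used by any remaining row
unused_x <- setdiff(levels(d_demo$x), as.character(d_demo$x))
unused_x
#> [1] ""

# table() counts every declared level, so the unused "" level
# appears with a count of zero
table(d_demo$x)
```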
With droplevels
The simplest way is to use the droplevels() function:
d1 <- droplevels(d)
str(d1)
#> 'data.frame': 3 obs. of 3 variables:
#> $ x : Factor w/ 3 levels "a","b","c": 1 2 3
#> $ y : Factor w/ 3 levels "one","three",..: 1 3 2
#> $ value: int 1 4 10
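If only some factor columns should have their unused levels dropped, the data frame method of droplevels() accepts an except argument giving the indices of columns to leave alone. The sketch below assumes we want to keep the levels of y unchanged; d_demo and d_partial are illustrative names, not part of the original example.

```r
# Illustrative copy of the example data with the blank row already removed
# (d_demo is a hypothetical name)
d_demo <- data.frame(
  x = c("a", "", "b", "c"),
  y = c("one", "", "two", "three"),
  value = c(1L, 5L, 4L, 10L),
  stringsAsFactors = TRUE
)[-2, ]

# Drop unused levels everywhere except in column y
y_index <- match("y", names(d_demo))
d_partial <- droplevels(d_demo, except = y_index)

nlevels(d_partial$x)  # the unused "" level is gone
#> [1] 3
nlevels(d_partial$y)  # the "" level is still declared
#> [1] 4
```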
With vapply and lapply
To re-compute the levels for all factor columns, we can use vapply() with is.factor() to find out which columns are factors, and then use that information with lapply() to apply the factor() function to those columns.
# Find which columns are factors
factor_cols <- vapply(d, is.factor, logical(1))
# Apply the factor() function to those columns, and assign them back into d
d[factor_cols] <- lapply(d[factor_cols], factor)
str(d)
#> 'data.frame': 3 obs. of 3 variables:
#> $ x : Factor w/ 3 levels "a","b","c": 1 2 3
#> $ y : Factor w/ 3 levels "one","three",..: 1 3 2
#> $ value: int 1 4 10
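If this step is needed in several places, the two lines can be wrapped in a small helper. This is only a sketch: recompute_factor_levels() is a hypothetical name, not a base R function, and d_demo is an illustrative copy of the example data so that the snippet runs on its own.

```r
# Hypothetical helper: re-compute the levels of every factor column
recompute_factor_levels <- function(df) {
  stopifnot(is.data.frame(df))
  # Logical vector marking which columns are factors
  factor_cols <- vapply(df, is.factor, logical(1))
  # Calling factor() on a factor rebuilds its levels from the values present
  df[factor_cols] <- lapply(df[factor_cols], factor)
  df
}

# Illustrative copy of the example data with the blank row already removed
d_demo <- data.frame(
  x = c("a", "", "b", "c"),
  y = c("one", "", "two", "three"),
  value = c(1L, 5L, 4L, 10L),
  stringsAsFactors = TRUE
)[-2, ]

d_clean <- recompute_factor_levels(d_demo)
levels(d_clean$x)
#> [1] "a" "b" "c"
levels(d_clean$y)
#> [1] "one"   "three" "two"
```

Unlike the in-place assignment above, the helper returns a modified copy and leaves its argument untouched, so the original data frame remains available for comparison.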
See also
For information about re-computing the levels of a factor, see ../Re-computing_the_levels_of_factor.