Comparing vectors or factors with NA

Problem

You want to compare two vectors or factors, but you want comparisons involving NA to be reported as TRUE or FALSE instead of NA.

Solution

Suppose you have this data frame with two columns of logical (boolean) vectors:

    df <- data.frame(a = c(TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,NA,NA,NA),
                     b = c(TRUE,FALSE,NA,TRUE,FALSE,NA,TRUE,FALSE,NA))
    df
    #>       a     b
    #> 1  TRUE  TRUE
    #> 2  TRUE FALSE
    #> 3  TRUE    NA
    #> 4 FALSE  TRUE
    #> 5 FALSE FALSE
    #> 6 FALSE    NA
    #> 7    NA  TRUE
    #> 8    NA FALSE
    #> 9    NA    NA

Normally, when you compare two vectors or factors containing NA values, the vector of results will have an NA wherever either of the original items was NA. Depending on your purposes, this may or may not be desirable.

    df$a == df$b
    #> [1]  TRUE FALSE    NA FALSE  TRUE    NA    NA    NA    NA

    # The same comparison, but presented as another column in the data frame:
    data.frame(df, isSame = (df$a==df$b))
    #>       a     b isSame
    #> 1  TRUE  TRUE   TRUE
    #> 2  TRUE FALSE  FALSE
    #> 3  TRUE    NA     NA
    #> 4 FALSE  TRUE  FALSE
    #> 5 FALSE FALSE   TRUE
    #> 6 FALSE    NA     NA
    #> 7    NA  TRUE     NA
    #> 8    NA FALSE     NA
    #> 9    NA    NA     NA

A function for comparing with NA’s

This comparison function essentially treats NA as just another value. If an item is NA in both vectors, it reports TRUE for that item; if the item is NA in just one vector, it reports FALSE; all other comparisons (between non-NA items) behave the same as with ==.

    # This function returns TRUE wherever elements are the same, including NA's,
    # and FALSE everywhere else.
    compareNA <- function(v1,v2) {
        same <- (v1 == v2) | (is.na(v1) & is.na(v2))
        same[is.na(same)] <- FALSE
        return(same)
    }
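To see why the function needs its second line, it may help to look at how R's three-valued logic handles NA in `|`. The following is a small illustrative sketch using only base R; the vectors `v1` and `v2` are made up for this example and stand in for the function's arguments.

```r
# NA | TRUE is TRUE: the result is TRUE whatever the missing value is.
NA | TRUE
#> [1] TRUE

# NA | FALSE stays NA: the result depends on the missing value.
NA | FALSE
#> [1] NA

# Intermediate result of the first line of compareNA() for three
# representative pairs: (NA, NA), (TRUE, NA), (TRUE, TRUE).
v1 <- c(NA, TRUE, TRUE)
v2 <- c(NA, NA,   TRUE)
same <- (v1 == v2) | (is.na(v1) & is.na(v2))
same
#> [1] TRUE   NA TRUE
```

The pair where both items are NA already comes out as TRUE, but the pair where only one item is NA is still NA at this point. That leftover NA is what `same[is.na(same)] <- FALSE` replaces.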

Examples of the function in use

Comparing boolean vectors:

    compareNA(df$a, df$b)
    #> [1]  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE

    # Same comparison, presented as another column
    data.frame(df, isSame = compareNA(df$a,df$b))
    #>       a     b isSame
    #> 1  TRUE  TRUE   TRUE
    #> 2  TRUE FALSE  FALSE
    #> 3  TRUE    NA  FALSE
    #> 4 FALSE  TRUE  FALSE
    #> 5 FALSE FALSE   TRUE
    #> 6 FALSE    NA  FALSE
    #> 7    NA  TRUE  FALSE
    #> 8    NA FALSE  FALSE
    #> 9    NA    NA   TRUE

It also works with factors, even if the levels of the factors are in different orders:

    # Create sample data frame with factors.
    df1 <- data.frame(a = factor(c('x','x','x','y','y','y', NA, NA, NA)),
                      b = factor(c('x','y', NA,'x','y', NA,'x','y', NA)))

    # Do the comparison
    data.frame(df1, isSame = compareNA(df1$a, df1$b))
    #>      a    b isSame
    #> 1    x    x   TRUE
    #> 2    x    y  FALSE
    #> 3    x <NA>  FALSE
    #> 4    y    x  FALSE
    #> 5    y    y   TRUE
    #> 6    y <NA>  FALSE
    #> 7 <NA>    x  FALSE
    #> 8 <NA>    y  FALSE
    #> 9 <NA> <NA>   TRUE

    # It still works if the factor levels are arranged in a different order
    df1$b <- factor(df1$b, levels=c('y','x'))
    data.frame(df1, isSame = compareNA(df1$a, df1$b))
    #>      a    b isSame
    #> 1    x    x   TRUE
    #> 2    x    y  FALSE
    #> 3    x <NA>  FALSE
    #> 4    y    x  FALSE
    #> 5    y    y   TRUE
    #> 6    y <NA>  FALSE
    #> 7 <NA>    x  FALSE
    #> 8 <NA>    y  FALSE
    #> 9 <NA> <NA>   TRUE
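One caveat worth sketching: reordering levels is fine, but `==` on two factors stops with an error when their sets of levels differ, and `compareNA` inherits that behavior. The wrapper below, `compareNAchar`, is a hypothetical helper that is not part of the recipe above; it assumes that comparing by label is what you want and converts both inputs to character first. The factors `f1` and `f2` are made up for the example.

```r
compareNA <- function(v1, v2) {
    same <- (v1 == v2) | (is.na(v1) & is.na(v2))
    same[is.na(same)] <- FALSE
    return(same)
}

# Hypothetical wrapper: compare by label instead of comparing factors directly.
compareNAchar <- function(v1, v2) {
    compareNA(as.character(v1), as.character(v2))
}

f1 <- factor(c('x', 'y', NA))   # levels: x, y
f2 <- factor(c('x', 'z', NA))   # levels: x, z (a different set)

# Comparing the factors directly fails; try() captures the error
# so the rest of the script keeps running.
res <- try(compareNA(f1, f2), silent = TRUE)
inherits(res, "try-error")
#> [1] TRUE

# Comparing the labels works, and NA still matches NA.
compareNAchar(f1, f2)
#> [1]  TRUE FALSE  TRUE
```

Converting to character discards the level information, so this only makes sense when the labels themselves are what should match.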