Working with NULL, NA, and NaN

Problem

You want to properly handle NULL, NA, or NaN values.

Solution

Sometimes your data will include NULL, NA, or NaN. These work somewhat differently from “normal” values, and may require explicit testing.

Here are some examples of comparisons with these values:

    x <- NULL
    x > 5
    # logical(0)
    y <- NA
    y > 5
    # NA
    z <- NaN
    z > 5
    # NA

Here’s how to test whether a variable has one of these values:

    is.null(x)
    # TRUE
    is.na(y)
    # TRUE
    is.nan(z)
    # TRUE
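Because comparisons with these values do not yield a plain TRUE or FALSE, it can help to wrap the explicit tests in a small guard before using a comparison in a condition. The following is a minimal sketch; the helper name exceeds() is hypothetical and not part of R.

```r
# Hypothetical helper: TRUE only if v is a single usable number greater than limit
exceeds <- function(v, limit) {
  # length(v) == 1 rules out NULL; !is.na(v) rules out both NA and NaN
  length(v) == 1 && !is.na(v) && v > limit
}

exceeds(NULL, 5)
# FALSE
exceeds(NA, 5)
# FALSE
exceeds(NaN, 5)
# FALSE
exceeds(7, 5)
# TRUE
```

The && operator stops at the first FALSE, so the comparison v > limit is only evaluated once the earlier tests have passed.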

Note that NULL is different from the other two. NULL means that there is no value, while NA and NaN mean that there is some value, although one that is perhaps not usable. Here’s an illustration of the difference:

    # Is y null?
    is.null(y)
    # FALSE
    # Is x NA?
    is.na(x)
    # logical(0)
    # Warning message:
    # In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'

The first call checks whether y is NULL, and the answer is no. The second call tries to check whether x is NA, but there is no value to check.
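Another way to see the "no value" versus "some value" distinction is to look at the length and class of each object. This is only an illustrative sketch using the same variable names as above:

```r
x <- NULL
y <- NA
z <- NaN

# NULL holds nothing; NA and NaN each occupy one slot
length(x)
# 0
length(y)
# 1
length(z)
# 1

# A bare NA is a logical value, and NaN is a numeric one
class(x)
# "NULL"
class(y)
# "logical"
class(z)
# "numeric"
```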

Ignoring “bad” values in vector summary functions

If you run functions like mean() or sum() on a vector containing NA or NaN, they will return NA or NaN. This is generally unhelpful, though it does alert you to the presence of the bad value. Many of these functions take the argument na.rm, which tells them to ignore these values.

    vy <- c(1, 2, 3, NA, 5)
    # 1 2 3 NA 5
    mean(vy)
    # NA
    mean(vy, na.rm=TRUE)
    # 2.75
    vz <- c(1, 2, 3, NaN, 5)
    # 1 2 3 NaN 5
    sum(vz)
    # NaN
    sum(vz, na.rm=TRUE)
    # 11
    # NULL isn't a problem, because it doesn't exist
    vx <- c(1, 2, 3, NULL, 5)
    # 1 2 3 5
    sum(vx)
    # 11
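The same na.rm argument is accepted by several other base summary functions. As a brief sketch, here it is with max(), median(), and range(), along with one way to count how many values would be dropped:

```r
vy <- c(1, 2, 3, NA, 5)

max(vy)
# NA
max(vy, na.rm=TRUE)
# 5
median(vy, na.rm=TRUE)
# 2.5
range(vy, na.rm=TRUE)
# 1 5

# Count the missing values before ignoring them
sum(is.na(vy))
# 1
```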

Removing bad values from a vector

These values can be removed from a vector by filtering using is.na() or is.nan().

    vy
    # 1 2 3 NA 5
    vy[ !is.na(vy) ]
    # 1 2 3 5
    vz
    # 1 2 3 NaN 5
    vz[ !is.nan(vz) ]
    # 1 2 3 5
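The two tests are not interchangeable: is.na() is TRUE for both NA and NaN, while is.nan() is TRUE only for NaN. A short sketch with a vector that contains both (the name vm is just for illustration):

```r
vm <- c(1, NA, 3, NaN, 5)

is.na(vm)
# FALSE TRUE FALSE TRUE FALSE
is.nan(vm)
# FALSE FALSE FALSE TRUE FALSE

# Filtering with is.na() drops both kinds of bad value
vm[ !is.na(vm) ]
# 1 3 5

# Filtering with is.nan() leaves the NA in place
vm[ !is.nan(vm) ]
# 1 NA 3 5
```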

Notes

There are also the infinite numerical values Inf and -Inf, and the associated functions is.finite() and is.infinite().
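As a quick sketch of how these behave (the vector name w is just for illustration), note that is.finite() is FALSE for NA and NaN as well as for the infinite values, so it can serve as a single filter for ordinary numbers:

```r
1/0
# Inf
-1/0
# -Inf
0/0
# NaN

w <- c(1, Inf, -Inf, NA, NaN)

is.infinite(w)
# FALSE TRUE TRUE FALSE FALSE
is.finite(w)
# TRUE FALSE FALSE FALSE FALSE

# Keep only the ordinary, finite numbers
w[ is.finite(w) ]
# 1
```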

Also see /Manipulating data/Comparing vectors or factors with NA