Writing data to a file

Problem

You want to write data to a file.

Solution

Writing to a delimited text file

The easiest way to do this is to use write.csv(). By default, write.csv() writes row names as an extra, unnamed first column; these are usually unnecessary and may cause confusion.

    # A sample data frame
    data <- read.table(header=TRUE, text='
    subject sex size
    1 M 7
    2 F NA
    3 F 9
    4 M 11
    ')
    # Write to a file, suppress row names
    write.csv(data, "data.csv", row.names=FALSE)
    # Same, except that instead of "NA", output blank cells
    write.csv(data, "data.csv", row.names=FALSE, na="")
    # Use tabs, suppress row names and column names
    # (a .tsv extension is used here because the file is tab-delimited)
    write.table(data, "data.tsv", sep="\t", row.names=FALSE, col.names=FALSE)
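To see what these options actually put on disk, one approach is to write to a temporary file and print the raw lines. This is a minimal sketch: the data frame df and the temporary file paths are made up for illustration and are not part of the recipe above.

```r
# Hypothetical sample data, similar in shape to the data frame above
df <- data.frame(subject = 1:2, sex = c("M", "F"), size = c(7L, NA))

with_rows <- tempfile(fileext = ".csv")
no_rows   <- tempfile(fileext = ".csv")

# Default behavior: row names are written as an extra, unnamed first column
write.csv(df, with_rows)

# Row names suppressed, and NA written as an empty cell
write.csv(df, no_rows, row.names = FALSE, na = "")

# Read the files back as plain text to inspect what was written
header_with_rows <- readLines(with_rows)[1]
lines_no_rows    <- readLines(no_rows)

print(header_with_rows)
print(lines_no_rows)
```

Comparing the two header lines shows the extra empty column name that appears when row names are kept.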

Saving in R data format

write.csv() and write.table() are best for interoperability with other data analysis programs. They will not, however, preserve special attributes of the data structures, such as whether a column is a character type or factor, or the order of levels in factors. To preserve these, the data should be written out in a format specific to R.
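The loss of factor level order can be seen with a small round trip through a CSV file. This is only an illustrative sketch; the data frame df, its size column, and the temporary file are hypothetical.

```r
# Hypothetical data: a factor whose levels have a meaningful, non-alphabetical order
df <- data.frame(
  size = factor(c("small", "large", "medium"),
                levels = c("small", "medium", "large"))
)

csv_file <- tempfile(fileext = ".csv")
write.csv(df, csv_file, row.names = FALSE)

# Read the file back, asking for strings to be converted to factors
back <- read.csv(csv_file, stringsAsFactors = TRUE)

# The values survive, but the custom level order does not
print(levels(df$size))
print(levels(back$size))
```

The CSV file stores only the cell values, so read.csv() has to rebuild the levels from the values it finds.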

There are three primary ways of doing this:

The first method is to output R source code which, when run, will re-create the object. This should work for most data objects, but it may not be able to faithfully re-create some more complicated data objects.

    # Save in a text format that can be easily loaded in R
    dump("data", "data.Rdmpd")
    # Can save multiple objects (this assumes a second object, data1, exists):
    dump(c("data", "data1"), "data.Rdmpd")
    # To load the data again:
    source("data.Rdmpd")
    # When loaded, the original data names will automatically be used.
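Because source() re-creates objects under their original names, it can overwrite objects that already exist in the workspace. One way to avoid that, sketched here with a hypothetical object named scores and a temporary file, is to source the dump file into a separate environment.

```r
# Hypothetical object to save
scores <- data.frame(id = 1:3, grade = c("a", "b", "c"))

dump_file <- tempfile(fileext = ".R")
dump("scores", file = dump_file)

# The dump file is ordinary R code that assigns to the name "scores"
cat(readLines(dump_file), sep = "\n")

# Source into a fresh environment so nothing in the workspace is overwritten
restored_env <- new.env()
source(dump_file, local = restored_env)

# The re-created object lives in restored_env under its original name
print(ls(restored_env))
print(all.equal(restored_env$scores, scores))
```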

The next method is to write out individual data objects in RDS format. This format can be binary or ASCII. Binary is more compact, while ASCII will be more efficient with version control systems like Git.

    # Save a single object in binary RDS format
    saveRDS(data, "data.rds")
    # Or, using ASCII format
    saveRDS(data, "data.rds", ascii=TRUE)
    # To load the data again:
    data <- readRDS("data.rds")
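As a quick check that RDS keeps the attributes a delimited text file would drop, the following sketch saves a data frame with an ordered set of factor levels and reads it back under a different name. The objects original and restored and the temporary file are hypothetical names chosen for this example.

```r
# Hypothetical data with a custom factor level order
sizes <- factor(c("small", "large", "medium"),
                levels = c("small", "medium", "large"))
original <- data.frame(id = 1:3, size = sizes)

rds_file <- tempfile(fileext = ".rds")
saveRDS(original, rds_file)

# readRDS() returns the object, so it can be assigned to any name
restored <- readRDS(rds_file)

# Column types and factor level order come back unchanged
print(identical(restored, original))
print(levels(restored$size))
```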

It’s also possible to save multiple objects into a single file, using the RData format.

    # Save an object in binary RData format
    save(data, file="data.RData")
    # Or, using ASCII format
    save(data, file="data.RData", ascii=TRUE)
    # Can save multiple objects (this assumes a second object, data1, exists)
    save(data, data1, file="data.RData")
    # To load the data again:
    load("data.RData")

An important difference between saveRDS() and save() is how the restored objects get their names. With readRDS(), you choose the name yourself by assigning the result to a variable. With load(), the original object names are used automatically. This can sometimes simplify a workflow, but it can also be a drawback if the data object is meant to be distributed to others for use in a different environment, where a loaded object may replace an existing one that has the same name.
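When the original names might clash with objects already in the workspace, one option is to load the file into a separate environment. This is a minimal sketch; the objects a and b, the temporary file, and the environment loaded_env are hypothetical.

```r
# Hypothetical objects to save together
a <- 1:3
b <- c("x", "y")

rdata_file <- tempfile(fileext = ".RData")
save(a, b, file = rdata_file)

# Load into a fresh environment instead of the current workspace
loaded_env <- new.env()
loaded_names <- load(rdata_file, envir = loaded_env)

# load() returns the names of the objects it restored
print(loaded_names)

# The objects are then reached through the environment
print(loaded_env$a)
print(loaded_env$b)
```

Keeping the loaded objects in their own environment makes it explicit where each one came from, at the cost of slightly more verbose access.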