Loading data from a file

Problem

You want to load data from a file.

Solution

Delimited text files

The simplest way to import data is to save it as a text file with delimiters such as tabs or commas (CSV).

  data <- read.csv("datafile.csv")
  # Load a CSV file that doesn't have headers
  data <- read.csv("datafile-noheader.csv", header=FALSE)

The function read.table() is a more general function which allows you to set the delimiter, whether or not there are headers, whether strings are set off with quotes, and more. See ?read.table for more information on the details.

  data <- read.table("datafile-noheader.csv",
                     header=FALSE,
                     sep=","         # use "\t" for tab-delimited files
  )
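As a minimal sketch of the delimiter option, the following writes a small tab-delimited file to a temporary location and reads it back with read.table(). The file contents and the names tmp and tabbed are made up for illustration.

```r
# Write a small tab-delimited file to a temporary location (illustrative data)
tmp <- tempfile(fileext = ".tsv")
writeLines(c("Currer\tBell\tF\t2",
             "Dr.\tSeuss\tM\t49"), tmp)

# Read it back: no header row, tab as the delimiter
tabbed <- read.table(tmp, header = FALSE, sep = "\t")

# Without a header row, the columns get the default names V1, V2, ...
names(tabbed)
str(tabbed)

unlink(tmp)  # clean up the temporary file
```

Writing to a temporary file keeps the example self-contained; with real data you would pass the path of your own file instead.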

Loading a file with a file chooser

On some platforms, using file.choose() will open a file chooser dialog window. On others, it will simply prompt the user to type in a filename.

  data <- read.csv(file.choose())

Treating strings as factors or characters

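In versions of R before 4.0.0, strings in the data are converted to factors by default; since R 4.0.0, they are kept as character vectors (stringsAsFactors=FALSE is the default). With the old default, loading the data below with read.csv turns every text column into a factor, even though it might make more sense to treat some of them as strings. To keep them as strings regardless of the R version, use stringsAsFactors=FALSE: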

  data <- read.csv("datafile.csv", stringsAsFactors=FALSE)
  # You might have to convert some columns to factors
  data$Sex <- factor(data$Sex)

Another alternative is to load them as factors and convert some columns to characters:

  # Explicitly load strings as factors (this was the default before R 4.0.0)
  data <- read.csv("datafile.csv", stringsAsFactors=TRUE)
  data$First <- as.character(data$First)
  data$Last <- as.character(data$Last)
  # Another method: convert columns named "First" and "Last"
  stringcols <- c("First","Last")
  data[stringcols] <- lapply(data[stringcols], as.character)
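To see the effect of the two settings side by side, here is a small sketch that reads the sample data inline through the text argument of read.csv, so no file is needed. The names csv_text, as_chr, and as_fct are illustrative only.

```r
# The sample data from datafile.csv, supplied inline via the text argument
csv_text <- '"First","Last","Sex","Number"
"Currer","Bell","F",2
"Dr.","Seuss","M",49
"","Student",NA,21'

# Set stringsAsFactors explicitly so the result does not depend on the R version
as_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)
as_fct <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# Compare the class of each column under the two settings
sapply(as_chr, class)
sapply(as_fct, class)
```

Only the text columns change class between the two calls; the Number column is read as an integer either way.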

Loading a file from the Internet

Data can also be loaded from a URL. These (very long) URLs will load the sample files shown below.

  data <- read.csv("http://www.cookbook-r.com/Data_input_and_output/Loading_data_from_a_file/datafile.csv")
  # Read in a CSV file without headers
  data <- read.csv("http://www.cookbook-r.com/Data_input_and_output/Loading_data_from_a_file/datafile-noheader.csv", header=FALSE)
  # Manually assign the header names
  names(data) <- c("First","Last","Sex","Number")

The data files used above:

datafile.csv:

  "First","Last","Sex","Number"
  "Currer","Bell","F",2
  "Dr.","Seuss","M",49
  "","Student",NA,21

datafile-noheader.csv:

  "Currer","Bell","F",2
  "Dr.","Seuss","M",49
  "","Student",NA,21
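As an alternative to assigning the names after loading, the col.names argument of read.csv can set them while reading. The sketch below supplies the contents of datafile-noheader.csv inline through the text argument; the name noheader_text is illustrative only.

```r
# The contents of datafile-noheader.csv, supplied inline via the text argument
noheader_text <- '"Currer","Bell","F",2
"Dr.","Seuss","M",49
"","Student",NA,21'

# Name the columns while reading instead of calling names() afterwards
data <- read.csv(text = noheader_text, header = FALSE,
                 col.names = c("First", "Last", "Sex", "Number"))
names(data)
```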

Fixed-width text files

Suppose your data has fixed-width columns, like this:

    First     Last  Sex Number
   Currer     Bell    F      2
      Dr.    Seuss    M     49
       ""  Student   NA     21

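One way to read it in is to use read.table(), which by default treats any run of whitespace as the delimiter, so the extra spaces are ignored. (strip.white=TRUE only takes effect when sep is set explicitly; it is harmless here.)

  # "clipboard" reads the copied text from the system clipboard on platforms
  # that support it; pass a file name instead to read from a file
  read.table("clipboard", header=TRUE, strip.white=TRUE)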
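The following sketch shows the same idea without relying on the clipboard: the padded data is passed inline through the text argument. The exact padding and the names padded_text and padded are assumptions made for illustration.

```r
# Columns padded with a varying number of spaces (illustrative layout)
padded_text <- '  First     Last  Sex Number
 Currer     Bell    F      2
    Dr.    Seuss    M     49
     ""  Student   NA     21'

# With the default sep = "", any run of whitespace separates the fields
padded <- read.table(text = padded_text, header = TRUE)
padded$Number
```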

However, your data file may have columns that contain spaces, or columns with no spaces separating them. In the example below, the scores column holds six different measurements, each from 0 to 3.

  subject  sex  scores
     N  1    M  113311
     NE 2    F  112231
     S  3    F  111221
     W  4    M  011002

In this case, you may need to use the read.fwf() function. To read the column names from the file with header=TRUE, the names in the header line must be separated by the delimiter given in sep (a tab by default), with one name for each column that is read. If they are separated with multiple spaces, as in this example, you will have to assign the column names directly.

  # Assign the column names manually
  read.fwf("myfile.txt",
           c(7,5,-2,1,1,1,1,1,1), # Width of the columns. -2 means skip two characters
           skip=1,                # Skip the first line (contains header here)
           col.names=c("subject","sex","s1","s2","s3","s4","s5","s6"),
           strip.white=TRUE)      # Strip leading and trailing whitespace from each field
  #>   subject sex s1 s2 s3 s4 s5 s6
  #> 1    N  1   M  1  1  3  3  1  1
  #> 2    NE 2   F  1  1  2  2  3  1
  #> 3    S  3   F  1  1  1  2  2  1
  #> 4    W  4   M  0  1  1  0  0  2

  # header=TRUE only works if the first row holds one name per column,
  # separated by the sep delimiter (a tab by default). With the
  # space-separated header shown above, it fails:
  read.fwf("myfile.txt", c(7,5,-2,1,1,1,1,1,1), header=TRUE, strip.white=TRUE)
  #> Error in read.table(file = FILE, header = header, sep = sep, row.names = row.names, :
  #>   more columns than column names
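For a version that can be run as-is, the sketch below recreates the file in a temporary location and then sums the six scores for each subject. It assumes the layout implied by the widths (7 characters for subject, 5 for sex, 2 filler characters, then six 1-character scores); the names fwf_file, scores, and total are illustrative only.

```r
# Recreate the fixed-width file: subject is 7 characters wide, sex is 5,
# then 2 filler characters, then six 1-character scores
fwf_file <- tempfile(fileext = ".txt")
writeLines(c("subject  sex  scores",
             "   N  1    M  113311",
             "   NE 2    F  112231",
             "   S  3    F  111221",
             "   W  4    M  011002"), fwf_file)

scores <- read.fwf(fwf_file,
                   widths = c(7, 5, -2, 1, 1, 1, 1, 1, 1),
                   skip = 1,
                   col.names = c("subject", "sex", paste0("s", 1:6)),
                   strip.white = TRUE,
                   stringsAsFactors = FALSE)

# Each score is now its own integer column, so they can be summed per subject
scores$total <- rowSums(scores[paste0("s", 1:6)])
scores[c("subject", "sex", "total")]

unlink(fwf_file)  # clean up the temporary file
```

Splitting the scores into separate columns at read time is what makes per-measurement arithmetic like this straightforward.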

Excel files

The read.xls function in the gdata package can read in Excel files. It relies on Perl, which must be installed on the system.

  library(gdata)
  data <- read.xls("data.xls")

See http://cran.r-project.org/doc/manuals/R-data.html#Reading-Excel-spreadsheets.

SPSS data files

The read.spss function in the foreign package can read in SPSS files.

  library(foreign)
  data <- read.spss("data.sav", to.data.frame=TRUE)