Histogram and density plot

Problem

You want to make a histogram or density plot.

Solution

Some sample data: these two vectors contain 200 data points each:

  set.seed(1234)
  rating <- rnorm(200)
  head(rating)
  #> [1] -1.2070657  0.2774292  1.0844412 -2.3456977  0.4291247  0.5060559
  rating2 <- rnorm(200, mean=.8)
  head(rating2)
  #> [1] 1.2852268 1.4967688 0.9855139 1.5007335 1.1116810 1.5604624

When plotting multiple groups of data, some graphing routines require a data frame with one column for the grouping variable and one for the measure variable.

  # Make a column to indicate which group each value is in
  cond <- factor( rep(c("A","B"), each=200) )
  data <- data.frame(cond, rating = c(rating,rating2))
  head(data)
  #>   cond     rating
  #> 1    A -1.2070657
  #> 2    A  0.2774292
  #> 3    A  1.0844412
  #> 4    A -2.3456977
  #> 5    A  0.4291247
  #> 6    A  0.5060559

  # Histogram
  hist(rating)
  # Use 8 bins (this is only approximate - it places boundaries on nice round numbers)
  # Make it light blue #CCCCFF
  # Instead of showing count, make area sum to 1, (freq=FALSE)
  hist(rating, breaks=8, col="#CCCCFF", freq=FALSE)
  # Put breaks at every 0.6
  boundaries <- seq(-3, 3.6, by=.6)
  boundaries
  #> [1] -3.0 -2.4 -1.8 -1.2 -0.6  0.0  0.6  1.2  1.8  2.4  3.0  3.6
  hist(rating, breaks=boundaries)
  # Kernel density plot
  plot(density(rating))
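
Because breaks=8 is only a suggestion, it can help to look at the boundaries hist() actually picked. The following is a minimal sketch, not part of the original recipe: it computes the histogram without drawing it (plot=FALSE), then overlays a kernel density curve on the freq=FALSE histogram, which works because both are on the density scale. The variable name h is chosen here for illustration.

```r
set.seed(1234)
rating <- rnorm(200)

# Compute the histogram without drawing it, to see the breaks hist() chose
h <- hist(rating, breaks = 8, plot = FALSE)
h$breaks           # actual bin boundaries
length(h$counts)   # actual number of bins, which may differ from 8

# With freq=FALSE the bar areas sum to 1, so a density curve shares the same scale
hist(rating, breaks = 8, col = "#CCCCFF", freq = FALSE)
lines(density(rating), lwd = 2)
```

If the curve's peak is cut off at the top, passing a larger ylim to hist() should make room for it.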


Multiple groups with kernel density plots.

This code is from: http://onertipaday.blogspot.com/2007/09/plotting-two-or-more-overlapping.html

  plot.multi.dens <- function(s)
  {
      junk.x = NULL
      junk.y = NULL
      # Collect the x and y values of every density estimate to find common axis limits
      for(i in 1:length(s)) {
          junk.x = c(junk.x, density(s[[i]])$x)
          junk.y = c(junk.y, density(s[[i]])$y)
      }
      xr <- range(junk.x)
      yr <- range(junk.y)
      # Draw the first density to set up the axes, then overlay one colored line per group
      plot(density(s[[1]]), xlim = xr, ylim = yr, main = "")
      for(i in 1:length(s)) {
          lines(density(s[[i]]), col = i)
      }
  }
  # The input to this function must be a list of numeric vectors
  plot.multi.dens( list(rating, rating2))
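
When the data is already in a data frame with a grouping column, split() produces the list of numeric vectors that this kind of function expects. The following is a rough sketch of the same overlay idea under that assumption. It is not the code from the linked post: it estimates each density only once and adds a legend. The names groups and dens are chosen here for illustration.

```r
set.seed(1234)
data <- data.frame(
  cond   = factor(rep(c("A", "B"), each = 200)),
  rating = c(rnorm(200), rnorm(200, mean = 0.8))
)

# One numeric vector per group, named by factor level
groups <- split(data$rating, data$cond)

# Estimate each density once and reuse it for the axis ranges and the curves
dens <- lapply(groups, density)
xr <- range(sapply(dens, function(d) range(d$x)))
yr <- range(sapply(dens, function(d) range(d$y)))

# First group sets up the axes; remaining groups are added as colored lines
plot(dens[[1]], xlim = xr, ylim = yr, main = "", col = 1)
for (i in seq_along(dens)[-1]) {
  lines(dens[[i]], col = i)
}
legend("topright", legend = names(dens), col = seq_along(dens), lty = 1)
```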


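The sm package also provides a way to draw multiple density plots. Its sm.density.compare function takes a vector of values and a grouping variable of the same length, so data stored as columns of a data frame can be passed in directly.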

  library(sm)
  sm.density.compare(data$rating, data$cond)
  # Add a legend (the color numbers start from 2 and go up)
  # Use exactly one fill color per group: 2, 3, ...
  legend("topright", levels(data$cond), fill=1+seq_len(nlevels(data$cond)))
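
The legend needs exactly one color per group. A small sketch of that arithmetic, assuming the groups are colored 2, 3, ... as the comment in the code states (the name fill_cols is chosen here for illustration):

```r
cond <- factor(rep(c("A", "B"), each = 200))

# One color index per level, starting at 2
fill_cols <- 1 + seq_len(nlevels(cond))
fill_cols
#> [1] 2 3

# Counting from 0 up to nlevels gives one value too many
2 + (0:nlevels(cond))
#> [1] 2 3 4
```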
