Scatterplots (ggplot2)

Problem

You want to make a scatterplot.

Solution

Suppose this is your data:

    set.seed(955)
    # Make some noisily increasing data
    dat <- data.frame(cond = rep(c("A", "B"), each=10),
                      xvar = 1:20 + rnorm(20,sd=3),
                      yvar = 1:20 + rnorm(20,sd=3))
    head(dat)
    #>   cond      xvar         yvar
    #> 1    A -4.252354  3.473157275
    #> 2    A  1.702318  0.005939612
    #> 3    A  4.323054 -0.094252427
    #> 4    A  1.780628  2.072808278
    #> 5    A 11.537348  1.215440358
    #> 6    A  6.672130  3.608111411
    library(ggplot2)

Basic scatterplots with regression lines

    ggplot(dat, aes(x=xvar, y=yvar)) +
        geom_point(shape=1)      # Use hollow circles

    ggplot(dat, aes(x=xvar, y=yvar)) +
        geom_point(shape=1) +    # Use hollow circles
        geom_smooth(method=lm)   # Add linear regression line
                                 #  (by default includes 95% confidence region)

    ggplot(dat, aes(x=xvar, y=yvar)) +
        geom_point(shape=1) +    # Use hollow circles
        geom_smooth(method=lm,   # Add linear regression line
                    se=FALSE)    # Don't add shaded confidence region

    ggplot(dat, aes(x=xvar, y=yvar)) +
        geom_point(shape=1) +    # Use hollow circles
        geom_smooth()            # Add a loess smoothed fit curve with confidence region
    #> `geom_smooth()` using method = 'loess'
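To see the numbers behind the regression line, the same fit can be computed directly with lm(). The following is a minimal sketch, assuming the default formula y ~ x that geom_smooth(method=lm) uses; the names fit, newx, and band are illustrative and not part of the original recipe.

```r
set.seed(955)
dat <- data.frame(cond = rep(c("A", "B"), each = 10),
                  xvar = 1:20 + rnorm(20, sd = 3),
                  yvar = 1:20 + rnorm(20, sd = 3))

# Fit the straight line of yvar on xvar (assumed to match method=lm above)
fit <- lm(yvar ~ xvar, data = dat)
coef(fit)                # intercept and slope

# 95% confidence band for the fitted mean on a small grid of x values
# (newx and band are illustrative names)
newx <- data.frame(xvar = seq(min(dat$xvar), max(dat$xvar), length.out = 5))
band <- predict(fit, newdata = newx, interval = "confidence", level = 0.95)
cbind(newx, band)        # columns: xvar, fit, lwr, upr
```

The lwr and upr columns should correspond to the edges of the shaded region drawn when se is left at its default.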

Set color/shape by another variable

    # Set color by cond
    ggplot(dat, aes(x=xvar, y=yvar, color=cond)) + geom_point(shape=1)

    # Same, but with different colors and add regression lines
    ggplot(dat, aes(x=xvar, y=yvar, color=cond)) +
        geom_point(shape=1) +
        scale_colour_hue(l=50) + # Use a slightly darker palette than normal
        geom_smooth(method=lm,   # Add linear regression lines
                    se=FALSE)    # Don't add shaded confidence region

    # Extend the regression lines beyond the domain of the data
    ggplot(dat, aes(x=xvar, y=yvar, color=cond)) + geom_point(shape=1) +
        scale_colour_hue(l=50) + # Use a slightly darker palette than normal
        geom_smooth(method=lm,   # Add linear regression lines
                    se=FALSE,    # Don't add shaded confidence region
                    fullrange=TRUE) # Extend regression lines

    # Set shape by cond
    ggplot(dat, aes(x=xvar, y=yvar, shape=cond)) + geom_point()

    # Same, but with different shapes
    ggplot(dat, aes(x=xvar, y=yvar, shape=cond)) + geom_point() +
        scale_shape_manual(values=c(1,2))  # Use a hollow circle and triangle
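When color is mapped to cond, geom_smooth() fits one regression line per group. As a rough sketch of what that means, the per-group fits can be reproduced by splitting the data and calling lm() on each piece; the names fits and xr are illustrative, and the formula yvar ~ xvar is assumed to match the default.

```r
set.seed(955)
dat <- data.frame(cond = rep(c("A", "B"), each = 10),
                  xvar = 1:20 + rnorm(20, sd = 3),
                  yvar = 1:20 + rnorm(20, sd = 3))

# One linear fit per level of cond (fits is an illustrative name)
fits <- lapply(split(dat, dat$cond), function(d) lm(yvar ~ xvar, data = d))
sapply(fits, coef)       # rows: (Intercept), xvar; columns: A, B

# Evaluate both lines over the x range of the whole data set, which is
# approximately what fullrange=TRUE draws (the plot range may be slightly wider)
xr <- data.frame(xvar = range(dat$xvar))
sapply(fits, predict, newdata = xr)
```

Without fullrange=TRUE, each line is drawn only over the x values of its own group.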

See Colors (ggplot2) and Shapes and line types for more information about colors and shapes.

Handling overplotting

If you have many data points, or if your data scales are discrete, the points may overlap, making it impossible to tell how many points share the same location.

    # Round xvar and yvar to the nearest 5
    dat$xrnd <- round(dat$xvar/5)*5
    dat$yrnd <- round(dat$yvar/5)*5

    # Make each dot partially transparent, with 1/4 opacity
    # For heavy overplotting, try using smaller values
    ggplot(dat, aes(x=xrnd, y=yrnd)) +
        geom_point(shape=19,      # Use solid circles
                   alpha=1/4)     # 1/4 opacity

    # Jitter the points
    # Points are moved by up to 1 in either direction on the x-axis,
    # and by up to .5 in either direction on the y-axis
    ggplot(dat, aes(x=xrnd, y=yrnd)) +
        geom_point(shape=1,      # Use hollow circles
                   position=position_jitter(width=1,height=.5))
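To check how much overplotting the rounding actually introduces, it can help to count the rows at each location before choosing between transparency and jitter. This is a small sketch using base R only; the column n and the name counts are illustrative additions, not part of the original recipe.

```r
set.seed(955)
dat <- data.frame(cond = rep(c("A", "B"), each = 10),
                  xvar = 1:20 + rnorm(20, sd = 3),
                  yvar = 1:20 + rnorm(20, sd = 3))
dat$xrnd <- round(dat$xvar/5)*5
dat$yrnd <- round(dat$yvar/5)*5

# Count how many rows fall on each (xrnd, yrnd) location
# (n is a helper column of ones that gets summed per location)
counts <- aggregate(n ~ xrnd + yrnd, data = transform(dat, n = 1), FUN = sum)
counts[order(-counts$n), ]   # most crowded locations first
```

Locations with n greater than 1 are the ones where points sit exactly on top of each other in an unjittered plot.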
