Bar and line graphs (ggplot2)

Problem

You want to make basic bar or line graphs.

Solution

To make graphs with ggplot2, the data must be in a data frame, and in “long” (as opposed to wide) format. If your data needs to be restructured, see this page for more information.
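
As a minimal sketch of what "long" format means, the following reshapes a small wide data frame with melt() from the reshape2 package. The data frame name `wide` and its values are made up for illustration.

```r
library(reshape2)

# Hypothetical wide data: one column per time of day
wide <- data.frame(
    sex    = c("Female", "Male"),
    Lunch  = c(13.5, 16.2),
    Dinner = c(16.8, 17.4)
)

# Melt to long format: one row per combination of sex and time
long <- melt(wide, id.vars = "sex",
             variable.name = "time", value.name = "total_bill")
long
```

In the long version, each row holds a single measurement, and the former column names become the values of the time column.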

Basic graphs with discrete x-axis

With bar graphs, there are two different things that the heights of bars commonly represent:

  • The count of cases for each group – typically, each x value represents one group. This is done with stat_count (in versions of ggplot2 before 2.0, stat_bin), which calculates the number of cases in each group (if x is discrete, then each x value is a group; if x is continuous, then all the data is automatically in one group, unless you specify grouping with group=xx).
  • The value of a column in the data set. This is done with stat_identity, which leaves the y values unchanged.

x axis is     Height of bar represents   Common name
Continuous    Count                      Histogram
Discrete      Count                      Bar graph
Continuous    Value                      Bar graph
Discrete      Value                      Bar graph

In ggplot2, the default for geom_bar is to use stat_count, so that the bar height represents the count of cases.

Bar graphs of values

Here is some sample data (derived from the tips dataset in the reshape2 package):

    dat <- data.frame(
        time = factor(c("Lunch","Dinner"), levels=c("Lunch","Dinner")),
        total_bill = c(14.89, 17.23)
    )
    dat
    #>     time total_bill
    #> 1  Lunch      14.89
    #> 2 Dinner      17.23

    # Load the ggplot2 package
    library(ggplot2)

In these examples, the height of the bar will represent the value in a column of the data frame. This is done by using stat="identity" instead of the default, stat="count".

These are the variable mappings used here:

  • time: x-axis and sometimes color fill
  • total_bill: y-axis

    # Very basic bar graph
    ggplot(data=dat, aes(x=time, y=total_bill)) +
        geom_bar(stat="identity")

    # Map the time of day to different fill colors
    ggplot(data=dat, aes(x=time, y=total_bill, fill=time)) +
        geom_bar(stat="identity")

    ## This would have the same result as above
    # ggplot(data=dat, aes(x=time, y=total_bill)) +
    #     geom_bar(aes(fill=time), stat="identity")

    # Add a black outline
    ggplot(data=dat, aes(x=time, y=total_bill, fill=time)) +
        geom_bar(colour="black", stat="identity")

    # No legend, since the information is redundant
    # (guides(fill=FALSE) is deprecated in recent versions of ggplot2)
    ggplot(data=dat, aes(x=time, y=total_bill, fill=time)) +
        geom_bar(colour="black", stat="identity") +
        guides(fill="none")

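In ggplot2 2.2.0 and later, geom_col() can be used as a shorthand for geom_bar(stat="identity"). The sketch below rebuilds the sample data so that it runs on its own; the names `p_col` and `p_bar` are only for illustration.

```r
library(ggplot2)

dat <- data.frame(
    time = factor(c("Lunch","Dinner"), levels=c("Lunch","Dinner")),
    total_bill = c(14.89, 17.23)
)

# geom_col() uses the y values as the bar heights
p_col <- ggplot(data=dat, aes(x=time, y=total_bill)) +
    geom_col()

# The longer form used elsewhere on this page
p_bar <- ggplot(data=dat, aes(x=time, y=total_bill)) +
    geom_bar(stat="identity")

p_col
```

The two plots should look the same, so which form to use is mostly a matter of taste and of which ggplot2 version is installed.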

The desired bar graph might look something like this:

    # Add title, narrower bars, fill color, and change axis labels
    ggplot(data=dat, aes(x=time, y=total_bill, fill=time)) +
        geom_bar(colour="black", fill="#DD8888", width=.8, stat="identity") +
        guides(fill="none") +
        xlab("Time of day") + ylab("Total bill") +
        ggtitle("Average bill for 2 people")

See Colors (ggplot2) for more information on colors.

Bar graphs of counts

In these examples, the height of the bar will represent the count of cases. This is done by using stat="count" (which is the default).

We’ll start with the tips data from the reshape2 package:

    library(reshape2)
    # Look at first several rows
    head(tips)
    #>   total_bill  tip    sex smoker day   time size
    #> 1      16.99 1.01 Female     No Sun Dinner    2
    #> 2      10.34 1.66   Male     No Sun Dinner    3
    #> 3      21.01 3.50   Male     No Sun Dinner    3
    #> 4      23.68 3.31   Male     No Sun Dinner    2
    #> 5      24.59 3.61 Female     No Sun Dinner    4
    #> 6      25.29 4.71   Male     No Sun Dinner    4

To get a bar graph of counts, don’t map a variable to y, and use stat="count" (which is the default) instead of stat="identity":

    # Bar graph of counts
    ggplot(data=tips, aes(x=day)) +
        geom_bar(stat="count")

    ## Equivalent to this, since stat="count" is the default:
    # ggplot(data=tips, aes(x=day)) +
    #     geom_bar()

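To connect the two kinds of bar graph, here is a sketch that computes the counts by hand with table() and then plots them as values with stat="identity". The name `day_counts` is only for illustration; `Freq` is the column name that as.data.frame() gives to the counts.

```r
library(ggplot2)
library(reshape2)   # provides the tips data

# Count the rows for each day ourselves
day_counts <- as.data.frame(table(day = tips$day))

# Plot the precomputed counts as values
p_values <- ggplot(data=day_counts, aes(x=day, y=Freq)) +
    geom_bar(stat="identity")

# Let ggplot2 do the counting instead
p_counts <- ggplot(data=tips, aes(x=day)) +
    geom_bar()

p_values
```

Both plots should show the same bar heights. Precomputing is mainly useful when the counts are already available or need further processing before plotting.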

Line graphs

For line graphs, the data points must be grouped so that ggplot2 knows which points to connect. In this case, it is simple: all points should be connected, so group=1. When more variables are used and multiple lines are drawn, the grouping for lines is usually done by variable (this is seen in later examples).

These are the variable mappings used here:

  • time: x-axis
  • total_bill: y-axis

    # Basic line graph
    ggplot(data=dat, aes(x=time, y=total_bill, group=1)) +
        geom_line()

    ## This would have the same result as above
    # ggplot(data=dat, aes(x=time, y=total_bill)) +
    #     geom_line(aes(group=1))

    # Add points
    ggplot(data=dat, aes(x=time, y=total_bill, group=1)) +
        geom_line() +
        geom_point()

    # Change color of both line and points
    # Change line type and point type, and use thicker line and larger points
    # Change points to circles with white fill
    ggplot(data=dat, aes(x=time, y=total_bill, group=1)) +
        geom_line(colour="red", linetype="dashed", size=1.5) +
        geom_point(colour="red", size=4, shape=21, fill="white")

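To see why group=1 is needed here, the following sketch uses layer_data() to inspect the groups that ggplot2 assigns. The plot names `p_one` and `p_none` are only for illustration.

```r
library(ggplot2)

dat <- data.frame(
    time = factor(c("Lunch","Dinner"), levels=c("Lunch","Dinner")),
    total_bill = c(14.89, 17.23)
)

# With group=1, both points belong to the same group and are connected
p_one <- ggplot(data=dat, aes(x=time, y=total_bill, group=1)) +
    geom_line()

# Without it, each level of the discrete x variable becomes its own group,
# so every group has a single point and there is nothing to connect
p_none <- ggplot(data=dat, aes(x=time, y=total_bill)) +
    geom_line()

unique(layer_data(p_one)$group)    # one group
unique(layer_data(p_none)$group)   # two groups
```

Printing `p_none` produces an empty panel along with a message about groups that contain only one observation.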

The desired line graph might look something like this:

    # Expand the y-range to include 0, and change axis labels
    ggplot(data=dat, aes(x=time, y=total_bill, group=1)) +
        geom_line() +
        geom_point() +
        expand_limits(y=0) +
        xlab("Time of day") + ylab("Total bill") +
        ggtitle("Average bill for 2 people")

See Colors (ggplot2) for more information on colors, and Shapes and line types for information on shapes and line types.

Graphs with more variables

This data will be used for the examples below:

    dat1 <- data.frame(
        sex = factor(c("Female","Female","Male","Male")),
        time = factor(c("Lunch","Dinner","Lunch","Dinner"), levels=c("Lunch","Dinner")),
        total_bill = c(13.53, 16.81, 16.24, 17.42)
    )
    dat1
    #>      sex   time total_bill
    #> 1 Female  Lunch      13.53
    #> 2 Female Dinner      16.81
    #> 3   Male  Lunch      16.24
    #> 4   Male Dinner      17.42

This is derived from the tips dataset in the reshape2 package.

Bar graphs

These are the variable mappings used here:

  • time: x-axis
  • sex: color fill
  • total_bill: y-axis

    # Stacked bar graph -- this is probably not what you want
    ggplot(data=dat1, aes(x=time, y=total_bill, fill=sex)) +
        geom_bar(stat="identity")

    # Bar graph, time on x-axis, color fill grouped by sex -- use position_dodge()
    ggplot(data=dat1, aes(x=time, y=total_bill, fill=sex)) +
        geom_bar(stat="identity", position=position_dodge())

    # Add a black outline
    ggplot(data=dat1, aes(x=time, y=total_bill, fill=sex)) +
        geom_bar(stat="identity", position=position_dodge(), colour="black")

    # Change colors
    ggplot(data=dat1, aes(x=time, y=total_bill, fill=sex)) +
        geom_bar(stat="identity", position=position_dodge(), colour="black") +
        scale_fill_manual(values=c("#999999", "#E69F00"))

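To see why the stacked version is usually not what you want for values such as averages, this sketch compares the top of the tallest bar under the default stacking and under dodging. The plot names are only for illustration.

```r
library(ggplot2)

dat1 <- data.frame(
    sex = factor(c("Female","Female","Male","Male")),
    time = factor(c("Lunch","Dinner","Lunch","Dinner"), levels=c("Lunch","Dinner")),
    total_bill = c(13.53, 16.81, 16.24, 17.42)
)

# Default position: bars with the same x value are stacked
p_stack <- ggplot(data=dat1, aes(x=time, y=total_bill, fill=sex)) +
    geom_bar(stat="identity")

# Dodged: bars with the same x value are placed side by side
p_dodge <- ggplot(data=dat1, aes(x=time, y=total_bill, fill=sex)) +
    geom_bar(stat="identity", position=position_dodge())

max(layer_data(p_stack)$ymax)   # 16.81 + 17.42 = 34.23, a sum of two averages
max(layer_data(p_dodge)$ymax)   # 17.42, the largest single value
```

Stacking adds the values together, which makes sense for counts or totals but not for averages.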

It’s easy to change which variable is mapped to the x-axis and which is mapped to the fill.

    # Bar graph, sex on x-axis, color fill grouped by time -- use position_dodge()
    ggplot(data=dat1, aes(x=sex, y=total_bill, fill=time)) +
        geom_bar(stat="identity", position=position_dodge(), colour="black")

See Colors (ggplot2) for more information on colors.

Line graphs

These are the variable mappings used here:

  • time: x-axis
  • sex: line color
  • total_bill: y-axis.

To draw multiple lines, the points must be grouped by a variable; otherwise all points will be connected by a single line. In this case, we want them to be grouped by sex.

    # Basic line graph with points
    ggplot(data=dat1, aes(x=time, y=total_bill, group=sex)) +
        geom_line() +
        geom_point()

    # Map sex to color
    ggplot(data=dat1, aes(x=time, y=total_bill, group=sex, colour=sex)) +
        geom_line() +
        geom_point()

    # Map sex to different point shapes
    ggplot(data=dat1, aes(x=time, y=total_bill, group=sex, shape=sex)) +
        geom_line() +
        geom_point()

    # Use thicker lines and larger points, and hollow white-filled points
    ggplot(data=dat1, aes(x=time, y=total_bill, group=sex, shape=sex)) +
        geom_line(size=1.5) +
        geom_point(size=3, fill="white") +
        scale_shape_manual(values=c(22,21))

It’s easy to change which variable is mapped to the x-axis and which is mapped to the color or shape.

    ggplot(data=dat1, aes(x=sex, y=total_bill, group=time, shape=time, color=time)) +
        geom_line() +
        geom_point()

See Colors (ggplot2) for more information on colors, and Shapes and line types for information on shapes and line types.

Finished examples

The finished graphs might look like these:

    # A bar graph
    ggplot(data=dat1, aes(x=time, y=total_bill, fill=sex)) +
        geom_bar(colour="black", stat="identity",
                 position=position_dodge(),
                 size=.3) +                        # Thinner lines
        scale_fill_hue(name="Sex of payer") +      # Set legend title
        xlab("Time of day") + ylab("Total bill") + # Set axis labels
        ggtitle("Average bill for 2 people") +     # Set title
        theme_bw()

    # A line graph
    ggplot(data=dat1, aes(x=time, y=total_bill, group=sex, shape=sex, colour=sex)) +
        geom_line(aes(linetype=sex), size=1) +     # Set linetype by sex
        geom_point(size=3, fill="white") +         # Use larger points, fill with white
        expand_limits(y=0) +                       # Set y range to include 0
        scale_colour_hue(name="Sex of payer",      # Set legend title
                         l=30) +                   # Use darker colors (lightness=30)
        scale_shape_manual(name="Sex of payer",
                           values=c(22,21)) +      # Use points with a fill color
        scale_linetype_discrete(name="Sex of payer") +
        xlab("Time of day") + ylab("Total bill") + # Set axis labels
        ggtitle("Average bill for 2 people") +     # Set title
        theme_bw() +
        theme(legend.position=c(.7, .4))           # Position legend inside
                                                   # This must go after theme_bw

In the line graph, the legend title, “Sex of payer”, must be specified three times (once each for the colour, shape, and linetype scales) so that there is only one legend. The issue is explained here (see the section "With lines and points").
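
As an alternative sketch (not the approach used in the finished example), labs() can give several aesthetics the same title in a single call. The principle is unchanged: ggplot2 merges the legends only when their titles and labels match. The plot name `p` is only for illustration.

```r
library(ggplot2)

dat1 <- data.frame(
    sex = factor(c("Female","Female","Male","Male")),
    time = factor(c("Lunch","Dinner","Lunch","Dinner"), levels=c("Lunch","Dinner")),
    total_bill = c(13.53, 16.81, 16.24, 17.42)
)

p <- ggplot(data=dat1, aes(x=time, y=total_bill, group=sex,
                           shape=sex, colour=sex, linetype=sex)) +
    geom_line() +
    geom_point(size=3) +
    # Same title for all three aesthetics, so that a single legend is drawn
    labs(colour="Sex of payer", shape="Sex of payer", linetype="Sex of payer")

p
```

If one of the three titles is left out or spelled differently, that aesthetic should get its own separate legend.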

With a numeric x-axis

When the variable on the x-axis is numeric, it is sometimes useful to treat it as continuous, and sometimes useful to treat it as categorical. In this data set, the dose is a numeric variable with values 0.5, 1.0, and 2.0. It might be useful to treat these values as equal categories when making a graph.

    datn <- read.table(header=TRUE, text='
    supp dose length
    OJ 0.5 13.23
    OJ 1.0 22.70
    OJ 2.0 26.06
    VC 0.5 7.98
    VC 1.0 16.77
    VC 2.0 26.14
    ')

This is derived from the ToothGrowth dataset included with R.

With x-axis treated as continuous

A simple graph might put dose on the x-axis as a numeric value. It is possible to make a line graph this way, but not a bar graph.

    ggplot(data=datn, aes(x=dose, y=length, group=supp, colour=supp)) +
        geom_line() +
        geom_point()

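When dose is kept continuous, the default axis breaks may not line up with the doses that were actually measured. One possible adjustment, shown here as a sketch, is to set the breaks with scale_x_continuous(); the plot name `p` is only for illustration.

```r
library(ggplot2)

datn <- read.table(header=TRUE, text='
supp dose length
OJ 0.5 13.23
OJ 1.0 22.70
OJ 2.0 26.06
VC 0.5 7.98
VC 1.0 16.77
VC 2.0 26.14
')

# Keep dose continuous, but put axis breaks only at the measured doses
p <- ggplot(data=datn, aes(x=dose, y=length, group=supp, colour=supp)) +
    geom_line() +
    geom_point() +
    scale_x_continuous(breaks=unique(datn$dose))

p
```

The points remain spaced according to their numeric values, so the gap between 1.0 and 2.0 is twice as wide as the gap between 0.5 and 1.0.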

With x-axis treated as categorical

If you wish to treat it as a categorical variable instead of a numeric one, it must be converted to a factor. This can be done by modifying the data frame, or by changing the specification of the graph.

    # Copy the data frame and convert dose to a factor
    datn2 <- datn
    datn2$dose <- factor(datn2$dose)
    ggplot(data=datn2, aes(x=dose, y=length, group=supp, colour=supp)) +
        geom_line() +
        geom_point()

    # Use the original data frame, but put factor() directly in the plot specification
    ggplot(data=datn, aes(x=factor(dose), y=length, group=supp, colour=supp)) +
        geom_line() +
        geom_point()

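One thing to keep in mind when modifying the data frame: once dose is a factor, as.numeric() returns the internal level codes rather than the original doses. A small base R sketch, with variable names chosen only for illustration:

```r
dose <- c(0.5, 1.0, 2.0)
dose_f <- factor(dose)

levels(dose_f)                       # the doses, stored as labels
as.numeric(dose_f)                   # level codes 1, 2, 3, not the doses
as.numeric(as.character(dose_f))     # the original values 0.5, 1, 2
```

This is one reason to convert a copy of the data frame (as with datn2 above) or to call factor() inside the plot specification, so that the numeric column stays available.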

It is also possible to make a bar graph when the variable is treated as categorical rather than numeric.

    # Use datn2 from above
    ggplot(data=datn2, aes(x=dose, y=length, fill=supp)) +
        geom_bar(stat="identity", position=position_dodge())

    # Use the original data frame, but put factor() directly in the plot specification
    ggplot(data=datn, aes(x=factor(dose), y=length, fill=supp)) +
        geom_bar(stat="identity", position=position_dodge())
