An introduction to ggplot2

William Marble
February 19, 2016

What is ggplot2?

  • ggplot2 is a graphics package for R based on the “Grammar of Graphics” (Leland Wilkinson, 1999/2000)
  • Created by Hadley Wickham, world's foremost R guru
  • The idea is “to take the good parts of base and lattice graphics and none of the bad parts”
  • The learning curve for ggplot2 comes mainly from learning to think about data visualization the way Wickham wants you to

Benefits of ggplot2

The good:

  • The “Grammar of Graphics” gives a systematic way of thinking about lots of different types of graphics in a unified framework
  • Simple to create complex graphics that convey a lot of information
  • Easy to add “layers” to a plot without much extra code
  • Lots of out-of-the-box functions for different types of graphics
  • Excellent documentation (http://docs.ggplot2.org/current/)
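
As a minimal sketch of the layering idea, the snippet below starts from a bare plot object and adds one layer at a time. It uses the built-in mtcars data set, which is an assumption for illustration and not part of the running example used later.

```r
library(ggplot2)

# base plot: data plus aesthetic mappings, no geometry yet
p <- ggplot(mtcars, aes(x = wt, y = mpg))

# layer 1: points
p1 <- p + geom_point()

# layer 2: a linear fit drawn on top of the points
p2 <- p1 + geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# facets: one panel per number of cylinders
p3 <- p2 + facet_wrap(~ cyl)

print(p3)
```

Each `+` returns a new plot object, so intermediate versions such as p1 and p2 can be kept and printed separately.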

Drawbacks of ggplot2

The bad:

  • Learning curve
  • Data must be structured in a particular way – may require extra pre-processing of the data
  • Not as customizable as base graphics
  • Strange default settings (grid lines, grey background, weird colors)
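
The "particular way" is usually long format: one row per observation, with one column for each variable that gets mapped to an aesthetic. A minimal sketch of the pre-processing in base R follows. The data frames `wide` and `long`, their column names, and their values are all made up for illustration.

```r
library(ggplot2)

# hypothetical wide data with made-up values: one column per party
wide <- data.frame(
  year       = c(1984, 1986, 1988),
  democrat   = c(-0.30, -0.32, -0.35),
  republican = c(0.30, 0.33, 0.36)
)

# stack the party columns into long format: one row per year-party
long <- data.frame(
  year  = rep(wide$year, times = 2),
  party = rep(c("democrat", "republican"), each = nrow(wide)),
  score = c(wide$democrat, wide$republican)
)

# party is now a single column, so it can be mapped to color
p <- ggplot(long, aes(x = year, y = score, color = party)) + geom_line()
print(p)
```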

Installation

# load ggplot2, install if not already installed
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}
library(ggplot2)
# and for fun themes
if (!requireNamespace("ggthemes", quietly = TRUE)) {
  install.packages("ggthemes")
}
library(ggthemes)
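
Themes are one way around the default grey background and grid lines: adding a theme to a plot changes its overall look. A minimal sketch, again on the built-in mtcars data (an assumption for illustration). theme_bw() and theme_minimal() ship with ggplot2, and theme_tufte() comes from ggthemes, so it is only used here if that package is available.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# themes built into ggplot2
p_bw  <- p + theme_bw()       # white background, grey grid lines
p_min <- p + theme_minimal()  # no background or border

# an extra theme from ggthemes, applied only if the package is installed
if (requireNamespace("ggthemes", quietly = TRUE)) {
  p_tufte <- p + ggthemes::theme_tufte()
}

print(p_bw)
```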

Running example

Congressional district-level data; unit of observation is district-year

  • dwnom1 = Congress member's first-dimension DW-NOMINATE score (voteview.com)
  • median = ideology of median donor within district (Bonica)
  • gini = estimated Gini coefficient within district
  • party = representative's party

# read the district-year data and preview the variables used in this example
ideol.data <- read.csv(file = "http://stanford.edu/~wpmarble/data/rep_data.csv")
print(head(ideol.data[, c("year", "dwnom1", "median", "gini", "party")]), digits = 2)
  year dwnom1 median gini      party
1 1984  0.234  0.439 0.40 Republican
2 1984  0.354  0.584 0.43 Republican
3 1984  0.343  0.344 0.44 Republican
4 1984 -0.036  0.153 0.42   Democrat
5 1984 -0.202  0.318 0.42   Democrat
6 1984 -0.156  0.023 0.41   Democrat

# histogram of first-dimension DW-NOMINATE scores
ggplot(ideol.data, aes(x = dwnom1)) + geom_histogram()


# scatterplot of member ideology against median donor ideology
ggplot(ideol.data, aes(x = median, y = dwnom1)) + geom_point()
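
Further aesthetics and layers follow the same pattern. As a sketch, the snippet below colors the points by party and adds a linear fit for each party. So that it runs without the download, it rebuilds only the six rows printed by head() above (rounded as printed) in a data frame called ideol.head, a hypothetical name. With the full data loaded, ideol.data can be passed to ggplot() instead. The axis labels are also illustrative.

```r
library(ggplot2)

# the six rows shown in the head() output, values as printed (rounded)
ideol.head <- data.frame(
  year   = rep(1984, 6),
  dwnom1 = c(0.234, 0.354, 0.343, -0.036, -0.202, -0.156),
  median = c(0.439, 0.584, 0.344, 0.153, 0.318, 0.023),
  gini   = c(0.40, 0.43, 0.44, 0.42, 0.42, 0.41),
  party  = c("Republican", "Republican", "Republican",
             "Democrat", "Democrat", "Democrat")
)

# color is mapped in ggplot(), so both layers are grouped by party
p <- ggplot(ideol.head, aes(x = median, y = dwnom1, color = party)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Median donor ideology", y = "DW-NOMINATE (first dimension)")

print(p)
```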