In [ ]:
library('tidyverse')

A ggplot2 call always has this form:

ggplot(data = <DATA>) + 
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))

In [ ]:
ggplot(data = mpg)

In [ ]:
str(mpg)

In [ ]:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = cyl, y = hwy))

In [ ]:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = drv, y = class))

3.3 Aesthetic mapping

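  • An aesthetic is a visual property of the objects in a plot: size, shape, color, alpha (transparency), or x/y location.
  • To map an aesthetic to a variable, associate the name of the aesthetic with the name of the variable inside aes().
  • ggplot2 then assigns a unique level of the aesthetic (for example, a unique color) to each unique value of the variable. This process is called scaling.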
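
A minimal sketch of scaling in action, assuming only ggplot2 (which ships the mpg data) is installed. The object names p_mapped and p_fixed are illustrative, not from the book. layer_data() returns the data a layer actually draws, so it shows which color each point received.

```r
library(ggplot2)

# Mapping: color is inside aes(), so ggplot2 scales class to colors.
p_mapped <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))

# Setting: color is outside aes(), so every point gets the same fixed color.
p_fixed <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

# One distinct color per distinct class in the mapped plot
n_colors_mapped <- length(unique(layer_data(p_mapped)$colour))
n_classes <- length(unique(mpg$class))

# A single color in the fixed plot
colors_fixed <- unique(layer_data(p_fixed)$colour)
```

Comparing n_colors_mapped with n_classes is a quick way to confirm that the mapping, not a fixed setting, drives the colors.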

In [ ]:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue", shape = 1)

3.3.1 Exercises


In [ ]:
str(mpg)

In [ ]:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = displ))

In [ ]:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class, alpha = class), size = 2)

In [ ]:
# 4:
# For shapes that have a border (like 21), you can colour the inside and
# outside separately. Use the stroke aesthetic to modify the width of the
# border
ggplot(mtcars, aes(wt, mpg)) +
  geom_point(shape = 21, colour = "black", fill = "white", size = 5, stroke = 5)

In [ ]:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = displ < 5))

3.5 Facets


In [ ]:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ class, nrow = 2)

In [ ]:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ cyl)

In [ ]:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(. ~ cyl)

3.5.1 Exercises


In [ ]:
# 1. Faceting on a continuous variable creates one panel per distinct value.
str(mpg)
# ggplot(data = mpg) +
#  geom_point(mapping = aes(x = displ, y = hwy)) +
#  facet_grid(. ~ displ)

In [ ]:
# 3.
ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ .)

In [ ]:
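# 5. facet_wrap() wraps a 1-d sequence of panels, so it needs nrow/ncol to
#    decide the layout. facet_grid() is 2-d: the number of rows and columns is
#    fixed by the levels of the faceting variables.
# 6. Plots are usually wider than they are tall, so the columns have more room
#    for the variable with many levels than the rows do.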
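
A rough sketch of the 1-d versus 2-d difference, assuming ggplot2 is installed. It inspects the panel layout that ggplot_build() computes; the `$layout$layout` table is an internal structure whose columns (PANEL, ROW, COL) may change between versions, and the names base, wrap_layout and grid_layout are illustrative.

```r
library(ggplot2)

base <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

# facet_wrap(): one variable, panels wrapped into the requested number of rows
wrap_layout <- ggplot_build(base + facet_wrap(~ class, nrow = 2))$layout$layout

# facet_grid(): rows from drv, columns from cyl, one panel per combination
grid_layout <- ggplot_build(base + facet_grid(drv ~ cyl))$layout$layout

nrow(wrap_layout)      # number of panels: one per class
max(wrap_layout$ROW)   # rows used by the wrapped layout
nrow(grid_layout)      # number of panels: drv levels times cyl levels
```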

3.6 Geometric objects

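  • A geom is the geometrical object that a plot uses to represent data.
  • Not every aesthetic works with every geom; geom_smooth(), for example, accepts the "linetype" aesthetic.
  • We can put multiple geoms on one plot. To avoid repeating the same mapping in every layer, move the shared mapping to ggplot().
    • Each geom can then extend or override the global mapping (or use its own data) for that layer only.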

In [ ]:
ggplot(data = mpg) + 
  geom_smooth(mapping = aes(x = displ, y = hwy, linetype = drv, color = drv)) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = drv, color = drv))

In [ ]:
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +
  geom_smooth()

In [ ]:
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +
  geom_smooth(data = filter(mpg, class == "subcompact"), se = FALSE)

3.6.1 Exercises


In [ ]:
# 6.1
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point() + 
  geom_smooth()

In [ ]:
ggplot() + 
  geom_point(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_smooth(data = mpg, mapping = aes(x = displ, y = hwy))

In [ ]:
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
    geom_point(mapping = aes(color = drv)) +
    geom_smooth(se = FALSE)

In [ ]:
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
    geom_point(mapping = aes(color = drv)) +
    geom_smooth(mapping = aes(linetype = drv), se = FALSE)

In [ ]:
# 6.6
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(color = "white", size = 4) +
  geom_point(mapping = aes(color = drv))

#      geom_point(color = "white", size = 2) +

3.7 Statistical transformations

  • A bar chart shows count on the y axis, but the dataset doesn't contain a count variable: geom_bar() computes it with a statistical transformation (stat). Other geoms also transform the data:
    • bar charts, histograms, and frequency polygons bin your data and then plot bin counts
    • smoothers fit a model to your data and then plot predictions from the model
    • boxplots compute a robust summary of the distribution
  • You can generally use geoms and stats interchangeably. For example, you can recreate a geom_bar() bar chart using stat_count() instead. This works because every geom has a default stat, and every stat has a default geom.
  • Three cases where we might want to use a stat explicitly:
    • override the default stat
    • override the default mapping from transformed variables to aesthetics
    • want to draw greater attention to the statistical transformation in your code, e.g. if you use
  • To see a complete list of stats, try the ggplot2 cheatsheet
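
A small sketch of the geom/stat interchangeability described above, assuming ggplot2 is installed. The object names are illustrative. layer_data() exposes the count variable that the stat computes, which can be checked against a manual tabulation.

```r
library(ggplot2)

# The same bar chart, written geom-first and stat-first
p_geom <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))
p_stat <- ggplot(data = diamonds) +
  stat_count(mapping = aes(x = cut))

# Both layers draw the same computed counts
counts_geom <- layer_data(p_geom)$count
counts_stat <- layer_data(p_stat)$count

# The counts are simply the number of rows per level of cut
counts_manual <- as.numeric(table(diamonds$cut))
```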

In [ ]:
summary(diamonds)

In [ ]:
ggplot(data = diamonds) +
    geom_bar(mapping = aes(x = cut, fill = cut))

In [ ]:
demo <- tribble(
  ~cut,         ~freq,
  "Fair",       1610,
  "Good",       4906,
  "Very Good",  12082,
  "Premium",    13791,
  "Ideal",      21551
)
summary(demo)

In [ ]:
ggplot(data = demo) +
    geom_bar(mapping = aes(x = cut, y = freq), stat = "identity")

In [ ]:
ggplot(data = diamonds) +
    geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))

3.7.1 Exercises


In [ ]:
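# 1. The default geom of stat_summary() is geom_pointrange().
# fun, fun.min and fun.max replace the deprecated fun.y, fun.ymin and fun.ymax (ggplot2 >= 3.3.0).
ggplot(data = diamonds) +
    geom_pointrange(mapping = aes(x = cut, y = depth), stat = "summary",
                    fun.min = min, fun.max = max, fun = median)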

In [ ]:
# 2. geom_col() is geom_bar(stat = "identity"): bar heights come straight from y.
ggplot(data = demo) +
    geom_col(mapping = aes(x = cut, y = freq))

In [ ]:
# 3. See: http://ggplot2.tidyverse.org/reference/index.html#section-layer-geoms

In [ ]:
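# 4. stat_smooth() computes y (the prediction), ymin, ymax (the confidence
#    band) and se. Its behaviour is controlled by method, formula and span
#    (plus se and level for the band).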
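
A quick way to look at the computed variables directly, assuming ggplot2 is installed. The method, formula and span values below are spelled out only to make the controlling parameters visible; the object names are illustrative.

```r
library(ggplot2)

# Make the smoothing parameters explicit instead of relying on defaults
p_smooth <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_smooth(method = "loess", formula = y ~ x, span = 0.75)

# layer_data() returns the variables computed by stat_smooth()
smooth_data <- layer_data(p_smooth)
head(smooth_data[, c("x", "y", "ymin", "ymax", "se")])
```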

In [ ]:
# 5. Without group = 1, each combination of cut and color is treated as its
# own group, so the proportion within every group is always 1 (100%).
# Our intention is to treat all the data as one group.
ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = color, y = after_stat(prop)))
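
A sketch of the grouping effect on the simpler plot without fill, assuming ggplot2 >= 3.3.0 (for after_stat()). The object names are illustrative. It compares the prop values the stat computes with and without group = 1.

```r
library(ggplot2)

# Default grouping: each cut is its own group, so prop = count / count = 1
p_per_group <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop)))

# group = 1 puts all rows in one group, so prop is each cut's share of the total
p_one_group <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))

prop_per_group <- layer_data(p_per_group)$prop
prop_one_group <- layer_data(p_one_group)$prop
```

With a single group the proportions add up to 1 across the bars, which is what a proportion bar chart is meant to show.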

3.8 Position adjustments

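  • In a bar chart, if you map x and fill to different variables, the bars are stacked automatically. The position argument controls this:
    • "stack" is the default
    • "identity" places each object exactly where it falls, so bars overlap
    • "dodge" places overlapping objects side by side
    • "fill" stacks the bars and normalizes them to the same height, which makes proportions easier to compare
    • "jitter" adds a small amount of random noise to each point (useful for scatterplots, not bars)
  • Each position has a corresponding position_ function, e.g. position_dodge() or position_jitter().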
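
A short sketch of the string shorthand versus the position_ functions, assuming ggplot2 is installed. The width, height and seed values are arbitrary illustrations and the object names are made up for this example.

```r
library(ggplot2)

# String shorthand and the position_ function give the same adjustment
p_fill_string <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "fill")
p_fill_function <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity), position = position_fill())

# The function form exposes extra arguments, e.g. jitter amount and a seed
p_jitter <- ggplot(data = mpg) +
  geom_point(
    mapping = aes(x = displ, y = hwy),
    position = position_jitter(width = 0.1, height = 0.1, seed = 42)
  )

# "fill" normalizes every stacked bar to a total height of 1
top_string <- max(layer_data(p_fill_string)$ymax)
top_function <- max(layer_data(p_fill_function)$ymax)

# With a fixed seed the jittered positions are reproducible
jitter_x_first <- layer_data(p_jitter)$x
jitter_x_second <- layer_data(p_jitter)$x
```

The function form is worth the extra typing mainly when the defaults need changing, such as the jitter width or a seed for reproducible figures.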

In [ ]:
ggplot(data = diamonds) +
    geom_bar(mapping = aes(x = cut, fill = clarity))

In [ ]:
ggplot(data = diamonds) +
    geom_bar(mapping = aes(x = cut, fill = clarity), position = "identity", alpha = 0.2)

In [ ]:
ggplot(data = diamonds) +
    geom_bar(mapping = aes(x = cut, colour = clarity), position = "identity", alpha = 0.2, fill = NA)

In [ ]:
ggplot(data = diamonds) +
    geom_bar(mapping = aes(x = cut, fill = clarity), position = "fill")

In [ ]:
ggplot(data = diamonds) +
    geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge")

In [ ]:
ggplot(data = mpg) +
    geom_point(mapping = aes(x = displ, y = hwy))

In [ ]:
ggplot(data = mpg) +
    geom_point(mapping = aes(x = displ, y = hwy), position = "jitter")

3.8.1 Exercises

  1. What is the problem with this plot? How could you improve it?

A: Overplotting. cty and hwy are rounded, so many points share the same position and hide each other. Use geom_jitter() or geom_count() instead of geom_point().

  2. What parameters to geom_jitter() control the amount of jittering?

A: width and height

  3. Compare and contrast geom_jitter() with geom_count().

In [ ]:
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) + 
  geom_point() +
  geom_jitter()

In [ ]:
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) + 
  geom_point() +
  geom_count()

  4. What’s the default position adjustment for geom_boxplot()? Create a visualisation of the mpg dataset that demonstrates it.

A: The default position is "dodge2", so boxplots that share an x value are placed side by side instead of overlapping.


In [ ]:
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
    geom_boxplot(mapping = aes(color = drv))

3.9 Coordinate systems

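  • coord_flip() switches the x and y axes, e.g. for horizontal boxplots or long labels
  • coord_quickmap() sets the aspect ratio correctly for maps
  • coord_polar() uses polar coordinates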

In [ ]:
bar <- ggplot(data = diamonds) +
  geom_bar(
    mapping = aes(x = cut, fill = cut),
    show.legend = FALSE,
    width = 1
  ) +
  theme(aspect.ratio = 1) +
  labs(x = NULL, y = NULL)
bar + coord_flip()

In [ ]:
bar + coord_polar()

3.9.1 Exercises

  1. Turn a stacked bar chart into a pie chart using coord_polar().

In [ ]:
# A pie chart is a single stacked bar drawn with the angle mapped to y.
ggplot(data = diamonds) +
    geom_bar(mapping = aes(x = factor(1), fill = clarity), width = 1) +
    coord_polar(theta = "y")

  2. What does labs() do? Read the documentation.

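A: labs() sets the plot labels: axis labels, legend titles, and the plot title, subtitle and caption.

  3. What does the plot below tell you about the relationship between city and highway mpg? Why is coord_fixed() important? What does geom_abline() do?

A: geom_abline() draws a reference line; with its defaults (slope 1, intercept 0) this is the line hwy = cty. The points lie above it, so cars get better mileage on the highway than in the city. coord_fixed() fixes the aspect ratio so that one unit on x is as long as one unit on y, which puts the reference line at 45 degrees and makes the comparison easy to read.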


In [ ]:
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point() + 
  geom_abline() +
  coord_fixed()

3.10 The layered grammar of graphics

The updated template:

ggplot(data = <DATA>) + 
  <GEOM_FUNCTION>(
     mapping = aes(<MAPPINGS>),
     stat = <STAT>, 
     position = <POSITION>
  ) +
  <COORDINATE_FUNCTION> +
  <FACET_FUNCTION>
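
One possible way to fill in every slot of the template, assuming ggplot2 is installed. The particular choices (diamonds, geom_bar, "fill", coord_flip, facet_wrap by color) are only an illustration, and the panel count is read from ggplot_build()'s internal layout table.

```r
library(ggplot2)

p_template <- ggplot(data = diamonds) +      # <DATA>
  geom_bar(                                  # <GEOM_FUNCTION>
    mapping = aes(x = cut, fill = clarity),  # <MAPPINGS>
    stat = "count",                          # <STAT>
    position = "fill"                        # <POSITION>
  ) +
  coord_flip() +                             # <COORDINATE_FUNCTION>
  facet_wrap(~ color)                        # <FACET_FUNCTION>

# One panel per level of color
n_panels <- nrow(ggplot_build(p_template)$layout$layout)
```

In practice most of these slots can be left out, because ggplot2 supplies defaults for everything except the data, the mappings and the geom.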