title: "R#3: stats with ggplot2"
author: "Laurent Modolo [laurent.modolo@ens-lyon.fr](mailto:laurent.modolo@ens-lyon.fr)"
date: "08 Nov 2019"
output:
  slidy_presentation:
    highlight: tango
  beamer_presentation:
    theme: metropolis
    slide_level: 3
    fig_caption: no
    df_print: tibble
    highlight: tango
    latex_engine: xelatex
knitr::opts_chunk$set(echo = FALSE)
library(tidyverse)

R#3: stats with ggplot2

The goal of this practical is to practice advanced features of ggplot2.

The objectives of this session will be to:

  • learn about statistical transformations
  • practice position adjustments
  • change the coordinate systems

ggplot2 statistical transformations

We are going to use the diamonds data set included in tidyverse.

  • Use the help and View commands to explore this data set.
  • Try the str command. What information is displayed?

ggplot2 statistical transformations

ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut))

More diamonds are available with high quality cuts.

ggplot2 statistical transformations

On the x-axis, the chart displays cut, a variable from diamonds. On the y-axis, it displays count, but count is not a variable in diamonds!

The algorithm used to calculate new values for a graph is called a stat, short for statistical transformation. The figure below describes how this process works with geom_bar().

\includegraphics[width=\textwidth]{img/visualization-stat-bar.png}
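
To make the stat step concrete, the sketch below computes the counts by hand with dplyr's count() and compares them with the values geom_bar() computes internally. It relies on layer_data(), a ggplot2 helper that returns the data a layer actually draws. The names cut_counts, p and bar_data are introduced here only for illustration.

```r
library(ggplot2)
library(dplyr)

# Count the diamonds in each cut by hand: this is what stat_count() computes
cut_counts <- diamonds %>%
  count(cut)
cut_counts

# Build the bar chart and extract the data computed for its first layer
p <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))
bar_data <- layer_data(p, 1)

# The computed `count` column matches the manual counts
all(bar_data$count == cut_counts$n)
```

The count column in bar_data is the new variable that ends up on the y-axis, even though it is not a column of diamonds.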

ggplot2 statistical transformations

You can generally use geoms and stats interchangeably. For example, you can recreate the previous plot using stat_count() instead of geom_bar():

ggplot(data = diamonds) + 
  stat_count(mapping = aes(x = cut))

ggplot2 statistical transformations

Every geom has a default stat; and every stat has a default geom. This means that you can typically use geoms without worrying about the underlying statistical transformation. There are three reasons you might need to use a stat explicitly:

  • You might want to override the default stat.
  • You might want to override the default mapping from transformed variables to aesthetics.
  • You might want to draw greater attention to the statistical transformation in your code.
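
The three cases above could look like the following sketch. It assumes ggplot2 3.3.0 or later, where after_stat(prop) is the newer spelling of ..prop.. and stat_summary() takes the fun, fun.min and fun.max arguments. The table cut_freq and the plot names are hypothetical and exist only for this example.

```r
library(ggplot2)
library(dplyr)

# (1) Override the default stat: the bar heights are already in the data,
#     so stat = "identity" is used instead of the default stat_count()
cut_freq <- count(diamonds, cut, name = "freq")
p_identity <- ggplot(data = cut_freq) +
  geom_bar(mapping = aes(x = cut, y = freq), stat = "identity")
p_identity

# (2) Override the default mapping: show the computed proportion
#     instead of the computed count
p_prop <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))
p_prop

# (3) Make the transformation explicit: summarise depth for each cut
#     with its minimum, median and maximum
p_summary <- ggplot(data = diamonds) +
  stat_summary(
    mapping = aes(x = cut, y = depth),
    fun.min = min,
    fun.max = max,
    fun = median
  )
p_summary
```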

Statistical transformation challenge

  • What does geom_col() do? How is it different from geom_bar()?
  • What variables does stat_smooth() compute? What parameters control its behaviour?
  • In our proportion bar chart, we need to set group = 1. Why? In other words, what is the problem with these two graphs?
ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, y = ..prop..))
ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = color, y = ..prop..))

Position adjustments

You can colour a bar chart using either the colour aesthetic, or, more usefully, fill:

ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, colour = cut))

Position adjustments

You can colour a bar chart using either the colour aesthetic, or, more usefully, fill:

ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = cut))

Position adjustments

You can also use fill with another variable:

ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = clarity))

Position adjustments

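The stacking is performed automatically by the position adjustment specified by the position argument. With position = "identity", each bar is placed exactly where it falls in the graph, so the bars overlap: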

ggplot(data = diamonds,
       mapping = aes(x = cut, colour = clarity)) + 
  geom_bar(fill = NA, position = "identity")
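
With position = "identity" the bars for the different clarity values are drawn on top of one another, which is why the plot above sets fill = NA. A possible alternative, sketched here, is to keep the fill and make the bars transparent with alpha. The value 1/5 is an arbitrary choice and p_alpha is a name made up for this example.

```r
library(ggplot2)

# Overlapping bars: keep the fill colour but make it transparent
p_alpha <- ggplot(data = diamonds,
                  mapping = aes(x = cut, fill = clarity)) +
  geom_bar(alpha = 1/5, position = "identity")
p_alpha
```

Overlapping bars stay hard to read either way, so the identity position is usually more useful for 2d geoms such as points.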

Position adjustments

position = "fill" works like stacking, but makes each set of stacked bars the same height, which makes it easier to compare proportions across groups:

ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = clarity),
           position = "fill")

Position adjustments

position = "dodge" places overlapping bars directly beside one another, which makes it easier to compare individual values:

ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = clarity),
           position = "dodge")

Position adjustments

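For scatterplots, position = "jitter" adds a small amount of random noise to each point, which spreads out points that would otherwise be plotted on top of each other: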

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy),
             position = "jitter")
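
Because the jitter is random, the plot above changes slightly each time it is rendered. If a reproducible figure is needed, one option is to call position_jitter() directly and fix its seed argument, as in this sketch. The seed value 42 is arbitrary and p_jit is a name introduced for illustration.

```r
library(ggplot2)

# position_jitter() is the function behind position = "jitter";
# fixing its seed makes the random noise identical on every render
p_jit <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy),
             position = position_jitter(seed = 42))
p_jit
```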

Position adjustments

geom_jitter() is a shorthand for geom_point(position = "jitter"):

ggplot(data = mpg) + 
  geom_jitter(mapping = aes(x = displ, y = hwy))

Position adjustments challenges

  • What is the problem with this plot? How could you improve it?
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) + 
  geom_point()
  • What parameters to geom_jitter() control the amount of jittering?
  • Compare and contrast geom_jitter() with geom_count()
  • What’s the default position adjustment for geom_boxplot()? Create a visualisation of the mpg dataset that demonstrates it.

Coordinate systems

The default coordinate system is the Cartesian coordinate system, where the x and y positions act independently to determine the location of each point. There are a number of other coordinate systems that are occasionally helpful.

Coordinate systems

ggplot(data = mpg, mapping = aes(x = class, y = hwy)) + 
  geom_boxplot()

Coordinate systems

ggplot(data = mpg, mapping = aes(x = class, y = hwy)) + 
  geom_boxplot() +
  coord_flip()
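
As a side note, horizontal boxplots can also be obtained without coord_flip() by swapping the x and y mappings. This sketch assumes ggplot2 3.3.0 or later, where geoms such as geom_boxplot() detect their orientation from the mapped variables. The name p_h is made up for this example.

```r
library(ggplot2)

# Horizontal boxplots obtained by mapping the discrete variable to y
p_h <- ggplot(data = mpg, mapping = aes(x = hwy, y = class)) +
  geom_boxplot()
p_h
```

coord_flip() is still useful when a whole plot, including every layer, has to be flipped at once.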

Coordinate systems

bar <- ggplot(data = diamonds) + 
  geom_bar(
    mapping = aes(x = cut, fill = cut), 
    show.legend = FALSE,
    width = 1
  ) + 
  theme(aspect.ratio = 1) +
  labs(x = NULL, y = NULL)


Coordinate systems

bar


Coordinate systems

bar + coord_flip()

Coordinate systems

ggplot(data = mpg) + 
  geom_jitter(mapping = aes(x = cty, y = hwy))

Coordinate systems

ggplot(data = mpg) + 
  geom_jitter(mapping = aes(x = cty, y = hwy)) +
  coord_quickmap()

Coordinate systems

ggplot(data = mpg) + 
  geom_jitter(mapping = aes(x = cty, y = hwy)) +
  scale_y_log10() +
  scale_x_log10()

Coordinate systems

bar + coord_polar()

Coordinate systems challenges

  • Turn a stacked bar chart into a pie chart using coord_polar().
  • What does labs() do? Read the documentation.
  • What does the plot below tell you about the relationship between city and highway mpg? Why is coord_fixed() important? What does geom_abline() do?
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point() + 
  geom_abline() +
  coord_fixed()

Coordinate systems challenges

ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = clarity),
           position = "fill") +
  coord_polar()

Coordinate systems challenges

ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point() +
  geom_abline()

Coordinate systems challenges

ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point() +
  geom_abline() +
  coord_fixed()