title: "R#3: stats with ggplot2"
author: "Laurent Modolo [laurent.modolo@ens-lyon.fr](mailto:laurent.modolo@ens-lyon.fr)"
date: "08 Nov 2019"
output:
  slidy_presentation:
    highlight: tango
  beamer_presentation:
    theme: metropolis
    slide_level: 3
    fig_caption: no
    df_print: tibble
    highlight: tango
    latex_engine: xelatex
knitr::opts_chunk$set(echo = FALSE)
library(tidyverse)
R#3: stats with ggplot2
The goal of this practical is to practice advanced features of ggplot2.
The objectives of this session will be to:
- learn about statistical transformations
- practice position adjustments
- change the coordinate systems
ggplot2 statistical transformations
We are going to use the diamonds data set included in tidyverse.
- Use the help and View commands to explore this data set.
- Try the str command: which information is displayed?
ggplot2 statistical transformations
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut))
More diamonds are available with high quality cuts.
ggplot2 statistical transformations
On the x-axis, the chart displays cut, a variable from diamonds. On the y-axis, it displays count, but count is not a variable in diamonds!
The algorithm used to calculate new values for a graph is called a stat, short for statistical transformation. The figure below describes how this process works with geom_bar().
\includegraphics[width=\textwidth]{img/visualization-stat-bar.png}
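To make the computed count variable visible, one option is to reproduce the stat by hand and compare it with what ggplot2 computes. The sketch below assumes dplyr's count() and ggplot2's layer_data() helper, neither of which is used in the slides; the names cut_counts, bar_plot and computed are illustrative only.

```r
library(ggplot2)
library(dplyr)

# Count the diamonds in each cut by hand, mimicking what the stat does
cut_counts <- diamonds %>%
  count(cut, name = "count")

# Build the bar chart and extract the data computed by its stat
bar_plot <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))
computed <- layer_data(bar_plot)

# The count column computed for the plot matches the hand-made table
cut_counts
computed$count
```

The count column does not exist in diamonds: it is created while the plot is built, one row per value of cut.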
ggplot2 statistical transformations
You can generally use geoms and stats interchangeably. For example, you can recreate the previous plot using stat_count() instead of geom_bar():
ggplot(data = diamonds) +
stat_count(mapping = aes(x = cut))
ggplot2 statistical transformations
Every geom has a default stat; and every stat has a default geom. This means that you can typically use geoms without worrying about the underlying statistical transformation. There are three reasons you might need to use a stat explicitly:
- You might want to override the default stat. 3_a
- You might want to override the default mapping from transformed variables to aesthetics. 3_b
- You might want to draw greater attention to the statistical transformation in your code. 3_c
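The first and third reasons above can be sketched as follows. This is a minimal illustration rather than the practical's own solution: the demo table holds counts computed beforehand from diamonds, the object names are hypothetical, and the fun, fun.min and fun.max arguments assume ggplot2 3.3.0 or later.

```r
library(ggplot2)
library(tibble)

# Pre-computed number of diamonds for each cut
demo <- tribble(
  ~cut,        ~freq,
  "Fair",       1610,
  "Good",       4906,
  "Very Good", 12082,
  "Premium",   13791,
  "Ideal",     21551
)

# Override the default stat: bar heights come from freq, not from a count
p_identity <- ggplot(data = demo) +
  geom_bar(mapping = aes(x = cut, y = freq), stat = "identity")

# Name the stat explicitly to draw attention to the summary being computed
# (fun, fun.min and fun.max assume ggplot2 >= 3.3.0)
p_summary <- ggplot(data = diamonds) +
  stat_summary(
    mapping = aes(x = cut, y = depth),
    fun.min = min,
    fun.max = max,
    fun = median
  )
```

Printing p_identity or p_summary displays the corresponding plot.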
Statistical transformation challenge
- What does geom_col() do? How is it different from geom_bar()?
- What variables does stat_smooth() compute? What parameters control its behaviour?
- In our proportion bar chart, we need to set group = 1. Why? In other words, what is the problem with these two graphs?
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, y = ..prop..))
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = color, y = ..prop..))
Position adjustments
You can colour a bar chart using either the colour aesthetic, or, more usefully, fill:
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, colour = cut))
Position adjustments
You can colour a bar chart using either the colour aesthetic, or, more usefully, fill:
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = cut))
Position adjustments
You can also use fill with another variable:
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity))
Position adjustments
The stacking is performed by the position adjustment specified by the position argument.
ggplot(data = diamonds,
mapping = aes(x = cut, colour = clarity)) +
geom_bar(fill = NA, position = "identity")
Position adjustments
The stacking is performed by the position adjustment specified by the position argument.
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity),
position = "fill")
Position adjustments
The stacking is performed by the position adjustment specified by the position argument.
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity),
position = "dodge")
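To see what a position adjustment actually changes, it can help to look at the coordinates ggplot2 computes for the bars rather than at the picture. The sketch below is only an illustration: it relies on layer_data(), which the slides do not use, and the object names are hypothetical.

```r
library(ggplot2)

base <- ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity))

# Bar coordinates computed under three position adjustments
stacked  <- layer_data(base + geom_bar(position = "stack"))
identity <- layer_data(base + geom_bar(position = "identity"))
filled   <- layer_data(base + geom_bar(position = "fill"))

# "identity": every bar starts at zero, so bars of the same cut overlap
range(identity$ymin)

# "stack": the top of each stack is the total count for that cut
tapply(stacked$ymax, as.integer(stacked$x), max)

# "fill": each stack is rescaled to a total height of 1
tapply(filled$ymax, as.integer(filled$x), max)
```

The counts are the same in all three cases; only the ymin and ymax of each bar differ.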
Position adjustments
The stacking is performed by the position adjustment specified by the position argument.
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy),
position = "jitter")
Position adjustments
The stacking is performed by the position adjustment specified by the position argument.
ggplot(data = mpg) +
geom_jitter(mapping = aes(x = displ, y = hwy))
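The two previous plots use the same position adjustment, written in two ways. Because jittering is random, each rendering differs slightly. The sketch below fixes the random seed so both forms can be compared; the seed value and the object names are assumptions made for this illustration.

```r
library(ggplot2)

# A fixed seed is an assumption made here so that the jitter is reproducible
jitter_pos <- position_jitter(seed = 42)

p_point <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), position = jitter_pos)
p_jitter <- ggplot(data = mpg) +
  geom_jitter(mapping = aes(x = displ, y = hwy), position = jitter_pos)

# Both layers receive the same random noise, so their coordinates match
all.equal(layer_data(p_point)$x, layer_data(p_jitter)$x)
all.equal(layer_data(p_point)$y, layer_data(p_jitter)$y)
```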
Position adjustments challenges
- What is the problem with this plot? How could you improve it?
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
geom_point()
- What parameters to geom_jitter() control the amount of jittering?
- Compare and contrast geom_jitter() with geom_count().
- What's the default position adjustment for geom_boxplot()? Create a visualisation of the mpg dataset that demonstrates it.
Coordinate systems
The default coordinate system is the Cartesian coordinate system, where the x and y positions act independently to determine the location of each point. There are a number of other coordinate systems that are occasionally helpful.
Coordinate systems
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
geom_boxplot()
Coordinate systems
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
geom_boxplot() +
coord_flip()
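As a point of comparison with coord_flip(), horizontal boxplots can also be obtained by swapping the aesthetics. This is a sketch that assumes ggplot2 3.3.0 or later, where geom_boxplot() infers its orientation from the mapping; it is not part of the original slides, and the object names are illustrative.

```r
library(ggplot2)

# Approach from the slides: vertical boxplots flipped by the coordinate system
p_flip <- ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
  geom_boxplot() +
  coord_flip()

# Alternative sketch: map class to y and let geom_boxplot() infer the orientation
# (assumes ggplot2 >= 3.3.0)
p_swapped <- ggplot(data = mpg, mapping = aes(x = hwy, y = class)) +
  geom_boxplot()
```

With coord_flip() the mapping is unchanged and only the drawing is rotated, whereas the second form changes which variable is treated as the grouping axis.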
Coordinate systems
bar <- ggplot(data = diamonds) +
geom_bar(
mapping = aes(x = cut, fill = cut),
show.legend = FALSE,
width = 1
) +
theme(aspect.ratio = 1) +
labs(x = NULL, y = NULL)
3_d
Coordinate systems
bar
3_d
Coordinate systems
bar + coord_flip()
Coordinate systems
ggplot(data = mpg) +
geom_jitter(mapping = aes(x = cty, y = hwy))
Coordinate systems
ggplot(data = mpg) +
geom_jitter(mapping = aes(x = cty, y = hwy)) +
coord_quickmap()
Coordinate systems
ggplot(data = mpg) +
geom_jitter(mapping = aes(x = cty, y = hwy)) +
scale_y_log10() +
scale_x_log10()
Coordinate systems
bar + coord_polar()
Coordinate systems challenges
- Turn a stacked bar chart into a pie chart using coord_polar().
- What does labs() do? Read the documentation.
- What does the plot below tell you about the relationship between city and highway mpg? Why is coord_fixed() important? What does geom_abline() do?
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
geom_point() +
geom_abline() +
coord_fixed()
Coordinate systems challenges
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity),
position = "fill") +
coord_polar()
Coordinate systems challenges
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
geom_point() + geom_abline()
Coordinate systems challenges
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
geom_point() + geom_abline() + coord_fixed()