From 00f811810e3848f476a2c509b87735d4c6909a00 Mon Sep 17 00:00:00 2001 From: hpolvech <helene.polveche@ens-lyon.fr> Date: Wed, 25 Mar 2020 12:57:38 +0100 Subject: [PATCH] session3: until position part --- session_3/HTML_tuto_s3.Rmd | 355 +++++++++++++++++++++++++++++++++++++ 1 file changed, 355 insertions(+) create mode 100644 session_3/HTML_tuto_s3.Rmd diff --git a/session_3/HTML_tuto_s3.Rmd b/session_3/HTML_tuto_s3.Rmd new file mode 100644 index 0000000..e81f8cc --- /dev/null +++ b/session_3/HTML_tuto_s3.Rmd @@ -0,0 +1,355 @@ +--- +title: "R#3: Transformations with ggplot2" +author: "Laurent Modolo [laurent.modolo@ens-lyon.fr](mailto:laurent.modolo@ens-lyon.fr), Hélène Polvèche [hpolveche@istem.fr](mailto:hpolveche@istem.fr)" +date: "March 2020" +output: + html_document: default + pdf_document: default +--- +<style type="text/css"> +h3 { /* Header 3 */ + position: relative ; + color: #729FCF ; + left: 5%; +} +h2 { /* Header 2 */ + color: darkblue ; + left: 10%; +} +h1 { /* Header 1 */ + color: #034b6f ; +} +#pencadre{ + border:1px; + border-style:solid; + border-color: #034b6f; + background-color: #EEF3F9; + padding: 1em; + text-align: center ; + border-radius : 5px 4px 3px 2px; +} +legend{ + color: #034b6f ; +} +#pquestion { + color: darkgreen; + font-weight: bold; +} +</style> + +```{r setup, include=FALSE, cache=TRUE} +knitr::opts_chunk$set(echo = TRUE) +``` + +The goal of this practical is to practice advanced features of `ggplot2`. + +The objectives of this session will be to: + +- learn about statistical transformations +- practice position adjustments +- change the coordinate systems + + \ + +# `ggplot2` statistical transformations + + \ + +```{r packageloaded, include=TRUE, message=FALSE} +library("tidyverse") +``` + + \ + +We are going to use the `diamonds` data set included in `tidyverse`. + +- Use the `help` and `view` commands to explore this data set. +- Try the `str` command: what information is displayed? 
+ +```R +str(diamonds) +``` + +``` +## Classes 'tbl_df', 'tbl' and 'data.frame': 53940 obs. of 10 variables: +## $ carat : num 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ... +## $ cut : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ... +## $ color : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ... +## $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ... +## $ depth : num 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ... +## $ table : num 55 61 65 58 58 57 57 55 61 61 ... +## $ price : int 326 326 327 334 335 336 336 337 337 338 ... +## $ x : num 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ... +## $ y : num 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ... +## $ z : num 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ... +``` + + \ + +We have seen scatter plots (`geom_point()`) and smoothed plots (`geom_smooth()`). Now let's draw a bar plot with `geom_bar()`: + +```{r diamonds_barplot, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE} +ggplot(data = diamonds, mapping = aes(x = cut)) + + geom_bar() +``` + +More diamonds are available with high-quality cuts. + +On the x-axis, the chart displays `cut`, a variable from `diamonds`. On the y-axis, it displays `count`, but `count` is not a variable in `diamonds`! + +The algorithm used to calculate new values for a graph is called a **stat**, short for statistical transformation. The figure below describes how this process works with `geom_bar()`. + + + + +You can generally use geoms and stats interchangeably. For example, you can recreate the previous plot using `stat_count()` instead of `geom_bar()`: + +```{r diamonds_stat_count, include=TRUE, fig.width=8, fig.height=4.5, message=FALSE} +ggplot(data = diamonds, mapping = aes(x = cut)) + + stat_count() +``` + + \ + +Every geom has a default stat, and every stat has a default geom. This means that you can typically use geoms without worrying about the underlying statistical transformation. 
There are three reasons you might need to use a stat explicitly: + +- You might want to override the default stat. + +```{r 3_a, include=TRUE, fig.width=8, fig.height=4.5} +demo <- tribble( + ~cut, ~freq, + "Fair", 1610, + "Good", 4906, + "Very Good", 12082, + "Premium", 13791, + "Ideal", 21551 +) + +# (Don't worry that you haven't seen <- or tribble() before. You might be able +# to guess at their meaning from the context, and you will learn exactly what +# they do soon!) + +ggplot(data = demo, mapping = aes(x = cut, y = freq)) + + geom_bar(stat = "identity") + +``` + +- You might want to override the default mapping from transformed variables to aesthetics (e.g. to display a proportion instead of a count). +```{r 3_b, include=TRUE, fig.width=8, fig.height=4.5} +ggplot(data = diamonds, mapping = aes(x = cut, y = ..prop.., group = 1)) + + geom_bar() +``` + +- In our proportion bar chart, we need to set `group = 1`. Why? + +```{r diamonds_stats_challenge, include=TRUE, message=FALSE, fig.width=8, fig.height=4.5} +ggplot(data = diamonds, mapping = aes(x = cut, y = ..prop..)) + + geom_bar() +``` + +If `group` is not set, the proportion is computed separately within each level of `cut`, so every bar reaches 1 (100%). For instance, the proportion of Ideal cuts among the Ideal-cut diamonds is 1. + + \ + +- You might want to draw greater attention to the statistical transformation in your code. 
+ +```{r 3_c, include=TRUE, fig.width=8, fig.height=4.5, message=FALSE} +# you might use stat_summary(), which summarises the y values for each unique x +# value, to draw attention to the summary that you are computing: + +ggplot(data = diamonds, mapping = aes(x = cut, y = depth)) + + stat_summary() + + +ggplot(data = diamonds, mapping = aes(x = cut, y = depth)) + + stat_summary( + fun.min = min, + fun.max = max, + fun = median + ) +``` + + +# Position adjustments + + \ + +You can colour a bar chart using either the `color` aesthetic, + +```{r diamonds_barplot_color, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE} +ggplot(data = diamonds, mapping = aes(x = cut, color = cut)) + + geom_bar() +``` + + \ + +or, more usefully, `fill`: + +```{r diamonds_barplot_fill, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE} +ggplot(data = diamonds, mapping = aes(x = cut, fill = cut)) + + geom_bar() +``` + + + +You can also use `fill` with another variable: + +```{r diamonds_barplot_fill_clarity, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE} +ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity)) + + geom_bar() +``` + + + +The stacking is performed automatically by the position adjustment given in the `position` argument. + +### fill + +```{r diamonds_barplot_pos_fill, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE} +ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity)) + + geom_bar(position = "fill") +``` + +### dodge + +```{r diamonds_barplot_pos_dodge, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE} +ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity)) + + geom_bar(position = "dodge") +``` + +### jitter + +```{r diamonds_barplot_pos_jitter, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE} +ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity)) + + geom_bar(position = "jitter") +``` + + + +```{r dia_jitter2, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE} +ggplot(data = diamonds, mapping = aes(x = cut, y 
= depth, color = clarity)) + + geom_point() +``` + +```{r dia_jitter3, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE} +ggplot(data = diamonds, mapping = aes(x = cut, y = depth, color = clarity)) + + geom_jitter() +``` + +### violin + +```{r dia_violon, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE} +ggplot(data = diamonds, mapping = aes(x = cut, y = depth, color = clarity)) + + geom_violin() +``` + + +# Coordinate systems + +The default coordinate system is the Cartesian coordinate system, where the x and y positions act independently to determine the location of each point. There are a number of other coordinate systems that are occasionally helpful. + + +```{r dia_boxplot, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE} +ggplot(data = diamonds, mapping = aes(x = cut, y = depth, color = clarity)) + + geom_boxplot() +``` + + + +```{r dia_boxplot_flip, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE} +ggplot(data = diamonds, mapping = aes(x = cut, y = depth, color = clarity)) + + geom_boxplot() + + coord_flip() +``` + + + +```{r diamonds_bar, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE} +bar <- ggplot(data = diamonds) + + geom_bar( + mapping = aes(x = cut, fill = cut), + show.legend = FALSE, + width = 1 + ) + + theme(aspect.ratio = 1) + + labs(x = NULL, y = NULL) +``` +**3_d** + + + +```{r diamonds_bar_plot, echo=F, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE} +bar +``` + +**3_d** + + +```{r diamonds_bar_flip, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE} +bar + coord_flip() +``` + + + +```{r mpg_jitter_noquickmap, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE} +ggplot(data = mpg) + + geom_jitter(mapping = aes(x = cty, y = hwy)) +``` + + + +```{r mpg_jitter_quickmap, cache = TRUE, fig.width=3.5, fig.height=3.5, message=FALSE} +ggplot(data = mpg) + + geom_jitter(mapping = aes(x = cty, y = hwy)) + + coord_quickmap() +``` + + + +```{r mpg_jitter_log, cache = TRUE, fig.width=8.5, fig.height=3.5, message=FALSE} 
+ggplot(data = mpg) + + geom_jitter(mapping = aes(x = cty, y = hwy)) + + scale_y_log10() + + scale_x_log10() +``` + + +```{r diamonds_bar_polar, cache = TRUE, fig.width=5, fig.height=3.5, message=FALSE} +bar + coord_polar() +``` + +## Coordinate systems challenges + +- Turn a stacked bar chart into a pie chart using `coord_polar()`. +- What does `labs()` do? Read the documentation. +- What does the plot below tell you about the relationship between city and highway mpg (`cty` and `hwy`)? Why is `coord_fixed()` important? What does `geom_abline()` do? + +```{r mpg_point_fixed, eval = F, cache = TRUE, fig.width=4.5, fig.height=3.5, message=FALSE} +ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) + + geom_point() + + geom_abline() + + coord_fixed() +``` + +## Coordinate systems challenges + +```{r diamonds_barplot_pos_fill_polar, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE} +ggplot(data = diamonds) + + geom_bar(mapping = aes(x = cut, fill = clarity), + position = "fill") + + coord_polar() +``` + +## Coordinate systems challenges + +```{r mpg_point_nofixed_plot, eval = T, cache = TRUE, fig.width=8, fig.height=3.5, message=FALSE} +ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) + + geom_point() + geom_abline() +``` + +## Coordinate systems challenges + +```{r mpg_point_fixed_plot, eval = T, cache = TRUE, fig.width=8, fig.height=3.5, message=FALSE} +ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) + + geom_point() + geom_abline() + coord_fixed() +``` -- GitLab