Verified Commit 7cbabb25 authored by Laurent Modolo

session_2: update support

parent 68d2bd0e
@@ -121,9 +121,9 @@ Instead of continuing to learn more about R programming, in this session we are
We make this choice for three reasons:
- Rendering nice plots is directly rewarding.
- You will be able to apply what you learn in this session to your own data (provided that they are *correctly formatted*).
- We will come back to R programming later, when you have all the necessary tools to visualize your results.
The objectives of this session will be to:
@@ -135,7 +135,7 @@ The objectives of this session will be to:
## Tidyverse
The `tidyverse` package is a collection of R packages designed for data science that includes `ggplot2`.
All packages share an underlying design philosophy, grammar, and data structures (plus the same shape of logo).
@@ -148,13 +148,13 @@ All packages share an underlying design philosophy, grammar, and data structures
```
install.packages("tidyverse")
```
Luckily for you, `tidyverse` is preinstalled on your RStudio server, so you just have to load the library with `library()`:
```{R load_tidyverse}
library("tidyverse")
```
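To check which packages make up the collection, the `tidyverse` package provides a `tidyverse_packages()` helper. A minimal sketch, assuming a reasonably recent `tidyverse` installation. Note that `library("tidyverse")` attaches only the core packages, `ggplot2` among them, while `tidyverse_packages()` lists everything installed as part of the collection:

```r
library(tidyverse)

# Every package installed as part of the tidyverse collection
pkgs <- tidyverse_packages()
pkgs

# ggplot2 belongs to the collection...
"ggplot2" %in% pkgs
# ...and is attached by library(tidyverse); .packages() lists the attached packages
"ggplot2" %in% .packages()
```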
## Toy data set `mpg`
This dataset contains a subset of the fuel economy data that the EPA makes available on [fueleconomy.gov](http://fueleconomy.gov).
It contains only models which had a new release every year between 1999 and 2008.
@@ -168,7 +168,7 @@ You can use the `?` command to know more about this dataset.
But instead of using a dataset included in an R package, you may want to use any dataset with the same format.
For that, we are going to use the command `read_csv`, which reads a [csv](https://en.wikipedia.org/wiki/Comma-separated_values) file.
This command also works with a file URL:
```{r mpg_download, cache=TRUE, message=FALSE}
new_mpg <- read_csv(
@@ -176,34 +176,50 @@ new_mpg <- read_csv(
)
```
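Because `read_csv()` reads local files as well as URLs, a round trip through a temporary csv file shows that any dataset "with the same format" can be loaded. This sketch uses the `mpg` dataset bundled with `ggplot2` as an offline stand-in for the downloaded file. It assumes, as the session suggests, that both have the same columns. It also uses `write_csv()`, readr's counterpart of `read_csv()`:

```r
library(readr)    # read_csv() and write_csv(); attached by library("tidyverse")
library(ggplot2)  # provides the built-in mpg dataset

# Write the bundled dataset to a temporary csv file...
csv_path <- tempfile(fileext = ".csv")
write_csv(mpg, csv_path)

# ...and read it back: same call as for a URL, but with a local path
mpg_from_csv <- read_csv(csv_path, show_col_types = FALSE)
dim(mpg_from_csv)
```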
You can check the number of rows and columns of the data with `dim`:
```{r mpg_inspect2, include=TRUE}
dim(new_mpg)
```
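`dim()` returns the number of rows first and the number of columns second. `nrow()` and `ncol()` give each value on its own. The sketch below uses the bundled `mpg` as a stand-in for `new_mpg`:

```r
library(ggplot2)  # provides the built-in mpg dataset

dims <- dim(mpg)  # c(rows, columns)
dims              # 234  11
nrow(mpg)         # 234, same as dims[1]
ncol(mpg)         # 11, same as dims[2]
```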
To visualize the data in RStudio, you can use the command `View`:
```R
View(new_mpg)
```
You can also simply call the variable.
As with simple data types, calling a variable prints it, but complex data types like `new_mpg` can have their own, more elaborate print function.
```{r mpg_inspect3, include=TRUE}
new_mpg
```
Here we can see that `new_mpg` is a `tibble`; we will come back to `tibble` later.
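The more elaborate print function mentioned above is the print method of the `tibble` class. This sketch contrasts it with a plain `data.frame`. It uses the bundled `mpg`, itself a tibble, in place of `new_mpg`:

```r
library(tibble)   # attached by library("tidyverse")
library(ggplot2)  # built-in mpg dataset

is_tibble(mpg)    # TRUE
class(mpg)        # "tbl_df" "tbl" "data.frame": a tibble is still a data.frame

# Tibbles print the first rows and the columns that fit the console;
# the n argument sets how many rows are shown
print(mpg, n = 5)

# As a plain data.frame, the same data would print every row
# (up to the getOption("max.print") limit)
mpg_df <- as.data.frame(mpg)
is_tibble(mpg_df) # FALSE
```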
## New script
Like in the last session, instead of typing your commands directly in the console, you are going to write them in an R script.
![](./img/formationR_session2_scriptR.png)
# First plot with `ggplot2`
We are going to make the simplest plot possible to study the relationship between two variables: the scatterplot.
The following command generates a plot of engine size `displ` against fuel efficiency `hwy`, two columns of the `new_mpg` `tibble`.
```{r new_mpg_plot_a, cache = TRUE, fig.width=8, fig.height=4.5}
ggplot(data = new_mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))
```
<div class="pencadre">
Are cars with bigger engines less fuel efficient?
</div>
`ggplot2` is a system for declaratively creating graphics, based on [The Grammar of Graphics](https://www.amazon.com/Grammar-Graphics-Statistics-Computing/dp/0387245448/ref=as_li_ss_tl). You provide the data, tell `ggplot2` how to map variables to aesthetics and which graphical primitives to use, and it takes care of the details.
```
@@ -214,9 +230,13 @@ ggplot(data = <DATA>) +
```
- you begin a plot with the function `ggplot()`
- you complete your graph by adding one or more layers
- `geom_point()` adds a layer with a scatterplot
- each **geom** function in `ggplot2` takes a `mapping` argument
- the `mapping` argument is always paired with `aes()`
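To see that `+` really stacks layers onto a plot object, you can store the plot in a variable. You can then ask `ggplot2` for the data it computed for a given layer with `layer_data()`. This minimal sketch uses the bundled `mpg`:

```r
library(ggplot2)

# ggplot() creates the plot; geom_point() adds one layer with its aesthetic mapping
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

# layer_data() returns the data computed for layer number i:
# one row per point, with the mapped variables stored as x and y
points <- layer_data(p, i = 1)
head(points[, c("x", "y")])

p  # printing the object draws the scatterplot
```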
<div class="pencadre">
What happens when you use only the command `ggplot(data = new_mpg)`?
</div>
<div class="pencadre">
Make a scatterplot of `hwy` (fuel efficiency) vs. `cyl` (number of cylinders).
@@ -226,7 +246,7 @@ Make a scatterplot of `hwy` ( fuel efficiency ) vs. `cyl` ( number of cylinders
<details><summary>Solution</summary>
<p>
```{r new_mpg_plot_b, cache = TRUE, fig.width=8, fig.height=4.5}
ggplot(data = new_mpg, mapping = aes(x = cyl, y = hwy)) +
  geom_point()
```
@@ -249,7 +269,7 @@ Try the following aesthetic:
## `color` mapping
```{r new_mpg_plot_e, cache = TRUE, fig.width=8, fig.height=4.5}
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy, color = class)) +
  geom_point()
```
@@ -257,28 +277,28 @@ ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = class)) +
## `size` mapping
```{r new_mpg_plot_f, cache = TRUE, fig.width=8, fig.height=4.5, warning=FALSE}
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy, size = class)) +
  geom_point()
```
## `alpha` mapping
```{r new_mpg_plot_g, cache = TRUE, fig.width=8, fig.height=4.5, warning=FALSE}
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy, alpha = class)) +
  geom_point()
```
## `shape` mapping
```{r new_mpg_plot_h, cache = TRUE, fig.width=8, fig.height=4.5, warning=FALSE}
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy, shape = class)) +
  geom_point()
```
You can also set the aesthetic properties of your **geom** manually. For example, we can make all of the points in our plot blue squares (`shape = 0` is an open square):
```{r new_mpg_plot_i, cache = TRUE, fig.width=8, fig.height=4.5}
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(color = "blue", shape = 0)
```
@@ -292,25 +312,25 @@ What’s gone wrong with this code? Why are the points not blue?
</div>
```{r new_mpg_plot_not_blue, cache = TRUE, fig.width=8, fig.height=4.5}
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy, color = "blue")) +
  geom_point()
```
<details><summary>Solution</summary>
<p>
```{r new_mpg_plot_blue, cache = TRUE, fig.width=8, fig.height=4.5}
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(color = "blue")
```
</p>
</details>
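You can check directly why the first version fails. Inside `aes()`, `"blue"` is treated as a data value (a single category) and passed through the default color scale. Outside `aes()`, it is a fixed setting of the geom. This sketch compares the colors `ggplot2` actually assigns, using the bundled `mpg`:

```r
library(ggplot2)

# Mapping: "blue" is a constant value, mapped through the default discrete scale
p_mapped <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = "blue")) +
  geom_point()

# Setting: "blue" is a fixed property of the geom, given outside aes()
p_set <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(color = "blue")

# ggplot2 stores the final colors in a column named "colour"
unique(layer_data(p_mapped)$colour)  # a single color chosen by the scale, not "blue"
unique(layer_data(p_set)$colour)     # "blue"
```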
## Mapping a **continuous** variable to a color
You can also map a continuous variable to a color:
```{r continu, cache = TRUE, fig.width=8, fig.height=4.5}
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy, color = cyl)) +
  geom_point()
```
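The type of the mapped column determines whether `ggplot2` uses a gradient or a set of distinct colors. `cyl` is stored as numbers, which is why the plot above uses a continuous scale. Wrapping it in `factor()` turns it into a categorical variable. Here is a sketch with the bundled `mpg`:

```r
library(ggplot2)

is.numeric(mpg$cyl)      # TRUE: numbers -> continuous color gradient
levels(factor(mpg$cyl))  # "4" "5" "6" "8": the categories once converted

# Same plot, with one distinct color per number of cylinders
p_discrete <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = factor(cyl))) +
  geom_point()
p_discrete
```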
@@ -321,7 +341,7 @@ What happens if you map an aesthetic to something other than a variable name, li
<details><summary>Solution</summary>
<p>
```{r condiColor, cache = TRUE, fig.width=8, fig.height=4.5}
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy, color = displ < 5)) +
  geom_point()
```
</p>
@@ -329,14 +349,14 @@ ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = displ < 5)) +
# Facets
You can create multiple plots at once by faceting. For this you can use the command `facet_wrap`.
This command takes a formula as input.
We will come back to formulas in R later; for now, you just need to know that formulas are written with the `~` symbol, and that the formula `~class` starts with it.
To make a scatterplot of `displ` versus `hwy` per car `class`, you can use the following code:
```{r new_mpg_plot_k, cache = TRUE, fig.width=8, fig.height=4.5}
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~class, nrow = 2)
```
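A formula is an ordinary R object of class `formula`. Besides the one-sided formula `~class`, `facet_wrap()` also accepts `vars(class)`. `facet_grid()` takes a two-sided formula of the form `rows ~ columns`. Here is a sketch with the bundled `mpg`:

```r
library(ggplot2)

f <- ~class
class(f)  # "formula"

# Equivalent to facet_wrap(~class, nrow = 2)
p_wrap <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(vars(class), nrow = 2)

# One row of panels per drive type (drv), one column per number of cylinders (cyl)
p_grid <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  facet_grid(drv ~ cyl)
p_grid
```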
@@ -348,7 +368,7 @@ Now try to facet your plot by `fl + class`
<details><summary>Solution</summary>
<p>
Formulas allow you to express complex relationships between variables in R!
```{r new_mpg_plot_l, cache = TRUE, fig.width=8, fig.height=4.5}
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy)) +
@@ -363,14 +383,14 @@ ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
```
There are different ways to represent the information:
```{r new_mpg_plot_o, cache = TRUE, fig.width=8, fig.height=4.5}
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()
```
\
```{r new_mpg_plot_p, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy)) +
  geom_smooth()
```
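When no `method` is given, `geom_smooth()` chooses one itself and prints a message saying so. For fewer than 1,000 observations, it uses a loess curve. That message is why the chunk above uses `message=FALSE`. Naming the method explicitly silences it. As an illustrative variation, not part of the session's code, this sketch fits a straight line with `method = "lm"` on the bundled `mpg`:

```r
library(ggplot2)

# method = "lm" fits a linear model; formula = y ~ x states the model explicitly
p_lm <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_smooth(method = "lm", formula = y ~ x)

# The smooth layer holds the fitted line, evaluated on a grid of x values
smooth <- layer_data(p_lm)
head(smooth[, c("x", "y")])
p_lm
```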
@@ -379,7 +399,7 @@ ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
We can add as many layers as we want:
```{r new_mpg_plot_q, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth()
```
@@ -389,28 +409,29 @@ ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
We can make a `mapping` layer-specific:
```{r new_mpg_plot_s, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +
  geom_smooth()
```
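The global mapping, given to `ggplot()`, is inherited by every layer. A mapping given to a single geom applies to that layer only. `layer_data()` makes the difference visible. In this sketch on the bundled `mpg`, only the point layer gets one color per `class`, so the smooth layer still draws a single curve. The loess method is named explicitly to avoid the message:

```r
library(ggplot2)

# Global mapping: x and y are inherited by every layer
base <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy))

p <- base +
  geom_point(mapping = aes(color = class)) +  # color mapped for this layer only
  geom_smooth(method = "loess", formula = y ~ x)

points <- layer_data(p, i = 1)
smooth <- layer_data(p, i = 2)
length(unique(points$colour))  # one color per car class
length(unique(smooth$colour))  # 1: a single curve for the whole dataset
p
```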
\
We can use different `data` for different layers (you will learn more about `filter()` later):
```{r new_mpg_plot_t, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +
  geom_smooth(data = filter(new_mpg, class == "subcompact"))
```
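`filter()` comes from `dplyr`, which the tidyverse attaches. It keeps the rows for which the condition is `TRUE`. The result is a smaller dataset with the same columns, and that is what `geom_smooth(data = ...)` receives above. Here is a sketch on the bundled `mpg`:

```r
library(dplyr)    # provides filter(); once attached it masks stats::filter()
library(ggplot2)  # built-in mpg dataset

subcompact <- filter(mpg, class == "subcompact")
nrow(subcompact)          # fewer rows than mpg...
ncol(subcompact)          # ...but the same columns
unique(subcompact$class)  # "subcompact"
```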
# Challenge!
## First challenge
<div class="pencadre">
Run this code in your head and predict what the output will look like. Then, run the code in R and check your predictions.
</div>
```R
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
  geom_point(show.legend = FALSE) +
  geom_smooth(se = FALSE)
```
@@ -420,6 +441,43 @@ ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
- What does the `se` argument to `geom_smooth()` do?
</div>
## Second challenge
<div class="pencadre">
How does being a `2seater` car affect the relationship between engine size and fuel efficiency?
Make a plot *colorizing* this information.
</div>
<details><summary>Solution</summary>
<p>
```{r new_mpg_plot_color_2seater, cache = TRUE, fig.width=8, fig.height=4.5}
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_point(data = filter(new_mpg, class == "2seater"), color = "red")
```
</p>
</details>
<div class="pencadre">
Write a `function` called `plot_color_2seater` that takes a dataset like `new_mpg` as its sole argument and plots the same graph.
</div>
<details><summary>Solution</summary>
<p>
```{r new_mpg_plot_color_2seater_fx, cache = TRUE, fig.width=8, fig.height=4.5}
# data: a dataset with the same columns as new_mpg (displ, hwy, class, ...)
plot_color_2seater <- function(data) {
  ggplot(data = data, mapping = aes(x = displ, y = hwy)) +
    geom_point() +
    geom_point(data = filter(data, class == "2seater"), color = "red")
}
plot_color_2seater(new_mpg)
```
</p>
</details>
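The same idea extends to any class if the highlighted class and its color become arguments with default values. The function name `plot_color_class` and its `highlight` and `color` arguments are hypothetical, not part of the session. The sketch uses the bundled `mpg` and dplyr's `.env` pronoun, so that `highlight` is looked up among the function's variables rather than the dataset's columns:

```r
library(ggplot2)
library(dplyr)

# Hypothetical generalization of plot_color_2seater():
# highlight = the class drawn on top, color = the color used for it
plot_color_class <- function(data, highlight = "2seater", color = "red") {
  ggplot(data = data, mapping = aes(x = displ, y = hwy)) +
    geom_point() +
    geom_point(data = filter(data, class == .env$highlight), color = color)
}

plot_color_class(mpg)  # default arguments: same plot as the 2seater solution
p_minivan <- plot_color_class(mpg, highlight = "minivan", color = "blue")
p_minivan
```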
## Third challenge
<div class="pencadre">
@@ -432,10 +490,12 @@ ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
```
  geom_smooth(mapping = aes(linetype = drv))
```
<details><summary>Solution</summary>
<p>
```{r new_mpg_plot_v, eval=FALSE}
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  geom_smooth(mapping = aes(linetype = drv))
```
</p>
</details>
\ No newline at end of file