---
title: "R#2: introduction to Tidyverse"
author: "Laurent Modolo [laurent.modolo@ens-lyon.fr](mailto:laurent.modolo@ens-lyon.fr), Hélène Polvèche [hpolveche@istem.fr](mailto:hpolveche@istem.fr)"
date: "March 2020"
output:
  html_document: default
  pdf_document: default
---
<style type="text/css">
h3 { /* Header 3 */
  position: relative;
  color: #729FCF;
  left: 5%;
}
h2 { /* Header 2 */
  color: darkblue;
  left: 10%;
}
h1 { /* Header 1 */
  color: #034b6f;
}
#pencadre{
  border: 1px;
  border-style: solid;
  border-color: #034b6f;
  background-color: #EEF3F9;
  padding: 1em;
  text-align: center;
  border-radius: 5px 4px 3px 2px;
}
legend{
  color: #034b6f;
}
#pquestion {
  color: darkgreen;
  font-weight: bold;
}
</style>
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
# tmp <- tempfile(fileext = ".zip")
# download.file("http://www.fueleconomy.gov/feg/epadata/vehicles.csv.zip",
#   tmp,
#   quiet = TRUE)
# unzip(tmp, exdir = "data-raw")
# new_class_level <- c(
#   "Compact Cars",
#   "Large Cars",
#   "Midsize Cars",
#   "Midsize Cars",
#   "Midsize Cars",
#   "Compact Cars",
#   "Minivan",
#   "Minivan",
#   "Pickup Trucks",
#   "Pickup Trucks",
#   "Pickup Trucks",
#   "Sport Utility Vehicle",
#   "Sport Utility Vehicle",
#   "Compact Cars",
#   "Special Purpose Vehicle",
#   "Special Purpose Vehicle",
#   "Special Purpose Vehicle",
#   "Special Purpose Vehicle",
#   "Special Purpose Vehicle",
#   "Special Purpose Vehicle",
#   "Sport Utility Vehicle",
#   "Sport Utility Vehicle",
#   "Pickup Trucks",
#   "Pickup Trucks",
#   "Pickup Trucks",
#   "Pickup Trucks",
#   "Sport Utility Vehicle",
#   "Sport Utility Vehicle",
#   "Compact Cars",
#   "Two Seaters",
#   "Vans",
#   "Vans",
#   "Vans",
#   "Vans"
# )
# new_fuel_level <- c(
#   "gas",
#   "Diesel",
#   "Regular",
#   "gas",
#   "gas",
#   "Regular",
#   "Regular",
#   "Hybrid",
#   "Hybrid",
#   "Regular",
#   "Regular",
#   "Hybrid",
#   "Hybrid"
# )
# read_csv("data-raw/vehicles.csv") %>%
#   select(
#     "id",
#     "make",
#     "model",
#     "year",
#     "VClass",
#     "trany",
#     "drive",
#     "cylinders",
#     "displ",
#     "fuelType",
#     "highway08",
#     "city08"
#   ) %>%
#   rename(
#     "class" = "VClass",
#     "trans" = "trany",
#     "drive" = "drive",
#     "cyl" = "cylinders",
#     "displ" = "displ",
#     "fuel" = "fuelType",
#     "hwy" = "highway08",
#     "cty" = "city08"
#   ) %>%
#   filter(drive != "") %>%
#   drop_na() %>%
#   arrange(make, model, year) %>%
#   mutate(class = factor(as.factor(class), labels = new_class_level)) %>%
#   mutate(fuel = factor(as.factor(fuel), labels = new_fuel_level)) %>%
#   write_csv("2_data.csv")
```
The goal of this practical is to familiarize yourself with `ggplot2`.
The objectives of this session will be to:

- Create basic plots with `ggplot2`
- Understand the `tibble` type
- Learn the different aesthetics in R plots
- Compose graphics

<div id='pencadre'>
**Write the commands shown in the grey boxes in the terminal.**
**The expected results will always be printed in a white box here.**
**You can copy and paste, but I advise you to practice typing directly in the terminal. To run a command, press `Return` at the end of the line.**
</div>
## Tidyverse
The tidyverse is a collection of R packages designed for data science.
All packages share an underlying design philosophy, grammar, and data structures.
<center>
![](./img/tidyverse.jpg){width=500px}
</center>
\
```R
install.packages("tidyverse")
```
```R
library("tidyverse")
```
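
Loading `tidyverse` attaches its core packages, including `ggplot2`, which we use in this session. As a quick check, here is a minimal sketch that uses the `tidyverse_packages()` helper from the `tidyverse` package to list the packages it bundles. The exact list depends on your installed version.

```R
library(tidyverse)

# Character vector with the names of the tidyverse packages
pkgs <- tidyverse_packages()
pkgs

# ggplot2, used throughout this session, is one of them
"ggplot2" %in% pkgs
```
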
### Toy data set `mpg`
This dataset contains a subset of the fuel economy data that the EPA makes available at <http://fueleconomy.gov>. It contains only models that had a new release every year between 1999 and 2008.
```{r mpg_inspect, include=TRUE}
?mpg
mpg
```
```{r mpg_inspect2, include=TRUE}
dim(mpg)
```
```R
View(mpg)
```
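
`mpg` is stored as a *tibble*, the tidyverse flavour of a `data.frame`. Printing a tibble shows its size and the type of each column. The minimal sketch below shows a few other ways to inspect it. `glimpse()` comes from `dplyr`, which the tidyverse loads.

```R
library(ggplot2)  # provides the mpg dataset
library(dplyr)    # provides glimpse()

# A tibble is a data.frame with extra classes
class(mpg)

# Number of rows and columns, one at a time (dim(mpg) gives both)
nrow(mpg)
ncol(mpg)

# One line per column: name, type and first values
glimpse(mpg)
```
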
### New script
![](./img/formationR_session2_scriptR.png)
\

### Updated version of the data
`mpg` is loaded with the tidyverse, but we also want to be able to read our own data, such as this updated version of the dataset:
\
<http://perso.ens-lyon.fr/laurent.modolo/R/2_data.csv>
```{r mpg_download, cache=TRUE, message=FALSE}
new_mpg <- read_csv(
  "http://perso.ens-lyon.fr/laurent.modolo/R/2_data.csv"
)
```
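
`read_csv()` (from `readr`, part of the tidyverse) accepts either a URL, as above, or the path to a local file. The minimal sketch below covers the local case. It writes a small, purely hypothetical table to a temporary file with `write_csv()`, then reads it back.

```R
library(readr)
library(tibble)

# Hypothetical toy table (made-up identifiers and values)
toy <- tibble(id = c("car_1", "car_2", "car_3"), hwy = c(29, 33, 35))

# Write it to a temporary CSV file...
path <- tempfile(fileext = ".csv")
write_csv(toy, path)

# ...and read it back: column types are guessed from the content
toy_back <- read_csv(path)
toy_back
```
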
\
\

# First plot with `ggplot2`
Let's plot the relationship between engine size (`displ`, the engine displacement in litres) and fuel efficiency on the highway (`hwy`, in miles per gallon).
```{r new_mpg_plot_a, cache = TRUE, fig.width=8, fig.height=4.5}
ggplot(data = new_mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))
```
### Composition of a plot with `ggplot2`
```
ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
```
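- you begin a plot with the function `ggplot()`, which creates an empty graph
- you complete your graph by adding one or more layers with `+`
- `geom_point()` adds a layer of points, which creates a scatterplot
- each geom function in `ggplot2` takes a `mapping` argument
- the `mapping` argument is always paired with `aes()`, which defines how variables are mapped to visual properties (here `x` and `y`)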
\
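
Because layers are combined with `+`, you can also store a plot in a variable and build it step by step. The minimal sketch below does this with `mpg`, which is available as soon as `ggplot2` is loaded. It uses `p$layers` only to count the layers.

```R
library(ggplot2)

# ggplot() alone sets up the plot but contains no layer yet
p <- ggplot(data = mpg)
length(p$layers)

# Each geom function adds one layer
p <- p + geom_point(mapping = aes(x = displ, y = hwy))
length(p$layers)

# Printing the object draws the plot
p
```
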

# First challenge!
- Run `ggplot(data = new_mpg)`. What do you see?
- How many rows are in `new_mpg`? How many columns?
- What does the `cty` variable describe? Read the help page (`?mpg`) to find out.
- Make a scatterplot of `hwy` vs. `cyl`.
- What happens if you make a scatterplot of `class` vs. `drive`? Why is the plot not useful?
\
\
\
\
\
\
\
\
<div id="pquestion">- Run `ggplot(data = new_mpg)`. What do you see? </div>
```{r empty_plot, cache = TRUE, fig.width=8, fig.height=4.5}
ggplot(data = new_mpg)
```
\
<div id="pquestion">- How many rows are in `new_mpg`? How many columns? </div>
```{r size_of_mpg, cache = TRUE, fig.width=8, fig.height=4.5}
new_mpg
```
\
<div id="pquestion">- Make a scatterplot of `hwy` vs. `cyl`. </div>
```{r new_mpg_plot_b, cache = TRUE, fig.width=8, fig.height=4.5}
ggplot(data = new_mpg) +
  geom_point(mapping = aes(x = cyl, y = hwy))
```
\
<div id="pquestion">- What happens if you make a scatterplot of `class` vs. `drive`? </div>
<div id="pquestion">- Why is the plot not useful? </div>
```{r new_mpg_plot_c, cache = TRUE, fig.width=8, fig.height=4.5}
ggplot(data = new_mpg) +
  geom_point(mapping = aes(x = class, y = drive))
```
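
Here is one way to see why. `class` and `drive` are both categorical, so many cars fall on exactly the same position and their points are drawn on top of each other. A possible workaround is `geom_count()`, which sizes each point by the number of cars at that position. The sketch below uses `mpg`, where the drive column is called `drv`.

```R
library(ggplot2)

# geom_count() counts the cars at each (class, drv) combination
# and maps that count to the point size
p <- ggplot(data = mpg) +
  geom_count(mapping = aes(x = class, y = drv))
p

# The counts computed by ggplot2 are in the 'n' column of the layer data
counts <- layer_data(p)
head(counts[, c("x", "y", "n")])
```
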
\

### Aesthetic mappings
<div id="pquestion">- How can you explain these cars (highlighted in red)?</div>
```{r new_mpg_plot_d, echo = FALSE, cache = TRUE, fig.width=8, fig.height=4.5}
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  geom_point(data = mpg %>% filter(class == "2seater"),
             mapping = aes(x = displ, y = hwy), color = "red")
```
```{r new_mpg_plot_e, cache = TRUE, fig.width=8, fig.height=4.5}
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))
```
### Aesthetic mappings: `color`
`ggplot2` will automatically assign a unique level of the aesthetic (here a unique color) to each unique value of the variable, a process known as scaling. `ggplot2` will also add a legend that explains which levels correspond to which values.
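
You can check this scaling step on the built plot. In the minimal sketch below, `layer_data()` returns the data that ggplot2 actually draws for a layer. After scaling, its `colour` column holds one colour code per point, so we compare the number of distinct colours with the number of distinct classes.

```R
library(ggplot2)

p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))

# Data used to draw the first (and only) layer, after scaling
drawn <- layer_data(p)

# One distinct colour per distinct value of class
length(unique(drawn$colour))
length(unique(mpg$class))
```
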
Try the following aesthetics:

- `size`
- `alpha`
- `shape`

### Aesthetic mappings: `size`
```{r new_mpg_plot_f, cache = TRUE, fig.width=8, fig.height=4.5, warning=FALSE}
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, size = class))
```
### Aesthetic mappings: `alpha`
```{r new_mpg_plot_g, cache = TRUE, fig.width=8, fig.height=4.5, warning=FALSE}
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, alpha = class))
```
### Aesthetic mappings: `shape`
```{r new_mpg_plot_h, cache = TRUE, fig.width=8, fig.height=4.5, warning=FALSE}
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = class))
```
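
By default, ggplot2 only uses six shapes at a time, but `mpg` has seven classes. The points of the seventh class (`suv`) are therefore dropped from the plot above. ggplot2 warns about this, but the warning is hidden here by `warning=FALSE`. The minimal sketch below shows one workaround, assuming you are happy to pick the shapes yourself with `scale_shape_manual()`. The shape codes `1:7` are an arbitrary choice.

```R
library(ggplot2)

# Give each of the seven classes its own shape code
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = class)) +
  scale_shape_manual(values = 1:7)
p

# Every point now has a shape, so no class is left out
drawn <- layer_data(p)
anyNA(drawn$shape)
length(unique(drawn$shape))
```
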
### Setting an aesthetic manually
You can also set the aesthetic properties of your geom manually. For example, we can make all of the points in our plot blue:
```{r new_mpg_plot_i, cache = TRUE, fig.width=8, fig.height=4.5}
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")
```
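
Setting works for other aesthetics too, and you can fix several at once. In the minimal sketch below, the fixed values are given next to the mapping, not inside `aes()`, so they apply to every point. The built layer data then shows the same value on every row.

```R
library(ggplot2)

# Fixed values are not linked to any variable of the data
p <- ggplot(data = mpg) +
  geom_point(
    mapping = aes(x = displ, y = hwy),
    color = "blue", size = 3, alpha = 0.5
  )
p

# Every drawn point has the same colour, size and transparency
drawn <- layer_data(p)
unique(drawn[, c("colour", "size", "alpha")])
```
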
## Second challenge!
- What’s gone wrong with this code? Why are the points not blue?
```R
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))
```
- Which variables in `mpg` are **categorical**? Which variables are **continuous**? (Hint: type `mpg`)
- Map a **continuous** variable to color, size, and shape.
- What does the `stroke` aesthetic do? What shapes does it work with? (Hint: use `?geom_point`)
- What happens if you map an aesthetic to something other than a variable name, like `color = displ < 5`?
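
For the first question, besides printing `mpg`, you can ask R for the type of every column with `sapply()`, as in the minimal sketch below. Text (`character`) columns are categorical. Numeric ones (`integer` or `numeric`) are candidates for continuous variables, but this is only a heuristic: `year` and `cyl`, for example, are stored as integers yet take only a few distinct values.

```R
library(ggplot2)  # provides mpg

# Apply class() to each column of the tibble
col_types <- sapply(mpg, class)
col_types

# Columns stored as text are categorical
names(col_types)[col_types == "character"]
```
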

## Facets
```{r new_mpg_plot_j, cache = TRUE, fig.width=8, fig.height=4.5}
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~class)
```
## Facets
```{r new_mpg_plot_k, cache = TRUE, fig.width=8, fig.height=4.5}
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~class, nrow = 2)
```
## Facets
```{r new_mpg_plot_l, cache = TRUE, fig.width=8, fig.height=4.5}
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ fl + class, nrow = 2)
```
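
`facet_wrap()` also accepts `vars()` instead of the `~` formula, and its `scales` argument lets the axes vary from one panel to another. In the minimal sketch below, `ncol = 4` and `scales = "free_y"` are arbitrary choices for illustration.

```R
library(ggplot2)

# One panel per class, four panels per row,
# and a y-axis range computed separately for each panel
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(vars(class), ncol = 4, scales = "free_y")
p

# Each point is assigned to a panel: seven classes give seven panels
drawn <- layer_data(p)
length(unique(drawn$PANEL))
```
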
## Composition
There are different ways to represent the same information, for example with points:
```{r new_mpg_plot_o, cache = TRUE, fig.width=8, fig.height=4.5}
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))
```
## Composition
The same information can also be represented with a smooth curve fitted to the points:
```{r new_mpg_plot_p, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
ggplot(data = mpg) +
  geom_smooth(mapping = aes(x = displ, y = hwy))
```
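
By default, `geom_smooth()` picks the smoothing method from the size of the data: LOESS for fewer than 1,000 observations, as in `mpg`. It also prints a message saying so, which is hidden here by `message=FALSE`. The minimal sketch below asks for a straight line instead, with `method = "lm"` and an explicit `formula`.

```R
library(ggplot2)

# Fit a linear model y ~ x instead of the default LOESS curve
p <- ggplot(data = mpg) +
  geom_smooth(mapping = aes(x = displ, y = hwy), method = "lm", formula = y ~ x)
p

# The layer data holds the fitted line, evaluated at evenly spaced x values
fit <- layer_data(p)
head(fit[, c("x", "y")])
```
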
## Composition
We can add as many layers as we want:
```{r new_mpg_plot_q, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  geom_smooth(mapping = aes(x = displ, y = hwy))
```
## Composition
We can avoid code duplication by giving the `mapping` to `ggplot()`: it is then shared by all the layers.
```{r new_mpg_plot_r, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth()
```
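
Mappings passed to `ggplot()` are global: every layer inherits them, unless that layer sets `inherit.aes = FALSE`. The minimal sketch below inspects where the mapping is stored. It accesses `p$mapping` and `p$layers` only for illustration.

```R
library(ggplot2)

p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

# The global mapping is stored in the plot object itself...
names(p$mapping)

# ...while the point layer has no mapping of its own
length(p$layers[[1]]$mapping)

# Yet the layer still gets x and y, inherited from ggplot()
head(layer_data(p)[, c("x", "y")])
```
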
## Composition
We can also make a `mapping` specific to a single layer: here only the points are coloured by `class`.
```{r new_mpg_plot_s, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +
  geom_smooth()
```
## Composition
We can use different `data` for different layers (you will learn more about `filter()` later):
```{r new_mpg_plot_t, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +
  geom_smooth(data = filter(mpg, class == "subcompact"))
```
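
Here `filter(mpg, class == "subcompact")` (from `dplyr`) keeps only the rows whose class is `subcompact`. The smooth curve is therefore fitted on that subset only, while the points still use all of `mpg`. The minimal sketch below compares the size of the two datasets.

```R
library(ggplot2)
library(dplyr)  # provides filter()

subcompact <- filter(mpg, class == "subcompact")

# Full dataset used by the points vs subset used by the smooth curve
nrow(mpg)
nrow(subcompact)

# The subset only contains subcompact cars
unique(subcompact$class)
```
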
## Third challenge
- Run this code in your head and predict what the output will look like. Then, run the code in R and check your predictions.
```R
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  geom_smooth(se = FALSE)
```
**http://perso.ens-lyon.fr/laurent.modolo/R/2_d**

- What does `show.legend = FALSE` do?
- What does the `se` argument to `geom_smooth()` do?

## Fourth challenge
- Write the R code needed to generate the following graph:
```{r new_mpg_plot_u, echo = FALSE, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  geom_smooth(mapping = aes(linetype = drv))
```
## Fourth challenge
```{r new_mpg_plot_v, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  geom_smooth(mapping = aes(linetype = drv))
```