s2: start support HTML

Commit 9fa1c314 authored Mar 24, 2020 by hpolvech
--- a/session_2/HTML_tuto_s2.Rmd
+++ b/session_2/HTML_tuto_s2.Rmd
+---
+title: "R#2: introduction to Tidyverse"
+author: "Laurent Modolo [laurent.modolo@ens-lyon.fr](mailto:laurent.modolo@ens-lyon.fr), Hélène Polvèche [hpolveche@istem.fr](mailto:hpolveche@istem.fr)"
+date: "March 2020"
+output:
+  html_document: default
+  pdf_document: default
+---
+<style type="text/css">
+h3 { /* Header 3 */
+  position: relative ;
+  color: #729FCF ;
+  left: 5%;
+}
+h2 { /* Header 2 */
+  color: darkblue ;
+  left: 10%;
+} 
+h1 { /* Header 1 */
+  color: #034b6f ;
+} 
+#pencadre{
+  border:1px; 
+  border-style:solid; 
+  border-color: #034b6f; 
+  background-color: #EEF3F9; 
+  padding: 1em;
+  text-align: center ;
+  border-radius : 5px 4px 3px 2px;
+}
+legend{
+  color: #034b6f ;
+}
+#pquestion {
+  color: darkgreen;
+  font-weight: bold;
+  
+}
+</style>
+```{r setup, include=FALSE}
+knitr::opts_chunk$set(echo = TRUE)
+
+library(tidyverse)
+# tmp <- tempfile(fileext = ".zip")
+# download.file("http://www.fueleconomy.gov/feg/epadata/vehicles.csv.zip",
+#               tmp,
+#               quiet = TRUE)
+# unzip(tmp, exdir = "data-raw")
+# new_class_level <- c(
+#   "Compact Cars",
+#   "Large Cars",
+#   "Midsize Cars",
+#   "Midsize Cars",
+#   "Midsize Cars",
+#   "Compact Cars",
+#   "Minivan",
+#   "Minivan",
+#   "Pickup Trucks",
+#   "Pickup Trucks",
+#   "Pickup Trucks",
+#   "Sport Utility Vehicle",
+#   "Sport Utility Vehicle",
+#   "Compact Cars",
+#   "Special Purpose Vehicle",
+#   "Special Purpose Vehicle",
+#   "Special Purpose Vehicle",
+#   "Special Purpose Vehicle",
+#   "Special Purpose Vehicle",
+#   "Special Purpose Vehicle",
+#   "Sport Utility Vehicle",
+#   "Sport Utility Vehicle",
+#   "Pickup Trucks",
+#   "Pickup Trucks",
+#   "Pickup Trucks",
+#   "Pickup Trucks",
+#   "Sport Utility Vehicle",
+#   "Sport Utility Vehicle",
+#   "Compact Cars",
+#   "Two Seaters",
+#   "Vans",
+#   "Vans",
+#   "Vans",
+#   "Vans"
+# )
+# new_fuel_level <- c(
+#   "gas",
+#   "Diesel",
+#   "Regular",
+#   "gas",
+#   "gas",
+#   "Regular",
+#   "Regular",
+#   "Hybrid",
+#   "Hybrid",
+#   "Regular",
+#   "Regular",
+#   "Hybrid",
+#   "Hybrid"
+# )
+# read_csv("data-raw/vehicles.csv") %>%
+#   select(
+#     "id",
+#     "make",
+#     "model",
+#     "year",
+#     "VClass",
+#     "trany",
+#     "drive",
+#     "cylinders",
+#     "displ",
+#     "fuelType",
+#     "highway08",
+#     "city08"
+#   ) %>% 
+#   rename(
+#     "class" = "VClass",
+#     "trans" = "trany",
+#     "drive" = "drive",
+#     "cyl" = "cylinders",
+#     "displ" = "displ",
+#     "fuel" = "fuelType",
+#     "hwy" = "highway08",
+#     "cty" = "city08"
+#   ) %>%
+#   filter(drive != "") %>%
+#   drop_na() %>% 
+#   arrange(make, model, year) %>%
+#   mutate(class = factor(as.factor(class), labels = new_class_level)) %>%
+#   mutate(fuel = factor(as.factor(fuel), labels = new_fuel_level)) %>%
+#   write_csv("2_data.csv")
+
+```
+
+
+The goal of this practical is to familiarize yourself with `ggplot2`.
+
+The objectives of this session will be to:
+
+- Create basic plots with `ggplot2`
+- Understand the `tibble` type
+- Learn the different aesthetics in R plots
+- Compose graphics
+
+
+<div id='pencadre'>
+
+**Write the commands in the grey box in the terminal.**
+
+**The expected results will always be printed in a white box here.**
+
+**You can `copy-paste`, but I advise you to practice typing the commands directly in the terminal. To run a command, press `Return` at the end of the line.**
+</div> 
+
+
+## Tidyverse
+
+The tidyverse is a collection of R packages designed for data science.
+
+All packages share an underlying design philosophy, grammar, and data structures.
+
+<center>
+![](./img/tidyverse.jpg){width=500px}
+</center>
+
+ \ 
+```R
+install.packages("tidyverse")
+```
+
+```R
+library("tidyverse")
+```
+
+
+### Toy data set `mpg`
+
+This dataset contains a subset of the fuel economy data that the EPA makes available on http://fueleconomy.gov . It contains only models which had a new release every year between 1999 and 2008.
+
+
+```{r mpg_inspect, include=TRUE}
+?mpg
+mpg
+```
+
+```{r mpg_inspect2, include=TRUE}
+dim(mpg)
+```
+
+```R
+View(mpg)
+```
+### New script
+
+![](./img/formationR_session2_scriptR.png)
+
+ \ 
+
+### Updated version of the data
+
+`mpg` is loaded with the tidyverse, but we also want to be able to read our own data, here from:
+
+ \ 
+http://perso.ens-lyon.fr/laurent.modolo/R/2_data.csv
+
+```{r mpg_download, cache=TRUE, message=FALSE}
+new_mpg <- read_csv(
+  "http://perso.ens-lyon.fr/laurent.modolo/R/2_data.csv"
+  )
+
+```
+
+ \ 
+ 
+ \ 
+
+# First plot with `ggplot2`
+
+We start by plotting the relationship between engine size (`displ`) and fuel efficiency (`hwy`).
+```{r new_mpg_plot_a, cache = TRUE, fig.width=8, fig.height=4.5}
+ggplot(data = new_mpg) + 
+  geom_point(mapping = aes(x = displ, y = hwy))
+```
+
+### Composition of plot with `ggplot2`
+
+
+```
+ggplot(data = <DATA>) + 
+  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
+```
+
+- you begin a plot with the function `ggplot()`
+- you complete your graph by adding one or more layers
+- `geom_point()` adds a layer with a scatterplot
+- each geom function in `ggplot2` takes a `mapping` argument
+- the `mapping` argument is always paired with `aes()`
+
+ \ 
+ 
+# First challenge!
+
+- Run `ggplot(data = new_mpg)`. What do you see?
+- How many rows are in `new_mpg`? How many columns?
+- What does the `cty` variable describe? Read the help for `?mpg` to find out.
+- Make a scatterplot of `hwy` vs. `cyl`.
+- What happens if you make a scatterplot of `class` vs. `drive`? Why is the plot not useful?
+
+ \ 
+ 
+ \ 
+ 
+ \ 
+ 
+ \ 
+ 
+ \ 
+ 
+ \ 
+ 
+ \ 
+ 
+ \ 
+ 
+ 
+<div id="pquestion">- Run `ggplot(data = new_mpg)`. What do you see? </div>
+
+```{r empty_plot, cache = TRUE, fig.width=8, fig.height=4.5}
+ggplot(data = new_mpg)
+```
+
+ \ 
+ 
+<div id="pquestion">- How many rows are in `new_mpg`? How many columns? </div>
+
+```{r size_of_mpg, cache = TRUE, fig.width=8, fig.height=4.5}
+new_mpg
+```
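+
+Printing the tibble shows its dimensions in the first line of the output. As a small sketch, the same numbers can also be read directly with base R functions. Here the built-in `mpg` stands in for `new_mpg` so that the example runs without a download; this substitution is an assumption made for illustration only.
+
+```R
+library(ggplot2)  # provides the built-in `mpg` dataset
+
+n_rows <- nrow(mpg)  # number of observations (cars)
+n_cols <- ncol(mpg)  # number of variables
+c(rows = n_rows, columns = n_cols)
+```
+
+Replacing `mpg` with `new_mpg` should give the dimensions of the downloaded table.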
+
+ \ 
+ 
+<div id="pquestion">- Make a scatterplot of `hwy` vs. `cyl`. </div>
+
+```{r new_mpg_plot_b, cache = TRUE, fig.width=8, fig.height=4.5}
+ggplot(data = new_mpg) + 
+  geom_point(mapping = aes(x = hwy, y = cyl))
+```
+
+ \ 
+ 
+<div id="pquestion">- What happens if you make a scatterplot of `class` vs. `drive`? </div>
+<div id="pquestion">- Why is the plot not useful? </div>
+
+```{r new_mpg_plot_c, cache = TRUE, fig.width=8, fig.height=4.5}
+ggplot(data = new_mpg) + 
+  geom_point(mapping = aes(x = class, y = drive))
+```
+
+ \ 
+ 
+### Aesthetic mappings
+
+<div id="pquestion">- How can you explain the cars highlighted in red?</div>
+
+```{r new_mpg_plot_d, echo = FALSE, cache = TRUE, fig.width=8, fig.height=4.5}
+ggplot(data = mpg) + 
+  geom_point(mapping = aes(x = displ, y = hwy)) +
+  geom_point(data = mpg %>% filter(class == "2seater"),
+             mapping = aes(x = displ, y = hwy), color = "red")
+```
+
+
+
+```{r new_mpg_plot_e, cache = TRUE, fig.width=8, fig.height=4.5}
+ggplot(data = mpg) + 
+  geom_point(mapping = aes(x = displ, y = hwy, color = class))
+```
+
+### Aesthetic mappings: `color`
+
+`ggplot2` will automatically assign a unique level of the aesthetic (here a unique color) to each unique value of the variable, a process known as scaling. `ggplot2` will also add a legend that explains which levels correspond to which values.
+
+Try the following aesthetics:
+
+- `size`
+- `alpha`
+- `shape`
+
+### Aesthetic mappings: `size`
+
+```{r new_mpg_plot_f, cache = TRUE, fig.width=8, fig.height=4.5, warning=FALSE}
+ggplot(data = mpg) + 
+  geom_point(mapping = aes(x = displ, y = hwy, size = class))
+```
+
+### Aesthetic mappings: `alpha`
+
+```{r new_mpg_plot_g, cache = TRUE, fig.width=8, fig.height=4.5, warning=FALSE}
+ggplot(data = mpg) + 
+  geom_point(mapping = aes(x = displ, y = hwy, alpha = class))
+```
+
+### Aesthetic mappings: `shape`
+
+```{r new_mpg_plot_h, cache = TRUE, fig.width=8, fig.height=4.5, warning=FALSE}
+ggplot(data = mpg) + 
+  geom_point(mapping = aes(x = displ, y = hwy, shape = class))
+```
+
+### Setting aesthetics manually
+
+You can also set the aesthetic properties of your geom manually. For example, we can make all of the points in our plot blue:
+
+```{r new_mpg_plot_i, cache = TRUE, fig.width=8, fig.height=4.5}
+ggplot(data = mpg) + 
+  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")
+```
+
+## Second challenge!
+
+- What’s gone wrong with this code? Why are the points not blue?
+
+```R
+ggplot(data = mpg) + 
+  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))
+```
+
+- Which variables in `mpg` are **categorical**? Which variables are **continuous**? (Hint: type `mpg`)
+- Map a **continuous** variable to color, size, and shape.
+- What does the `stroke` aesthetic do? What shapes does it work with? (Hint: use `?geom_point`)
+- What happens if you map an aesthetic to something other than a variable name, like `color = displ < 5`?
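+
+One way to approach the first and last questions: inside `aes()`, every argument is treated as data. The string `"blue"` therefore becomes a variable with a single value, which `ggplot2` passes through its default color scale instead of using it as a literal color. A logical expression such as `displ < 5` is likewise evaluated for each row and then mapped to one color per outcome. The sketch below contrasts the three cases; the object names `p_mapped`, `p_fixed` and `p_logical` are only illustrative.
+
+```R
+library(ggplot2)
+
+# Inside aes(): "blue" is data (a single category), so the default color scale is used
+p_mapped <- ggplot(data = mpg) +
+  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))
+
+# Outside aes(): "blue" is a fixed property of the whole layer
+p_fixed <- ggplot(data = mpg) +
+  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")
+
+# A logical expression is computed for each row, then mapped to colors (FALSE / TRUE)
+p_logical <- ggplot(data = mpg) +
+  geom_point(mapping = aes(x = displ, y = hwy, color = displ < 5))
+```
+
+Printing each object (for example `p_mapped`) draws the corresponding plot, which makes the differences easy to compare.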
+
+## Facets
+
+```{r new_mpg_plot_j, cache = TRUE, fig.width=8, fig.height=4.5}
+ggplot(data = mpg) + 
+  geom_point(mapping = aes(x = displ, y = hwy)) + 
+  facet_wrap(~class)
+```
+
+## Facets
+
+```{r new_mpg_plot_k, cache = TRUE, fig.width=8, fig.height=4.5}
+ggplot(data = mpg) + 
+  geom_point(mapping = aes(x = displ, y = hwy)) + 
+  facet_wrap(~class, nrow = 2)
+```
+
+## Facets
+
+```{r new_mpg_plot_l, cache = TRUE, fig.width=8, fig.height=4.5}
+ggplot(data = mpg) + 
+  geom_point(mapping = aes(x = displ, y = hwy)) + 
+  facet_wrap(~ fl + class, nrow = 2)
+```
+
+## Composition
+
+There are different ways to represent the information
+
+```{r new_mpg_plot_o, cache = TRUE, fig.width=8, fig.height=4.5}
+ggplot(data = mpg) + 
+  geom_point(mapping = aes(x = displ, y = hwy))
+```
+
+## Composition
+
+There are different ways to represent the information
+
+```{r new_mpg_plot_p, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
+ggplot(data = mpg) + 
+  geom_smooth(mapping = aes(x = displ, y = hwy))
+```
+
+
+## Composition
+
+We can add as many layers as we want
+
+```{r new_mpg_plot_q, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
+ggplot(data = mpg) + 
+  geom_point(mapping = aes(x = displ, y = hwy)) +
+  geom_smooth(mapping = aes(x = displ, y = hwy))
+```
+
+
+## Composition
+
+We can avoid code duplication
+
+```{r new_mpg_plot_r, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
+ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
+  geom_point() +
+  geom_smooth()
+```
+
+
+## Composition
+
+We can make `mapping` layer specific
+
+```{r new_mpg_plot_s, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
+ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
+  geom_point(mapping = aes(color = class)) +
+  geom_smooth()
+```
+
+## Composition
+
+We can use different `data` for different layers (you will learn more about `filter()` later).
+
+```{r new_mpg_plot_t, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
+ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
+  geom_point(mapping = aes(color = class)) +
+  geom_smooth(data = filter(mpg, class == "subcompact"))
+```
+
+## Third challenge
+
+- Run this code in your head and predict what the output will look like. Then, run the code in R and check your predictions.
+```R
+ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) + 
+  geom_point() + 
+  geom_smooth(se = FALSE)
+```
+**http://perso.ens-lyon.fr/laurent.modolo/R/2_d**
+
+- What does `show.legend = FALSE` do?
+- What does the `se` argument to `geom_smooth()` do?
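+
+As a rough illustration of the last two questions: `se` controls whether `geom_smooth()` draws the confidence band around the fitted line, and `show.legend = FALSE` keeps a layer out of the legend. The two plots below differ only in these arguments; the names `p_band` and `p_no_band` are hypothetical.
+
+```R
+library(ggplot2)
+
+# se = TRUE is the default: the shaded ribbon is the confidence interval around the smooth
+p_band <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
+  geom_point() +
+  geom_smooth(se = TRUE)
+
+# se = FALSE removes the ribbon; show.legend = FALSE keeps this layer out of the legend
+p_no_band <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
+  geom_point() +
+  geom_smooth(se = FALSE, show.legend = FALSE)
+```
+
+Printing `p_band` and `p_no_band` one after the other shows the effect of each argument.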
+
+## Third challenge
+
+- Recreate the R code necessary to generate the following graph
+
+```{r new_mpg_plot_u, echo = FALSE, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
+ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) + 
+  geom_point() +
+  geom_smooth(mapping = aes(linetype = drv))
+```
+
+## Third challenge
+
+```{r new_mpg_plot_v, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
+ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) + 
+  geom_point() +
+  geom_smooth(mapping = aes(linetype = drv))
+```
\ No newline at end of file
--- a/session_2/img/formationR_session2_scriptR.png
+++ b/session_2/img/formationR_session2_scriptR.png
--- a/session_2/img/tidyverse.jpg
+++ b/session_2/img/tidyverse.jpg