diff --git a/session_1/HTML_tuto.Rmd b/session_1/HTML_tuto.Rmd
new file mode 100644
index 0000000000000000000000000000000000000000..121eebe15a4dd233715aa3c785804fd9879e812c
--- /dev/null
+++ b/session_1/HTML_tuto.Rmd
@@ -0,0 +1,659 @@
+---
+title: 'R#1: Introduction to R and RStudio'
+author: "Laurent Modolo [laurent.modolo@ens-lyon.fr](mailto:laurent.modolo@ens-lyon.fr), Hélène Polvèche [hpolveche@istem.fr](mailto:hpolveche@istem.fr)"
+date: "March 2020"
+output:
+  html_document: default
+  pdf_document: default
+---
+<style type="text/css">
+h3 { /* Header 3 */
+  position: relative ;
+  color: #729FCF ;
+  left: 5%;
+}
+h2 { /* Header 2 */
+  color: darkblue ;
+  left: 10%;
+} 
+h1 { /* Header 1 */
+  color: #034b6f ;
+} 
+#pencadre{
+  border:1px; 
+  border-style:solid; 
+  border-color: #034b6f; 
+  background-color: #EEF3F9; 
+  padding: 1em;
+  text-align: center ;
+  border-radius : 5px 4px 3px 2px;
+}
+legend{
+  color: #034b6f ;
+}
+#pquestion {
+  color: darkgreen;
+  font-weight: bold;
+  
+}
+</style>
+
+
+```{r setup, include=FALSE}
+knitr::opts_chunk$set(echo = TRUE)
+```
+
+
+The goal of this practical is to familiarize yourself with R and the RStudio
+environment.
+
+The objectives of this session will be to:
+
+- Understand the purpose of each pane in RStudio
+- Do basic computation with R
+- Define variables and assign data to variables
+- Manage a workspace in R
+- Call functions
+- Manage packages
+- Be ready to produce graphics!
+
+<center>
+![](./img/intro_img.png){width=400px}
+</center>
+
+### Acknowledgments
+
+<div id='pencadre'>
+
+  ![](./img/software_carpentry_logo.svg){width=300px}
+  
+  
+  https://software-carpentry.org/
+</div> 
+
+ 
+<div id='pencadre'>
+  ![](./img/r_for_data_science.png){width=100px}
+ 
+  http://swcarpentry.github.io/r-novice-gapminder/
+</div> 
+
+ \ 
+ 
+- **Margot** and **Alexandre** for assistance!
+
+
+# Some R background
+
+![](./img/Rlogo.png){width=40px}
+is a programming language and free software environment for statistical
+computing and graphics supported by the *R Foundation for Statistical Computing*.
+
+- Created by **Ross Ihaka** and **Robert Gentleman**
+- initial version released in 1995
+- free and open-source implementation of the S programming language
+- currently developed by the **R Development Core Team**.
+
+
+Reasons to use it:
+
+- It’s free, well documented, and runs almost everywhere
+- it has a large (and growing) user base among scientists
+- it has a large library of external packages available for performing diverse tasks. 
+
+- **15,068** available packages on https://cran.r-project.org/
+- **3,087** available packages on http://www.bioconductor.org
+- **122,720** available repositories on https://github.com/
+
+R is usually used in a terminal:
+
+![](./img/R_terminal.png)
+
+
+# RStudio, the R Integrated development environment (*IDE*)
+
+An IDE is an application that provides **comprehensive facilities** to computer programmers for
+software development. RStudio is **free** and **open-source**.
+
+
+### An interface
+
+![](./img/RStudio.png)
+
+### The same console as before (in the red box)
+
+![](./img/RStudio_console.png)
+
+# R as a calculator
+
+- Add: `+`
+- Divide: `/`
+- Multiply: `*`
+- Subtract: `-`
+- Exponents: `^` or `**`
+- Parentheses: `(`, `)`
+
+<div id='pencadre'>
+**Now open RStudio.**
+
+**Write the commands in the grey box in the terminal.**
+
+**The expected results will always be printed in a white box here.**
+
+**You can `copy-paste` but I advise you to practice writing directly in the terminal. To validate the line at the end of your command: press `Return`.**
+</div> 
+
+
+### First commands
+
+```{r calculatorstep1, include=TRUE}
+1 + 100
+```
+
+ \ 
+```R
+1 +
+```
+The console displays `+`.  
+It is waiting for the rest of the command. Type just `100`:
+
+```R
+100
+```
+```{r calculatorstep2, echo=FALSE}
+1 + 100
+```
+
+
+### R follows the mathematical order of operations
+```{r calculatorstep3, include=TRUE}
+3 + 5 * 2
+```
+
+```{r calculatorstep4, include=TRUE}
+(3 + 5) * 2
+```
+
+ \ 
+
+```{r calculatorstep5, include=TRUE}
+(3 + (5 * (2 ^ 2))) # hard to read
+3 + 5 * (2 ^ 2)     # if you forget some rules, this might help
+```
+ \ 
+ **Note :** The text following a `#` is a comment. It will not be interpreted by R. In the future, I advise you to use comments a lot to explain in your own words what the command means. 
+
+### Scientific notation
+```{r calculatorstep6, include=TRUE}
+2/10000
+```
+
+
+`2e-4` is shorthand for `2 * 10^(-4)`
+
+```{r calculatorstep7, include=TRUE}
+5e3
+```
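+
+As a quick check of the notation above, the short sketch below writes the same numbers with explicit powers of ten. The values are chosen only for illustration.
+
+```R
+2 * 10^(-4)   # same value as 2e-4
+5 * 10^3      # same value as 5e3
+1.5e-3        # shorthand for 1.5 * 10^(-3)
+```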
+
+### Mathematical functions
+
+```{r calculatorstep8, include=TRUE}
+log(1)  # natural logarithm
+```
+
+```{r calculatorstep9, include=TRUE}
+log10(10) # base-10 logarithm
+```
+
+```{r calculatorstep10, include=TRUE}
+exp(0.5)
+```
+
+\ 
+
+Compute the factorial of 9 (`9!`)
+
+```{r calculatorstep11, include=TRUE}
+9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1
+```
+
+ \ 
+or
+```{r calculatorstep12, include=TRUE}
+factorial(9)
+```
+
+### Comparing things
+
+Comparisons can be made with R. The result is a `TRUE` or `FALSE` value (`logical` type, also known as boolean).
+
+
+equality (note two equal signs read as "is equal to")
+```{r calculatorstep13, include=TRUE}
+1 == 1
+```
+
+
+inequality (read as "is not equal to")
+```{r calculatorstep14, include=TRUE}
+1 != 2 
+```
+
+
+less than
+```{r calculatorstep15, include=TRUE}
+1 < 2
+```
+
+
+less than or equal to
+```{r calculatorstep16, include=TRUE}
+1 <= 1
+```
+
+
+greater than
+```{r calculatorstep17, include=TRUE}
+1 > 0
+```
+
+ \ \
+ 
+<fieldset id='pencadre' style='text-align: left'>
+  <legend style='border: 0px;'>Summary box</legend>
+  <li> R is a programming language and free software environment for statistical
+computing and graphics (free & open-source) with a large library of external packages available for performing diverse tasks.</li>
+  <li> RStudio is an IDE application that provides comprehensive facilities to computer programmers for
+software development.</li>
+  <li>R as a calculator </li>
+  <li>R allows comparisons to be made </li>
+  
+</fieldset>
+ 
+ \ \ 
+ \ \ 
+
+# Variables and assignment
+
+`<-` is the assignment operator in R (read as: the left-hand side takes the value of the right-hand side).
+
+` = ` also exists but is **not recommended** for assignment! It is better kept for other uses, such as naming function arguments. (*We will see them later*)
+
+```{r VandAstep1, include=TRUE}
+x <- 1/40
+```
+
+```{r VandAstep2, include=TRUE}
+x
+```
+
+### The environment
+
+You now see the `x` value in the environment box (*in red*).
+
+![](./img/RStudio_environment.png)
+
+ \ 
+
+This **variable** is present in your work environment. You can use it to perform different mathematical operations.
+
+
+```{r VandAstep3, include=TRUE}
+log(x)
+```
+
+You can assign another value to `x`.
+```{r VandAstep4, include=TRUE}
+x <- 100
+log(x)
+```
+
+\ 
+```{r VandAstep5, include=TRUE}
+x <- x + 1  # x becomes 101 (100 + 1)
+y <- x * 2
+y
+```
+
+ \ 
+
+A variable can be assigned a `numeric` value as well as a `character` value.
+
+Just put your character (or string) between double quotes `"` when you assign this value.
+```{r VandAstep6, include=TRUE}
+z <- "x"  # One character
+z
+a <- "Hello world"  # Multiple characters == String
+a
+```
+
+ \ 
+```R
+x + z
+```
+ \ 
+ 
+How to test the type of the variable?
+```{r VandAstep20, include=TRUE}
+
+is.character(z)
+
+b <- 1/40
+b
+typeof(b)
+```
+
+### Variable names
+
+Variable names can contain letters, numbers, underscores and periods.
+
+They cannot start with a number or an underscore, nor contain spaces.
+
+Different people use different conventions for long variable names; these include:
+
+```
+periods.between.words
+underscores_between_words
+camelCaseToSeparateWords
+```
+
+What you use is up to you, but be consistent.
+
+ \ 
+
+<div id="pquestion">   Which of the following are valid R variable names?</div>
+
+```
+min_height
+max.height
+_age            # no
+.mass
+MaxLength
+min-length      # no
+2widths         # no
+celsius2kelvin
+```
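+
+One way to check a candidate name yourself is the base R function `make.names()`, which turns any string into a syntactically valid name. As a rough sketch, a name that comes back unchanged was already valid, and a name that comes back modified was not. Note that this check also rejects reserved words such as `if`.
+
+```R
+make.names("min_height")   # returned unchanged: the name is valid
+make.names(".mass")        # returned unchanged: the name is valid
+make.names("min-length")   # returned modified: the name is not valid
+make.names("2widths")      # returned modified: the name is not valid
+make.names("_age")         # returned modified: the name is not valid
+```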
+ \ 
+
+### Functions are also variables
+
+```{r VandAstep7, include=TRUE}
+logarithm <- log
+```
+
+
+An R function can have several arguments
+
+```
+function (x, base = exp(1))
+```
+
+- `base` is a named argument; unnamed arguments are read from left to right
+- named arguments break the reading order
+- named arguments make your code more readable
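+
+The points above can be sketched with a few calls to `log()`. The numbers 8 and 2 are arbitrary values chosen for the illustration.
+
+```R
+log(8, 2)              # positional arguments: x = 8, base = 2
+log(8, base = 2)       # same call, with base given by name
+log(base = 2, x = 8)   # named arguments can be given in any order
+log(8)                 # base omitted: the default exp(1) is used
+```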
+
+ \ 
+ 
+To know more about the `log` function we can read its manual.
+
+```{r VandAstep8, include=TRUE}
+help(log)
+```
+
+or
+
+```{r VandAstep9, include=TRUE}
+?log
+```
+
+ \ 
+This pane displays the different outputs (help pages, graphs, etc.).
+
+![](./img/formationR_VandAstep8_encadre.png)
+
+
+ \ 
+ 
+
+### A code editor
+
+![](./img/RStudio_editor.png)
+
+ \ 
+ 
+RStudio offers you great flexibility in running code from within the editor window. There are buttons, menu choices, and keyboard shortcuts. To run the current line, you can
+
+- click on the `Run button` above the editor panel, or
+- select “Run Lines” from the “Code” menu, or
+- hit `Ctrl`+`Return` in Windows or Linux or `Cmd`+`Return` on OS X. To run a block of code, select it and then Run. 
+
+If you have modified a line of code within a block of code you have just run, there is no need to reselect the section and run it again: you can use the next button along, "Rerun the previous region". This will run the previous code block, including the modifications you have made.
+
+
+Copy your `function` into a `tp_1.R` file
+
+ \ 
+
+We can define our own function with:
+
+- a function name,
+- the `function` keyword,
+- arguments,
+- `{` and `}` to open and close the function body,
+
+```
+function_name <- function(a, b){
+
+
+}
+```
+- a series of operations,
+
+```
+function_name <- function(a, b){
+  result_1 <- operation1(a, b)
+  result_2 <- operation2(result_1, b)
+
+}
+```
+
+- `return` operation
+
+```
+function_name <- function(a, b){
+  result_1 <- operation1(a, b)
+  result_2 <- operation2(result_1, b)
+  return(result_2)
+}
+```
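+
+As a concrete instance of the template above, here is a minimal sketch. The function name `rectangle_area` and its arguments are hypothetical and chosen only for this illustration.
+
+```R
+# rectangle_area is a hypothetical name used for illustration
+rectangle_area <- function(width, height){
+  area <- width * height   # the series of operations (here, only one)
+  return(area)             # the value sent back to the caller
+}
+
+rectangle_area(3, 4)
+rectangle_area(height = 2, width = 5)   # arguments given by name
+```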
+
+ \ 
+<div id="pquestion">How to write a function to test if a number is even?</div>
+
+ \ 
+ 
+ 
+```{r VandAstep11, include=TRUE}
+even_test <- function(x){
+  modulo_result <- x %% 2         # %% is modulo operator
+  is_even <- modulo_result == 0
+  return(is_even)
+}
+
+even_test(4)
+
+even_test(3)
+```
+
+ **Note:** A function can be written in several forms.
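+
+To illustrate the note above, the same even-number test can be written more compactly. The names `even_test_short` and `even_test_implicit` are hypothetical and used only for this sketch.
+
+```R
+# Same test as even_test(), computed in a single expression
+even_test_short <- function(x){
+  return(x %% 2 == 0)
+}
+
+# Without return(), a function returns the value of its last expression
+even_test_implicit <- function(x) x %% 2 == 0
+
+even_test_short(4)
+even_test_implicit(3)
+```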
+
+ \ 
+
+We can now clean our environment
+
+```{r VandAstep15, include=TRUE}
+rm(x)
+```
+
+
+```{r VandAstep16, include=TRUE}
+?rm
+```
+
+```{r VandAstep17, include=TRUE}
+ls()
+```
+
+```{r VandAstep18, include=TRUE}
+rm(list = ls())
+```
+
+```{r VandAstep19, include=TRUE}
+ls()
+```
+
+ \ 
+
+<fieldset id='pencadre' style='text-align: left'>
+  <legend style='border: 0px;'>Summary box</legend>
+  <li> Assigning a variable is done with ` <- `.</li>
+  <li> The assigned variables are listed in the environment box.</li>
+  <li> Variable names can contain letters, numbers, underscores and periods. </li>
+  <li> Functions are also variables and can be written in several forms.</li>
+  <li> A code editor is available in RStudio.</li>
+
+</fieldset>
+
+ \ 
+ 
+ \ 
+ 
+# Complex variable types
+
+### Vector
+
+```{r Vecstep1, include=TRUE}
+c(1, 2, 3, 4, 5)
+```
+
+or
+
+```{r Vecstep2, include=TRUE}
+c(1:5)
+```
+
+ \ 
+ 
+ \ 
+A mathematical calculation can be performed on the elements of the vector:
+
+```{r Vecstep3, include=TRUE}
+2^c(1:5)
+```
+
+
+```{r Vecstep4, include=TRUE}
+x <- c(1:5)
+2^x
+```
+
+ \ 
+ 
+ \ 
+To determine the type of the elements of a vector:
+
+```{r Vecstep5, include=TRUE}
+typeof(x)
+```
+
+
+```{r Vecstep6, include=TRUE}
+typeof(x + 0.5)
+x + 0.5
+```
+
+
+```{r Vecstep7, include=TRUE}
+is.vector(x)
+```
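+
+The change of type seen above, from `integer` to `double`, follows from how the vector is built. The sketch below uses a few arbitrary vectors to show which construction gives which element type.
+
+```R
+typeof(1:5)             # the `:` operator creates integers
+typeof(c(1, 2, 3))      # numbers typed directly are stored as doubles
+typeof(c(1L, 2L, 3L))   # the L suffix asks for integers
+typeof(c(1, "a"))       # mixing with a string converts everything to character
+```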
+
+ \ 
+ 
+ \ 
+
+```{r Vecstep8, include=TRUE}
+y <- c(a = 1, b = 2, c = 3, d = 4, e = 5)
+```
+
+ \ 
+We can compare the elements of two vectors:
+
+```{r Vecstep9, include=TRUE}
+x
+y
+x == y
+```
+
+ \ 
+
+ \ 
+
+<fieldset id='pencadre' style='text-align: left'>
+  <legend style='border: 0px;'>Summary box</legend>
+  <li> A variable can be of different types : `numeric`, `character`, `vector`, `function`, etc.</li>
+  <li> Calculations and comparisons apply to vectors.</li>
+  <li> Do not hesitate to use the help box to understand functions!  </li>
+</fieldset>
+
+ \ 
+ 
+ \ 
+
+# Packages  
+### Installing packages
+
+```R
+install.packages("tidyverse")
+```
+
+or click on `Tools` and `Install Packages...`
+
+![](./img/formationR_installTidyverse.png)
+
+ \ 
+
+```R
+install.packages("ggplot2")
+```
+
+### Loading packages
+
+```{r packagesstep1, include=TRUE}
+sessionInfo()
+```
+
+ \ 
+
+```{r packagesstep2, include=TRUE}
+library(tidyverse)
+```
+
+
+```R
+sessionInfo()
+```
+### Unloading packages
+
+```{r packagesstep4, include=TRUE}
+unloadNamespace("tidyverse")
+```
+
+
+```R
+sessionInfo()
+```
+
+ \ 
+ 
+## See you in Session #2: "Introduction to Tidyverse"
diff --git a/session_1/img/formationR_VandAstep8.png b/session_1/img/formationR_VandAstep8.png
new file mode 100644
index 0000000000000000000000000000000000000000..3891cbd4957acbbba117d59c024015483ce2d3ad
Binary files /dev/null and b/session_1/img/formationR_VandAstep8.png differ
diff --git a/session_1/img/formationR_VandAstep8_encadre.png b/session_1/img/formationR_VandAstep8_encadre.png
new file mode 100644
index 0000000000000000000000000000000000000000..25f0562fcedb0c0c94c2b43317ef7d4acacd900e
Binary files /dev/null and b/session_1/img/formationR_VandAstep8_encadre.png differ
diff --git a/session_1/img/formationR_installTidyverse.png b/session_1/img/formationR_installTidyverse.png
new file mode 100644
index 0000000000000000000000000000000000000000..0649cdbae2ac2c5bdf7341be54c734bd5b5dfcf0
Binary files /dev/null and b/session_1/img/formationR_installTidyverse.png differ
diff --git a/session_1/img/intro_img.png b/session_1/img/intro_img.png
new file mode 100644
index 0000000000000000000000000000000000000000..2bc5c70194b9b25ad19f6442411089670935d342
Binary files /dev/null and b/session_1/img/intro_img.png differ
diff --git a/session_1/slides.Rmd b/session_1/slides.Rmd
index afa5846791657956601aab7c51997d8a9c9c5825..ea75ec293ffd5fb074fb9ea15fb9440af5f636a6 100644
--- a/session_1/slides.Rmd
+++ b/session_1/slides.Rmd
@@ -1,15 +1,19 @@
 ---
-title: "R#1: Introduction to R and RStudio"
-author: Laurent Modolo [laurent.modolo@ens-lyon.fr](mailto:laurent.modolo@ens-lyon.fr)
-date: 10 Oct 2019
+title: 'R#1: Introduction to R and RStudio'
+author: "Laurent Modolo [laurent.modolo@ens-lyon.fr](mailto:laurent.modolo@ens-lyon.fr)"
+date: "10 Oct 2019"
 output:
   beamer_presentation:
-    theme: "metropolis"
-    slide_level: 3
-    fig_caption: false
     df_print: tibble
+    fig_caption: no
     highlight: tango
     latex_engine: xelatex
+    slide_level: 3
+    theme: metropolis
+  ioslides_presentation:
+    highlight: tango
+  slidy_presentation:
+    highlight: tango
 ---
 ## R#1: Introduction to R and RStudio
 The goal of this practical is to familiarize yourself with R and the RStudio
diff --git a/session_1/slides_b.Rmd b/session_1/slides_b.Rmd
index cb12a4b6d59a37a1bd9f48d62a33aaddfef52f0d..bfd6d2680014bff39f05504c6507ebe450037c46 100644
--- a/session_1/slides_b.Rmd
+++ b/session_1/slides_b.Rmd
@@ -345,7 +345,7 @@ rosette[13]
 
 ## Matrix
 
-In R matrix are two dimensional vectors
+In R, matrices are two-dimensional vectors
 
 ```R
 matrix_example <- matrix(1:(6*3), ncol=6, nrow=3)
@@ -368,10 +368,12 @@ matrix_example[2, 3]
 
 ## DataFrame
 
-In R `data.frame` are table type with mixed type
+In R, a `data.frame` is a table whose columns can have different types
 
 ```R
-data_frame_example <- data.frame(numbers=1:26, letters=letters, LETTERS=LETTERS)
+data_frame_example <- data.frame(numbers=1:26, 
+                                 letters=letters, 
+                                 LETTERS=LETTERS)
 data_frame_example
 ```
 
diff --git a/session_2/HTML_tuto_s2.Rmd b/session_2/HTML_tuto_s2.Rmd
new file mode 100644
index 0000000000000000000000000000000000000000..6c22ae24f9939631460033b5e2b2aa6f2d20e2a7
--- /dev/null
+++ b/session_2/HTML_tuto_s2.Rmd
@@ -0,0 +1,472 @@
+---
+title: "R#2: introduction to Tidyverse"
+author: "Laurent Modolo [laurent.modolo@ens-lyon.fr](mailto:laurent.modolo@ens-lyon.fr), Hélène Polvèche [hpolveche@istem.fr](mailto:hpolveche@istem.fr)"
+date: "March 2020"
+output:
+  html_document: default
+  pdf_document: default
+---
+<style type="text/css">
+h3 { /* Header 3 */
+  position: relative ;
+  color: #729FCF ;
+  left: 5%;
+}
+h2 { /* Header 2 */
+  color: darkblue ;
+  left: 10%;
+} 
+h1 { /* Header 1 */
+  color: #034b6f ;
+} 
+#pencadre{
+  border:1px; 
+  border-style:solid; 
+  border-color: #034b6f; 
+  background-color: #EEF3F9; 
+  padding: 1em;
+  text-align: center ;
+  border-radius : 5px 4px 3px 2px;
+}
+legend{
+  color: #034b6f ;
+}
+#pquestion {
+  color: darkgreen;
+  font-weight: bold;
+  
+}
+</style>
+```{r setup, include=FALSE}
+knitr::opts_chunk$set(echo = TRUE)
+
+library(tidyverse)
+# tmp <- tempfile(fileext = ".zip")
+# download.file("http://www.fueleconomy.gov/feg/epadata/vehicles.csv.zip",
+#               tmp,
+#               quiet = TRUE)
+# unzip(tmp, exdir = "data-raw")
+# new_class_level <- c(
+#   "Compact Cars",
+#   "Large Cars",
+#   "Midsize Cars",
+#   "Midsize Cars",
+#   "Midsize Cars",
+#   "Compact Cars",
+#   "Minivan",
+#   "Minivan",
+#   "Pickup Trucks",
+#   "Pickup Trucks",
+#   "Pickup Trucks",
+#   "Sport Utility Vehicle",
+#   "Sport Utility Vehicle",
+#   "Compact Cars",
+#   "Special Purpose Vehicle",
+#   "Special Purpose Vehicle",
+#   "Special Purpose Vehicle",
+#   "Special Purpose Vehicle",
+#   "Special Purpose Vehicle",
+#   "Special Purpose Vehicle",
+#   "Sport Utility Vehicle",
+#   "Sport Utility Vehicle",
+#   "Pickup Trucks",
+#   "Pickup Trucks",
+#   "Pickup Trucks",
+#   "Pickup Trucks",
+#   "Sport Utility Vehicle",
+#   "Sport Utility Vehicle",
+#   "Compact Cars",
+#   "Two Seaters",
+#   "Vans",
+#   "Vans",
+#   "Vans",
+#   "Vans"
+# )
+# new_fuel_level <- c(
+#   "gas",
+#   "Diesel",
+#   "Regular",
+#   "gas",
+#   "gas",
+#   "Regular",
+#   "Regular",
+#   "Hybrid",
+#   "Hybrid",
+#   "Regular",
+#   "Regular",
+#   "Hybrid",
+#   "Hybrid"
+# )
+# read_csv("data-raw/vehicles.csv") %>%
+#   select(
+#     "id",
+#     "make",
+#     "model",
+#     "year",
+#     "VClass",
+#     "trany",
+#     "drive",
+#     "cylinders",
+#     "displ",
+#     "fuelType",
+#     "highway08",
+#     "city08"
+#   ) %>% 
+#   rename(
+#     "class" = "VClass",
+#     "trans" = "trany",
+#     "drive" = "drive",
+#     "cyl" = "cylinders",
+#     "displ" = "displ",
+#     "fuel" = "fuelType",
+#     "hwy" = "highway08",
+#     "cty" = "city08"
+#   ) %>%
+#   filter(drive != "") %>%
+#   drop_na() %>% 
+#   arrange(make, model, year) %>%
+#   mutate(class = factor(as.factor(class), labels = new_class_level)) %>%
+#   mutate(fuel = factor(as.factor(fuel), labels = new_fuel_level)) %>%
+#   write_csv("2_data.csv")
+
+```
+
+
+The goal of this practical is to familiarize yourself with `ggplot2`.
+
+The objectives of this session will be to:
+
+- Create basic plots with `ggplot2`
+- Understand the `tibble` type
+- Learn the different aesthetics in R plots
+- Compose graphics
+
+
+<div id='pencadre'>
+
+**Write the commands in the grey box in the terminal.**
+
+**The expected results will always be printed in a white box here.**
+
+**You can `copy-paste` but I advise you to practice writing directly in the terminal. To validate the line at the end of your command: press `Return`.**
+</div> 
+
+
+## Tidyverse
+
+The tidyverse is a collection of R packages designed for data science.
+
+All packages share an underlying design philosophy, grammar, and data structures.
+
+<center>
+![](./img/tidyverse.jpg){width=500px}
+</center>
+
+ \ 
+```R
+install.packages("tidyverse")
+```
+
+```R
+library("tidyverse")
+```
+
+ \ 
+ 
+### Toy data set `mpg`
+
+This dataset contains a subset of the fuel economy data that the EPA makes available on http://fueleconomy.gov . It contains only models which had a new release every year between 1999 and 2008.
+
+
+```{r mpg_inspect, include=TRUE}
+?mpg
+mpg
+```
+
+```{r mpg_inspect2, include=TRUE}
+dim(mpg)
+```
+
+```R
+View(mpg)
+```
+### New script
+
+![](./img/formationR_session2_scriptR.png)
+
+ \ 
+
+<!-- ### Updated version of the data -->
+
+<!-- `mpg` is loaded with tidyverse, we want to be able to read our own data from -->
+
+<!--  \  -->
+<!-- http://perso.ens-lyon.fr/laurent.modolo/R/2_data.csv -->
+
+<!-- ```{r mpg_download, cache=TRUE, message=FALSE} -->
+<!-- new_mpg <- read_csv( -->
+<!--   "http://perso.ens-lyon.fr/laurent.modolo/R/2_data.csv" -->
+<!--   ) -->
+
+<!-- ``` -->
+
+ \ 
+ 
+ \ 
+
+# First plot with `ggplot2`
+
+Relationship between engine size `displ` and fuel efficiency `hwy`.
+
+```{r new_mpg_plot_a, cache = TRUE, fig.width=8, fig.height=4.5}
+ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
+  geom_point()
+
+```
+
+### Composition of plot with `ggplot2`
+
+
+```
+ggplot(data = <DATA>) + 
+  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
+```
+
+- you begin a plot with the function `ggplot()`
+- you complete your graph by adding one or more layers
+- `geom_point()` adds a layer with a scatterplot
+- each geom function in `ggplot2` takes a `mapping` argument
+- the `mapping` argument is always paired with `aes()`
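+
+Filling in the template above with the `mpg` data gives, as a minimal sketch, the same scatterplot as the first plot of this session. The variable name `template_plot` is hypothetical and only there so the plot can be stored and printed.
+
+```R
+library(ggplot2)   # also attached by library(tidyverse)
+
+# <DATA> = mpg, <GEOM_FUNCTION> = geom_point, <MAPPINGS> = x and y
+template_plot <- ggplot(data = mpg) +
+  geom_point(mapping = aes(x = displ, y = hwy))
+
+template_plot      # printing the object draws the plot
+```
+
+Here the mapping is given to the geom rather than to `ggplot()`, so it applies to that layer only.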
+
+ \ 
+ 
+
+<div id="pquestion"> - Make a scatterplot of `hwy` ( fuel efficiency ) vs. `cyl` ( number of cylinders ). </div>
+
+ \ 
+ 
+ \ 
+ 
+ \ 
+ 
+ \ 
+ 
+ \ 
+ 
+ \ 
+ 
+ \ 
+ 
+ \ 
+ 
+ 
+
+
+```{r new_mpg_plot_b, cache = TRUE, fig.width=8, fig.height=4.5}
+ggplot(data = mpg, mapping = aes(x = hwy, y = cyl)) + 
+  geom_point()
+```
+
+ \ 
+ 
+
+ \ 
+ 
+# Aesthetic mappings
+
+
+`ggplot2` will automatically assign a unique level of the aesthetic (here a unique color) to each unique value of the variable, a process known as scaling. `ggplot2` will also add a legend that explains which levels correspond to which values.
+
+Try the following aesthetics:
+
+- `size`
+- `alpha`
+- `shape`
+
+### Aesthetic mappings : `color`
+
+```{r new_mpg_plot_e, cache = TRUE, fig.width=8, fig.height=4.5}
+ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = class)) + 
+  geom_point()
+```
+
+
+### Aesthetic mappings : `size`
+
+```{r new_mpg_plot_f, cache = TRUE, fig.width=8, fig.height=4.5, warning=FALSE}
+ggplot(data = mpg, mapping = aes(x = displ, y = hwy, size = class)) + 
+  geom_point()
+```
+
+###  Aesthetic mapping : `alpha`
+
+```{r new_mpg_plot_g, cache = TRUE, fig.width=8, fig.height=4.5, warning=FALSE}
+ggplot(data = mpg, mapping = aes(x = displ, y = hwy, alpha = class)) + 
+  geom_point()
+```
+
+###  Aesthetic mapping : `shape`
+
+```{r new_mpg_plot_h, cache = TRUE, fig.width=8, fig.height=4.5, warning=FALSE}
+ggplot(data = mpg, mapping = aes(x = displ, y = hwy, shape = class)) + 
+  geom_point()
+```
+
+ 
+ \ 
+
+You can also set the aesthetic properties of your geom manually. For example, we can make all of the points in our plot blue and squares:
+
+```{r new_mpg_plot_i, cache = TRUE, fig.width=8, fig.height=4.5}
+ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
+  geom_point(color = "blue", shape=0)
+```
+
+
+ \ 
+<center>
+![](./img/shapes.png){width=300px}
+
+ \ 
+ 
+![](./img/colors.png){width=100px} 
+</center>
+
+<div id="pquestion">- What’s gone wrong with this code? Why are the points not blue?</div>
+
+```R
+ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = "blue")) + 
+  geom_point()
+```
+ \ 
+ 
+ \ 
+ 
+ \ 
+ 
+ \ 
+ 
+ \ 
+ 
+ \ 
+ 
+ \ 
+
+```{r res2, cache = TRUE, echo=FALSE, fig.width=8, fig.height=4.5}
+ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = "blue")) + 
+  geom_point()
+```
+
+ \ 
+ 
+- Map a **continuous** variable to color.
+
+```{r continu, cache = TRUE, fig.width=8, fig.height=4.5}
+ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = cyl)) + 
+  geom_point()
+```
+
+<div id="pquestion">- What happens if you map an aesthetic to something other than a variable name, like `color = displ < 5`?</div>
+```{r condiColor, cache = TRUE, fig.width=8, fig.height=4.5}
+ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = displ < 5)) + 
+  geom_point()
+```
+
+ \ 
+ 
+ \ 
+ 
+# Facets
+
+
+```{r new_mpg_plot_k, cache = TRUE, fig.width=8, fig.height=4.5}
+ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
+  geom_point() + 
+  facet_wrap(~class, nrow = 2)
+```
+
+ \ 
+
+```{r new_mpg_plot_l, cache = TRUE, fig.width=8, fig.height=4.5}
+ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
+  geom_point() + 
+  facet_wrap(~ fl + class, nrow = 2)
+```
+
+# Composition
+
+There are different ways to represent the information :
+
+```{r new_mpg_plot_o, cache = TRUE, fig.width=8, fig.height=4.5}
+ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
+  geom_point()
+```
+
+ \ 
+
+```{r new_mpg_plot_p, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
+ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
+  geom_smooth()
+```
+
+ \ 
+
+We can add as many layers as we want
+
+```{r new_mpg_plot_q, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
+ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
+  geom_point() +
+  geom_smooth()
+```
+
+ \
+
+We can make `mapping` layer-specific
+
+```{r new_mpg_plot_s, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
+ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
+  geom_point(mapping = aes(color = class)) +
+  geom_smooth()
+```
+
+ \ 
+
+We can use different `data` for different layers (you will learn more about `filter()` later)
+
+```{r new_mpg_plot_t, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
+ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
+  geom_point(mapping = aes(color = class)) +
+  geom_smooth(data = filter(mpg, class == "subcompact"))
+```
+
+# Challenge  !
+
+<div id="pquestion">- Run this code in your head and predict what the output will look like. Then, run the code in R and check your predictions.</div>
+```R
+ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) + 
+  geom_point() + 
+  geom_smooth(se = FALSE)
+```
+**http://perso.ens-lyon.fr/laurent.modolo/R/2_d**
+
+- What does `show.legend = FALSE` do?
+- What does the `se` argument to `geom_smooth()` do?
+
+## Third challenge
+
+- Recreate the R code necessary to generate the following graph
+
+```{r new_mpg_plot_u, echo = FALSE, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
+ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) + 
+  geom_point() +
+  geom_smooth(mapping = aes(linetype = drv))
+```
+
+## Third challenge: solution
+
+```{r new_mpg_plot_v, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
+ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) + 
+  geom_point() +
+  geom_smooth(mapping = aes(linetype = drv))
+```
\ No newline at end of file
diff --git a/session_2/img/colors.png b/session_2/img/colors.png
new file mode 100644
index 0000000000000000000000000000000000000000..102a7f0f771148cfd802ca21eaab569b96c60dac
Binary files /dev/null and b/session_2/img/colors.png differ
diff --git a/session_2/img/formationR_session2_scriptR.png b/session_2/img/formationR_session2_scriptR.png
new file mode 100644
index 0000000000000000000000000000000000000000..191ad3f9a270c6b07edd2317c711ba6b0f9cd9c4
Binary files /dev/null and b/session_2/img/formationR_session2_scriptR.png differ
diff --git a/session_2/img/shapes.png b/session_2/img/shapes.png
new file mode 100644
index 0000000000000000000000000000000000000000..1415e128cda08bc68508f8ac41e29881a31a1c17
Binary files /dev/null and b/session_2/img/shapes.png differ
diff --git a/session_2/img/tidyverse.jpg b/session_2/img/tidyverse.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..f89124a41cfd5f26a7b8570d8b9ccf00aff00508
Binary files /dev/null and b/session_2/img/tidyverse.jpg differ
diff --git a/session_3/HTML_tuto_s3.Rmd b/session_3/HTML_tuto_s3.Rmd
new file mode 100644
index 0000000000000000000000000000000000000000..4612e49d0ac0e2c083282ce7b682b4d2b24eb369
--- /dev/null
+++ b/session_3/HTML_tuto_s3.Rmd
@@ -0,0 +1,298 @@
+---
+title: 'R#3: Transformations with ggplot2'
+author: "Laurent Modolo [laurent.modolo@ens-lyon.fr](mailto:laurent.modolo@ens-lyon.fr), Hélène Polvèche [hpolveche@istem.fr](mailto:hpolveche@istem.fr)"
+date: "March 2020"
+output:
+  html_document: default
+  pdf_document: default
+---
+<style type="text/css">
+h3 { /* Header 3 */
+  position: relative ;
+  color: #729FCF ;
+  left: 5%;
+}
+h2 { /* Header 2 */
+  color: darkblue ;
+  left: 10%;
+} 
+h1 { /* Header 1 */
+  color: #034b6f ;
+} 
+#pencadre{
+  border:1px; 
+  border-style:solid; 
+  border-color: #034b6f; 
+  background-color: #EEF3F9; 
+  padding: 1em;
+  text-align: center ;
+  border-radius : 5px 4px 3px 2px;
+}
+legend{
+  color: #034b6f ;
+}
+#pquestion {
+  color: darkgreen;
+  font-weight: bold;
+}
+</style>
+
+```{r setup, include=FALSE, cache=TRUE}
+knitr::opts_chunk$set(echo = TRUE)
+```
+
+The goal of this practical is to practice advanced features of `ggplot2`.
+
+The objectives of this session will be to:
+
+- learn about statistical transformations
+- practice position adjustments
+- change the coordinate systems
+
+ \ 
+ 
+# `ggplot2` statistical transformations
+
+ \ 
+ 
+```{r packageloaded, include=TRUE, message=FALSE}
+library("tidyverse")
+```
+
+ \ 
+ 
+We are going to use the `diamonds` data set included in `tidyverse`.
+
+- Use the `help` and `View` commands to explore this data set.
+- Try the `str` command: what information is displayed?
+
+```R
+str(diamonds)
+```
+
+```
+## Classes 'tbl_df', 'tbl' and 'data.frame':    53940 obs. of  10 variables:
+##  $ carat  : num  0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
+##  $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
+##  $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
+##  $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
+##  $ depth  : num  61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
+##  $ table  : num  55 61 65 58 58 57 57 55 61 61 ...
+##  $ price  : int  326 326 327 334 335 336 336 337 337 338 ...
+##  $ x      : num  3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
+##  $ y      : num  3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
+##  $ z      : num  2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
+```
+
+ \ 
+ 
+We have seen scatterplots (`geom_point()`) and smoothed curves (`geom_smooth()`). Now let's draw a barplot with `geom_bar()`:
+
+```{r diamonds_barplot, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
+ggplot(data = diamonds, mapping = aes(x = cut)) + 
+  geom_bar()
+```
+
+More diamonds are available with high quality cuts.
+
+On the x-axis, the chart displays cut, a variable from diamonds. On the y-axis, it displays count, but count is not a variable in diamonds!
+
+The algorithm used to calculate new values for a graph is called a **stat**, short for statistical transformation. The figure below describes how this process works with `geom_bar()`.
+
+![](img/visualization-stat-bar.png)
+
+
+You can generally use geoms and stats interchangeably. For example, you can recreate the previous plot using `stat_count()` instead of `geom_bar()`:
+
+```{r diamonds_stat_count, include=TRUE, fig.width=8, fig.height=4.5, message=FALSE}
+ggplot(data = diamonds, mapping = aes(x = cut)) + 
+  stat_count()
+```
+
+ \ 
+
+Every geom has a default stat; and every stat has a default geom. This means that you can typically use geoms without worrying about the underlying statistical transformation. There are three reasons you might need to use a stat explicitly:
+
+- You might want to override the default stat. 
+
+```{r 3_a, include=TRUE, fig.width=8, fig.height=4.5}
+demo <- tribble(
+  ~cut,         ~freq,
+  "Fair",       1610,
+  "Good",       4906,
+  "Very Good",  12082,
+  "Premium",    13791,
+  "Ideal",      21551
+)
+
+# (Don't worry that you haven't seen <- or tribble() before. You might be able
+# to guess at their meaning from the context, and you will learn exactly what
+# they do soon!)
+
+ggplot(data = demo, mapping = aes(x = cut, y = freq)) +
+  geom_bar(stat = "identity")
+
+```
+
+- You might want to override the default mapping from transformed variables to aesthetics (e.g. to display proportions rather than counts).
+```{r 3_b, include=TRUE, fig.width=8, fig.height=4.5}
+ggplot(data = diamonds, mapping = aes(x = cut, y = ..prop.., group = 1)) + 
+  geom_bar()
+```
+  
+- In our proportion bar chart, we need to set `group = 1`. Why?
+
+```{r diamonds_stats_challenge, include=TRUE, message=FALSE, fig.width=8, fig.height=4.5}
+ggplot(data = diamonds, mapping = aes(x = cut, y = ..prop..)) + 
+  geom_bar()
+```
+
+If `group = 1` is not set, each value of `cut` forms its own group, so every proportion is computed within that group and is always 100%. For instance, the proportion of ideal cuts among the ideal-cut diamonds is 1. Setting `group = 1` puts all the rows in a single group, so the proportions are computed relative to the whole dataset.
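+
+The proportions relative to the whole dataset can also be sketched by hand, assuming `dplyr` is available (`cut_props` is an illustrative name). Dividing each count by the overall total gives the values that the `group = 1` plot displays:
+
+```{r prop_by_hand, include=TRUE, message=FALSE}
+library(ggplot2) # provides the diamonds dataset
+library(dplyr)
+
+cut_props <- diamonds %>%
+  count(cut) %>%
+  mutate(prop = n / sum(n)) # proportion relative to all diamonds
+cut_props
+```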
+
+ \ 
+ 
+- You might want to draw greater attention to the statistical transformation in your code. 
+
+```{r 3_c, include=TRUE, fig.width=8, fig.height=4.5, message=FALSE}
+# you might use stat_summary(), which summarises the y values for each unique x
+# value, to draw attention to the summary that you are computing:
+
+ggplot(data = diamonds, mapping = aes(x = cut, y = depth)) + 
+  stat_summary()
+
+  
+ggplot(data = diamonds, mapping = aes(x = cut, y = depth)) + 
+  stat_summary(
+    fun.min = min,
+    fun.max = max,
+    fun = median
+  )
+```
+
+
+# Position adjustments
+
+ \ 
+ 
+You can colour a bar chart using either the `color` aesthetic, 
+
+```{r diamonds_barplot_color, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
+ggplot(data = diamonds, mapping = aes(x = cut, color = cut)) + 
+  geom_bar()
+```
+
+ \ 
+
+or, more usefully, `fill`:
+
+```{r diamonds_barplot_fill, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
+ggplot(data = diamonds, mapping = aes(x = cut, fill = cut)) + 
+  geom_bar()
+```
+
+
+
+You can also use `fill` with another variable:
+
+```{r diamonds_barplot_fill_clarity, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
+ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity)) + 
+  geom_bar()
+```
+
+
+
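+The bars are stacked automatically: the stacking is performed by the position adjustment given in the `position` argument of `geom_bar()`, which defaults to `"stack"`. Other position adjustments are available: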
+
+### fill
+
+```{r diamonds_barplot_pos_fill, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
+ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity)) + 
+  geom_bar( position = "fill")
+```
+
+### dodge
+
+```{r diamonds_barplot_pos_dodge, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
+ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity)) + 
+  geom_bar( position = "dodge")
+```
+
+### jitter
+
+```{r diamonds_barplot_pos_jitter, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
+ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity)) + 
+  geom_bar( position = "jitter")
+```
+
+
+
+```{r dia_jitter2, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
+ggplot(data = diamonds, mapping = aes(x = cut, y = depth, color = clarity)) + 
+  geom_point()
+```
+
+```{r dia_jitter3, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
+ggplot(data = diamonds, mapping = aes(x = cut, y = depth, color = clarity)) + 
+  geom_jitter()
+```
+
+### violin
+
+```{r dia_violon, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
+ggplot(data = diamonds, mapping = aes(x = cut, y = depth, color = clarity)) + 
+  geom_violin()
+```
+
+
+# Coordinate systems
+
+The default coordinate system is the Cartesian coordinate system, where the x and y positions act independently to determine the location of each point. There are a number of other coordinate systems that are occasionally helpful.
+
+
+```{r dia_boxplot, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
+ggplot(data = diamonds, mapping = aes(x = cut, y = depth, color = clarity)) + 
+  geom_boxplot()
+```
+
+
+
+```{r dia_boxplot_flip, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
+ggplot(data = diamonds, mapping = aes(x = cut, y = depth, color = clarity)) + 
+  geom_boxplot() +
+  coord_flip()
+```
+
+
+```{r dia_12, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
+ggplot(data = diamonds, mapping = aes(x = depth, y = table)) + 
+  geom_point() +
+  geom_abline()
+```
+
+
+```{r dia_quickmap, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
+ggplot(data = diamonds, mapping = aes(x = depth, y = table)) + 
+  geom_point() +
+  geom_abline() +
+  coord_quickmap()
+```
+
+
+
+
+```{r diamonds_bar, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
+bar <- ggplot(data = diamonds, mapping = aes(x = cut, fill = cut)) + 
+  geom_bar( show.legend = FALSE,  width = 1 ) + 
+  theme(aspect.ratio = 1) +
+  labs(x = NULL, y = NULL)
+
+bar
+```
+
+
+```{r diamonds_bar_polar, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
+bar + coord_polar()
+```
+
+
+## See you in Session #4: "data transformation"
\ No newline at end of file
diff --git a/session_4/HTML_toto_s4.Rmd b/session_4/HTML_toto_s4.Rmd
new file mode 100644
index 0000000000000000000000000000000000000000..cb620eedad82fe19e99c80517899001ee3be3254
--- /dev/null
+++ b/session_4/HTML_toto_s4.Rmd
@@ -0,0 +1,342 @@
+---
+title: "R#4: data transformation"
+author: "Laurent Modolo [laurent.modolo@ens-lyon.fr](mailto:laurent.modolo@ens-lyon.fr), Hélène Polvèche [hpolveche@istem.fr](mailto:hpolveche@istem.fr)"
+date: "Mars 2020"
+output:
+  html_document: default
+  pdf_document: default
+---
+<style type="text/css">
+h3 { /* Header 3 */
+  position: relative ;
+  color: #729FCF ;
+  left: 5%;
+}
+h2 { /* Header 2 */
+  color: darkblue ;
+  left: 10%;
+} 
+h1 { /* Header 1 */
+  color: #034b6f ;
+} 
+#pencadre{
+  border:1px; 
+  border-style:solid; 
+  border-color: #034b6f; 
+  background-color: #EEF3F9; 
+  padding: 1em;
+  text-align: center ;
+  border-radius : 5px 4px 3px 2px;
+}
+legend{
+  color: #034b6f ;
+}
+#pquestion {
+  color: darkgreen;
+  font-weight: bold;
+}
+</style>
+
+```{r setup, include=FALSE, cache=TRUE}
+knitr::opts_chunk$set(echo = TRUE)
+```
+
+The goal of this practical is to practice data transformation with the `tidyverse`.
+The objectives of this session will be to:
+
+- Filter rows with `filter()`
+- Arrange rows with `arrange()`
+- Select columns with `select()`
+- Add new variables with `mutate()`
+- Combine multiple operations with the pipe `%>%`
+
+```R
+install.packages("nycflights13")
+```
+
+```{r packageloaded, include=TRUE, message=FALSE}
+library("tidyverse")
+library("nycflights13")
+```
+
+ \ 
+ 
+# Data set : nycflights13
+
+`nycflights13::flights` contains all 336,776 flights that departed from New York City in 2013. The data comes from the US Bureau of Transportation Statistics, and is documented in `?flights`.
+
+
+```{r display_data, include=TRUE}
+flights
+```
+
+The abbreviations printed under the column names describe the type of each variable:
+
+- **int** stands for integers.
+- **dbl** stands for doubles, or real numbers.
+- **chr** stands for character vectors, or strings.
+- **dttm** stands for date-times (a date + a time).
+- **lgl** stands for logical, vectors that contain only TRUE or FALSE.
+- **fct** stands for factors, which R uses to represent categorical variables with fixed possible values.
+- **date** stands for dates.
+
+ \ 
+ 
+# Filter rows with `filter()`
+
+`filter()` allows you to subset observations based on their values. 
+
+```{r filter_month_day, include=TRUE}
+filter(flights, month == 1, day == 1)
+```
+
+ \ 
+ 
+`dplyr` functions never modify their inputs, so if you want to save the result, you’ll need to use the assignment operator, `<-`:
+
+```{r filter_month_day_sav, include=TRUE}
+jan1 <- filter(flights, month == 1, day == 1)
+```
+
+ \ 
+ 
+R either prints out the results, or saves them to a variable. If you want to do both, you can wrap the assignment in parentheses:
+
+```{r filter_month_day_sav_display, include=TRUE}
+(dec25 <- filter(flights, month == 12, day == 25))
+```
+
+ \ 
+ 
+# Logical operators
+
+Multiple arguments to `filter()` are combined with “and”: every expression must be true in order for a row to be included in the output.
+
+![](./img/transform-logical.png)
+
+ \ 
+
+Test the following operations:
+
+```{r filter_logical_operators, include=TRUE}
+filter(flights, month == 11 | month == 12)
+filter(flights, month %in% c(11, 12))
+filter(flights, !(arr_delay > 120 | dep_delay > 120))
+filter(flights, arr_delay <= 120, dep_delay <= 120)
+```
+
+ \ 
+ 
+# Missing values
+
+One important feature of R that can make comparison tricky is missing values, or `NA`s (“not availables”). Almost any operation involving a missing value also returns a missing value:
+
+```{r filter_logical_operators_NA, include=TRUE}
+NA > 5
+10 == NA
+NA + 10
+```
+
+
+```{r filter_logical_operators_test_NA, include=TRUE}
+is.na(NA)
+```
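+
+`filter()` only keeps the rows where the condition is `TRUE`: rows where it is `FALSE` or `NA` are dropped. A minimal sketch on a toy tibble (`df` and the result names are made up for this example) shows how to keep the missing values explicitly:
+
+```{r filter_na_sketch, include=TRUE, message=FALSE}
+library(dplyr)
+
+df <- tibble(x = c(1, NA, 3))
+
+# the row with NA is dropped, because NA > 1 is NA and not TRUE
+kept_default <- filter(df, x > 1)
+# ask for the missing values explicitly to keep them
+kept_with_na <- filter(df, is.na(x) | x > 1)
+
+kept_default
+kept_with_na
+```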
+
+ \ 
+ 
+# Arrange rows with `arrange()`
+
+ \ 
+
+`arrange()` works similarly to `filter()` except that instead of selecting rows, it changes their order.
+
+```{r arrange_ymd, include=TRUE}
+arrange(flights, year, month, day)
+```
+
+ \ 
+
+Use `desc()` to re-order by a column in descending order:
+
+```{r arrange_desc, include=TRUE}
+arrange(flights, desc(dep_delay))
+```
+
+Missing values are always sorted at the end:
+
+```{r arrange_NA, include=TRUE}
+arrange(tibble(x = c(5, 2, NA)), x)
+arrange(tibble(x = c(5, 2, NA)), desc(x))
+```
+
+ \ 
+
+# Select columns with `select()`
+
+ \ 
+ 
+`select()` allows you to rapidly zoom in on a useful subset using operations based on the names of the variables.
+
+```{r select_ymd, include=TRUE}
+select(flights, year, month, day)
+select(flights, year:day)
+select(flights, -(year:day))
+```
+
+ \ 
+
+There are a number of helper functions you can use within `select()`:
+
+- `starts_with("abc")`: matches names that begin with “abc”.
+- `ends_with("xyz")`: matches names that end with “xyz”.
+- `contains("ijk")`: matches names that contain “ijk”.
+- `num_range("x", 1:3)`: matches `x1`, `x2` and `x3`.
+
+See `?select` for more details.
+
+ \ 
+ 
+# Add new variables with `mutate()`
+
+ \ 
+ 
+It’s often useful to add new columns that are functions of existing columns. That’s the job of `mutate()`.
+
+```{r mutate, include=TRUE}
+flights_sml <- select(flights,  year:day, ends_with("delay"), distance, air_time)
+
+flights_sml
+
+mutate(flights_sml, gain = dep_delay - arr_delay,
+            speed = distance / air_time * 60)
+```
+
+ \ 
+
+```{r mutate_reuse, include=TRUE}
+flights_sml <- mutate(flights_sml, gain = dep_delay - arr_delay,
+            speed = distance / air_time * 60)
+
+```
+
+ \ 
+ 
+### Useful creation functions
+
+- Offsets: `lead()` and `lag()` allow you to refer to leading or lagging values. This allows you to compute running differences (e.g. `x - lag(x)`) or find when values change (`x != lag(x)`).
+- Cumulative and rolling aggregates: R provides functions for running sums, products, mins and maxes: `cumsum()`, `cumprod()`, `cummin()`, `cummax()`; and dplyr provides `cummean()` for cumulative means. 
+- Logical comparisons, `<`, `<=`, `>`, `>=`, `!=`, and `==`
+- Ranking: there are a number of ranking functions, but you should start with `min_rank()`. There is also `row_number()`, `dense_rank()`, `percent_rank()`, `cume_dist()`, `ntile()`
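+
+A few of these functions can be tried on small vectors. This is only a sketch: the vectors and the variable names are invented for the illustration.
+
+```{r creation_functions_sketch, include=TRUE, message=FALSE}
+library(dplyr)
+
+x <- c(1, 3, 6, 10)
+
+lagged <- lag(x)           # previous value, NA for the first element
+running_diff <- x - lag(x) # difference with the previous value
+running_sum <- cumsum(x)   # cumulative sum
+
+# ties share the smallest rank, and the next rank is skipped
+ranks <- min_rank(c(10, 20, 20, 5))
+
+lagged
+running_diff
+running_sum
+ranks
+```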
+
+ \ 
+ 
+# Combining multiple operations with the pipe
+
+ \ 
+ 
+We don't want to create useless intermediate variables, so we can use the pipe operator `%>%`
+(keyboard shortcut in RStudio: `Ctrl + Shift + M`).
+
+<div id="pquestion"> - Find the 10 most delayed flights using a ranking function such as `min_rank()`. </div>
+
+```{r pipe_example_a, include=TRUE}
+flights_md <- mutate(flights,
+                     most_delay = min_rank(desc(dep_delay)))
+flights_md <- filter(flights_md, most_delay <= 10)
+flights_md <- arrange(flights_md, most_delay)
+```
+
+ \ 
+ 
+The same operations can be chained with the pipe, without the repeated assignments:
+
+```{r pipe_example_b, include=TRUE}
+flights_md2 <- flights %>%
+    mutate(most_delay = min_rank(desc(dep_delay))) %>% 
+    filter(most_delay <= 10) %>% 
+    arrange(most_delay)
+
+select(flights_md2, year:day, flight, origin, dest, dep_delay, most_delay)
+```
+
+ \ 
+
+Behind the scenes, `x %>% f(y)` turns into `f(x, y)`, and `x %>% f(y) %>% g(z)` turns into `g(f(x, y), z)` and so on. You can use the pipe to rewrite multiple operations in a way that you can read left-to-right, top-to-bottom. 
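+
+This equivalence can be checked on a small example. The sketch below uses a made-up vector `x` and compares the piped form with the nested form:
+
+```{r pipe_equivalence_sketch, include=TRUE, message=FALSE}
+library(dplyr) # makes the pipe %>% available
+
+x <- c(4, 9, 16)
+
+piped <- x %>% sqrt() %>% sum() # read left to right
+nested <- sum(sqrt(x))          # read inside out
+
+identical(piped, nested)
+```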
+
+ \ 
+
+Working with the pipe is one of the key criteria for belonging to the `tidyverse`. The only exception is `ggplot2`: it was written before the pipe was discovered. Unfortunately, the next iteration of `ggplot2`, `ggvis`, which does use the pipe, isn’t quite ready for prime time yet.
+
+# Grouped summaries with `summarise()`
+
+`summarise()` collapses a data frame to a single row:
+
+```{r load_data, include=TRUE}
+flights %>% 
+  summarise(delay = mean(dep_delay, na.rm = TRUE))
+```
+
+### The power of `summarise()` with `group_by()`
+
+`group_by()` changes the unit of analysis from the complete dataset to individual groups. Then, when you use the `dplyr` verbs on a grouped data frame, they are automatically applied “by group”.
+
+```{r summarise_group_by, include=TRUE, fig.width=8, fig.height=3.5}
+flights_delay <- flights %>% 
+  group_by(year, month) %>% 
+  summarise(delay = mean(dep_delay, na.rm = TRUE), sd = sd(dep_delay, na.rm = TRUE)) %>% 
+  arrange(month)
+
+flights_delay
+
+ggplot(data = flights_delay, mapping = aes(x = month, y = delay)) +
+  geom_bar(stat="identity", color="black", fill = "#619CFF") +
+  geom_errorbar(mapping = aes( ymin=0, ymax=delay+sd)) + 
+  theme(axis.text.x = element_blank())
+
+```
+
+
+### Missing values
+
+You may have wondered about the `na.rm` argument we used above. What happens if we don’t set it?
+
+```{r summarise_group_by_NA, include=TRUE}
+flights %>% 
+  group_by(dest) %>% 
+  summarise(
+    dist = mean(distance),
+    delay = mean(arr_delay)
+  )
+```
+
+Aggregation functions obey the usual rule of missing values: if there’s any missing value in the input, the output will be a missing value.
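+
+The rule is easy to see on a short vector. In this sketch, `delays` is a made-up vector with one missing value:
+
+```{r na_rm_sketch, include=TRUE}
+delays <- c(10, NA, 30)
+
+mean_with_na <- mean(delays)                  # the missing value propagates
+mean_without_na <- mean(delays, na.rm = TRUE) # the missing value is removed first
+
+mean_with_na
+mean_without_na
+```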
+
+
+# Counts
+
+Whenever you do any aggregation, it’s always a good idea to include either a count (`n()`), or a count of non-missing values (`sum(!is.na(x))`). That way you can check that you’re not drawing conclusions based on very small amounts of data.
+
+```{r summarise_group_by_count, include = TRUE, warning=F, message=F, fig.width=8, fig.height=3.5}
+summ_delay_flights <- flights %>% 
+                      group_by(dest) %>% 
+                      summarise(
+                          count = n(),
+                          dist = mean(distance, na.rm = TRUE),
+                          delay = mean(arr_delay, na.rm = TRUE)
+                      )
+summ_delay_flights
+
+ggplot(data = summ_delay_flights, mapping = aes(x = dist, y = delay, size = count)) +
+  geom_point() +
+  geom_smooth(method = lm, se = FALSE) +
+  theme(legend.position='none')
+
+```
+
+## Thank you !
+
+ \ 
+ 
+## For curious or motivated people: Challenge time!
+
+ \ 
+ 
+ \ 
+ 
+ 
diff --git a/session_4/challengeTime.Rmd b/session_4/challengeTime.Rmd
new file mode 100644
index 0000000000000000000000000000000000000000..1986436996666b2aa1fcc1574c6e009ef12f0991
--- /dev/null
+++ b/session_4/challengeTime.Rmd
@@ -0,0 +1,139 @@
+---
+title: "Challenge time!"
+author: "Laurent Modolo [laurent.modolo@ens-lyon.fr](mailto:laurent.modolo@ens-lyon.fr), Hélène Polvèche [hpolveche@istem.fr](mailto:hpolveche@istem.fr)"
+date: "Mars 2020"
+output:
+  html_document: default
+  pdf_document: default
+---
+  <style type="text/css">
+  h3 { /* Header 3 */
+      position: relative ;
+    color: #729FCF ;
+      left: 5%;
+  }
+h2 { /* Header 2 */
+    color: darkblue ;
+  left: 10%;
+} 
+h1 { /* Header 1 */
+    color: #034b6f ;
+} 
+#pencadre{
+border:1px; 
+border-style:solid; 
+border-color: #034b6f; 
+  background-color: #EEF3F9; 
+  padding: 1em;
+text-align: center ;
+border-radius : 5px 4px 3px 2px;
+}
+legend{
+  color: #034b6f ;
+}
+#pquestion {
+color: darkgreen;
+font-weight: bold;
+}
+</style>
+  
+```{r setup, include=FALSE, cache=TRUE}
+knitr::opts_chunk$set(echo = TRUE)
+```
+
+
+### Filter challenges :
+
+Find all flights that:
+  
+- Had an arrival delay of two or more hours
+- Were operated by United, American, or Delta
+- Departed between midnight and 6am (inclusive)
+
+Another useful dplyr filtering helper is `between()`. What does it do? Can you use it to simplify the code needed to answer the previous challenges?
+
+How many flights have a missing `dep_time`? What other variables are missing? What might these rows represent?
+
+Why is `NA ^ 0` not `NA`? Why is `NA | TRUE` not `NA`? Why is `FALSE & NA` not `NA`? Can you figure out the general rule? (`NA * 0` is a tricky counter-example!)
+
+### Arrange challenges :
+
+- Sort flights to find the most delayed flights. Find the flights that left earliest.
+- Sort flights to find the fastest flights.
+- Which flights traveled the longest? Which traveled the shortest?
+
+### Select challenges :
+
+- Brainstorm as many ways as possible to select `dep_time`, `dep_delay`, `arr_time`, and `arr_delay` from `flights`.
+- What does the `one_of()` function do? Why might it be helpful in conjunction with this vector?
+```{r select_one_of, eval=F, message=F, cache=T}
+vars <- c("year", "month", "day", "dep_delay", "arr_delay")
+```
+- Does the result of running the following code surprise you? How do the select helpers deal with case by default? How can you change that default?
+```{r select_contains, eval=F, message=F, cache=T}
+select(flights, contains("TIME"))
+```
+
+
+### Mutate challenges :
+
+- Currently `dep_time` and `sched_dep_time` are convenient to look at, but hard to compute with because they’re not really continuous numbers. Convert them to a more convenient representation of number of minutes since midnight.
+
+
+```{r mutate_challenges_a, eval=F, message=F, cache=T}
+mutate(
+  flights,
+  dep_time = (dep_time %/% 100) * 60 +
+    dep_time %% 100,
+  sched_dep_time = (sched_dep_time %/% 100) * 60 +
+    sched_dep_time %% 100
+)
+```
+
+\ 
+
+- Compare `dep_time`, `sched_dep_time`, and `dep_delay`. How would you expect those three numbers to be related?
+
+```{r mutate_challenge_b, eval=F, message=F, cache=T}
+mutate(
+  flights,
+  dep_time = (dep_time %/% 100) * 60 + 
+    dep_time %% 100,
+  sched_dep_time = (sched_dep_time %/% 100) * 60 +
+    sched_dep_time %% 100
+)
+```
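+
+One possible way to do the comparison, sketched on a toy tibble rather than on `flights`: the values in `toy` and the column names `dep_min`, `sched_dep_min` and `delay_check` are invented for this illustration. The second row shows a flight scheduled just before midnight that left just after it, so the simple difference no longer matches `dep_delay`.
+
+```{r mutate_challenge_b_check, eval=F, message=F, cache=T}
+library(dplyr)
+
+toy <- tibble(
+  dep_time = c(517, 5),
+  sched_dep_time = c(515, 2359),
+  dep_delay = c(2, 6)
+)
+
+check <- mutate(
+  toy,
+  # minutes since midnight, as in the previous chunk
+  dep_min = (dep_time %/% 100) * 60 + dep_time %% 100,
+  sched_dep_min = (sched_dep_time %/% 100) * 60 + sched_dep_time %% 100,
+  # expected to equal dep_delay, except when the flight crosses midnight
+  delay_check = dep_min - sched_dep_min
+)
+
+check
+```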
+
+\ 
+
+### Challenge with `summarise()` and `group_by()`
+
+Imagine that we want to explore the relationship between the distance and average delay for each location. 
+There are three steps to prepare this data: 
+
+- Group flights by destination.
+- Summarise to compute distance, average delay, and number of flights.
+- Filter to remove noisy points and Honolulu airport, which is almost twice as far away as the next closest airport.
+
+```{r summarise_group_by_ggplot_a, eval = F}
+flights %>% 
+  group_by(dest)
+```
+
+ \ 
+
+Then summarise each group to compute the distance, the average delay, and the number of flights:
+
+```{r summarise_group_by_ggplot_b, eval = F}
+flights %>% 
+  group_by(dest) %>% 
+  summarise(
+    count = n(),
+    dist = mean(distance, na.rm = TRUE),
+    delay = mean(arr_delay, na.rm = TRUE)
+  )
+```
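+
+The chunk above stops after the summary. A possible sketch of the filtering step is shown below on a toy summary table. The values in `toy_delays` and the threshold of 20 flights are assumptions made for the illustration; `"HNL"` is the code of Honolulu airport in `flights`.
+
+```{r summarise_filter_sketch, eval = F}
+library(dplyr)
+
+# toy summary table with the same columns as the summary above
+toy_delays <- tibble(
+  dest = c("AAA", "HNL", "BBB"),
+  count = c(500, 700, 1),
+  dist = c(750, 4980, 600),
+  delay = c(11, -1, -22)
+)
+
+# drop destinations with very few flights, and Honolulu
+toy_clean <- toy_delays %>% 
+  filter(count > 20, dest != "HNL")
+
+toy_clean
+```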
+
+