diff --git a/session_1/slides_a.Rmd b/session_1/slides_a.Rmd index cea6da3e4cb3552fcddef7431a9f6614aec40d51..97c3b9b0e7637a58e3f2da7d735cf74bb85061d1 100644 --- a/session_1/slides_a.Rmd +++ b/session_1/slides_a.Rmd @@ -3,8 +3,6 @@ title: 'R#1: Introduction to R and RStudio' author: "Laurent Modolo [laurent.modolo@ens-lyon.fr](mailto:laurent.modolo@ens-lyon.fr)" date: "10 Oct 2019" output: - slidy_presentation: - highlight: tango beamer_presentation: theme: metropolis slide_level: 3 @@ -12,6 +10,8 @@ output: df_print: tibble highlight: tango latex_engine: xelatex + slidy_presentation: + highlight: tango --- ## R#1: Introduction to R and RStudio The goal of this practical is to familiarize yourself with R and the RStudio @@ -318,10 +318,10 @@ Test that your `logarithm` function can work in base 10 We can also define our own function with ```R -function_name <- function(a, b){ - result_1 <- operation1(a, b) - result_2 <- operation2(result_1, b) - return(result_2) +<FUNCTION_NAME> <- function(a, b){ + <RESULT_1> <- <OPERATION_1>(a, b) + <RESULT_2> <- <OPERATION_2>(<RESULT_1>, b) + return(<RESULT_2>) } ``` diff --git a/session_1/slides_b.Rmd b/session_1/slides_b.Rmd index f2375b750b96c439c0dd59d2f7ff29f8dfbe0abd..f239afe72c60adc2f6a7e1c78d6f3f8d9b24209f 100644 --- a/session_1/slides_b.Rmd +++ b/session_1/slides_b.Rmd @@ -28,14 +28,16 @@ The objectives of this session will be to: **http://perso.ens-lyon.fr/laurent.modolo/R/session_1_a** +Press **[alt] + [shift] + k** + ## Functions are also variables We can also define our own function with ```R -function_name <- function(a, b){ - result_1 <- operation1(a, b) - result_2 <- operation2(result_1, b) - return(result_2) +<FUNCTION_NAME> <- function(a, b){ + <RESULT_1> <- <OPERATION_1>(a, b) + <RESULT_2> <- <OPERATION_2>(<RESULT_1>, b) + return(<RESULT_2>) } ``` @@ -263,20 +265,78 @@ x[5] y <- c(a = 1, b = 2, c = 3, d = 4, e = 5) typeof(y) is.vector(y) +names(y) y[1] y["a"] +names(y) <- c("b") ``` \pause ```R x == y 
+all.equal(x, y) ``` -\pause +## Vector challenge + +- Use the `seq()` function to create a vector of even numbers +- You can concatenate vectors with `c(<VECTOR_1>, <VECTOR_2>)`. Concatenate a vector of integers with a vector of the first 5 letters of the alphabet. +- Check the default vectors `letters` and `LETTERS`, and rewrite your previous command using them. +- Create a vector giving you the correspondence between lowercase letters and uppercase letters. + +### Vector challenge +- Use the `seq()` function to create a vector of even numbers ```R -all.equal(x, y) +seq(from=2, to=10, by=2) +``` +- You can concatenate vectors with `c(<VECTOR_1>, <VECTOR_2>)`. Concatenate a vector of integers with a vector of the first 5 letters of the alphabet. What is the type of this vector? +- Check the default vectors `letters` and `LETTERS`, and rewrite your previous command using them. +- Create a vector giving you the correspondence between lowercase letters and uppercase letters. + +### Vector challenge + +- Use the `seq()` function to create a vector of even numbers +```R +seq(from=2, to=10, by=2) +``` +- You can concatenate vectors with `c(<VECTOR_1>, <VECTOR_2>)`. Concatenate a vector of integers with a vector of the first 5 letters of the alphabet. What is the type of this vector? +- Check the default vectors `letters` and `LETTERS`, and rewrite your previous command using them. +- Create a vector giving you the correspondence between lowercase letters and uppercase letters. + +### Vector challenge + +- Use the `seq()` function to create a vector of even numbers +- You can concatenate vectors with `c(<VECTOR_1>, <VECTOR_2>)`. Concatenate a vector of integers with a vector of the first 5 letters of the alphabet. What is the type of this vector? +```R +c(1:5, "a", "b", "c", "d", "e") +typeof(c(1:5, "a", "b", "c", "d", "e")) +``` +- Check the default vectors `letters` and `LETTERS`, and rewrite your previous command using them. 
+- Create a vector giving you the correspondence between lowercase letters and uppercase letters. + +### Vector challenge + +- Use the `seq()` function to create a vector of even numbers +- You can concatenate vectors with `c(<VECTOR_1>, <VECTOR_2>)`. Concatenate a vector of integers with a vector of the first 5 letters of the alphabet. What is the type of this vector? +- Check the default vectors `letters` and `LETTERS`, and rewrite your previous command using them. +```R +c(1:5, letters[1:5]) +``` +- Create a vector giving you the correspondence between lowercase letters and uppercase letters. + +### Vector challenge + +- Use the `seq()` function to create a vector of even numbers +- You can concatenate vectors with `c(<VECTOR_1>, <VECTOR_2>)`. Concatenate a vector of integers with a vector of the first 5 letters of the alphabet. What is the type of this vector? +- Check the default vectors `letters` and `LETTERS`, and rewrite your previous command using them. +- Create a vector giving you the correspondence between lowercase letters and uppercase letters. 
+```R +rosette <- LETTERS +names(rosette) <- letters +rosette["b"] +rosette[13] ``` ## Matrix @@ -301,3 +361,28 @@ ncol(matrix_example) ```R matrix_example[2, 3] ``` + +## DataFrame + +In R, a `data.frame` is a table whose columns can have different types + +```R +data_frame_example <- data.frame(numbers=1:26, letters=letters, LETTERS=LETTERS) +data_frame_example +``` + +\pause + +```R +class(data_frame_example) +nrow(data_frame_example) +ncol(data_frame_example) +names(data_frame_example) +``` + +\pause + +```R +data_frame_example[2, 3] +data_frame_example["numbers"] +``` \ No newline at end of file diff --git a/session_2/slides.Rmd b/session_2/slides.Rmd index 92e3747bd03a117aa3e9b36d595e632facd54dd4..2cd63a353438ba6f8c2c03179ea49483d52a7453 100644 --- a/session_2/slides.Rmd +++ b/session_2/slides.Rmd @@ -22,6 +22,57 @@ download.file("http://www.fueleconomy.gov/feg/epadata/vehicles.csv.zip", tmp, quiet = TRUE) unzip(tmp, exdir = "data-raw") +new_class_level <- c( + "Compact Cars", + "Large Cars", + "Midsize Cars", + "Midsize Cars", + "Midsize Cars", + "Compact Cars", + "Minivan", + "Minivan", + "Pickup Trucks", + "Pickup Trucks", + "Pickup Trucks", + "Sport Utility Vehicle", + "Sport Utility Vehicle", + "Compact Cars", + "Special Purpose Vehicle", + "Special Purpose Vehicle", + "Special Purpose Vehicle", + "Special Purpose Vehicle", + "Special Purpose Vehicle", + "Special Purpose Vehicle", + "Sport Utility Vehicle", + "Sport Utility Vehicle", + "Pickup Trucks", + "Pickup Trucks", + "Pickup Trucks", + "Pickup Trucks", + "Sport Utility Vehicle", + "Sport Utility Vehicle", + "Compact Cars", + "Two Seaters", + "Vans", + "Vans", + "Vans", + "Vans" +) +new_fuel_level <- c( + "gas", + "Diesel", + "Regular", + "gas", + "gas", + "Regular", + "Regular", + "Hybrid", + "Hybrid", + "Regular", + "Regular", + "Hybrid", + "Hybrid" +) read_csv("data-raw/vehicles.csv") %>% select( "id", @@ -50,9 +101,21 @@ read_csv("data-raw/vehicles.csv") %>% filter(drive != "") %>% drop_na() %>% arrange(make, model, 
year) %>% + mutate(class = factor(as.factor(class), labels = new_class_level)) %>% + mutate(fuel = factor(as.factor(fuel), labels = new_fuel_level)) %>% write_csv("2_data.csv") ``` +## R#2: Introduction to Tidyverse +The goal of this practical is to familiarize yourself with `ggplot2`. + +The objectives of this session will be to: + +- Create basic plots with `ggplot2` +- Understand the `tibble` type +- Learn the different aesthetics in R plots +- Compose graphics + ## Tidyverse The tidyverse is a collection of R packages designed for data science. @@ -91,6 +154,8 @@ new_mpg <- read_csv( ) ``` +**http://perso.ens-lyon.fr/laurent.modolo/R/2_a** + ## First plot with `ggplot2` Relationship between engine size `displ` and fuel efficiency `hwy`. @@ -152,100 +217,188 @@ ggplot(data = new_mpg) + ## Aesthetic mappings +How can you explain these cars? -```{r new_mpg_plot_b, cache = TRUE, fig.width=8, fig.height=4.5} -new_mpg %>% pull(class) %>% as.factor() %>% levels() -c( - "Compact Cars", - "Large Cars", - "Midsize Cars", - "Midsize Station Wagons", - "Midsize-Large Station Wagons", - "Minicompact Cars", - "Minivan - 2WD", - "Minivan - 4WD", - "Small Pickup Trucks", - "Small Pickup Trucks 2WD", - "Small Pickup Trucks 4WD", - "Small Sport Utility Vehicle 2WD", - "Small Sport Utility Vehicle 4WD", - "Small Station Wagons", - "Special Purpose Vehicle", - "Special Purpose Vehicle 2WD", - "Special Purpose Vehicle 4WD", - "Special Purpose Vehicles", - "Special Purpose Vehicles/2wd", - "Special Purpose Vehicles/4wd", - "Sport Utility Vehicle - 2WD", - "Sport Utility Vehicle - 4WD", - "Standard Pickup Trucks", - "Standard Pickup Trucks 2WD", - "Standard Pickup Trucks 4WD", - "Standard Pickup Trucks/2wd", - "Standard Sport Utility Vehicle 2WD", - "Standard Sport Utility Vehicle 4WD", - "Subcompact Cars", - "Two Seaters", - "Vans", - "Vans Passenger", - "Vans, Cargo Type", - "Vans, Passenger Type" -) -new_class_level <- c( - "Compact Cars", - "Large Cars", - "Midsize Cars", - 
"Midsize Cars", - "Midsize Cars", - "Compact Cars", - "Minivan", - "Minivan", - "Pickup Trucks", - "Pickup Trucks", - "Pickup Trucks", - "Sport Utility Vehicle", - "Sport Utility Vehicle", - "Compact Cars", - "Special Purpose Vehicle", - "Special Purpose Vehicle", - "Special Purpose Vehicle", - "Special Purpose Vehicle", - "Special Purpose Vehicle", - "Special Purpose Vehicle", - "Sport Utility Vehicle", - "Sport Utility Vehicle", - "Pickup Trucks", - "Pickup Trucks", - "Pickup Trucks", - "Pickup Trucks", - "Sport Utility Vehicle", - "Sport Utility Vehicle", - "Compact Cars", - "Two Seaters", - "Vans", - "Vans", - "Vans", - "Vans" -) -new_mpg %>% pull(fuel) %>% as.factor() %>% levels() -new_fuel_level <- c( - "gas", - "Diesel", - "Regular", - "gas", - "gas", - "Regular", - "Regular", - "Hybrid", - "Hybrid", - "Regular", - "Regular", - "Hybrid", - "Hybrid" -) -new_mpg %>% - mutate(class = factor(as.factor(class), labels = new_class_level)) %>% - mutate(fuel = factor(as.factor(fuel), labels = new_fuel_level)) %>% -ggplot() + +```{r new_mpg_plot_d, echo = FALSE, cache = TRUE, fig.width=8, fig.height=4.5} +ggplot(data = mpg) + + geom_point(mapping = aes(x = displ, y = hwy)) + + geom_point(data = mpg %>% filter(class == "2seater"), + mapping = aes(x = displ, y = hwy), color = "red") +``` + +### Aesthetic mappings `color` + +```{r new_mpg_plot_e, cache = TRUE, fig.width=8, fig.height=4.5} +ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, color = class)) +``` + + +### Aesthetic mappings + +`ggplot2` will automatically assign a unique level of the aesthetic (here a unique color) to each unique value of the variable, a process known as scaling. `ggplot2` will also add a legend that explains which levels correspond to which values. 
+ +Try the following aesthetics: + +- `size` +- `alpha` +- `shape` + +### Aesthetic mappings `size` + +```{r new_mpg_plot_f, cache = TRUE, fig.width=8, fig.height=4.5, warning=FALSE} +ggplot(data = mpg) + + geom_point(mapping = aes(x = displ, y = hwy, size = class)) +``` + +### Aesthetic mappings `alpha` + +```{r new_mpg_plot_g, cache = TRUE, fig.width=8, fig.height=4.5, warning=FALSE} +ggplot(data = mpg) + + geom_point(mapping = aes(x = displ, y = hwy, alpha = class)) +``` + +### Aesthetic mappings `shape` + +```{r new_mpg_plot_h, cache = TRUE, fig.width=8, fig.height=4.5, warning=FALSE} +ggplot(data = mpg) + + geom_point(mapping = aes(x = displ, y = hwy, shape = class)) +``` + +### Aesthetic + +You can also set the aesthetic properties of your geom manually. For example, we can make all of the points in our plot blue: + +```{r new_mpg_plot_i, cache = TRUE, fig.width=8, fig.height=4.5} +ggplot(data = mpg) + + geom_point(mapping = aes(x = displ, y = hwy), color = "blue") +``` + +## Second challenge + +- What’s gone wrong with this code? Why are the points not blue? + +```R +ggplot(data = mpg) + + geom_point(mapping = aes(x = displ, y = hwy, color = "blue")) +``` + +- Which variables in `mpg` are **categorical**? Which variables are **continuous**? (Hint: type `mpg`) +- Map a **continuous** variable to color, size, and shape. +- What does the `stroke` aesthetic do? What shapes does it work with? (Hint: use `?geom_point`) +- What happens if you map an aesthetic to something other than a variable name, like `color = displ < 5`? 
+ +## Facets + +```{r new_mpg_plot_j, cache = TRUE, fig.width=8, fig.height=4.5} +ggplot(data = mpg) + + geom_point(mapping = aes(x = displ, y = hwy)) + + facet_wrap(~class) +``` +## Facets + +```{r new_mpg_plot_k, cache = TRUE, fig.width=8, fig.height=4.5} +ggplot(data = mpg) + + geom_point(mapping = aes(x = displ, y = hwy)) + + facet_wrap(~class, nrow = 2) +``` + +## Facets + +```{r new_mpg_plot_l, cache = TRUE, fig.width=8, fig.height=4.5} +ggplot(data = mpg) + + geom_point(mapping = aes(x = displ, y = hwy)) + + facet_wrap(~ fl + class, nrow = 2) +``` + +## Composition + +There are different ways to represent the information + +```{r new_mpg_plot_o, cache = TRUE, fig.width=8, fig.height=4.5} +ggplot(data = mpg) + + geom_point(mapping = aes(x = displ, y = hwy)) +``` + +## Composition + +There are different ways to represent the information + +```{r new_mpg_plot_p, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE} +ggplot(data = mpg) + + geom_smooth(mapping = aes(x = displ, y = hwy)) +``` + + +## Composition + +We can add as many layers as we want + +```{r new_mpg_plot_q, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE} +ggplot(data = mpg) + + geom_point(mapping = aes(x = displ, y = hwy)) + + geom_smooth(mapping = aes(x = displ, y = hwy)) +``` + + +## Composition + +We can avoid code duplication + +```{r new_mpg_plot_r, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE} +ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + + geom_point() + + geom_smooth() +``` + + +## Composition + +We can make `mapping` layer-specific + +```{r new_mpg_plot_s, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE} +ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + + geom_point(mapping = aes(color = class)) + + geom_smooth() +``` + +## Composition + +We can use different `data` for different layers (you will learn more about `filter()` later) + +```{r new_mpg_plot_t, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE} +ggplot(data = mpg, mapping 
= aes(x = displ, y = hwy)) + + geom_point(mapping = aes(color = class)) + + geom_smooth(data = filter(mpg, class == "subcompact")) +``` + +## Third challenge + +- Run this code in your head and predict what the output will look like. Then, run the code in R and check your predictions. +```R +ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) + + geom_point() + + geom_smooth(se = FALSE) +``` +**http://perso.ens-lyon.fr/laurent.modolo/R/2_d** + +- What does `show.legend = FALSE` do? +- What does the `se` argument to `geom_smooth()` do? + +## Third challenge + +- Recreate the R code necessary to generate the following graph + +```{r new_mpg_plot_u, echo = FALSE, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE} +ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) + + geom_point() + + geom_smooth(mapping = aes(linetype = drv)) +``` + +## Third challenge + +```{r new_mpg_plot_v, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE} +ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) + + geom_point() + + geom_smooth(mapping = aes(linetype = drv)) ``` \ No newline at end of file
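
Note on the last challenge in `session_2/slides.Rmd`: the two questions about `se` and `show.legend` can be checked side by side with a minimal sketch like the one below. It assumes `ggplot2` is installed and uses its bundled `mpg` dataset, as the slides do. The object names `p_default` and `p_plain` are illustrative and are not part of the patch.

```R
library(ggplot2)

# Default behaviour: geom_smooth() draws a confidence band (se = TRUE)
# and every layer contributes to the legend.
p_default <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  geom_smooth()

# se = FALSE hides the confidence band around each smoothed line;
# show.legend = FALSE keeps a layer out of the legend, so with both
# layers excluded no color legend is drawn.
p_plain <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
  geom_point(show.legend = FALSE) +
  geom_smooth(se = FALSE, show.legend = FALSE)

# Print each object (or call print()) to render and compare the plots.
p_default
p_plain
```

Comparing the two renders should make the effect of each argument visible without changing the mappings defined in `ggplot()`.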