Commit 76938eea authored by Laurent Modolo

update slide for session 2

parent f9ec6ff6
......@@ -3,8 +3,6 @@ title: 'R#1: Introduction to R and RStudio'
author: "Laurent Modolo [laurent.modolo@ens-lyon.fr](mailto:laurent.modolo@ens-lyon.fr)"
date: "10 Oct 2019"
output:
  slidy_presentation:
    highlight: tango
  beamer_presentation:
    theme: metropolis
    slide_level: 3
......@@ -12,6 +10,8 @@ output:
    df_print: tibble
    highlight: tango
    latex_engine: xelatex
  slidy_presentation:
    highlight: tango
---
## R#1: Introduction to R and RStudio
The goal of this practical is to familiarize yourself with R and the RStudio
......@@ -318,10 +318,10 @@ Test that your `logarithm` function can work in base 10
We can also define our own function with
```R
function_name <- function(a, b){
result_1 <- operation1(a, b)
result_2 <- operation2(result_1, b)
return(result_2)
<FUNCTION_NAME> <- function(a, b){
<RESULT_1> <- <OPERATION_1>(a, b)
<RESULT_2> <- <OPERATION_2>(<RESULT_1>, b)
return(<RESULT_2>)
}
```
......
......@@ -28,14 +28,16 @@ The objectives of this session will be to:
**http://perso.ens-lyon.fr/laurent.modolo/R/session_1_a**
Press **[alt] + [shift] + k**
## Functions are also variables
We can also define our own function with
```R
function_name <- function(a, b){
result_1 <- operation1(a, b)
result_2 <- operation2(result_1, b)
return(result_2)
<FUNCTION_NAME> <- function(a, b){
<RESULT_1> <- <OPERATION_1>(a, b)
<RESULT_2> <- <OPERATION_2>(<RESULT_1>, b)
return(<RESULT_2>)
}
```
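As a concrete instance of the template above, here is a minimal sketch. The function name `rescale` and its arguments are hypothetical, chosen only for illustration; they do not come from the practical.

```R
# Hypothetical example: shift a vector by a, then divide the result by b
rescale <- function(x, a, b) {
  shifted <- x - a        # first operation
  scaled <- shifted / b   # second operation, reusing the first result
  return(scaled)
}

rescale(c(2, 4, 6), a = 2, b = 2)

# Functions are ordinary objects: they can be assigned to another name
same_function <- rescale
same_function(10, a = 0, b = 5)
```

Because `rescale` is stored like any other variable, typing its name without parentheses prints its definition instead of calling it.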
......@@ -263,20 +265,78 @@ x[5]
y <- c(a = 1, b = 2, c = 3, d = 4, e = 5)
typeof(y)
is.vector(y)
names(y)
y[1]
y["a"]
names(y) <- c("b")
```
\pause
```R
x == y
all.equal(x, y)
```
\pause
## Vector challenge
- use the `seq()` function to create a vector of even numbers
- You can concatenate vectors with `c(<VECTOR_1>, <VECTOR_2>)`. Concatenate a vector of integers with a vector of the first 5 letters of the alphabet.
- Check the default vectors `letters` and `LETTERS`, then rewrite your previous command using them.
- Create a vector giving the correspondence between lowercase and uppercase letters.
### Vector challenge
- use the `seq()` function to create a vector of even numbers
```R
all.equal(x, y)
seq(from=2, to=10, by=2)
```
- You can concatenate vectors with `c(<VECTOR_1>, <VECTOR_2>)`. Concatenate a vector of integers with a vector of the first 5 letters of the alphabet. What is the type of the resulting vector?
- Check the default vectors `letters` and `LETTERS`, then rewrite your previous command using them.
- Create a vector giving the correspondence between lowercase and uppercase letters.
### Vector challenge
- use the `seq()` function to create a vector of even numbers
```R
seq(from=2, to=10, by=2)
```
- You can concatenate vectors with `c(<VECTOR_1>, <VECTOR_2>)`. Concatenate a vector of integers with a vector of the first 5 letters of the alphabet. What is the type of the resulting vector?
- Check the default vectors `letters` and `LETTERS`, then rewrite your previous command using them.
- Create a vector giving the correspondence between lowercase and uppercase letters.
### Vector challenge
- use the `seq()` function to create a vector of even numbers
- You can concatenate vectors with `c(<VECTOR_1>, <VECTOR_2>)`. Concatenate a vector of integers with a vector of the first 5 letters of the alphabet. What is the type of the resulting vector?
```R
c(1:5, "a", "b", "c", "d", "e")
typeof(c(1:5, "a", "b", "c", "d", "e"))
```
- Check the default vectors `letters` and `LETTERS`, then rewrite your previous command using them.
- Create a vector giving the correspondence between lowercase and uppercase letters.
### Vector challenge
- use the `seq()` function to create a vector of even numbers
- You can concatenate vectors with `c(<VECTOR_1>, <VECTOR_2>)`. Concatenate a vector of integers with a vector of the first 5 letters of the alphabet. What is the type of the resulting vector?
- Check the default vectors `letters` and `LETTERS`, then rewrite your previous command using them.
```R
c(1:5, letters[1:5])
```
- Create a vector giving the correspondence between lowercase and uppercase letters.
### Vector challenge
- use the `seq()` function to create a vector of even numbers
- You can concatenate vectors with `c(<VECTOR_1>, <VECTOR_2>)`. Concatenate a vector of integers with a vector of the first 5 letters of the alphabet. What is the type of the resulting vector?
- Check the default vectors `letters` and `LETTERS`, then rewrite your previous command using them.
- Create a vector giving the correspondence between lowercase and uppercase letters.
```R
rosette <- LETTERS
names(rosette) <- letters
rosette["b"]
rosette[13]
```
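The challenge above relies on two behaviours of atomic vectors: `c()` coerces mixed inputs to a single common type, and names can serve as lookup keys. A minimal sketch, in which the variable names `mixed` and `to_upper` are illustrative and not part of the practical:

```R
# c() coerces its arguments to a common type: integer + character -> character
mixed <- c(1:5, letters[1:5])
typeof(mixed)

# A named vector works as a lookup table from names to values
to_upper <- LETTERS
names(to_upper) <- letters
to_upper[c("r", "s")]

# unname() drops the names and keeps only the values
unname(to_upper[c("r", "s")])
```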
## Matrix
......@@ -301,3 +361,28 @@ ncol(matrix_example)
```R
matrix_example[2, 3]
```
## DataFrame
In R, a `data.frame` is a table in which each column can hold a different type
```R
data_frame_example <- data.frame(numbers=1:26, letters=letters, LETTERS=LETTERS)
data_frame_example
```
\pause
```R
class(data_frame_example)
nrow(data_frame_example)
ncol(data_frame_example)
names(data_frame_example)
```
\pause
```R
data_frame_example[2, 3]
data_frame_example["numbers"]
```
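To see the mixed column types directly, the sketch below builds a smaller table and compares the two common ways of extracting a column. The name `df_small` and its columns are illustrative, and `stringsAsFactors = FALSE` is set explicitly as an assumption so that the result does not depend on the R version.

```R
# stringsAsFactors = FALSE keeps text columns as character (the default since R 4.0)
df_small <- data.frame(numbers = 1:3, letters = letters[1:3],
                       stringsAsFactors = FALSE)

# Each column keeps its own type
sapply(df_small, typeof)

# Single brackets return a data.frame with one column
class(df_small["numbers"])

# Double brackets (or $) return the underlying vector
class(df_small[["numbers"]])
df_small$letters
```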
\ No newline at end of file
......@@ -22,6 +22,57 @@ download.file("http://www.fueleconomy.gov/feg/epadata/vehicles.csv.zip",
tmp,
quiet = TRUE)
unzip(tmp, exdir = "data-raw")
new_class_level <- c(
"Compact Cars",
"Large Cars",
"Midsize Cars",
"Midsize Cars",
"Midsize Cars",
"Compact Cars",
"Minivan",
"Minivan",
"Pickup Trucks",
"Pickup Trucks",
"Pickup Trucks",
"Sport Utility Vehicle",
"Sport Utility Vehicle",
"Compact Cars",
"Special Purpose Vehicle",
"Special Purpose Vehicle",
"Special Purpose Vehicle",
"Special Purpose Vehicle",
"Special Purpose Vehicle",
"Special Purpose Vehicle",
"Sport Utility Vehicle",
"Sport Utility Vehicle",
"Pickup Trucks",
"Pickup Trucks",
"Pickup Trucks",
"Pickup Trucks",
"Sport Utility Vehicle",
"Sport Utility Vehicle",
"Compact Cars",
"Two Seaters",
"Vans",
"Vans",
"Vans",
"Vans"
)
new_fuel_level <- c(
"gas",
"Diesel",
"Regular",
"gas",
"gas",
"Regular",
"Regular",
"Hybrid",
"Hybrid",
"Regular",
"Regular",
"Hybrid",
"Hybrid"
)
read_csv("data-raw/vehicles.csv") %>%
select(
"id",
......@@ -50,9 +101,21 @@ read_csv("data-raw/vehicles.csv") %>%
filter(drive != "") %>%
drop_na() %>%
arrange(make, model, year) %>%
mutate(class = factor(as.factor(class), labels = new_class_level)) %>%
mutate(fuel = factor(as.factor(fuel), labels = new_fuel_level)) %>%
write_csv("2_data.csv")
```
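The two `mutate()` calls rely on a property of `factor()`: when `labels` contains repeated values, the corresponding levels are merged (this behaviour is available since R 3.4.0). Labels are matched to the sorted levels by position, so their order matters. A minimal sketch on a made-up vector, not the real `vehicles.csv` data:

```R
# Made-up values standing in for the raw `class` column
raw_class <- c("Minivan - 2WD", "Vans", "Minivan - 4WD", "Minivan - 2WD")

# as.factor() sorts the levels alphabetically
levels(as.factor(raw_class))

# One label per level, in level order; repeated labels merge levels
merged <- factor(as.factor(raw_class),
                 labels = c("Minivan", "Minivan", "Vans"))
levels(merged)
as.character(merged)
```

If the raw data gained or lost a level, the positional matching would silently shift, so it may be worth checking `levels()` before relabelling.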
## R#2: introduction to Tidyverse
The goal of this practical is to familiarize yourself with `ggplot2`.
The objectives of this session will be to:
- Create basic plots with `ggplot2`
- Understand the `tibble` type
- Learn the different aesthetics in R plots
- Compose graphics
## Tidyverse
The tidyverse is a collection of R packages designed for data science.
......@@ -91,6 +154,8 @@ new_mpg <- read_csv(
)
```
**http://perso.ens-lyon.fr/laurent.modolo/R/2_a**
## First plot with `ggplot2`
Relationship between engine size `displ` and fuel efficiency `hwy`.
......@@ -152,100 +217,188 @@ ggplot(data = new_mpg) +
## Aesthetic mappings
How can you explain these cars?
```{r new_mpg_plot_b, cache = TRUE, fig.width=8, fig.height=4.5}
new_mpg %>% pull(class) %>% as.factor() %>% levels()
c(
"Compact Cars",
"Large Cars",
"Midsize Cars",
"Midsize Station Wagons",
"Midsize-Large Station Wagons",
"Minicompact Cars",
"Minivan - 2WD",
"Minivan - 4WD",
"Small Pickup Trucks",
"Small Pickup Trucks 2WD",
"Small Pickup Trucks 4WD",
"Small Sport Utility Vehicle 2WD",
"Small Sport Utility Vehicle 4WD",
"Small Station Wagons",
"Special Purpose Vehicle",
"Special Purpose Vehicle 2WD",
"Special Purpose Vehicle 4WD",
"Special Purpose Vehicles",
"Special Purpose Vehicles/2wd",
"Special Purpose Vehicles/4wd",
"Sport Utility Vehicle - 2WD",
"Sport Utility Vehicle - 4WD",
"Standard Pickup Trucks",
"Standard Pickup Trucks 2WD",
"Standard Pickup Trucks 4WD",
"Standard Pickup Trucks/2wd",
"Standard Sport Utility Vehicle 2WD",
"Standard Sport Utility Vehicle 4WD",
"Subcompact Cars",
"Two Seaters",
"Vans",
"Vans Passenger",
"Vans, Cargo Type",
"Vans, Passenger Type"
)
new_class_level <- c(
"Compact Cars",
"Large Cars",
"Midsize Cars",
"Midsize Cars",
"Midsize Cars",
"Compact Cars",
"Minivan",
"Minivan",
"Pickup Trucks",
"Pickup Trucks",
"Pickup Trucks",
"Sport Utility Vehicle",
"Sport Utility Vehicle",
"Compact Cars",
"Special Purpose Vehicle",
"Special Purpose Vehicle",
"Special Purpose Vehicle",
"Special Purpose Vehicle",
"Special Purpose Vehicle",
"Special Purpose Vehicle",
"Sport Utility Vehicle",
"Sport Utility Vehicle",
"Pickup Trucks",
"Pickup Trucks",
"Pickup Trucks",
"Pickup Trucks",
"Sport Utility Vehicle",
"Sport Utility Vehicle",
"Compact Cars",
"Two Seaters",
"Vans",
"Vans",
"Vans",
"Vans"
)
new_mpg %>% pull(fuel) %>% as.factor() %>% levels()
new_fuel_level <- c(
"gas",
"Diesel",
"Regular",
"gas",
"gas",
"Regular",
"Regular",
"Hybrid",
"Hybrid",
"Regular",
"Regular",
"Hybrid",
"Hybrid"
)
new_mpg %>%
mutate(class = factor(as.factor(class), labels = new_class_level)) %>%
mutate(fuel = factor(as.factor(fuel), labels = new_fuel_level)) %>%
ggplot() +
```{r new_mpg_plot_d, echo = FALSE, cache = TRUE, fig.width=8, fig.height=4.5}
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
geom_point(data = mpg %>% filter(class == "2seater"),
mapping = aes(x = displ, y = hwy), color = "red")
```
### Aesthetic mappings `color`
```{r new_mpg_plot_e, cache = TRUE, fig.width=8, fig.height=4.5}
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = class))
```
### Aesthetic mappings
`ggplot2` will automatically assign a unique level of the aesthetic (here a unique color) to each unique value of the variable, a process known as scaling. `ggplot2` will also add a legend that explains which levels correspond to which values.
Try the following aesthetics:
- `size`
- `alpha`
- `shape`
### Aesthetic mappings `size`
```{r new_mpg_plot_f, cache = TRUE, fig.width=8, fig.height=4.5, warning=FALSE}
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, size = class))
```
### Aesthetic mappings `alpha`
```{r new_mpg_plot_g, cache = TRUE, fig.width=8, fig.height=4.5, warning=FALSE}
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, alpha = class))
```
### Aesthetic mappings `shape`
```{r new_mpg_plot_h, cache = TRUE, fig.width=8, fig.height=4.5, warning=FALSE}
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, shape = class))
```
### Aesthetic
You can also set the aesthetic properties of your geom manually. For example, we can make all of the points in our plot blue:
```{r new_mpg_plot_i, cache = TRUE, fig.width=8, fig.height=4.5}
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy), color = "blue")
```
## Second challenge
- What’s gone wrong with this code? Why are the points not blue?
```R
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))
```
- Which variables in `mpg` are **categorical**? Which variables are **continuous**? (Hint: type `mpg`)
- Map a **continuous** variable to color, size, and shape.
- What does the `stroke` aesthetic do? What shapes does it work with? (Hint: use ?geom_point)
- What happens if you map an aesthetic to something other than a variable name, like `color = displ < 5`?
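For the first question, one way to see the difference is to inspect the colours that `ggplot2` actually computes. This is a sketch, assuming `ggplot2` is installed; `layer_data()` is used here only to look inside the built plot, and the names `mapped` and `fixed` are illustrative.

```R
library(ggplot2)

# Inside aes(): "blue" is treated as a data value, so it gets a scale and a legend
mapped <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

# Outside aes(): "blue" is used directly as the colour of every point
fixed <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

# Colour computed for the points of each plot
unique(layer_data(mapped)$colour)
unique(layer_data(fixed)$colour)
```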
## Facets
```{r new_mpg_plot_j, cache = TRUE, fig.width=8, fig.height=4.5}
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_wrap(~class)
```
## Facets
```{r new_mpg_plot_k, cache = TRUE, fig.width=8, fig.height=4.5}
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_wrap(~class, nrow = 2)
```
## Facets
```{r new_mpg_plot_l, cache = TRUE, fig.width=8, fig.height=4.5}
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_wrap(~ fl + class, nrow = 2)
```
## Composition
There are different ways to represent the information
```{r new_mpg_plot_o, cache = TRUE, fig.width=8, fig.height=4.5}
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy))
```
## Composition
There are different ways to represent the information
```{r new_mpg_plot_p, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
ggplot(data = mpg) +
geom_smooth(mapping = aes(x = displ, y = hwy))
```
## Composition
We can add as many layers as we want
```{r new_mpg_plot_q, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
geom_smooth(mapping = aes(x = displ, y = hwy))
```
## Composition
We can avoid code duplication
```{r new_mpg_plot_r, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth()
```
## Composition
We can make `mapping` layer specific
```{r new_mpg_plot_s, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point(mapping = aes(color = class)) +
geom_smooth()
```
## Composition
We can use different `data` for different layers (you will learn more about `filter()` later)
```{r new_mpg_plot_t, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point(mapping = aes(color = class)) +
geom_smooth(data = filter(mpg, class == "subcompact"))
```
## Third challenge
- Run this code in your head and predict what the output will look like. Then, run the code in R and check your predictions.
```R
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
geom_point() +
geom_smooth(se = FALSE)
```
**http://perso.ens-lyon.fr/laurent.modolo/R/2_d**
- What does `show.legend = FALSE` do?
- What does the `se` argument to `geom_smooth()` do?
## Third challenge
- Recreate the R code necessary to generate the following graph
```{r new_mpg_plot_u, echo = FALSE, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
geom_point() +
geom_smooth(mapping = aes(linetype = drv))
```
## Third challenge
```{r new_mpg_plot_v, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
geom_point() +
geom_smooth(mapping = aes(linetype = drv))
```
\ No newline at end of file