title: "R#2: introduction to Tidyverse"
author: "Laurent Modolo [laurent.modolo@ens-lyon.fr](mailto:laurent.modolo@ens-lyon.fr), Hélène Polvèche [hpolveche@istem.fr](mailto:hpolveche@istem.fr)"
date: "March 2020"
output:
  html_document: default
  pdf_document: default
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
# tmp <- tempfile(fileext = ".zip")
# download.file("http://www.fueleconomy.gov/feg/epadata/vehicles.csv.zip",
# tmp,
# quiet = TRUE)
# unzip(tmp, exdir = "data-raw")
# new_class_level <- c(
# "Compact Cars",
# "Large Cars",
# "Midsize Cars",
# "Midsize Cars",
# "Midsize Cars",
# "Compact Cars",
# "Minivan",
# "Minivan",
# "Pickup Trucks",
# "Pickup Trucks",
# "Pickup Trucks",
# "Sport Utility Vehicle",
# "Sport Utility Vehicle",
# "Compact Cars",
# "Special Purpose Vehicle",
# "Special Purpose Vehicle",
# "Special Purpose Vehicle",
# "Special Purpose Vehicle",
# "Special Purpose Vehicle",
# "Special Purpose Vehicle",
# "Sport Utility Vehicle",
# "Sport Utility Vehicle",
# "Pickup Trucks",
# "Pickup Trucks",
# "Pickup Trucks",
# "Pickup Trucks",
# "Sport Utility Vehicle",
# "Sport Utility Vehicle",
# "Compact Cars",
# "Two Seaters",
# "Vans",
# "Vans",
# "Vans",
# "Vans"
# )
# new_fuel_level <- c(
# "gas",
# "Diesel",
# "Regular",
# "gas",
# "gas",
# "Regular",
# "Regular",
# "Hybrid",
# "Hybrid",
# "Regular",
# "Regular",
# "Hybrid",
# "Hybrid"
# )
# read_csv("data-raw/vehicles.csv") %>%
# select(
# "id",
# "make",
# "model",
# "year",
# "VClass",
# "trany",
# "drive",
# "cylinders",
# "displ",
# "fuelType",
# "highway08",
# "city08"
# ) %>%
# rename(
# "class" = "VClass",
# "trans" = "trany",
# "drive" = "drive",
# "cyl" = "cylinders",
# "displ" = "displ",
# "fuel" = "fuelType",
# "hwy" = "highway08",
# "cty" = "city08"
# ) %>%
# filter(drive != "") %>%
# drop_na() %>%
# arrange(make, model, year) %>%
# mutate(class = factor(as.factor(class), labels = new_class_level)) %>%
# mutate(fuel = factor(as.factor(fuel), labels = new_fuel_level)) %>%
# write_csv("2_data.csv")
The goal of this practical is to familiarize yourself with `ggplot2`.
The objectives of this session will be to:
- Create basic plots with `ggplot2`
- Understand the `tibble` type
- Learn the different aesthetics in R plots
- Compose graphics
Write the commands in the grey box in the terminal.
The expected results will always be printed in a white box here.
You can copy-paste, but I advise you to practice typing directly in the terminal. To validate the line at the end of your command, press Return.
Tidyverse
The tidyverse is a collection of R packages designed for data science.
All packages share an underlying design philosophy, grammar, and data structures.
install.packages("tidyverse")
library("tidyverse")
\
Toy data set `mpg`
This dataset contains a subset of the fuel economy data that the EPA makes available on http://fueleconomy.gov. It contains only models which had a new release every year between 1999 and 2008.
?mpg
mpg
dim(mpg)
View(mpg)
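Because `mpg` is stored as a tibble, a few extra checks can make that type more concrete. The following is a minimal sketch, assuming the tidyverse is installed and loaded as above; `glimpse()` and `is_tibble()` are not used elsewhere in this practical and are shown only as an illustration.

```r
library(tidyverse)

# mpg ships with ggplot2 and is stored as a tibble
class(mpg)
is_tibble(mpg)

# glimpse() prints one line per column, together with its type
glimpse(mpg)

# Selecting one column with [ , ] still returns a tibble,
# whereas a plain data.frame returns a bare vector
class(mpg[, "hwy"])
class(as.data.frame(mpg)[, "hwy"])
```

A tibble prints only its first rows and the columns that fit on screen, which is why typing `mpg` does not flood the terminal.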
New script
\
\
\
First plot with `ggplot2`
Relationship between engine size `displ` and fuel efficiency `hwy`.
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point()
Composition of plot with `ggplot2`
ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
- you begin a plot with the function `ggplot()`
- you complete your graph by adding one or more layers
- `geom_point()` adds a layer with a scatterplot
- each geom function in `ggplot2` takes a `mapping` argument
- the `mapping` argument is always paired with `aes()`
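The points above can be sketched with the `mpg` data. This is only an illustration: the object names `p_global` and `p_local` are hypothetical, and `layer_data()` is used here just to inspect what each plot will draw.

```r
library(ggplot2)

# Mapping declared once in ggplot(): it is inherited by every layer
p_global <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

# Same plot, with the mapping passed to the geom function instead
p_local <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

# Both plots place the points at the same coordinates
all.equal(layer_data(p_global)$x, layer_data(p_local)$x)
all.equal(layer_data(p_global)$y, layer_data(p_local)$y)
```

Declaring the mapping in `ggplot()` avoids repeating it when several layers share the same axes.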
\
ggplot(data = mpg, mapping = aes(x = hwy, y = cyl)) +
geom_point()
\
\
Aesthetic mappings
`ggplot2` will automatically assign a unique level of the aesthetic (here a unique color) to each unique value of the variable, a process known as scaling. `ggplot2` will also add a legend that explains which levels correspond to which values.
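Scaling can be checked directly on a built plot. The sketch below assumes the `class` column of `mpg` is mapped to color; the object name `p` is hypothetical and `layer_data()` is used only to look at the colors that will be drawn.

```r
library(ggplot2)

p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = class)) +
  geom_point()

# Number of distinct values of the mapped variable
length(unique(mpg$class))

# Number of distinct colors actually drawn: one per value of class
length(unique(layer_data(p)$colour))
```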
Try the following aesthetics:
- `size`
- `alpha`
- `shape`
Aesthetic mappings: `color`
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = class)) +
  geom_point()
Aesthetic mappings: `size`
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, size = class)) +
  geom_point()
Aesthetic mappings: `alpha`
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, alpha = class)) +
  geom_point()
Aesthetic mappings: `shape`
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, shape = class)) +
  geom_point()
\
You can also set the aesthetic properties of your geom manually. For example, we can draw all of the points in our plot as blue hollow squares:
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(color = "blue", shape = 0)
\
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = "blue")) +
geom_point()
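The call above places `color = "blue"` inside `aes()`, which does not behave like setting the color manually. A minimal sketch of the difference, with hypothetical object names `p_mapped` and `p_set`:

```r
library(ggplot2)

# Inside aes(): "blue" is treated as a data value and goes through the color scale
p_mapped <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = "blue")) +
  geom_point()

# Outside aes(): "blue" is used directly as the color of every point
p_set <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(color = "blue")

unique(layer_data(p_mapped)$colour)  # a color picked by the scale, not "blue"
unique(layer_data(p_set)$colour)     # the color that was set manually
```

In the mapped version, `ggplot2` also adds a legend with a single entry labelled "blue", which is a useful hint that a constant ended up inside `aes()`.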
\
- Map a continuous variable to color.
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = cyl)) +
geom_point()
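For comparison, the same variable can be treated as discrete. This is a sketch, assuming that converting `cyl` with `factor()` is acceptable for the plot; the object names are hypothetical.

```r
library(ggplot2)

# cyl is stored as a number: ggplot2 uses a continuous color gradient
p_continuous <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = cyl)) +
  geom_point()

# Converted to a factor: one distinct color per number of cylinders
p_discrete <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = factor(cyl))) +
  geom_point()

p_continuous
p_discrete
```

The legend changes accordingly: a color bar for the continuous version, separate keys for the discrete one.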
\
\
Facets
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point() +
facet_wrap(~class, nrow = 2)
\
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point() +
facet_wrap(~ fl + class, nrow = 2)
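The number of panels follows from the faceting variable. The sketch below counts the panels produced by `facet_wrap(~class)` and, as an assumption about what may be useful next, shows `facet_grid()`, which is not used elsewhere in this practical and arranges panels by two variables in rows and columns.

```r
library(ggplot2)

p_wrap <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~class, nrow = 2)

# One panel per distinct value of class
length(unique(layer_data(p_wrap)$PANEL))
length(unique(mpg$class))

# facet_grid(): rows by drv, columns by cyl
p_grid <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  facet_grid(drv ~ cyl)
p_grid
```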
Composition
There are different ways to represent the information:
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point()
\
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_smooth()
\
We can add as many layers as we want
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth()
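Since layers are added with `+`, a plot can also be stored in a variable and extended step by step. A minimal sketch, with hypothetical object names:

```r
library(ggplot2)

# Base plot: data and mapping only, no layer yet
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy))

# Add the layers one at a time
p_points <- p + geom_point()
p_both <- p_points + geom_smooth()

# Number of layers stored in each plot object
length(p$layers)
length(p_points$layers)
length(p_both$layers)

p_both
```

Storing the base plot makes it easy to try several geoms on the same data without retyping the mapping.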
\
We can make `mapping` layer specific
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point(mapping = aes(color = class)) +
geom_smooth()
\
We can use different `data` for different layers (you will learn more about `filter()` later)
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point(mapping = aes(color = class)) +
geom_smooth(data = filter(mpg, class == "subcompact"))
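To see what each layer receives, the subset can be built separately and compared with the full data. This sketch uses base R subsetting as a stand-in for the `filter()` call above; the names `subcompact` and `p` are hypothetical.

```r
library(ggplot2)

# Rows used by the smoothing layer only (base R equivalent of the filter() call)
subcompact <- mpg[mpg$class == "subcompact", ]

p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +
  geom_smooth(data = subcompact)

nrow(layer_data(p, 1))  # point layer: every row of mpg
nrow(subcompact)        # rows available to the smoothing layer
p
```

A layer without its own `data` argument inherits the data given to `ggplot()`, which is why the points still cover every class.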
Challenge!
- What does `show.legend = FALSE` do?
- What does the `se` argument to `geom_smooth()` do?
Third challenge
- Recreate the R code necessary to generate the following graph
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
geom_point() +
geom_smooth(mapping = aes(linetype = drv))