title: "R#2: introduction to Tidyverse"
author: "Laurent Modolo [laurent.modolo@ens-lyon.fr](mailto:laurent.modolo@ens-lyon.fr), Hélène Polvèche [hpolveche@istem.fr](mailto:hpolveche@istem.fr)"
date: "March 2020"
output:
  html_document: default
  pdf_document: default
knitr::opts_chunk$set(echo = TRUE)

library(tidyverse)
# tmp <- tempfile(fileext = ".zip")
# download.file("http://www.fueleconomy.gov/feg/epadata/vehicles.csv.zip",
#               tmp,
#               quiet = TRUE)
# unzip(tmp, exdir = "data-raw")
# new_class_level <- c(
#   "Compact Cars",
#   "Large Cars",
#   "Midsize Cars",
#   "Midsize Cars",
#   "Midsize Cars",
#   "Compact Cars",
#   "Minivan",
#   "Minivan",
#   "Pickup Trucks",
#   "Pickup Trucks",
#   "Pickup Trucks",
#   "Sport Utility Vehicle",
#   "Sport Utility Vehicle",
#   "Compact Cars",
#   "Special Purpose Vehicle",
#   "Special Purpose Vehicle",
#   "Special Purpose Vehicle",
#   "Special Purpose Vehicle",
#   "Special Purpose Vehicle",
#   "Special Purpose Vehicle",
#   "Sport Utility Vehicle",
#   "Sport Utility Vehicle",
#   "Pickup Trucks",
#   "Pickup Trucks",
#   "Pickup Trucks",
#   "Pickup Trucks",
#   "Sport Utility Vehicle",
#   "Sport Utility Vehicle",
#   "Compact Cars",
#   "Two Seaters",
#   "Vans",
#   "Vans",
#   "Vans",
#   "Vans"
# )
# new_fuel_level <- c(
#   "gas",
#   "Diesel",
#   "Regular",
#   "gas",
#   "gas",
#   "Regular",
#   "Regular",
#   "Hybrid",
#   "Hybrid",
#   "Regular",
#   "Regular",
#   "Hybrid",
#   "Hybrid"
# )
# read_csv("data-raw/vehicles.csv") %>%
#   select(
#     "id",
#     "make",
#     "model",
#     "year",
#     "VClass",
#     "trany",
#     "drive",
#     "cylinders",
#     "displ",
#     "fuelType",
#     "highway08",
#     "city08"
#   ) %>% 
#   rename(
#     "class" = "VClass",
#     "trans" = "trany",
#     "drive" = "drive",
#     "cyl" = "cylinders",
#     "displ" = "displ",
#     "fuel" = "fuelType",
#     "hwy" = "highway08",
#     "cty" = "city08"
#   ) %>%
#   filter(drive != "") %>%
#   drop_na() %>% 
#   arrange(make, model, year) %>%
#   mutate(class = factor(as.factor(class), labels = new_class_level)) %>%
#   mutate(fuel = factor(as.factor(fuel), labels = new_fuel_level)) %>%
#   write_csv("2_data.csv")

The goal of this practical is to familiarize yourself with ggplot2.

The objectives of this session will be to:

  • Create basic plots with ggplot2
  • Understand the tibble type
  • Learn the different aesthetics in R plots
  • Compose graphics

Write the commands shown in the grey boxes in the terminal.

The expected results will always be printed in a white box here.

You can copy and paste, but I advise you to practice typing the commands directly in the terminal. To run a command, press Return at the end of the line.

Tidyverse

The tidyverse is a collection of R packages designed for data science.

All packages share an underlying design philosophy, grammar, and data structures.

![](./img/tidyverse.jpg){width=500px}

\

install.packages("tidyverse")
library("tidyverse")

\

Toy data set mpg

This dataset contains a subset of the fuel economy data that the EPA makes available at http://fueleconomy.gov. It contains only models that had a new release every year between 1999 and 2008.

?mpg
mpg
dim(mpg)
View(mpg)

New script

\

\

\

First plot with ggplot2

Relationship between engine size `displ` and fuel efficiency `hwy`.

ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point()

Composition of plot with ggplot2

ggplot(data = <DATA>) + 
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))

  • you begin a plot with the function ggplot()
  • you complete your graph by adding one or more layers
  • geom_point() adds a layer with a scatterplot
  • each geom function in ggplot2 takes a mapping argument
  • the mapping argument is always paired with aes()
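
The template places the mapping inside the geom function, whereas the first plot above passed it to `ggplot()`. As a minimal sketch (the object names `p_layer` and `p_global` are illustrative, not part of the original material), both forms should give the same scatterplot, because a mapping given to `ggplot()` is inherited by the layers added to it:

```r
library(ggplot2)  # provides ggplot(), geom_point() and the mpg data set

# Mapping declared inside the geom function, as in the template
p_layer <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

# Mapping declared once in ggplot() and inherited by the layer
p_global <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

# layer_data() returns the data a layer actually draws; compare both versions
all.equal(layer_data(p_layer), layer_data(p_global))
```

Declaring the mapping in `ggplot()` is convenient when several layers share the same `x` and `y`.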

\

- Make a scatterplot of `hwy` (fuel efficiency) vs. `cyl` (number of cylinders).

\

\

\

\

\

\

\

\

ggplot(data = mpg, mapping = aes(x = cyl, y = hwy)) + 
  geom_point()

\

\

Aesthetic mappings

ggplot2 will automatically assign a unique level of the aesthetic (here a unique color) to each unique value of the variable, a process known as scaling. ggplot2 will also add a legend that explains which levels correspond to which values.

Try the following aesthetics:

  • size
  • alpha
  • shape
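
To see scaling at work, one can inspect the values ggplot2 computes for a layer. The sketch below is only illustrative (the names `p_class` and `scaled` are made up for this example): it maps `class` to color, then counts the distinct colors that were assigned.

```r
library(ggplot2)

p_class <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = class)) +
  geom_point()

# layer_data() exposes the scaled values of the first layer
scaled <- layer_data(p_class)

# One distinct color per distinct value of class
length(unique(scaled$colour))
length(unique(mpg$class))
```

The two counts should match, since each unique value of the variable gets its own level of the aesthetic.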

Aesthetic mappings: color

ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = class)) + 
  geom_point()

Aesthetic mappings: size

ggplot(data = mpg, mapping = aes(x = displ, y = hwy, size = class)) + 
  geom_point()

Aesthetic mappings: alpha

ggplot(data = mpg, mapping = aes(x = displ, y = hwy, alpha = class)) + 
  geom_point()

Aesthetic mappings: shape

ggplot(data = mpg, mapping = aes(x = displ, y = hwy, shape = class)) + 
  geom_point()
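
By default ggplot2 provides only six shapes for a discrete variable, while `class` has seven values, so one class is left without points and a warning is raised. One possible workaround, sketched here as an assumption rather than as part of the original material, is to supply the shapes by hand with `scale_shape_manual()` (the name `p_shapes` is illustrative):

```r
library(ggplot2)

# Seven shape codes (0 to 6), one per value of class
p_shapes <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, shape = class)) +
  geom_point() +
  scale_shape_manual(values = 0:6)

# Every point now has a shape assigned (no missing values)
anyNA(layer_data(p_shapes)$shape)
```

Typing `p_shapes` at the prompt displays the plot.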

\

You can also set the aesthetic properties of your geom manually. For example, we can make all of the points in our plot blue squares:

ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point(color = "blue", shape = 0)

\

![](./img/shapes.png){width=300px}

\

- What’s gone wrong with this code? Why are the points not blue?
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = "blue")) + 
  geom_point()

\

\

\

\

\

\

\

ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = "blue")) + 
  geom_point()
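
Inside `aes()`, `"blue"` is not read as a color: it is treated as a constant value to map, so ggplot2 assigns it the first color of its default palette and adds a legend. To set a fixed color, pass it to the geom outside `aes()`. A minimal sketch (the names `p_mapped` and `p_fixed` are illustrative):

```r
library(ggplot2)

# color inside aes(): "blue" is data, scaled like any other variable
p_mapped <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = "blue")) +
  geom_point()

# color outside aes(): a fixed property of the layer
p_fixed <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(color = "blue")

# Colors actually used by each plot
unique(layer_data(p_mapped)$colour)
unique(layer_data(p_fixed)$colour)
```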

\

  • Map a continuous variable to color.
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = cyl)) + 
  geom_point()
- What happens if you map an aesthetic to something other than a variable name, like `color = displ < 5`?
```{r condiColor, cache = TRUE, fig.width=8, fig.height=4.5}
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = displ < 5)) + 
  geom_point()
```

\

\

Facets

ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point() + 
  facet_wrap(~class, nrow = 2)

\

ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point() + 
  facet_wrap(~ fl + class, nrow = 2)

Composition

There are different ways to represent the information:

ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point()

\

ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_smooth()

\

We can add as many layers as we want:

ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point() +
  geom_smooth()

\

We can make a mapping specific to one layer:

ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point(mapping = aes(color = class)) +
  geom_smooth()
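
A mapping placed in `ggplot()` applies to every layer, while a mapping placed in a geom applies to that layer only. The sketch below counts how many smoothing curves each variant fits. It is an illustration under a few assumptions: the object names are made up, `drv` is used instead of `class` to keep the number of groups small, and `method` and `formula` are set explicitly only to silence the message `geom_smooth()` prints when it chooses them itself.

```r
library(ggplot2)

# color mapped for the points only: geom_smooth() fits a single curve
p_local <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = drv)) +
  geom_smooth(method = "loess", formula = y ~ x)

# color mapped globally: geom_smooth() fits one curve per value of drv
p_all_layers <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)

# Number of groups in the second layer (the smoother)
length(unique(layer_data(p_local, 2)$group))
length(unique(layer_data(p_all_layers, 2)$group))
```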

\

We can use different data for different layers (you will learn more about filter() later):

ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point(mapping = aes(color = class)) +
  geom_smooth(data = filter(mpg, class == "subcompact"))

Challenge!

- Run this code in your head and predict what the output will look like. Then, run the code in R and check your predictions.
```R
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) + 
  geom_point() + 
  geom_smooth(se = FALSE)
```

**http://perso.ens-lyon.fr/laurent.modolo/R/2_d**
  • What does show.legend = FALSE do?
  • What does the se argument to geom_smooth() do?
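
As a hedged sketch for exploring these two questions: `se` controls whether `geom_smooth()` draws the confidence band around the fitted curve, and `show.legend = FALSE` keeps a layer out of the legend. The object names are illustrative, and `method` and `formula` are fixed only to avoid the informational message.

```r
library(ggplot2)

base <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
  geom_point()

# Default: confidence band drawn, smoother shown in the legend
p_default <- base +
  geom_smooth(method = "loess", formula = y ~ x)

# No confidence band, and the smoother is left out of the legend
p_plain <- base +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE, show.legend = FALSE)
```

Displaying `p_default` and `p_plain` one after the other makes the difference visible.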

Third challenge

  • Recreate the R code necessary to generate the following graph
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) + 
  geom_point() +
  geom_smooth(mapping = aes(linetype = drv))
