- Introduction
- Tidyverse
- Toy data set mpg
- New script
- First plot with ggplot2
- Aesthetic mappings
- color mapping
- size mapping
- alpha mapping
- shape mapping
- Mapping a continuous variable to a color.
- Facets
- Composition
- Challenge !
- First challenge
- Second challenge
- Third challenge
- See you in R.3: Transformations with ggplot2
- To go further: publication ready plots
---
title: "R.2: introduction to Tidyverse"
author: "Laurent Modolo [laurent.modolo@ens-lyon.fr](mailto:laurent.modolo@ens-lyon.fr);\nHélène Polvèche [hpolveche@istem.fr](mailto:hpolveche@istem.fr)"
date: "2022"
output:
  rmdformats::downcute:
    self_contain: true
    use_bookdown: true
    default_style: "light"
    lightbox: true
    css: "../www/style_Rmd.css"
---
library(fontawesome)
r fa(name = "fas fa-house", fill = "grey", height = "1em")
https://can.gitbiopages.ens-lyon.fr/R_basis/
rm(list=ls())
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(comment = NA)
klippy::klippy(
position = c('top', 'right'),
color = "white",
tooltip_message = 'Click to copy',
tooltip_success = 'Copied !')
library("tidyverse")
tmp <- tempfile(fileext = ".zip")
download.file("http://www.fueleconomy.gov/feg/epadata/vehicles.csv.zip",
tmp,
quiet = TRUE)
unzip(tmp, exdir = "data-raw")
new_class_level <- c(
"Compact Cars",
"Large Cars",
"Midsize Cars",
"Midsize Cars",
"Midsize Cars",
"Compact Cars",
"Minivan",
"Minivan",
"Pickup Trucks",
"Pickup Trucks",
"Pickup Trucks",
"Sport Utility Vehicle",
"Sport Utility Vehicle",
"Compact Cars",
"Special Purpose Vehicle",
"Special Purpose Vehicle",
"Special Purpose Vehicle",
"Special Purpose Vehicle",
"Special Purpose Vehicle",
"Special Purpose Vehicle",
"Sport Utility Vehicle",
"Sport Utility Vehicle",
"Pickup Trucks",
"Pickup Trucks",
"Pickup Trucks",
"Pickup Trucks",
"Sport Utility Vehicle",
"Sport Utility Vehicle",
"Compact Cars",
"Two Seaters",
"Vans",
"Vans",
"Vans",
"Vans"
)
new_fuel_level <- c(
"gas",
"Diesel",
"Regular",
"gas",
"gas",
"Regular",
"Regular",
"Hybrid",
"Hybrid",
"Regular",
"Regular",
"Hybrid",
"Hybrid"
)
read_csv("data-raw/vehicles.csv") %>%
select(
"id",
"make",
"model",
"year",
"VClass",
"trany",
"drive",
"cylinders",
"displ",
"fuelType",
"highway08",
"city08"
) %>%
rename(
"class" = "VClass",
"trans" = "trany",
"drive" = "drive",
"cyl" = "cylinders",
"displ" = "displ",
"fuel" = "fuelType",
"hwy" = "highway08",
"cty" = "city08"
) %>%
filter(drive != "") %>%
drop_na() %>%
arrange(make, model, year) %>%
mutate(class = factor(as.factor(class), labels = new_class_level)) %>%
mutate(fuel = factor(as.factor(fuel), labels = new_fuel_level)) %>%
write_csv("mpg.csv")
Introduction
In the last session, we went through the basics of R. Instead of continuing to learn more about R programming, in this session we are going to jump directly to rendering plots.
We make this choice for three reasons:
- Rendering nice plots is directly rewarding
- You will be able to apply what you learn in this session to your own data (provided that it is correctly formatted)
- We will come back to R programming later, when you have all the necessary tools to visualize your results.
The objectives of this session will be to:
- Create basic plots with the `ggplot2` library
- Understand the `tibble` type
- Learn the different aesthetics in R plots
- Compose complex graphics
Tidyverse
The `tidyverse` package is a collection of R packages designed for data science that includes `ggplot2`.
All packages share an underlying design philosophy, grammar, and data structures (plus the same shape of logo).
`tidyverse` is a meta-package, which can take a long time to install with the following command:
install.packages("tidyverse")
Luckily for you, `tidyverse` is preinstalled on your RStudio server, so you just have to load the library:
library("tidyverse")
Toy data set `mpg`
This dataset contains a subset of the fuel economy data that the EPA makes available on fueleconomy.gov. It contains only models which had a new release every year between 1999 and 2008.
You can use the `?` command to learn more about this dataset.
?mpg
But instead of using a dataset included in an R package, you may want to be able to use any dataset with the same format.
For that, we are going to use the `read_csv` command, which reads a CSV file.
This command also works with a file URL:
new_mpg <- read_csv("./mpg.csv")
new_mpg <- read_csv(
"https://can.gitbiopages.ens-lyon.fr/R_basis/session_2/mpg.csv"
)
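As a minimal sketch of what `read_csv` returns, the example below parses a few lines of literal CSV text instead of a file. The values are made up for illustration, and wrapping the text in `I()` to mark it as literal data assumes readr 2.0 or later.

```r
library(readr)

# Hypothetical inline data standing in for a file on disk.
# I() tells read_csv that the string is the data itself, not a path.
toy_csv <- I("make,displ,hwy\naudi,1.8,29\nford,4.6,17\n")

# show_col_types = FALSE silences the column specification message.
toy <- read_csv(toy_csv, show_col_types = FALSE)

# The result is a tibble with one column per CSV field.
toy
```

The same call accepts a local path or a URL in place of `toy_csv`.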
You can check the number of rows and columns of the data with `dim`:
dim(new_mpg)
To visualize the data in RStudio, you can use the `View` command:
View(new_mpg)
Or you can simply call the variable.
As with simple data types, calling a variable prints it,
but complex data types like `new_mpg` can use more elaborate print functions.
new_mpg
Here we can see that `new_mpg` is a `tibble`. We will come back to `tibble` later.
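A quick way to confirm that an object is a tibble is to look at its class. The sketch below uses the `mpg` dataset bundled with `ggplot2` rather than `new_mpg`, only so that it runs without downloading anything; `glimpse()` comes from `dplyr`, which is loaded with the tidyverse.

```r
library(ggplot2) # provides the built-in mpg dataset
library(dplyr)   # provides glimpse()

# A tibble carries the classes "tbl_df" and "tbl" on top of "data.frame".
class(mpg)

# Number of rows and columns, as with any data frame.
dim(mpg)

# Compact overview: one line per column with its type and first values.
glimpse(mpg)
```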
New script
Like in the last session, instead of typing your commands directly in the console, you are going to write them in an R script.
First plot with `ggplot2`
We are going to make the simplest plot possible to study the relationship between two variables: the scatterplot.
The following command generates a plot of engine size `displ` against fuel efficiency `hwy`, both present in the `new_mpg` `tibble`.
ggplot(data = new_mpg) +
geom_point(mapping = aes(x = displ, y = hwy))
`ggplot2` is a system for declaratively creating graphics, based on The Grammar of Graphics. You provide the data, tell `ggplot2` how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
ggplot(data = <DATA>) +
<GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
- you begin a plot with the function `ggplot()`
- you complete your graph by adding one or more layers
- `geom_point()` adds a layer with a scatterplot
- each **geom** function in `ggplot2` takes a `mapping` argument
- the `mapping` argument is always paired with `aes()`
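The template can be filled in two ways that produce the same scatterplot: the mapping can be given to the geom, or to `ggplot()` so that every layer inherits it. This is only a sketch, and it uses the `mpg` dataset bundled with `ggplot2` instead of `new_mpg` so that it runs on its own.

```r
library(ggplot2)

# Mapping declared inside the geom: it applies to this layer only.
p_layer <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

# Mapping declared in ggplot(): every layer added later inherits it.
p_global <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

# Printing a plot object renders it.
p_layer
p_global
```

Declaring the mapping in `ggplot()` saves repetition once a plot has several layers.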
Solution
```{r new_mpg_plot_b, cache = TRUE, fig.width=8, fig.height=4.5}
ggplot(data = new_mpg, mapping = aes(x = hwy, y = cyl)) +
  geom_point()
```
Aesthetic mappings
`ggplot2` will automatically assign a unique level of the aesthetic (here a unique color) to each unique value of the variable, a process known as scaling. `ggplot2` will also add a legend that explains which levels correspond to which values.
Try the following aesthetics:
- `size`
- `alpha`
- `shape`
`color` mapping
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy, color = class)) +
geom_point()
`size` mapping
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy, size = class)) +
geom_point()
`alpha` mapping
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy, alpha = class)) +
geom_point()
`shape` mapping
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy, shape = class)) +
geom_point()
You can also set the aesthetic properties of your geom manually. For example, we can draw all of the points in our plot as blue squares:
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy)) +
geom_point(color = "blue", shape=0)
Here is a list of different shapes available in R:
[Figure: point shapes available in R]
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy, color = "blue")) +
geom_point()
Solution
```{r new_mpg_plot_blue, cache = TRUE, fig.width=8, fig.height=4.5}
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(color = "blue")
```
Mapping a continuous variable to a color.
You can also map a continuous variable to a color:
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy, color = cyl)) +
geom_point()
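Whether `ggplot2` draws a color gradient or distinct colors depends on the type of the mapped variable. As a sketch, using the `mpg` dataset bundled with `ggplot2` in place of `new_mpg`, wrapping `cyl` in `factor()` turns the gradient into one color per number of cylinders.

```r
library(ggplot2)

# cyl is numeric, so it is treated as continuous: gradient legend.
p_continuous <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = cyl)) +
  geom_point()

# factor(cyl) is discrete: one distinct color per level.
p_discrete <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = factor(cyl))) +
  geom_point()

p_continuous
p_discrete
```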
Solution
```{r condiColor, cache = TRUE, fig.width=8, fig.height=4.5}
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy, color = displ < 5)) +
  geom_point()
```
Facets
You can create multiple plots at once by faceting. For this you can use the `facet_wrap` command.
This command takes a formula as input.
We will come back to formulas in R later; for now, you only need to know that formulas start with a `~` symbol.
To make a scatterplot of `displ` versus `hwy` per car `class`, you can use the following code:
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy)) +
geom_point() +
facet_wrap(~class, nrow = 2)
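If the formula notation feels unfamiliar, `facet_wrap` also accepts the faceting variable wrapped in `vars()`. The sketch below, written against the `mpg` dataset bundled with `ggplot2` so that it is self-contained, builds the same faceted plot both ways.

```r
library(ggplot2)

base_plot <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

# Formula notation, as used in this session.
p_formula <- base_plot + facet_wrap(~class, nrow = 2)

# Equivalent notation with vars().
p_vars <- base_plot + facet_wrap(vars(class), nrow = 2)

p_formula
p_vars
```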
Solution
Formulas allow you to express complex relationships between variables in R!
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point() +
facet_wrap(~ fl + class, nrow = 2)
Composition
There are different ways to represent the information:
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy)) +
geom_point()
\
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy)) +
geom_smooth()
\
We can add as many layers as we want
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth()
\
We can make `mapping` layer-specific:
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy)) +
geom_point(mapping = aes(color = class)) +
geom_smooth()
\
We can use different `data` for different layers (you will learn more about `filter()` later):
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy)) +
geom_point(mapping = aes(color = class)) +
geom_smooth(data = filter(mpg, class == "subcompact"))
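To see what the smoothing layer above receives, it can help to run the `filter()` call on its own. This sketch uses the `mpg` dataset bundled with `ggplot2`, as the layer does, and only inspects the result.

```r
library(ggplot2) # provides the built-in mpg dataset
library(dplyr)   # provides filter()

# Keep only the rows whose class is "subcompact".
subcompact <- filter(mpg, class == "subcompact")

# The filtered tibble has the same columns but fewer rows.
nrow(mpg)
nrow(subcompact)
unique(subcompact$class)
```

Because the columns are unchanged, the filtered tibble can be passed to the `data` argument of any layer that uses the same mapping.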
Challenge !
First challenge
Second challenge
Make a plot colorizing this information
Solution
```{r new_mpg_plot_color_2seater, cache = TRUE, fig.width=8, fig.height=4.5}
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_point(data = filter(mpg, class == "2seater"), color = "red")
```
Solution
```{r new_mpg_plot_color_2seater_fx, cache = TRUE, fig.width=8, fig.height=4.5}
plot_color_2seater <- function(mpg) {
  ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
    geom_point() +
    geom_point(data = filter(mpg, class == "2seater"), color = "red")
}
plot_color_2seater(mpg)
```
Third challenge
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
geom_point() +
geom_smooth(mapping = aes(linetype = drv))
Solution
```{r new_mpg_plot_v, eval=F}
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  geom_smooth(mapping = aes(linetype = drv))
```
See you in R.3: Transformations with ggplot2
To go further: publication ready plots
Once you have created the graph you need for your publication, you have to save it.
You can do it with the `ggsave` function.
First save your plot in a variable:
p1 <- ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy, color = class)) +
geom_point()
Then save it in the wanted format:
ggsave("test_plot_1.png", p1, width = 12, height = 8, units = "cm")
ggsave("test_plot_1.pdf", p1, width = 12, height = 8, units = "cm")
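The file extension decides the output format, and `width`, `height` and `units` fix the physical size of the figure. As a sketch, the example below writes to a temporary file and sets a resolution with the `dpi` argument; the temporary path and the value 300 are illustrative choices, and the `mpg` dataset bundled with `ggplot2` stands in for `new_mpg`.

```r
library(ggplot2)

p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

# tempfile() gives a throwaway path; replace it with your own file name.
out_file <- tempfile(fileext = ".png")

# Passing the plot explicitly avoids depending on the last plot displayed.
ggsave(out_file, plot = p, width = 12, height = 8, units = "cm", dpi = 300)

# Check that the file was written.
file.exists(out_file)
```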
You may also change the appearance of your plot by adding a `theme` layer to your plot:
p1 + theme_bw()
p1 + theme_minimal()
You may have to combine several plots. For that, you can use the `cowplot` package, which is a `ggplot2` extension.
First install it:
install.packages("cowplot")
if (! require("cowplot")) {
install.packages("cowplot")
}
Then you can use the function `plot_grid` to combine plots in a publication ready style:
library(cowplot)
p1 <- ggplot(data = new_mpg) +
geom_point(mapping = aes(x = displ, y = hwy))
p1
p2 <- ggplot(data = new_mpg, mapping = aes(x = cty, y = hwy)) +
geom_point()
p2
plot_grid(p1, p2, labels = c('A', 'B'), label_size = 12)
You can also save it in a file.
p_final <- plot_grid(p1, p2, labels = c('A', 'B'), label_size = 12)
ggsave("test_plot_1_and_2.png", p_final, width = 20, height = 8, units = "cm")
You can learn more about the features of `cowplot` at https://wilkelab.org/cowplot/articles/introduction.html.
p1 <- ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy, color = class)) +
geom_point() + theme_bw()
p2 <- ggplot(data = new_mpg, mapping = aes(x = cty, y = hwy, color = class)) +
geom_point() + theme_bw()
p_row <- plot_grid(p1 + theme(legend.position = "none"), p2 + theme(legend.position = "none"), labels = c('A', 'B'), label_size = 12)
p_legend <- get_legend(p1 + theme(legend.position = "top"))
plot_grid(p_row, p_legend, nrow = 2, rel_heights = c(1,0.2))
Solution
```{r , echo = TRUE, eval = F}
p1 <- ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy, color = class)) +
  geom_point() + theme_bw()
p2 <- ggplot(data = new_mpg, mapping = aes(x = cty, y = hwy, color = class)) +
  geom_point() + theme_bw()
p_row <- plot_grid(p1 + theme(legend.position = "none"), p2 + theme(legend.position = "none"), labels = c('A', 'B'), label_size = 12)
p_legend <- get_legend(p1 + theme(legend.position = "top"))
p_final <- plot_grid(p_row, p_legend, nrow = 2, rel_heights = c(1,0.2))
p_final
```
```{r , echo = TRUE, eval = F}
ggsave("plot_1_2_and_legend.png", p_final, width = 20, height = 8, units = "cm")
```
There are a lot of other available `ggplot2` extensions which can be useful (and also beautiful).
You can take a look at them here: https://exts.ggplot2.tidyverse.org/gallery/