The goal of this practical is to familiarize yourself with ggplot2.
The objectives of this session concern ggplot2 and the tibble type.
The tidyverse is a collection of R packages designed for data science.
All packages share an underlying design philosophy, grammar, and data structures.
The mpg dataset contains a subset of the fuel economy data that the EPA makes available on http://fueleconomy.gov. It contains only models which had a new release every year between 1999 and 2008.
mpg is loaded with the tidyverse, but we also want to be able to read our own data from http://perso.ens-lyon.fr/laurent.modolo/R/2_data.csv
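As a minimal sketch of reading this file, assuming it is a regular comma-separated file with a header row, readr's read_csv() can be pointed at the URL and returns a tibble. The helper name load_new_mpg is hypothetical and only there so that nothing is downloaded until it is called; new_mpg is the name used in the exercises.

```r
library(readr)

# URL given in the practical
data_url <- "http://perso.ens-lyon.fr/laurent.modolo/R/2_data.csv"

# Hypothetical helper: read_csv() accepts a local path or a URL
# and returns the content as a tibble.
load_new_mpg <- function(path = data_url) {
  read_csv(path)
}
```

Calling new_mpg <- load_new_mpg() would then download the file and store it as a tibble; typing new_mpg prints its dimensions and first rows.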
ggplot2
Relationship between engine size displ and fuel efficiency hwy.
ggplot2
Composition of plot with ggplot2
ggplot() initializes the plot
geom_point() adds a layer with a scatterplot
each geom function in ggplot2 takes a mapping argument
the mapping argument is always paired with aes()
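The points above can be sketched in a few lines. This example assumes the mpg dataset shipped with ggplot2 rather than new_mpg, so that it runs without downloading anything; the object name p is arbitrary.

```r
library(ggplot2)

# ggplot() initializes the plot with a dataset.
# geom_point() adds a scatterplot layer; its mapping argument is
# paired with aes() to bind variables to the x and y axes.
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

# Typing the object name draws the plot
p
```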
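Run ggplot(data = new_mpg). What do you see?
How many rows are in new_mpg? How many columns?
What does the cty variable describe? Read the help for ?mpg to find out.
Make a scatterplot of hwy vs. cyl.
What happens if you make a scatterplot of class vs. drive? Why is the plot not useful?
How many rows are in new_mpg? How many columns?
## # A tibble: 40,440 x 12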
##       id make  model   year class trans drive   cyl displ fuel    hwy   cty
##    <dbl> <chr> <chr>  <dbl> <chr> <chr> <chr> <dbl> <dbl> <chr> <dbl> <dbl>
##  1 13309 Acura 2.2CL…  1997 Subc… Auto… Fron…     4   2.2 Regu…    26    20
##  2 13310 Acura 2.2CL…  1997 Subc… Manu… Fron…     4   2.2 Regu…    28    22
##  3 13311 Acura 2.2CL…  1997 Subc… Auto… Fron…     6   3   Regu…    26    18
##  4 14038 Acura 2.3CL…  1998 Subc… Auto… Fron…     4   2.3 Regu…    27    19
##  5 14039 Acura 2.3CL…  1998 Subc… Manu… Fron…     4   2.3 Regu…    29    21
##  6 14040 Acura 2.3CL…  1998 Subc… Auto… Fron…     6   3   Regu…    26    17
##  7 14834 Acura 2.3CL…  1999 Subc… Auto… Fron…     4   2.3 Regu…    27    20
##  8 14835 Acura 2.3CL…  1999 Subc… Manu… Fron…     4   2.3 Regu…    29    21
##  9 14836 Acura 2.3CL…  1999 Subc… Auto… Fron…     6   3   Regu…    26    17
## 10 11789 Acura 2.5TL   1995 Comp… Auto… Fron…     5   2.5 Prem…    23    18
## # … with 40,430 more rows
Make a scatterplot of hwy vs. cyl.
What happens if you make a scatterplot of class vs. drive? Why is the plot not useful?
How can you explain these cars?
color
ggplot2 will automatically assign a unique level of the aesthetic (here a unique color) to each unique value of the variable, a process known as scaling. ggplot2 will also add a legend that explains which levels correspond to which values.
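A minimal sketch of such a mapping, assuming the mpg dataset shipped with ggplot2 and its class variable (the object name p_color is arbitrary):

```r
library(ggplot2)

# Mapping color to class inside aes() lets ggplot2 pick one color
# per class value and build the matching legend.
p_color <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))

p_color
```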
Try the following aesthetics:
size
alpha
shape
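One possible way to try these three aesthetics is sketched below. It assumes the mpg dataset shipped with ggplot2 and maps each aesthetic to class; the object names are arbitrary, and other variables could be mapped just as well.

```r
library(ggplot2)

# Shared starting point: the dataset only, no layer yet
base <- ggplot(data = mpg)

# Same scatterplot three times, changing only the aesthetic mapped to class.
# ggplot2 may warn when drawing these: size is not well suited to an
# unordered variable, and the default shape palette has fewer shapes
# than there are classes.
p_size  <- base + geom_point(mapping = aes(x = displ, y = hwy, size = class))
p_alpha <- base + geom_point(mapping = aes(x = displ, y = hwy, alpha = class))
p_shape <- base + geom_point(mapping = aes(x = displ, y = hwy, shape = class))
```

Typing p_size, p_alpha, or p_shape draws the corresponding plot.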
You can also set the aesthetic properties of your geom manually. For example, we can make all of the points in our plot blue:
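A short sketch of this, assuming the mpg dataset shipped with ggplot2: the color is given as an argument of geom_point(), outside aes(), so it is a fixed value rather than a mapping and no legend is created.

```r
library(ggplot2)

# color = "blue" sits outside aes(): it sets the appearance of every
# point instead of mapping a variable to the color aesthetic.
p_blue <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

p_blue
```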
Which variables in mpg are categorical? Which variables are continuous? (Hint: type mpg)
What does the stroke aesthetic do? What shapes does it work with? (Hint: use ?geom_point)
What happens if you map an aesthetic to something other than a variable name, like color = displ < 5?
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ fl + class, nrow = 2)
There are different ways to represent the information
We can add as many layers as we want
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  geom_smooth(mapping = aes(x = displ, y = hwy))
We can avoid code duplication
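One way to do this, sketched here with the mpg dataset shipped with ggplot2, is to move the shared mapping into ggplot() so that every layer inherits it (the object name p_shared is arbitrary):

```r
library(ggplot2)

# The mapping given to ggplot() is inherited by all layers,
# so geom_point() and geom_smooth() need no mapping of their own.
p_shared <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth()

p_shared
```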
We can make mapping layer specific
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +
  geom_smooth()
We can use different data for different layers (you will learn more about filter() later)
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +
  geom_smooth(data = filter(mpg, class == "subcompact"))
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  geom_smooth(se = FALSE)
http://perso.ens-lyon.fr/laurent.modolo/R/2_d
What does show.legend = FALSE do?
What does the se argument to geom_smooth() do?