In the last session, we went through the basics of R.
Instead of continuing to learn more about R programming, in this session we are going to jump directly to rendering plots.
We make this choice for three reasons:
- Rendering nice plots is directly rewarding
- You will be able to apply what you learn in this session to your own data (provided that they are *correctly formatted*)
- We will come back to R programming later, when you have all the necessary tools to visualize your results
The goal of this practical is to familiarize yourself with `ggplot2`.
The objectives of this session will be to:
- Create basic plots with the `ggplot2` `library`
- Understand the `tibble` type
- Learn the different aesthetics in R plots
- Compose complex graphics
<div id='pencadre'>
**Write the commands in the grey box in the terminal.**
**The expected results will always be printed in a white box here.**
**You can `copy-paste`, but I advise you to practice typing directly in the terminal. To run a command, press `Return` at the end of the line.**
</div>
## Tidyverse
The `tidyverse` is a collection of R packages designed for data science that includes `ggplot2`.
All packages share an underlying design philosophy, grammar, and data structures (plus the same shape of logo).
\
`tidyverse` is a meta-package, which can take a long time to install with the following command:
```R
install.packages("tidyverse")
```
Luckily for you, `tidyverse` is preinstalled on your RStudio server, so you only have to load the `library`:
```{R load_tidyverse}
library("tidyverse")
```
\
### Toy data set `mpg`
This dataset contains a subset of the fuel economy data that the EPA makes available on [fueleconomy.gov](http://fueleconomy.gov).
It contains only models which had a new release every year between 1999 and 2008.
You can use the `?` command to learn more about this dataset.
```{r mpg_inspect, include=TRUE}
?mpg
mpg
```
```{r mpg_inspect2, include=TRUE}
dim(mpg)
```
But instead of using a dataset included in an R package, you may want to use any dataset with the same format.
For that, we are going to use the `read_csv` command, which reads a [csv](https://en.wikipedia.org/wiki/Comma-separated_values) file.
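As a minimal sketch of how `read_csv` behaves, the chunk below first writes the built-in `mpg` data to a temporary file and then reads it back. The file and the names `csv_path` and `new_mpg` are assumptions made for this illustration only; in practice you would pass the path or URL of your own csv file.

```R
library(readr)

# Hypothetical file: save the built-in mpg data to a temporary csv
# so that there is something to read back.
csv_path <- tempfile(fileext = ".csv")
write_csv(ggplot2::mpg, csv_path)

# read_csv() guesses the type of each column and returns a tibble
new_mpg <- read_csv(csv_path)
dim(new_mpg)
```

The result should have the same dimensions as `mpg`, since it is the same table read back from disk.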
`ggplot2` is a system for declaratively creating graphics, based on [The Grammar of Graphics](https://www.amazon.com/Grammar-Graphics-Statistics-Computing/dp/0387245448/ref=as_li_ss_tl). You provide the data, tell `ggplot2` how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
```
ggplot(data = <DATA>) +
...
...
```
- each geom function in `ggplot2` takes a `mapping` argument
- the `mapping` argument is always paired with `aes()`
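To illustrate the two points above, here is a short sketch that pairs `mapping` with `aes()` inside a geom function. The choice of `displ` (engine displacement) on the x axis and `hwy` on the y axis is only an example, and the name `p` is an assumption used to store the plot.

```R
library(ggplot2)

# The geom function receives a mapping built with aes():
# displ goes on the x axis and hwy on the y axis.
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

# Printing the object renders the plot
p
```

Swapping the variables inside `aes()` is enough to plot another pair of columns.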
\
\
\
\
\
\
\
\
\
<div class="pencadre">
Make a scatterplot of `hwy` (fuel efficiency) vs. `cyl` (number of cylinders).
`ggplot2` will automatically assign a unique level of the aesthetic (here a unique color) to each unique value of the variable, a process known as scaling. `ggplot2` will also add a legend that explains which levels correspond to which values.
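As a sketch of this scaling step, the chunk below maps a variable to the `color` aesthetic. Using `class` (the type of car) as the colored variable is an assumption made for this example, as is the name `p_color`.

```R
library(ggplot2)

# color is set inside aes(), so ggplot2 scales it:
# each unique value of class gets its own color and a legend entry.
p_color <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))

p_color
```

Because the color is set inside `aes()`, it depends on the data. A color set outside `aes()` would apply the same fixed value to every point instead.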