As in the previous sessions, it's good practice to create a new **.R** file to write your code instead of using the R terminal directly.
# `ggplot2` statistical transformations
In the previous session, we plotted the data as they are, using the variable values as **x** or **y** coordinates, color shade, size or transparency.
When dealing with categorical variables, also called **factors**, it can be interesting to perform some simple statistical transformations.
For example, we may want to have coordinates on an axis proportional to the number of records for a given category.
We are going to use the `diamonds` data set included in `tidyverse`.
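As a minimal sketch of this idea, assuming only that `ggplot2` is installed (it ships the `diamonds` data set), the bar heights below come from counting the records in each `cut` category rather than from a column of the data. The object names `cut_counts` and `p_count` are illustrative.

```R
library(ggplot2)

# Number of records for each category of the factor `cut`
cut_counts <- table(diamonds$cut)
cut_counts

# geom_bar() performs the same count internally (its default stat is "count")
# and uses the result as the bar height on the y axis
p_count <- ggplot(data = diamonds, mapping = aes(x = cut)) +
  geom_bar()
p_count
```

The `y` axis of this plot is not a variable of `diamonds`: it is computed by the statistical transformation.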
<div class="pencadre">
- Use the `help` and `View` commands to explore this data set.
- How many records does this dataset contain?
- Try the `str` command. What information is displayed?
</div>
...
...
Every **geom** has a default **stat**; and every **stat** has a default **geom**.
## Why **stat** ?
You might want to override the default stat.
For example, in the following `demo` dataset we already have a variable for the **counts** per `cut`.
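The sketch below illustrates overriding the default stat. The `demo` table built here is a hypothetical stand-in, computed from `diamonds` so that it holds one row per `cut` with the count in a column named `freq` (an assumed name). The tutorial's own `demo` may be defined differently.

```R
library(ggplot2)

# Hypothetical stand-in for `demo`: one row per cut, counts precomputed in `freq`
demo <- as.data.frame(table(cut = diamonds$cut), responseName = "freq")
demo

# stat = "identity" tells geom_bar() to use `freq` directly as the bar height
# instead of counting the rows of `demo`
p_identity <- ggplot(data = demo, mapping = aes(x = cut, y = freq)) +
  geom_bar(stat = "identity")
p_identity
```

With the default `stat = "count"`, every bar would have a height of 1, because `demo` contains exactly one row per `cut`.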
If `group` is not used, the proportion is calculated separately within the data for each category, so it is always 100%. For instance, the proportion of ideal cuts within the data restricted to ideal cuts is 1.
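A short sketch can make this visible, assuming a `ggplot2` version that provides `after_stat()` (3.3.0 or later). The names `p_no_group` and `p_group` are illustrative, and `layer_data()` is used only to inspect the values computed by the stat.

```R
library(ggplot2)

# Without `group`: each cut forms its own group, so every proportion is 1
p_no_group <- ggplot(data = diamonds,
                     mapping = aes(x = cut, y = after_stat(prop))) +
  geom_bar()

# With `group = 1`: all records belong to a single group, so each bar shows
# the share of the whole data set that falls in that cut
p_group <- ggplot(data = diamonds,
                  mapping = aes(x = cut, y = after_stat(prop), group = 1)) +
  geom_bar()

# Inspect the proportions computed for each plot
layer_data(p_no_group)$prop
layer_data(p_group)$prop
```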
ggplot(data = diamonds, mapping = aes(x = cut, y = depth, color = clarity)) +
geom_jitter(width = .1, height = .1)
```
</p>
</details>
In the `geom_jitter` plot that we made, we cannot really see the limits of the different clarity groups. Instead, we can use `geom_violin` to see their density.
ggplot(data = diamonds, mapping = aes(x = cut, y = depth, color = clarity)) +
geom_violin()
```
</p>
</details>
# Coordinate systems
The default coordinate system is the Cartesian coordinate system, where the x and y positions act independently to determine the location of each point. There are a number of other coordinate systems that are occasionally helpful.
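As an illustration, the sketch below draws the same bar chart in three coordinate systems. The choice of `coord_flip()` and `coord_polar()` is ours, as two examples among those `ggplot2` offers, and the object names are illustrative.

```R
library(ggplot2)

# Base plot, drawn in the default Cartesian coordinate system
p_base <- ggplot(data = diamonds, mapping = aes(x = cut, fill = cut)) +
  geom_bar(show.legend = FALSE)
p_base

# coord_flip() swaps the x and y axes, giving horizontal bars
p_flip <- p_base + coord_flip()
p_flip

# coord_polar() maps x to the angle and y to the radius
p_polar <- p_base + coord_polar()
p_polar
```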