# `ggplot2` statistical transformations
In the previous session, we plotted the data as they are, using the variables' values as **x** or **y** coordinates, color shade, size, or transparency.
When dealing with categorical variables, also called **factors**, it can be useful to perform some simple statistical transformations.
For example, we may want the coordinates along an axis to be proportional to the number of records in each category.
We are going to use the `diamonds` dataset, which is provided by `ggplot2` and loaded with the `tidyverse`.
<div class="pencadre">
- Use the `help` and `View` commands to explore this dataset.
- How many records does this dataset contain?
- Try the `str` command. What information is displayed?
</div>
```{r str_diamon}
str(diamonds)
```
We have already seen scatter plots (`geom_point()`) and smoothed lines (`geom_smooth()`). Now let's draw a bar plot with `geom_bar()`:
The plot shows that more diamonds are available with high-quality cuts.
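A minimal sketch of such a bar plot, assuming the `tidyverse` is installed (the chunk label and the `p_bar` name are our own). Only the **x** aesthetic is mapped; the bar heights are computed by `geom_bar()`:

```{r geom_bar_cut_sketch}
library(tidyverse)  # loads ggplot2, which provides the diamonds dataset

# Only x is mapped: geom_bar() counts the rows for each cut itself
p_bar <- ggplot(data = diamonds, mapping = aes(x = cut)) +
  geom_bar()
p_bar
```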
On the x-axis, the chart displays **cut**, a variable from diamonds. On the y-axis, it displays **count**, **but count is not a variable in diamonds!**
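One way to check where **count** comes from is `ggplot2::layer_data()`, which returns the data frame that ggplot2 actually draws for a layer. This is a sketch (the object names are ours): the `count` and `prop` columns appear there even though `diamonds` has no such columns.

```{r bar_layer_data_sketch}
library(tidyverse)

p <- ggplot(data = diamonds, mapping = aes(x = cut)) +
  geom_bar()

# Data built for the first layer, including the variables computed by its stat
bar_data <- layer_data(p)
bar_data[, c("x", "count", "prop")]

# count exists in the built layer data, not in the original dataset
"count" %in% names(diamonds)
```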

## **geom** and **stat**
The algorithm used to calculate new values for a graph is called a **stat**, short for statistical transformation.
The figure below describes how this process works with `geom_bar()`.

You can generally use **geoms** and **stats** interchangeably. For example, you can recreate the previous plot using `stat_count()` instead of `geom_bar()`:
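A minimal sketch of that equivalence, assuming the `tidyverse` is loaded (the object name `p_stat` is ours). `stat_count()` uses `geom_bar()` as its default geom, so it should produce the same bars:

```{r stat_count_sketch}
library(tidyverse)

# Same bar plot, built from the stat side instead of the geom side
p_stat <- ggplot(data = diamonds) +
  stat_count(mapping = aes(x = cut))
p_stat
```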
Every **geom** has a default **stat**, and every **stat** has a default **geom**. This means that you can typically use **geoms** without worrying about the underlying statistical transformation. There are three reasons you might need to use a **stat** explicitly:
## Why **stat**?
You might want to override the default stat.
For example, in the following `demo` dataset we already have a variable for the **counts** per `cut`.
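As a sketch, here is a hypothetical stand-in for such a `demo` table, built with `dplyr::count()` so that no values are typed by hand (the column name `freq` is our assumption). Setting `stat = "identity"` tells `geom_bar()` to use `freq` as the bar height instead of counting rows with its default stat:

```{r demo_identity_sketch}
library(tidyverse)

# Hypothetical demo table: one row per cut, counts stored in a freq column
demo <- diamonds %>% count(cut, name = "freq")
demo

# Override the default stat ("count"): bar heights are taken as-is from freq
ggplot(data = demo) +
  geom_bar(mapping = aes(x = cut, y = freq), stat = "identity")
```

`geom_col()` is a shortcut for `geom_bar(stat = "identity")`.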
If `group = 1` is not set, the proportion is computed separately within each category, so it always ends up being 100%. For instance, the proportion of ideal cuts within the ideal-cut data is 1.
</p>
</details>
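A sketch of the behavior described above, assuming ggplot2 3.3.0 or later for `after_stat()` (older code wrote `..prop..`). The object names are ours:

```{r prop_group_sketch}
library(tidyverse)

# group = 1 puts all cuts in a single group:
# proportions are relative to all diamonds and sum to 1
p_prop <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))
p_prop

# Without group = 1, each cut is its own group, so every bar has height 1
p_no_group <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop)))
p_no_group
```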
## More details with `stat_summary`
<div class="pencadre">
You might want to draw greater attention to the statistical transformation in your code.
For example, you might use `stat_summary()`, which summarizes the **y** values for each unique **x**
value, to draw attention to the summary that you are computing:
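As a sketch of that idea, assuming ggplot2 3.3.0 or later (which introduced the `fun` argument; older code used `fun.y`), the chunk below summarizes `depth` for each `cut`. The choice of `depth` and of these three summary functions is ours:

```{r stat_summary_sketch}
library(tidyverse)

# For each cut: the median depth as a point, the min-max range as a line
# (stat_summary() draws with geom_pointrange() by default)
p_summary <- ggplot(data = diamonds) +
  stat_summary(
    mapping = aes(x = cut, y = depth),
    fun.min = min,
    fun.max = max,
    fun = median
  )
p_summary
```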