@@ -503,7 +503,7 @@ First let's create a thiner dataset to work on `flights_thin` that contains
...
@@ -503,7 +503,7 @@ First let's create a thiner dataset to work on `flights_thin` that contains
- the `distance` and `air_time` columns
- the `distance` and `air_time` columns
- the `dep_time` and `sched_dep_time` columns
- the `dep_time` and `sched_dep_time` columns
Then let's create an even smaller toy dataset to test your commands before using them on the large dataset (it is a good habit to adopt). For that you can use the function `head`
Then let's create an even smaller toy dataset to test your commands before using them on the large dataset (it is a good habit to adopt). For that you can use the function `head` or `sample_n` for a more random sampling.
- select only 5 rows
- select only 5 rows
...
@@ -516,6 +516,7 @@ Then let's create an even smaller dataset as toy dataset to test your commands b
...
@@ -516,6 +516,7 @@ Then let's create an even smaller dataset as toy dataset to test your commands b
- Offsets: `lead()` and `lag()` allow you to refer to leading or lagging values. This allows you to compute running differences (e.g. `x - lag(x)`) or find when values change (`x != lag(x)`).
- Offsets: `lead(x)` and `lag(x)` allow you to refer to the next or previous values of the column `x`. This allows you to compute running differences (e.g. `x - lag(x)`) or find when values change (`x != lag(x)`).
- Cumulative and rolling aggregates: R provides functions for running sums, products, mins and maxes: `cumsum()`, `cumprod()`, `cummin()`, `cummax()`; and dplyr provides `cummean()` for cumulative means.
- Cumulative aggregates: R provides functions for cumulative sums, products, mins and maxes: `cumsum()`, `cumprod()`, `cummin()`, `cummax()`; and dplyr provides `cummean()` for cumulative means.
- Logical comparisons, `<`, `<=`, `>`, `>=`, `!=`, and `==`
- Logical comparisons, `<`, `<=`, `>`, `>=`, `!=`, and `==`
- Ranking: there are a number of ranking functions, but you should start with `min_rank()`. There is also `row_number()`, `dense_rank()`, `percent_rank()`, `cume_dist()`, `ntile()`
- Ranking: there are a number of ranking functions, the most frequently used being `min_rank()`. They differ mainly in how ties are treated. Try `?mutate`, `?min_rank` and `?rank` for more information.
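
The functions listed above can be tried on a tiny example before applying them to the flights data. The following is only a minimal sketch: the tibble `toy` and its column `x` are hypothetical and are not part of the course datasets.

```r
library(dplyr)

# Hypothetical toy column, made up for illustration
toy <- tibble(x = c(3, 1, 4, 1, 5))

toy_windows <- toy %>%
  mutate(
    previous     = lag(x),        # value of the row above (NA on the first row)
    following    = lead(x),       # value of the row below (NA on the last row)
    diff_prev    = x - lag(x),    # running difference
    running_sum  = cumsum(x),     # cumulative sum
    running_mean = cummean(x),    # cumulative mean (dplyr)
    rank_min     = min_rank(x),   # ties share the smallest rank, gaps follow ties
    rank_dense   = dense_rank(x)  # ties share a rank, no gaps
  )

toy_windows
```

Comparing the `rank_min` and `rank_dense` columns for the two rows where `x` equals 1 shows how the ranking functions differ in their treatment of ties.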
## See you in [R#5: Pipping and grouping](https://can.gitbiopages.ens-lyon.fr/R_basis/session_5.html)
## See you in [R#5: Pipping and grouping](https://can.gitbiopages.ens-lyon.fr/R_basis/session_5.html)
# To go further: Data transformation and color sets.
# To go further: Data transformation and color sets.
...
@@ -718,13 +730,21 @@ Open the csv file using the `read_csv2()` function. The file is located at "http
...
@@ -718,13 +730,21 @@ Open the csv file using the `read_csv2()` function. The file is located at "http
<p>
<p>
Download the Expression_matrice_pivot_longer_DEGs_GSE86356.csv file and save it in your working directory.
Download the Expression_matrice_pivot_longer_DEGs_GSE86356.csv file and save it in your working directory.
You may have to set your working directory using `setwd()`
We want to see the top10 DEGs on the graph. For this, we will use the package `ggrepel`.
We want to see the top10 DEGs on the graph. For this, we will use the package `ggrepel`.
Install and load the `ggrepl` package.
Install and load the `ggrepel` package.
<details><summary>Solution</summary>
<details><summary>Solution</summary>
<p>
<p>
...
@@ -857,19 +868,26 @@ library(ggrepel)
...
@@ -857,19 +868,26 @@ library(ggrepel)
</p>
</p>
</details>
</details>
Let's **filter** our table into a new variable, `top10`, to keep only the top 10 according to the adjusted p-value. The **smaller** the adjusted p-value, the more significant.
Let's **filter** our table into a new variable, `top10`, to keep only the 10 significant differentially expressed genes with the smallest adjusted p-values. The **smaller** the adjusted p-value, the more significant.
**Tips :** You can use the [function](https://dplyr.tidyverse.org/reference/slice.html) `slice_min()`
**Tips :** You can use the [function](https://dplyr.tidyverse.org/reference/slice.html) `slice_min()`
<details><summary>Solution</summary>
<details><summary>Solution</summary>
<p>
<p>
```{r top10_1}
(top10 <- arrange(tab.sig, desc(sig), padj))
(top10 <- mutate(top10, row_N = row_number()))
(top10 <- filter(top10, row_N <= 10))
```
```{r top10}
```{r top10}
top10 <- tab.sig %>%
(top10 <- filter(tab.sig, sig == TRUE))
filter(sig == TRUE) %>%
(top10 <- slice_min(top10, padj, n = 10))
slice_min(n = 10, padj)
```
```
</p>
</p>
</details>
</details>
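
One detail worth checking when using `slice_min()` is how it handles ties: by default it keeps every row tied on the smallest values, so the result can contain more than `n` rows. The sketch below illustrates this on a hypothetical table; `toy_degs`, its gene names and its `padj` values are made up and do not come from the GSE86356 data.

```r
library(dplyr)

# Hypothetical toy table: gene names and padj values are invented
toy_degs <- tibble(
  gene = c("geneA", "geneB", "geneC", "geneD"),
  padj = c(0.01, 0.01, 0.03, 0.2)
)

# Default behaviour: all rows tied on the smallest padj are kept
with_ties <- slice_min(toy_degs, padj, n = 1)

# with_ties = FALSE returns exactly n rows
without_ties <- slice_min(toy_degs, padj, n = 1, with_ties = FALSE)

nrow(with_ties)
nrow(without_ties)
```

If exactly 10 labels are wanted on the plot, setting `with_ties = FALSE` is one way to guarantee it.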
The data is ready to be used to make a volcano plot!
The data is ready to be used to make a volcano plot!
...
@@ -878,9 +896,11 @@ The data is ready to be used to make a volcano plot!
...
@@ -878,9 +896,11 @@ The data is ready to be used to make a volcano plot!
To make the graph below, use `ggplot2`, the functions `geom_point()`, `geom_hline()`, `geom_vline()`, `theme_minimal()`, `theme()` (to remove the legend), `geom_label_repel()` and the function `scale_color_manual()` for the colors.
To make the graph below, use `ggplot2`, the functions `geom_point()`, `geom_hline()`, `geom_vline()`, `theme_minimal()`, `theme()` (to remove the legend), `geom_label_repel()` and the function `scale_color_manual()` for the colors.
</div>
</div>
**Tips 1 :** Don't forget the transformation of the adjusted p-value.
**Tips 2 :** Feel free to use your favorite Web search engine for help.
- **Tips 1 :** Don't forget the transformation of the adjusted p-value.
**Tips 3 :** `geom_label_repel()` needs its own `data` argument and a `label` mapping in `aes()`.
- **Tips 2 :** Feel free to use your favorite Web search engine for help.
- **Tips 3 :** `geom_label_repel()` needs its own `data` argument and a `label` mapping in `aes()`.
```{r VolcanoPlotDemo, echo = FALSE}
```{r VolcanoPlotDemo, echo = FALSE}
ggplot(tab.sig, aes(x = log2FoldChange, y = -log10(padj), color = UpDown)) +
ggplot(tab.sig, aes(x = log2FoldChange, y = -log10(padj), color = UpDown)) +
...
@@ -889,9 +909,7 @@ ggplot(tab.sig, aes(x = log2FoldChange, y = -log10(padj), color = UpDown)) +
...
@@ -889,9 +909,7 @@ ggplot(tab.sig, aes(x = log2FoldChange, y = -log10(padj), color = UpDown)) +