---
title: "R.4: data transformation"
author: "Laurent Modolo [laurent.modolo@ens-lyon.fr](mailto:laurent.modolo@ens-lyon.fr), Hélène Polvèche [hpolveche@istem.fr](mailto:hpolveche@istem.fr)"
date: "2021"
output:
  rmdformats::downcute:
    self_contained: true
    use_bookdown: true
    default_style: "dark"
    lightbox: true
    css: "http://perso.ens-lyon.fr/laurent.modolo/R/src/style.css"
---
```{r setup, include=FALSE}
rm(list = ls())
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(comment = NA)
```

```{r klippy, echo=FALSE}
klippy::klippy(
  position = c('top', 'right'),
  color = "white",
  tooltip_message = 'Click to copy',
  tooltip_success = 'Copied !')
```
```{r}
library("tidyverse")
library("nycflights13")
# All flights that departed from New York City in 2013
flights
```
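```{r}
# Flights that departed on January 1st
filter(flights, month == 1, day == 1)
# Wrapping the assignment in parentheses also prints the result
(dec25 <- filter(flights, month == 12, day == 25))
# Flights in November or December, written two equivalent ways
filter(flights, month == 11 | month == 12)
filter(flights, month %in% c(11, 12))
# Flights delayed by at most two hours, written two equivalent ways
filter(flights, !(arr_delay > 120 | dep_delay > 120))
filter(flights, arr_delay <= 120, dep_delay <= 120)
```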
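```{r}
# Almost any operation involving NA returns NA
NA > 5
10 == NA
NA + 10
# Use is.na() to test for missing values
is.na(NA)
# filter() drops rows where the condition is FALSE or NA
df <- tibble(x = c(1, NA, 3))
filter(df, x > 1)
# Ask for missing values explicitly to keep them
filter(df, is.na(x) | x > 1)
```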
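```{r}
# Sort by year, then month, then day
arrange(flights, year, month, day)
# Missing values are always sorted at the end, even with desc()
arrange(tibble(x = c(5, 2, NA)), x)
arrange(tibble(x = c(5, 2, NA)), desc(x))
```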
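```{r}
# Select columns by name
select(flights, year, month, day)
# Select all columns between year and day (inclusive)
select(flights, year:day)
# Select all columns except those from year to day (inclusive)
select(flights, -(year:day))
```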
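```{r}
# General form: mutate(tbl, new_var_a = operation_a, ..., new_var_n = operation_n)
# flights_sml: assumed here to be flights reduced to the date, delay, distance and air_time columns
flights_sml <- select(flights, year:day, ends_with("delay"), distance, air_time)
mutate(flights_sml, gain = dep_delay - arr_delay)
```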
```{r, eval=FALSE}
install.packages(c("ghibli", "RColorBrewer", "viridis"))
```

```{r}
library(tidyverse)
library(RColorBrewer)
library(ghibli)
library(viridis)
```
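```{r}
# Default discrete color scale
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = class)) +
  geom_point()
# Same plot with a palette from the ghibli package
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = class)) +
  geom_point() +
  scale_colour_ghibli_d("MononokeMedium")
# Show the colorblind-friendly RColorBrewer palettes
display.brewer.all(colorblindFriendly = TRUE)
```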
  </p>
</details>

With this tibble, use `ggplot2` and the `geom_tile()` function to make a heatmap.
Put the samples on the x-axis and the genes on the y-axis.

**Tip:** Transform the counts with log10(x + 1) for a better visualization.
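
As a rough illustration of why the `+ 1` matters, the sketch below applies the transformation to a made-up vector of counts (the values and the names `toy_counts`, `toy_log10` and `toy_ln` are hypothetical, not taken from the dataset). A zero count stays at 0 instead of becoming `-Inf`, and each tenfold increase adds about one unit. Base R's `log1p()` computes the natural-log version of the same transformation, which differs only by a constant factor.

```r
# Hypothetical counts, chosen so that the results are round numbers
toy_counts <- c(0, 9, 99, 999, 9999)

# log10() alone sends a zero count to -Inf
log10(toy_counts)[1]

# log10(x + 1) maps 0 to 0 and compresses the large values
toy_log10 <- log10(toy_counts + 1)
toy_log10

# log1p(x) is ln(x + 1); dividing by ln(10) gives the same values
toy_ln <- log1p(toy_counts)
all.equal(toy_ln / log(10), toy_log10)
```

Either version should give a similar-looking heatmap; only the scale of the color legend changes.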

<details><summary>Solution</summary>
  <p>
```{r heatmap1}
ggplot(expr_DM1, aes(x = samples, y = Genes, fill = log10(counts + 1))) +
  geom_tile() +
  labs(y = "Genes", x = "Samples") +
  theme(
    axis.text.y = element_text(size = 4),
    axis.text.x = element_text(size = 4, angle = 90)
  )
```
  </p>
</details>

To make a volcano plot that uses color to show whether each gene's variation is significant, we first need to make a series of modifications to this table.

With the `mutate()` and `ifelse()` [functions](https://dplyr.tidyverse.org/reference/if_else.html), create:

- a column 'sig': it indicates whether the gene is significant (TRUE or FALSE).
**Thresholds:** baseMean > 20, padj < 0.05 and abs(log2FoldChange) >= 1.5

- a column 'UpDown': it indicates whether the gene is up-regulated or down-regulated.
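
The general pattern can be sketched on a toy tibble before applying it to the real table. Everything in this example is hypothetical (the tibble `toy`, its columns `id` and `score`, and the 1.5 cutoff reused for illustration). It only shows how a logical column and a nested `ifelse()` label can be built in one `mutate()` call, and how a missing value propagates to both new columns.

```r
library(dplyr)
library(tibble)

# Hypothetical data: one score per item, including a missing value
toy <- tibble(
  id    = c("a", "b", "c", "d"),
  score = c(2.0, -3.1, 0.4, NA)
)

toy_labelled <- toy %>%
  mutate(
    # Logical column: TRUE when the score is far enough from zero
    large = abs(score) >= 1.5,
    # Nested ifelse(): the second test is only used where the first is FALSE
    direction = ifelse(
      score >= 1.5,
      "Up",
      ifelse(score <= -1.5, "Down", "NO")
    )
  )

toy_labelled
```

A row with a missing `score` gets `NA` in both new columns, because a comparison with `NA` is neither TRUE nor FALSE.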

<details><summary>Solution</summary>
  <p>
```{r sig}
tab.sig <- tab %>%
  mutate(sig = baseMean > 20 & padj < 0.05 & abs(log2FoldChange) >= 1.5) %>%
  mutate(UpDown = ifelse(
    baseMean > 20 & padj < 0.05 & log2FoldChange >= 1.5,
    "Up",
    ifelse(
      baseMean > 20 & padj < 0.05 & log2FoldChange <= -1.5,
      "Down",
      "NO"
    )
  ))

tab.sig

# geom_label_repel() comes from the ggrepel package
library(ggrepel)

# `top10` is assumed to be a table of the genes to label, with a gene_symbol column.
# The manual colors follow the alphabetical order of UpDown: "Down", "NO", "Up".
ggplot(tab.sig, aes(x = log2FoldChange, y = -log10(padj), col = UpDown)) +
  geom_point() +
  scale_color_manual(values = c("steelblue", "lightgrey", "firebrick")) +
  geom_hline(yintercept = -log10(0.05), col = "black") +
  geom_vline(xintercept = c(-1.5, 1.5), col = "black") +
  theme_minimal() +
  theme(
    legend.position = "none"
  ) +
  labs(y = "-log10(adjusted p-value)", x = "log2(FoldChange)") +
  geom_label_repel(data = top10, mapping = aes(label = gene_symbol))
```
  </p>
</details>