update dea

Verified Commit faa9be07 authored 2 years ago by Laurent Modolo
--- a/6_dea/dea.Rmd
+++ b/6_dea/dea.Rmd
@@ -26,7 +26,7 @@ classoption: aspectratio=169
 - Differential expression analysis between groups
 - Regression analysis
 - Multiple testing
- - Multivariate Differential expression analysis
+ - Multivariate differential expression analysis
 # Hypothesis testing
@@ -38,7 +38,7 @@ classoption: aspectratio=169
  We reject the null hypothesis at risk $\alpha$, the probability of rejecting the null hypothesis when it is true.
 ### $p$-value
-  The $p$-value is the probability to observe a value as or more extreme under the null hypothesis model.
+  The $p$-value is the probability of observing a value as extreme or more extreme under the null hypothesis model.
 ## Hypothesis testing
@@ -162,7 +162,7 @@ receiver operating characteristic
 \end{tikzpicture}
 \end{center}
 \column{0.5\textwidth}
-For a given gene $x_i$ we can test:
+For a given gene $x_i$ we can test
 \vspace{1em}
 \begin{itemize}
  \item $H_0$: $E\left(x_i\right) = E\left(x_{i'}\right)$
@@ -206,7 +206,7 @@ $P(X = x)$ for $\mathcal{NB}(\lambda, \alpha = 1)$
 \includegraphics[width=0.7\textwidth]{img/NB_sigma_1.png}
 \end{center}
-## Non-parametric approaches
+## Nonparametric approaches
 ### We don't try to model the data distribution
@@ -214,16 +214,16 @@ Instead we work with:
 - ranks of the values 
 - the sign of the difference between two groups (Wilcoxon)
- the distribution of differances
+- the distribution of differences
-If we know the distribution the parametric approach is often more powerfull
+If we know the distribution, the parametric approach is often more powerful.
 ### Often limited to the 2 groups setting
 ## Wilcoxon rank sum test
-### $H_0$: the median are equal
+### $H_0$: the medians are equal
 \begin{center}
  \href{https://www.nature.com/articles/s41467-021-27464-5}{
@@ -233,7 +233,7 @@ If we know the distribution the parametric approach is often more powerfull
 ## WaddR
-### Base on 2-Wasserstein distance
+### Based on 2-Wasserstein distance
 \begin{center}
  \href{https://pubmed.ncbi.nlm.nih.gov/33792651/}{
@@ -241,7 +241,7 @@ If we know the distribution the parametric approach is often more powerfull
  }
 \end{center}
-## Model based approaches
+## Model-based approaches
 \begin{center}
 \begin{columns}
@@ -282,14 +282,14 @@ X \sim \pi \delta_0 + \left(1 - \pi\right) \mathcal{NB}(\lambda, \alpha)
 \end{center}
-## Model based approaches
+## Model-based approaches
 ### NB distributed counts with excess of zeros
 \begin{center}
  \includegraphics[width=0.8\textwidth]{img/ziNB_1}
 \end{center}
-## Model based approaches
+## Model-based approaches
 ### Mixture of two NB distributions
@@ -297,9 +297,32 @@ X \sim \pi \delta_0 + \left(1 - \pi\right) \mathcal{NB}(\lambda, \alpha)
  \includegraphics[width=0.8\textwidth]{img/ziNB_2}
 \end{center}
-## Model based approaches
-### $y = \beta_0 + \beta_1 x$
+## Model-based approaches
+### GLM framework
+\[
+X_i \sim \mathcal{NB}(\lambda, \alpha)
+\]
+\[E(X_i|\mathbf{Y}) = \boldsymbol{\mu}_i = g^{-1}(\mathbf{Y}\boldsymbol{\beta})\]
+with:
+\begin{itemize}
+  \item $\boldsymbol{\mu}_i$ the mean of the gene $i$ distribution
+  \item $g$ the link function
+  \item $\boldsymbol{\beta}$ the unknown parameters of the model
+\end{itemize}
+\[E(X_i|\mathbf{Y}) = \boldsymbol{\mu}_i = g^{-1}(Y_1 \beta_1 + \dots + Y_n \beta_n)\]
+### We can also model the variance as a function of the mean 
+\[ \operatorname{Var}(X_i|\mathbf{Y}) = V(\boldsymbol{\mu}_i) = V(g^{-1}(\mathbf{Y}\boldsymbol{\beta})).\]
+## Model-based approaches
+### $\boldsymbol{\mu}_i = \beta_0 + \beta_1 y_1$
 \begin{center}
 \includegraphics[width=0.7\textwidth]{img/lm_2_groups_b0_3_b1_05.png}
@@ -307,19 +330,19 @@ X \sim \pi \delta_0 + \left(1 - \pi\right) \mathcal{NB}(\lambda, \alpha)
 $\beta_0 = 3$, $\beta_1 = 0.5$
-## Model based approaches
+## Model-based approaches
-### $y = \beta_0 + \beta_1 x$
+### $\boldsymbol{\mu}_i = \beta_0 + \beta_1 y_1$
 ### Wald test:
 \[H_0: \beta_1 = 0\]
 ### Likelihood ratio test (LRT)
-\[H_0: L\left(y = \beta_0\right) = L\left(y = \beta_0 + \beta_1 x\right)\]
+\[H_0: L\left(\boldsymbol{\mu}_i = \beta_0\right) = L\left(\boldsymbol{\mu}_i = \beta_0 + \beta_1 y_1\right)\]
-## Model based approaches
+## Model-based approaches
-### $y = \beta_0 + \beta_1 x$
+### $\boldsymbol{\mu}_i = \beta_0 + \beta_1 y_1$
 \begin{center}
 \includegraphics[width=0.7\textwidth]{img/lm_b0_3_b1_05.png}
@@ -327,62 +350,83 @@ $\beta_0 = 3$, $\beta_1 = 0.5$
 $\beta_0 = 3$, $\beta_1 = 0.5$
-## Model based approaches
+## Model-based approaches
-### $y = \beta_0 + \beta_1 x$
+### $\boldsymbol{\mu}_i = \beta_0 + \beta_1 y_1$
 \begin{center}
  \href{https://cole-trapnell-lab.github.io/monocle3/}{
-    \includegraphics[width=0.7\textwidth]{img/deg_pseudotime.png}
+    \includegraphics[width=0.6\textwidth]{img/deg_pseudotime.png}
  }
 \end{center}
-## Model based approaches
+## Model-based approaches
-### $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
+### $\boldsymbol{\mu}_i = \beta_0 + \beta_1 y_1 + \beta_2 y_2$
 \begin{center}
 \includegraphics[width=0.7\textwidth]{img/lm_2_groups_b0_b0_3_b1_05.png}
 \end{center}
-$\beta_0 = 3$, $\beta_1 = 0.5$ $\beta_2 = 5$
+$\beta_0 = 3$, $\beta_1 = 0.5$, $\beta_2 = 5$
+## Model-based approaches
+### $\boldsymbol{\mu}_i = \beta_0 + \beta_1 y_1 + \beta_2 y_2$
+\begin{center}
+  \href{https://www.sciencedirect.com/science/article/pii/S2211124721005192}{
+    \includegraphics[width=0.9\textwidth]{img/deg_time_group.png}
+  }
+\end{center}
-## Model based approaches
+## Model-based approaches
-### $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
+### $\boldsymbol{\mu}_i = \beta_0 + \beta_1 y_1 + \beta_2 y_2 + \beta_3 y_1 y_2$
 \begin{center}
 \includegraphics[width=0.7\textwidth]{img/lm_2_groups_b0_b0_3_b1_05_interaction.png}
 \end{center}
-$\beta_0 = 3$, $\beta_1 = 0.5$ $\beta_2 = 5$
+$\beta_0 = 3$, $\beta_1 = 0.5$, $\beta_2 = 5$, $\beta_3 = -0.4$
-## Model based approaches
+## Model-based approaches
-### $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
+### $\boldsymbol{\mu}_i = \beta_0 + \beta_1 y_1 + \beta_2 y_2 + \beta_3 y_1 y_2$
 \begin{center}
 \includegraphics[width=0.7\textwidth]{img/lm_2_groups_2_factors_b0_b0_3_b1_05_interaction.png}
 \end{center}
-$\beta_0 = 3$, $\beta_1 = 0.5$ $\beta_2 = 5$
+$\beta_0 = 3$, $\beta_1 = 0.5$, $\beta_2 = 5$, $\beta_3 = -0.4$
-## Model based approaches
+## Model-based approaches
-### $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
+### $\boldsymbol{\mu}_i = \beta_0 + \beta_1 y_1 + \beta_2 y_2$
 \begin{center}
-  href{doi: 10.1093/nar/gky675}{
+  \href{https://doi.org/10.1093/nar/gky675}{
-    \includegraphics[width=0.6\textwidth]{img/deg_time_group.png}
+    \includegraphics[width=0.35\textwidth]{img/deg_time_group_inter.png}
+  }
+\end{center}
+## Model-based approaches
+### $\boldsymbol{\mu}_i = \beta_0 + \beta_1 y_1 + \beta_2 Z$
+$Z \sim \mathcal{N}(\mu_z, \sigma_z)$
+\begin{center}
+  \href{https://www.sciencedirect.com/science/article/pii/S2211124721005192}{
+    \includegraphics[width=0.35\textwidth]{img/deg_time_mixed.png}
  }
 \end{center}
 # Multiple hypotheses testing
-## Multiple hypotheses problem
+## Multiple hypothesis problem
 \begin{center}
-  \only<1>{\includegraphics[width=10cm]{img/dnorm_abs}\\[-2.5em]}
+  \only<1>{\includegraphics[width=10cm]{img/pval_2_0.05}}
-  \only<1>{\includegraphics[width=10cm]{img/pval_alpha}}
  \only<2>{\includegraphics[width=10cm]{img/pval_alpha_random_H0_1}
  \begin{center}
    n = 10
@@ -426,7 +470,7 @@ $\beta_0 = 3$, $\beta_1 = 0.5$ $\beta_2 = 5$
 \end{center}
-## Multiple hypotheses solutions
+## Multiple hypothesis solutions
 \begin{block}{Family Wise Error Rate (FWER)}
  \begin{itemize}
@@ -438,13 +482,13 @@ $\beta_0 = 3$, $\beta_1 = 0.5$ $\beta_2 = 5$
 \end{block}
 \begin{example}
  \begin{center}
-    \emph{``We reject 14 hypothesis with a FWER of 0.05''}
+    \emph{``We reject 14 hypotheses with a FWER of 0.05''}
-    \emph{``We reject 14 hypothesis at a level of 0.05 after Bonferoni correction''}
+    \emph{``We reject 14 hypotheses at a level of 0.05 after Bonferroni correction''}
  \end{center}
  Means: 14 hypotheses do not follow the null distribution, and the probability of having at least one false positive among these 14 rejections is at most 0.05.
 \end{example}
-## Multiple hypotheses solutions
+## Multiple hypothesis solutions
 \begin{block}{False Discovery Rate (FDR)}
  \begin{itemize}
@@ -456,26 +500,28 @@ $\beta_0 = 3$, $\beta_1 = 0.5$ $\beta_2 = 5$
 \end{block}
 \only<1>{
 \vspace{2em}
+\begin{center}
 \begin{tabular}{l|ccc}
-  hypothesis & Claimed non-significant & Claimed significant & Total\\
+  hypothesis & Claimed nonsignificant & Claimed significant & Total\\
  \hline
  Null & TN & FP & $m_0$\\
  Non-null & FN & TP & $m_1$\\
  Total & S & R & $m$
 \end{tabular}
+\end{center}
 }
 \only<2-3>{
  \begin{example}
    \begin{center}
-      \emph{``We reject 254 hypothesis with a FDR of 0.05''}
+      \emph{``We reject 254 hypotheses with a FDR of 0.05''}
-      \emph{``We reject 254 hypothesis with a level of 0.05 after BH correction''}
+      \emph{``We reject 254 hypotheses with a level of 0.05 after BH correction''}
    \end{center}
    Means: 254 hypotheses do not follow the null distribution, and we expect on average at most 5\% of false positives among the 254.
  \end{example}
 }
 \only<3>{
  \begin{center}
-    The number of FPs increases with the number of TPs
+    \textbf{The number of FPs increases with the number of TPs}
  \end{center}
 }
@@ -485,18 +531,18 @@ $\beta_0 = 3$, $\beta_1 = 0.5$ $\beta_2 = 5$
 \end{center}
 $$\Pr\left(FP \geq 1\right) \leq \alpha_{FWER}$$
 $$\mathbb{E}\left[\frac{FP}{R} \,\middle|\, R > 0\right]\Pr\left(R > 0\right) \leq \alpha_{FDR}$$
-When $TP \leq 1$ FWER and FDR control are identical.\\
+When $TP \leq 1$, FWER and FDR control are identical.\\
 The difference increases with the number of $TP$s
 ## FDR control
 \begin{center}
-\includegraphics[width=12cm]{img/pval_hist_H0_H1}\\[-1em]
+\includegraphics[width=11cm]{img/pval_hist_H0_H1}\\[-1em]
 \pause
-When we analyse data we hope to get a mixture between:\\
+When we analyze data we hope to get a mixture between:\\
-\includegraphics[width=12cm]{img/pval_hist_H0}\\[-2em]
+\includegraphics[width=11cm]{img/pval_hist_H0}\\[-2em]
 \pause
-\includegraphics[width=12cm]{img/pval_hist_H1}
+\includegraphics[width=11cm]{img/pval_hist_H1}
 \end{center}
 ## FDR control: local FDR ($\ell FDR$) of Efron
@@ -525,11 +571,19 @@ When we analyse data we hope to get a mixture between:\\
  }
 \end{center}
+## Post-selection inference
+\begin{center}
+  \href{https://pubmed.ncbi.nlm.nih.gov/30206223/}{
+    \includegraphics[width=0.75\textwidth]{img/post_inference_example.png}
+  }
+\end{center}
 ## SimCD
 \begin{center}
  \href{https://arxiv.org/abs/2104.01512v1}{
-    \includegraphics[width=0.6\textwidth]{img/simCD.png}
+    \includegraphics[width=\textwidth]{img/simCD.png}
  }
 \end{center}
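
Note on the multiple testing slides touched by this diff: the difference between FWER control (Bonferroni) and FDR control (Benjamini-Hochberg, BH) can be illustrated with a minimal sketch. The function names and the $p$-values below are hypothetical and chosen only for illustration; they do not come from the slides or from any particular package.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject H0 when p <= alpha / m (controls the FWER at level alpha)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """BH step-up procedure (controls the FDR at level alpha).

    Find the largest rank k such that p_(k) <= k / m * alpha,
    then reject the k hypotheses with the smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


# Hypothetical p-values for 8 genes (illustration only).
pvalues = [0.001, 0.008, 0.012, 0.02, 0.03, 0.2, 0.5, 0.9]
fwer_calls = bonferroni(pvalues)
fdr_calls = benjamini_hochberg(pvalues)
print("Bonferroni rejections:", sum(fwer_calls))
print("BH rejections:", sum(fdr_calls))
```

On this toy input, BH rejects more hypotheses than Bonferroni at the same nominal level, which is the trade-off the FWER and FDR slides describe: more discoveries in exchange for tolerating a controlled proportion of false positives.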

--- a/6_dea/img/deg_time_group.png
+++ b/6_dea/img/deg_time_group.png
--- a/6_dea/img/deg_time_group_inter.png
+++ b/6_dea/img/deg_time_group_inter.png
--- a/6_dea/img/deg_time_mixed.png
+++ b/6_dea/img/deg_time_mixed.png
--- a/6_dea/img/post_inference_example.png
+++ b/6_dea/img/post_inference_example.png
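
Note on the Wald test and likelihood ratio test slide in `dea.Rmd`: the two tests of $H_0: \beta_1 = 0$ can be sketched for a two-group design. As a simplifying assumption, this sketch swaps the negative binomial model of the slides for a Poisson model with a log link, $\mu = \exp(\beta_0 + \beta_1 y_1)$ with $y_1 \in \{0, 1\}$, because the maximum likelihood estimates then have a closed form. The function names and the counts are hypothetical.

```python
import math


def poisson_loglik(counts, mu):
    """Poisson log-likelihood of the counts for a common mean mu."""
    return sum(x * math.log(mu) - mu - math.lgamma(x + 1) for x in counts)


def two_group_tests(group0, group1):
    """Wald test and LRT of H0: beta1 = 0 for mu = exp(beta0 + beta1 * y1).

    Assumes a Poisson model (not the NB of the slides) and that each
    group has a nonzero total count.
    """
    mean0 = sum(group0) / len(group0)
    mean1 = sum(group1) / len(group1)
    pooled = (sum(group0) + sum(group1)) / (len(group0) + len(group1))

    # Closed-form MLE under the log link.
    beta0 = math.log(mean0)
    beta1 = math.log(mean1) - math.log(mean0)

    # Wald test: z = beta1 / se(beta1), two-sided normal p-value.
    se = math.sqrt(1 / sum(group0) + 1 / sum(group1))
    z = beta1 / se
    wald_p = math.erfc(abs(z) / math.sqrt(2))

    # LRT: twice the log-likelihood gain of the full model over the null
    # model, compared to a chi-square distribution with 1 degree of freedom.
    ll_null = poisson_loglik(group0 + group1, pooled)
    ll_full = poisson_loglik(group0, mean0) + poisson_loglik(group1, mean1)
    lrt_stat = 2 * (ll_full - ll_null)
    lrt_p = math.erfc(math.sqrt(max(lrt_stat, 0.0) / 2))

    return {"beta0": beta0, "beta1": beta1, "wald_p": wald_p,
            "lrt_stat": lrt_stat, "lrt_p": lrt_p}


# Hypothetical counts for one gene in two groups (illustration only).
result = two_group_tests([18, 22, 20, 19, 21], [30, 35, 33, 31, 36])
print(result)
```

Both tests address the same null hypothesis. The Wald test only needs the fit of the full model, while the LRT compares the fits of the full and reduced models.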