Verified Commit faa9be07 authored by Laurent Modolo

update dea

parent 52a1d388
Pipeline #350 passed with stage
in 3 minutes and 18 seconds
......@@ -26,7 +26,7 @@ classoption: aspectratio=169
- Differential expression analysis between groups
- Regression analysis
- Multiple testing
- Multivariate Differential expression analysis
- Multivariate differential expression analysis
# Hypothesis testing
......@@ -38,7 +38,7 @@ classoption: aspectratio=169
We reject the hypothesis at risk $\alpha$, the probability of rejecting the null hypothesis when it is true.
### $p$-value
The $p$-value is the probability to observe a value as or more extreme under the null hypothesis model.
The $p$-value is the probability of observing a value as extreme or more extreme under the null hypothesis model.
## Hypothesis testing
......@@ -162,7 +162,7 @@ receiver operating characteristic
\end{tikzpicture}
\end{center}
\column{0.5\textwidth}
For a given gene $x_i$ we can test:
For a given gene $x_i$ we can test
\vspace{1em}
\begin{itemize}
\item $H_0$: $E\left(x_i\right) = E\left(x_{i'}\right)$
......@@ -206,7 +206,7 @@ $P(X = x)$ for $\mathcal{NB}(\lambda, \alpha = 1)$
\includegraphics[width=0.7\textwidth]{img/NB_sigma_1.png}
\end{center}
## Non-parametric approaches
## Nonparametric approaches
### We don't try to model the data distribution
......@@ -214,16 +214,16 @@ Instead we work with:
- ranks of the values
- the sign of the difference between two groups (Wilcoxon)
- the distribution of differances
- the distribution of differences
If we know the distribution the parametric approach is often more powerfull
If we know the distribution, the parametric approach is often more powerful.
### Often limited to the 2 groups setting
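
The rank-based idea above can be illustrated with a small sketch of a two-group rank sum test. This is only an illustration under stated assumptions: the function name `rank_sum_test` is hypothetical, the $p$-value uses the large-sample normal approximation, and no tie correction is applied to the variance, so a dedicated statistics package should be preferred for real analyses.

```python
import math


def rank_sum_test(x, y):
    """Two-sided rank sum test via the normal approximation (illustrative only)."""
    # Pool both groups, remembering which group each value came from.
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)

    # Assign 1-based ranks, giving tied values their average rank (midrank).
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1

    n1, n2 = len(x), len(y)
    # Sum of ranks of the first group, converted to the Mann-Whitney U statistic.
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u = w - n1 * (n1 + 1) / 2

    # Under H0, U is approximately normal with this mean and standard deviation.
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return u, p_value


if __name__ == "__main__":
    u, p = rank_sum_test([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
    print(u, p)
```

Only the ordering of the values enters the statistic, which is why no distributional model for the counts is needed.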
## Wilcoxon rank sum test
### $H_0$: the median are equal
### $H_0$: the medians are equal
\begin{center}
\href{https://www.nature.com/articles/s41467-021-27464-5}{
......@@ -233,7 +233,7 @@ If we know the distribution the parametric approach is often more powerfull
## WaddR
### Base on 2-Wasserstein distance
### Based on 2-Wasserstein distance
\begin{center}
\href{https://pubmed.ncbi.nlm.nih.gov/33792651/}{
......@@ -241,7 +241,7 @@ If we know the distribution the parametric approach is often more powerfull
}
\end{center}
## Model based approaches
## Model-based approaches
\begin{center}
\begin{columns}
......@@ -282,14 +282,14 @@ X \sim \pi \delta_0 + \left(1 - \pi\right) \mathcal{NB}(\lambda, \alpha)
\end{center}
## Model based approaches
## Model-based approaches
### NB distributed counts with excess of zeros
\begin{center}
\includegraphics[width=0.8\textwidth]{img/ziNB_1}
\end{center}
## Model based approaches
## Model-based approaches
### Mixture of two NB distributions
......@@ -297,9 +297,32 @@ X \sim \pi \delta_0 + \left(1 - \pi\right) \mathcal{NB}(\lambda, \alpha)
\includegraphics[width=0.8\textwidth]{img/ziNB_2}
\end{center}
## Model based approaches
### $y = \beta_0 + \beta_1 x$
## Model-based approaches
### GLM framework
\[
X_i \sim \mathcal{NB}(\lambda, \alpha)
\]
\[E(X_i|\mathbf{Y}) = \boldsymbol{\mu}_i = g^{-1}(\mathbf{Y}\boldsymbol{\beta})\]
with:
\begin{itemize}
  \item $\boldsymbol{\mu}_i$ the mean of the distribution of gene $i$
  \item $g$ the link function
  \item $\boldsymbol{\beta}$ the unknown parameters of the model
\end{itemize}
\[E(X_i|\mathbf{Y}) = \boldsymbol{\mu}_i = g^{-1}(Y_1 \beta_1 + \dots + Y_n \beta_n)\]
### We can also model the variance as a function of the mean
\[ \operatorname{Var}(X_i|\mathbf{Y}) = V( \boldsymbol{\mu}_i ) = V(g^{-1}(\mathbf{Y}\boldsymbol{\beta})).\]
## Model-based approaches
### $\boldsymbol{\mu}_i = \beta_0 + \beta_1 y_1$
\begin{center}
\includegraphics[width=0.7\textwidth]{img/lm_2_groups_b0_3_b1_05.png}
......@@ -307,19 +330,19 @@ X \sim \pi \delta_0 + \left(1 - \pi\right) \mathcal{NB}(\lambda, \alpha)
$\beta_0 = 3$, $\beta_1 = 0.5$
## Model based approaches
## Model-based approaches
### $y = \beta_0 + \beta_1 x$
### $\boldsymbol{\mu}_i = \beta_0 + \beta_1 y_1$
### Wald test:
\[H_0: \beta_1 = 0\]
### Likelihood ratio test (LRT)
\[H_0: L\left(y = \beta_0\right) = L\left(y = \beta_0 + \beta_1 x\right)\]
\[H_0: L\left(\boldsymbol{\mu}_i = \beta_0\right) = L\left(\boldsymbol{\mu}_i = \beta_0 + \beta_1 y_1\right)\]
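
A likelihood ratio test for a single two-level covariate $y_1$ can be sketched as follows. To keep the example short it swaps the negative binomial for a Poisson model with a log link, which is an assumption of this sketch and not what the slides use. With a binary $y_1$, the fitted mean is the pooled mean under $H_0$ and the per-group mean under the alternative. The names `poisson_loglik` and `lrt_two_groups` are hypothetical.

```python
import math


def poisson_loglik(counts, mu):
    """Log-likelihood of i.i.d. Poisson(mu) counts; 0 * log(0) is treated as 0."""
    ll = 0.0
    for x in counts:
        if x > 0:
            ll += x * math.log(mu)
        ll += -mu - math.lgamma(x + 1)
    return ll


def lrt_two_groups(group_a, group_b):
    """LRT of mu = exp(b0) against mu = exp(b0 + b1 * y1) for a binary y1."""
    pooled = list(group_a) + list(group_b)

    # Null model: one common mean, whose MLE is the pooled sample mean.
    ll_null = poisson_loglik(pooled, sum(pooled) / len(pooled))

    # Alternative model: one mean per group, whose MLEs are the group means.
    ll_alt = (poisson_loglik(group_a, sum(group_a) / len(group_a))
              + poisson_loglik(group_b, sum(group_b) / len(group_b)))

    # Twice the log-likelihood difference, guarded against rounding below zero.
    stat = max(0.0, 2.0 * (ll_alt - ll_null))

    # Asymptotically chi-square with 1 degree of freedom under H0;
    # its survival function equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    print(lrt_two_groups([1, 2, 1, 0], [20, 25, 22, 30]))
```

The chi-square reference distribution is a large-sample approximation, so with only a few samples per group the resulting $p$-value should be read with caution.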
## Model based approaches
## Model-based approaches
### $y = \beta_0 + \beta_1 x$
### $\boldsymbol{\mu}_i = \beta_0 + \beta_1 y_1$
\begin{center}
\includegraphics[width=0.7\textwidth]{img/lm_b0_3_b1_05.png}
......@@ -327,62 +350,83 @@ $\beta_0 = 3$, $\beta_1 = 0.5$
$\beta_0 = 3$, $\beta_1 = 0.5$
## Model based approaches
## Model-based approaches
### $y = \beta_0 + \beta_1 x$
### $\boldsymbol{\mu}_i = \beta_0 + \beta_1 y_1$
\begin{center}
\href{https://cole-trapnell-lab.github.io/monocle3/}{
\includegraphics[width=0.7\textwidth]{img/deg_pseudotime.png}
\includegraphics[width=0.6\textwidth]{img/deg_pseudotime.png}
}
\end{center}
## Model based approaches
## Model-based approaches
### $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
### $\boldsymbol{\mu}_i = \beta_0 + \beta_1 y_1 + \beta_2 y_2$
\begin{center}
\includegraphics[width=0.7\textwidth]{img/lm_2_groups_b0_b0_3_b1_05.png}
\end{center}
$\beta_0 = 3$, $\beta_1 = 0.5$ $\beta_2 = 5$
$\beta_0 = 3$, $\beta_1 = 0.5$, $\beta_2 = 5$
## Model-based approaches
### $\boldsymbol{\mu}_i = \beta_0 + \beta_1 y_1 + \beta_2 y_2$
\begin{center}
\href{https://www.sciencedirect.com/science/article/pii/S2211124721005192}{
\includegraphics[width=0.9\textwidth]{img/deg_time_group.png}
}
\end{center}
## Model based approaches
## Model-based approaches
### $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
### $\boldsymbol{\mu}_i = \beta_0 + \beta_1 y_1 + \beta_2 y_2 + \beta_3 y_1 y_2$
\begin{center}
\includegraphics[width=0.7\textwidth]{img/lm_2_groups_b0_b0_3_b1_05_interaction.png}
\end{center}
$\beta_0 = 3$, $\beta_1 = 0.5$ $\beta_2 = 5$
$\beta_0 = 3$, $\beta_1 = 0.5$, $\beta_2 = 5$, $\beta_3 = -0.4$
## Model based approaches
## Model-based approaches
### $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
### $\boldsymbol{\mu}_i = \beta_0 + \beta_1 y_1 + \beta_2 y_2 + \beta_3 y_1 y_2$
\begin{center}
\includegraphics[width=0.7\textwidth]{img/lm_2_groups_2_factors_b0_b0_3_b1_05_interaction.png}
\end{center}
$\beta_0 = 3$, $\beta_1 = 0.5$ $\beta_2 = 5$
$\beta_0 = 3$, $\beta_1 = 0.5$, $\beta_2 = 5$, $\beta_3 = -0.4$
## Model based approaches
## Model-based approaches
### $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
### $\boldsymbol{\mu}_i = \beta_0 + \beta_1 y_1 + \beta_2 y_2$
\begin{center}
href{doi: 10.1093/nar/gky675}{
\includegraphics[width=0.6\textwidth]{img/deg_time_group.png}
\href{https://doi.org/10.1093/nar/gky675}{
\includegraphics[width=0.35\textwidth]{img/deg_time_group_inter.png}
}
\end{center}
## Model-based approaches
### $\boldsymbol{\mu}_i = \beta_0 + \beta_1 y_1 + \beta_2 Z$
$Z \sim \mathcal{N}(\mu_z, \sigma_z)$
\begin{center}
\href{https://www.sciencedirect.com/science/article/pii/S2211124721005192}{
\includegraphics[width=0.35\textwidth]{img/deg_time_mixed.png}
}
\end{center}
# Multiple hypotheses testing
## Multiple hypotheses problem
## Multiple hypothesis problem
\begin{center}
\only<1>{\includegraphics[width=10cm]{img/dnorm_abs}\\[-2.5em]}
\only<1>{\includegraphics[width=10cm]{img/pval_alpha}}
\only<1>{\includegraphics[width=10cm]{img/pval_2_0.05}}
\only<2>{\includegraphics[width=10cm]{img/pval_alpha_random_H0_1}
\begin{center}
n = 10
......@@ -426,7 +470,7 @@ $\beta_0 = 3$, $\beta_1 = 0.5$ $\beta_2 = 5$
\end{center}
## Multiple hypotheses solutions
## Multiple hypothesis solutions
\begin{block}{Family Wise Error Rate (FWER)}
\begin{itemize}
......@@ -438,13 +482,13 @@ $\beta_0 = 3$, $\beta_1 = 0.5$ $\beta_2 = 5$
\end{block}
\begin{example}
\begin{center}
\emph{``We reject 14 hypothesis with a FWER of 0.05''}
\emph{``We reject 14 hypothesis at a level of 0.05 after Bonferoni correction''}
\emph{``We reject 14 hypotheses with a FWER of 0.05''}
\emph{``We reject 14 hypotheses at a level of 0.05 after Bonferroni correction''}
\end{center}
Means: 14 hypotheses are not following the null distribution and we make this statement with a probability of at most 0.05 of having one or more false positives among the 14 tests.
\end{example}
## Multiple hypotheses solutions
## Multiple hypothesis solutions
\begin{block}{False Discovery Rate (FDR)}
\begin{itemize}
......@@ -456,26 +500,28 @@ $\beta_0 = 3$, $\beta_1 = 0.5$ $\beta_2 = 5$
\end{block}
\only<1>{
\vspace{2em}
\begin{center}
\begin{tabular}{l|ccc}
hypothesis & Claimed non-significant & Claimed significant & Total\\
hypothesis & Claimed nonsignificant & Claimed significant & Total\\
\hline
Null & TN & FP & $m_0$\\
Non-null & FN & TP & $m_1$\\
Total & S & R & $m$
\end{tabular}
\end{center}
}
\only<2-3>{
\begin{example}
\begin{center}
\emph{``We reject 254 hypothesis with a FDR of 0.05''}
\emph{``We reject 254 hypothesis with a level of 0.05 after BH correction''}
\emph{``We reject 254 hypotheses with an FDR of 0.05''}
\emph{``We reject 254 hypotheses at a level of 0.05 after BH correction''}
\end{center}
Means: 254 hypotheses are not following the null distribution and we expect on average 5\% or less of false positives in the 254.
\end{example}
}
\only<3>{
\begin{center}
The number of FPs increases with the number of TPs
{\bf The number of FPs increases with the number of TPs}
\end{center}
}
......@@ -485,18 +531,18 @@ $\beta_0 = 3$, $\beta_1 = 0.5$ $\beta_2 = 5$
\end{center}
$$\Pr\left(FP \geq 1\right) \leq \alpha_{FWER}$$
$$\mathbb{E}\left[\frac{FP}{R} \,\middle|\, R > 0\right]\Pr\left(R > 0\right) \leq \alpha_{FDR}$$
When $TP \leq 1$ FWER and FDR control are identical.\\
when $TP \leq 1$ FWER and FDR control are identical.\\
The difference increases with the number of $TP$s
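
The two kinds of control can be compared on a small list of $p$-values with the sketch below. It implements the Bonferroni correction for FWER and the Benjamini-Hochberg (BH) step-up procedure for FDR. The function names and the example $p$-values are hypothetical, and the BH guarantee assumes independent (or positively dependent) tests.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (controls the FWER at level alpha)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """BH step-up procedure (controls the FDR at level alpha)."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])

    # Find the largest rank k such that p_(k) <= k * alpha / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank

    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


if __name__ == "__main__":
    # Made-up p-values, for illustration only.
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print("Bonferroni rejections:", sum(bonferroni(pvals)))
    print("BH rejections:", sum(benjamini_hochberg(pvals)))
```

BH never rejects fewer hypotheses than Bonferroni at the same level, because its threshold for the smallest $p$-value is the Bonferroni threshold and the thresholds only grow with the rank.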
## FDR control
\begin{center}
\includegraphics[width=12cm]{img/pval_hist_H0_H1}\\[-1em]
\includegraphics[width=11cm]{img/pval_hist_H0_H1}\\[-1em]
\pause
When we analyse data we hope to get a mixture between:\\
\includegraphics[width=12cm]{img/pval_hist_H0}\\[-2em]
When we analyze data we hope to get a mixture between:\\
\includegraphics[width=11cm]{img/pval_hist_H0}\\[-2em]
\pause
\includegraphics[width=12cm]{img/pval_hist_H1}
\includegraphics[width=11cm]{img/pval_hist_H1}
\end{center}
## FDR control: local FDR ($\ell FDR$) of Efron
......@@ -525,11 +571,19 @@ When we analyse data we hope to get a mixture between:\\
}
\end{center}
## Post-selection inference
\begin{center}
\href{https://pubmed.ncbi.nlm.nih.gov/30206223/}{
\includegraphics[width=0.75\textwidth]{img/post_inference_example.png}
}
\end{center}
## SimCD
\begin{center}
\href{https://arxiv.org/abs/2104.01512v1}{
\includegraphics[width=0.6\textwidth]{img/simCD.png}
\includegraphics[width=\textwidth]{img/simCD.png}
}
\end{center}
6_dea/img/deg_time_group.png (image changed: 1.13 MB → 1.92 MB)