Commit 52a1d388 authored by Laurent Modolo
update dea

parent eeae83e8
Pipeline #349 passed with stage
in 3 minutes and 25 seconds
......@@ -32,51 +32,507 @@ classoption: aspectratio=169
## Hypothesis testing
\begin{itemize}
\item $H_0$: the gene expression is the same between the $2$ groups
\item $H_1$: the gene expression is not the same between the $2$ groups
\end{itemize}
### Rejection of a null hypothesis $H_0$
Given the null model of our data, how likely are we to observe a value at least as extreme as the one we measured? \\
We can compute this probability from the distribution of the test statistic under the null hypothesis.\\
We reject the hypothesis at risk $\alpha$, the probability of rejecting the null hypothesis when it is actually true.
### $p$-value
The $p$-value is the probability of observing a value as extreme as, or more extreme than, the observed one under the null hypothesis model.
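
As a minimal sketch of this definition, assume the test statistic follows a standard normal distribution under $H_0$ (an assumption made here for illustration, as are the function name and the observed value). The two-sided $p$-value is then the mass of both tails beyond the observed statistic:

```python
import math


def two_sided_p_value(z: float) -> float:
    """P(|Z| >= |z|) for Z ~ N(0, 1), i.e. under the assumed null model."""
    # erfc(x / sqrt(2)) equals 2 * (1 - Phi(x)), with Phi the standard normal CDF
    return math.erfc(abs(z) / math.sqrt(2))


alpha = 0.05      # risk chosen before looking at the data
z_observed = 2.1  # hypothetical observed statistic
p = two_sided_p_value(z_observed)
print(f"p-value = {p:.4f}, reject H0 at risk {alpha}: {p < alpha}")
```

The decision rule is simply to compare the $p$-value with the risk $\alpha$ fixed in advance.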
## Hypothesis testing
\begin{center}
\only<1>{\includegraphics[width=12cm]{img/dnorm}}
\only<2>{\includegraphics[width=12cm]{img/dnorm_alpha}}
\only<3>{\includegraphics[width=12cm]{img/dnorm_alpha_accept}}
\only<4>{\includegraphics[width=12cm]{img/dnorm_alpha_reject}}
\end{center}
\begin{itemize}
\item $H_0$: the gene expression is the same between the $2$ groups
\item $H_1$: the gene expression is not the same between the $2$ groups
\item distribution under the null hypothesis $H_0$
\pause
\item rejection zone at a risk $\alpha$
\end{itemize}
## $p$-value construction
\begin{center}
\only<1>{\includegraphics[width=10cm]{img/pval_1_0.05}}
\only<2>{\includegraphics[width=10cm]{img/pval_1_0.15}}
\only<3>{\includegraphics[width=10cm]{img/pval_1_0.25}}
\only<4>{\includegraphics[width=10cm]{img/pval_1_0.35}}
\only<5>{\includegraphics[width=10cm]{img/pval_1_0.45}}
\only<6>{\includegraphics[width=10cm]{img/pval_1_0.55}}
\only<7>{\includegraphics[width=10cm]{img/pval_1_0.65}}
\only<8>{\includegraphics[width=10cm]{img/pval_1_0.75}}
\only<9>{\includegraphics[width=10cm]{img/pval_1_0.85}}
\only<10>{\includegraphics[width=10cm]{img/pval_1_0.95}}
\end{center}
\begin{center}
probability of observing a value as extreme as, or more extreme than, the observed one under the null hypothesis model
\end{center}
## $p$-value construction
\begin{center}
{\bf We reject $H_0$ with a given risk $\alpha$ and a power $1 - \beta$}
\only<1>{\includegraphics[width=10cm]{img/pval_2_0.05}}
\only<2>{\includegraphics[width=10cm]{img/pval_2_0.15}}
\only<3>{\includegraphics[width=10cm]{img/pval_2_0.25}}
\only<4>{\includegraphics[width=10cm]{img/pval_2_0.35}}
\only<5>{\includegraphics[width=10cm]{img/pval_2_0.45}}
\only<6>{\includegraphics[width=10cm]{img/pval_2_0.55}}
\only<7>{\includegraphics[width=10cm]{img/pval_2_0.65}}
\only<8>{\includegraphics[width=10cm]{img/pval_2_0.75}}
\only<9>{\includegraphics[width=10cm]{img/pval_2_0.85}}
\only<10>{\includegraphics[width=10cm]{img/pval_2_0.95}}
\end{center}
## Hypothesis testing
\begin{center}
\href{https://en.wikipedia.org/}{
\includegraphics[width=0.6\textwidth]{img/alpha_beta.png}
}
\end{center}
\vspace{-2.5em}
We specify $H_0$ not $H_1$
## Hypothesis testing
\begin{center}
\begin{columns}
\column{0.6\textwidth}
\begin{itemize}
\item $\beta$ probability of a Type II error, known as a {\it false negative}
\item $1 - \beta$ probability of a {\it true positive}, i.e., correctly rejecting the null hypothesis. $1 - \beta$ is also known as the power of the test.
\item $\alpha$ probability of a Type I error, known as a {\it false positive}
\item $1 - \alpha$ probability of a {\it true negative}, i.e., correctly not rejecting the null hypothesis
\end{itemize}
\column{0.4\textwidth}
\begin{center}
{\bf ROC curve}
\href{https://en.wikipedia.org/}{
\includegraphics[width=\textwidth]{img/roc_curve.png}
}
\end{center}
receiver operating characteristic
\end{columns}
\end{center}
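
To make the link between $\alpha$, $\beta$ and the power concrete, here is a small sketch for a one-sided $z$-test. It assumes the statistic is $\mathcal{N}(0, 1)$ under $H_0$ and $\mathcal{N}(\text{effect}, 1)$ under $H_1$; the function name and the `effect` parameter are hypothetical and only serve the illustration.

```python
from statistics import NormalDist


def power_one_sided_z(effect: float, alpha: float = 0.05) -> float:
    """Power (1 - beta) of a one-sided z-test for a standardized effect size."""
    std_normal = NormalDist()
    # Rejection threshold: the (1 - alpha) quantile of the null distribution
    critical = std_normal.inv_cdf(1 - alpha)
    # beta: probability of staying below the threshold when H1 is true
    beta = std_normal.cdf(critical - effect)
    return 1 - beta


for effect in (0.0, 1.0, 2.0, 3.0):
    print(f"effect = {effect}: power = {power_one_sided_z(effect):.3f}")
```

With no effect the power equals $\alpha$, and it grows towards $1$ as the two distributions separate.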
## Differential expression analysis
### Finding difference in gene expression
\vspace{1em}
\begin{center}
\begin{columns}
\column{0.5\textwidth}
\begin{center}
\begin{tikzpicture}
\fill
(0.5,3.5) node {\bf $\text{gene}_1$}
-- (0.5,2.5) node {\bf $\text{gene}_2$}
-- (0.5,1.5) node {\bf $\vdots$}
-- (0.5,0.5) node {\bf $\text{gene}_n$};
\fill
(1.5,4.5) node {\bf{$\text{cell}_1$}}
-- (1.5,3.5) node {mRNA}
-- (1.5,2.5) node {mRNA}
-- (1.5,1.5) node {$\vdots$}
-- (1.5,0.5) node {mRNA};
\fill
(2.5,4.5) node {\bf{$\text{cell}_2$}}
-- (2.5,3.5) node {mRNA}
-- (2.5,2.5) node {mRNA}
-- (2.5,1.5) node {$\vdots$}
-- (2.5,0.5) node {mRNA};
\fill
(3.5,4.5) node {\bf{$\cdots$}}
-- (3.5,3.5) node {$\cdots$}
-- (3.5,2.5) node {$\cdots$}
-- (3.5,1.5) node {$\ddots$}
-- (3.5,0.5) node {$\cdots$};
\fill
(4.5,4.5) node {\bf{$\text{cell}_c$}}
-- (4.5,3.5) node {mRNA}
-- (4.5,2.5) node {mRNA}
-- (4.5,1.5) node {$\vdots$}
-- (4.5,0.5) node {mRNA};
\draw (1,0) grid (5,4);
\end{tikzpicture}
\end{center}
\column{0.5\textwidth}
For a given gene $x_i$ we can test:
\vspace{1em}
\begin{itemize}
\item $H_0$: $E\left(x_i\right) = E\left(x_{i'}\right)$
\item $H_0$: $E\left(x_i\right) = E\left(x_{i'}\right) = \dots = E\left(x_{i''}\right)$
\item $H_0$: $E\left(x_i \times y\right) = E\left(x_i \times 1\right)$
\end{itemize}
\vspace{2em}
or any combination of the above cases
\end{columns}
\end{center}
## Differential expression analysis
\begin{center}
\href{https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8701051/}{
\includegraphics[width=0.9\textwidth]{img/dea_tree.png}
}
\end{center}
## Counts distributions
$P(X = x)$ for $\mathcal{NB}(\lambda, \alpha = 10)$
\begin{center}
\includegraphics[width=0.7\textwidth]{img/NB_sigma_10.png}
\end{center}
## Counts distributions
$P(X = x)$ for $\mathcal{NB}(\lambda, \alpha = 2)$
\begin{center}
\includegraphics[width=0.7\textwidth]{img/NB_sigma_2.png}
\end{center}
## Counts distributions
$P(X = x)$ for $\mathcal{NB}(\lambda, \alpha = 1)$
\begin{center}
\includegraphics[width=0.7\textwidth]{img/NB_sigma_1.png}
\end{center}
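
The curves above can be reproduced numerically. This sketch assumes that $\lambda$ is the mean and $\alpha$ the size (inverse dispersion) parameter, so that the variance is $\lambda + \lambda^2 / \alpha$; the slides do not state the parameterization, so this choice is an assumption.

```python
import math


def nb_pmf(x: int, lam: float, alpha: float) -> float:
    """P(X = x) for a negative binomial with mean lam and size alpha (assumed)."""
    log_p = (
        math.lgamma(x + alpha) - math.lgamma(alpha) - math.lgamma(x + 1)
        + alpha * math.log(alpha / (alpha + lam))
        + x * math.log(lam / (alpha + lam))
    )
    return math.exp(log_p)


lam = 5.0  # hypothetical mean count
for alpha in (10.0, 2.0, 1.0):
    probs = [nb_pmf(x, lam, alpha) for x in range(2000)]
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
    print(f"alpha = {alpha}: mean = {mean:.2f}, variance = {var:.2f}")
```

Under this parameterization, lowering $\alpha$ keeps the mean fixed and inflates the variance, which is the overdispersion that a Poisson model cannot capture.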
## Non-parametric approaches
### We don't try to model the data distribution
Instead we work with:
- ranks of the values
- the sign of the difference between two groups (Wilcoxon)
- the distribution of differences
If we know the distribution, the parametric approach is often more powerful
### Often limited to the two-group setting
## Wilcoxon rank sum test
### $H_0$: the medians are equal
\begin{center}
\href{https://www.nature.com/articles/s41467-021-27464-5}{
\includegraphics[width=\textwidth]{img/wilcoxon_example.png}
}
\end{center}
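
A rank-sum test can be sketched in a few lines. This version uses the normal approximation of the $U$ statistic, without continuity or tie correction of the variance, so it is an illustration rather than a replacement for a statistics library; the function name and the example values are hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Return (U statistic of x, two-sided p-value from the normal approximation)."""
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    n = len(pooled)
    rank_sum_x = 0.0
    i = 0
    while i < n:
        # Find the block of tied values pooled[i:j] and give it the average rank
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + 1 + j) / 2
        rank_sum_x += mid_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return u, math.erfc(abs(z) / math.sqrt(2))


group_a = [1.2, 0.0, 3.1, 2.2, 0.5]  # hypothetical expression values
group_b = [4.8, 5.1, 3.9, 6.0, 4.2]
u_stat, p_value = rank_sum_test(group_a, group_b)
print(f"U = {u_stat}, p-value = {p_value:.4f}")
```

Only the ordering of the values matters, so the result is unchanged by any monotone transformation of the counts, such as $\log(X + 1)$.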
## WaddR
### Based on the 2-Wasserstein distance
\begin{center}
\href{https://pubmed.ncbi.nlm.nih.gov/33792651/}{
\includegraphics[width=\textwidth]{img/waddR.png}
}
\end{center}
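
For intuition, the empirical 2-Wasserstein distance between two samples of equal size reduces to comparing their sorted values. The sketch below covers only this simple case and is not the waddR implementation; the function name and example values are hypothetical.

```python
import math


def wasserstein_2(x, y):
    """Empirical 2-Wasserstein distance between two samples of equal size."""
    if len(x) != len(y):
        raise ValueError("this sketch only handles samples of equal size")
    # In one dimension the optimal coupling matches the sorted values pairwise
    xs, ys = sorted(x), sorted(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs))


control = [0.0, 1.0, 2.0]  # hypothetical expression values
treated = [5.0, 3.0, 4.0]
print(wasserstein_2(control, treated))
```

Unlike a test on the means alone, this distance also reacts to differences in spread or shape between the two distributions.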
## Model based approaches
\begin{center}
\begin{columns}
\column{0.5\textwidth}
\vspace{1em}
{\bf Gaussian models}
\[
\log(X + 1) \sim \mathcal{N}(\mu, \sigma)
\]
{\bf Poisson models}
\[
X \sim \mathcal{P}(\lambda)
\]
{\bf NB models}
\[
X \sim \mathcal{NB}(\lambda, \alpha)
\]
{\bf ZINB models}
\[
X \sim \pi \delta_0 + \left(1 - \pi\right) \mathcal{NB}(\lambda, \alpha)
\]
\column{0.5\textwidth}
\begin{center}
\href{https://en.wikipedia.org/}{
\includegraphics[width=0.6\textwidth]{img/alpha_beta.png}
\includegraphics[width=\textwidth]{img/dirac.png}
}
\end{center}
\end{columns}
\end{center}
## Model based approaches
### NB distributed counts with excess of zeros
\begin{center}
\includegraphics[width=0.8\textwidth]{img/ziNB_1}
\end{center}
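
The zero-inflated model adds a point mass at zero to the NB distribution. This sketch assumes the mean and size parameterization of the NB (variance $\lambda + \lambda^2 / \alpha$), which the slides do not state explicitly; the parameter values are hypothetical.

```python
import math


def nb_pmf(x: int, lam: float, alpha: float) -> float:
    """NB probability mass with mean lam and size alpha (assumed parameterization)."""
    log_p = (
        math.lgamma(x + alpha) - math.lgamma(alpha) - math.lgamma(x + 1)
        + alpha * math.log(alpha / (alpha + lam))
        + x * math.log(lam / (alpha + lam))
    )
    return math.exp(log_p)


def zinb_pmf(x: int, lam: float, alpha: float, pi: float) -> float:
    """Mixture pi * delta_0 + (1 - pi) * NB(lam, alpha)."""
    point_mass = pi if x == 0 else 0.0
    return point_mass + (1 - pi) * nb_pmf(x, lam, alpha)


lam, alpha, pi = 5.0, 2.0, 0.3  # hypothetical parameters
print(f"P(X = 0) under NB:   {nb_pmf(0, lam, alpha):.3f}")
print(f"P(X = 0) under ZINB: {zinb_pmf(0, lam, alpha, pi):.3f}")
```

Zeros can therefore come from either component, which is why $\pi$ cannot be read directly from the observed fraction of zeros.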
## Model based approaches
### Mixture of two NB distributions
\begin{center}
\includegraphics[width=0.8\textwidth]{img/ziNB_2}
\end{center}
## Model based approaches
### $y = \beta_0 + \beta_1 x$
\begin{center}
\includegraphics[width=0.7\textwidth]{img/lm_2_groups_b0_3_b1_05.png}
\end{center}
$\beta_0 = 3$, $\beta_1 = 0.5$
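
As a sketch of how $\beta_0$ and $\beta_1$ are estimated in the two-group case, the group can be coded as $x \in \{0, 1\}$ and the line fitted by ordinary least squares. The data below are made up so that the fit recovers $\beta_0 = 3$ and $\beta_1 = 0.5$; the function name is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares estimates (b0, b1) for y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


group = [0, 0, 0, 1, 1, 1]                   # 0 = first group, 1 = second group
expression = [2.9, 3.0, 3.1, 3.4, 3.5, 3.6]  # made-up values
b0, b1 = fit_line(group, expression)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
```

With this $0/1$ coding, $\beta_0$ is the mean of the first group and $\beta_1$ the difference between the two group means.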
## Model based approaches
### $y = \beta_0 + \beta_1 x$
### Wald test:
\[H_0: \beta_1 = 0\]
### Likelihood ratio test (LRT)
\[H_0: L\left(y = \beta_0\right) = L\left(y = \beta_0 + \beta_1 x\right)\]
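
As a sketch of the two test statistics in the usual large-sample setting (the notation $\hat{\beta}_1$, $\widehat{\mathrm{se}}$, $\hat{L}_0$ and $\hat{L}_1$ is introduced here and is not taken from the slides), with $\hat{L}_0$ and $\hat{L}_1$ the maximized likelihoods of the reduced and full models:
\[
W = \frac{\hat{\beta}_1}{\widehat{\mathrm{se}}\left(\hat{\beta}_1\right)} \overset{H_0}{\approx} \mathcal{N}(0, 1)
\qquad
\Lambda = -2 \log \frac{\hat{L}_0}{\hat{L}_1} \overset{H_0}{\approx} \chi^2_1
\]
The Wald statistic only needs the fit of the full model, while the likelihood ratio statistic needs both fits; the single degree of freedom corresponds to the one parameter $\beta_1$ removed under $H_0$.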
## Model based approaches
### $y = \beta_0 + \beta_1 x$
\begin{center}
\includegraphics[width=0.7\textwidth]{img/lm_b0_3_b1_05.png}
\end{center}
$\beta_0 = 3$, $\beta_1 = 0.5$
## Model based approaches
### $y = \beta_0 + \beta_1 x$
\begin{center}
\href{https://cole-trapnell-lab.github.io/monocle3/}{
\includegraphics[width=0.7\textwidth]{img/deg_pseudotime.png}
}
\end{center}
## Model based approaches
### $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
\begin{center}
\includegraphics[width=0.7\textwidth]{img/lm_2_groups_b0_b0_3_b1_05.png}
\end{center}
$\beta_0 = 3$, $\beta_1 = 0.5$, $\beta_2 = 5$
## Model based approaches
### $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
\begin{center}
\includegraphics[width=0.7\textwidth]{img/lm_2_groups_b0_b0_3_b1_05_interaction.png}
\end{center}
$\beta_0 = 3$, $\beta_1 = 0.5$, $\beta_2 = 5$
## Model based approaches
### $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
\begin{center}
\includegraphics[width=0.7\textwidth]{img/lm_2_groups_2_factors_b0_b0_3_b1_05_interaction.png}
\end{center}
$\beta_0 = 3$, $\beta_1 = 0.5$, $\beta_2 = 5$
## Model based approaches
### $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
\begin{center}
\href{https://doi.org/10.1093/nar/gky675}{
\includegraphics[width=0.6\textwidth]{img/deg_time_group.png}
}
\end{center}
# Parametric versus non-parametric testing
# Differential expression analysis between groups
# Multiple hypotheses testing
## Multiple hypotheses problem
\begin{center}
\only<1>{\includegraphics[width=10cm]{img/dnorm_abs}\\[-2.5em]}
\only<1>{\includegraphics[width=10cm]{img/pval_alpha}}
\only<2>{\includegraphics[width=10cm]{img/pval_alpha_random_H0_1}
\begin{center}
n = 10
\end{center}}
\only<3>{\includegraphics[width=12cm]{img/pval_alpha_random_H0_2}
\begin{center}
n = 30
\end{center}}
\only<4>{\includegraphics[width=12cm]{img/pval_alpha_random_H0_3}
\begin{center}
n = 60
\end{center}}
\only<5>{\includegraphics[width=12cm]{img/pval_alpha_random_H0_4}
\begin{center}
n = 100
\end{center}}
\only<6>{\includegraphics[width=12cm]{img/pval_alpha_random_H0_5}
\begin{center}
n = 150
\end{center}}
\only<7>{\includegraphics[width=12cm]{img/pval_alpha_random_H0_6}
\begin{center}
n = 210
\end{center}}
\only<8>{\includegraphics[width=12cm]{img/pval_alpha_random_H0_7}
\begin{center}
n = 280
\end{center}}
\only<9>{\includegraphics[width=12cm]{img/pval_alpha_random_H0_8}
\begin{center}
n = 360
\end{center}}
\only<10>{\includegraphics[width=12cm]{img/pval_alpha_random_H0_9}
\begin{center}
n = 450
\end{center}}
\only<11>{\includegraphics[width=12cm]{img/pval_alpha_random_H0_10}
\begin{center}
n = 550
\end{center}}
\end{center}
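
The growth of the problem with $n$ can be quantified under two simplifying assumptions made here for illustration: all $n$ hypotheses are truly null and the tests are independent. The function names are hypothetical.

```python
def prob_at_least_one_fp(n: int, alpha: float = 0.05) -> float:
    """P(FP >= 1) for n independent tests of true null hypotheses at risk alpha."""
    return 1 - (1 - alpha) ** n


def expected_fp(n: int, alpha: float = 0.05) -> float:
    """Expected number of false positives among n true null hypotheses."""
    return n * alpha


# Same values of n as on the slides
for n in (10, 30, 60, 100, 150, 210, 280, 360, 450, 550):
    print(f"n = {n:3d}: P(FP >= 1) = {prob_at_least_one_fp(n):.3f}, "
          f"E[FP] = {expected_fp(n):.1f}")
```

Even for moderate $n$, at least one false positive becomes almost certain, which motivates the corrections that follow.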
## Multiple hypotheses solutions
\begin{block}{Family Wise Error Rate (FWER)}
\begin{itemize}
\item Bonferroni-like procedure
\item Control the risk of having one or more false positives
\item $\Pr\left(FP \geq 1\right) \leq \alpha_{FWER}$
\item each of the $m$ tests is performed at level $\frac{\alpha_{FWER}}{m}$
\end{itemize}
\end{block}
\begin{example}
\begin{center}
\emph{``We reject 14 hypotheses with a FWER of 0.05''}
\emph{``We reject 14 hypotheses at a level of 0.05 after Bonferroni correction''}
\end{center}
Means: we claim that 14 hypotheses do not follow the null distribution, and the probability that these 14 rejections contain one or more false positives is at most 0.05.
\end{example}
## Multiple hypotheses solutions
\begin{block}{False Discovery Rate (FDR)}
\begin{itemize}
\item Benjamini-Hochberg-like procedure
\item Control the expected proportion of false positives among the rejected hypotheses
\item $\mathbb{E}\left[\frac{FP}{R} \,\middle|\, R > 0\right]\Pr\left(R > 0\right) \leq \alpha_{FDR}$
\item adaptive procedure $\alpha_{FDR} \sim f_0$
\end{itemize}
\end{block}
\only<1>{
\vspace{2em}
\begin{tabular}{l|ccc}
hypothesis & Claimed non-significant & Claimed significant & Total\\
\hline
Null & TN & FP & $m_0$\\
Non-null & FN & TP & $m_1$\\
Total & S & R & $m$
\end{tabular}
}
\only<2-3>{
\begin{example}
\begin{center}
\emph{``We reject 254 hypotheses with a FDR of 0.05''}
\emph{``We reject 254 hypotheses at a level of 0.05 after BH correction''}
\end{center}
Means: we claim that 254 hypotheses do not follow the null distribution, and we expect on average at most 5\% of these 254 to be false positives.
\end{example}
}
\only<3>{
\begin{center}
The number of FPs increases with the number of TPs
\end{center}
}
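
The Benjamini-Hochberg step-up procedure can be sketched as follows: sort the $m$ $p$-values, find the largest rank $k$ such that $p_{(k)} \leq \frac{k}{m} \alpha_{FDR}$, and reject the $k$ smallest. The function name and the example $p$-values are hypothetical.

```python
def benjamini_hochberg_reject(p_values, alpha_fdr=0.05):
    """Return one boolean per hypothesis: True if rejected by the BH procedure."""
    m = len(p_values)
    # Indices of the hypotheses, ordered by increasing p-value
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha_fdr:
            k_max = rank  # keep the largest rank that passes its threshold
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


p_values = [0.001, 0.02, 0.04, 0.5]  # made-up p-values
print(benjamini_hochberg_reject(p_values))
```

Because the threshold grows with the rank, more hypotheses can be rejected than with a fixed per-test level of $\alpha_{FDR} / m$.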
## Differential expression analysis between $2$ groups
## FWER versus FDR control
\begin{center}
\includegraphics[width=12cm]{img/pval_hist_H0}
\end{center}
$$\Pr\left(FP \geq 1\right) \leq \alpha_{FWER}$$
$$\mathbb{E}\left[\frac{FP}{R} \,\middle|\, R > 0\right]\Pr\left(R > 0\right) \leq \alpha_{FDR}$$
When all null hypotheses are true ($TP = 0$), FWER and FDR control are identical.\\
The difference increases with the number of $TP$s
## Between $n$ groups
## FDR control
# Regression analysis
\begin{center}
\includegraphics[width=12cm]{img/pval_hist_H0_H1}\\[-1em]
\pause
When we analyse data we hope to get a mixture between:\\
\includegraphics[width=12cm]{img/pval_hist_H0}\\[-2em]
\pause
\includegraphics[width=12cm]{img/pval_hist_H1}
\end{center}
# Multiple testing
## FDR control: local FDR ($\ell FDR$) of Efron
\begin{center}
\only<1>{\includegraphics[width=12cm]{img/pval_hist_H0_H1}}
\only<2-3>{\includegraphics[width=12cm]{img/pval_hist_H0_H1_mixture}}
\only<4>{\includegraphics[width=12cm]{img/zval_hist_H0_H1_mixture}}
\end{center}
\only<3-4>{
$$\ell FDR\left(x_i\right) = \frac{{\color{blue}\Pr\left(x_i | H_i = 0\right)}}{{\color{blue}\Pr\left(x_i | H_i = 0\right)} + {\color{red}\Pr\left(x_i | H_i = 1\right)}}$$
}
\only<4>{
\begin{center}
work with $z$-values instead of $p$-values
\end{center}
}
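
As an illustration on $z$-values, the local FDR can be computed for a two-component normal mixture. Everything specific here is an assumption made for the sketch: the null is $\mathcal{N}(0, 1)$, the alternative is $\mathcal{N}(\mu_1, 1)$, and a mixture weight `pi0` (the proportion of true nulls) is added. With `pi0 = 0.5` the weights cancel and the ratio reduces to $f_0 / (f_0 + f_1)$.

```python
from statistics import NormalDist


def local_fdr(z: float, mu1: float = 3.0, pi0: float = 0.9) -> float:
    """Posterior probability that a z-value comes from the null component."""
    f0 = NormalDist(0.0, 1.0).pdf(z)  # null density (assumed standard normal)
    f1 = NormalDist(mu1, 1.0).pdf(z)  # alternative density (assumed shifted normal)
    return pi0 * f0 / (pi0 * f0 + (1 - pi0) * f1)


for z in (0.0, 1.5, 3.0, 4.5):
    print(f"z = {z}: local FDR = {local_fdr(z):.3f}")
```

In practice the two densities and the weight are estimated from the data rather than fixed in advance, so the values above only show the shape of the computation.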
# Post-selection inference
## Post-selection inference
\begin{center}
\href{https://en.wikipedia.org/}{
\includegraphics[width=\textwidth]{img/post_inference.png}
}
\end{center}
## SimCD
\begin{center}
\href{https://arxiv.org/abs/2104.01512v1}{
\includegraphics[width=0.6\textwidth]{img/simCD.png}
}
\end{center}
area under ROC curves
# Multivariate Differential expression analysis
\ No newline at end of file