Commit 52a1d388 by Laurent Modolo

### update dea

parent eeae83e8
Pipeline #349 passed with stage
in 3 minutes and 25 seconds
@@ -32,51 +32,507 @@ classoption: aspectratio=169

## Hypothesis testing

\begin{itemize}
\item $H_0$: the gene expression is the same between the $2$ groups
\item $H_1$: the gene expression is not the same between the $2$ groups
\end{itemize}

### Rejection of a null hypothesis $H_0$

Given the null model of our data, how likely are we to observe a value? \\
We can compute this likelihood from the probability distribution of the null hypothesis.\\
We reject the hypothesis at risk $\alpha$, the probability of rejecting the null hypothesis when it is true.

### $p$-value

The $p$-value is the probability of observing a value as extreme or more extreme under the null hypothesis model.

## Hypothesis testing

\begin{center}
\only<1>{\includegraphics[width=12cm]{img/dnorm}}
\only<2>{\includegraphics[width=12cm]{img/dnorm_alpha}}
\only<3>{\includegraphics[width=12cm]{img/dnorm_alpha_accept}}
\only<4>{\includegraphics[width=12cm]{img/dnorm_alpha_reject}}
\end{center}

\begin{itemize}
\item $H_0$: the gene expression is the same between the $2$ groups
\item $H_1$: the gene expression is not the same between the $2$ groups
\item distribution under the null hypothesis $H_0$
\pause
\item rejection zone at a risk $\alpha$
\end{itemize}

## $p$-value construction

\begin{center}
\only<1>{\includegraphics[width=10cm]{img/pval_1_0.05}}
\only<2>{\includegraphics[width=10cm]{img/pval_1_0.15}}
\only<3>{\includegraphics[width=10cm]{img/pval_1_0.25}}
\only<4>{\includegraphics[width=10cm]{img/pval_1_0.35}}
\only<5>{\includegraphics[width=10cm]{img/pval_1_0.45}}
\only<6>{\includegraphics[width=10cm]{img/pval_1_0.55}}
\only<7>{\includegraphics[width=10cm]{img/pval_1_0.65}}
\only<8>{\includegraphics[width=10cm]{img/pval_1_0.75}}
\only<9>{\includegraphics[width=10cm]{img/pval_1_0.85}}
\only<10>{\includegraphics[width=10cm]{img/pval_1_0.95}}
\end{center}

\begin{center}
probability of observing a value as extreme or more extreme under the null hypothesis model
\end{center}

## $p$-value construction

\begin{center}
{\bf We reject $H_0$ with a given risk $\alpha$ and a power $1 - \beta$}

\only<1>{\includegraphics[width=10cm]{img/pval_2_0.05}}
\only<2>{\includegraphics[width=10cm]{img/pval_2_0.15}}
\only<3>{\includegraphics[width=10cm]{img/pval_2_0.25}}
\only<4>{\includegraphics[width=10cm]{img/pval_2_0.35}}
\only<5>{\includegraphics[width=10cm]{img/pval_2_0.45}}
\only<6>{\includegraphics[width=10cm]{img/pval_2_0.55}}
\only<7>{\includegraphics[width=10cm]{img/pval_2_0.65}}
\only<8>{\includegraphics[width=10cm]{img/pval_2_0.75}}
\only<9>{\includegraphics[width=10cm]{img/pval_2_0.85}}
\only<10>{\includegraphics[width=10cm]{img/pval_2_0.95}}
\end{center}

## Hypothesis testing

\begin{center}
\href{https://en.wikipedia.org/}{
\includegraphics[width=0.6\textwidth]{img/alpha_beta.png}
}
\end{center}
\vspace{-2.5em}
We specify $H_0$ not $H_1$

## Hypothesis testing

\begin{center}
\begin{columns}
\column{0.6\textwidth}
\begin{itemize}
\item $\beta$ probability of a Type II error, known as a {\it false negative}
\item $1 - \beta$ probability of a {\it true positive}, i.e., correctly rejecting the null hypothesis. $1 - \beta$ is also known as the power of the test.
\item $\alpha$ probability of a Type I error, known as a {\it false positive}
\item $1 - \alpha$ probability of a {\it true negative}, i.e., correctly not rejecting the null hypothesis
\end{itemize}
\column{0.4\textwidth}

## Hypothesis testing

\begin{center}
{\bf ROC curve}
\href{https://en.wikipedia.org/}{
\includegraphics[width=\textwidth]{img/roc_curve.png}
}
\end{center}
receiver operating characteristic
\end{columns}
\end{center}

## Differential expression analysis

### Finding differences in gene expression

\vspace{1em}
\begin{center}
\begin{columns}
\column{0.5\textwidth}
\begin{center}
\begin{tikzpicture}
\fill (0.5,3.5) node {\bf $\text{gene}_1$} -- (0.5,2.5) node {\bf $\text{gene}_2$} -- (0.5,1.5) node {\bf $\vdots$} -- (0.5,0.5) node {\bf $\text{gene}_n$};
\fill (1.5,4.5) node {\bf{$\text{cell}_1$}} -- (1.5,3.5) node {mRNA} -- (1.5,2.5) node {mRNA} -- (1.5,1.5) node {$\vdots$} -- (1.5,0.5) node {mRNA};
\fill (2.5,4.5) node {\bf{$\text{cell}_2$}} -- (2.5,3.5) node {mRNA} -- (2.5,2.5) node {mRNA} -- (2.5,1.5) node {$\vdots$} -- (2.5,0.5) node {mRNA};
\fill (3.5,4.5) node {\bf{$\cdots$}} -- (3.5,3.5) node {$\cdots$} -- (3.5,2.5) node {$\cdots$} -- (3.5,1.5) node {$\ddots$} -- (3.5,0.5) node {$\cdots$};
\fill (4.5,4.5) node {\bf{$\text{cell}_c$}} -- (4.5,3.5) node {mRNA} -- (4.5,2.5) node {mRNA} -- (4.5,1.5) node {$\vdots$} -- (4.5,0.5) node {mRNA};
\draw (1,0) grid (5,4);
\end{tikzpicture}
\end{center}
\column{0.5\textwidth}
For a given gene $x_i$ we can test:
\vspace{1em}
\begin{itemize}
\item $H_0$: $E\left(x_i\right) = E\left(x_{i'}\right)$
\item $H_0$: $E\left(x_i\right) = E\left(x_{i'}\right) = \dots = E\left(x_{i''}\right)$
\item $H_0$: $E\left(x_i \times y\right) = E\left(x_i \times 1\right)$
\end{itemize}
\vspace{2em}
or any combination of the above cases
\end{columns}
\end{center}

## Differential expression analysis

\begin{center}
\href{https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8701051/}{
\includegraphics[width=0.9\textwidth]{img/dea_tree.png}
}
\end{center}

## Counts distributions

$P(X = x)$ for $\mathcal{NB}(\lambda, \alpha = 10)$
\begin{center}
\includegraphics[width=0.7\textwidth]{img/NB_sigma_10.png}
\end{center}

## Counts distributions

$P(X = x)$ for $\mathcal{NB}(\lambda, \alpha = 2)$
\begin{center}
\includegraphics[width=0.7\textwidth]{img/NB_sigma_2.png}
\end{center}

## Counts distributions

$P(X = x)$ for $\mathcal{NB}(\lambda, \alpha = 1)$
\begin{center}
\includegraphics[width=0.7\textwidth]{img/NB_sigma_1.png}
\end{center}

## Non-parametric approaches

### We don't try to model the data distribution

Instead we work with:

- ranks of the values
- the sign of the difference between two groups (Wilcoxon)
- the distribution of differences

If we know the distribution, the parametric approach is often more powerful

### Often limited to the 2 groups setting

## Wilcoxon rank sum test

### $H_0$: the medians are equal

\begin{center}
\href{https://www.nature.com/articles/s41467-021-27464-5}{
\includegraphics[width=\textwidth]{img/wilcoxon_example.png}
}
\end{center}

## waddR

### Based on the 2-Wasserstein distance

\begin{center}
\href{https://pubmed.ncbi.nlm.nih.gov/33792651/}{
\includegraphics[width=\textwidth]{img/waddR.png}
}
\end{center}

## Model based approaches

\begin{center}
\begin{columns}
\column{0.5\textwidth}
\vspace{1em}

{\bf Gaussian models} $\log(X + 1) \sim \mathcal{N}(\mu, \sigma)$

{\bf Poisson models} $X \sim \mathcal{P}(\lambda)$

{\bf NB models} $X \sim \mathcal{NB}(\lambda, \alpha)$

{\bf ZINB models} $X \sim \pi \delta_0 + \left(1 - \pi\right) \mathcal{NB}(\lambda, \alpha)$

\column{0.5\textwidth}
\begin{center}
\href{https://en.wikipedia.org/}{
\includegraphics[width=0.6\textwidth]{img/alpha_beta.png}
\includegraphics[width=\textwidth]{img/dirac.png}
}
\end{center}
\end{columns}
\end{center}

## Model based approaches

### NB distributed counts with excess of zeros

\begin{center}
\includegraphics[width=0.8\textwidth]{img/ziNB_1}
\end{center}

## Model based approaches

### Mixture of two NB distributions

\begin{center}
\includegraphics[width=0.8\textwidth]{img/ziNB_2}
\end{center}

## Model based approaches

### $y = \beta_0 + \beta_1 x$

\begin{center}
\includegraphics[width=0.7\textwidth]{img/lm_2_groups_b0_3_b1_05.png}
\end{center}
$\beta_0 = 3$, $\beta_1 = 0.5$

## Model based approaches

### $y = \beta_0 + \beta_1 x$

### Wald test: $H_0: \beta_1 = 0$

### Likelihood ratio test (LRT) $H_0: L\left(y = \beta_0\right) = L\left(y = \beta_0 + \beta_1 x\right)$

## Model based approaches

### $y = \beta_0 + \beta_1 x$

\begin{center}
\includegraphics[width=0.7\textwidth]{img/lm_b0_3_b1_05.png}
\end{center}
$\beta_0 = 3$, $\beta_1 = 0.5$

## Model based approaches

### $y = \beta_0 + \beta_1 x$

\begin{center}
\href{https://cole-trapnell-lab.github.io/monocle3/}{
\includegraphics[width=0.7\textwidth]{img/deg_pseudotime.png}
}
\end{center}

## Model based approaches

### $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$

\begin{center}
\includegraphics[width=0.7\textwidth]{img/lm_2_groups_b0_b0_3_b1_05.png}
\end{center}
$\beta_0 = 3$, $\beta_1 = 0.5$, $\beta_2 = 5$

## Model based approaches

### $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$

\begin{center}
\includegraphics[width=0.7\textwidth]{img/lm_2_groups_b0_b0_3_b1_05_interaction.png}
\end{center}
$\beta_0 = 3$, $\beta_1 = 0.5$, $\beta_2 = 5$

## Model based approaches

### $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$

\begin{center}
\includegraphics[width=0.7\textwidth]{img/lm_2_groups_2_factors_b0_b0_3_b1_05_interaction.png}
\end{center}
$\beta_0 = 3$, $\beta_1 = 0.5$, $\beta_2 = 5$

## Model based approaches

### $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$

\begin{center}
\href{https://doi.org/10.1093/nar/gky675}{
\includegraphics[width=0.6\textwidth]{img/deg_time_group.png}
}
\end{center}

# Parametric versus non-parametric testing

# Differential expression analysis between groups

# Multiple hypotheses testing

## Multiple hypotheses problem

\begin{center}
\only<1>{\includegraphics[width=10cm]{img/dnorm_abs}\\[-2.5em]}
\only<1>{\includegraphics[width=10cm]{img/pval_alpha}}
\only<2>{\includegraphics[width=10cm]{img/pval_alpha_random_H0_1} \begin{center} n = 10 \end{center}}
\only<3>{\includegraphics[width=12cm]{img/pval_alpha_random_H0_2} \begin{center} n = 30 \end{center}}
\only<4>{\includegraphics[width=12cm]{img/pval_alpha_random_H0_3} \begin{center} n = 60 \end{center}}
\only<5>{\includegraphics[width=12cm]{img/pval_alpha_random_H0_4} \begin{center} n = 100 \end{center}}
\only<6>{\includegraphics[width=12cm]{img/pval_alpha_random_H0_5} \begin{center} n = 150 \end{center}}
\only<7>{\includegraphics[width=12cm]{img/pval_alpha_random_H0_6} \begin{center} n = 210 \end{center}}
\only<8>{\includegraphics[width=12cm]{img/pval_alpha_random_H0_7} \begin{center} n = 280 \end{center}}
\only<9>{\includegraphics[width=12cm]{img/pval_alpha_random_H0_8} \begin{center} n = 360 \end{center}}
\only<10>{\includegraphics[width=12cm]{img/pval_alpha_random_H0_9} \begin{center} n = 450 \end{center}}
\only<11>{\includegraphics[width=12cm]{img/pval_alpha_random_H0_10} \begin{center} n = 550 \end{center}}
\end{center}

## Multiple hypotheses solutions

\begin{block}{Family Wise Error Rate (FWER)}
\begin{itemize}
\item Bonferroni like procedure
\item Control the risk of having at least one false positive
\item $\Pr\left(FP \geq 1\right) \leq \alpha_{FWER}$
\item each of the $m$ tests is performed at level $\frac{\alpha_{FWER}}{m}$
\end{itemize}
\end{block}

\begin{example}
\begin{center}
\emph{``We reject 14 hypotheses with a FWER of 0.05''}

\emph{``We reject 14 hypotheses at a level of 0.05 after Bonferroni correction''}
\end{center}
Means: 14 hypotheses are not following the null distribution and we make this statement with a probability of at most 0.05 of having one or more false positives among the 14 rejected hypotheses.
\end{example}

## Multiple hypotheses solutions

\begin{block}{False Discovery Rate (FDR)}
\begin{itemize}
\item Benjamini-Hochberg like procedure
\item Control the expected proportion of false positives among the rejected hypotheses
\item $\mathbb{E}\left[\frac{FP}{R} \,\middle|\, R > 0\right]\Pr\left(R > 0\right) \leq \alpha_{FDR}$
\item adaptive procedure $\alpha_{FDR} \sim f_0$
\end{itemize}
\end{block}

\only<1>{
\vspace{2em}
\begin{tabular}{l|ccc}
hypothesis & Claimed non-significant & Claimed significant & Total\\
\hline
Null & TN & FP & $m_0$\\
Non-null & FN & TP & $m_1$\\
Total & S & R & $m$
\end{tabular}
}
\only<2-3>{
\begin{example}
\begin{center}
\emph{``We reject 254 hypotheses with an FDR of 0.05''}

\emph{``We reject 254 hypotheses at a level of 0.05 after BH correction''}
\end{center}
Means: 254 hypotheses are not following the null distribution and we expect on average 5\% or less of false positives among the 254.
\end{example}
}
\only<3>{
\begin{center}
The number of FPs increases with the number of TPs
\end{center}
}

## Differential expression analysis between $2$ groups

## FWER versus FDR control

\begin{center}
\includegraphics[width=12cm]{img/pval_hist_H0}
\end{center}
$$\Pr\left(FP \geq 1\right) \leq \alpha_{FWER}$$
$$\mathbb{E}\left[\frac{FP}{R} \,\middle|\, R > 0\right]\Pr\left(R > 0\right) \leq \alpha_{FDR}$$
When $TP = 0$ (every rejection is a false positive), FWER and FDR control are identical.\\
The difference increases with the number of $TP$s

## Between $n$ groups

## FDR control

# Regression analysis

\begin{center}
\includegraphics[width=12cm]{img/pval_hist_H0_H1}\\[-1em]
\pause
When we analyse data we hope to get a mixture between:\\
\includegraphics[width=12cm]{img/pval_hist_H0}\\[-2em]
\pause
\includegraphics[width=12cm]{img/pval_hist_H1}
\end{center}

# Multiple testing

## FDR control: local FDR ($\ell FDR$) of Efron

\begin{center}
\only<1>{\includegraphics[width=12cm]{img/pval_hist_H0_H1}}
\only<2-3>{\includegraphics[width=12cm]{img/pval_hist_H0_H1_mixture}}
\only<4>{\includegraphics[width=12cm]{img/zval_hist_H0_H1_mixture}}
\end{center}
\only<3-4>{
$$\ell FDR\left(x_i\right) = \frac{{\color{blue}\Pr\left(H_i = 0\right)\Pr\left(x_i | H_i = 0\right)}}{{\color{blue}\Pr\left(H_i = 0\right)\Pr\left(x_i | H_i = 0\right)} + {\color{red}\Pr\left(H_i = 1\right)\Pr\left(x_i | H_i = 1\right)}}$$
}
\only<4>{
\begin{center}
work with $z$-values instead of $p$-values
\end{center}
}

# Post-selection inference

## Post-selection inference

\begin{center}
\href{https://en.wikipedia.org/}{
\includegraphics[width=\textwidth]{img/post_inference.png}
}
\end{center}

## SimCD

\begin{center}
\href{https://arxiv.org/abs/2104.01512v1}{
\includegraphics[width=0.6\textwidth]{img/simCD.png}
}
\end{center}
area under ROC curves

# Multivariate Differential expression analysis
\ No newline at end of file
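
The difference between the FWER and FDR procedures from the "Multiple hypotheses solutions" slides can be sketched numerically. The snippet below is only an illustration and is not taken from the slides: the function names and the six $p$-values are hypothetical. It assumes the standard Bonferroni rule (reject when $p_i \leq \alpha / m$) and the standard Benjamini-Hochberg step-up rule (reject the $k$ smallest $p$-values, where $k$ is the largest rank with $p_{(k)} \leq k \alpha / m$).

```python
def bonferroni_reject(pvalues, alpha=0.05):
    """Bonferroni: reject H0_i when p_i <= alpha / m (FWER control)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg_reject(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (FDR control)."""
    m = len(pvalues)
    # Indices of the p-values sorted in increasing order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest rank k such that p_(k) <= k * alpha / m (0 if none).
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


# Hypothetical p-values, chosen only for illustration.
pvalues = [0.001, 0.008, 0.012, 0.041, 0.20, 0.74]
n_bonferroni = sum(bonferroni_reject(pvalues))
n_bh = sum(benjamini_hochberg_reject(pvalues))
print(n_bonferroni, n_bh)
```

On these made-up values Bonferroni rejects fewer hypotheses than Benjamini-Hochberg, which matches the intuition that FWER control is the more conservative of the two.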
