### update dea

parent 52a1d388
Pipeline #350 passed with stage in 3 minutes and 18 seconds
@@ -26,7 +26,7 @@ classoption: aspectratio=169
 - Differential expression analysis between groups
 - Regression analysis
 - Multiple testing
-- Multivariate Differential expression analysis
+- Multivariate differential expression analysis
 # Hypothesis testing
@@ -38,7 +38,7 @@ classoption: aspectratio=169
 We reject the hypothesis at risk $\alpha$, the probability that the null hypothesis was true for the observed value.
 ### $p$-value
-The $p$-value is the probability to observe a value as or more extreme under the null hypothesis model.
+the $p$-value is the probability to observe a value as or more extreme under the null hypothesis model.
 ## Hypothesis testing
@@ -162,7 +162,7 @@ receiver operating characteristic
 \end{tikzpicture}
 \end{center}
 \column{0.5\textwidth}
-For a given gene $x_i$ we can test:
+For a given gene $x_i$ we can test
 \vspace{1em}
 \begin{itemize}
 \item $H_0$: $E\left(x_i\right) = E\left(x_{i'}\right)$
@@ -206,7 +206,7 @@ $P(X = x)$ for $\mathcal{NB}(\lambda, \alpha = 1)$
 \includegraphics[width=0.7\textwidth]{img/NB_sigma_1.png}
 \end{center}
-## Non-parametric approaches
+## Nonparametric approaches
 ### We don't try to model the data distribution
@@ -214,16 +214,16 @@ Instead we work with:
 - ranks of the values
 - the sign of the difference between two groups (Wilcoxon)
-- the distribution of differances
+- the distribution of differences
-If we know the distribution the parametric approach is often more powerfull
+If we know the distribution, the parametric approach is often more powerful.
 ### Often limited to the 2 groups setting
 ## Wilcoxon rank sum test
-### $H_0$: the median are equal
+### $H_0$: the medians are equal
 \begin{center}
 \href{https://www.nature.com/articles/s41467-021-27464-5}{
@@ -233,7 +233,7 @@ If we know the distribution the parametric approach is often more powerfull
 ## WaddR
-### Base on 2-Wasserstein distance
+### Based on 2-Wasserstein distance
 \begin{center}
 \href{https://pubmed.ncbi.nlm.nih.gov/33792651/}{
@@ -241,7 +241,7 @@ If we know the distribution the parametric approach is often more powerfull
 }
 \end{center}
-## Model based approaches
+## Model-based approaches
 \begin{center}
 \begin{columns}
@@ -282,14 +282,14 @@ X \sim \pi \delta_0 + \left(1 - \pi\right) \mathcal{NB}(\lambda, \alpha)
 \end{center}
-## Model based approaches
+## Model-based approaches
 ### NB distributed counts with excess of zeros
 \begin{center}
 \includegraphics[width=0.8\textwidth]{img/ziNB_1}
 \end{center}
-## Model based approaches
+## Model-based approaches
 ### Mixture of two NB distributions
@@ -297,9 +297,32 @@ X \sim \pi \delta_0 + \left(1 - \pi\right) \mathcal{NB}(\lambda, \alpha)
 \includegraphics[width=0.8\textwidth]{img/ziNB_2}
 \end{center}
-## Model based approaches
-### $y = \beta_0 + \beta_1 x$
+## Model-based approaches
+### GLM framework
+$X_i \sim \mathcal{NB}(\lambda, \alpha)$
+$E(X_i|\mathbf{Y}) = \boldsymbol{\mu}_i = g^{-1}(\mathbf{Y}\boldsymbol{\beta})$
+with :
+\begin{itemize}
+\item $\boldsymbol{\mu}_i$ the mean of the gene $i$ distribution
+\item $g$ is the link function
+\item $\beta$ the unknown parameters of the model
+\end{itemize}
+$E(X_i|\mathbf{Y}) = \boldsymbol{\mu}_i = g^{-1}(Y_1 \beta_1 + \dots Y_n \beta_n)$
+### We can also model the variance as a function of the mean
+$Var(X_i|\mathbf{Y}) = V( \boldsymbol{\mu}_i ) = \operatorname{V}(g^{-1}(\mathbf{X}\boldsymbol{\beta})).$
+## Model-based approaches
+### $\boldsymbol{\mu}_i = \beta_0 + \beta_1 y_1$
 \begin{center}
 \includegraphics[width=0.7\textwidth]{img/lm_2_groups_b0_3_b1_05.png}
@@ -307,19 +330,19 @@ X \sim \pi \delta_0 + \left(1 - \pi\right) \mathcal{NB}(\lambda, \alpha)
 $\beta_0 = 3$, $\beta_1 = 0.5$
-## Model based approaches
+## Model-based approaches
-### $y = \beta_0 + \beta_1 x$
+### $\boldsymbol{\mu}_i = \beta_0 + \beta_1 y_1$
 ### Wald test: $H_0: \beta_1 = 0$
 ### Likelihood ratio test (LTR)
-$H_0: L\left(y = \beta_0\right) = L\left(y = \beta_0 + \beta_1 x\right)$
+$H_0: L\left(\boldsymbol{\mu}_i = \beta_0\right) = L\left(\boldsymbol{\mu}_i = \beta_0 + \beta_1 y_1\right)$
-## Model based approaches
+## Model-based approaches
-### $y = \beta_0 + \beta_1 x$
+### $\boldsymbol{\mu}_i = \beta_0 + \beta_1 y_1$
 \begin{center}
 \includegraphics[width=0.7\textwidth]{img/lm_b0_3_b1_05.png}
@@ -327,62 +350,83 @@ $\beta_0 = 3$, $\beta_1 = 0.5$
 $\beta_0 = 3$, $\beta_1 = 0.5$
-## Model based approaches
+## Model-based approaches
-### $y = \beta_0 + \beta_1 x$
+### $\boldsymbol{\mu}_i = \beta_0 + \beta_1 y_1$
 \begin{center}
 \href{https://cole-trapnell-lab.github.io/monocle3/}{
-\includegraphics[width=0.7\textwidth]{img/deg_pseudotime.png}
+\includegraphics[width=0.6\textwidth]{img/deg_pseudotime.png}
 }
 \end{center}
-## Model based approaches
+## Model-based approaches
-### $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
+### $\boldsymbol{\mu}_i = \beta_0 + \beta_1 y_1 + \beta_2 y_2$
 \begin{center}
 \includegraphics[width=0.7\textwidth]{img/lm_2_groups_b0_b0_3_b1_05.png}
 \end{center}
-$\beta_0 = 3$, $\beta_1 = 0.5$ $\beta_2 = 5$
+$\beta_0 = 3$, $\beta_1 = 0.5$, $\beta_2 = 5$
+## Model-based approaches
+### $\boldsymbol{\mu}_i = \beta_0 + \beta_1 y_1 + \beta_2 y_2$
+\begin{center}
+\href{https://www.sciencedirect.com/science/article/pii/S2211124721005192}{
+\includegraphics[width=0.9\textwidth]{img/deg_time_group.png}
+}
+\end{center}
-## Model based approaches
+## Model-based approaches
-### $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
+### $\boldsymbol{\mu}_i = \beta_0 + \beta_1 y_1 + \beta_2 y_2 + \beta_3 y_1 y_2$
 \begin{center}
 \includegraphics[width=0.7\textwidth]{img/lm_2_groups_b0_b0_3_b1_05_interaction.png}
 \end{center}
-$\beta_0 = 3$, $\beta_1 = 0.5$ $\beta_2 = 5$
+$\beta_0 = 3$, $\beta_1 = 0.5$, $\beta_2 = 5$, $\beta_3 = -0.4$
-## Model based approaches
+## Model-based approaches
-### $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
+### $\boldsymbol{\mu}_i = \beta_0 + \beta_1 y_1 + \beta_2 y_2 + \beta_3 y_1 y_2$
 \begin{center}
 \includegraphics[width=0.7\textwidth]{img/lm_2_groups_2_factors_b0_b0_3_b1_05_interaction.png}
 \end{center}
-$\beta_0 = 3$, $\beta_1 = 0.5$ $\beta_2 = 5$
+$\beta_0 = 3$, $\beta_1 = 0.5$, $\beta_2 = 5$, $\beta_3 = -0.4$
-## Model based approaches
+## Model-based approaches
-### $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
+### $\boldsymbol{\mu}_i = \beta_0 + \beta_1 y_1 + \beta_2 y_2$
 \begin{center}
-href{doi: 10.1093/nar/gky675}{
-\includegraphics[width=0.6\textwidth]{img/deg_time_group.png}
+\href{doi: 10.1093/nar/gky675}{
+\includegraphics[width=0.35\textwidth]{img/deg_time_group_inter.png}
 }
 \end{center}
+## Model-based approaches
+### $\boldsymbol{\mu}_i = \beta_0 + \beta_1 y_1 + \beta_2 Z$
+$Z \sim \mathcal{N}(\mu_z, \sigma_z)$
+\begin{center}
+\href{https://www.sciencedirect.com/science/article/pii/S2211124721005192}{
+\includegraphics[width=0.35\textwidth]{img/deg_time_mixed.png}
+}
+\end{center}
 # Multiple hypotheses testing
-## Multiple hypotheses problem
+## Multiple hypothesis problem
 \begin{center}
 \only<1>{\includegraphics[width=10cm]{img/dnorm_abs}\\[-2.5em]}
-\only<1>{\includegraphics[width=10cm]{img/pval_alpha}}
+\only<1>{\includegraphics[width=10cm]{img/pval_2_0.05}}
 \only<2>{\includegraphics[width=10cm]{img/pval_alpha_random_H0_1}
 \begin{center}
 n = 10
@@ -426,7 +470,7 @@ $\beta_0 = 3$, $\beta_1 = 0.5$ $\beta_2 = 5$
 \end{center}
-## Multiple hypotheses solutions
+## Multiple hypothesis solutions
 \begin{block}{Family Wise Error Rate (FWER)}
 \begin{itemize}
@@ -438,13 +482,13 @@ $\beta_0 = 3$, $\beta_1 = 0.5$ $\beta_2 = 5$
 \end{block}
 \begin{example}
 \begin{center}
-\emph{We reject 14 hypothesis with a FWER of 0.05''}
-\emph{We reject 14 hypothesis at a level of 0.05 after Bonferoni correction''}
+\emph{We reject 14 hypotheses with a FWER of 0.05''}
+\emph{We reject 14 hypotheses at a level of 0.05 after Bonferoni correction''}
 \end{center}
 Means: 14 hypotheses are not following the null distribution and we make this statement with a probability 0.05 of having fewer than one false positives in the 14 tests.
 \end{example}
-## Multiple hypotheses solutions
+## Multiple hypothesis solutions
 \begin{block}{False Discovery Rate (FDR)}
 \begin{itemize}
@@ -456,26 +500,28 @@ $\beta_0 = 3$, $\beta_1 = 0.5$ $\beta_2 = 5$
 \end{block}
 \only<1>{
+\vspace{2em}
 \begin{center}
 \begin{tabular}{l|ccc}
-hypothesis & Claimed non-significant & Claimed significant & Total\\
+hypothesis & Claimed nonsignificant & Claimed significant & Total\\
 \hline
 Null & TN & FP & $m_0$\\
 Non-null & FN & TP & $m_1$\\
 Total & S & R & $m$
 \end{tabular}
 \end{center}
 }
 \only<2-3>{
 \begin{example}
 \begin{center}
-\emph{We reject 254 hypothesis with a FDR of 0.05''}
-\emph{We reject 254 hypothesis with a level of 0.05 after BH correction''}
+\emph{We reject 254 hypotheses with a FDR of 0.05''}
+\emph{We reject 254 hypotheses with a level of 0.05 after BH correction''}
 \end{center}
 Means: 254 hypotheses are not following the null distribution and we expect on average 5\% or less of false positives in the 254.
 \end{example}
 }
 \only<3>{
 \begin{center}
-The number of FPs increases with the number of TPs
+{\bf The number of FPs increases with the number of TPs}
 \end{center}
 }
@@ -485,18 +531,18 @@ $\beta_0 = 3$, $\beta_1 = 0.5$ $\beta_2 = 5$
 \end{center}
 $$\Pr\left(FP < 1\right) < \alpha_{FWER}$$
 $$\Pr\left(\mathbb{E}\left[\frac{FP}{R}\right | R > 0]\right)\Pr\left(R > 0\right) < \alpha_{FDR}$$
-When $TP \leq 1$ FWER and FDR control are identical.\\
+when $TP \leq 1$ FWER and FDR control are identical.\\
 The difference increases with the number of $TP$s
 ## FDR control
 \begin{center}
-\includegraphics[width=12cm]{img/pval_hist_H0_H1}\\[-1em]
+\includegraphics[width=11cm]{img/pval_hist_H0_H1}\\[-1em]
 \pause
-When we analyse data we hope to get a mixture between:\\
-\includegraphics[width=12cm]{img/pval_hist_H0}\\[-2em]
+When we analyze data we hope to get a mixture between:\\
+\includegraphics[width=11cm]{img/pval_hist_H0}\\[-2em]
 \pause
-\includegraphics[width=12cm]{img/pval_hist_H1}
+\includegraphics[width=11cm]{img/pval_hist_H1}
 \end{center}
 ## FDR control: local FDR ($\ell FDR$) of Efron
@@ -525,11 +571,19 @@ When we analyse data we hope to get a mixture between:\\
 }
 \end{center}
+## Post-selection inference
+\begin{center}
+\href{https://pubmed.ncbi.nlm.nih.gov/30206223/}{
+\includegraphics[width=0.75\textwidth]{img/post_inference_example.png}
+}
+\end{center}
 ## SimCD
 \begin{center}
 \href{https://arxiv.org/abs/2104.01512v1}{
-\includegraphics[width=0.6\textwidth]{img/simCD.png}
+\includegraphics[width=\textwidth]{img/simCD.png}
 }
 \end{center}
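
Reviewer note, not part of the commit: the FWER and FDR slides touched above name the Bonferroni and BH corrections without showing the procedures. The following is a minimal Python sketch of both, assuming independent tests; the function names `bonferroni_reject` and `bh_reject` and the p-values are hypothetical and chosen only for illustration.

```python
def bonferroni_reject(pvalues, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (Bonferroni, controls the FWER)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def bh_reject(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the FDR at level
    alpha when the tests are independent)."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest 1-based rank k with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank
    # Reject the hypotheses with the k_max smallest p-values.
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject


# Hypothetical p-values, for illustration only.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.36]
n_bonf = sum(bonferroni_reject(pvals))
n_bh = sum(bh_reject(pvals))
print("Bonferroni rejections:", n_bonf)
print("BH rejections:", n_bh)
```

On these values BH rejects at least as many hypotheses as Bonferroni, in line with the slide's remark that the difference between FWER and FDR control grows with the number of true positives.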
