Verified Commit ead99e62 authored by Laurent Modolo

2_normalization.Rmd: update

parent fe2e795a
Pipeline #324 failed
Showing with 286 additions and 15 deletions
......@@ -957,6 +957,8 @@ With $K$ the $k$-compatibility class counts and $\beta$ the transcript quantific
\includegraphics[width=\textwidth]{img/scasa_vs_other.png}
\end{center}
# scRNA data normalization: Friday 8 June 2022
## References
......
2_normalization/img/NB_sigma_1.png

74.5 KiB

2_normalization/img/NB_sigma_10.png

70.6 KiB

2_normalization/img/NB_sigma_2.png

73 KiB

2_normalization/img/doublet_detection_comparison.png

227 KiB

2_normalization/img/features_for_QC_1.png

139 KiB

2_normalization/img/features_for_QC_2.png

64.3 KiB

2_normalization/img/mouse_human_mix.png

92 KiB

2_normalization/img/mu_vs_var.png

43.2 KiB

2_normalization/img/poisson.png

69.2 KiB

2_normalization/img/sanity_model_a.png

391 KiB

2_normalization/img/sanity_model_a_bis.png

352 KiB

---
title: "single-cell RNA-Seq: Normalization"
author: "Laurent Modolo [laurent.modolo@ens-lyon.fr](mailto:laurent.modolo@ens-lyon.fr)"
date: "Friday 3 June 2022"
date: "Friday 8 June 2022"
output:
beamer_presentation:
df_print: tibble
......@@ -107,7 +107,6 @@ classoption: aspectratio=169
\begin{center}
\begin{columns}
\column{0.5\textwidth}
\begin{center}
\begin{tikzpicture}
\fill
(0.5,3.5) node {\bf $\text{gene}_1$}
......@@ -115,13 +114,13 @@ classoption: aspectratio=169
-- (0.5,1.5) node {\bf $\vdots$}
-- (0.5,0.5) node {\bf $\text{gene}_n$};
\fill
(1.5,4.5) node {\bf{$\text{bc}_1$}}
(1.5,4.5) node {\bf $\text{bc}_1$}
-- (1.5,3.5) node {mRNA}
-- (1.5,2.5) node {mRNA}
-- (1.5,1.5) node {$\vdots$}
-- (1.5,0.5) node {mRNA};
\fill
(2.5,4.5) node {\color{red}\bf{$\text{bc}_2$}}
(2.5,4.5) node {\color{red}\bf $\text{bc}_2$}
-- (2.5,3.5) node {\color{red}mRNA}
-- (2.5,2.5) node {\color{red}mRNA}
-- (2.5,1.5) node {\color{red}$\vdots$}
......@@ -133,14 +132,13 @@ classoption: aspectratio=169
-- (3.5,1.5) node {$\ddots$}
-- (3.5,0.5) node {$\cdots$};
\fill
(4.5,4.5) node {\bf{$\text{bc}_c$}}
(4.5,4.5) node {\bf $\text{bc}_c$}
-- (4.5,3.5) node {mRNA}
-- (4.5,2.5) node {mRNA}
-- (4.5,1.5) node {$\vdots$}
-- (4.5,0.5) node {mRNA};
\draw (1,0) grid (5,4);
\end{tikzpicture}
\end{center}
\column{0.5\textwidth}
......@@ -206,7 +204,6 @@ Most of the droplets will be empty
\begin{center}
\begin{columns}
\column{0.5\textwidth}
\begin{center}
\begin{tikzpicture}
\fill
(0.5,3.5) node {\bf $\text{gene}_1$}
......@@ -214,55 +211,327 @@ Most of the droplets will be empty
-- (0.5,1.5) node {\bf $\vdots$}
-- (0.5,0.5) node {\bf $\text{gene}_n$};
\fill
(1.5,4.5) node {\bf{$\text{cell}_1$}}
(1.5,4.5) node {\bf $\text{cell}_1$}
-- (1.5,3.5) node {mRNA}
-- (1.5,2.5) node {mRNA}
-- (1.5,1.5) node {$\vdots$}
-- (1.5,0.5) node {mRNA};
\fill
(2.5,4.5) node {\color{red}\bf{$\text{2 cells}_2$}}
(2.5,4.5) node {\color{red}\bf $\text{2 cells}_2$}
-- (2.5,3.5) node {\color{red}mRNA}
-- (2.5,2.5) node {\color{red}mRNA}
-- (2.5,1.5) node {\color{red}$\vdots$}
-- (2.5,0.5) node {\color{red}mRNA};
\fill
(3.5,4.5) node {\bf{$\cdots$}}
(3.5,4.5) node {\bf $\cdots$}
-- (3.5,3.5) node {$\cdots$}
-- (3.5,2.5) node {$\cdots$}
-- (3.5,1.5) node {$\ddots$}
-- (3.5,0.5) node {$\cdots$};
\fill
(4.5,4.5) node {\bf{$\text{cell}_c$}}
(4.5,4.5) node {\bf $\text{cell}_c$}
-- (4.5,3.5) node {mRNA}
-- (4.5,2.5) node {mRNA}
-- (4.5,1.5) node {$\vdots$}
-- (4.5,0.5) node {mRNA};
\draw (1,0) grid (5,4);
\end{tikzpicture}
\end{center}
\column{0.5\textwidth}
{\large Some cells are many cells.}
{\large Some cells are many cells:}
\begin{itemize}
\item not all tissues are easily dissociable
\item two cells glued together will share the same droplet
\item two different cells can share the same droplet by chance
\end{itemize}
\vspace{1em}
cell barcodes corresponding to $n$-plets should be in the minority if the preparation went well.
Cell barcodes corresponding to $n$-plets should be in the minority if the preparation went well.
\end{columns}
\end{center}
## Cell filtering
apoptotic cells express MT genes
\begin{center}
\includegraphics[width=0.75\textwidth]{img/mouse_human_mix.png}
\end{center}
## Cell filtering
\begin{block}{hypothesis}
Cell barcodes corresponding to $n$-plets should be in the minority if the preparation went well.
\end{block}
### Algorithm
1. Simulate thousands of doublets by adding together two randomly chosen single-cell profiles.
2. For each original cell, compute the density of simulated doublets in the surrounding neighborhood.
3. For each original cell, compute the density of other observed cells in the neighborhood.
4. Return the ratio between the two densities as a **doublet score** for each cell.
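
The four steps above can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not the implementation of any published tool: the function name `doublet_scores`, the parameters `n_sim` and `k`, the `log1p` transform and the choice of the distance to the $k$-th nearest observed cell as neighborhood radius are all hypothetical. The brute-force distance matrices only fit in memory for small datasets.

```python
import numpy as np

def doublet_scores(counts, n_sim=2000, k=20, seed=0):
    """Toy doublet score; counts is a (cells x genes) array of raw counts."""
    rng = np.random.default_rng(seed)
    n_cells = counts.shape[0]

    # 1. simulate doublets by adding two randomly chosen profiles
    i = rng.integers(0, n_cells, size=n_sim)
    j = rng.integers(0, n_cells, size=n_sim)
    obs = np.log1p(counts)                     # assumption: compare log counts
    sim = np.log1p(counts[i] + counts[j])

    # brute-force Euclidean distances (small datasets only)
    d_obs = np.linalg.norm(obs[:, None, :] - obs[None, :, :], axis=2)
    d_sim = np.linalg.norm(obs[:, None, :] - sim[None, :, :], axis=2)

    # neighborhood of a cell: ball reaching its k-th nearest observed cell
    # (column 0 of the sorted distances is the cell itself)
    radius = np.sort(d_obs, axis=1)[:, k]

    # 2. density of simulated doublets in the neighborhood
    dens_sim = (d_sim <= radius[:, None]).sum(axis=1) / n_sim
    # 3. density of observed cells in the same neighborhood
    dens_obs = (d_obs <= radius[:, None]).sum(axis=1) / n_cells

    # 4. ratio of the two densities
    return dens_sim / dens_obs
```

A cell sitting in a region crowded with simulated doublets but with few observed neighbors gets a high score. Real tools typically reduce the dimension first and use approximate nearest-neighbor search.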
## Cell filtering
\begin{center}
\includegraphics[width=\textwidth]{img/doublet_detection_comparison.png}
\end{center}
Different algorithms are available to compare cells to synthetic doublets.
## Cell filtering
\begin{center}
\includegraphics[width=0.8\textwidth]{img/features_for_QC_1.png}
\end{center}
\vspace{-1.5em}
We can use hard thresholds to remove putative poor quality cells
\vspace{-0.5em}
\begin{itemize}
\item apoptotic cells express MT genes
\item inefficient RT or PCR amplification
\end{itemize}
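
As an illustration of hard thresholds, the sketch below flags cells to keep from a raw count matrix. The function name `qc_filter`, its parameters and the default cut-offs are hypothetical. In practice, thresholds are chosen by looking at the distributions shown in the figure above.

```python
import numpy as np

def qc_filter(counts, is_mt, min_genes=200, min_counts=500, max_mt_frac=0.2):
    """Return a boolean mask of cells passing hard QC thresholds.

    counts: (cells x genes) raw count matrix
    is_mt:  boolean vector marking mitochondrial genes
    The default cut-offs are placeholders, not recommendations.
    """
    counts = np.asarray(counts)
    is_mt = np.asarray(is_mt, dtype=bool)
    n_counts = counts.sum(axis=1)              # mRNA molecules per cell
    n_genes = (counts > 0).sum(axis=1)         # detected genes per cell
    # fraction of counts from MT genes (high in apoptotic cells)
    mt_frac = counts[:, is_mt].sum(axis=1) / np.maximum(n_counts, 1)
    return (n_genes >= min_genes) & (n_counts >= min_counts) & (mt_frac <= max_mt_frac)
```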
## Cell filtering
\begin{center}
\includegraphics[width=0.8\textwidth]{img/features_for_QC_2.png}
\end{center}
Cells expressing few genes also contain few mRNA molecules.
# Normalization
## Counts model
\begin{center}
\includegraphics[width=\textwidth]{img/sanity_model_a_bis.png}
\end{center}
## Counts distribution
### Random variable
A variable whose value depends on the outcome of a random phenomenon or experiment.
### For a given gene:
We consider a **random variable** $X$, the number of mRNAs observed in a cell, with $x$ a realisation of $X$.
\begin{itemize}
\item The random variable $X$ follows a statistical distribution $F$
\item We write $X \sim F$
\end{itemize}
## Counts model
\begin{center}
\includegraphics[width=\textwidth]{img/sanity_model_a_bis.png}
\end{center}
With a transcription rate $\lambda_g(t)$, the observed mRNA count follows a Poisson distribution $\mathcal{P}(\lambda_g(t))$.
## Counts distribution
$P(X = x)$ for $\mathcal{P}(\lambda_g)$
\begin{center}
\includegraphics[width=0.6\textwidth]{./img/poisson.png}
\end{center}
$\lambda_g$, the rate of mRNA production, is equal to both the mean and the variance of the number
of mRNA.
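
The equality between mean and variance can be checked by simulation. In this minimal sketch, the rate `lam = 5.0` and the number of simulated cells are arbitrary values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 5.0                               # hypothetical rate of mRNA production
x = rng.poisson(lam, size=100_000)      # one count per simulated cell

# for a Poisson distribution both estimates should be close to lam
print("empirical mean:    ", x.mean())
print("empirical variance:", x.var())
```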
## Counts
\begin{center}
\begin{columns}
\column{0.5\textwidth}
\begin{center}
\begin{tikzpicture}
\fill
(0.5,3.5) node {\bf $\text{gene}_1$}
-- (0.5,2.5) node {\bf $\text{gene}_2$}
-- (0.5,1.5) node {\bf $\vdots$}
-- (0.5,0.5) node {\bf $\text{gene}_n$};
\fill
(1.5,4.5) node {\bf{$\text{cell}_1$}}
-- (1.5,3.5) node {mRNA}
-- (1.5,2.5) node {\color{red}mRNA}
-- (1.5,1.5) node {$\vdots$}
-- (1.5,0.5) node {mRNA};
\fill
(2.5,4.5) node {\bf{$\text{cell}_2$}}
-- (2.5,3.5) node {mRNA}
-- (2.5,2.5) node {\color{red}mRNA}
-- (2.5,1.5) node {$\vdots$}
-- (2.5,0.5) node {mRNA};
\fill
(3.5,4.5) node {\bf{$\cdots$}}
-- (3.5,3.5) node {$\cdots$}
-- (3.5,2.5) node {\color{red}$\cdots$}
-- (3.5,1.5) node {$\ddots$}
-- (3.5,0.5) node {$\cdots$};
\fill
(4.5,4.5) node {\bf{$\text{cell}_c$}}
-- (4.5,3.5) node {mRNA}
-- (4.5,2.5) node {\color{red}mRNA}
-- (4.5,1.5) node {$\vdots$}
-- (4.5,0.5) node {mRNA};
\draw (1,0) grid (5,4);
\end{tikzpicture}
\end{center}
\column{0.6\textwidth}
For a gene $g$, {\bf each cell is an observation} of the mRNA count of $g$
As we have a large number of cells, we have access to the:
\begin{itemize}
\item empirical mean
\item empirical variance
\item empirical distribution
\end{itemize}
\vspace{1em}
Bulk RNA-Seq: $\sim 3$ observations per gene
\end{columns}
\end{center}
## Counts distributions
$P(X = x)$ for $\mathcal{P}(\mu)$
\begin{center}
\includegraphics[width=0.6\textwidth]{./img/poisson.png}
\end{center}
$\mu$, the rate of mRNA production, is equal to the variance of the number
of mRNA.
**We often have more variability! (broader distributions)**
## Counts model
\begin{center}
\includegraphics[width=\textwidth]{img/sanity_model_a_bis.png}
\end{center}
Cells are not exact replicates of one another: a large number of factors can differ between two cells.
$\lambda_g(t)$ is a **random variable**
## Counts distributions
\begin{center}
\begin{columns}
\column{0.4\textwidth}
$X \sim \mathcal{P}(\lambda)$: $\sigma^2 = \lambda$
\vspace{2em}
$X \sim \mathcal{NB}(\lambda, \sigma)$: $\sigma^2 = \lambda + \alpha \lambda^2$
\column{0.6\textwidth}
\vspace{1em}
\includegraphics[width=0.9\textwidth]{./img/mu_vs_var.png}
\end{columns}
\end{center}
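
One way to see where the extra $\alpha \lambda^2$ term comes from is to let the Poisson rate itself vary between cells. The sketch below assumes the rate follows a Gamma distribution with mean `mu` and variance `alpha * mu**2`, which gives negative binomial counts. The values of `mu` and `alpha` are arbitrary, and `alpha` plays the role of the dispersion in the variance formula above.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, alpha = 10.0, 0.5                   # hypothetical mean and dispersion
n = 200_000

# per-cell rate: Gamma with mean mu and variance alpha * mu^2
lam = rng.gamma(shape=1.0 / alpha, scale=alpha * mu, size=n)
# observed counts: Poisson around each cell's own rate
x = rng.poisson(lam)

print("empirical mean:      ", x.mean())
print("empirical variance:  ", x.var())
print("mu + alpha * mu^2 =  ", mu + alpha * mu**2)
```

With `alpha` close to 0 the rates barely vary and the counts behave like Poisson counts again.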
## Counts distributions
$P(X = x)$ for $\mathcal{NB}(\mu, \sigma)$
\begin{center}
\includegraphics[width=0.8\textwidth]{./img/poisson.png}
\end{center}
## Counts distributions
$P(X = x)$ for $\mathcal{NB}(\mu, \sigma = 10)$
\begin{center}
\includegraphics[width=0.8\textwidth]{./img/NB_sigma_10.png}
\end{center}
## Counts distributions
$P(X = x)$ for $\mathcal{NB}(\mu, \sigma = 2)$
\begin{center}
\includegraphics[width=0.8\textwidth]{./img/NB_sigma_2.png}
\end{center}
## Counts distributions
$P(X = x)$ for $\mathcal{NB}(\mu, \sigma = 1)$
\begin{center}
\includegraphics[width=0.8\textwidth]{./img/NB_sigma_1.png}
\end{center}
## Variance of count data
\begin{center}
\begin{columns}
\column{0.5\textwidth}
\begin{center}
\begin{tikzpicture}
\fill
(0.5,3.5) node {\bf $\text{gene}_1$}
-- (0.5,2.5) node {\bf $\text{gene}_2$}
-- (0.5,1.5) node {\bf $\vdots$}
-- (0.5,0.5) node {\bf $\text{gene}_n$};
\fill
(1.5,4.5) node {\bf{$\text{cell}_1$}}
-- (1.5,3.5) node {mRNA}
-- (1.5,2.5) node {\color{red}mRNA}
-- (1.5,1.5) node {$\vdots$}
-- (1.5,0.5) node {mRNA};
\fill
(2.5,4.5) node {\bf{$\text{cell}_2$}}
-- (2.5,3.5) node {mRNA}
-- (2.5,2.5) node {\color{red}mRNA}
-- (2.5,1.5) node {$\vdots$}
-- (2.5,0.5) node {mRNA};
\fill
(3.5,4.5) node {\bf{$\cdots$}}
-- (3.5,3.5) node {$\cdots$}
-- (3.5,2.5) node {\color{red}$\cdots$}
-- (3.5,1.5) node {$\ddots$}
-- (3.5,0.5) node {$\cdots$};
\fill
(4.5,4.5) node {\bf{$\text{cell}_c$}}
-- (4.5,3.5) node {mRNA}
-- (4.5,2.5) node {\color{red}mRNA}
-- (4.5,1.5) node {$\vdots$}
-- (4.5,0.5) node {mRNA};
\draw (1,0) grid (5,4);
\end{tikzpicture}
\end{center}
\column{0.6\textwidth}
Which mRNA count comparison seems the most significant to you:
\begin{itemize}
\item $50$ vs $5$
\item $10050$ vs $10000$
\end{itemize}
We want to consider that larger
\end{columns}
\end{center}
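
A rough way to compare the two cases is to scale each difference by the noise expected for counts of that size. The sketch below assumes Poisson counts, for which the variance of a difference is approximately the sum of the two counts. The helper name `poisson_z` is hypothetical.

```python
import math

def poisson_z(a, b):
    """Difference between two counts in units of its approximate standard
    deviation, assuming both counts are Poisson distributed."""
    return (a - b) / math.sqrt(a + b)

print(poisson_z(50, 5))          # about 6 standard deviations
print(poisson_z(10050, 10000))   # well below 1 standard deviation
```

The absolute differences are similar (45 and 50), but only the first one stands out from the noise, because the variance grows with the mean.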
# Variance stabilization
# Depth normalization
......