---
title: "single-cell RNA-Seq: Normalization"
author: "Laurent Modolo [laurent.modolo@ens-lyon.fr](mailto:laurent.modolo@ens-lyon.fr)"
date: "Wednesday 8 June 2022"
output:
  beamer_presentation:
    df_print: tibble
    fig_caption: no
    highlight: tango
    latex_engine: xelatex
    slide_level: 2
    theme: metropolis
  ioslides_presentation:
    highlight: tango
  slidy_presentation:
    highlight: tango
classoption: aspectratio=169
---
Introduction
Introduction
Program
- Single-cell RNA-Seq data from 10X Sequencing (Friday 3 June 2022 - 14:00)
- Normalization and spurious effects (Wednesday 8 June 2022 - 14:00)
- Dimension reduction and data visualization (Monday 13 June 2022 - 15:00)
- Clustering and annotation (Thursday 23 June 2022 - 14:00)
- Pseudo-time and velocity inference (Thursday 30 June 2022 - 14:00)
- Differential expression analysis (Friday 8 July 2022 - 14:00)
Introduction
Program
- Single-cell RNA-Seq data from 10X Sequencing (Friday 3 June 2022 - 14:00)
- Normalization and spurious effects (Wednesday 8 June 2022 - 14:00)
    - Quality control
    - Normalization
    - Variance stabilization
    - Depth normalization
    - The monotonicity of the normalization
    - Batch effects
    - Heterogeneous data
- Dimension reduction and data visualization (Monday 13 June 2022 - 15:00)
- Clustering and annotation (Thursday 23 June 2022 - 14:00)
- Pseudo-time and velocity inference (Thursday 30 June 2022 - 14:00)
- Differential expression analysis (Friday 8 July 2022 - 14:00)
Quality control
Cell filtering
\begin{center} \begin{columns} \column{0.5\textwidth} \begin{center} \begin{tikzpicture} \fill (0.5,3.5) node {\bf $\text{gene}_1$} -- (0.5,2.5) node {\bf $\text{gene}_2$} -- (0.5,1.5) node {\bf $\vdots$} -- (0.5,0.5) node {\bf $\text{gene}_n$}; \fill (1.5,4.5) node {\bf{$\text{cell}_1$}} -- (1.5,3.5) node {mRNA} -- (1.5,2.5) node {mRNA} -- (1.5,1.5) node {$\vdots$} -- (1.5,0.5) node {mRNA}; \fill (2.5,4.5) node {\color{red}\bf{$\text{0 cell}_2$}} -- (2.5,3.5) node {\color{red}mRNA} -- (2.5,2.5) node {\color{red}mRNA} -- (2.5,1.5) node {\color{red}$\vdots$} -- (2.5,0.5) node {\color{red}mRNA}; \fill (3.5,4.5) node {\bf{$\cdots$}} -- (3.5,3.5) node {$\cdots$} -- (3.5,2.5) node {$\cdots$} -- (3.5,1.5) node {$\ddots$} -- (3.5,0.5) node {$\cdots$}; \fill (4.5,4.5) node {\bf{$\text{cell}_c$}} -- (4.5,3.5) node {mRNA} -- (4.5,2.5) node {mRNA} -- (4.5,1.5) node {$\vdots$} -- (4.5,0.5) node {mRNA}; \draw (1,0) grid (5,4); \end{tikzpicture} \end{center}
\column{0.5\textwidth}
{\large Some cells are not cells.}
\begin{itemize} \item matrix columns are defined by {\bf cell barcode sequences} \item {\bf cell barcode sequences identify droplets} in the 10X protocol \end{itemize}
\end{columns} \end{center}
Cell filtering
\begin{center} \begin{columns} \column{0.5\textwidth} \begin{tikzpicture} \fill (0.5,3.5) node {\bf $\text{gene}_1$} -- (0.5,2.5) node {\bf $\text{gene}_2$} -- (0.5,1.5) node {\bf $\vdots$} -- (0.5,0.5) node {\bf $\text{gene}_n$}; \fill (1.5,4.5) node {\bf $\text{bc}_1$} -- (1.5,3.5) node {mRNA} -- (1.5,2.5) node {mRNA} -- (1.5,1.5) node {$\vdots$} -- (1.5,0.5) node {mRNA}; \fill (2.5,4.5) node {\color{red}\bf $\text{bc}_2$} -- (2.5,3.5) node {\color{red}mRNA} -- (2.5,2.5) node {\color{red}mRNA} -- (2.5,1.5) node {\color{red}$\vdots$} -- (2.5,0.5) node {\color{red}mRNA}; \fill (3.5,4.5) node {\bf{$\cdots$}} -- (3.5,3.5) node {$\cdots$} -- (3.5,2.5) node {$\cdots$} -- (3.5,1.5) node {$\ddots$} -- (3.5,0.5) node {$\cdots$}; \fill (4.5,4.5) node {\bf $\text{bc}_c$} -- (4.5,3.5) node {mRNA} -- (4.5,2.5) node {mRNA} -- (4.5,1.5) node {$\vdots$} -- (4.5,0.5) node {mRNA}; \draw (1,0) grid (5,4); \end{tikzpicture}
\column{0.5\textwidth}
{\large Some cells are not cells.}
\begin{itemize} \item {\bf v2} chemistry $\sim$ 737,000 cell barcodes \item {\bf v3} chemistry $\sim$ 3,500,000 cell barcodes \end{itemize}
\vspace{1em}
To avoid cell barcode collisions, we need \[ \text{\bf cell number} \ll \text{\bf cell barcode number} \]
Most of the droplets will be empty
\end{columns} \end{center}
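To get a feel for the inequality above, here is a minimal sketch that estimates how often two cells end up with the same barcode. It assumes (this is not stated in the slides) that each cell receives one barcode drawn uniformly and independently from the whitelist; `collision_fraction` is a hypothetical helper name, and the cell numbers in the loop are arbitrary examples.

```python
def collision_fraction(n_cells: int, n_barcodes: int) -> float:
    """Expected fraction of cells sharing their barcode with another cell.

    Assumption: every cell gets one barcode drawn uniformly and
    independently among n_barcodes possible sequences.
    """
    if n_cells < 1 or n_barcodes < 1:
        raise ValueError("n_cells and n_barcodes must be positive")
    # A given cell avoids each of the (n_cells - 1) other cells with
    # probability (1 - 1 / n_barcodes), independently.
    return 1.0 - (1.0 - 1.0 / n_barcodes) ** (n_cells - 1)


# Barcode numbers are the approximate values quoted on the slide.
for chemistry, n_barcodes in [("v2", 737_000), ("v3", 3_500_000)]:
    for n_cells in (1_000, 10_000, 100_000):
        fraction = collision_fraction(n_cells, n_barcodes)
        print(f"{chemistry}: {n_cells:>7} cells -> {fraction:.2%} of cells collide")
```

Under this simplified model, the collision fraction stays small only while the number of cells is far below the number of barcodes, which is the condition stated above.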
Cell filtering
\begin{center} \begin{columns} \column{0.35\textwidth} Sequenced empty droplets: \begin{itemize} \item do not express many genes \item look like experimental noise \end{itemize}
\vspace{1em}
The number of UMIs per cell barcode
\column{0.7\textwidth} \vspace{1.5em} \includegraphics[width=\textwidth]{img/cell_barcode_rank_vs_umi.png} \end{columns} \end{center}
Cell filtering
\begin{center} \begin{columns} \column{0.35\textwidth} We have {\bf two populations} of cell barcodes: \begin{itemize} \item one with {\bf low} total UMI counts \item one with {\bf high} total UMI counts \end{itemize}
\column{0.7\textwidth} \vspace{1.5em} \includegraphics[width=\textwidth]{img/cell_barcode_rank_vs_umi.png} \end{columns} \end{center}
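One simple way to separate the two populations is to cluster the barcodes into two groups on their log-scaled total UMI counts. The sketch below uses a one-dimensional 2-means for this; it is only an illustration, not the method used by any specific tool (dedicated empty-droplet callers are more principled). The function name `split_barcodes` and the simulated counts are assumptions made for the example.

```python
import numpy as np


def split_barcodes(total_umis, n_iter=50):
    """Split barcodes into low/high populations with 1D 2-means on log10 counts.

    Returns a boolean mask (True = high population) and the UMI threshold used.
    """
    x = np.log10(np.asarray(total_umis, dtype=float) + 1.0)
    low_center, high_center = x.min(), x.max()
    for _ in range(n_iter):
        threshold = (low_center + high_center) / 2.0
        is_high = x > threshold
        if is_high.all() or not is_high.any():
            break  # only one population found, nothing left to update
        # Move each center to the mean of the barcodes assigned to it.
        low_center, high_center = x[~is_high].mean(), x[is_high].mean()
    return is_high, 10.0 ** threshold - 1.0


# Simulated example: many near-empty droplets and fewer cell-containing ones.
rng = np.random.default_rng(0)
empty = rng.poisson(20, size=9_000)    # hypothetical ambient-RNA level
cells = rng.poisson(5_000, size=1_000)  # hypothetical cell-containing droplets
is_cell, umi_threshold = split_barcodes(np.concatenate([empty, cells]))
print(is_cell.sum(), "barcodes kept, threshold at about", round(umi_threshold), "UMIs")
```

On real data the two populations overlap, so a threshold obtained this way is best checked against the barcode rank plot.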
Cell filtering
\begin{center} \begin{columns} \column{0.5\textwidth} \begin{tikzpicture} \fill (0.5,3.5) node {\bf $\text{gene}_1$} -- (0.5,2.5) node {\bf $\text{gene}_2$} -- (0.5,1.5) node {\bf $\vdots$} -- (0.5,0.5) node {\bf $\text{gene}_n$}; \fill (1.5,4.5) node {\bf $\text{cell}_1$} -- (1.5,3.5) node {mRNA} -- (1.5,2.5) node {mRNA} -- (1.5,1.5) node {$\vdots$} -- (1.5,0.5) node {mRNA}; \fill (2.5,4.5) node {\color{red}\bf $\text{2 cells}_2$} -- (2.5,3.5) node {\color{red}mRNA} -- (2.5,2.5) node {\color{red}mRNA} -- (2.5,1.5) node {\color{red}$\vdots$} -- (2.5,0.5) node {\color{red}mRNA}; \fill (3.5,4.5) node {\bf $\cdots$} -- (3.5,3.5) node {$\cdots$} -- (3.5,2.5) node {$\cdots$} -- (3.5,1.5) node {$\ddots$} -- (3.5,0.5) node {$\cdots$}; \fill (4.5,4.5) node {\bf $\text{cell}_c$} -- (4.5,3.5) node {mRNA} -- (4.5,2.5) node {mRNA} -- (4.5,1.5) node {$\vdots$} -- (4.5,0.5) node {mRNA}; \draw (1,0) grid (5,4); \end{tikzpicture}
\column{0.5\textwidth}
{\large Some cells are many cells:}
\begin{itemize} \item not all tissues are easily dissociable \item two cells glued together will share the same droplet \item two different cells can share the same droplet by chance \end{itemize}
\vspace{1em}
Cell barcodes corresponding to n-plets should be in the minority if the preparation went well.
\end{columns} \end{center}
Cell filtering
\begin{center} \includegraphics[width=0.75\textwidth]{img/mouse_human_mix.png} \end{center}
Cell filtering
Hypothesis
Cell barcodes corresponding to n-plets should be in the minority if the preparation went well.
Algorithm
- Simulate thousands of doublets by adding together two randomly chosen single-cell profiles.
- For each original cell, compute the density of simulated doublets in the surrounding neighborhood.
- For each original cell, compute the density of other observed cells in the neighborhood.
- Return the ratio between the two densities as a doublet score for each cell.
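
The four steps above can be sketched with NumPy as follows. This is a simplified illustration under stated assumptions, not the implementation of any particular package: the matrix is cells x genes (transposed relative to the slides), the neighborhood is a ball whose radius is the median distance to the `k`-th nearest observed cell, distances are computed on log-normalized counts, and a pseudo-count of 1 avoids division by zero. The names `doublet_scores`, `n_sim` and `k`, as well as the simulated data, are hypothetical.

```python
import numpy as np


def log_normalize(counts, target):
    """Scale each cell (row) to a common library size, then apply log1p."""
    totals = counts.sum(axis=1, keepdims=True)  # assumes no all-zero cell
    return np.log1p(counts / totals * target)


def squared_distances(a, b):
    """Pairwise squared Euclidean distances between rows of a and rows of b."""
    d2 = (a**2).sum(axis=1)[:, None] + (b**2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.maximum(d2, 0.0)  # clip tiny negative values caused by rounding


def doublet_scores(counts, n_sim=1000, k=15, seed=0):
    """Density-ratio doublet score for a cells x genes count matrix."""
    rng = np.random.default_rng(seed)
    n_cells = counts.shape[0]

    # Step 1: simulate doublets by adding two distinct random profiles.
    first = rng.integers(0, n_cells, size=n_sim)
    second = (first + rng.integers(1, n_cells, size=n_sim)) % n_cells
    simulated = counts[first] + counts[second]

    target = np.median(counts.sum(axis=1))
    obs = log_normalize(counts, target)
    sim = log_normalize(simulated, target)

    # Shared neighborhood radius: median distance to the k-th observed neighbor
    # (column 0 of the sorted distances is the cell itself).
    d_obs = squared_distances(obs, obs)
    d_sim = squared_distances(obs, sim)
    radius2 = np.median(np.sort(d_obs, axis=1)[:, k])

    # Step 2: density of simulated doublets around each observed cell.
    sim_density = (d_sim <= radius2).sum(axis=1) / n_sim
    # Step 3: density of other observed cells (self removed, pseudo-count added).
    n_neighbors = (d_obs <= radius2).sum(axis=1) - 1
    obs_density = (n_neighbors + 1) / n_cells
    # Step 4: ratio of the two densities.
    return sim_density / obs_density


# Toy data: two cell types (100 cells each) and 5 true doublets mixing both.
rng = np.random.default_rng(1)
mean_a = np.r_[np.full(25, 10.0), np.full(25, 0.5)]
mean_b = mean_a[::-1]
counts = np.vstack([
    rng.poisson(mean_a, size=(100, 50)),
    rng.poisson(mean_b, size=(100, 50)),
    rng.poisson(mean_a + mean_b, size=(5, 50)),
]).astype(float)
is_doublet = np.arange(counts.shape[0]) >= 200
scores = doublet_scores(counts)
print("median score, singlets:", np.median(scores[~is_doublet]))
print("median score, doublets:", np.median(scores[is_doublet]))
```

Because true doublets are rare among observed cells but common among simulated profiles, the ratio tends to be high for them, which is where the minority hypothesis comes in.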
Cell filtering
\begin{center} \includegraphics[width=\textwidth]{img/doublet_detection_comparison.png} \end{center}
Different algorithms are available to compare cells to synthetic doublets
Cell filtering
\begin{center} \includegraphics[width=0.8\textwidth]{img/features_for_QC_1.png} \end{center} \vspace{-1.5em} We can use hard thresholds to remove putative poor-quality cells: \vspace{-0.5em} \begin{itemize} \item apoptotic cells show a high fraction of mitochondrial (MT) transcripts \item inefficient RT or PCR amplification yields few UMIs and few detected genes \end{itemize}
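
A hard-threshold filter of this kind can be sketched as below, for a genes x cells count matrix as drawn on the earlier slides. The helper name `qc_keep_mask` and the default threshold values are assumptions chosen for illustration and should be tuned for each dataset; mitochondrial genes are assumed to be recognizable by an `MT-` prefix (human gene symbols).

```python
import numpy as np


def qc_keep_mask(counts, gene_names, min_umis=500, min_genes=200,
                 max_mt_fraction=0.2):
    """Return a boolean mask of cells (columns) passing hard QC thresholds."""
    counts = np.asarray(counts)
    # Assumption: mitochondrial genes are named with an "MT-" prefix.
    is_mt = np.array([name.upper().startswith("MT-") for name in gene_names])

    total_umis = counts.sum(axis=0)       # UMIs per cell
    n_genes = (counts > 0).sum(axis=0)    # detected genes per cell
    # Guard against division by zero for completely empty barcodes.
    mt_fraction = counts[is_mt].sum(axis=0) / np.maximum(total_umis, 1)

    return ((total_umis >= min_umis)
            & (n_genes >= min_genes)
            & (mt_fraction <= max_mt_fraction))


# Tiny example with thresholds lowered to fit a 4-gene matrix.
genes = ["MT-CO1", "ACTB", "GAPDH", "CD3E"]
counts = np.array([[1, 50, 0],
                   [10, 5, 0],
                   [10, 5, 1],
                   [10, 0, 0]])
keep = qc_keep_mask(counts, genes, min_umis=10, min_genes=3, max_mt_fraction=0.2)
print(keep)  # second cell fails on MT fraction, third on UMI and gene counts
```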
Cell filtering
\begin{center} \includegraphics[width=0.8\textwidth]{img/features_for_QC_2.png} \end{center}
Cells expressing few genes also contain few mRNA molecules
Gene filtering
\begin{center} \begin{columns} \column{0.5\textwidth} \begin{center} \begin{tikzpicture} \fill (0.5,3.5) node {\bf $\text{gene}_1$} -- (0.5,2.5) node {\bf $\text{gene}_2$} -- (0.5,1.5) node {\bf $\vdots$} -- (0.5,0.5) node {\bf $\text{gene}_n$}; \fill (1.5,4.5) node {\bf{$\text{cell}_1$}} -- (1.5,3.5) node {mRNA} -- (1.5,2.5) node {\color{red}0} -- (1.5,1.5) node {$\vdots$} -- (1.5,0.5) node {mRNA}; \fill (2.5,4.5) node {\bf{$\text{0 cell}_2$}} -- (2.5,3.5) node {mRNA} -- (2.5,2.5) node {\color{red}0} -- (2.5,1.5) node {$\vdots$} -- (2.5,0.5) node {mRNA}; \fill (3.5,4.5) node {\bf{$\cdots$}} -- (3.5,3.5) node {$\cdots$} -- (3.5,2.5) node {$\cdots$} -- (3.5,1.5) node {$\ddots$} -- (3.5,0.5) node {$\cdots$}; \fill (4.5,4.5) node {\bf{$\text{cell}_c$}} -- (4.5,3.5) node {mRNA} -- (4.5,2.5) node {\color{red}0} -- (4.5,1.5) node {$\vdots$} -- (4.5,0.5) node {mRNA}; \draw (1,0) grid (5,4); \end{tikzpicture} \end{center}
\column{0.5\textwidth}