    ---
    title: "single-cell RNA-Seq: Normalization"
    author: "Laurent Modolo [laurent.modolo@ens-lyon.fr](mailto:laurent.modolo@ens-lyon.fr)"
    date: "Wednesday 8 June 2022"
    output:
      beamer_presentation:
        df_print: tibble
        fig_caption: no
        highlight: tango
        latex_engine: xelatex
        slide_level: 2
        theme: metropolis
      ioslides_presentation:
        highlight: tango
      slidy_presentation:
        highlight: tango
    classoption: aspectratio=169
    ---

    Introduction

    Introduction

    Program

    1. Single-cell RNA-Seq data from 10X Sequencing (Friday 3 June 2022 - 14:00)
    2. Normalization and spurious effects (Wednesday 8 June 2022 - 14:00)
    3. Dimension reduction and data visualization (Monday 13 June 2022 - 15:00)
    4. Clustering and annotation (Thursday 23 June 2022 - 14:00)
    5. Pseudo-time and velocity inference (Thursday 30 June 2022 - 14:00)
    6. Differential expression analysis (Friday 8 July 2022 - 14:00)

    Introduction

    Program

    1. Single-cell RNA-Seq data from 10X Sequencing (Friday 3 June 2022 - 14:00)
    2. Normalization and spurious effects (Wednesday 8 June 2022 - 14:00)
       • Quality control
       • Normalization
         • Variance stabilization
         • Depth normalization
         • The monotonicity of the normalization
       • Batch effects
       • Heterogeneous data
    3. Dimension reduction and data visualization (Monday 13 June 2022 - 15:00)
    4. Clustering and annotation (Thursday 23 June 2022 - 14:00)
    5. Pseudo-time and velocity inference (Thursday 30 June 2022 - 14:00)
    6. Differential expression analysis (Friday 8 July 2022 - 14:00)

    Quality control

    Cell filtering

    \begin{center} \begin{columns} \column{0.5\textwidth} \begin{center} \begin{tikzpicture} \fill (0.5,3.5) node {$\bf \text{gene}_1$} -- (0.5,2.5) node {$\bf \text{gene}_2$} -- (0.5,1.5) node {$\bf \vdots$} -- (0.5,0.5) node {$\bf \text{gene}_n$}; \fill (1.5,4.5) node {$\bf{\text{cell}_1}$} -- (1.5,3.5) node {mRNA} -- (1.5,2.5) node {mRNA} -- (1.5,1.5) node {$\vdots$} -- (1.5,0.5) node {mRNA}; \fill (2.5,4.5) node {\color{red}$\bf{\text{0 cell}_2}$} -- (2.5,3.5) node {\color{red}mRNA} -- (2.5,2.5) node {\color{red}mRNA} -- (2.5,1.5) node {\color{red}$\vdots$} -- (2.5,0.5) node {\color{red}mRNA}; \fill (3.5,4.5) node {$\bf{\cdots}$} -- (3.5,3.5) node {$\cdots$} -- (3.5,2.5) node {$\cdots$} -- (3.5,1.5) node {$\ddots$} -- (3.5,0.5) node {$\cdots$}; \fill (4.5,4.5) node {$\bf{\text{cell}_c}$} -- (4.5,3.5) node {mRNA} -- (4.5,2.5) node {mRNA} -- (4.5,1.5) node {$\vdots$} -- (4.5,0.5) node {mRNA}; \draw (1,0) grid (5,4); \end{tikzpicture} \end{center}

    \column{0.5\textwidth}

    {\large Some cells are not cells.}

    \begin{itemize} \item matrix columns are defined by {\bf cell barcode sequences} \item {\bf cell barcode sequences identify droplets} in the 10X protocol \end{itemize}

    \end{columns} \end{center}

    Cell filtering

    \begin{center} \begin{columns} \column{0.5\textwidth} \begin{tikzpicture} \fill (0.5,3.5) node {$\bf \text{gene}_1$} -- (0.5,2.5) node {$\bf \text{gene}_2$} -- (0.5,1.5) node {$\bf \vdots$} -- (0.5,0.5) node {$\bf \text{gene}_n$}; \fill (1.5,4.5) node {$\bf \text{bc}_1$} -- (1.5,3.5) node {mRNA} -- (1.5,2.5) node {mRNA} -- (1.5,1.5) node {$\vdots$} -- (1.5,0.5) node {mRNA}; \fill (2.5,4.5) node {\color{red}$\bf \text{bc}_2$} -- (2.5,3.5) node {\color{red}mRNA} -- (2.5,2.5) node {\color{red}mRNA} -- (2.5,1.5) node {\color{red}$\vdots$} -- (2.5,0.5) node {\color{red}mRNA}; \fill (3.5,4.5) node {$\bf{\cdots}$} -- (3.5,3.5) node {$\cdots$} -- (3.5,2.5) node {$\cdots$} -- (3.5,1.5) node {$\ddots$} -- (3.5,0.5) node {$\cdots$}; \fill (4.5,4.5) node {$\bf \text{bc}_c$} -- (4.5,3.5) node {mRNA} -- (4.5,2.5) node {mRNA} -- (4.5,1.5) node {$\vdots$} -- (4.5,0.5) node {mRNA}; \draw (1,0) grid (5,4); \end{tikzpicture}

    \column{0.5\textwidth}

    {\large Some cells are not cells.}

    \begin{itemize} \item {\bf v2} chemistry: $\sim$737,000 cell barcodes \item {\bf v3} chemistry: $\sim$3,500,000 cell barcodes \end{itemize}

    \vspace{1em}

    To avoid cell barcode collisions, we need \[ \text{\bf cell number} \ll \text{\bf cell barcode number} \]

    Most droplets will therefore be empty.

    \end{columns} \end{center}
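The collision constraint can be made concrete with a birthday-problem estimate. This is a minimal sketch under a simplifying assumption: each cell receives a barcode drawn uniformly at random from the whole barcode list (the real bead loading is more structured). It reuses the barcode counts quoted above; `shared_barcode_fraction` is a hypothetical helper name.

```python
import math

def shared_barcode_fraction(n_cells: int, n_barcodes: int) -> float:
    """Expected fraction of cells whose barcode is also drawn by at least one
    other cell, assuming barcodes are drawn uniformly at random (a simplification)."""
    # Probability that none of the other n_cells - 1 cells draws the same barcode
    p_unique = (1.0 - 1.0 / n_barcodes) ** (n_cells - 1)
    return 1.0 - p_unique

for chemistry, n_barcodes in [("v2", 737_000), ("v3", 3_500_000)]:
    for n_cells in (1_000, 10_000, 100_000):
        pct = 100 * shared_barcode_fraction(n_cells, n_barcodes)
        print(f"{chemistry}: {n_cells:>7} cells -> {pct:.2f}% of cells share a barcode")
```

Under this assumption, 10,000 cells give roughly 1.3% colliding cells with the v2 list and about 0.3% with the v3 list, and the fraction grows quickly as the cell number approaches the barcode number.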

    Cell filtering

    \begin{center} \begin{columns} \column{0.35\textwidth} Sequenced empty droplets: \begin{itemize} \item do not express many genes \item look like experimental noise \end{itemize}

    \vspace{1em}

    Look at the total number of UMIs per cell barcode, ranked from highest to lowest:

    \column{0.7\textwidth} \vspace{1.5em} \includegraphics[width=\textwidth]{img/cell_barcode_rank_vs_umi.png} \end{columns} \end{center}

    Cell filtering

    \begin{center} \begin{columns} \column{0.35\textwidth} We see {\bf two populations} of cell barcodes: \begin{itemize} \item one with {\bf low} total UMI counts (mostly empty droplets) \item one with {\bf high} total UMI counts (mostly cells) \end{itemize}

    \column{0.7\textwidth} \vspace{1.5em} \includegraphics[width=\textwidth]{img/cell_barcode_rank_vs_umi.png} \end{columns} \end{center}
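A toy way to separate the two populations on the barcode rank plot is to cut at the largest drop in log10 total UMI count between consecutively ranked barcodes. This is a hypothetical heuristic for illustration only, not the method used by Cell Ranger or by dedicated tools such as DropletUtils (`barcodeRanks()`, `emptyDrops()`), which are more robust when the two populations overlap. The totals in the usage lines are made up.

```python
import math

def split_at_largest_drop(umi_totals):
    """Toy split of barcodes into a high- and a low-UMI population.

    Barcodes are sorted by decreasing total UMI count (as on the barcode rank
    plot) and cut at the largest drop in log10(total UMI) between two
    consecutive ranks. Returns (number of barcodes kept, smallest kept total).
    """
    totals = sorted((t for t in umi_totals if t > 0), reverse=True)
    if len(totals) < 2:
        raise ValueError("need at least two barcodes with non-zero counts")
    best_drop, n_kept = 0.0, 1
    for rank in range(1, len(totals)):
        drop = math.log10(totals[rank - 1]) - math.log10(totals[rank])
        if drop > best_drop:
            best_drop, n_kept = drop, rank
    return n_kept, totals[n_kept - 1]

# Made-up totals: 1,000 cell-containing droplets and 49,000 empty ones
cells = [2000 + 7 * i for i in range(1000)]
empty = [1 + i % 50 for i in range(49_000)]
n_kept, min_umi = split_at_largest_drop(cells + empty)
print(n_kept, min_umi)  # 1000 2000
```

Working on the log scale mirrors the log-log axes of the rank plot: a drop from 2,000 to 50 UMIs counts far more than a drop from 2 to 1, so the noisy tail of empty droplets does not produce the cut.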

    Cell filtering

    \begin{center} \begin{columns} \column{0.5\textwidth} \begin{tikzpicture} \fill (0.5,3.5) node {$\bf \text{gene}_1$} -- (0.5,2.5) node {$\bf \text{gene}_2$} -- (0.5,1.5) node {$\bf \vdots$} -- (0.5,0.5) node {$\bf \text{gene}_n$}; \fill (1.5,4.5) node {$\bf \text{cell}_1$} -- (1.5,3.5) node {mRNA} -- (1.5,2.5) node {mRNA} -- (1.5,1.5) node {$\vdots$} -- (1.5,0.5) node {mRNA}; \fill (2.5,4.5) node {\color{red}$\bf \text{2 cells}_2$} -- (2.5,3.5) node {\color{red}mRNA} -- (2.5,2.5) node {\color{red}mRNA} -- (2.5,1.5) node {\color{red}$\vdots$} -- (2.5,0.5) node {\color{red}mRNA}; \fill (3.5,4.5) node {$\bf \cdots$} -- (3.5,3.5) node {$\cdots$} -- (3.5,2.5) node {$\cdots$} -- (3.5,1.5) node {$\ddots$} -- (3.5,0.5) node {$\cdots$}; \fill (4.5,4.5) node {$\bf \text{cell}_c$} -- (4.5,3.5) node {mRNA} -- (4.5,2.5) node {mRNA} -- (4.5,1.5) node {$\vdots$} -- (4.5,0.5) node {mRNA}; \draw (1,0) grid (5,4); \end{tikzpicture}

    \column{0.5\textwidth}

    {\large Some cells are many cells:}

    \begin{itemize} \item not all tissues are easily dissociable \item two cells glued together will share the same droplet \item two different cells can share the same droplet by chance \end{itemize}

    \vspace{1em}

    Cell barcodes corresponding to multiplets (n-plets) should be a minority if the preparation went well.

    \end{columns} \end{center}
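For the "by chance" case, a common modelling assumption is that cells fall into droplets following a Poisson distribution with mean $\lambda$ cells per droplet. Under that assumption, the fraction of non-empty droplets holding two or more cells is $\frac{1 - e^{-\lambda} - \lambda e^{-\lambda}}{1 - e^{-\lambda}}$, which is close to $\lambda/2$ for small $\lambda$. The sketch below only covers chance co-encapsulation, not cells already glued together before loading; the $\lambda$ values are illustrative.

```python
import math

def multiplet_fraction(mean_cells_per_droplet: float) -> float:
    """Fraction of non-empty droplets holding two or more cells,
    assuming Poisson loading with the given mean number of cells per droplet."""
    lam = mean_cells_per_droplet
    p_empty = math.exp(-lam)           # P(0 cells in a droplet)
    p_single = lam * math.exp(-lam)    # P(exactly 1 cell in a droplet)
    return (1.0 - p_empty - p_single) / (1.0 - p_empty)

for lam in (0.01, 0.05, 0.1, 0.2):
    pct = 100 * multiplet_fraction(lam)
    print(f"lambda = {lam}: {pct:.1f}% of non-empty droplets are multiplets")
```

Under this model the multiplet rate grows roughly linearly with the loading, which is one reason to load far fewer cells than droplets and accept that most droplets stay empty.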

    Cell filtering

    \begin{center} \includegraphics[width=0.75\textwidth]{img/mouse_human_mix.png} \end{center}

    Cell filtering

    Hypothesis

    Cell barcodes corresponding to multiplets (n-plets) should be a minority if the preparation went well.

    Algorithm

    1. Simulate thousands of doublets by adding together two randomly chosen single-cell profiles.
    2. For each original cell, compute the density of simulated doublets in the surrounding neighborhood.
    3. For each original cell, compute the density of other observed cells in the neighborhood.
    4. Return the ratio between the two densities as a doublet score for each cell.
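The four steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the implementation of any particular package: profiles are library-size normalized and log-transformed, distances are Euclidean on all genes (real tools usually work in a reduced space such as PCA), and each neighbourhood density is approximated by counting simulated doublets or observed cells among the `k` nearest neighbours. Function names, parameters and the toy data are hypothetical.

```python
import math
import random

def normalize(profile):
    """Scale a count vector to 10,000 total counts, then log1p-transform it."""
    total = sum(profile) or 1
    return [math.log1p(1e4 * c / total) for c in profile]

def doublet_scores(counts, n_sim=1000, k=10, seed=0):
    """Toy doublet score for each cell (each row of `counts` is one cell)."""
    rng = random.Random(seed)
    n_cells = len(counts)
    # 1. Simulate doublets by adding two randomly chosen single-cell profiles
    simulated = []
    for _ in range(n_sim):
        a, b = rng.sample(range(n_cells), 2)
        simulated.append([x + y for x, y in zip(counts[a], counts[b])])
    cells = [normalize(c) for c in counts]
    sims = [normalize(s) for s in simulated]
    scores = []
    for i, cell in enumerate(cells):
        # Label candidate neighbours: 0 = other observed cell, 1 = simulated doublet
        pool = [(math.dist(cell, other), 0) for j, other in enumerate(cells) if j != i]
        pool += [(math.dist(cell, s), 1) for s in sims]
        pool.sort()
        n_dbl = sum(label for _, label in pool[:k])
        n_obs = k - n_dbl
        # 2-3. Local densities, each scaled by the size of its set
        #      (+1 on the observed count avoids dividing by zero)
        dbl_density = n_dbl / n_sim
        obs_density = (n_obs + 1) / n_cells
        # 4. Ratio of the two densities = doublet score
        scores.append(dbl_density / obs_density)
    return scores

# Hypothetical toy data: two cell types with disjoint marker genes,
# plus 5 real A+B doublets appended at the end
rng = random.Random(1)
type_a = [[rng.randint(20, 30), rng.randint(20, 30), 0, 0] for _ in range(50)]
type_b = [[0, 0, rng.randint(20, 30), rng.randint(20, 30)] for _ in range(50)]
real_doublets = [[x + y for x, y in zip(type_a[i], type_b[i])] for i in range(5)]
scores = doublet_scores(type_a + type_b + real_doublets, n_sim=500, k=10)
singlet_mean = sum(scores[:100]) / 100
doublet_mean = sum(scores[100:]) / 5
print(f"mean score: singlets {singlet_mean:.2f}, doublets {doublet_mean:.2f}")
```

In this toy setting the real A+B doublets sit among the simulated A+B doublets and get high scores. Simulated A+A doublets fall inside the A cluster after normalization, so homotypic doublets (two cells of the same type) are hard to detect this way.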

    Cell filtering

    \begin{center} \includegraphics[width=\textwidth]{img/doublet_detection_comparison.png} \end{center}

    Different algorithms are available to compare cells to synthetic doublets

    Cell filtering

    \begin{center} \includegraphics[width=0.8\textwidth]{img/features_for_QC_1.png} \end{center} \vspace{-1.5em} We can use hard thresholds to remove putative poor-quality cells: \vspace{-0.5em} \begin{itemize} \item apoptotic cells have a high fraction of mitochondrial (MT) gene counts \item inefficient reverse transcription (RT) or PCR amplification yields few UMIs and few detected genes \end{itemize}
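The per-cell metrics behind these hard thresholds can be computed directly from the count matrix. This sketch assumes a genes x cells matrix stored as a list of rows, flags mitochondrial genes by a name prefix (`MT-` here; the convention depends on the annotation, e.g. `mt-` in mouse), and uses hypothetical threshold values that should be chosen from the QC plots of each dataset. The gene names and counts are made up.

```python
def qc_metrics(counts, gene_names, mt_prefix="MT-"):
    """Per-cell QC metrics from a genes x cells count matrix (list of rows)."""
    is_mt = [name.upper().startswith(mt_prefix.upper()) for name in gene_names]
    metrics = []
    for j in range(len(counts[0])):
        column = [row[j] for row in counts]  # UMI counts of cell j
        total = sum(column)
        n_genes = sum(1 for c in column if c > 0)
        mt_counts = sum(c for c, mt in zip(column, is_mt) if mt)
        pct_mt = 100.0 * mt_counts / total if total else 0.0
        metrics.append({"total_umi": total, "n_genes": n_genes, "pct_mt": pct_mt})
    return metrics

def keep_cell(m, min_umi=500, min_genes=200, max_pct_mt=20.0):
    """Hard thresholds; the default values are hypothetical placeholders."""
    return (m["total_umi"] >= min_umi
            and m["n_genes"] >= min_genes
            and m["pct_mt"] <= max_pct_mt)

# Toy example: 3 genes x 3 cells (made-up counts)
gene_names = ["ACTB", "GAPDH", "MT-CO1"]
counts = [
    [30, 2, 25],   # ACTB
    [20, 0, 15],   # GAPDH
    [5, 1, 40],    # MT-CO1
]
metrics = qc_metrics(counts, gene_names)
keep = [keep_cell(m, min_umi=10, min_genes=2) for m in metrics]
print(keep)  # [True, False, False]: cell 2 has too few UMIs, cell 3 too many MT counts
```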

    Cell filtering

    \begin{center} \includegraphics[width=0.8\textwidth]{img/features_for_QC_2.png} \end{center}

    Cells expressing few genes also contain few mRNA molecules.

    Gene filtering

    \begin{center} \begin{columns} \column{0.5\textwidth} \begin{center} \begin{tikzpicture} \fill (0.5,3.5) node {$\bf \text{gene}_1$} -- (0.5,2.5) node {$\bf \text{gene}_2$} -- (0.5,1.5) node {$\bf \vdots$} -- (0.5,0.5) node {$\bf \text{gene}_n$}; \fill (1.5,4.5) node {$\bf{\text{cell}_1}$} -- (1.5,3.5) node {mRNA} -- (1.5,2.5) node {\color{red}0} -- (1.5,1.5) node {$\vdots$} -- (1.5,0.5) node {mRNA}; \fill (2.5,4.5) node {$\bf{\text{cell}_2}$} -- (2.5,3.5) node {mRNA} -- (2.5,2.5) node {\color{red}0} -- (2.5,1.5) node {$\vdots$} -- (2.5,0.5) node {mRNA}; \fill (3.5,4.5) node {$\bf{\cdots}$} -- (3.5,3.5) node {$\cdots$} -- (3.5,2.5) node {$\cdots$} -- (3.5,1.5) node {$\ddots$} -- (3.5,0.5) node {$\cdots$}; \fill (4.5,4.5) node {$\bf{\text{cell}_c}$} -- (4.5,3.5) node {mRNA} -- (4.5,2.5) node {\color{red}0} -- (4.5,1.5) node {$\vdots$} -- (4.5,0.5) node {mRNA}; \draw (1,0) grid (5,4); \end{tikzpicture} \end{center}

    \column{0.5\textwidth}