example PBS slurm

Elodie Vallin authored 2 years ago

Introduction

This project

MARS-seq pipeline

This pipeline generates a count matrix from the FASTQ files produced by Illumina sequencing of libraries prepared with the adapted MARS-seq scRNA-seq protocol.

Software:

nextflow (v19.10)
fastqc (v0.11.5) and MultiQC (v1.9) for quality control analysis
cutadapt (v2.1) for trimming remaining adaptors, plate demultiplexing, mRNA filtering, cell demultiplexing and quality trimming
umi_tools (v1.0.0) for cell barcode and UMI detection, cell barcode whitelist generation, read sequence extraction, and read counting
bowtie2 (v2.3.4.1) for indexing FASTA files and mapping
samtools (v1.7) for extracting and indexing BAM files
subread (v1.6.4)
R (v4) to generate a histogram of cell barcode frequencies, convert transcript names to gene names, and merge all cells into a single cells x genes matrix
Python () to calculate cell barcode frequencies and to handle the whitelist file

Install nextflow

Run install_nextflow.sh

Files

The pipeline takes as input:

Mars_seq.nf the nextflow pipeline
Mars_seq.config the nextflow configuration file
FASTQ files for R1 and R2. To match both the read 1 and read 2 files with a single pattern, put "1" and "2" between braces, separated by a comma ("{1,2}"). Read files can be gzipped.
a tag.fa file containing the plate barcodes in the following format:

>Plate1
^ATGC
>Plate2
^CATG
...

a fasta reference transcriptome
a GTF file matching the transcriptome
the expected whitelist of cell barcodes in txt format
a gene map file used for transcripts to genes conversion after mapping
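
As an illustration of the tag.fa layout described above, here is a minimal Python sketch that reads the plate barcodes into a dictionary. It is not part of the pipeline: the function name `parse_plate_barcodes` is hypothetical, and the leading `^` is assumed to be cutadapt's notation for an anchored 5' adapter.

```python
import io


def parse_plate_barcodes(handle):
    """Read a tag.fa-style file and return {plate_name: (barcode, anchored)}.

    Assumption: each record is a '>' header line followed by one barcode line,
    optionally prefixed with '^' (anchored 5' adapter in cutadapt notation).
    """
    plates = {}
    name = None
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
        elif name is not None:
            anchored = line.startswith("^")
            barcode = line.lstrip("^").upper()
            # Reject anything that is not a nucleotide code.
            if set(barcode) - set("ACGTN"):
                raise ValueError(f"unexpected character in barcode for {name}: {line}")
            plates[name] = (barcode, anchored)
            name = None
    return plates


# Example using the two records shown in the format description.
example = io.StringIO(">Plate1\n^ATGC\n>Plate2\n^CATG\n")
plates = parse_plate_barcodes(example)
print(plates)
```

A check of this kind can be useful before launching a run, since a malformed barcode file would otherwise only surface during plate demultiplexing.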

Results

The pipeline outputs a cells x genes count matrix and QC files.
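
To make the shape of the output concrete, the following Python sketch merges per-cell gene counts into a single cells x genes matrix. It only illustrates the idea; in the pipeline this step is done in R. The function name `build_count_matrix`, the cell barcodes, and the gene names are all hypothetical.

```python
def build_count_matrix(per_cell_counts):
    """Merge {cell: {gene: count}} into (cells, genes, matrix).

    Rows follow the sorted cell names, columns the sorted gene names.
    Genes absent from a cell get a count of 0.
    """
    cells = sorted(per_cell_counts)
    genes = sorted({g for counts in per_cell_counts.values() for g in counts})
    matrix = [[per_cell_counts[c].get(g, 0) for g in genes] for c in cells]
    return cells, genes, matrix


# Hypothetical counts for two cells.
counts = {
    "AAGT": {"geneA": 3, "geneB": 1},
    "CCTA": {"geneB": 5, "geneC": 2},
}
cells, genes, matrix = build_count_matrix(counts)
for cell, row in zip(cells, matrix):
    print(cell, row)
```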

Generating metadata files

Software

Python scripts are used to generate metadata files from the scRNASeq data.

get_reads_nb.py generates a QC matrix with the number of reads, the number of mapped reads, and the percentage of mapped reads for each cell.
mapping_ratio.py generates a QC matrix with the number of reads mapped to the transcriptome, the number of reads mapped to ERCC, the ratio of the two, and the percentage of reads mapped to ERCC for each cell.
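
The per-cell metrics listed above can be sketched in Python as follows. This is not the code of get_reads_nb.py or mapping_ratio.py. The function names are hypothetical, the ratio is assumed to be ERCC reads divided by transcriptome reads, and the ERCC percentage is assumed to be taken over all mapped reads (transcriptome plus ERCC).

```python
def mapping_qc(total_reads, mapped_reads):
    """Return read count, mapped read count and percentage mapped for one cell."""
    if total_reads < 0 or not 0 <= mapped_reads <= total_reads:
        raise ValueError("expected 0 <= mapped_reads <= total_reads")
    percent = 100.0 * mapped_reads / total_reads if total_reads else 0.0
    return {"reads": total_reads, "mapped": mapped_reads, "percent_mapped": percent}


def ercc_qc(transcriptome_reads, ercc_reads):
    """Return ERCC metrics for one cell.

    Assumptions: ratio = ERCC / transcriptome, and the percentage uses
    transcriptome + ERCC reads as the denominator.
    """
    total = transcriptome_reads + ercc_reads
    ratio = ercc_reads / transcriptome_reads if transcriptome_reads else float("nan")
    percent_ercc = 100.0 * ercc_reads / total if total else 0.0
    return {
        "transcriptome": transcriptome_reads,
        "ercc": ercc_reads,
        "ratio": ratio,
        "percent_ercc": percent_ercc,
    }


# Hypothetical numbers for a single cell.
print(mapping_qc(200, 150))
print(ercc_qc(900, 100))
```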

Files

*_mapping files from the conrol_qual directory, to use with get_reads_nb.py
*_geneassigned files from the plateX directory, to use with mapping_ratio.py

scRNAseq data analysis

R scripts

The R scripts must be run in an R console.

QC_cleaning.R: takes as input the count matrix and the QC matrix. This script filters cells based on their number of reads, percentage of mapped reads, percentage of reads mapped to ERCC, number of detected genes, and number of counts per cell. It also filters genes.
Data_normalization.R: takes as input the QC-filtered count matrix. This script normalizes the data with SCTransform from the Seurat package.
Analysis_UMAP.R: takes as input the normalized and log1p-transformed matrix. This script performs dimensionality reduction and projection using UMAP, to compare the biological conditions all together and two by two.
DE_analysis.R: takes as input the normalized matrix. This script performs pairwise differential expression analysis between biological conditions using the Seurat package.
Gene_distribution_analysis.R: takes as input the normalized and log1p-transformed matrix. This script plots histograms of the expression values of specific genes for each condition, to compare expression patterns.
Gini_index_boot_strap.R: takes as input the normalized and log1p-transformed matrix. This script performs 100 bootstrap computations of the Wasserstein distance for each gene between pairs of conditions and computes the Gini index for each comparison. Finally, it generates a boxplot of the Gini index values for each pair of conditions.
3genes comparison_analysis.R: takes as input the normalized and log1p-transformed matrix. This script generates boxplots of the expression of 3 genes in each biological condition.
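
The bootstrap procedure described for Gini_index_boot_strap.R can be sketched in Python as follows. This is an illustration under stated assumptions, not a port of the R script: cells are resampled with replacement to the same size in both conditions (so the 1D Wasserstein distance reduces to the mean absolute difference of sorted values), and the Gini index is assumed to be computed over the per-gene distances of each bootstrap replicate. All function names and the example data are hypothetical.

```python
import random


def wasserstein_1d(a, b):
    """1D Wasserstein distance between two samples of equal size."""
    if len(a) != len(b) or not a:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def gini(values):
    """Gini index of non-negative values (0 = all equal)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def bootstrap_gini(expr_a, expr_b, n_boot=100, seed=0):
    """Return one Gini index per bootstrap replicate.

    expr_a, expr_b: {gene: [expression value per cell]} for two conditions.
    """
    genes = sorted(set(expr_a) & set(expr_b))
    if not genes:
        raise ValueError("no shared genes between the two conditions")
    rng = random.Random(seed)
    n_a, n_b = len(expr_a[genes[0]]), len(expr_b[genes[0]])
    size = min(n_a, n_b)
    results = []
    for _ in range(n_boot):
        # Resample cells (not genes), using the same draw for every gene.
        idx_a = [rng.randrange(n_a) for _ in range(size)]
        idx_b = [rng.randrange(n_b) for _ in range(size)]
        distances = [
            wasserstein_1d([expr_a[g][i] for i in idx_a],
                           [expr_b[g][j] for j in idx_b])
            for g in genes
        ]
        results.append(gini(distances))
    return results


# Hypothetical log1p-transformed values for two conditions.
cond_a = {"geneA": [0.0, 0.7, 1.1, 0.0], "geneB": [2.0, 2.1, 1.9, 2.2]}
cond_b = {"geneA": [1.6, 1.4, 0.0, 1.8], "geneB": [2.0, 2.3, 1.8, 2.1]}
ginis = bootstrap_gini(cond_a, cond_b, n_boot=100, seed=1)
print(len(ginis), min(ginis), max(ginis))
```

The list returned for each pair of conditions is what a boxplot of Gini index values would be drawn from.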

scRT-qPCR data analysis

Sparse PLS analysis

The R script 3328_2.R takes two files as input:

an expression matrix
a table to define classes

Authors