Version 2.0 of QC steps: allow adaptative thresholding · c5763064
mlepetit authored 2 years ago

scAN10 pipeline

This repository contains a template and library for the scANalysis of 10x data (scAN10) nextflow pipeline. It can be forked to create your own pipeline.

Overview of scAN-10 pipeline

Getting Started

Prerequisites

To run the pipeline, you will need to have the following installed on your computer:

java (>= 11)
git
docker or singularity

You can check if these are installed by running the following commands in your terminal:

java -version
git --version
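The container engine can be checked in the same way. The following is a minimal sketch, assuming a POSIX shell; `have_tool` is a hypothetical helper name and is not part of the pipeline. It only checks that the commands exist on your `PATH`, not that java is version 11 or newer.

```shell
# have_tool is a hypothetical helper (not shipped with scAN10):
# it succeeds if the given command is available on the PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# java and git are both required.
for tool in java git; do
    if have_tool "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: MISSING"
    fi
done

# Only one container engine is needed: docker or singularity.
if have_tool docker || have_tool singularity; then
    echo "container engine: found"
else
    echo "container engine: MISSING (install docker or singularity)"
fi
```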

Recommendation

If you want to run the pipeline on a local computer, a machine with more than 32 GB of RAM is strongly recommended.

DOWNLOADING

BE CAREFUL: clone with HTTPS, not with SSH.

To install the pipeline, clone the repository with git over HTTPS, then run the Nextflow install script:

git clone https://gitbio.ens-lyon.fr/LBMC/sbdm/scan10.git
cd scan10/
src/install_nextflow.sh

Running the pipeline

all options

usage: ./nextflow src/scan10.nf [nf options] [pipeline parameters] [filtering parameters] [visualisation parameters]

nf options

-profile: (required) use this to choose a configuration profile. It can be docker or singularity, whichever is installed on your machine. Use the profile psmn to run the pipeline on the ENSL cluster.

-resume: (optional) use this to reuse cached results. Nextflow saves the results of each successfully completed process. With this option, any process that is unchanged and run a second time reuses its cached results instead of being recomputed.

pipeline parameters:

--fastq: (required) use this to specify the path of the fastq files.

--quantif: (required) use this to specify the mapping/quantification tool to use (cellranger or kb).

--version: (required if neither --gtf nor --fasta is specified) use this to specify the version of the Ensembl database from which to download the gtf and fasta files. This works for human and mouse only; select the species with --species. For other species, please provide --gtf and --fasta.

--species: (required) use this to specify the species (human or mouse).

--fasta: (required if --version is not specified) use this to specify the path to the transcriptome fasta file, used as reference for mapping.

--gtf: (required if --version is not specified) use this to specify the path to the gtf file.

--chemistry: (required) use this to specify the 10X chemistry version (default: V3).

--config: (optional) use this to specify the path of the configuration settings file used for this pipeline.

--skip: (optional) use this to skip the QC, normalisation and visualisation processes.
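As an illustration of how the options above combine, here is a minimal sketch that assembles and prints one possible command line. The flags come from the list above, but every value is an assumption: `data/fastq` is a hypothetical path, `106` is only an example Ensembl release, and the exact value formats accepted by `--fastq` and `--version` should be checked against your setup.

```shell
# Hypothetical values, adjust them to your own data.
FASTQ_DIR="data/fastq"      # assumed location of the fastq files
ENSEMBL_VERSION="106"       # example Ensembl release, not a recommendation

# Assemble the command: nf options first, then pipeline parameters.
CMD="./nextflow src/scan10.nf -profile docker -resume --fastq $FASTQ_DIR --quantif kb --species human --version $ENSEMBL_VERSION --chemistry V3"

# Dry run: print the command so it can be reviewed before launching.
echo "$CMD"
```

Because `--version` is given, `--gtf` and `--fasta` are left out. Run the printed command from the `scan10/` directory to launch the pipeline.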

filtering parameters

--filtergtf: (optional) use this to filter GTF based on biotype.

--threshold: (optional) use this to compute adaptive thresholds for max_percent_mito, max_feature_RNA and max_ncount_RNA. Each threshold is calculated as median()+4*mad(). Set the value to "manual" to disable the automatic computation of thresholds (default = "adaptative").

--max_percent_mito: (optional) use this to specify the maximum percentage of mitochondrial RNA allowed in a cell for it to be kept for analysis (default = "adaptative").

--max_feature_RNA: (optional) use this to specify the maximum number of genes/features expressed in a cell for it to be kept for analysis (default = "adaptative").

--max_ncount_RNA: (optional) use this to specify the maximum number of UMIs in a cell for it to be kept for analysis (default = "adaptative").

--min_cells: (optional) Include features detected in at least this many cells (default = 5).

--min_feature_RNA: (optional) Include cells where at least this many features are detected (default = 500).
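The adaptive threshold formula given for `--threshold`, median()+4*mad(), can be written out for a per-cell metric x (percentage of mitochondrial RNA, number of features, or number of UMIs). This is a sketch: the MAD definition below is the standard one and is an assumption about what the pipeline's mad() computes. Some implementations, such as R's `mad`, also multiply by a constant (1.4826 by default), and this README does not say whether that applies here.

```latex
T(x) = \operatorname{median}(x) + 4\,\operatorname{MAD}(x),
\qquad
\operatorname{MAD}(x) = \operatorname{median}_i\bigl(\lvert x_i - \operatorname{median}(x)\rvert\bigr)
```

Under this reading, T(x) plays the role of the corresponding max_* value when it is left at "adaptative".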