scAN10 pipeline
This repository contains a template and library for the scANalysis of 10x data (scAN10) Nextflow pipeline. It can be forked to create your own pipeline.
Overview of scAN-10 pipeline

Getting Started
Prerequisites
To run the pipeline, you will need to have the following installed on your computer:
- java (>= 11)
- git
- docker or singularity
You can check if these are installed by running the following commands in your terminal:
java -version
git --version
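The checks above can also be scripted. The following is a minimal sketch, not part of the pipeline: the function names java_major and check_prerequisites are hypothetical, and the parsing assumes the usual `version "..."` line printed by `java -version`.

```shell
# Hypothetical helper: read `java -version` output on stdin and print the
# major version, e.g. "1.8.0_292" -> 8 (legacy scheme), "11.0.2" -> 11.
java_major() {
  local version
  version=$(sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$version" in
    1.*) version=${version#1.} ;;   # legacy "1.x" numbering
  esac
  printf '%s\n' "${version%%[._-]*}"
}

# Hypothetical helper: report every missing prerequisite and return
# non-zero if at least one is missing.
check_prerequisites() {
  local status=0 major
  major=$(java -version 2>&1 | java_major)
  case "$major" in
    ''|*[!0-9]*)
      echo "java: not found or version not recognised"; status=1 ;;
    *)
      [ "$major" -ge 11 ] || { echo "java $major found, but >= 11 is required"; status=1; } ;;
  esac
  command -v git >/dev/null 2>&1 || { echo "git: not found"; status=1; }
  command -v docker >/dev/null 2>&1 || command -v singularity >/dev/null 2>&1 ||
    { echo "docker or singularity: neither found"; status=1; }
  return "$status"
}
```

Calling check_prerequisites prints one line per missing tool and nothing when everything is in place.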
Recommendation
If you want to run the pipeline on a local computer, a machine with more than 32 GB of RAM is strongly recommended.
Downloading
To install the pipeline, clone the repository with git over HTTPS:
Be careful: clone over HTTPS, not SSH.
git clone https://gitbio.ens-lyon.fr/LBMC/sbdm/scan10.git
cd scan10/
src/install_nextflow.sh
Running the pipeline
all options
usage: ./nextflow src/scan10.nf [nf options] [pipeline parameters] [filtering parameters] [visualisation parameters]
nf options
-profile: (required) use this to choose a configuration profile. It can be docker or singularity, if they are installed on your machine. Use the profile psmn to run the pipeline on the ENSL cluster.
-resume: (optional) use this to reuse cached results. Nextflow saves the results of each successfully completed process. With this option, any process that is unchanged when the pipeline is run again uses its cached results instead of being recomputed.
pipeline parameters:
--fastq: (required) use this to specify the path of the fastq files.
--quantif: (required) use this to specify the mapping/quantification tool to use (cellranger or kb).
--version: (required if neither --gtf nor --fasta is specified) use this to specify the version of the ENSEMBL database from which to download the gtf and fasta files. This works for human and mouse only, selected with --species. For other species, please provide --gtf and --fasta.
--species: (required) use this to specify the species (human or mouse).
--fasta: (required if no --version is specified) use this to specify the path to the transcriptome fasta file, used as reference for mapping.
--gtf: (required if no --version is specified) use this to specify the path to the gtf file.
--chemistry: (required) use this to specify the 10X chemistry version (default: V3).
--config: (optional) use this to specify the path of the configuration settings file used for this pipeline.
--skip: (optional) skip the QC, normalisation and visualisation processes.
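As an illustration of how the options above combine, here is a sketch of a small wrapper around a typical call. The wrapper name run_scan10, the NEXTFLOW variable, the fastq path and the release number are all hypothetical placeholders; only the flags themselves come from the list above, and the accepted format of the --version value is an assumption.

```shell
# Launcher to use; override it (for example NEXTFLOW=echo) to preview the
# command without running anything. This variable is a hypothetical
# convenience, not a pipeline feature.
NEXTFLOW="${NEXTFLOW:-./nextflow}"

# Hypothetical wrapper: $1 = path of the fastq files,
#                       $2 = ENSEMBL version to download gtf/fasta from.
run_scan10() {
  "$NEXTFLOW" src/scan10.nf \
    -profile docker \
    -resume \
    --fastq "$1" \
    --quantif kb \
    --species human \
    --version "$2" \
    --chemistry V3
}
```

For example, `run_scan10 data/fastq 104` would start a human kb run with the docker profile, where data/fastq and 104 are placeholder values. For a species other than human or mouse, --version would be replaced by --fasta and --gtf.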
filtering parameters
--filtergtf: (optional) use this to filter GTF based on biotype.
--threshold: (optional) use this to choose how the thresholds for max_percent_mito, max_feature_RNA and max_ncount_RNA are set. By default they are computed adaptively with the formula median() + 4*mad(). Set the value to "manual" to disable the automatic computation of thresholds (default = "adaptative").
--max_percent_mito: (optional) use this to specify the maximum percentage of mitochondrial RNA allowed in a cell for it to be kept for analysis (default = "adaptative").
--max_feature_RNA: (optional) use this to specify the maximum number of genes/features expressed in a cell for it to be kept for analysis (default = "adaptative").
--max_ncount_RNA: (optional) use this to specify the maximum number of UMIs in a cell for it to be kept for analysis (default = "adaptative").
--min_cells: (optional) Include features detected in at least this many cells (default = 5).
--min_feature_RNA: (optional) Include cells where at least this many features are detected (default = 500).
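To make the adaptive formula median() + 4*mad() concrete, the sketch below computes it for a column of numbers using sort and awk. It is an illustration only, not the pipeline's code: the function names are hypothetical, and it assumes the raw (unscaled) median absolute deviation. Some implementations, for example R's mad(), multiply by 1.4826 by default, which would give larger thresholds.

```shell
# Median of the numbers read on stdin (one per line).
median() {
  sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      if (NR % 2) print v[(NR + 1) / 2]
      else print (v[NR / 2] + v[NR / 2 + 1]) / 2
    }'
}

# Upper threshold median + 4 * MAD of the numbers read on stdin.
# Assumption: MAD is the unscaled median of |x - median|.
adaptive_threshold() {
  local values med mad
  values=$(cat)
  [ -n "$values" ] || return 1
  med=$(printf '%s\n' "$values" | median) || return 1
  mad=$(printf '%s\n' "$values" |
    awk -v m="$med" '{ d = $1 - m; print (d < 0 ? -d : d) }' | median)
  awk -v m="$med" -v d="$mad" 'BEGIN { print m + 4 * d }'
}
```

With the made-up mitochondrial percentages 2, 3, 4, 5 and 50, the median is 4 and the MAD is 1, so `printf '%s\n' 2 3 4 5 50 | adaptive_threshold` prints 8 and the cell at 50 would be filtered out.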