@@ -17,7 +17,7 @@ You can fork this repository to build your own pipeline.
### Detailed workflow
#### Filter GTF file with mkgtf to contain only genes of interest (kb and cellranger)
#### Filter GTF file with mkgtf to contain only genes of interest (kb and cellranger) (optional)
GTF files downloaded from sites like ENSEMBL and UCSC often contain transcripts and genes which need to be filtered from your final annotation. Cell Ranger provides mkgtf, a simple utility to filter genes based on their key-value pairs in the GTF attribute column.
Please refer to the [Filter GTF](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/references) section to see all the available key-value pairs.
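To make the key-value filtering concrete, here is a minimal Python sketch of the idea. It is not how Cell Ranger implements mkgtf; the function names (`keep_line`, `filter_gtf`) and the example records are hypothetical and only illustrate keeping GTF records whose attribute column carries an allowed value for a given key.

```python
import re

# Matches GTF attribute pairs such as: gene_biotype "protein_coding";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def keep_line(line, key, allowed):
    """Keep header comments and records whose `key` attribute is in `allowed`."""
    if line.startswith("#"):
        return True  # header/comment lines are passed through unchanged
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return False  # not a well-formed 9-column GTF record
    pairs = ATTR_RE.findall(fields[8])  # column 9 holds the attributes
    return any(k == key and v in allowed for k, v in pairs)


def filter_gtf(lines, key, allowed):
    """Return only the lines that pass the key-value filter."""
    return [line for line in lines if keep_line(line, key, allowed)]


# Hypothetical example input: one header line and two gene records.
example = [
    '#!genome-build GRCh38\n',
    '1\thavana\tgene\t11869\t14409\t.\t+\t.\t'
    'gene_id "ENSG00000223972"; gene_biotype "transcribed_unprocessed_pseudogene";\n',
    '1\tensembl_havana\tgene\t65419\t71585\t.\t+\t.\t'
    'gene_id "ENSG00000186092"; gene_biotype "protein_coding";\n',
]

kept = filter_gtf(example, "gene_biotype", {"protein_coding"})
for line in kept:
    print(line, end="")
```

With Cell Ranger itself, the equivalent filter is passed on the command line as an `--attribute=key:value` option to `cellranger mkgtf`, for example `--attribute=gene_biotype:protein_coding`.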
...
...
@@ -72,6 +72,7 @@ R package :
- [DropletUtils](https://bioconductor.org/packages/release/bioc/html/DropletUtils.html) (v1.10.3) provides a number of utility functions for handling single-cell (RNA-seq) data from droplet technologies such as 10X Genomics.
- [Matrix](https://cran.r-project.org/web/packages/Matrix/index.html) (v1.3.4) for handling matrices.
- [cluster](https://cran.r-project.org/web/packages/cluster/index.html) (v2.1.2) provides methods for cluster analysis.
- [SeuratDisk](https://github.com/mojaveazure/seurat-disk) (v0.0.0.9020) supports multi-modal single-cell data in Seurat through the h5Seurat and AnnData formats.
The R scripts corresponding to the subsections _Quality control of feature and cells_ (scQualityControlR.R) and _Clustering and visualization_ (scVizualisationR.R) rely on a script called function.R, which provides all the functions they use.
...
...
@@ -199,15 +200,15 @@ pipeline parameter:
**--quantif** :(required) use this to specify the mapping/quantification tool to use (cellranger or kb).
**--version** :(required if neither --gtf nor --fasta is specified) use this to specify the version of the ENSEMBL database from which to download the gtf and fasta files.
**--version** :(required if neither --gtf nor --fasta is specified) use this to specify the version of the ENSEMBL database from which to download the gtf and fasta files. Works for human only. For other species, please provide gtf and fasta.
**--fasta** :(required if --version is not specified) use this to specify the path of the transcriptome fasta file, used as reference for mapping.
**--gtf** :(required if --version is not specified) use this to specify the path of the gtf file.
**--whitelist** :(optional) use this to specify the path of the barcode whitelist file.
(default:10x_V3_barcode_whitelist.txt)
**--chemistry** :(required) use this to specify the 10X chemistry version.
(default:V3)
**--config** :(optional) use this to specify the path of the configuration settings file used for this pipeline.
...
...
@@ -225,15 +226,15 @@ pipeline parameter:
**--ncount** :(optional) use this to specify the minimum number of UMIs per cell to keep for analysis. (default:500)
**--cp** :(optional) use this to specify the number of principal components to keep for analysis. (default:10)
**--nexp** :(optional) use this to set the expected proportion of doublets (2.3% = 0.023). See [10xDoublet_rate](/doc/doublet_rate.png) (default:0.023)
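
As a rough illustration of what the `--ncount` and `--nexp` defaults mean in practice, here is a minimal Python sketch. It is not the pipeline's R code: the helper names (`filter_cells`, `expected_doublets`), the barcodes, and the choice of an inclusive threshold (`>=`) are assumptions made for the example.

```python
def filter_cells(umi_per_cell, ncount=500):
    """Keep cells whose total UMI count is at least `ncount` (assumed inclusive)."""
    return {cell: n for cell, n in umi_per_cell.items() if n >= ncount}


def expected_doublets(n_cells, nexp=0.023):
    """Expected number of doublets for `n_cells` cells at a doublet rate of `nexp`."""
    return round(n_cells * nexp)


# Hypothetical per-cell UMI totals keyed by barcode.
umi_per_cell = {"AAAC": 1200, "AAAG": 350, "AACT": 500}

kept_cells = filter_cells(umi_per_cell)    # drops "AAAG" (350 < 500)
n_doublets = expected_doublets(1000)       # 2.3% of 1000 cells

print(sorted(kept_cells), n_doublets)
```

The doublet rate depends on the number of cells loaded, so the value passed to `--nexp` should be chosen from the linked doublet rate table rather than left at the default for every run.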
**Visualisation parameter**
**--skip** :(optional) skip the Clustering and visualization step. Accepts: true or false (default: false)
**--cp** :(optional) use this to specify the number of principal components to keep for analysis. (default:10)