The quantification runs in about 2 min on 10 threads and creates 4 files:
...
...
@@ -328,13 +319,6 @@ The `/data/share/MADT/scrnaseq/10xv2_whitelist.txt` contains all the barcodes kn
We are ready to build the gene count matrix. First, `bustools` runs barcode error correction on the BUS file. The corrected BUS file is then sorted by barcode, UMI, and equivalence class. Finally, the UMIs are counted and the counts are collapsed to the gene level.
```{bash bustools, eval=F}
mkdir -p results/hgmm_1k/genecount tmp
singularity exec /data/share/MADT/scrnaseq/kallistobustools_0.24.4.simg sh -c "\
@@ -365,7 +349,7 @@ Single-cell RNA Seq counts contains a loot of zeros, to save space we can use th
You can use the `read_count_output()` function to load the kb output into a `sparseMatrix`, and the `SingleCellExperiment()` function to create a `SingleCellExperiment` object.
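As a rough illustration of why a sparse representation pays off, the sketch below builds a made-up count matrix (not the hgmm_1k data) with the `Matrix` package, compares its memory footprint with the dense equivalent, and wraps it in a `SingleCellExperiment`. In the practical the matrix would come from `read_count_output()` instead; the dimensions, density, and gene/cell names are assumptions chosen for the example.

```r
library(Matrix)
library(SingleCellExperiment)

set.seed(1)
# Toy counts (hypothetical data): 200 genes x 50 cells, 10% non-zero entries
counts <- rsparsematrix(
  nrow = 200, ncol = 50, density = 0.1,
  rand.x = function(n) rpois(n, lambda = 3) + 1
)
rownames(counts) <- paste0("gene", seq_len(nrow(counts)))
colnames(counts) <- paste0("cell", seq_len(ncol(counts)))

# Only the non-zero entries are stored, so the sparse object is smaller
sparse_size <- object.size(counts)
dense_size <- object.size(as.matrix(counts))
print(sparse_size, units = "Kb")
print(dense_size, units = "Kb")

# Wrap the sparse counts in a SingleCellExperiment
sce <- SingleCellExperiment(assays = list(counts = counts))
sce
```

The gap between the two sizes grows with the number of cells, since real single-cell matrices are usually far sparser than this toy one.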
We are going to normalize for library effect with `SCTransform`.
seu <- SCTransform(seu, assay = "uf", new.assay.name = "unspliced")
Unlike in the first part of the practical, from now on we are going to work with `Seurat` objects instead of `SingleCellExperiment` objects.
`Seurat` is an R package which contains a large set of tools for scRNASeq analysis. Of course, it also has its own way of storing single-cell data (with the `CreateSeuratObject()` function).
The functionalities of a `SeuratObject` are roughly the same as those of a `SingleCellExperiment`.
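To make the parallel with `SingleCellExperiment` concrete, here is a minimal sketch that creates a `Seurat` object from a toy sparse matrix and queries it with a few accessors. The data are made up, and the default `"RNA"` assay is used here rather than the spliced/unspliced assays of the practical.

```r
library(Matrix)
library(Seurat)

set.seed(1)
# Toy counts (hypothetical data): 200 genes x 50 cells
counts <- rsparsematrix(
  nrow = 200, ncol = 50, density = 0.1,
  rand.x = function(n) rpois(n, lambda = 3) + 1
)
rownames(counts) <- paste0("gene", seq_len(nrow(counts)))
colnames(counts) <- paste0("cell", seq_len(ncol(counts)))

# Build the Seurat object; "RNA" is the default assay name
seu <- CreateSeuratObject(counts = counts, assay = "RNA")

dim(seu)             # genes x cells, like dim() on a SingleCellExperiment
head(colnames(seu))  # cell names
head(seu[[]])        # per-cell metadata, the counterpart of colData(sce)
```

`CreateSeuratObject()` also fills in per-cell totals such as `nCount_RNA` and `nFeature_RNA`, which are handy for quality control.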
To speed up the computation and focus on interesting genes, we are going to work only with the highly variable genes. We are going to use the `Seurat` procedure described [here](https://www.biorxiv.org/content/early/2018/11/02/460147.full.pdf).
You can use the `FindVariableFeatures()` function to identify highly variable genes.
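A minimal sketch of this step on simulated counts is shown below. The toy data, the choice of `selection.method = "vst"`, and `nfeatures = 50` are assumptions for illustration; on the real dataset a larger number of features would normally be kept.

```r
library(Matrix)
library(Seurat)

set.seed(1)
n_genes <- 300
n_cells <- 100
# Toy counts (hypothetical data): Poisson counts with a gene-specific mean
mu <- rgamma(n_genes, shape = 2, rate = 1)
m <- matrix(
  rpois(n_genes * n_cells, lambda = rep(mu, times = n_cells)),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("cell", seq_len(n_cells)))
)
seu <- CreateSeuratObject(counts = Matrix(m, sparse = TRUE))

# Rank genes by standardized variance and keep the top 50
seu <- FindVariableFeatures(seu, selection.method = "vst", nfeatures = 50)
head(VariableFeatures(seu))
```

`VariableFeatures()` returns the selected gene names, which downstream functions such as `ScaleData()` and `RunPCA()` use by default.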
The `DimHeatmap()` function allows for easy exploration of the primary sources of heterogeneity in a dataset, and can be useful when trying to decide which PCs to include for further downstream analyses. Both cells and features are ordered according to their PCA scores. Setting `cells` to a number plots the ‘extreme’ cells on both ends of the spectrum, which dramatically speeds plotting for large datasets. Though clearly a supervised analysis, we find this to be a valuable tool for exploring correlated feature sets.
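The call can be tried on simulated data as in the sketch below. The two artificial groups, the log-normalization route (instead of `SCTransform`), and all parameter values are assumptions made to keep the example small.

```r
library(Matrix)
library(Seurat)

set.seed(1)
n_genes <- 300
n_cells <- 100
# Toy counts (hypothetical data): group B over-expresses the first 30 genes
group <- rep(c("A", "B"), each = n_cells / 2)
mu <- matrix(2, nrow = n_genes, ncol = n_cells)
mu[1:30, group == "B"] <- 8
m <- matrix(
  rpois(length(mu), lambda = mu),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("cell", seq_len(n_cells)))
)
seu <- CreateSeuratObject(counts = Matrix(m, sparse = TRUE))

# Standard preprocessing up to PCA
seu <- NormalizeData(seu, verbose = FALSE)
seu <- FindVariableFeatures(seu, nfeatures = 100, verbose = FALSE)
seu <- ScaleData(seu, verbose = FALSE)
seu <- RunPCA(seu, npcs = 10, verbose = FALSE)

# Heatmaps for the first two PCs, restricted to the 50 most extreme cells
DimHeatmap(seu, dims = 1:2, cells = 50, balanced = TRUE)
```

With `balanced = TRUE`, the heatmap shows an equal number of features with positive and negative loadings on each PC.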
As expected, one end of the plot has mostly stem cells, and the other end has mostly neurons. Clustering should partition the big blob of NPCs that `SingleR` could not partition further because of limitations in the `SingleR` reference for mouse brains.
Use the `FindNeighbors()` and `FindClusters()` functions for the clustering.
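The sketch below shows how the two calls chain together on simulated data with two artificial groups. The toy data, the log-normalization route, `dims = 1:10`, and `resolution = 0.5` are assumptions for illustration, not the settings of the practical.

```r
library(Matrix)
library(Seurat)

set.seed(1)
n_genes <- 300
n_cells <- 100
# Toy counts (hypothetical data): group B over-expresses the first 30 genes
group <- rep(c("A", "B"), each = n_cells / 2)
mu <- matrix(2, nrow = n_genes, ncol = n_cells)
mu[1:30, group == "B"] <- 8
m <- matrix(
  rpois(length(mu), lambda = mu),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("cell", seq_len(n_cells)))
)
seu <- CreateSeuratObject(counts = Matrix(m, sparse = TRUE))
seu$group <- group  # keep the simulated labels as metadata

seu <- NormalizeData(seu, verbose = FALSE)
seu <- FindVariableFeatures(seu, nfeatures = 100, verbose = FALSE)
seu <- ScaleData(seu, verbose = FALSE)
seu <- RunPCA(seu, npcs = 10, verbose = FALSE)

# Build the shared nearest neighbor graph on the first 10 PCs, then cluster it
seu <- FindNeighbors(seu, dims = 1:10, verbose = FALSE)
seu <- FindClusters(seu, resolution = 0.5, verbose = FALSE)

# Compare the clusters with the simulated groups
table(cluster = seu$seurat_clusters, group = seu$group)
```

A higher `resolution` generally yields more, smaller clusters, so it is worth trying a few values and checking them against the cell type annotation.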
Would a clean trajectory from qNSCs to NPCs to neurons be traced? The arrows are projected onto non-linear dimension reductions by correlation between the predicted cell state and gene expression of other cells in the dataset.
The cells labeled qNSCs and astrocytes are at the very top, going into two paths, one going down and to the right to the neurons, and the other going left towards the OPCs. There also seems to be a cycle to the left of what’s labeled qNSCs and astrocytes at the top. To the lower right of the cluster containing what’s labeled OPCs (cluster 7), there are two branches, but those also look like a cycle.
...
...
@@ -1074,10 +1184,10 @@ label_clusters(seu$cell_type, Embeddings(seu, "umap"), font = 2, col = "brown")
This step is computationally expensive; in subsequent calls to `show.velocity.on.embedding.cor` for the same dimension reduction, the expensive part can be bypassed by supplying the output of the first call.
With RNA velocity we can also compute phase portraits of genes. Take `Mef2c` (myocyte enhancer factor 2C), which is highly expressed in the adult mouse cortex but not much in the embryonic CNS until E18, according to the [NCBI page of this gene](https://www.ncbi.nlm.nih.gov/gene/17260).