Update README

2b87d815 · Sergio Sarnataro · 18aea086 · 2b87d815
Commit 2b87d815 authored 1 year ago by Sergio Sarnataro
--- a/README.md
+++ b/README.md
+## Methods for the downstream analysis
+Downstream analysis of the aligned samples was performed with the _scanpy_ toolkit in Python [[1]](#1).
+##### Data concatenation
+First, data from the different samples were concatenated with the _concatenate()_ method of the _AnnData_ objects used by _scanpy_.
+##### Filtering out cells and genes
+Only cells expressing more than 600 and fewer than 5000 genes were kept in the analysis. In addition, cells with a total number of counts higher than 15000 were filtered out.
+For the time points WP and 5h, only cells expressing at least one of the following genes were kept: GFP, Mef2 and twi. For the time point 9396, only cells expressing at least one of GFP and twi were kept.
+Cells not matching the conditions above were filtered out.
+Finally, genes expressed in fewer than 3 cells were excluded from the analysis.
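+The cell and gene filters described above can be sketched in plain Python, independently of _scanpy_. This is only an illustration of the thresholds, not the code used for the analysis: the names `MARKERS`, `keep_cell` and `filter_genes` are hypothetical, a cell is assumed to be a dict mapping gene names to raw counts, and a gene is assumed to count as expressed when its count is above zero.
```python
# Marker genes required per time point, as listed in the text.
MARKERS = {
    "WP": {"GFP", "Mef2", "twi"},
    "5h": {"GFP", "Mef2", "twi"},
    "9396": {"GFP", "twi"},
}


def keep_cell(counts, timepoint):
    """Return True if a cell passes all cell-level filters.

    counts: dict mapping gene name -> raw count for one cell (assumed layout).
    """
    expressed = {gene for gene, c in counts.items() if c > 0}
    total_counts = sum(counts.values())
    return (
        600 < len(expressed) < 5000          # more than 600, fewer than 5000 genes
        and total_counts <= 15000            # drop cells with more than 15000 counts
        and bool(expressed & MARKERS[timepoint])  # at least one marker gene
    )


def filter_genes(cells, min_cells=3):
    """Drop genes detected in fewer than min_cells of the given cells."""
    n_cells = {}
    for counts in cells:
        for gene, c in counts.items():
            if c > 0:
                n_cells[gene] = n_cells.get(gene, 0) + 1
    kept = {gene for gene, n in n_cells.items() if n >= min_cells}
    return [{g: c for g, c in counts.items() if g in kept} for counts in cells]
```
+In this sketch the gene filter is meant to be applied to the cells that survive `keep_cell`, following the order given in the text.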
+##### Normalization and scaling
+Data were normalized and log-transformed with the functions _scanpy.pp.normalize_total()_ and _scanpy.pp.log1p()_ respectively, using the default parameters.
+Then, total counts and the percentage of mitochondrial gene counts were regressed out with the function _scanpy.pp.regress_out()_, and data were scaled with _scanpy.pp.scale()_ using the parameter _max_value=10_.
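+As documented for the default parameters, _scanpy.pp.normalize_total()_ rescales each cell so that its total count equals the median of the total counts over all cells, and _scanpy.pp.log1p()_ then takes the natural logarithm of one plus each value. A minimal plain-Python sketch of that arithmetic follows; the function name `normalize_and_log` is hypothetical, and every cell is assumed to have a non-zero total, which the filtering step guarantees.
```python
import math
from statistics import median


def normalize_and_log(cells):
    """Median-total normalization followed by log(1 + x).

    cells: list of per-cell count lists, all with the same gene order.
    """
    totals = [sum(cell) for cell in cells]
    target = median(totals)  # default target: median total count across cells
    return [
        [math.log1p(x * target / total) for x in cell]
        for cell, total in zip(cells, totals)
    ]
```
+The regression and scaling steps are not reproduced here, since they depend on the per-cell covariates stored alongside the data.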
+##### Principal component analysis, neighborhood graph and UMAP
+Principal component analysis was performed with the function _scanpy.tl.pca()_, setting the parameter _svd_solver='arpack'_. The neighborhood graph was computed with _scanpy.pp.neighbors()_, setting the parameters _n_neighbors=10_ and _n_pcs=40_. Finally, the UMAP dimensionality reduction was calculated with _scanpy.tl.umap()_ using the default parameters.
+A visual inspection of the data in UMAP space suggested the presence of a batch effect.
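+The embedding steps above correspond roughly to the following _scanpy_ calls. This is a sketch under assumptions rather than the exact script used: the wrapper name `embed` is hypothetical, and the `batch` column is assumed to be the per-sample label that _AnnData_ concatenation adds to `adata.obs` by default.
```python
def embed(adata, batch_key="batch"):
    """PCA, neighborhood graph and UMAP with the parameters given in the text."""
    import scanpy as sc  # imported lazily so the sketch loads without scanpy

    sc.tl.pca(adata, svd_solver="arpack")
    sc.pp.neighbors(adata, n_neighbors=10, n_pcs=40)
    sc.tl.umap(adata)  # default parameters
    # Colour the UMAP by sample of origin to look for a batch effect
    # (batch_key is an assumed column name in adata.obs).
    sc.pl.umap(adata, color=batch_key)
    return adata
```
+Cells that cluster by sample rather than by cell type in the resulting plot are the kind of pattern that suggests a batch effect.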
+##### Batch effect correction
+### References
+<a id="1">[1]</a>
+Wolf, F., Angerer, P. & Theis, F.
+SCANPY: large-scale single-cell gene expression data analysis.
+Genome Biol 19, 15 (2018).
+https://doi.org/10.1186/s13059-017-1382-0