From 2b87d8153550e2800aa81d2dd3af44ba7bbbda15 Mon Sep 17 00:00:00 2001
From: Sergio Sarnataro <sergio.sarnataro@ens-lyon.fr>
Date: Fri, 1 Sep 2023 12:19:36 +0200
Subject: [PATCH] Update README

---
 README.md | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/README.md b/README.md
index e69de29..75225fb 100644
--- a/README.md
+++ b/README.md
@@ -0,0 +1,39 @@
+## Methods for the downstream analysis
+
+Downstream analysis of the aligned samples was performed with the _scanpy_ toolkit in Python [[1]](#1).
+
+##### Data concatenation
+
+First, data from the different samples were concatenated using the function _concatenate()_ from _scanpy_.
+
+##### Filtering out cells and genes
+
+Only cells expressing more than 600 and fewer than 5000 genes were kept in the analysis; all others were filtered out. Moreover, cells with a total number of counts higher than 15000 were filtered out.
+
+In addition, for the timesteps WP and 5h, only cells expressing at least one of the following genes were kept: GFP, Mef2 and twi. For the timestep 9396, only cells expressing at least one of GFP and twi were kept.
+Cells not matching the conditions above were filtered out.
+
+Finally, genes expressed in fewer than 3 cells were excluded from the analysis.
+
+##### Normalization and scaling
+Data were normalized and log-transformed using the functions _scanpy.pp.normalize_total()_ and _scanpy.pp.log1p()_, respectively, with the default parameters.
+
+Then, total counts and the percentage of mitochondrial gene counts were regressed out using the function _scanpy.pp.regress_out()_, and data were scaled using _scanpy.pp.scale()_ with the parameter _max_value=10_.
+
+##### Principal component analysis, neighborhood graph and UMAP
+Principal component analysis was performed with the function _scanpy.tl.pca()_, setting the parameter _svd_solver='arpack'_. The neighborhood graph was computed with _scanpy.pp.neighbors()_, setting the parameters _n_neighbors=10_ and _n_pcs=40_. Finally, the UMAP dimensionality reduction was computed with _scanpy.tl.umap()_ with the default parameters.
+
+A visual inspection of the data in the UMAP space suggested the presence of a batch effect.
+
+##### Batch effect correction
+
+
+
+
+
+### References
+<a id="1">[1]</a>
+Wolf, F., Angerer, P. & Theis, F.
+SCANPY: large-scale single-cell gene expression data analysis.
+Genome Biol 19, 15 (2018).
+https://doi.org/10.1186/s13059-017-1382-0
-- 
GitLab
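
Note on the filtering rules in the README added by this patch: the cell and gene filters can be written out as a small, library-free Python sketch. It is only an illustration of the stated thresholds, not the author's actual scanpy code. The names `CellThresholds`, `MARKERS`, `keep_cell` and `genes_to_keep` are hypothetical. Representing a cell as a `dict` of raw counts, and applying the gene filter after the cell filter, are assumptions. The toy thresholds at the bottom are deliberately tiny so that a handful of genes exercises every rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellThresholds:
    # Defaults are the values stated in the README.
    min_genes: int = 600      # keep cells expressing MORE than this many genes
    max_genes: int = 5000     # ... and FEWER than this many genes
    max_counts: int = 15000   # drop cells whose total counts exceed this


# Marker genes required per timestep, as listed in the README.
MARKERS = {
    "WP": ("GFP", "Mef2", "twi"),
    "5h": ("GFP", "Mef2", "twi"),
    "9396": ("GFP", "twi"),
}


def keep_cell(counts, timestep, thresholds=CellThresholds()):
    """Return True if a cell (dict: gene -> raw count) passes every filter."""
    n_genes = sum(1 for c in counts.values() if c > 0)
    total_counts = sum(counts.values())
    has_marker = any(counts.get(g, 0) > 0 for g in MARKERS[timestep])
    return (
        thresholds.min_genes < n_genes < thresholds.max_genes
        and total_counts <= thresholds.max_counts
        and has_marker
    )


def genes_to_keep(cells, min_cells=3):
    """Return the genes expressed (count > 0) in at least `min_cells` cells."""
    n_cells = {}
    for counts in cells:
        for gene, c in counts.items():
            if c > 0:
                n_cells[gene] = n_cells.get(gene, 0) + 1
    return {g for g, n in n_cells.items() if n >= min_cells}


# Toy data with toy thresholds (NOT the README values), one cell per rule.
toy = CellThresholds(min_genes=1, max_genes=4, max_counts=20)
cells = [
    ("WP", {"GFP": 2, "a": 1, "b": 0}),   # kept: 2 genes, marker present
    ("WP", {"a": 3, "b": 1}),             # dropped: no marker gene
    ("9396", {"Mef2": 5, "a": 1}),        # dropped: Mef2 is not a marker at 9396
    ("5h", {"twi": 30, "a": 1}),          # dropped: total counts above 20
    ("5h", {"twi": 1}),                   # dropped: only 1 gene expressed
]
kept = [counts for timestep, counts in cells if keep_cell(counts, timestep, toy)]
```

With the default `CellThresholds()`, the same `keep_cell` call applies the 600 / 5000 / 15000 limits from the README; in scanpy itself these filters would normally be expressed through `scanpy.pp.filter_cells()`, `scanpy.pp.filter_genes()` and boolean indexing on the `AnnData` object.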