From 2b87d8153550e2800aa81d2dd3af44ba7bbbda15 Mon Sep 17 00:00:00 2001
From: Sergio Sarnataro <sergio.sarnataro@ens-lyon.fr>
Date: Fri, 1 Sep 2023 12:19:36 +0200
Subject: [PATCH] Update README

---
 README.md | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/README.md b/README.md
index e69de29..75225fb 100644
--- a/README.md
+++ b/README.md
@@ -0,0 +1,39 @@
+## Methods for the downstream analysis
+
+Downstream analysis of the aligned samples was performed with the _scanpy_ toolkit in Python [[1]](#1).
+
+##### Data concatenation
+
+First, data from the different samples were concatenated with the _concatenate()_ method of the _AnnData_ objects used by _scanpy_.
+
+##### Filtering out cells and genes
+
+Only cells expressing more than 600 and fewer than 5000 genes were kept in the analysis; all others were filtered out. In addition, cells with a total number of counts higher than 15000 were filtered out.
+
+For the timesteps WP and 5h, only cells expressing at least one of the following genes were kept: GFP, Mef2 and twi. For the timestep 9396, only cells expressing at least one of GFP and twi were kept.
+Cells not matching the conditions above were filtered out.
+
+Finally, genes expressed in fewer than 3 cells were excluded from the analysis.
+
+##### Normalization and scaling
+Data were normalized and log-transformed with the functions _scanpy.pp.normalize_total()_ and _scanpy.pp.log1p()_, respectively, using the default parameters.
+
+Then, total counts and the percentage of mitochondrial genes were regressed out with the function _scanpy.pp.regress_out()_, and data were scaled with _scanpy.pp.scale()_ using the parameter _max_value=10_.
+
+##### Principal component analysis, neighborhood graph and UMAP
+Principal component analysis was performed with the function _scanpy.tl.pca()_, setting the parameter _svd_solver='arpack'_. The neighborhood graph was computed with _scanpy.pp.neighbors()_, setting the parameters _n_neighbors=10_ and _n_pcs=40_. Finally, the UMAP dimensionality reduction was computed with _scanpy.tl.umap()_ using the default parameters.
+
+A visual inspection of the data in the UMAP space suggested the presence of a batch effect.
+
+##### Batch effect correction
+
+
+
+
+
+### References
+<a id="1">[1]</a>
+Wolf, F., Angerer, P. & Theis, F.
+SCANPY: large-scale single-cell gene expression data analysis.
+Genome Biol 19, 15 (2018).
+https://doi.org/10.1186/s13059-017-1382-0
-- 
GitLab