diff --git a/README.md b/README.md
index c41860ce25b7b50077e6668e9d71a82fda08b555..fffe53b1f41d1687f67d8e06186edb6b43a3e7b1 100644
--- a/README.md
+++ b/README.md
@@ -99,7 +99,21 @@ The final arborescence should be:
 * filtred_genes_Lympho_Resting.csv
 * filtred_genes_Lympho_Activated.csv
 * DESeqresults_lympho_activation_untreated.csv
- 
+
+> The `filtred_genes_Lympho_Resting.csv` and
+> `filtred_genes_Lympho_Activated.csv` files
+> are used to focus the machine learning process on pre-selected transcripts.
+>
+> This selection was based on the normalized gene read counts of the 3h
+> Triptolide libraries (~5000 genes in Resting and ~6000 genes in Activated
+> T-cells). Of these, only genes with complete observations in all biological
+> replicates (including ribosome profiling libraries) and for all
+> transcript features used to build the model, as well as at least 15%
+> observed degradation at 3h, were kept for further analysis.
+>
+> The `DESeqresults_lympho_activation_untreated.csv` file is used to add the
+> `log2FoldChange` to the database.
+> This column is used as a parameter for the `deltaTDD` and `deltaTID` scores.
 
 ## Generate a Random Forest model
 