diff --git a/README.md b/README.md
index c41860ce25b7b50077e6668e9d71a82fda08b555..fffe53b1f41d1687f67d8e06186edb6b43a3e7b1 100644
--- a/README.md
+++ b/README.md
@@ -99,7 +99,21 @@ The final arborescence should be:
     * filtred_genes_Lympho_Resting.csv
     * filtred_genes_Lympho_Activated.csv
     * DESeqresults_lympho_activation_untreated.csv
-
+  
+> The `filtred_genes_Lympho_Resting.csv` and 
+> `filtred_genes_Lympho_Activated.csv` files
+> are used to focus the machine learning process on pre-selected transcripts.
+>
+> This selection was based on the normalized gene read counts of the 3h
+> Triptolide libraries (~5000 genes in Resting and ~6000 genes in Activated
+> T-cells). Of these, only genes with complete observations in all biological
+> replicates (including ribosome profiling libraries) and for all
+> transcript features used to build the model, and with at least 15%
+> observed degradation at 3h, were kept for further analysis.
+>
+> The `DESeqresults_lympho_activation_untreated.csv` file is used to add the
+> `log2FoldChange` column to the database.
+> This column is used as a parameter for the `deltaTDD` and `deltaTID` scores.
 
 ## Generate a Random Forest model