From f5fc2c6936e5a316149f9be0abca6d8d31eace1a Mon Sep 17 00:00:00 2001
From: CLUET David <david.cluet@ens-lyon.fr>
Date: Fri, 8 Dec 2023 08:12:29 +0100
Subject: [PATCH] Add explanation for column name changes

---
 README.md              | 55 +++++++++++++++++++++++++++++++++---------
 RMI2_Random_Forest.Rmd |  4 +--
 2 files changed, 45 insertions(+), 14 deletions(-)
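
Note on the renaming documented in the README hunk below: the change from
`Abs(TDD)>Lympho_Resting>Trip_CHX>Ref_Trip_0h>3h` to
`Abs.TDD..Lympho_Resting.Trip_CHX.Ref_Trip_0h.3h` is consistent with what R's
`make.names()` does (it is applied by `read.csv()` when `check.names = TRUE`).
Whether the names were actually converted this way is an assumption. The
sketch below approximates that rule in Python, so the upstream scripts could
emit R-compatible names. `r_safe_name` is a hypothetical helper, not part of
either repository; it only handles ASCII names and ignores R reserved words.

```python
import re


def r_safe_name(name: str) -> str:
    """Approximate R's make.names() for ASCII column names (assumption:
    this mirrors the renaming applied before the Random Forest step)."""
    # Every character that is not a letter, digit, dot or underscore
    # is translated to a dot.
    safe = re.sub(r"[^A-Za-z0-9._]", ".", name)
    # A valid R name starts with a letter, or with a dot that is not
    # followed by a digit; otherwise R prepends "X".
    if not re.match(r"^([A-Za-z]|\.(?![0-9]))", safe):
        safe = "X" + safe
    return safe


original = "Abs(TDD)>Lympho_Resting>Trip_CHX>Ref_Trip_0h>3h"
print(r_safe_name(original))
# Abs.TDD..Lympho_Resting.Trip_CHX.Ref_Trip_0h.3h
```

With pandas, such a helper could be applied with
`df.rename(columns=r_safe_name)` before writing the CSV files read by `R`.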

diff --git a/README.md b/README.md
index fffe53b..5465603 100644
--- a/README.md
+++ b/README.md
@@ -100,20 +100,51 @@ The final arborescence should be:
     * filtred_genes_Lympho_Activated.csv
     * DESeqresults_lympho_activation_untreated.csv
   
-> The `filtred_genes_Lympho_Resting.csv` and 
-> `filtred_genes_Lympho_Activated.csv` files
-> are used to focus the machine learning process on pre-selected transcripts.
+> **NOTA BENE**
 >
-> This selection was based on the gene normalized read counts of the 3h 
-> Triptolide libraries (~5000 genes in Resting and ~6000 genes in Activated 
-> T-cells). Of these, only genes with completed observations in all biological 
-> replicates (including ribosome profiling libraries) and for all 
-> transcript features used to build the model, as well as at least 15% of 
-> observed degradation at 3h, were kept for further analysis.
+> **The `filtred_genes_Lympho_Resting.csv` and** 
+> **`filtred_genes_Lympho_Activated.csv` files**
+> **are used to focus the machine learning process on pre-selected transcripts.**
 >
-> The `DESeqresults_lympho_activation_untreated.csv` file is used to add the
-> `log2FoldChange` to the database.
-> This column is used as parameter for the `deltaTDD` and `deltaTID` scores.
+> **This selection was based on the gene normalized read counts of the 3h** 
+> **Triptolide libraries (~5000 genes in Resting and ~6000 genes in Activated** 
+> **T-cells). Of these, only genes with complete observations in all biological**
+> **replicates (including ribosome profiling libraries) and for all** 
+> **transcript features used to build the model, as well as at least 15% of** 
+> **observed degradation at 3h, were kept for further analysis.**
+>
+> **The `DESeqresults_lympho_activation_untreated.csv` file is used to add the**
+> **`log2FoldChange` to the database.**
+> **This column is used as parameter for the `deltaTDD` and `deltaTID` scores.**
+>
+> **As we had many experimental conditions and time points, the scores had**
+> **to indicate explicitly how they were obtained.**
+> **To keep track of which experimental data were used as input,**
+> **and how (we initially had several metrics: relative,**
+> **absolute, ...), I generated "complex" column names that identify**
+> **each column precisely and can be handled without any**
+> **modification by the `Python` pandas library. For example, the**
+> **$TDD_{index}$ column name**
+> **is [initially](https://gitbio.ens-lyon.fr/LBMC/RMI2/rmi2_gff_fasta_compilation):**
+> 
+> `Abs(TDD)>Lympho_Resting>Trip_CHX>Ref_Trip_0h>3h`
+>
+> **This means that the `Absolute TDD` score was computed for (`>`)**
+> **the `Lympho` cells in the `Resting` state, using (`>`) the `Trip_CHX` treatment**
+> **condition, with (`>`) the `Trip` treatment condition at `0h` as reference (`Ref`),**
+> **at (`>`) t = `3h`.**
+> 
+> **However, `R` restricts column names: some characters, like**
+> **`()` and `>`, are not accepted. Thus, Emmanuel Labaronne had to change such names to**
+> **perform the `Random Forest` computations with `R`.**
+> **This column is now called:**
+>
+> `Abs.TDD..Lympho_Resting.Trip_CHX.Ref_Trip_0h.3h`
+>
+> **Once the reviewing process is over, I will change the**
+> **[initial](https://gitbio.ens-lyon.fr/LBMC/RMI2/rmi2_gff_fasta_compilation)**
+> **`Python` scripts to take into account the downstream `R` limitations.**
+
 
 ## Generate a Random Forest model
 
diff --git a/RMI2_Random_Forest.Rmd b/RMI2_Random_Forest.Rmd
index d217936..d3316a5 100644
--- a/RMI2_Random_Forest.Rmd
+++ b/RMI2_Random_Forest.Rmd
@@ -38,7 +38,7 @@ header-includes:
       \includegraphics[width=1cm]{Logos/logo-rmi2-lab-version-2.png}
       }
   - \cfoot{\thepage}
-  - \lfoot{Written by Labaronne Emmanuel and Cluet David, current version 2023/12/06}
+  - \lfoot{Written by Labaronne Emmanuel and Cluet David \newline current version 2023/12/06}
   - \renewcommand{\footrulewidth}{0.4pt}
   - \pretitle{\begin{center}
     \includegraphics[width=2cm,height=2cm]{Logos/logo-rmi2-lab-version-2.png}\LARGE\\}
@@ -93,7 +93,7 @@ source("src/randomForest.R")
 
 ## Fine grain tuning of the Random Forest model
 
-```{r do_RF}
+```{r do_RF_please_wait}
 # Perform the random forest
 model <- makeRandomForest(x = x,
                           explicit_x = x_explicit)
-- 
GitLab