Add explanation for column name changes

Commit f5fc2c69 authored 1 year ago by dcluet
--- a/README.md
+++ b/README.md
@@ -100,20 +100,51 @@ The final arborescence should be:
    * filtred_genes_Lympho_Activated.csv
    * DESeqresults_lympho_activation_untreated.csv
-> The `filtred_genes_Lympho_Resting.csv` and 
-> `filtred_genes_Lympho_Activated.csv` files
-> are used to focus the machine learning process on pre-selected transcripts.
+> **NOTA BENE**
 >
-> This selection was based on the gene normalized read counts of the 3h 
-> Triptolide libraries (~5000 genes in Resting and ~6000 genes in Activated 
-> T-cells). Of these, only genes with completed observations in all biological 
-> replicates (including ribosome profiling libraries) and for all 
-> transcript features used to build the model, as well as at least 15% of 
-> observed degradation at 3h, were kept for further analysis.
+> **The `filtred_genes_Lympho_Resting.csv` and** 
+> **`filtred_genes_Lympho_Activated.csv` files**
+> **are used to focus the machine learning process on pre-selected transcripts.**
 >
-> The `DESeqresults_lympho_activation_untreated.csv` file is used to add the
-> `log2FoldChange` to the database.
-> This column is used as parameter for the `deltaTDD` and `deltaTID` scores.
+> **This selection was based on the gene normalized read counts of the 3h** 
+> **Triptolide libraries (~5000 genes in Resting and ~6000 genes in Activated** 
+> **T-cells). Of these, only genes with completed observations in all biological** 
+> **replicates (including ribosome profiling libraries) and for all** 
+> **transcript features used to build the model, as well as at least 15% of** 
+> **observed degradation at 3h, were kept for further analysis.**
+>
+> **The `DESeqresults_lympho_activation_untreated.csv` file is used to add the**
+> **`log2FoldChange` to the database.**
+> **This column is used as parameter for the `deltaTDD` and `deltaTID` scores.**
+>
+> **As we had many experimental conditions and time points, the scores had**
+> **to explicitly indicate how they were obtained,**
+> **in order to track precisely which experimental data were**
+> **used as input and how (we initially had several metrics: relative,**
+> **absolute, ...). I therefore generated "complex" column names that**
+> **identify the correct column unambiguously and can be handled without any**
+> **modification by the `Python` pandas library. For example, the** 
+> **$TDD_{index}$ column name**
+> **was [initially](https://gitbio.ens-lyon.fr/LBMC/RMI2/rmi2_gff_fasta_compilation):**
+> 
+> `Abs(TDD)>Lympho_Resting>Trip_CHX>Ref_Trip_0h>3h`
+>
+> **This means that the `Absolute TDD` score was computed for (`>`)**
+> **the `Lympho` in the `Resting` status using (`>`) the `Trip_CHX` treatment**
+> **condition, with (`>`) the `Trip` treatment condition at `0h` as reference `Ref`,**
+> **and computed at (`>`) t = `3h`.**
+> 
+> **Nevertheless, due to restrictions on column names in `R`, some characters such as**
+> **`()` and `>` are not allowed. Emmanuel Labaronne therefore had to change these names to**
+> **perform `Random Forest` computations with `R`.**
+> **This column is now called:**
+>
+> `Abs.TDD..Lympho_Resting.Trip_CHX.Ref_Trip_0h.3h`
+>
+> **Once the review process is over, I will change the**
+> **[initial](https://gitbio.ens-lyon.fr/LBMC/RMI2/rmi2_gff_fasta_compilation)**
+> **`Python` scripts to take into account the downstream `R` limitations.**
 ## Generate a Random Forest model
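
The renaming described in this README change can be reproduced programmatically. The following is a minimal Python sketch, not the project's actual code: `parse_score_column` and `to_r_safe_name` are hypothetical helper names, the field labels in the parsed result are illustrative, and the substitution rule (every character other than a letter, digit, `.` or `_` becomes `.`) is an assumption chosen because it reproduces the renamed column quoted in the README.

```python
import re

# Hypothetical field labels for the five ">"-separated parts of a score column.
FIELDS = ("score", "cell_status", "treatment", "reference", "time")


def parse_score_column(name: str) -> dict:
    """Split an original column name on '>' and label each part."""
    parts = name.split(">")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} parts, got {len(parts)}: {name!r}")
    return dict(zip(FIELDS, parts))


def to_r_safe_name(name: str) -> str:
    """Replace every character that is not a letter, digit, '.' or '_' with '.'.

    Assumption: this mirrors the renaming seen in the README example only.
    """
    return re.sub(r"[^A-Za-z0-9._]", ".", name)


original = "Abs(TDD)>Lympho_Resting>Trip_CHX>Ref_Trip_0h>3h"
parsed = parse_score_column(original)
renamed = to_r_safe_name(original)

print(parsed["score"], parsed["time"])
print(renamed)
```

With pandas, the same function could be applied to every column of a table through `df.rename(columns=to_r_safe_name)`. The sketch does not cover names that start with a digit or other cases that `R` may treat specially.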

--- a/RMI2_Random_Forest.Rmd
+++ b/RMI2_Random_Forest.Rmd
@@ -38,7 +38,7 @@ header-includes:
      \includegraphics[width=1cm]{Logos/logo-rmi2-lab-version-2.png}
      }
  - \cfoot{\thepage}
-  - \lfoot{Written by Labaronne Emmanuel and Cluet David, current version 2023/12/06}
+  - \lfoot{Written by Labaronne Emmanuel and Cluet David \newline current version 2023/12/06}
  - \renewcommand{\footrulewidth}{0.4pt}
  - \pretitle{\begin{center}
    \includegraphics[width=2cm,height=2cm]{Logos/logo-rmi2-lab-version-2.png}\LARGE\\}
@@ -93,7 +93,7 @@ source("src/randomForest.R")
 ## Fine grain tuning of the Random Forest model
-```{r do_RF}
+```{r do_RF_please_wait}
 # Perform the random forest
 model <- makeRandomForest(x = x,
                          explicit_x = x_explicit)