From f5fc2c6936e5a316149f9be0abca6d8d31eace1a Mon Sep 17 00:00:00 2001
From: CLUET David <david.cluet@ens-lyon.fr>
Date: Fri, 8 Dec 2023 08:12:29 +0100
Subject: [PATCH] Add explanation for column name changes

---
 README.md              | 55 +++++++++++++++++++++++++++++++++---------
 RMI2_Random_Forest.Rmd |  4 +--
 2 files changed, 45 insertions(+), 14 deletions(-)

diff --git a/README.md b/README.md
index fffe53b..5465603 100644
--- a/README.md
+++ b/README.md
@@ -100,20 +100,51 @@ The final arborescence should be:
   * filtred_genes_Lympho_Activated.csv
   * DESeqresults_lympho_activation_untreated.csv
 
-> The `filtred_genes_Lympho_Resting.csv` and
-> `filtred_genes_Lympho_Activated.csv` files
-> are used to focus the machine learning process on pre-selected transcripts.
+> **NOTA BENE**
 >
-> This selection was based on the gene normalized read counts of the 3h
-> Triptolide libraries (~5000 genes in Resting and ~6000 genes in Activated
-> T-cells). Of these, only genes with completed observations in all biological
-> replicates (including ribosome profiling libraries) and for all
-> transcript features used to build the model, as well as at least 15% of
-> observed degradation at 3h, were kept for further analysis.
+> **The `filtred_genes_Lympho_Resting.csv` and**
+> **`filtred_genes_Lympho_Activated.csv` files**
+> **are used to focus the machine learning process on pre-selected transcripts.**
 >
-> The `DESeqresults_lympho_activation_untreated.csv` file is used to add the
-> `log2FoldChange` to the database.
-> This column is used as parameter for the `deltaTDD` and `deltaTID` scores.
+> **This selection was based on the gene normalized read counts of the 3h**
+> **Triptolide libraries (~5000 genes in Resting and ~6000 genes in Activated**
+> **T-cells). Of these, only genes with complete observations in all biological**
+> **replicates (including ribosome profiling libraries) and for all**
+> **transcript features used to build the model, as well as at least 15% of**
+> **observed degradation at 3h, were kept for further analysis.**
+>
+> **The `DESeqresults_lympho_activation_untreated.csv` file is used to add the**
+> **`log2FoldChange` to the database.**
+> **This column is used as a parameter for the `deltaTDD` and `deltaTID` scores.**
+>
+> **As we had many experimental conditions and time points, the scores**
+> **had to indicate explicitly how they were obtained,**
+> **in order to track precisely which experimental data were**
+> **used as input and how (we initially had several metrics: relative,**
+> **absolute, ...). I therefore generated "complex" column names that**
+> **identify the correct column precisely and can be handled without any**
+> **modification by the `Python` pandas library. For example, the**
+> **$TDD_{index}$ column name**
+> **is [initially](https://gitbio.ens-lyon.fr/LBMC/RMI2/rmi2_gff_fasta_compilation):**
+>
+> `Abs(TDD)>Lympho_Resting>Trip_CHX>Ref_Trip_0h>3h`
+>
+> **This means that the `Absolute TDD` score was computed for (`>`)**
+> **the `Lympho` in the `Resting` status using (`>`) the `Trip_CHX` treatment**
+> **condition, with the `Trip` treatment condition at `0h` as reference (`Ref`),**
+> **and computed at (`>`) t = `3h`.**
+>
+> **However, `R` restricts the characters allowed in column names, and some,**
+> **like `()` and `>`, are not accepted. Emmanuel Labaronne therefore had to**
+> **change such names to perform `Random Forest` computations with `R`.**
+> **This column is now called:**
+>
+> `Abs.TDD..Lympho_Resting.Trip_CHX.Ref_Trip_0h.3h`
+>
+> **Once the review process is over, I will change the**
+> **[initial](https://gitbio.ens-lyon.fr/LBMC/RMI2/rmi2_gff_fasta_compilation)**
+> **`Python` scripts to take into account the downstream `R` limitations.**
+
 
 ## Generate a Random Forest model
 
diff --git a/RMI2_Random_Forest.Rmd b/RMI2_Random_Forest.Rmd
index d217936..d3316a5 100644
--- a/RMI2_Random_Forest.Rmd
+++ b/RMI2_Random_Forest.Rmd
@@ -38,7 +38,7 @@ header-includes:
     \includegraphics[width=1cm]{Logos/logo-rmi2-lab-version-2.png}
     }
   - \cfoot{\thepage}
-  - \lfoot{Written by Labaronne Emmanuel and Cluet David, current version 2023/12/06}
+  - \lfoot{Written by Labaronne Emmanuel and Cluet David \newline current version 2023/12/06}
   - \renewcommand{\footrulewidth}{0.4pt}
   - \pretitle{\begin{center}
     \includegraphics[width=2cm,height=2cm]{Logos/logo-rmi2-lab-version-2.png}\LARGE\\}
@@ -93,7 +93,7 @@ source("src/randomForest.R")
 
 ## Fine grain tuning of the Random Forest model
 
-```{r do_RF}
+```{r do_RF_please_wait}
 # Perform the random forest
 model <- makeRandomForest(x = x,
                           explicit_x = x_explicit)
-- 
GitLab