Commit f5fc2c69 authored by dcluet
Add explanation for column name changes

parent 850aeba1
@@ -100,20 +100,51 @@ The final arborescence should be:
* filtred_genes_Lympho_Activated.csv
* DESeqresults_lympho_activation_untreated.csv

> **NOTA BENE**
>
> **The `filtred_genes_Lympho_Resting.csv` and**
> **`filtred_genes_Lympho_Activated.csv` files**
> **are used to focus the machine learning process on pre-selected transcripts.**
>
> **This selection was based on the gene normalized read counts of the 3h**
> **Triptolide libraries (~5000 genes in Resting and ~6000 genes in Activated**
> **T-cells). Of these, only genes with complete observations in all biological**
> **replicates (including ribosome profiling libraries) and for all**
> **transcript features used to build the model, as well as at least 15% of**
> **observed degradation at 3h, were kept for further analysis.**
>
> **The `DESeqresults_lympho_activation_untreated.csv` file is used to add the**
> **`log2FoldChange` to the database.**
> **This column is used as a parameter for the `deltaTDD` and `deltaTID` scores.**
>
> **As we had many experimental conditions and time points, the scores had**
> **to explicitly indicate how they were obtained,**
> **so that we could track which experimental data were**
> **used as input and how (we initially had several metrics: relative,**
> **absolute, ...). I therefore generated "complex" column names that**
> **identify the correct column precisely and can be handled without any**
> **modification by the `Python` pandas library. For example, the**
> **$TDD_{index}$ column name**
> **is [initially](https://gitbio.ens-lyon.fr/LBMC/RMI2/rmi2_gff_fasta_compilation):**
>
> `Abs(TDD)>Lympho_Resting>Trip_CHX>Ref_Trip_0h>3h`
>
> **This means that the `Absolute TDD` score was computed for (`>`)**
> **the `Lympho` in the `Resting` status using (`>`) the `Trip_CHX` treatment**
> **condition, with the `Trip` treatment condition at `0h` as reference (`Ref`),**
> **and computed at (`>`) t = `3h`.**
>
> **However, because `R` restricts column names, some characters such as**
> **`()` and `>` are not allowed. Emmanuel Labaronne therefore had to change these names to**
> **perform `Random Forest` computations with `R`.**
> **This column is now called:**
>
> `Abs.TDD..Lympho_Resting.Trip_CHX.Ref_Trip_0h.3h`
>
> **Once the review process is over, I will change the**
> **[initial](https://gitbio.ens-lyon.fr/LBMC/RMI2/rmi2_gff_fasta_compilation)**
> **`Python` scripts to take into account the downstream `R` limitations.**
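
The naming scheme described in the note can be illustrated with a minimal Python sketch. It is not taken from the project's scripts: the helper names `parse_score_column` and `to_r_safe` are hypothetical, and the substitution rule (every character other than a letter, digit, `.` or `_` becomes `.`) is an assumption chosen because it reproduces the renamed column shown above.

```python
import re


def parse_score_column(name: str) -> dict:
    """Split a '>'-separated score column name into its five labelled parts.

    Hypothetical helper: the field labels are illustrative, not the project's.
    """
    metric, sample, treatment, reference, time_point = name.split(">")
    return {
        "metric": metric,          # e.g. Abs(TDD)
        "sample": sample,          # e.g. Lympho_Resting
        "treatment": treatment,    # e.g. Trip_CHX
        "reference": reference,    # e.g. Ref_Trip_0h
        "time_point": time_point,  # e.g. 3h
    }


def to_r_safe(name: str) -> str:
    """Replace every character that is not a letter, digit, '.' or '_' by '.'.

    Assumed rule: it mirrors the renaming shown in the note, nothing more.
    """
    return re.sub(r"[^A-Za-z0-9._]", ".", name)


original = "Abs(TDD)>Lympho_Resting>Trip_CHX>Ref_Trip_0h>3h"
parts = parse_score_column(original)
r_name = to_r_safe(original)

print(parts)
print(r_name)
```

Applying a function like `to_r_safe` on the `Python` side, before export, is one way the upstream scripts could produce names that `R` accepts without later renaming.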
## Generate a Random Forest model
......
@@ -38,7 +38,7 @@ header-includes:
  \includegraphics[width=1cm]{Logos/logo-rmi2-lab-version-2.png}
  }
  - \cfoot{\thepage}
- - \lfoot{Written by Labaronne Emmanuel and Cluet David, current version 2023/12/06}
+ - \lfoot{Written by Labaronne Emmanuel and Cluet David \newline current version 2023/12/06}
  - \renewcommand{\footrulewidth}{0.4pt}
  - \pretitle{\begin{center}
  \includegraphics[width=2cm,height=2cm]{Logos/logo-rmi2-lab-version-2.png}\LARGE\\}
@@ -93,7 +93,7 @@ source("src/randomForest.R")
  ## Fine grain tuning of the Random Forest model
- ```{r do_RF}
+ ```{r do_RF_please_wait}
  # Perform the random forest
  model <- makeRandomForest(x = x,
                            explicit_x = x_explicit)
......