Commit f5fc2c69 authored by dcluet

Add explanation for column name changes

parent 850aeba1
......@@ -100,20 +100,51 @@ The final arborescence should be:
* filtred_genes_Lympho_Activated.csv
* DESeqresults_lympho_activation_untreated.csv
> The `filtred_genes_Lympho_Resting.csv` and
> `filtred_genes_Lympho_Activated.csv` files
> are used to focus the machine learning process on pre-selected transcripts.
> **NOTA BENE**
>
> This selection was based on the gene normalized read counts of the 3h
> Triptolide libraries (~5000 genes in Resting and ~6000 genes in Activated
> T-cells). Of these, only genes with completed observations in all biological
> replicates (including ribosome profiling libraries) and for all
> transcript features used to build the model, as well as at least 15% of
> observed degradation at 3h, were kept for further analysis.
> **The `filtred_genes_Lympho_Resting.csv` and**
> **`filtred_genes_Lympho_Activated.csv` files**
> **are used to focus the machine learning process on pre-selected transcripts.**
>
> **This selection was based on the gene normalized read counts of the 3h**
> **Triptolide libraries (~5000 genes in Resting and ~6000 genes in Activated**
> **T-cells). Of these, only genes with completed observations in all biological**
> **replicates (including ribosome profiling libraries) and for all**
> **transcript features used to build the model, as well as at least 15% of**
> **observed degradation at 3h, were kept for further analysis.**
>
> **The `DESeqresults_lympho_activation_untreated.csv` file is used to add the**
> **`log2FoldChange` to the database.**
> **This column is used as parameter for the `deltaTDD` and `deltaTID` scores.**
>
> **As we had many experimental conditions and time points, the scores had**
> **to explicitly indicate how they were obtained.**
> **To track precisely which experimental data were**
> **used as input and how (we initially had several metrics: relative,**
> **absolute, ...), I generated "complex" column names that identify**
> **the correct column unambiguously and can be handled without any**
> **modification by the `Python` pandas library. For example, the**
> **$TDD_{index}$ column name**
> **is [initially](https://gitbio.ens-lyon.fr/LBMC/RMI2/rmi2_gff_fasta_compilation):**
>
> The `DESeqresults_lympho_activation_untreated.csv` file is used to add the
> `log2FoldChange` to the database.
> This column is used as parameter for the `deltaTDD` and `deltaTID` scores.
> `Abs(TDD)>Lympho_Resting>Trip_CHX>Ref_Trip_0h>3h`
>
> **This means that the `Absolute TDD` score was computed for (`>`)**
> **the `Lympho` in the `Resting` state using (`>`) the `Trip_CHX` treatment**
> **condition, with (`>`) the `Trip` treatment condition at `0h` as reference (`Ref`),**
> **at (`>`) t = `3h`.**
>
> **Nevertheless, because of restrictions on column names in `R`, some characters such as**
> **`()` and `>` are not allowed. Emmanuel Labaronne therefore had to change these names to**
> **perform `Random Forest` computations with `R`.**
> **This column is now called:**
>
> `Abs.TDD..Lympho_Resting.Trip_CHX.Ref_Trip_0h.3h`
>
> **Once the review process is over, I will change the**
> **[initial](https://gitbio.ens-lyon.fr/LBMC/RMI2/rmi2_gff_fasta_compilation)**
> **`Python` scripts to take into account the downstream `R` limitations.**
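
The renaming described above can be sketched in `Python`. This is only an illustration, not the code used in the project: the function names `parse_score_column` and `r_safe_name` and the dictionary keys are hypothetical, and `r_safe_name` only approximates what `R` does to invalid column names (every character other than letters, digits, `.` and `_` becomes a dot). Reserved words and duplicate names are not handled.

```python
import re


def parse_score_column(name: str) -> dict:
    """Split an initial ">"-separated column name into its five parts.

    The key names are illustrative assumptions, not project conventions.
    """
    score, cells, treatment, reference, time_point = name.split(">")
    return {
        "score": score,
        "cells": cells,
        "treatment": treatment,
        "reference": reference,
        "time_point": time_point,
    }


def r_safe_name(name: str) -> str:
    """Approximate the column name R produces from an invalid name."""
    # Replace every character that is not a letter, digit, dot or underscore.
    safe = re.sub(r"[^A-Za-z0-9._]", ".", name)
    # A valid R name starts with a letter, or with a dot not followed by a digit;
    # otherwise "X" is prepended.
    if not re.match(r"^([A-Za-z]|\.(?![0-9]))", safe):
        safe = "X" + safe
    return safe


initial = "Abs(TDD)>Lympho_Resting>Trip_CHX>Ref_Trip_0h>3h"
print(parse_score_column(initial))
print(r_safe_name(initial))  # Abs.TDD..Lympho_Resting.Trip_CHX.Ref_Trip_0h.3h
```

With pandas, such a function could be applied to every column at once through `DataFrame.rename(columns=r_safe_name)`, so that the exported `.csv` files already carry the names expected by the `R` scripts.
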
## Generate a Random Forest model
......
......@@ -38,7 +38,7 @@ header-includes:
\includegraphics[width=1cm]{Logos/logo-rmi2-lab-version-2.png}
}
- \cfoot{\thepage}
- \lfoot{Written by Labaronne Emmanuel and Cluet David, current version 2023/12/06}
- \lfoot{Written by Labaronne Emmanuel and Cluet David \newline current version 2023/12/06}
- \renewcommand{\footrulewidth}{0.4pt}
- \pretitle{\begin{center}
\includegraphics[width=2cm,height=2cm]{Logos/logo-rmi2-lab-version-2.png}\LARGE\\}
......@@ -93,7 +93,7 @@ source("src/randomForest.R")
## Fine grain tuning of the Random Forest model
```{r do_RF}
```{r do_RF_please_wait}
# Perform the random forest
model <- makeRandomForest(x = x,
explicit_x = x_explicit)
......