04-htrfit_vs_deseq2.Rmd

---
title: "Benchmarking HTRfit and DESeq2"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Benchmarking HTRfit and DESeq2}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---


```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```


```{r setup, warning = FALSE, message = FALSE, results='hide'}
library(HTRfit)
library(DESeq2)
```


**HTRfit** offers a wrapper for **DESeq2** outputs. This functionality allows users to seamlessly integrate the results obtained from **DESeq2** into the **HTRfit** evaluation pipeline. By doing so, you can readily compare the performance of **HTRfit** with **DESeq2** on your RNAseq data. This comparative analysis aids in determining which tool performs better for your specific research goals and dataset


## Simulation


```{r design_init, warning = FALSE, message = FALSE}
## -- init a design 
list_var <- init_variable( name = "genotype", mu = 0, sd = 0.29, level = 2) %>%
            init_variable( name = "environment", mu = 0.27, sd = 0.6, level = 4) %>%
            add_interaction( between_var = c("genotype", "environment"), mu = 0.44, sd = 0.89)
              

N_GENES <- 100
MIN_REPLICATES <- 4
MAX_REPLICATES <- 4
SEQ_DEPTH <- 5e6

## -- simulate RNAseq data 
mock_data <- mock_rnaseq(list_var, 
                         n_genes = N_GENES,
                         min_replicates  = MIN_REPLICATES,
                         max_replicates = MAX_REPLICATES,
                         basal_expression = 2,
                         sequencing_depth = SEQ_DEPTH)
```


## Fit models

```{r data2fit,  warning = FALSE, message = FALSE, results = 'hide'}
## -- data from simulation or real data
count_matrix <- mock_data$counts
metaData <- mock_data$metadata
```


#### HTRfit
```{r prepareData_and_fit, warning = FALSE, message = FALSE}
## -- convert counts matrix and samples metadatas in a data frame for fitting
data2fit = prepareData2fit(countMatrix = count_matrix, 
                           metadata =  metaData, 
                           normalization = T,
                           response_name = "kij")

l_tmb <- fitModelParallel(
          formula = kij ~ genotype + environment  + genotype:environment,
          data = data2fit, 
          group_by = "geneID",
          family = glmmTMB::nbinom2(link = "log"), 
          n.cores = 1)
```


#### DESeq2

```{r fit_dds,  warning = FALSE, message = FALSE, results = 'hide'}
## -- DESeq2
dds <- DESeq2::DESeqDataSetFromMatrix(
  countData = count_matrix,
  colData = metaData,
  design = ~ genotype + environment  + genotype:environment )
dds <- DESeq2::DESeq(dds, quiet = TRUE)
```


## Evaluation

```{r example-ddsComparison, warning = FALSE, message = FALSE}
## -- get simulation/fit evaluation
resSimu <- evaluation_report(list_tmb = l_tmb,
                             dds = dds,
                             mock_obj = mock_data,
                             coeff_threshold = 0.4,
                             alt_hypothesis = "greaterAbs")
```


```{r example-outputResSimu_id, warning = FALSE, message = FALSE, fig.align = 'center', fig.height = 4, fig.width = 5}
## -- Model params
resSimu$identity$params
## -- dispersion 
resSimu$identity$dispersion
```

```{r example-outputResSimu_metric, warning = FALSE, message = FALSE, fig.align = 'center', fig.height = 4, fig.width = 7}
## -- precision-recall curve
resSimu$precision_recall$params
## -- ROC curve
resSimu$roc$params
## -- Performances metrics
resSimu$performances
```