Newer
Older
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
---
title: "Benchmarking HTRfit and DESeq2"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Benchmarking HTRfit and DESeq2}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
```{r setup, warning = FALSE, message = FALSE, results='hide'}
library(HTRfit)
library(DESeq2)
```
HTRfit offers a wrapper for DESeq2 outputs. This functionality allows users to seamlessly integrate the results obtained from DESeq2 into the HTRfit evaluation pipeline. By doing so, you can readily compare the performance of HTRfit with DESeq2 on your RNAseq data. This comparative analysis aids in determining which tool performs better for your specific research goals and dataset
## Simulation
The choice of input parameters plays a critical role in the simulation process. For optimal results, we recommend basing these decisions on real data, as outlined in the [Simulation tutorial](articles/02-tutorial.html)
```{r design_init, warning = FALSE, message = FALSE}
## -- init a design
list_var <- init_variable( name = "genotype", sd = 0.2462256, level = 2) %>%
init_variable( name = "environment", sd = 0.2462256, level = 4) %>%
add_interaction( between_var = c("genotype", "environment"), sd = 0.2462256)
N_GENES <- 100
MIN_REPLICATES <- 4
MAX_REPLICATES <- 4
SEQ_DEPTH <- 5e6
## -- simulate RNAseq data
mock_data <- mock_rnaseq(list_var,
n_genes = N_GENES,
min_replicates = MIN_REPLICATES,
max_replicates = MAX_REPLICATES,
basal_expression = 2,
sequencing_depth = SEQ_DEPTH)
```
## Fit models
```{r data2fit, warning = FALSE, message = FALSE, results = 'hide'}
## -- data from simulation
count_matrix <- mock_data$counts
metaData <- mock_data$metadata
```
#### HTRfit
```{r prepareData_and_fit, warning = FALSE, message = FALSE}
## -- convert counts matrix and samples metadatas in a data frame for fitting
data2fit = prepareData2fit(countMatrix = count_matrix,
metadata = metaData,
normalization = 'MRN',
response_name = "kij")
l_tmb <- fitModelParallel(
formula = kij ~ genotype + environment + genotype:environment,
data = data2fit,
group_by = "geneID",
family = glmmTMB::nbinom2(link = "log"),
n.cores = 1)
```
#### DESeq2
```{r fit_dds, warning = FALSE, message = FALSE, results = 'hide'}
## -- DESeq2
dds <- DESeq2::DESeqDataSetFromMatrix(
countData = count_matrix,
colData = metaData,
design = ~ genotype + environment + genotype:environment )
dds <- DESeq2::DESeq(dds, quiet = TRUE)
```
## Evaluation
Using `evaluation_report()` function, we assess the capability of DESeq2 and HTRfit to identify conditions exhibiting a 25% change in expression. Based on the identity plot, ROC curve, and PR curve, we observe similar results between the two methods.
```{r example-ddsComparison, warning = FALSE, message = FALSE}
## -- get simulation/fit evaluation
resSimu <- evaluation_report(list_tmb = l_tmb,
dds = dds,
mock_obj = mock_data,
coeff_threshold = log(1.25), ## 25% of expression change
alt_hypothesis = "greaterAbs")
```
```{r example-outputResSimu_id, warning = FALSE, message = FALSE, fig.align = 'center', fig.height = 5, fig.width = 7}
## -- Model params
resSimu$identity$params
## -- dispersion
resSimu$identity$dispersion
```
```{r example-outputResSimu_metric, warning = FALSE, message = FALSE, fig.align = 'center', fig.height = 4, fig.width = 7}
## -- precision-recall curve
resSimu$precision_recall$params
## -- ROC curve
resSimu$roc$params
## -- Performances metrics
resSimu$performances
```