Arnaud Duvermy authored
High-Throughput RNA-seq model fit
Why use HTRfit
HTRfit provides a robust statistical framework for investigating how key experimental parameters affect your ability to detect expression changes. Whether you are examining sequencing depth, the number of replicates, or other critical factors, HTRfit lets you evaluate them through computational simulation.
In addition, HTRfit supports fixed effects, mixed effects, and interactions in your RNAseq data analysis, giving you the flexibility to run a differential expression analysis that matches your experimental design.
Installation
Method A:
To install the latest version of HTRfit, run the following in your R console:
if (!requireNamespace("remotes", quietly = TRUE))
    install.packages("remotes")
remotes::install_git("https://gitbio.ens-lyon.fr/aduvermy/HTRfit")
Method B:
You can also download a release directly from the HTRfit release page. Once downloaded, untar the archive. Then open your R console and run the following command, replacing /HTRfit-v1.0.0 with the path to the untarred folder:
## -- Example using the HTRfit-v1.0.0 release
install.packages('/HTRfit-v1.0.0', repos = NULL, type='source')
When dependencies are met, installation should take a few minutes.
CRAN package dependencies
The following dependencies are required:
## -- required
## ('parallel', 'stats' and 'utils' ship with base R and need no installation)
install.packages(c('car', 'data.table', 'ggplot2', 'gridExtra', 'glmmTMB',
                   'magrittr', 'MASS', 'plotROC', 'reshape2', 'rlang', 'BiocManager'))
BiocManager::install('S4Vectors', update = FALSE)
## -- optional
BiocManager::install('DESeq2', update = FALSE)
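Before installing, it can help to see which of the packages listed above are already present. The following is a minimal sketch using only base R. The helper name `is_installed` is ours and is not part of HTRfit.

```r
# Packages listed in this section. Base packages such as 'parallel',
# 'stats' and 'utils' ship with R and are therefore not checked here.
required <- c("car", "data.table", "ggplot2", "gridExtra", "glmmTMB",
              "magrittr", "MASS", "plotROC", "reshape2", "rlang", "S4Vectors")
optional <- c("DESeq2")

# Hypothetical helper: returns a named logical vector, TRUE when the
# package namespace can be loaded. requireNamespace() does not attach it.
is_installed <- function(pkgs) {
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

missing_required <- required[!is_installed(required)]
missing_optional <- optional[!is_installed(optional)]

if (length(missing_required) > 0) {
  message("Missing required packages: ",
          paste(missing_required, collapse = ", "))
} else {
  message("All required packages are installed.")
}
if (length(missing_optional) > 0) {
  message("Missing optional packages: ",
          paste(missing_optional, collapse = ", "))
}
```

Any package reported as missing can then be installed with the commands above.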
Docker
We have developed Docker images to simplify the package's utilization. For an optimal development and coding experience with the Docker container, we recommend using Visual Studio Code (VSCode) along with the DevContainer extension. This setup provides a convenient and isolated environment for development and testing.
- Install VSCode.
- Install Docker on your system and the Docker extension in VSCode.
- Launch the HTRfit container directly from VSCode.
- Install the DevContainer extension for VSCode.
- Launch a remote window connected to the running Docker container.
- Install the R extension for VSCode.
- Enjoy HTRfit!
Biosphere virtual machine
A straightforward way to use HTRfit is to run it on a Virtual Machine (VM) through Biosphere. We recommend utilizing a VM that includes RStudio for an integrated development environment (IDE) experience. Biosphere VM resources can also be scaled according to your simulation needs.
HTRfit can then be installed on the VM using method A.
HTRfit simulation workflow
In RNAseq analysis, key experimental parameters such as sequencing depth and the number of replicates strongly influence the statistical power to detect expression changes. To help you choose suitable values for these parameters, HTRfit provides a comprehensive statistical framework underpinned by computational simulation. HTRfit is also compatible with DESeq2 outputs, allowing a comprehensive evaluation of RNAseq analysis.
Getting started
Initialize a design and simulate RNAseq data
library('HTRfit')
## -- init a design
input_var_list <- init_variable( name = "varA", mu = 0, sd = 0.29, level = 2000) %>%
                  init_variable( name = "varB", mu = 0.27, sd = 0.6, level = 2) %>%
                  add_interaction( between_var = c("varA", "varB"), mu = 0.44, sd = 0.89)
## -- simulate RNAseq data
mock_data <- mock_rnaseq(input_var_list,
                         n_genes = 30000,
                         min_replicates = 4,
                         max_replicates = 4 )
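To make the roles of mu, sd and level more concrete, here is a minimal base-R sketch of how counts for a single gene could be drawn from such a design. It is an illustration under stated assumptions and not HTRfit's implementation. It assumes that each level's effect is drawn from a normal distribution with the given mu and sd, and that counts follow a negative binomial distribution with a log link. The intercept, the dispersion and the reduced number of levels are hypothetical values chosen for the example.

```r
set.seed(42)

# Small toy design (the example above uses 2000 levels for varA).
n_levels_A <- 5
n_levels_B <- 2
n_rep      <- 4

# Assumption: one effect per level, drawn from Normal(mu, sd).
effect_A  <- rnorm(n_levels_A, mean = 0,    sd = 0.29)
effect_B  <- rnorm(n_levels_B, mean = 0.27, sd = 0.6)
effect_AB <- matrix(rnorm(n_levels_A * n_levels_B, mean = 0.44, sd = 0.89),
                    nrow = n_levels_A, ncol = n_levels_B)

# One row per sample: every varA x varB combination, replicated n_rep times.
design <- expand.grid(varA = seq_len(n_levels_A),
                      varB = seq_len(n_levels_B),
                      replicate = seq_len(n_rep))

# Linear predictor on the log scale. The intercept is hypothetical.
intercept <- 3
log_mu <- intercept +
  effect_A[design$varA] +
  effect_B[design$varB] +
  effect_AB[cbind(design$varA, design$varB)]

# Negative binomial counts. 'size' is a hypothetical dispersion parameter.
design$kij <- rnbinom(nrow(design), mu = exp(log_mu), size = 10)

head(design)
```

The sketch yields 5 x 2 x 4 = 40 samples for one gene. mock_rnaseq produces the corresponding counts and metadata for all genes at once.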
The simulation process in HTRfit has been optimized to generate RNAseq counts for 30,000 genes and 4,000 experimental conditions (2,000 levels of varA × 2 levels of varB), each replicated 4 times, for a total of 16,000 samples, in less than 5 minutes. However, the object generated under these conditions can consume approximately 50 GB of RAM. An equivalent simulation with 6,000 genes requires less than a minute and 10 GB of RAM.
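The sample count quoted above follows directly from the design, and the same arithmetic gives a rough lower bound on memory. The sketch below assumes a single dense genes x samples matrix stored as R doubles (8 bytes per value). The simulation object holds more than this one matrix, so the reported RAM figures are expectedly higher.

```r
# Design from the example: varA has 2000 levels, varB has 2 levels.
n_levels_varA <- 2000
n_levels_varB <- 2
n_replicates  <- 4
n_genes       <- 30000

n_conditions <- n_levels_varA * n_levels_varB   # 4000
n_samples    <- n_conditions * n_replicates     # 16000

# Assumption: one dense matrix of doubles, 8 bytes per value.
bytes_per_value <- 8
count_matrix_gb <- n_genes * n_samples * bytes_per_value / 1e9

n_conditions      # 4000
n_samples         # 16000
count_matrix_gb   # 3.84
```

With 6,000 genes the same calculation gives 0.768 GB for the matrix alone.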
Fit your model
## -- prepare data & fit a model with mixed effect
data2fit <- prepareData2fit(countMatrix = mock_data$counts,
                            metadata = mock_data$metadata,
                            normalization = FALSE)
l_tmb <- fitModelParallel(formula = kij ~ varB + (varB | varA),
                          data = data2fit,
                          group_by = "geneID",
                          family = glmmTMB::nbinom2(link = "log"),
                          n.cores = 1)
Evaluation
## -- evaluation
resSimu <- simulationReport(mock_data,
                            list_tmb = l_tmb,
                            coeff_threshold = 0.27,
                            alt_hypothesis = "greater")
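As an illustration of what a threshold combined with a "greater" alternative typically means, here is a minimal base-R sketch of a one-sided Wald test. It assumes that coeff_threshold and alt_hypothesis define a test of whether a coefficient exceeds the threshold. This is our reading of the arguments and not a description of simulationReport's internals. The function name and the numeric inputs are hypothetical.

```r
# Hypothetical helper: one-sided Wald test of H1: coefficient > threshold.
wald_greater <- function(estimate, std_error, threshold) {
  z <- (estimate - threshold) / std_error
  # Upper-tail probability of the standard normal distribution.
  pnorm(z, lower.tail = FALSE)
}

# Hypothetical estimate and standard error for one coefficient.
wald_greater(estimate = 0.45, std_error = 0.08, threshold = 0.27)

# An estimate exactly at the threshold gives a p-value of 0.5.
wald_greater(estimate = 0.27, std_error = 0.10, threshold = 0.27)
```

A small p-value here supports the claim that the effect is larger than 0.27, not merely different from zero.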