
High-Throughput RNA-seq model fit

Installation

  • method A:

To install the latest version of HTRfit, run the following in your R console:

if (!requireNamespace("remotes", quietly = TRUE))
    install.packages("remotes")
remotes::install_git("https://gitbio.ens-lyon.fr/aduvermy/HTRfit")
  • method B:

Alternatively, download a release from the HTRfit release page and untar the archive. Then open your R console and run the following command, replacing /HTRfit-v1.0.0 with the path to the untarred folder:

## -- Example using the HTRfit-v1.0.0 release
install.packages('/HTRfit-v1.0.0', repos = NULL, type='source')

When dependencies are met, installation should take a few minutes.

CRAN packages dependencies

The following dependencies are required:

## -- required ('parallel', 'stats' and 'utils' ship with R and need no installation)
install.packages(c('car', 'data.table', 'ggplot2', 'gridExtra', 'glmmTMB', 'magrittr', 'MASS', 'plotROC', 'reshape2', 'rlang', 'BiocManager'))
BiocManager::install('S4Vectors', update = FALSE)
## -- optional 
BiocManager::install('DESeq2', update = FALSE)
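
To avoid reinstalling packages that are already present, the CRAN dependency list above can be filtered first. The following is a minimal sketch in base R; the helper name `missing_packages` is hypothetical and is not part of HTRfit.

```r
# Hypothetical helper (not part of HTRfit): return the entries of `pkgs`
# that are not found in the current library paths.
missing_packages <- function(pkgs) {
  found <- vapply(pkgs,
                  function(p) nzchar(system.file(package = p)),
                  logical(1))
  pkgs[!found]
}

# CRAN dependencies listed above; 'parallel', 'stats' and 'utils' are
# omitted because they are distributed with R itself.
cran_deps <- c("car", "data.table", "ggplot2", "gridExtra", "glmmTMB",
               "magrittr", "MASS", "plotROC", "reshape2", "rlang",
               "BiocManager")

to_install <- missing_packages(cran_deps)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install them
}
```

The check relies on `system.file()`, so it only looks packages up and does not load them.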

Docker

We provide Docker images to simplify use of the package. For the best development experience with the Docker container, we recommend Visual Studio Code (VSCode) with the DevContainer extension. This setup provides a convenient, isolated environment for development and testing.

  1. Install VSCode
  2. Install Docker on your system and on VSCode
  3. Launch the HTRfit container directly from VSCode
  4. Install the DevContainer extension for VSCode.
  5. Launch a remote window connected to the running Docker container.
  6. Enjoy HTRfit!

HTRfit simulation workflow

In this modeling framework, counts denoted as $K_{ij}$ for gene $i$ and sample $j$ are generated using a negative binomial distribution. The negative binomial distribution considers a fitted mean $\mu_{ij}$ and a gene-specific dispersion parameter $\alpha_i$.

The fitted mean $\mu_{ij}$ is determined by a parameter, $q_{ij}$, which is proportionally related to the sum of all effects specified using init_variable() or add_interaction(). If basal gene expressions are provided, the $\mu_{ij}$ values are scaled accordingly using the gene-specific basal expression value ($bexpr_i$).

Furthermore, the coefficients $\beta_i$ represent the natural logarithm fold changes for gene $i$ across each column of the model matrix $X$. The dispersion parameter $\alpha_i$ plays a crucial role in defining the relationship between the variance of observed counts and their mean value. In simpler terms, it quantifies how far we expect observed counts to deviate from the mean value.
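
The generative model described above can be illustrated with base R alone. This is a sketch under stated assumptions, not HTRfit's internal code. It assumes that the effects enter on the log scale (consistent with the log link used in the fit below), that the basal expression multiplies the mean, and that the dispersion follows the convention Var(K) = mu + alpha * mu^2. All numeric values are made up for illustration. Note that glmmTMB's nbinom2 family parameterizes the variance as mu * (1 + mu / phi), so its dispersion parameter phi corresponds to 1 / alpha under this convention.

```r
set.seed(42)

# Toy design for a single gene i: 2 conditions x 3 replicates.
condition <- factor(rep(c("ctrl", "treated"), each = 3))
X <- model.matrix(~ condition)   # model matrix X (intercept + one effect)

beta_i  <- c(0, log(2))  # assumed natural-log fold changes, one per column of X
bexpr_i <- 100           # assumed basal expression used to scale the mean
alpha_i <- 0.1           # assumed gene-specific dispersion

# Assumption: log(q_ij) is the sum of effects, and mu_ij = bexpr_i * q_ij.
q_ij  <- exp(drop(X %*% beta_i))
mu_ij <- bexpr_i * q_ij

# rnbinom() with size = 1 / alpha gives Var(K) = mu + alpha * mu^2.
K_ij <- rnbinom(n = length(mu_ij), mu = mu_ij, size = 1 / alpha_i)

# Empirical check of the mean-variance relationship on a large sample.
k <- rnbinom(1e5, mu = 100, size = 1 / alpha_i)
c(mean = mean(k), var = var(k), expected_var = 100 + alpha_i * 100^2)
```

With these values the treated samples have twice the mean of the control samples, and the empirical variance of `k` should be close to the expected 1100, well above the Poisson variance of 100.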

Getting started

## -- load HTRfit and magrittr (for the %>% pipe)
library(HTRfit)
library(magrittr)
## -- init a design 
input_var_list <- init_variable( name = "varA", mu = 0, sd = 0.29, level = 60) %>%
                  init_variable( name = "varB", mu = 0.27, sd = 0.6, level = 2) %>%
                    add_interaction( between_var = c("varA", "varB"), mu = 0.44, sd = 0.89)
## -- simulate RNAseq data 
mock_data <- mock_rnaseq(input_var_list, 
                         n_genes = 30,
                         min_replicates  = 10,
                         max_replicates = 10, 
                         basal_expression = 5 )
## -- prepare data & fit a model with mixed effect
data2fit <- prepareData2fit(countMatrix = mock_data$counts, 
                            metadata = mock_data$metadata, 
                            normalization = FALSE)
l_tmb <- fitModelParallel(formula = kij ~ varB + (varB | varA),
                          data = data2fit, 
                          group_by = "geneID",
                          family = glmmTMB::nbinom2(link = "log"), 
                          log_file = "log.txt",
                          n.cores = 1)
## -- evaluation
resSimu <- simulationReport(mock_data, 
                            list_tmb = l_tmb,
                            coeff_threshold = 0.27, 
                            alt_hypothesis = "greater")
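
The `coeff_threshold` and `alt_hypothesis` arguments suggest a one-sided test of each coefficient against a reference value. As an illustration only, and assuming a Wald test (the text above does not state which test `simulationReport()` applies), the p-value for the alternative hypothesis "coefficient greater than 0.27" could be computed as follows. The function name `wald_p_greater` and the example estimates are hypothetical.

```r
# Hypothetical helper (not part of HTRfit): one-sided Wald test p-value for
# H1: coefficient > threshold, given an estimate and its standard error.
wald_p_greater <- function(estimate, std_error, threshold = 0) {
  z <- (estimate - threshold) / std_error
  pnorm(z, lower.tail = FALSE)
}

# Made-up estimates: one exactly at the threshold, one well above it.
wald_p_greater(estimate  = c(0.27, 0.90),
               std_error = c(0.10, 0.20),
               threshold = 0.27)
```

An estimate equal to the threshold yields a p-value of 0.5, while an estimate several standard errors above it yields a small p-value.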