theoryBehindHtrfit.Rmd

Arnaud Duvermy authored Jan 22, 2024

---
title: "Theory behind HTRfit"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Theory behind HTRfit}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# HTRfit simulation workflow

In RNAseq analysis, experimental parameters such as sequencing depth and the number of replicates influence the statistical power to detect expression changes. To guide the choice of values for these parameters, we introduce HTRfit, a statistical framework underpinned by computational simulation. HTRfit is also compatible with DESeq2 outputs, which facilitates a comprehensive evaluation of RNAseq analysis.

# Theory behind HTRfit

In this modeling framework, counts denoted as $K_{ij}$ for gene i and sample j are generated using a negative binomial distribution. The negative binomial distribution considers a fitted mean $\mu_{ij}$ and a gene-specific dispersion parameter $dispersion_i$.

The fitted mean $\mu_{ij}$ is determined by a parameter, $q_{ij}$, which is proportionally related to the sum of all effects specified using `init_variable()` or `add_interaction()`. If basal gene expressions are provided, the $\mu_{ij}$ values are scaled accordingly using the gene-specific basal expression value ($bexpr_i$).

Furthermore, the coefficients $\beta_i$ represent the natural logarithm fold changes for gene i across each column of the model matrix X. The dispersion parameter $dispersion_i$ defines the relationship between the variance of the observed counts and their mean value. In simpler terms, it quantifies how far we expect observed counts to deviate from the mean value for each gene.
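
The generative model described above can be illustrated with a minimal base R sketch. It is not HTRfit's implementation and does not call any HTRfit function. The design, the coefficient values, the basal expressions and all variable names are hypothetical. Two points are assumptions rather than statements from this vignette: $q_{ij}$ is obtained through a log link, $q_{ij} = \exp(x_j \beta_i)$, and $dispersion_i$ is passed as the `size` argument of `rnbinom()`, so that the variance is $\mu_{ij} + \mu_{ij}^2 / dispersion_i$. Some tools, such as DESeq2, use the inverse convention, where the variance is $\mu_{ij} + \alpha_i \mu_{ij}^2$.

```r
set.seed(42)

n_genes   <- 3
n_samples <- 4

# Hypothetical design: two conditions with two replicates each
condition <- factor(rep(c("ctrl", "trt"), each = 2))
X <- model.matrix(~ condition)   # n_samples x 2: intercept and treatment effect

# Hypothetical coefficients beta_i (natural-log scale), one row per gene
beta <- rbind(
  c(0, 0),         # gene 1: no expression change
  c(0, log(2)),    # gene 2: two-fold increase under treatment
  c(0, -log(2))    # gene 3: two-fold decrease under treatment
)

# Assumption: log link, q_ij = exp(x_j . beta_i)
q <- exp(beta %*% t(X))          # n_genes x n_samples

# Hypothetical gene-specific basal expression; scales row i of q by bexpr[i]
bexpr <- c(100, 50, 200)
mu <- q * bexpr

# Assumption: dispersion_i is the `size` parameter of rnbinom(),
# i.e. Var(K_ij) = mu_ij + mu_ij^2 / dispersion_i
dispersion <- c(10, 10, 10)
size_mat <- matrix(dispersion, nrow = n_genes, ncol = n_samples)

# Draw one count per gene and sample
counts <- matrix(
  rnbinom(n_genes * n_samples, mu = mu, size = size_mat),
  nrow = n_genes,
  dimnames = list(paste0("gene", 1:n_genes), paste0("sample", 1:n_samples))
)
counts
```

Under this convention, a smaller `dispersion` value produces counts that scatter more widely around `mu`, while a very large value approaches Poisson-like variability.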