    title: "Theory behind HTRfit"
    output: rmarkdown::html_vignette
    vignette: >
      %\VignetteIndexEntry{Theory behind HTRfit}
      %\VignetteEngine{knitr::rmarkdown}
      %\VignetteEncoding{UTF-8}
    knitr::opts_chunk$set(
      collapse = TRUE,
      comment = "#>"
    )

    HTRfit simulation workflow

    In RNAseq analysis, experimental parameters such as sequencing depth and the number of replicates influence the statistical power to detect expression changes. To guide the choice of values for these parameters, we introduce HTRfit, a statistical framework underpinned by computational simulation. HTRfit is also compatible with DESeq2 outputs, which allows the results of an RNAseq analysis to be evaluated within the same framework.

    Theory behind HTRfit

    In this modeling framework, counts $K_{ij}$ for gene i and sample j are generated from a negative binomial distribution, parameterized by a fitted mean $\mu_{ij}$ and a gene-specific dispersion parameter $dispersion_i$. The fitted mean $\mu_{ij}$ is determined by a parameter $q_{ij}$, which is proportionally related to the sum of all effects specified using init_variable() or add_interaction(). If basal gene expressions are provided, the $\mu_{ij}$ values are scaled accordingly using the gene-specific basal expression value ($bexpr_i$). Furthermore, the coefficients $\beta_i$ represent the natural logarithm fold changes for gene i across each column of the model matrix X. The dispersion parameter $dispersion_i$ defines the relationship between the variance of the observed counts and their mean. In simpler terms, it quantifies how far we expect observed counts to deviate from the mean value for each gene.
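
The generative model described above can be sketched in base R. This is a minimal illustration under stated assumptions, not HTRfit's own implementation: it does not call init_variable() or add_interaction(), and the design, the coefficient values, the basal expressions, and the dispersions are all hypothetical. It also assumes a log link ($q_{ij}$ is the exponential of the summed effects) and maps $dispersion_i$ onto the `size` argument of `rnbinom()`, for which the variance is `mu + mu^2 / size`. The text above does not say which parameterization HTRfit uses.

```r
# Illustrative sketch only: base R, not HTRfit's own code.
set.seed(42)

n_genes   <- 3
n_samples <- 6

# Hypothetical design: an intercept plus one two-level condition.
condition <- factor(rep(c("ctrl", "treated"), each = n_samples / 2))
X <- model.matrix(~ condition)               # n_samples x 2 model matrix

# Hypothetical natural-log fold changes beta_i, one row per gene,
# one column per column of X.
beta <- matrix(c(0,  0.0,
                 0,  1.0,
                 0, -0.5),
               nrow = n_genes, byrow = TRUE)

# Assumption: log link, so q_ij = exp(sum of effects for gene i, sample j).
q <- exp(beta %*% t(X))                      # n_genes x n_samples

# Hypothetical basal expression bexpr_i, used to scale q_ij into mu_ij.
bexpr <- c(50, 200, 1000)
mu <- q * bexpr                              # bexpr is recycled per gene (row)

# Hypothetical gene-specific dispersions.
# Assumption: dispersion_i plays the role of rnbinom()'s `size`,
# so Var(K_ij) = mu_ij + mu_ij^2 / dispersion_i.
dispersion <- c(5, 10, 50)

# Draw K_ij ~ NB(mu_ij, dispersion_i); vectors are in column-major order.
counts <- matrix(
  rnbinom(n_genes * n_samples,
          mu   = as.vector(mu),
          size = rep(dispersion, times = n_samples)),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("sample", seq_len(n_samples)))
)
```

Under these assumptions, the second gene has an expected count of 200 in the control samples and `200 * exp(1)` in the treated samples. A smaller `dispersion` value gives counts that scatter more widely around those means.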