title: "Not only RNAseq"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Not only RNAseq}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(HTRfit)

In RNA-seq analysis, read counts are commonly modeled with a negative binomial distribution, because their variance typically exceeds their mean. We therefore recommend passing family = glmmTMB::nbinom2(link = "log") to fitModelParallel().

pub_data_fn <- system.file("extdata", "rna_pub_data.tsv", package = "HTRfit")
pub_data <- read.table(file = pub_data_fn, header = TRUE)
plot(density(unlist(log10(round(pub_data[, -1])))), main = "Density of log10 RNA-seq counts")
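
To make the negative binomial recommendation more concrete, here is a minimal sketch using simulated counts only (base R, not the HTRfit example data). The values of mu and size are arbitrary assumptions chosen for illustration. It compares the variance of Poisson and negative binomial counts that share the same mean, using the quadratic mean-variance relationship Var = mu + mu^2 / size that nbinom2 assumes.

```r
# Simulated counts only; mu and size are assumed values for illustration.
set.seed(42)
mu   <- 100   # assumed mean count
size <- 2     # assumed dispersion parameter (smaller means more overdispersion)

nb_counts   <- rnbinom(n = 10000, mu = mu, size = size)
pois_counts <- rpois(n = 10000, lambda = mu)

# Theoretical variance under the nbinom2 parameterization
expected_nb_var <- mu + mu^2 / size

# Compare empirical variances with the theoretical negative binomial variance
c(poisson_var   = var(pois_counts),
  nb_var        = var(nb_counts),
  nb_var_theory = expected_nb_var)
```

The Poisson variance stays close to the mean, whereas the negative binomial variance is much larger, which is the overdispersion pattern usually seen in RNA-seq counts.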

While HTRfit is optimized for RNA-seq data, the model family can be adapted to the nature of your response data. To illustrate, consider modeling iris sepal width as a function of sepal length using the iris data frame. Looking at the distribution of the response variable (Sepal.Width), we observe an approximately normal shape.

data("iris")
plot(density(unlist(iris$Sepal.Width)))
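
A density plot gives only a visual impression. As a complementary check, the sketch below uses base R only (it is not part of HTRfit) to draw a normal Q-Q plot and run a Shapiro-Wilk test on the same variable. Both are offered as optional diagnostics, not as a required step of the workflow.

```r
data("iris")

# Q-Q plot: points lying close to the reference line suggest approximate normality
qqnorm(iris$Sepal.Width, main = "Normal Q-Q plot of Sepal.Width")
qqline(iris$Sepal.Width)

# Shapiro-Wilk test: a small p-value would indicate departure from normality
sw <- shapiro.test(iris$Sepal.Width)
sw$p.value
```

Because the model is later fitted separately for each species, the same checks could also be repeated within each level of Species.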

Given this observation, we fit one model per species using a Gaussian family. For details about model families, see the R documentation (for example ?stats::family).

l_tmb_iris <- fitModelParallel(
                formula =  Sepal.Width ~ Sepal.Length ,
                data = iris,
                group_by = "Species",
                family = gaussian(),
                n.cores = 1)

tidy_results(l_tmb_iris)
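
As a sanity check on the per-species Gaussian fit above, the following sketch fits the same formula with ordinary least squares in base R, one model per species. A Gaussian model with an identity link and no random effects corresponds to a standard linear regression, so the coefficient estimates would be expected to be close to those reported by tidy_results(). That correspondence is an assumption about HTRfit's output and is not verified here. The names l_lm_iris and coef_by_species are hypothetical, introduced only for this illustration.

```r
data("iris")

# Fit one ordinary least-squares model per species (base R, independent of HTRfit)
l_lm_iris <- lapply(
  split(iris, iris$Species),
  function(d) lm(Sepal.Width ~ Sepal.Length, data = d)
)

# One row per species, with the intercept and the Sepal.Length slope as columns
coef_by_species <- t(sapply(l_lm_iris, coef))
coef_by_species
```

Splitting by Species mirrors the role of group_by = "Species" in the call above: each group gets its own intercept and slope.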