Skip to content
Snippets Groups Projects
07-rnaseq_notOnly.Rmd 2.17 KiB
Newer Older
Arnaud Duvermy's avatar
Arnaud Duvermy committed
---
title: "Not only RNAseq"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Not only RNAseq}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```


```{r setup}
library(HTRfit)
```

In the realm of RNAseq analysis, it's widely acknowledged that counts follow a Negative Binomial distribution. Hence, we recommend employing `family = glmmTMB::nbinom2(link = "log")` in the parameters of `fitModelParallel()` to align with this distribution.
In RNAseq it is well known that counts have a Negative binomial distribution that's why we recommand using `family = glmmTMB::nbinom2(link = "log")` in `fitModelParallel()` parameters.

```{r distrib_rnaseq,  warning = FALSE, message = FALSE, fig.align='center', fig.height=4, fig.width=4 }
pub_data_fn <- system.file("extdata", "pub_data_rna.tsv", package = "HTRfit") 
pub_data <- read.table(file = pub_data_fn, header = TRUE)
plot(density(unlist(round(pub_data[,-1]))), log='x', main = "Density of RNAseq counts ")
```

While HTRfit is optimized for RNA-seq data, it's worth noting that the model family can be customized to accommodate other data types. In a different context, the model family can be adapted according to the nature of your response data. 
To illustrate, consider an attempt to model Iris sepal length based on sepal width using the iris dataframe. Evaluating the distribution shape of the variable to be modeled (Sepal.Width), we observe a normal distribution.


```{r distrib_iris, warning = FALSE, message = FALSE, fig.align='center', fig.height=4, fig.width=5}
data("iris")
plot(density(unlist(iris$Sepal.Width)))
```

Given this observation, we opt to fit a model for each species using a Gaussian family model (default). Details about model family [here](https://search.r-project.org/CRAN/refmans/glmmTMB/html/nbinom2.html).


```{r example-fitModelParallel_nonRNA, warning = FALSE, message = FALSE}
l_tmb_iris <- fitModelParallel(
                formula =  Sepal.Width ~ Sepal.Length ,
                data = iris,
                group_by = "Species",
                family = stats::gaussian(),
                n.cores = 1)
```