rm vignettes temporary

Former-commit-id: d00216303674b79b032d664ed42133be8d1f56fc Former-commit-id: 58153b8018d6b6f5d8dffc93a37eb7862d922e8f Former-commit-id: 2f8239590ec92f3eef17ff52a5de2e9df6e7a3c6

rm vignettes temporary
Former-commit-id: d00216303674b79b032d664ed42133be8d1f56fc Former-commit-id: 58153b8018d6b6f5d8dffc93a37eb7862d922e8f Former-commit-id: 2f8239590ec92f3eef17ff52a5de2e9df6e7a3c6
dffa5912 · Arnaud Duvermy · 37fc3633 · 37fc3633 · 37fc3633 · 37fc3633
Commit dffa5912 authored 11 months ago by Arnaud Duvermy
--- a/vignettes/01-theoryBehindHtrfit.Rmd
+++ b/vignettes/01-theoryBehindHtrfit.Rmd
---
-title: "Theory behind HTRfit"
-output: rmarkdown::html_vignette
-vignette: >
-  %\VignetteIndexEntry{Theory behind HTRfit}
-  %\VignetteEngine{knitr::rmarkdown}
-  %\VignetteEncoding{UTF-8}
---
-
-```{r, include = FALSE}
-knitr::opts_chunk$set(
-  collapse = TRUE,
-  comment = "#>",
-  fig.path = "../man/figures/"
-)
-
-```
-
-
-In the realm of RNAseq analysis, various key experimental parameters play a crucial role in influencing the statistical power to detect expression changes. Parameters such as sequencing depth, the number of replicates, and others are expected to impact statistical power. 
-To navigate the selection of optimal values for these experimental parameters, we introduce a comprehensive statistical framework known as HTRfit, underpinned by computational simulation. Moreover, HTRfit offers seamless compatibility with DESeq2 outputs, facilitating a comprehensive evaluation of RNAseq analysis.
-
-# HTRfit simulation workflow
-
-<div id="bg"  align="center">
-  <img src="../reference/figures/htrfit_workflow.png" width="500" height="300">
-</div> 
-
-In this modeling framework, counts denoted as $K_{ij}$ for gene i and sample j are generated using a negative binomial distribution. The negative binomial distribution considers a fitted mean $\mu_{ij}$ and a gene-specific dispersion parameter $dispersion_i$.
-The fitted mean $\mu_{ij}$ is determined by a parameter, $q_{ij}$, which is proportionally related to the sum of all effects specified using `init_variable()` or `add_interaction()`. If basal gene expressions are provided, the $q_{ij}$ values are scaled accordingly using the gene-specific basal expression value ($bexpr_i$).
-Furthermore, the coefficients $\beta_i$ represent the natural logarithm fold changes for gene i across each column of the model matrix X. The dispersion parameter $dispersion_i$ plays a crucial role in defining the relationship between the variance of observed counts and their mean value. In simpler terms, it quantifies how far we expect observed counts to deviate from the mean value for each genes.
-In addition, HTRfit allows for sequencing depth control using a scalar value specific to each sample ($s_j$) applied on the $\mu_{ij}$ value.
\ No newline at end of file
--- a/vignettes/02-tutorial.Rmd
+++ b/vignettes/02-tutorial.Rmd
---
-title: "Simulation tutorial"
-output: rmarkdown::html_vignette
-vignette: >
-  %\VignetteIndexEntry{Simulation tutorial}
-  %\VignetteEngine{knitr::rmarkdown}
-  %\VignetteEncoding{UTF-8}
---
-
-```{r, include = FALSE}
-knitr::opts_chunk$set(
-  collapse = TRUE,
-  comment = "#>"
-)
-```
-
-```{r setup}
-library(HTRfit)
-```
-
-
-## Initializing design
-
- 
-In this setup, we will use HTRfit to simulate an experimental design involving multiple genotypes and environmental conditions. To do so, we will first need to estimate plausible parameter distributions based on publicly available data.
-
-
-### Loading public dataset
-
-Counts data from the [SRP217588](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE135473) project are embedded in HTRfit package. These data were processed by an RNAseq pipeline using Kallisto to generate count data. This dataset includes count data for 4 distinct environments, 2 genotypes in 6000 genes, providing a valuable resource for estimating distribution parameters.
-
-
-```{r loading_pub_data, warning = FALSE, message = FALSE}
-## -- public data embedded in HTRfit
-pub_data_fn <- system.file("extdata", "rna_pub_data.tsv", package = "HTRfit") 
-pub_data <- read.table(file = pub_data_fn, header = TRUE)
-rownames(pub_data) <- pub_data$gene_id
-pub_data$gene_id <- NULL
-pub_metadata_fn <- system.file("extdata", "rna_pub_metadata.tsv", package = "HTRfit") 
-pub_metadata <- read.table(file = pub_metadata_fn,  header = TRUE)
-
-## -- 6713 genes & 40 samples
-dim(pub_data)
-```
-
-### Fit a model on a public dataset
-
-To estimate plausible parameter distributions, we will fit a generalized linear  model (GLM) to the publicly available data. This GLM will allow us to estimate plausible coefficients betaG, betaE, and betaGxE for 6000 genes. The `tidy_pub_fit` object contains the estimate obtain by HTRfit for each coefficient and each genes.
-
-```{r fit_pub_data, eval=FALSE, message=FALSE, warning=FALSE, include=TRUE}
-# -- prepare data &  fit a model 
-data2fit = prepareData2fit(countMatrix = pub_data, 
-                           metadata = pub_metadata, 
-                           normalization = 'MRN',   
-                           response_name = "kij",
-                           transform = "x+1", ## apply + 1 to each counts
-                           row_threshold = 10) ## remove genes i with all(k_ij) < 10
-l_tmb <- fitModelParallel(
-            formula = kij ~ genotype + environment + genotype:environment,
-            data = data2fit, 
-            group_by = "geneID",
-            family = glmmTMB::nbinom2(link = "log"), 
-            n.cores = 1)
-
-## -- Select best fit base on AIC results
-l_genes <- identifyTopFit(l_tmb)
-## -- extract results for best genes
-tidy_pub_fit <- tidy_results(l_tmb[l_genes])
-```
-
-### Estimating parameter distributions
-
-#### Beta coefficients parameters
-
-Now let's examine the distributions of the estimated parameters. Means, standard deviation and correlation between these parameters are inferred using the following lines :
-
-```{r load_fit_pub_data, include = FALSE}
-## -- hidded in vignette
-## -- allows to not evaluate previous chunk (a bit long)
-tidy_pub_fn <- system.file("extdata", "tidy_pub_results.tsv", package = "HTRfit") 
-tidy_pub_fit <- read.table(file = tidy_pub_fn,  header = TRUE)
-dispersion_pub_fn <-  system.file("extdata", "pub_dispersion.tsv", package = "HTRfit")
-dispersion_pub <- read.table(file = dispersion_pub_fn,  header = TRUE)$x
-```
-
-
-```{r beta_pub_data, warning = FALSE, message = FALSE}
-# -- Obtain the parameters
-all_beta_pub <- reshape2::dcast(tidy_pub_fit, formula = ID ~ term, value.var = "estimate")
-columns_env <-  c("environmentenv2","environmentenv3", "environmentenv4" )
-beta_env <- rowMeans(all_beta_pub[, columns_env])
-columns_interaction <-  c("genotype2:environmentenv2","genotype2:environmentenv3", "genotype2:environmentenv4" )
-beta_interaction <- rowMeans( all_beta_pub[ , columns_interaction] )
-# -- Group the parameters
-grouped_beta_pub <- data.frame( betaG = all_beta_pub$genotype2, 
-                                betaE = beta_env, 
-                                betaGxE = beta_interaction )
-
-# -- Obtain statistics for the parameters
-beta0_pub = all_beta_pub$`(Intercept)`
-## -- public effect standard deviation 
-sapply(grouped_beta_pub, sd)
-
-quantile(beta0_pub, prob=c(.33,.66,1))
-hist(beta0_pub)
-```
-
-#### Dispersion parameter
-
-We also save the dispersion parameter obtain from the public dataset.
-
-```{r pub_dispersion, warning = FALSE, message = FALSE, eval = FALSE}
-dispersion_pub  <- glance_tmb(l_tmb)$dispersion
-```
-
-
-## Mimic the public RNAseq data with HTRfit
-
-To reproduce the design of the public [SRP217588](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE135473) project, 2 levels of genotype and 4 levels of environment are initialized using `init_variable()`. By using `add_interaction()`, interaction effects between genotype and environment (betaGxE) are set.  For all of them, mean and standard deviation are set concordantly to values obtained from the public dataset coefficient distribution (betaG, betaE, betaGxE).
-By using the estimated intercept obtained from the previous analysis of publicly available data as the basal expression level for each gene in the simulated dataset, we ensure that the basal expression levels for our simulated genes are consistent with those in the publicly available data.
-
-
-```{r design_init, warning = FALSE, message = FALSE}
-## -- design to simulate
-N_GENES <- 100 
-MIN_REPLICATES <- 5
-MAX_REPLICATES <- 5
-SEQ_DEPTH <- 5e6
-
-## -- init effects to simulate
-input_var_list <- init_variable( name = "genotype", mu = 0, sd = 2.18, level = 2) %>%
-                            init_variable( name = "environment", mu = 0, sd = 0.57, level = 4 ) %>%
-                            add_interaction(between_var = c("genotype", "environment"), mu = 0 , sd = 1.018)
-
-## -- simulate RNAseq data 
-mock_data <- mock_rnaseq(input_var_list, 
-                         n_genes = N_GENES,
-                         min_replicates  = MIN_REPLICATES,
-                         max_replicates = MAX_REPLICATES,
-                         basal_expression = beta0_pub,
-                         sequencing_depth = SEQ_DEPTH,
-                         dispersion = dispersion_pub,
-                         normal_distr = 'univariate' ) ## for fix effect
-```
-
-## Fit and evaluate a model on simulated RNAseq data
-
-### Fit model
-
-We fit a model to the simulated RNAseq data using the `fitModelParallel()` function. Our model includes terms for genotype, environment, and the interaction between genotype and environment.
-
-
-```{r fit_model, warning = FALSE, message = FALSE }
-
-## -- prepare data & fit a model with mixed effect
-data2fit = prepareData2fit(countMatrix = mock_data$counts, 
-                           metadata =  mock_data$metadata, 
-                           normalization = 'MRN',   
-                           response_name = "kij",
-                           transform = "x+1", ## apply + 1 to each counts
-                           row_threshold = 10) ## remove genes i with all(k_ij) < 10
-
-l_tmb <- fitModelParallel(
-          formula = kij ~ genotype + environment + genotype:environment,
-          data = data2fit, 
-          group_by = "geneID",
-          family = glmmTMB::nbinom2(link = "log"), 
-          n.cores = 1, 
-          control = glmmTMB::glmmTMBControl(optCtrl=list(iter.max=1e5, eval.max=1e5)))
-
-```
-
-### Evaluate fit
-
-Next, we evaluate the fitting results obtained from the simulated RNAseq data by comparing the expected results computed by `evaluation_report()`, with those obtained with  `fitModelParallel()` : `l_tmb`. 
-
-```{r evaluation, warning = FALSE, message = FALSE}
-
-## -- eval params for Wald test
-ln_FC_threshold <- 0.67
-alternative_hypothesis <- "greaterAbs"
-
-## -- remove genes with very low AIC
-l_genes <- identifyTopFit(l_tmb)
-
-## -- evaluation
-resSimu <- evaluation_report( list_tmb = l_tmb,
-                              list_gene = l_genes,
-                              mock_obj = mock_data,
-                              coeff_threshold = ln_FC_threshold, 
-                              alt_hypothesis = alternative_hypothesis )
-```
-
-## Explore evaluation results
-
-### Identity plot
-
-The `evaluation_report()` function produces an identity plot, offering a visual tool to juxtapose the simulated effects (actual effects) with the model-inferred effects. This graphical representation simplifies assessing how well the model aligns with the values of the simulated effects, enabling a visual analysis of the model's goodness of fit to the simulated data. For a more quantitative evaluation of inference quality, the R-squared (R2) and RMSE metrics can be employed.
-
-### Model parameters
-
-```{r id_params , warning = FALSE, message = FALSE, fig.align='center', fig.height=4, fig.width=7}
-## -- Model params
-resSimu$identity$params
-```
-
-### Dispersion parameter 
-
-The dispersion plot, generated by the `evaluation_report()` function, offers a visual comparison of the dispersion parameters used in the simulation \(dispersion_i\) with those estimated by the model. This graphical representation provides an intuitive way to assess the alignment between the simulated dispersion values and the model-inferred values, enabling a visual evaluation of how well the model captures the underlying data characteristics.
-
-```{r id_disp, warning = FALSE, message = FALSE, fig.align='center', fig.height=4, fig.width=5}
-## -- Dispersion params
-resSimu$identity$dispersion
-```
-
-### ROC curve
-
-The Receiver Operating Characteristic (ROC) curve is a valuable tool for assessing the performance of classification models, particularly in the context of identifying differentially expressed genes. It provides a graphical representation of the model's ability to distinguish between genes that are differentially expressed and those that are not, by varying the `coeff_threshold` and the `alt_hypothesis` parameters.
-
-
-```{r roc, warning = FALSE, message = FALSE, fig.align='center', fig.height=4, fig.width=7}
-## -- ROC curve
-resSimu$roc$params
-```
-
-
-### Precision-recall curve
-
-The precision-recall curve (PR curve) illustrates the relationship between precision and recall at various classification thresholds. It is particularly valuable in the context of imbalanced data, where one class is significantly more prevalent than the other. Unlike the ROC curve, which can be influenced by class distribution, the PR curve focuses on the model's ability to correctly identify examples of the minority class while maintaining high precision.
-
-```{r pr_curve, warning = FALSE, message = FALSE, fig.align='center', fig.height=4, fig.width=7}
-## -- precision recall by params
-resSimu$precision_recall$params
-```
-
-### AUC and classification performance metrics 
-
-The area under the ROC curve (AUC) provides a single metric that summarizes the model's overall performance in distinguishing between differentially expressed and non-differentially expressed genes. A higher AUC indicates better model performance.  In the case of the ROC curve, this AUC should be compared to the value of 0.5, representing a random classifier. On the other hand, for the PR curve, we compute the pr_AUC_random, serving as a baseline for comparison. 
-In addition to evaluating model performance through the AUC for both ROC and PR curves, we provide access to key classification metrics, including Accuracy, Precision, Recall (or Sensitivity), and Specificity. These metrics offer a comprehensive view of the model's classification capabilities. These metrics are computed for each parameter (excluding the intercept when skip_eval_intercept = TRUE), providing detailed insights into individual parameter contributions. Furthermore, an aggregated assessment is performed, considering all parameters (except the intercept by default), offering a perspective on the model's overall classification effectiveness.
-
-```{r perf, warning = FALSE, message = FALSE, fig.align='center', fig.height=4, fig.width=7}
-## -- precision-recall curve
-resSimu$performances$byparams
-```
-
-## Mimic the public RNAseq data with HTRfit - increasing GxE interaction 
-
-To initialized GxE interaction to 0, set `sd = 0`.
-
-```{r design_init_+GxE, warning = FALSE, message = FALSE}
-## -- design to simulate
-N_GENES <- 100 
-MIN_REPLICATES <- 5
-MAX_REPLICATES <- 5
-SEQ_DEPTH <- 5e6
-
-## -- init effects to simulate
-input_var_list <- init_variable( name = "genotype", mu = 0, sd = 2.18, level = 2) %>%
-                  init_variable( name = "environment", mu = 0, sd = 0.57, level = 4 ) %>%
-                  add_interaction(between_var = c("genotype", "environment"), mu = 0 , sd = 3)
-
-## -- simulate RNAseq data 
-mock_data <- mock_rnaseq(input_var_list, 
-                         n_genes = N_GENES,
-                         min_replicates  = MIN_REPLICATES,
-                         max_replicates = MAX_REPLICATES,
-                         basal_expression = beta0_pub,
-                         sequencing_depth = SEQ_DEPTH,
-                         dispersion = dispersion_pub,
-                         normal_distr = 'univariate' ) ## for fix effect
-```
-
-### Fit and evaluate a model on simulated RNAseq data
-
-We fit a model to the simulated RNAseq data using the `fitModelParallel()` function. Our model includes terms for genotype, environment, and the interaction between genotype and environment (GxE).
-
-```{r fit_model_+GxE, warning = FALSE, message = FALSE }
-
-## -- prepare data & fit a model with mixed effect
-data2fit = prepareData2fit(countMatrix = mock_data$counts, 
-                           metadata =  mock_data$metadata, 
-                           normalization = 'MRN',   
-                           response_name = "kij",
-                           transform = "x+1", ## apply + 1 to each counts
-                           row_threshold = 10) ## remove genes i with all(k_ij) < 10
-l_tmb <- fitModelParallel(
-          formula = kij ~ genotype + environment + genotype:environment,
-          data = data2fit, 
-          group_by = "geneID",
-          family = glmmTMB::nbinom2(link = "log"), 
-          n.cores = 1, 
-          control = glmmTMB::glmmTMBControl(optCtrl=list(iter.max=1e5, eval.max=1e5)))
-```
-
-### Evaluate fit
-
-Next, we evaluate the fitting results obtained from the simulated RNAseq data by comparing the expected results computed by `evaluation_report()`, with those obtained with  `fitModelParallel()` : `l_tmb`. 
-
-```{r evaluation_+GxE, warning = FALSE, message = FALSE}
-
-## -- eval params for Wald test
-ln_FC_threshold <- 0.67
-alternative_hypothesis <- "greaterAbs"
-
-## -- remove genes with very low AIC
-l_genes <- identifyTopFit(l_tmb)
-
-## -- evaluation
-resSimu <- evaluation_report( list_tmb = l_tmb,
-                              list_gene = l_genes,
-                              mock_obj = mock_data,
-                              coeff_threshold = ln_FC_threshold, 
-                              alt_hypothesis = alternative_hypothesis)
-```
-
-The identity plot allows direct appreciation of the absence of GxE interaction.
-
-```{r id_params_+GxE , warning = FALSE, message = FALSE, fig.align='center', fig.height=4, fig.width=7}
-## -- Model params
-resSimu$identity$params
-```
-
-
-## Mimic the public RNAseq data with HTRfit - without GxE interaction 
-
-To initialized GxE interaction to 0, set `sd = 0`.
-
-```{r design_init_0GxE, warning = FALSE, message = FALSE}
-## -- design to simulate
-N_GENES <- 100 
-MIN_REPLICATES <- 5
-MAX_REPLICATES <- 5
-SEQ_DEPTH <- 5e6
-
-## -- init effects to simulate
-input_var_list <- init_variable( name = "genotype", mu = 0, sd = 2.18, level = 2) %>%
-                  init_variable( name = "environment", mu = 0, sd = 0.57, level = 4 ) %>%
-                  add_interaction(between_var = c("genotype", "environment"), mu = 0 , sd = 0) ## sd = 0
-
-## -- simulate RNAseq data 
-mock_data <- mock_rnaseq(input_var_list, 
-                         n_genes = N_GENES,
-                         min_replicates  = MIN_REPLICATES,
-                         max_replicates = MAX_REPLICATES,
-                         basal_expression = beta0_pub,
-                         sequencing_depth = SEQ_DEPTH,
-                         dispersion = dispersion_pub,
-                         normal_distr = 'univariate' ) ## for fix effect
-```
-
-### Fit and evaluate a model on simulated RNAseq data
-
-We fit a model to the simulated RNAseq data using the `fitModelParallel()` function. Our model includes terms for genotype, environment, and the interaction between genotype and environment.
-
-```{r fit_model_0GxE, warning = FALSE, message = FALSE }
-
-## -- prepare data & fit a model with mixed effect
-data2fit = prepareData2fit(countMatrix = mock_data$counts, 
-                           metadata =  mock_data$metadata, 
-                           normalization = 'MRN',   
-                           response_name = "kij",
-                           transform = "x+1", ## apply + 1 to each counts
-                           row_threshold = 10) ## remove genes i with all(k_ij) < 10
-l_tmb <- fitModelParallel(
-          formula = kij ~ genotype + environment + genotype:environment,
-          data = data2fit, 
-          group_by = "geneID",
-          family = glmmTMB::nbinom2(link = "log"), 
-          n.cores = 1, 
-          control = glmmTMB::glmmTMBControl(optCtrl=list(iter.max=1e5, eval.max=1e5)))
-```
-
-Next, we evaluate the fitting results obtained from the simulated RNAseq data by comparing the expected results computed by `evaluation_report()`, with those obtained with  `fitModelParallel()` : `l_tmb`. 
-
-```{r evaluation_0GxE, warning = FALSE, message = FALSE}
-
-## -- eval params for Wald test
-ln_FC_threshold <- 0.67
-alternative_hypothesis <- "greaterAbs"
-
-## -- remove genes with very low AIC
-l_genes <- identifyTopFit(l_tmb)
-
-## -- evaluation
-resSimu <- evaluation_report( list_tmb = l_tmb,
-                              list_gene = l_genes,
-                              mock_obj = mock_data,
-                              coeff_threshold = ln_FC_threshold, 
-                              alt_hypothesis = alternative_hypothesis)
-```
-
-The identity plot allows direct appreciation of the absence of GxE interaction.
-
-```{r id_params_0GxE , warning = FALSE, message = FALSE, fig.align='center', fig.height=4, fig.width=7}
-## -- Model params
-resSimu$identity$params + 
-  ggplot2::theme(axis.text.x = ggplot2::element_text(size = 7, angle = 45, 
-                                                      vjust = 0.9, hjust=1))
-```
\ No newline at end of file
--- a/vignettes/03-rnaseq_analysis.Rmd
+++ b/vignettes/03-rnaseq_analysis.Rmd
---
-title: "RNAseq analysis"
-output: rmarkdown::html_vignette
-vignette: >
-  %\VignetteIndexEntry{RNAseq analysis}
-  %\VignetteEngine{knitr::rmarkdown}
-  %\VignetteEncoding{UTF-8}
---
-
-```{r, include = FALSE}
-knitr::opts_chunk$set(
-  collapse = TRUE,
-  comment = "#>"
-)
-```
-
-```{r setup}
-library(HTRfit)
-```
-
-
-
-In RNAseq, we employ Generalized Linear Models (GLM) to unravel how genes respond to various experimental conditions. These models assist in deciphering the specific impacts of experimental variables on gene expression.HTRfit can be utilized to analyze such RNAseq data, providing a robust framework for exploring and interpreting the intricate relationships between genes and experimental conditions.
-
-## Input data 
-
-HTRfit analysis necessitates a count matrix and sample metadata, in the form of dataframes.
-Notice that gene_id have to be specified as rownames of `count_matrix`.
-
-```{r create_data , include = FALSE}
-## -- hided in vignette
-## -- simulate small example to prevent excessively lengthy vignette construction
-list_var <- init_variable( name = "genotype", mu = 3, sd = 0.2, level = 2) %>%
-            init_variable( name = "environment", mu = 2, sd = 0.43, level = 2) %>%
-            add_interaction( between_var = c("genotype", "environment"), mu = 0.44, sd = 0.2)
-N_GENES = 30 
-MIN_REPLICATES = 4
-MAX_REPLICATES = 4
-BASAL_EXPR = 3
-mock_data <- mock_rnaseq(list_var, N_GENES,
-                         min_replicates  = MIN_REPLICATES,
-                         max_replicates = MAX_REPLICATES,
-                         basal_expression = BASAL_EXPR)
-########################
-
-## -- data from simulation or real data
-count_matrix <- mock_data$counts
-metaData <- mock_data$metadata
-##############################
-```
-
-
-```{r display_input }
-## -- gene count matrix
-count_matrix[1:4, 1:2]
-## -- samples metadata
-head(metaData)
-```
-
-
-## Prepare data for fitting
-
-The `prepareData2fit()` function serves the purpose of converting the counts matrix and sample metadata into a dataframe that is compatible with downstream **HTRfit** functions designed for model fitting. This function also includes an option to perform median ratio normalization and transformation on the data counts.
-
-
-```{r example-prepareData, warning = FALSE, message = FALSE}
-## -- convert counts matrix and samples metadatas in a data frame for fitting
-data2fit = prepareData2fit(
-             countMatrix = count_matrix, 
-             metadata =  metaData, 
-             normalization = NULL,
-             response_name = "kij")
-
-
-## -- median ratio normalization
-data2fit = prepareData2fit(
-             countMatrix = count_matrix, 
-             metadata =  metaData, 
-             normalization = 'MRN', 
-             response_name = "kij")
-
-## -- apply + 1 transformation on each counts
-data2fit = prepareData2fit(
-             countMatrix = count_matrix, 
-             metadata =  metaData, 
-             normalization = 'MRN', 
-             transform = "x+1",
-             response_name = "kij")
-```
-
-## Fit model from your data
-
-The `fitModelParallel()` function enables independent model fitting for each gene. The number of threads used for this process can be controlled by the `n.cores` parameter. 
-
-
-```{r example-fitModelParallel, warning = FALSE, message = FALSE}
-l_tmb <- fitModelParallel(
-          formula = kij ~ genotype + environment  + genotype:environment,
-          data = data2fit, 
-          group_by = "geneID",
-          family = glmmTMB::nbinom2(link = "log"), 
-          n.cores = 1)
-```
-
-## Use mixed effect in your model
-
-HTRfit uses the **glmmTMB** functions for model fitting algorithms. This choice allows for the utilization of random effects within your formula design. For further details on how to specify your model, please refer to the [mixed model documentation](https://rdrr.io/cran/glmmTMB/man/glmmTMBControl.html).
-
-```{r example-fitModelParallel_mixed, warning = FALSE, message = FALSE}
-l_tmb <- fitModelParallel(
-          formula = kij ~ genotype + ( 1 | environment ),
-          data = data2fit, 
-          group_by = "geneID",
-          family = glmmTMB::nbinom2(link = "log"), 
-          n.cores = 1)
-```
-
-## Additional settings
-
-The function provides precise control over model settings for fitting optimization, including options for specifying the [model family](https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/family) and [model control setting](https://rdrr.io/cran/glmmTMB/man/glmmTMBControl.html). By default, a Gaussian family model is fitted, but for RNA-seq data, it is highly recommended to specify `family = glmmTMB::nbinom2(link = "log")`.
-
-
-
-```{r example-fitModelParallel_addSet, warning = FALSE, message = FALSE}
-l_tmb <- fitModelParallel(
-          formula = kij ~ genotype + environment  + genotype:environment,
-          data = data2fit, 
-          group_by = "geneID",
-          n.cores = 1, 
-          family = glmmTMB::nbinom2(link = "log"),
-          control = glmmTMB::glmmTMBControl(optCtrl=list(iter.max=1e5,
-                                                         eval.max=1e5)))
-```
-
-## Extracts a tidy result table from a list tmb object
-
-The tidy_results function extracts a data frame containing estimates of ln(fold changes), standard errors, test statistics, p-values, and adjusted p-values for fixed effects. Additionally, it provides access to correlation terms and standard deviations for random effects, offering a detailed view of HTRfit modeling results.
-
-
-```{r example-tidyRes, warning = FALSE, message = FALSE}
-## -- get tidy results
-my_tidy_res <- tidy_results(l_tmb, coeff_threshold = 0.1, 
-                            alternative_hypothesis = "greaterAbs")
-## -- head
-my_tidy_res[1:3,]
-```
-
-## Update fit
-
-The `updateParallel()` function updates and re-fits a model for each gene. It offers options similar to those in `fitModelParallel()`. In addition, it is possible to modify the reference level of the categorical variable used in your model in order to use different contrast.
-
-
-```{r example-update, warning = FALSE, message = FALSE}
-## -- update your fit modifying the model family
-l_tmb <- updateParallel(
-          formula =  kij ~ genotype + environment  + genotype:environment,
-          list_tmb = l_tmb ,
-          family = gaussian(), 
-          n.cores = 1)
-
-## -- update fit using additional model control settings
-l_tmb <- updateParallel(
-          formula =  kij ~ genotype + environment  + genotype:environment ,
-          list_tmb = l_tmb ,
-          family = gaussian(), 
-          n.cores = 1,
-          control = glmmTMB::glmmTMBControl(optCtrl=list(iter.max=1e3,
-                                                         eval.max=1e3)))
-
-
-## -- update your model formula and your family model
-l_tmb <- updateParallel(
-            formula =   kij ~ genotype + environment  ,
-            list_tmb = l_tmb ,
-            family = glmmTMB::nbinom2(link = "log"), 
-            n.cores = 1)
-
-## -- modif reference levels 
-## -- genotype reference = "genotype2"
-## -- environment reference = "environment2"
-l_tmb <- updateParallel(
-            formula =   kij ~ genotype + environment  ,
-            list_tmb = l_tmb ,
-            family = glmmTMB::nbinom2(link = "log"), 
-            n.cores = 1, 
-            reference_labels = c("genotype2", "environment2")) 
-```
-
-#### Struture of list tmb object
-
-```{r example-str_obj_l_tmb, warning = FALSE, message = FALSE}
-str(l_tmb$gene1, max.level = 1)
-```
-
-## Plot fit metrics
-
-Visualizing fit metrics is essential for evaluating your models. Here, we show you how to generate various plots to assess the quality of your models. You can explore all metrics or focus on specific aspects like dispersion and log-likelihood.
-
-
-```{r example-plotMetrics, warning = FALSE, message = FALSE, fig.align = 'center', fig.height = 4, fig.width = 8}
-## -- plot all metrics
-diagnostic_plot(list_tmb = l_tmb)
-```
-
-```{r example-plotMetricsFocus, warning = FALSE, message = FALSE, fig.align = 'center', fig.height = 3, fig.width = 8}
-## -- Focus on metrics
-diagnostic_plot(list_tmb = l_tmb, focus = c("dispersion", "logLik"))
-```
-
-## Select best fit based on diagnostic metrics
-
-The identifyTopFit function allows for selecting genes that best fit models, based on various metrics such as AIC, BIC, LogLik, deviance, dispersion . This function provides the flexibility to employ multiple filtering methods to identify genes most relevant for further analysis. We implemented a filtering method based on the Median Absolute Deviation (MAD). For example, using the MAD method with a tolerance threshold of 3, the function identifies genes whose metric values are greater ("top") or lower ("low") than `median(metric) - 3 * MAD(metric)`. The keep option allows choosing whether to retain genes with metric values above or below the MAD threshold. 
-
-```{r example-identifytopfit, warning = FALSE, message = FALSE}
-identifyTopFit(list_tmb = l_tmb, metric = "AIC", 
-               filter_method = "mad", keep = "top", 
-               sort = TRUE, decreasing = TRUE, 
-               mad_tolerance = 3)
-```
-
-
-## Anova to select the best model
-
-Utilizing the `anovaParallel()` function enables you to perform model selection by assessing the significance of the fixed effects. You can also include additional parameters like type. For more details, refer to [car::Anova](https://rdrr.io/cran/car/man/Anova.html).
-
-
-```{r example-anova, warning = FALSE, message = FALSE}
-## -- update your fit modifying the model family
-l_anova <- anovaParallel(list_tmb = l_tmb)
-
-## -- additional settings
-l_anova <- anovaParallel(list_tmb = l_tmb, type = "III" )
-
-```
--- a/vignettes/04-htrfit_vs_deseq2.Rmd
+++ b/vignettes/04-htrfit_vs_deseq2.Rmd
---
-title: "Benchmarking HTRfit and DESeq2"
-output: rmarkdown::html_vignette
-vignette: >
-  %\VignetteIndexEntry{Benchmarking HTRfit and DESeq2}
-  %\VignetteEngine{knitr::rmarkdown}
-  %\VignetteEncoding{UTF-8}
---
-
-
-```{r, include = FALSE}
-knitr::opts_chunk$set(
-  collapse = TRUE,
-  comment = "#>"
-)
-```
-
-
-```{r setup, warning = FALSE, message = FALSE, results='hide'}
-library(HTRfit)
-library(DESeq2)
-```
-
-
-HTRfit offers a wrapper for DESeq2 outputs. This functionality allows users to seamlessly integrate the results obtained from DESeq2 into the HTRfit evaluation pipeline. By doing so, you can readily compare the performance of HTRfit with DESeq2 on your RNAseq data. This comparative analysis aids in determining which tool performs better for your specific research goals and dataset
-
-
-## Simulation
-
-
-```{r design_init, warning = FALSE, message = FALSE}
-## -- init a design 
-list_var <- init_variable( name = "genotype", mu = 0, sd = 0.29, level = 2) %>%
-            init_variable( name = "environment", mu = 0.27, sd = 0.6, level = 4) %>%
-            add_interaction( between_var = c("genotype", "environment"), mu = 0.44, sd = 0.89)
-              
-
-N_GENES <- 100
-MIN_REPLICATES <- 4
-MAX_REPLICATES <- 4
-SEQ_DEPTH <- 5e6
-
-## -- simulate RNAseq data 
-mock_data <- mock_rnaseq(list_var, 
-                         n_genes = N_GENES,
-                         min_replicates  = MIN_REPLICATES,
-                         max_replicates = MAX_REPLICATES,
-                         basal_expression = 2,
-                         sequencing_depth = SEQ_DEPTH)
-```
-
-
-
-## Fit models
-
-```{r data2fit,  warning = FALSE, message = FALSE, results = 'hide'}
-## -- data from simulation or real data
-count_matrix <- mock_data$counts
-metaData <- mock_data$metadata
-```
-
-
-#### HTRfit
-```{r prepareData_and_fit, warning = FALSE, message = FALSE}
-## -- convert counts matrix and samples metadatas in a data frame for fitting
-data2fit = prepareData2fit(countMatrix = count_matrix, 
-                           metadata =  metaData, 
-                           normalization = 'MRN',
-                           response_name = "kij")
-
-l_tmb <- fitModelParallel(
-          formula = kij ~ genotype + environment  + genotype:environment,
-          data = data2fit, 
-          group_by = "geneID",
-          family = glmmTMB::nbinom2(link = "log"), 
-          n.cores = 1)
-```
-
-
-#### DESeq2
-
-```{r fit_dds,  warning = FALSE, message = FALSE, results = 'hide'}
-## -- DESeq2
-dds <- DESeq2::DESeqDataSetFromMatrix(
-  countData = count_matrix,
-  colData = metaData,
-  design = ~ genotype + environment  + genotype:environment )
-dds <- DESeq2::DESeq(dds, quiet = TRUE)
-```
-
-
-## Evaluation
-
-```{r example-ddsComparison, warning = FALSE, message = FALSE}
-## -- get simulation/fit evaluation
-resSimu <- evaluation_report(list_tmb = l_tmb,
-                             dds = dds,
-                             mock_obj = mock_data,
-                             coeff_threshold = 0.4,
-                             alt_hypothesis = "greaterAbs")
-```
-
-
-
-```{r example-outputResSimu_id, warning = FALSE, message = FALSE, fig.align = 'center', fig.height = 4, fig.width = 5}
-## -- Model params
-resSimu$identity$params
-## -- dispersion 
-resSimu$identity$dispersion
-```
-
-```{r example-outputResSimu_metric, warning = FALSE, message = FALSE, fig.align = 'center', fig.height = 4, fig.width = 7}
-## -- precision-recall curve
-resSimu$precision_recall$params
-## -- ROC curve
-resSimu$roc$params
-## -- Performances metrics
-resSimu$performances
-```
--- a/vignettes/05-list_var_object.Rmd
+++ b/vignettes/05-list_var_object.Rmd
---
-title: "Object - list var"
-output: rmarkdown::html_vignette
-vignette: >
-  %\VignetteIndexEntry{Object - list var}
-  %\VignetteEngine{knitr::rmarkdown}
-  %\VignetteEncoding{UTF-8}
---
-
-```{r, include = FALSE}
-knitr::opts_chunk$set(
-  collapse = TRUE,
-  comment = "#>"
-)
-```
-
-
-```{r setup}
-library(HTRfit)
-```
-
-The `init_variable()` function is used for defining the variables in your experimental design. Sizes of effects for each variable and interaction term can be defined in two different ways: 1) The user can manually sets values of all levels of a variable, in which case the effects are necessarily considered fixed in the model; 2) The effects can be randomly picked in a normal distribution with mean and standard deviation defined by the user, in which case the user can decide whether the effects are considered fixed or random in the model.
-
-It is crucial to highlight that the choice of these parameters plays a critical role in the simulation process. For optimal results, we recommend basing these decisions on real data, as outlined in the [Simulation tutorial](articles/02-tutorial.html)
-
-## Manually init a single variable
-
-The `init_variable()` function allows for precise control over the variables in your experimental design. 
-In this example, we manually initialize **varA** with specific size effects (mu) for three levels.
-
-```{r example-init_variable_man, warning = FALSE, message = FALSE}
-list_var <- init_variable( name = "varA", mu = c(0.2, 4, -3), level = 3)
-```
-
-## Randomly init a single variable
-
-Alternatively, you can randomly initialize **varA** by specifying a mean (mu) and standard deviation (sd). 
-This introduces variability into **varA**, making it either a fixed or random effect in your design.
-
-
-```{r example-init_variable_rand, warning = FALSE, message = FALSE}
-list_var <- init_variable( name = "varA", mu = 10, sd = 0.2, level = 5) 
-```
-
-## Randomly init several variables
-
-You can also initialize multiple variables, such as **varA** and **varB**, with random values. 
-This flexibility allows you to create diverse experimental designs.
-
-
-```{r example-init_variable_mult, warning = FALSE, message = FALSE}
-list_var <- init_variable( name = "varA", mu = 10, sd = 0.2, level = 5) %>%
-            init_variable( name = "varB", mu = -3, sd = 0.34, level = 2)
-```
-
-## Add interaction between variable
-
-Similarly to `init_variable()`, `add_interaction()` allows to define an interaction between variable.
-
-In this example, we initialize **varA** and **varB**, and create an interaction between **varA**, and **varB** using `add_interaction()`.
-
-
-
-```{r example-add_interaction, warning = FALSE, message = FALSE}
-list_var <- init_variable( name = "varA", mu = 3, sd = 0.2, level = 2) %>%
-            init_variable( name = "varB", mu = 2, sd = 0.43, level = 2) %>%
-            add_interaction( between_var = c("varA", "varB"), mu = 0.44, sd = 0.2)
-```
-
-## Complex design
-
-Interactions can involve a maximum of three variables, such as **varA**, **varB**, and **varC**.
-
-```{r example-add_interaction_complex, eval = FALSE, message = FALSE, warning = FALSE, include = TRUE}
-list_var <- init_variable( name = "varA", mu = 5, sd = 0.2, level = 2) %>%
-            init_variable( name = "varB", mu = 1, sd = 0.78, level = 2) %>%
-            init_variable( name = "varC", mu = c(2, 3), sd = NA, level = 2) %>%
-            add_interaction( between_var = c("varA", "varC"), mu = 0.44, sd = 0.2) %>%
-            add_interaction( between_var = c("varA", "varB"), mu = 0.43, sd = 0.37) %>%
-            add_interaction( between_var = c("varB", "varC"), mu = -0.33, sd = 0.12) %>%
-            add_interaction( between_var = c("varA", "varB" ,"varC"), mu = 0.87, sd = 0.18)
-```
-
-
-## Control correlation between effects
-
-HTRfit offers a convenient way to control the correlation between variables during simulation. 
-The following code demonstrates how variables, initialized with <ins> random values using mu and sd </ins>, can be further refined by setting the correlation between these variables. If the `set_correlation()` function is not used, the default correlation value is 0. 
-
-
-```{r example-set_corr, warning = FALSE, message = FALSE}
-list_var <- init_variable( name = "varA", mu = 3, sd = 0.2, level = 2) %>%
-            init_variable( name = "varB", mu = c(1,3), level = 2) %>%
-            add_interaction( between_var = c("varA", "varB"), mu = 0.44, sd = 0.2) %>%
-            set_correlation(between_var = c("varA", "varA:varB"), corr = -0.99)
-## -- simulate RNAseq data 
-mock_data <- mock_rnaseq(list_var, 
-                         n_genes = 100,
-                         min_replicates  = 4,
-                         max_replicates = 4)
-```
-
-Observe the  -0.99 correlation in the `mock_obj`: 
-
-```{r simu_corr_var}
-plot(mock_data$groundTruth$effects$varA, 
-     mock_data$groundTruth$effects$`varA:varB`, 
-     xlab = "varA", ylab= "varA:varB")
-```
-
-
-## Structure of list_var object 
-
-The list_var object collected all the information needed to generate a [`mock_rnaseq` object](06-mock_rnaseq_object.html).
-
-```{r example-str_obj_init, warning = FALSE, message = FALSE}
-str(list_var)
-```
--- a/vignettes/06-mock_rnaseq_object.Rmd
+++ b/vignettes/06-mock_rnaseq_object.Rmd
---
-title: "Object - Mock rnaseq"
-output: rmarkdown::html_vignette
-vignette: >
-  %\VignetteIndexEntry{Object - Mock rnaseq }
-  %\VignetteEngine{knitr::rmarkdown}
-  %\VignetteEncoding{UTF-8}
---
-
-```{r, include = FALSE}
-knitr::opts_chunk$set(
-  collapse = TRUE,
-  comment = "#>"
-)
-```
-
-
-
-```{r setup}
-library(HTRfit)
-```
-
-In this section, you will explore how to generate RNAseq data based on the [`list var`](05-list_var_object.html) object. The `mock_rnaseq()` function enables you to manage parameters in your RNAseq design, including number of genes, minimum and maximum number of replicates within your experimental setup, sequencing depth, basal expression of each gene, and dispersion of gene expression used for simulating counts.
-
-## Minimal example
-
-```{r example-mock_rnaseq_min, warning = FALSE, message = FALSE}
-list_var <- init_variable( name = "varA", mu = 3, sd = 0.2, level = 2) %>%
-            init_variable( name = "varB", mu = 2, sd = 0.43, level = 2) %>%
-            add_interaction( between_var = c("varA", "varB"), mu = 0.44, sd = 0.2)
-
-## -- Required parameters
-N_GENES = 6000
-MIN_REPLICATES = 2
-MAX_REPLICATES = 10
-########################
-
-## -- simulate RNAseq data based on list_var, minimum input required
-## -- number of replicate randomly defined between MIN_REP and MAX_REP
-mock_data <- mock_rnaseq(list_var, N_GENES,
-                         min_replicates  = MIN_REPLICATES,
-                         max_replicates = MAX_REPLICATES)
-
-## -- simulate RNAseq data based on list_var, minimum input required
-## -- Same number of replicates between conditions
-mock_data <- mock_rnaseq(list_var, N_GENES,
-                         min_replicates  = MAX_REPLICATES,
-                         max_replicates = MAX_REPLICATES)
-```
-
-## Scaling genes counts based on sequencing depth
-
-Sequencing depth is a critical parameter affecting the statistical power of an RNAseq analysis. With the `sequencing_depth` option in the `mock_rnaseq()` function, you have the ability to control this parameter.
-
-
-```{r example-mock_rnaseq_seqDepth, warning = FALSE, message = FALSE}
-## -- Required parameters
-N_GENES = 6000
-MIN_REPLICATES = 2
-MAX_REPLICATES = 10
-########################
-
-SEQ_DEPTH = c(100000, 5000000, 10000000)## -- Possible number of reads/sample
-SEQ_DEPTH =  10000000 ## -- all samples have same number of reads
-mock_data <- mock_rnaseq(list_var, N_GENES,
-                         min_replicates  = MIN_REPLICATES,
-                         max_replicates = MAX_REPLICATES,
-                         sequencing_depth = SEQ_DEPTH)
-```
-
-## Set dispersion of gene expression
-
-The dispersion parameter ($dispersion_i$), characterizes the relationship between the variance of the observed read counts and its mean value. In simple terms, it quantifies how much we expect observed counts to deviate from the mean value. You can specify the dispersion for individual genes using the dispersion parameter.
-
-
-
-```{r example-mock_rnaseq_disp, warning = FALSE, message = FALSE}
-
-## -- Required parameters
-N_GENES = 6000
-MIN_REPLICATES = 2
-MAX_REPLICATES = 4
-########################
-
-DISP = 0.1 ## -- Same dispersion for each genes
-DISP = 1000 ## -- Same dispersion for each genes
-DISP = runif(N_GENES, 0, 1000) ## -- Dispersion can vary between genes
-mock_data <- mock_rnaseq(list_var, N_GENES,
-                         min_replicates  = MIN_REPLICATES,
-                         max_replicates = MAX_REPLICATES,
-                         dispersion = DISP  )
-
-```
-
-## Set basal gene expression
-
-The basal gene expression parameter, defined using the basal_expression option, allows you to control the fundamental baseline of expression level of genes. When a single value is specified, the basal expression of all genes is the same and the effects of variables defined in list_var are applied from this basal expression level. When several values are specified, the basal expression of each gene is picked randomly among these values (with replacement). The effects of variables are then applied from the basal expression of each gene. This represents the fact that expression levels can vary among genes even when not considering the effects of the defined variables (for example, some genes are constitutively more highly expressed than others because they have stronger basal promoters).
-
-
-```{r example-mock_rnaseq_bexpr, warning = FALSE, message = FALSE}
-## -- Required parameters
-N_GENES = 6000
-MIN_REPLICATES = 10
-MAX_REPLICATES = 10
-########################
-
-BASAL_EXPR = -3 ## -- Value can be negative to simulate low expressed gene
-BASAL_EXPR = 2 ## -- Same basal gene expression for the N_GENES
-BASAL_EXPR = c( -3, -1, 2, 8, 9, 10 ) ## -- Basal expression can vary between genes
-mock_data <- mock_rnaseq(list_var, N_GENES,
-                         min_replicates  = MIN_REPLICATES,
-                         max_replicates = MAX_REPLICATES,
-                         basal_expression = BASAL_EXPR)
-
-```
-
-## Mock RNAseq object
-
-```{r example-str_obj_mock, warning = FALSE, message = FALSE}
-str(mock_data, max.level = 1)
-```
--- a/vignettes/07-rnaseq_notOnly.Rmd
+++ b/vignettes/07-rnaseq_notOnly.Rmd
---
-title: "Not only RNAseq"
-output: rmarkdown::html_vignette
-vignette: >
-  %\VignetteIndexEntry{Not only RNAseq}
-  %\VignetteEngine{knitr::rmarkdown}
-  %\VignetteEncoding{UTF-8}
---
-
-```{r, include = FALSE}
-knitr::opts_chunk$set(
-  collapse = TRUE,
-  comment = "#>"
-)
-```
-
-
-```{r setup}
-library(HTRfit)
-```
-
-In the realm of RNAseq analysis, it's widely acknowledged that counts follow a Negative Binomial distribution. Hence, we recommend employing `family = glmmTMB::nbinom2(link = "log")` in the parameters of fitModelParallel() to align with this distribution.
-In RNAseq it is well known that counts have a Negative binomial distribution that's why we recommand using `family = glmmTMB::nbinom2(link = "log")` in `fitModelParallel()` parameters.
-
-```{r distrib_rnaseq,  warning = FALSE, message = FALSE, fig.align='center', fig.height=4, fig.width=4 }
-pub_data_fn <- system.file("extdata", "rna_pub_data.tsv", package = "HTRfit") 
-pub_data <- read.table(file = pub_data_fn, header = TRUE)
-plot(density(unlist(log10(round(pub_data[,-1])))), main = "Density of log RNAseq counts ")
-```
-
-While HTRfit is optimized for RNA-seq data, it's worth noting that the model family can be customized to accommodate other data types. In a different context, the model family can be adapted according to the nature of your response data. 
-To illustrate, consider an attempt to model Iris sepal length based on sepal width using the iris dataframe. Evaluating the distribution shape of the variable to be modeled (Sepal.Width), we observe a normal distribution.
-
-
-```{r distrib_iris, warning = FALSE, message = FALSE, fig.align='center', fig.height=4, fig.width=5}
-data("iris")
-plot(density(unlist(iris$Sepal.Width)))
-```
-
-Given this observation, we opt to fit a model for each species using a Gaussian family model (default). Details about model family [here](https://search.r-project.org/CRAN/refmans/glmmTMB/html/nbinom2.html).
-
-
-```{r example-fitModelParallel_nonRNA, warning = FALSE, message = FALSE}
-l_tmb_iris <- fitModelParallel(
-                formula =  Sepal.Width ~ Sepal.Length ,
-                data = iris,
-                group_by = "Species",
-                family = stats::gaussian(),
-                n.cores = 1)
-```
-
--- a/vignettes/08-evaluationMetrics.Rmd
+++ b/vignettes/08-evaluationMetrics.Rmd
---
-title: "About evaluation metrics"
-output: rmarkdown::html_vignette
-vignette: >
-  %\VignetteIndexEntry{About evaluation metrics}
-  %\VignetteEngine{knitr::rmarkdown}
-  %\VignetteEncoding{UTF-8}
---
-
-```{r, include = FALSE}
-knitr::opts_chunk$set(
-  collapse = TRUE,
-  comment = "#>"
-)
-```
-
-
-## Root Mean Square Error (RMSE)
-
-RMSE measures the average deviation between predicted values and actual values. It is calculated as follows:
-
-\[ \text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2} \]
-
-## R-squared (R²)
-
-R² measures the proportion of the variance in the dependent variable that is predictable from the independent variable(s). It is calculated as follows:
-
-\[ R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2} \]
-
-## Precision-Recall Area Under Curve (PR-AUC)
-
-PR-AUC measures the area under the precision-recall curve. It is often used in binary classification tasks where the class distribution is imbalanced.
-
-## Performance Ratio
-
-performance ratio is calculated as the ratio of PR-AUC to the PR-AUC of a random classifier (`pr_randm_AUC`). 
-It provides a measure of how well the model performs compared to a random baseline.
-
-\[ performanceRatio =  \frac{PR_{AUC}}{PR_{randomAUC}} \]
-
-
-## Receiver Operating Characteristic Area Under Curve (ROC AUC)
-
-ROC AUC measures the area under the receiver operating characteristic curve. It evaluates the classifier's ability to distinguish between classes.
-
-## Confusion Matrix
-
-|                 | Predicted Negative | Predicted Positive |
-|:---------------:|:------------------:|:------------------:|
-| Actual Negative | TN                 | FP                 |
-| Actual Positive | FN                 | TP                 |
-
-## Accuracy
-
-Accuracy measures the proportion of correct predictions out of the total predictions made by the model. It is calculated as follows:
-
-\[
-\text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}}
-\]
-
-## Specificity
-
-Specificity measures the proportion of true negatives out of all actual negatives. It is calculated as follows:
-
-\[
-\text{Specificity} = \frac{\text{TN}}{\text{TN} + \text{FP}}
-\]
-
-## Recall (Sensitivity)
-
-Recall, also known as sensitivity, measures the proportion of true positives out of all actual positives. It indicates the model's ability to correctly identify positive instances.
-
-\[ \text{Recall} = \text{Sensitivity} = \frac{\text{TP}}{\text{TP} + \text{FN}} \]
-
-## Precision
-
-Precision measures the proportion of true positives out of all predicted positives. It indicates the model's ability to avoid false positives.
-
-\[ \text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}} \]
-
-
--- a/vignettes/09-FAQ.Rmd
+++ b/vignettes/09-FAQ.Rmd
---
-title: "Frequently Asked Questions"
-output: rmarkdown::html_vignette
-vignette: >
-  %\VignetteIndexEntry{Frequently Asked Questions}
-  %\VignetteEngine{knitr::rmarkdown}
-  %\VignetteEncoding{UTF-8}
---
-
-```{r, include = FALSE}
-knitr::opts_chunk$set(
-  collapse = TRUE,
-  comment = "#>"
-)
-```
-
-
-## Why mu has no effect within simulation
-
-some element of response
-
-## Difference between norm_distrib = 'univariate' & = 'multivariate'
-
-some element of response
-
-
-## Why use `transform = 'x+1'` with `prepareData2Fit()`
-
-some element of response. Zero Nightmare
-
-
-## Why use `row_threshold = 10` with `prepareData2Fit()`
-
-some element of response