Count simulations & DESeq2 investigations

Purpose:

  1. Understand how DESeq2 works
  2. Understand how to maximize statistical power
  3. Refine the biological protocol (sequencing effort, ...)

About DESeq2

Main steps:

1) Estimate size factors

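The median-of-ratios method is used to estimate one size factor per sample: for each sample, the size factor is the median, over genes, of the ratio between the sample's count and that gene's geometric mean across all samples.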

The size factors are used to normalize the counts (per gene, per sample): each raw count is divided by the size factor of its sample. Normalized counts minimize the bias linked to library size, so that differential expression reflects the factors under study rather than sequencing depth. /!\ Gene length is not taken into account!
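The median-of-ratios computation can be sketched in Python with NumPy. This is a minimal illustration assuming a genes x samples matrix of raw counts, not DESeq2's own code (DESeq2 is an R package, where `estimateSizeFactors` performs this step); `size_factors_median_ratio` and `normalize` are hypothetical names.

```python
import numpy as np


def size_factors_median_ratio(counts):
    """Median-of-ratios size factors for a genes x samples matrix of raw counts."""
    counts = np.asarray(counts, dtype=float)
    # Genes with a zero in any sample have a geometric mean of 0: skip them.
    usable = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[usable])
    # Log of each gene's geometric mean across samples (the pseudo-reference sample).
    log_ref = log_counts.mean(axis=1, keepdims=True)
    # Size factor of sample j = median over genes of count_ij / reference_i (on the log scale).
    return np.exp(np.median(log_counts - log_ref, axis=0))


def normalize(counts, size_factors):
    """Normalized counts: each raw count divided by its sample's size factor."""
    return np.asarray(counts, dtype=float) / np.asarray(size_factors, dtype=float)
```

For example, if every gene in sample 2 has twice the counts of sample 1 and sample 3 has four times, the size factors come out as approximately 0.5, 1 and 2 (they are relative to the geometric-mean reference), and the normalized counts of each gene become equal across the three samples.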

2) Estimate dispersion

Purpose: Estimate the variability between replicates

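Get a dispersion estimate for each gene using maximum likelihood estimation
Fit a curve to the gene-wise dispersion estimates, as a function of the mean normalized count
Shrink the gene-wise estimates toward the fitted curve to obtain the final dispersion values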
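A simplified sketch of the gene-wise estimate and the curve fit, using NumPy and the standard library and assuming a single group of replicates (all samples of a gene share one mean, taken as the average normalized count). Two deliberate simplifications: the gene-wise maximum likelihood estimate is a plain grid search over alpha (DESeq2 maximizes a Cox-Reid adjusted profile likelihood), and the trend alpha(mean) = a0 + a1 / mean is fitted by ordinary least squares (DESeq2's parametric fit of this form uses a gamma-family GLM). The empirical Bayes shrinkage that DESeq2 applies afterwards is not shown. All function names are hypothetical.

```python
import math

import numpy as np

# Element-wise log-gamma built from the standard library only.
_lgamma = np.vectorize(math.lgamma)


def nb_loglik(counts, mu, alpha):
    """NB log-likelihood of one gene, dropping the lgamma(k + 1) term that does
    not depend on alpha. Variance model: Var = mu + alpha * mu**2."""
    r = 1.0 / alpha
    return float(np.sum(_lgamma(counts + r) - _lgamma(r)
                        + r * np.log(r / (r + mu))
                        + counts * np.log(mu / (r + mu))))


def genewise_dispersion(counts, size_factors, grid=None):
    """Grid-search MLE of alpha_i for each gene of a genes x samples count matrix.

    Assumption: no design matrix, one mean q_i per gene. Genes whose counts are
    all zero must be removed beforehand (their mean of 0 makes log(mu) undefined)."""
    if grid is None:
        grid = np.logspace(-4, 2, 301)        # candidate dispersions from 1e-4 to 1e2
    counts = np.asarray(counts, dtype=float)
    size_factors = np.asarray(size_factors, dtype=float)
    norm_mean = (counts / size_factors).mean(axis=1)
    alphas = np.empty(counts.shape[0])
    for i, k in enumerate(counts):
        mu = norm_mean[i] * size_factors      # mu_ij = s_j * q_i
        ll = [nb_loglik(k, mu, a) for a in grid]
        alphas[i] = grid[int(np.argmax(ll))]  # alpha with the highest likelihood
    return norm_mean, alphas


def fit_dispersion_trend(norm_mean, alphas):
    """Ordinary least-squares fit of alpha_trend(mean) = a0 + a1 / mean."""
    X = np.column_stack([np.ones_like(norm_mean), 1.0 / norm_mean])
    (a0, a1), *_ = np.linalg.lstsq(X, alphas, rcond=None)
    return a0, a1
```

With only a handful of replicates per gene the grid-search estimates are noisy, which is the reason the fitted trend (and the shrinkage toward it) matters in practice.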

3) Fit linear model

The differential expression analysis uses a generalized linear model of the form:
$$
\begin{aligned}
K_{ij} &\sim \mathrm{NB}(\mu_{ij}, \alpha_i) \\
\mu_{ij} &= s_j \, q_{ij} \\
\log_2(q_{ij}) &= x_{j\cdot} \, \beta_i
\end{aligned}
$$

where counts $K_{ij}$ for gene $i$, sample $j$ are modeled using a negative binomial distribution with fitted mean $\mu_{ij}$ and a gene-specific dispersion parameter $\alpha_i$. The fitted mean is composed of a sample-specific size factor $s_j$ and a parameter $q_{ij}$ proportional to the expected true concentration of fragments for sample $j$. $x_{j\cdot}$ is the row of the model matrix $X$ for sample $j$, and the coefficients $\beta_i$ give the log2 fold changes for gene $i$ for each column of $X$.
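For one gene with a known dispersion $\alpha_i$, the coefficients $\beta_i$ of this model can be estimated by iteratively reweighted least squares (IRLS). The sketch below is an illustration of the model under stated assumptions, not DESeq2's own routine: it uses NumPy, a model matrix `X` with one row per sample, natural-log coefficients internally (converted to log2 at the end, since $\log_2 q = \ln q / \ln 2$), and the hypothetical name `fit_nb_glm`.

```python
import numpy as np


def fit_nb_glm(counts, X, size_factors, alpha, n_iter=50, tol=1e-8):
    """IRLS fit of K_ij ~ NB(mu_ij, alpha), mu_ij = s_j * q_ij, log2(q_ij) = x_j . beta
    for ONE gene with a known dispersion alpha. Returns beta on the log2 scale."""
    y = np.asarray(counts, dtype=float)
    X = np.asarray(X, dtype=float)
    s = np.asarray(size_factors, dtype=float)
    offset = np.log(s)                              # ln(mu_ij) = ln(s_j) + x_j . b
    # Starting values: least squares on log normalized counts (+0.5 avoids log(0)).
    b, *_ = np.linalg.lstsq(X, np.log((y + 0.5) / s), rcond=None)
    for _ in range(n_iter):
        mu = np.exp(X @ b + offset)
        w = mu / (1.0 + alpha * mu)                 # (dmu/deta)^2 / Var(K) for the NB
        z = X @ b + (y - mu) / mu                   # working response, offset excluded
        XtW = X.T * w                               # X^T W without forming diag(w)
        b_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(b_new - b)) < tol
        b = b_new
        if converged:
            break
    return b / np.log(2.0)                          # natural-log -> log2 coefficients
```

For example, with an intercept column plus a 0/1 group column, the second coefficient is the estimated log2 fold change of group 1 versus group 0.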

4) Wald test

H0: log2(FC) = 0 (no difference in expression between the compared groups)

With DESeq2, the Wald test is the default test used when comparing two groups. The Wald test is usually performed on parameters that have been estimated by maximum likelihood, and it is a standard way to extract a p-value from a regression fit: the estimated log2 fold change is divided by its standard error, and the resulting statistic is compared to a standard normal distribution.
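The Wald test for one coefficient can be sketched as follows, assuming NumPy, fitted means `mu` and a dispersion `alpha` taken as given, and the negative binomial weights mu / (1 + alpha * mu) used in an IRLS fit. The standard error comes from the inverse of the Fisher information $X^\top W X$, and the two-sided p-value uses the standard normal tail via `math.erfc`. `wald_test` is a hypothetical helper, not a DESeq2 function.

```python
import math

import numpy as np


def wald_test(beta_log2, X, mu, alpha, coef):
    """Wald test of H0: beta[coef] = 0 for one gene.

    beta_log2: fitted coefficients on the log2 scale; X: samples x coefficients
    model matrix; mu: fitted means mu_ij; alpha: the gene's dispersion."""
    X = np.asarray(X, dtype=float)
    mu = np.asarray(mu, dtype=float)
    w = mu / (1.0 + alpha * mu)                          # NB weights, as in IRLS
    cov_ln = np.linalg.inv((X.T * w) @ X)                # covariance of natural-log coefficients
    se = math.sqrt(cov_ln[coef, coef]) / math.log(2.0)   # standard error on the log2 scale
    z = float(beta_log2[coef]) / se                      # Wald statistic
    p = math.erfc(abs(z) / math.sqrt(2.0))               # two-sided p-value, N(0, 1) reference
    return z, se, p
```

Because the standard error grows with the dispersion and shrinks with the counts and the number of replicates, the same log2 fold change can be significant or not depending on sequencing effort and replication, which links this step to the power questions listed in the purpose.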

HTRSIM