The median-of-ratios method is used to estimate a size factor for each sample.
The size factor is used to normalize the counts (per gene, per sample).
Normalized counts minimize the bias linked to library size.
By normalizing the counts, DESeq2 aims to ensure that differential expression reflects the factors under study and not sequencing depth.
/!\ Gene length is not taken into account!
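
The normalization step can be sketched in a few lines of plain Python. This is only an illustration of the median-of-ratios idea on toy data, not DESeq2's own code: the function name `size_factors` and the example counts are hypothetical, and genes with a zero count in any sample are simply skipped so that the geometric mean is defined.

```python
import math
from statistics import median

def size_factors(counts):
    """counts: list of genes, each a list of raw counts (one per sample)."""
    n_samples = len(counts[0])
    # Keep only genes with a nonzero count in every sample, so the
    # geometric mean (computed on the log scale) is defined.
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Log of the per-gene geometric mean across samples (pseudo-reference).
    log_ref = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log-ratio of each gene's count to its pseudo-reference in sample j.
        log_ratios = [math.log(row[j]) - ref for row, ref in zip(usable, log_ref)]
        # The size factor is the median ratio, taken back to the count scale.
        factors.append(math.exp(median(log_ratios)))
    return factors

# Toy data (hypothetical): sample 2 is sequenced twice as deeply as sample 1.
counts = [
    [10, 20],
    [100, 200],
    [50, 100],
    [0, 3],  # skipped when computing size factors (zero in sample 1)
]
sf = size_factors(counts)
# Normalized count = raw count divided by the size factor of its sample.
normalized = [[c / s for c, s in zip(row, sf)] for row in counts]
```

In this toy case the second size factor is twice the first, so the first three genes end up with identical normalized counts in both samples.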
2) Estimate dispersion
Purpose: estimate the variability between replicates.
Get a dispersion estimate for each gene using maximum likelihood estimation.
Fit a curve to the gene-wise dispersion estimates.
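
As a rough illustration of the gene-wise step only, the sketch below finds the dispersion that maximizes a Negative Binomial log-likelihood for a single gene by grid search. It assumes the parameterization variance = mean + alpha * mean², takes the expected mean as the size factor times the average normalized count, and uses hypothetical grid bounds and function names. It is not DESeq2's actual estimator, and it does not cover the curve fit.

```python
import math

def nb_log_likelihood(counts, means, alpha):
    """NB log-likelihood with variance = mean + alpha * mean**2."""
    r = 1.0 / alpha
    total = 0.0
    for k, mu in zip(counts, means):
        total += (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
                  + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))
    return total

def estimate_dispersion(counts, size_factors):
    """Grid-search maximum likelihood dispersion for one gene."""
    # Expected mean per sample: size factor times the mean normalized count.
    q = sum(k / s for k, s in zip(counts, size_factors)) / len(counts)
    means = [s * q for s in size_factors]
    # Candidate dispersions evenly spaced on the log10 scale, 1e-4 to 1e1
    # (hypothetical bounds chosen for illustration).
    grid = [10 ** (-4 + 5 * i / 200) for i in range(201)]
    return max(grid, key=lambda a: nb_log_likelihood(counts, means, a))

# Hypothetical replicates: one tightly reproducible gene, one noisy gene.
tight = estimate_dispersion([98, 102, 100, 101], [1.0, 1.0, 1.0, 1.0])
noisy = estimate_dispersion([20, 250, 60, 70], [1.0, 1.0, 1.0, 1.0])
```

The noisy gene gets a much larger dispersion estimate than the tight one, which is the replicate-to-replicate variability this step is meant to capture.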
3) Fit linear model
The differential expression analysis uses a generalized linear model of the form:
$K_{ij} \sim \mathrm{NB}(\mu_{ij}, \alpha_i)$<br/>
$\mu_{ij} = s_j q_{ij}$<br/>
$\log_2(q_{ij}) = x_{j.} \beta_i$<br/>
where counts $K_{ij}$ for gene i, sample j are modeled using a Negative Binomial distribution with
fitted mean $\mu_{ij}$ and a gene-specific dispersion parameter $\alpha_i$. The fitted mean is composed of a
sample-specific size factor $s_j$ and a parameter $q_{ij}$ proportional to the expected true concentration
of fragments for sample j. The coefficients $\beta_i$ give the log2 fold changes (logFC between conditions) for gene i for each
column of the model matrix X. The sample-specific size factors can be replaced by gene-specific
normalization factors for each sample using normalizationFactors.
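
To make the mean structure concrete, here is a small sketch that computes the fitted means for one gene from a model matrix, a coefficient vector, and size factors. The two-group design, the coefficient values, and the name `fitted_means` are assumptions made for illustration; the coefficients are not estimated here, only plugged in.

```python
def fitted_means(size_factors, design, beta):
    """mu_j = s_j * 2 ** (x_j . beta) for one gene."""
    means = []
    for s, x in zip(size_factors, design):
        # Linear predictor on the log2 scale: row of X times beta.
        log2_q = sum(xi * bi for xi, bi in zip(x, beta))
        means.append(s * 2 ** log2_q)
    return means

# Hypothetical two-group design: intercept column + treatment indicator.
design = [[1, 0], [1, 0], [1, 1], [1, 1]]
sample_size_factors = [1.0, 2.0, 1.0, 0.5]
beta = [5.0, 1.0]  # log2 baseline of 5 (q = 32), log2 fold change of 1

mu = fitted_means(sample_size_factors, design, beta)
# mu is [32.0, 64.0, 64.0, 32.0]
```

The second coefficient doubles q in the treated samples (log2 fold change of 1), while the size factors rescale each sample's expected count independently of the condition.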
4) Wald test, H0: log2(FC) = 0
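
The Wald statistic for this null hypothesis is the estimated log2 fold change divided by its standard error, compared to a standard normal distribution. The sketch below uses only Python's `math` module; the function name `wald_test` and the input values are hypothetical, and the standard error is assumed to come from the fitted model.

```python
import math

def wald_test(log2_fc, std_error):
    """Two-sided Wald test of H0: log2 fold change = 0."""
    z = log2_fc / std_error
    # Two-sided p-value under a standard normal: 2 * (1 - Phi(|z|)),
    # written with the complementary error function.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical estimate and standard error for one gene.
z, p = wald_test(1.5, 0.5)
```

Here the statistic is 3 standard errors away from zero, which gives a p-value of roughly 0.0027 before any multiple-testing correction.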
Experiments without replicates do not allow for estimation of the dispersion of counts around the
expected value for each group, which is critical for differential expression analysis. Analysis without replicates is no longer supported since DESeq2 v1.22.