Type-I error control. The panels show empirical cumulative distribution functions (ECDFs) for P values from a comparison of one replicate from condition A of the fly RNA-Seq data with the other one. Because no genes are truly differentially expressed, the ECDF curves (blue) should remain on or below the diagonal (gray). Panel (a): the top row corresponds to DESeq, the middle row to edgeR and the bottom row to a Poisson-based χ² test. The right column shows the distributions for all genes; the left and middle columns show them separately for genes with a mean below and above 100. Panel (b) shows the same data, but zooms into the range of small P values. The plots indicate that edgeR and DESeq control type-I error at (and in fact slightly below) the nominal rate, while the Poisson-based χ² test fails to do so. edgeR has an excess of small P values for low counts: the blue line lies above the diagonal. This excess is, however, compensated by the method being more conservative for high counts. All methods show a point mass at p = 1; this is due to the discreteness of the data, whose effect is particularly evident at low counts.
Anders and Huber Genome Biology 2010 11:R106 doi:10.1186/gb-2010-11-10-r106
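
The check that the caption describes, an ECDF of null P values compared against the diagonal, can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used to produce the figure: the function names `ecdf` and `excess_over_diagonal` and the two synthetic P-value lists are assumptions introduced here, and no fly RNA-Seq data are involved.

```python
from bisect import bisect_right


def ecdf(pvalues):
    """Return a function F where F(alpha) is the fraction of P values <= alpha."""
    ordered = sorted(pvalues)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one P value")

    def F(alpha):
        # bisect_right counts how many sorted values are <= alpha
        return bisect_right(ordered, alpha) / n

    return F


def excess_over_diagonal(pvalues, alphas):
    """For each threshold alpha, return ECDF(alpha) - alpha.

    Under the null, a positive value means more small P values than the
    nominal rate allows (the curve lies above the diagonal); zero or a
    negative value means the type-I error is controlled at that threshold.
    """
    F = ecdf(pvalues)
    return {alpha: F(alpha) - alpha for alpha in alphas}


# Hypothetical null P values, for illustration only (not real data).
# An evenly spaced grid mimics a well-calibrated test.
uniform_like = [(i + 0.5) / 1000 for i in range(1000)]
# Squaring pushes values toward 0 and mimics an anticonservative test.
anticonservative = [p ** 2 for p in uniform_like]

thresholds = [0.01, 0.05, 0.1]
calibrated_excess = excess_over_diagonal(uniform_like, thresholds)
inflated_excess = excess_over_diagonal(anticonservative, thresholds)
```

In this sketch `calibrated_excess` stays at essentially zero for every threshold, whereas `inflated_excess` is positive, which corresponds to a blue curve rising above the gray diagonal. Restricting `thresholds` to small values plays the role of the zoomed view in panel (b).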