Accounting for relevant component-specific covariates results in the optimal classification of background and enriched components for a simulated data set. (a, b) Density plots showing the distribution of background (blue shading) and enriched (black circles) simulated counts (y-axis) versus G/C content (x-axis). Window counts were simulated with either (a) a low proportion of high signal-to-noise sites or (b) a high proportion of low signal-to-noise sites. In this example, G/C content had a positive relationship with the background component and a negative relationship with the enriched component. (c, d) Receiver operating characteristic (ROC) curves for the performance of three different component-specific covariate model formulations: no covariates (model 1, red dashed line); G/C content modeling the background and zero-inflated components (model 2, green dashed line); and G/C content modeling the background, zero-inflated and enriched components (model 3, black solid line). Classification results are shown for the simulated (c) low proportion of high signal-to-noise sites and (d) high proportion of low signal-to-noise sites. Utilization of relevant covariates in each component resulted in better classification outcomes (model 3). This impact is greater in lower signal-to-noise data (d), where it is more difficult to distinguish enrichment from background. (e, f) Scatter plots of G/C content (x-axis) versus simulated window counts (y-axis) using model 3 to estimate the posterior probability of a window being enriched, which is depicted as a color gradient. Lighter colors correspond to higher posterior probability and a greater likelihood of being enriched. Posterior probabilities for the simulated (e) low proportion of high signal-to-noise sites and (f) high proportion of low signal-to-noise sites are shown along with model estimates for the background (solid black line) and enriched components (dashed black line).
Rashid et al. Genome Biology 2011 12:R67 doi:10.1186/gb-2011-12-7-r67
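
The comparison summarized in panels (c, d) can be illustrated with a minimal sketch. It is not the authors' simulation or fitting procedure. It assumes Poisson counts, omits the zero-inflated component, and scores each window with the true generating parameters instead of fitted estimates. The coefficients, G/C range, and the function names (background_mean, enriched_mean, compare) are hypothetical choices made only for this illustration. The sketch computes the posterior probability of enrichment with and without G/C content and summarizes each ranking by its area under the ROC curve.

```python
import math
import random

# Hypothetical log-linear means: G/C raises background and lowers enrichment.
def background_mean(gc):
    return math.exp(0.5 + 3.0 * gc)

def enriched_mean(gc, intercept):
    return math.exp(intercept - 2.0 * gc)

def poisson_sample(rng, mean):
    # Knuth's multiplication method; adequate for the moderate means used here.
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def poisson_logpmf(k, mean):
    return k * math.log(mean) - mean - math.lgamma(k + 1)

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def posterior(count, mean_bg, mean_enr, prop_enriched):
    # Posterior probability that a window belongs to the enriched component.
    log_odds = (math.log(prop_enriched / (1.0 - prop_enriched))
                + poisson_logpmf(count, mean_enr)
                - poisson_logpmf(count, mean_bg))
    return sigmoid(log_odds)

def auc(scores, labels):
    # Rank-based (Mann-Whitney) area under the ROC curve, averaging tied ranks.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2.0 + 1.0
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

def compare(prop_enriched, intercept, n=2000, seed=0):
    # Simulate windows, then score them with and without the G/C covariate.
    rng = random.Random(seed)
    gcs, counts, labels = [], [], []
    for _ in range(n):
        gc = rng.uniform(0.3, 0.7)
        z = 1 if rng.random() < prop_enriched else 0
        mean = enriched_mean(gc, intercept) if z else background_mean(gc)
        gcs.append(gc)
        counts.append(poisson_sample(rng, mean))
        labels.append(z)
    # Covariate-free stand-in: one constant mean per component.
    bg_const = sum(background_mean(g) for g in gcs) / n
    enr_const = sum(enriched_mean(g, intercept) for g in gcs) / n
    with_gc = [posterior(c, background_mean(g), enriched_mean(g, intercept),
                         prop_enriched) for g, c in zip(gcs, counts)]
    without_gc = [posterior(c, bg_const, enr_const, prop_enriched)
                  for c in counts]
    return auc(with_gc, labels), auc(without_gc, labels)

if __name__ == "__main__":
    # Two hypothetical scenarios loosely mirroring panels (c) and (d).
    print("few sites, high signal-to-noise:", compare(0.05, 4.5))
    print("many sites, low signal-to-noise:", compare(0.30, 4.0))
```

Under these assumptions, one would expect the covariate-aware score to rank windows at least as well as the covariate-free score on average, with the gap depending on how strongly the two component means overlap. The sketch says nothing about the specific curves reported in the figure.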