ZINBA provides a unified framework for the detection of enriched sites across a wide variety of DNA-seq datasets. (a) A 100-kb region of chromosome 2 at the ATF2 gene locus illustrating the diversity of enrichment patterns in DNA-seq data, which includes histone H3 lysine 36 tri-methylation (H3K36me3), CCCTC-binding factor (CTCF) and RNA polymerase II (RNA Pol II) ChIP-seq along with the FAIRE-seq and DNase-seq assays. Data for each of the DNA-seq experiments are displayed as the number of overlapping extended reads at each base pair, which was produced by the indicated groups and is available from the UCSC genome browser. (b) ZINBA comprises three steps that can each operate as an independent module. In step 1, the set of aligned reads from the experiment along with a set of covariate measures are collated for each contiguous non-overlapping window spanning the genome. In step 2, the component-specific model formulations of covariates are employed by the mixture regression framework to compute the posterior probability of each window belonging to either the zero-inflated, background or enriched components. The component-specific model formulations of covariates can be generated using an automated model selection procedure or specified by the user. In step 3, the windows exceeding the user-specified probability threshold (default 0.95) are merged to form broad regions of enrichment and a shape detection algorithm is employed on the read overlap representation of the data to refine the boundary estimates of distinct punctate peaks. BED, browser extensible data; BIC, Bayesian information criterion.
Rashid et al. Genome Biology 2011 12:R67 doi:10.1186/gb-2011-12-7-r67