Dynamic filtering of epigenome data identifies candidate regions for further analysis. Using successive filtering steps, a genomic dataset with 82,221 hotspots of 5-hydroxymethylcytosine (5hmC) in human ES cells is refined to a list of 16 regions that provide strong candidates for investigating the functional association between 5hmC and H3K4me1-marked enhancer elements. (a) Filtering with a minimum length threshold of 1 kb yields 5,734 genomic regions. (b) Filtering with a minimum 5hmC hotspot score threshold of 300, which corresponds to a detection significance of 10^-30 or better, yields 2,535 genomic regions. (c) Filtering for overlap with H3K4me1 peaks in a human ES cell line (H1hESC) yields 2,334 genomic regions. (d) Filtering for association with genes that are annotated with any of the 1,608 Gene Ontology terms containing the word 'regulation' yields 1,064 genomic regions. (e) Filtering for overlap with an alternative dataset of 5hmC hotspots yields 99 genomic regions. (f) Filtering for a minimum DNA methylation coverage threshold of five CpGs yields 65 genomic regions. (g) Filtering for intermediate DNA methylation with levels in the range of 20% to 50% yields 16 genomic regions. (h) EpiExplorer screenshot showing the final list of candidate regions, ready for visualization in a genome browser, for download and manual inspection, and for export to other web-based tools for further analysis.
Halachev et al. Genome Biology 2012 13:R96 doi:10.1186/gb-2012-13-10-r96
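
The successive filtering in steps (a) to (g) can be illustrated with a minimal Python sketch. It is not EpiExplorer's implementation: the `Region` record, its field names, and the `apply_filters` helper are hypothetical, and treating the 20% to 50% methylation range as inclusive is an assumption. Only the thresholds themselves are taken from the caption; the overlap and gene-annotation tests are assumed to have been precomputed as boolean flags.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Region:
    """Hypothetical per-region record; field names are illustrative only."""
    length_bp: int              # region length in base pairs
    hotspot_score: float        # 5hmC hotspot score
    overlaps_h3k4me1: bool      # overlaps an H3K4me1 peak (precomputed)
    near_regulation_gene: bool  # associated gene has a 'regulation' GO term
    overlaps_alt_5hmc: bool     # overlaps the alternative 5hmC hotspot set
    cpg_coverage: int           # number of CpGs with methylation data
    methylation_pct: float      # mean DNA methylation level, in percent


# One predicate per caption step, using the thresholds stated in the caption.
FILTERS: List[Tuple[str, Callable[[Region], bool]]] = [
    ("a: length >= 1 kb", lambda r: r.length_bp >= 1000),
    ("b: hotspot score >= 300", lambda r: r.hotspot_score >= 300),
    ("c: overlaps H3K4me1 peak", lambda r: r.overlaps_h3k4me1),
    ("d: 'regulation' GO term", lambda r: r.near_regulation_gene),
    ("e: overlaps alternative 5hmC set", lambda r: r.overlaps_alt_5hmc),
    ("f: coverage >= 5 CpGs", lambda r: r.cpg_coverage >= 5),
    # Assumption: both ends of the 20%-50% range are inclusive.
    ("g: methylation 20-50%", lambda r: 20.0 <= r.methylation_pct <= 50.0),
]


def apply_filters(regions, filters=FILTERS):
    """Apply the filters in order; return survivors and per-step counts."""
    remaining = list(regions)
    counts = []
    for label, keep in filters:
        remaining = [r for r in remaining if keep(r)]
        counts.append((label, len(remaining)))
    return remaining, counts
```

Calling `apply_filters(regions)` returns the surviving regions together with the number of regions left after each step, which mirrors the way the caption reports a count for every panel. Because each step only removes regions, the order of the filters changes the intermediate counts but not the final list.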