Open Access | Software

PEMer: a computational framework with simulation-based error models for inferring genomic structural variants from massive paired-end sequencing data

Jan O Korbel1,2,3*, Alexej Abyzov3, Xinmeng Jasmine Mu4, Nicholas Carriero5, Philip Cayting3, Zhengdong Zhang3, Michael Snyder3,4 and Mark B Gerstein3,4,5,6

Author Affiliations

1 Gene Expression Unit, European Molecular Biology Laboratory (EMBL), Meyerhofstr., Heidelberg, 69117, Germany

2 EMBL Outstation Hinxton, EMBL-European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK

3 Molecular Biophysics and Biochemistry Department, Yale University, Whitney Ave, New Haven, CT 06520, USA

4 Department of Molecular, Cellular, and Developmental Biology, Yale University, Whitney Ave, New Haven, CT 06520, USA

5 Department of Computer Science, Yale University, Prospect Street, New Haven, CT 06511, USA

6 Program in Computational Biology and Bioinformatics, Yale University, Whitney Ave, New Haven, CT 06520, USA


Genome Biology 2009, 10:R23  doi:10.1186/gb-2009-10-2-r23

Published: 23 February 2009


Personal-genomics endeavors, such as the 1000 Genomes project, are generating maps of genomic structural variants by analyzing the ends of massively sequenced genome fragments. To process these data, we developed Paired-End Mapper (PEMer). It comprises an analysis pipeline compatible with several next-generation sequencing platforms; simulation-based error models that yield confidence values for each structural variant; and a back-end database. The simulations demonstrated high structural variant reconstruction efficiency for PEMer's coverage-adjusted multi-cutoff scoring strategy and showed its relative insensitivity to base-calling errors.