Software (Open Access)

REAPR: a universal tool for genome assembly evaluation

Martin Hunt1, Taisei Kikuchi1,2, Mandy Sanders1, Chris Newbold1,3, Matthew Berriman1 and Thomas D Otto1*

Author Affiliations

1 Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK

2 Division of Parasitology, Department of Infectious Diseases, Faculty of Medicine, University of Miyazaki, Miyazaki 889-1692, Japan

3 Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DS, UK

Genome Biology 2013, 14:R47 doi:10.1186/gb-2013-14-5-r47

Published: 27 May 2013


Methods to reliably assess the accuracy of genome sequence data are lacking. Currently, completeness is only described qualitatively and mis-assemblies are overlooked. Here we present REAPR, a tool that precisely identifies errors in genome assemblies without the need for a reference sequence. We have validated REAPR on complete genomes or de novo assemblies from bacteria, malaria and Caenorhabditis elegans, and demonstrate that 86% and 82% of the human and mouse reference genomes are error-free, respectively. When applied to an ongoing genome project, REAPR provides corrected assembly statistics, allowing the quantitative comparison of multiple assemblies. REAPR is available at webcite.

Keywords: Genome assembly; validation; evaluation