CGAL: computing genome assembly likelihoods
1 Department of Electrical Engineering and Computer Sciences, 387 Soda Hall, UC Berkeley, Berkeley, CA 94720, USA
2 Departments of Mathematics and Molecular & Cell Biology, 970 Evans Hall, UC Berkeley, Berkeley, CA 94720, USA
Genome Biology 2013, 14:R8 doi:10.1186/gb-2013-14-1-r8
Published: 29 January 2013
Assembly algorithms have been extensively benchmarked using simulated data so that results can be compared to ground truth. However, in de novo assembly, only crude metrics such as contig number and size are typically used to evaluate assembly quality. We present CGAL, a novel likelihood-based approach to assembly assessment in the absence of a ground truth. We show that likelihood is more accurate than other metrics currently used for evaluating assemblies, and describe its application to the optimization and comparison of assembly algorithms. Our methods are implemented in software that is freely available at http://bio.math.berkeley.edu/cgal/.