Assisted assembly principle. (a) In this example, five reads align uniquely to the reference genome, and the two leftmost of these (purple) also appear as the two rightmost reads in an existing de novo contig. We can then extend the de novo contig by using the remaining three unassembled reads (green), even if there is no supporting linking evidence (in general, ARACHNE requires a read to be linked to the contig it overlaps before using it to extend the contig). (b) Two scaffolds (blue and purple) are mapped and oriented on the reference genome by the trusted green reads. Furthermore, the two scaffolds are joined by a single link (black dotted line), although this link is not trusted per se. The ARACHNE scaffolding algorithm would not normally join the two scaffolds; however, in this case the separation of the two scaffolds implied by the link is consistent with the separation implied by the mapping on the reference genome, and we thus implicitly validate the black dotted link and join the two scaffolds. (c) Trusted read placements anchor portions of a single scaffold onto two distant parts of the reference genome, suggesting either a bona fide syntenic break or a misassembly. To test for the latter, the contested region on the scaffold is subjected to a stringent test for misassembly, and the scaffold is broken there if the region fails the test. The same stringency could not be applied to the entire assembly because, at low coverage, it would produce too many false positives.
Gnerre et al. Genome Biology 2009 10:R88 doi:10.1186/gb-2009-10-8-r88
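
The consistency check described in panel (b) can be illustrated with a minimal Python sketch. It is not ARACHNE code: the `Placement` and `Link` records, the function names, and the three-standard-deviation tolerance are all assumptions made for illustration, and the sketch takes for granted that both scaffolds have already been oriented on the reference, with `left` lying upstream of `right`.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    """Hypothetical record: span of the reference covered by one scaffold."""
    ref_start: int  # leftmost reference coordinate (inclusive)
    ref_end: int    # rightmost reference coordinate (exclusive)


@dataclass
class Link:
    """Hypothetical record: gap between two scaffolds implied by one read pair."""
    gap: int     # estimated separation in bases
    stddev: int  # uncertainty of that estimate


def reference_gap(left: Placement, right: Placement) -> int:
    """Separation of two scaffolds as implied by their reference placements."""
    return right.ref_start - left.ref_end


def link_is_validated(left: Placement, right: Placement, link: Link,
                      max_sigmas: float = 3.0) -> bool:
    """Accept a single, otherwise untrusted link only if the gap it implies
    agrees with the reference-implied gap within an assumed tolerance."""
    difference = abs(reference_gap(left, right) - link.gap)
    return difference <= max_sigmas * link.stddev


# Two scaffolds placed 2,200 bases apart on the reference.
blue = Placement(ref_start=1_000, ref_end=5_000)
purple = Placement(ref_start=7_200, ref_end=12_000)

consistent = link_is_validated(blue, purple, Link(gap=2_000, stddev=400))
inconsistent = link_is_validated(blue, purple, Link(gap=9_000, stddev=400))
print(consistent, inconsistent)  # True False
```

In the first call the link and the reference disagree by 200 bases, well inside the assumed tolerance of 1,200, so the scaffolds would be joined; in the second they disagree by 6,800 bases and the link stays unvalidated.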