Three classes of microbial genome assembly complexity. The top row illustrates repeat content via alignment dotplots for Bacillus anthracis Ames, Yersinia pestis CO92, and Escherichia coli O26:H11 11368. For a repeat occurring at two distinct positions x and y in the genome, a dot of the corresponding size is placed on the matrix at [x,y]. The bottom row illustrates assemblies of these genomes using 200× simulated PacBio C2 sequencing (outer circle) and infinite coverage of perfect 500 bp reads (inner circle). The number of gaps in each assembly is noted. Class I genomes have few repeats other than the rDNA operon, which is 5 to 7 kbp long. In this case, both short reads and PacBio reads can generate a continuous assembly. Class II genomes have many repeats, such as insertion sequence elements, but none longer than 7 kbp. In this case, the PacBio reads can completely assemble the genome, while the short-read assembly is heavily fragmented. Class III genomes contain large, often phage-related, repeats longer than 7 kbp. In this case, neither technology can generate a complete genome. However, the PacBio assembly is significantly more continuous than the short-read assembly.
Koren et al. Genome Biology 2013 14:R101 doi:10.1186/gb-2013-14-9-r101
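The three-class scheme in the caption can be read as a simple decision rule on repeat lengths. The following is a minimal sketch of that reading, not the authors' implementation. The 7 kbp cutoff comes from the caption; the function name `classify_genome` and the `many_repeats` cutoff are hypothetical, since the caption only distinguishes "few" from "many" repeats without giving a number.

```python
from typing import Iterable

# Upper bound of the rDNA operon size stated in the caption (7 kbp).
RDNA_MAX_BP = 7_000


def classify_genome(repeat_lengths_bp: Iterable[int], many_repeats: int = 100) -> str:
    """Return "I", "II", or "III" for a genome given its repeat lengths in bp.

    `many_repeats` is an illustrative assumption: the caption gives no
    numeric boundary between "few" and "many" repeats.
    """
    lengths = list(repeat_lengths_bp)

    # Class III: at least one repeat longer than 7 kbp.
    if any(length > RDNA_MAX_BP for length in lengths):
        return "III"

    # Class II: many repeats, none longer than 7 kbp.
    if len(lengths) >= many_repeats:
        return "II"

    # Class I: few repeats, at most rDNA-operon sized.
    return "I"


if __name__ == "__main__":
    # Toy inputs, not measurements from the genomes named in the caption.
    print(classify_genome([5_500] * 7))        # a few rDNA-sized repeats
    print(classify_genome([1_200] * 150))      # many short repeats
    print(classify_genome([1_200, 30_000]))    # one large repeat
```

Under this reading, the class depends first on the longest repeat and only then on the repeat count, which mirrors the caption's point that repeats longer than 7 kbp are what prevent a complete assembly.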