Error rates increase as read length diverges from the predicted length. The graphs show the average difference from the reference sequence for all reads of a given length, together with the distribution of read lengths across all reads. Most reads fall at a few specific peak lengths. Reads with lengths beyond the peaks shown are too few to appear on the graph; however, they contain many more errors than reads of the majority length(s). Perfect reads likewise peak at only a few specific lengths. Sequences that fall outside these lengths are unlikely to be truncated sequences or reads that continued beyond the end of the primer. Instead, they tend to be low-quality reads of spurious sequences. (a) The average error rate of sequences at each length for 56,700 reads of reference sequence 517. (b) The average error rate of sequences at each length for all reads combined. Even with a mixture of sequence lengths, reads outside the peak lengths are highly error-prone.
Huse et al. Genome Biology 2007 8:R143 doi:10.1186/gb-2007-8-7-r143
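
The per-length averaging described in the caption can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function name `error_rate_by_length`, the `(read_length, n_errors)` input format, the toy data, and the choice to express error rate as differences per base of read length are all assumptions made for the example.

```python
from collections import defaultdict


def error_rate_by_length(reads):
    """Group reads by length and average their error rates.

    `reads` is an iterable of (read_length, n_errors) pairs, where n_errors
    is the number of differences from the reference (hypothetical format).
    Returns {length: (n_reads, errors_per_base)}, ordered by length.
    Error rate is assumed here to be total errors / total bases at that length.
    """
    totals = defaultdict(lambda: [0, 0])  # length -> [read count, error count]
    for length, n_errors in reads:
        if length <= 0:
            continue  # skip empty reads to avoid dividing by zero
        totals[length][0] += 1
        totals[length][1] += n_errors
    return {
        length: (count, errors / (count * length))
        for length, (count, errors) in sorted(totals.items())
    }


# Hypothetical toy data: most reads sit at the peak length (100) with few
# errors, while the rare off-peak reads carry many more.
reads = [(100, 0), (100, 1), (100, 0), (101, 0), (93, 7), (108, 9)]
profile = error_rate_by_length(reads)

for length, (count, rate) in profile.items():
    print(f"length={length} reads={count} errors_per_base={rate:.4f}")
```

The read count per length gives the length distribution, and the per-base rate gives the error curve, so both quantities plotted in each panel come out of a single pass over the reads.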