Low-quality reads contribute disproportionately to the overall error rate. The graph shows the proportion of reads and test fragments at each percent difference from their reference sequence (individual error rate) and the proportion of errors contributed by reads and test fragments at each given difference (cumulative error rate). The vast majority of both experimental and test fragment reads contain few or no errors; only 5% of all reads and 0.6% of test fragments differ from their reference sequence by 2% or more. The experimental reads that do have errors, however, are likely to have many of them and thus to differ substantially from their reference sequence. For instance, 40% of all errors come from the 1% of reads that differ by at least 10% from their reference sequence. The GS20 test fragments, by contrast, show far fewer very low-quality sequences: only 3% of the test fragment errors come from sequences at least 10% different from their reference.
Huse et al. Genome Biology 2007 8:R143 doi:10.1186/gb-2007-8-7-r143
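
The two quantities plotted in the figure can be illustrated with a minimal sketch. It assumes each read is summarized as a pair (number of errors, read length); the function name `error_shares` and the toy data are hypothetical and are not taken from the paper. For a given percent-difference threshold, it returns the fraction of reads at or above the threshold and the fraction of all errors those reads contribute.

```python
from typing import List, Tuple


def error_shares(reads: List[Tuple[int, int]], threshold_pct: float) -> Tuple[float, float]:
    """Return (fraction of reads, fraction of all errors) for reads whose
    percent difference from the reference is >= threshold_pct.

    Each read is an (errors, length) pair; this representation is an
    assumption made for illustration only.
    """
    total_reads = len(reads)
    total_errors = sum(errors for errors, _ in reads)

    # Keep only reads whose individual error rate meets the threshold.
    selected = [(errors, length) for errors, length in reads
                if 100.0 * errors / length >= threshold_pct]

    read_share = len(selected) / total_reads if total_reads else 0.0
    error_share = (sum(errors for errors, _ in selected) / total_errors
                   if total_errors else 0.0)
    return read_share, error_share


# Hypothetical toy data set of 100 reads, each 100 bases long:
# 90 error-free reads, 8 reads with 1 error, 1 read with 3 errors,
# and 1 read with 12 errors.
toy_reads = [(0, 100)] * 90 + [(1, 100)] * 8 + [(3, 100)] + [(12, 100)]

read_share, error_share = error_shares(toy_reads, threshold_pct=10.0)
print(f"reads >= 10% different: {read_share:.1%}, share of all errors: {error_share:.1%}")
```

In this toy data a single read out of 100 accounts for more than half of all errors, the same kind of skew the caption describes for the experimental reads.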