Genotyping performance on simulated data sets. (a) Bar charts report Sniper genotyping accuracy based on resequencing of four different synthetic genomic DNA templates. Sample genomes were generated from each known template by randomly introducing single nucleotide sequence variation to 0.1%. We simulated 36-nucleotide paired-end (PE) reads from each unknown genome to 50-fold coverage and mapped them to the respective known genomic template according to a unique (red), best no-guess (yellow), best guess (green), or total (blue) mapping strategy using k = 1, 2, or 3 mismatches, as shown. Accuracy for each bar was determined using only those genotype loci identified at or above the specified stringency level Q = -10 log10(1 - P), where P is the posterior probability of the MAP genotype at a single nucleotide locus. Error bars represent ± standard deviation across five replicate simulations, each of which generated a new sample genome and simulated reads from that genome. (b) Receiver operating characteristic (ROC)-style plot relating genotyping sensitivity, TPR = (TP + TPFG)/(TP + TPFG + FN) (where TP = true positive loci, FN = false negative loci, and TPFG = true position, false genotype loci), to the number of false positives (FPs) for simulation results genotyping the moderate-difficulty Yeast 2 × RPL + 5% template at 50-fold coverage; as in (a), each point represents the average over five replicates. Points representing stringency levels Q ≥ 40 and Q ≥ 90 are labeled for clarity. (c) Bar chart reporting the estimated false discovery rate, FDR = (FP + TPFG)/(TP + FP + TPFG), for genotyping the Yeast 2 × RPL + 5% template at 50-fold coverage using Q ≥ 40 confidence. Error bars represent ± standard deviation over five replicates, as in (a). See Additional file 14 for a complete description of estimates.
Simola and Kim Genome Biology 2011 12:R55 doi:10.1186/gb-2011-12-6-r55
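The three quantities defined in the caption (the stringency level Q, the sensitivity TPR, and the FDR) can be illustrated with a minimal Python sketch. This is not code from Sniper; the function names and the locus counts in the usage lines are hypothetical and serve only to show how the formulas behave.

```python
import math


def stringency_q(p_map: float) -> float:
    """Q = -10 * log10(1 - P), with P the posterior probability of the MAP genotype."""
    if not 0.0 <= p_map <= 1.0:
        raise ValueError("P must lie in [0, 1]")
    if p_map == 1.0:
        # 1 - P is zero, so Q is unbounded.
        return math.inf
    return -10.0 * math.log10(1.0 - p_map)


def sensitivity(tp: int, tpfg: int, fn: int) -> float:
    """TPR = (TP + TPFG) / (TP + TPFG + FN), as defined for panel (b)."""
    denom = tp + tpfg + fn
    if denom == 0:
        raise ValueError("TPR is undefined when there are no variant loci")
    return (tp + tpfg) / denom


def false_discovery_rate(tp: int, fp: int, tpfg: int) -> float:
    """FDR = (FP + TPFG) / (TP + FP + TPFG), as defined for panel (c)."""
    denom = tp + fp + tpfg
    if denom == 0:
        raise ValueError("FDR is undefined when no loci were called")
    return (fp + tpfg) / denom


# Hypothetical counts, for illustration only (not results from the paper).
q = stringency_q(0.9999)                           # P = 0.9999 corresponds to Q of about 40
tpr = sensitivity(tp=90, tpfg=5, fn=5)             # 95 / 100
fdr = false_discovery_rate(tp=90, fp=5, tpfg=5)    # 10 / 100
```

Under these definitions, Q ≥ 40 corresponds to a posterior probability of at least 0.9999 for the MAP genotype. TPFG loci count toward sensitivity, because the variant position was found, and also toward the FDR, because the reported genotype was wrong.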