Estimation of contamination. Estimation of the contamination level in a mixed disease sample. The proportion of the control sample (α) is estimated from simulated mixed data. A total of 11 data sets with different α values (1%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80% and 90%) were generated and tested. Virmid estimated all α values (red line with circles) in close agreement with the true values (black line with squares). Note that the call-based method (green line with circles) shows a significant bias in highly contaminated samples (α ≥ 60%) because low-BAF mutations go undetected. Only somatic mutations with higher BAF tend to be called, so the BAF is overestimated and α is underestimated.
Kim et al. Genome Biology 2013 14:R90 doi:10.1186/gb-2013-14-8-r90
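The caption attributes the call-based bias to undetectable low-BAF mutations. The following is a minimal sketch of that reasoning, not Virmid's method or the paper's actual call-based estimator. It makes several assumptions: a diploid, copy-neutral region; heterozygous somatic mutations, whose expected BAF is (1 − α)/2; a fixed read depth; and a hypothetical caller that reports a site only when it sees at least `min_alt` alternative reads. The function names and the values `depth=30` and `min_alt=5` are illustrative. Instead of simulating reads, the sketch computes the exact expectation of the estimate, which is α̂ = 1 − 2 · (mean BAF of called sites), from the binomial distribution.

```python
from math import comb


def expected_baf(alpha: float) -> float:
    """Expected BAF of a heterozygous somatic mutation in a diploid,
    copy-neutral region when a fraction alpha of the cells are normal (assumption)."""
    return (1.0 - alpha) / 2.0


def call_based_alpha(alpha: float, depth: int = 30, min_alt: int = 5) -> float:
    """Expected call-based estimate of alpha when a hypothetical caller only
    reports sites with at least `min_alt` alternative reads out of `depth`."""
    p = expected_baf(alpha)
    # Binomial pmf of the alternative-read count at one site.
    pmf = [comb(depth, k) * p**k * (1.0 - p) ** (depth - k) for k in range(depth + 1)]
    p_called = sum(pmf[min_alt:])
    if p_called == 0.0:
        raise ValueError("no mutation would be called at this depth/threshold")
    # Mean observed BAF over called sites only: low-BAF sites are filtered out.
    mean_baf = sum(pmf[k] * k / depth for k in range(min_alt, depth + 1)) / p_called
    # Invert BAF = (1 - alpha) / 2 to get the call-based alpha estimate.
    return 1.0 - 2.0 * mean_baf


if __name__ == "__main__":
    # The 11 alpha values used in the caption's simulation.
    for a in (0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9):
        print(f"true alpha = {a:.2f}  call-based estimate = {call_based_alpha(a):.3f}")
```

Under these assumptions, the estimate stays close to the true α at low contamination. As α grows, the expected BAF approaches the detection limit, and the called sites become a high-BAF subset, so α̂ falls increasingly below the truth. This mirrors the direction of the bias described in the caption, although the exact magnitude depends on the assumed depth and threshold.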