Overall Virmid workflow. (a) Disease/control paired data are used (top) to generate an alignment (BAM) file. The mixed disease sample produces short reads of mixed types (blue and orange rectangles). Somatic mutations, where the control has the reference genotype (AA) and the disease has a non-reference genotype (AB or BB, red dots in the alignment), are hard to detect under high contamination because the B allele frequency (BAF) drops significantly. Virmid takes the disease/control paired data and estimates: (1) the proportion of control cells in the disease sample (α) and (2) the most probable disease genotype for each position, which can be used to call somatic mutations. (b) An example BAF drop. Without contamination, the expected BAF is 0.5 and 1.0 for heterozygous and homozygous mutation sites, respectively. When control cells contaminate the disease sample at proportion α, mutant alleles can come only from the remaining (1 - α) fraction of reads, so the expected BAF drops to (1 - α)/2 for heterozygous sites and (1 - α) for homozygous sites. With an accurate estimate of α, Virmid can detect more true somatic mutations, which would be missed by conventional tools due to insufficient observation of B alleles. BAF, B allele frequency.
Kim et al. Genome Biology 2013 14:R90 doi:10.1186/gb-2013-14-8-r90
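The BAF drop described in panel (b) can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of Virmid: the function name `expected_baf` and the genotype labels as dictionary keys are assumptions made for this example, and it models only the expected value, ignoring sequencing error and sampling noise.

```python
def expected_baf(alpha: float, genotype: str) -> float:
    """Expected B allele frequency in a disease sample that contains
    a proportion `alpha` of control cells (hypothetical helper)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    # BAF of a pure (uncontaminated) disease sample for each genotype.
    pure_baf = {"AA": 0.0, "AB": 0.5, "BB": 1.0}[genotype]
    # Control cells are AA at a somatic site and contribute no B alleles,
    # so only the (1 - alpha) disease fraction carries them.
    return (1.0 - alpha) * pure_baf


if __name__ == "__main__":
    for alpha in (0.0, 0.3, 0.6):
        het = expected_baf(alpha, "AB")
        hom = expected_baf(alpha, "BB")
        print(f"alpha={alpha:.1f}  het BAF={het:.2f}  hom BAF={hom:.2f}")
```

At α = 0 the function returns the uncontaminated values of 0.5 and 1.0; as α grows, both expected frequencies shrink linearly, which is why heavily contaminated samples show too few B alleles for callers that assume a pure sample.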