Performance. Results of the performance test of the EM method on simulated mutation data. Each simulated tumor belongs to one of up to three cancer types, with five independent processes active in each type. The total number of processes per data set is thus 5 (blue), 10 (red) or 15 (yellow). We show how different observables scale with sample size (calculated over 50 replicates for each combination of M and n). (A) The number of processes present in the data, as determined via the BIC. Shown are the median (line) and the range between the smallest and the largest number of inferred processes (shaded area). (B) The correlation between the real and the inferred mutation spectra (the difference from 1 is plotted). (C) The time until completion of the inference program scales approximately linearly with M (for constant n; the fits above correspond to 0.93, 0.99 and 1.02). Panels B and C show the median with the 10% and 90% quantiles. BIC: Bayesian information criterion; EM: expectation-maximization.
Fischer et al. Genome Biology 2013 14:R39 doi:10.1186/gb-2013-14-4-r39
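The caption does not say what the values 0.93, 0.99 and 1.02 in panel C measure. One plausible reading, stated here as an assumption, is that they are slopes of a straight-line fit to log(runtime) against log(M), where a slope near 1 indicates linear scaling. The sketch below shows how such a slope could be estimated. The function name `scaling_exponent` and the sample data are hypothetical and are not taken from the paper.

```python
import math


def scaling_exponent(sample_sizes, runtimes):
    """Least-squares slope of log(runtime) against log(sample size).

    A slope close to 1 means the runtime grows roughly linearly with
    the sample size; a slope close to 2 would mean quadratic growth.
    """
    xs = [math.log(m) for m in sample_sizes]
    ys = [math.log(t) for t in runtimes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least squares for a line y = slope * x + intercept.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic illustration only: these numbers are made up, not measured.
m_values = [100, 200, 400, 800, 1600]
t_values = [0.05 * m for m in m_values]  # exactly linear in M by construction
exponent = scaling_exponent(m_values, t_values)
```

In practice the fit would be applied to the median runtime at each sample size, once for each value of n.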