Overview of GASVPro. Fragments from a test genome are sequenced and the resulting paired reads are aligned to the reference. A fragment may either have a unique mapping or be ambiguous, with multiple alignments to the reference. Following clustering of alignments (with GASV), the set of possible structural variants and the fragments whose alignments support these variants are recorded in the alignment matrix A. As each fragment originates from a single location in the test genome, a fragment supports at most one structural variant. Thus, a mapping matrix M records a single 'true' mapping for each fragment, chosen from the alignments recorded in A. GASVPro scores mapping matrices according to a generative probabilistic model that incorporates concordant mappings. GASVPro uses an MCMC procedure to efficiently sample the space of possible mapping matrices defined by the alignment matrix A. The underlying probabilistic model can be easily generalized to consider additional features indicative of a 'true' mapping, such as the empirical fragment length distribution or the probability of sequencing errors.
Sindi et al. Genome Biology 2012 13:R22 doi:10.1186/gb-2012-13-3-r22
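
The relationship between the alignment matrix A and a mapping matrix M, and the idea of sampling mapping matrices by MCMC, can be illustrated with a minimal sketch. This is not GASVPro's implementation: the toy matrix `A`, the helper names, and in particular `toy_log_score` are hypothetical stand-ins chosen for illustration, and the score is a placeholder rather than the generative model described in the paper. The sketch assumes a simple Metropolis sampler in which each fragment is assigned either to one variant it supports in A or to no variant.

```python
import math
import random

# Toy alignment matrix A (hypothetical data): rows are fragments, columns are
# candidate structural variants. A[i][j] == 1 means fragment i has an
# alignment supporting variant j.
A = [
    [1, 0, 0],  # uniquely mapped fragment
    [1, 1, 0],  # ambiguous fragment
    [0, 1, 1],  # ambiguous fragment
    [0, 0, 1],  # uniquely mapped fragment
]

def options(A, i):
    """Choices for fragment i: no variant (None) or any variant it supports in A."""
    return [None] + [j for j, a in enumerate(A[i]) if a == 1]

def is_valid_mapping(A, assignment):
    """A mapping may only use alignments that are present in A."""
    return all(j is None or A[i][j] == 1 for i, j in enumerate(assignment))

def to_matrix(A, assignment):
    """Expand an assignment into a 0/1 mapping matrix M with at most one 1 per row."""
    return [[1 if assignment[i] == j else 0 for j in range(len(A[0]))]
            for i in range(len(A))]

def toy_log_score(assignment, n_variants):
    """Placeholder score (an assumption, NOT the GASVPro model):
    favors mappings that concentrate fragments on few variants."""
    counts = [0] * n_variants
    for j in assignment:
        if j is not None:
            counts[j] += 1
    return sum(c * math.log(1 + c) for c in counts)

def metropolis_sample(A, n_steps, seed=0):
    """Metropolis sampling over mapping matrices consistent with A."""
    rng = random.Random(seed)
    n_variants = len(A[0])
    state = [None] * len(A)
    score = toy_log_score(state, n_variants)
    samples = []
    for _ in range(n_steps):
        # Symmetric proposal: re-draw the mapping of one random fragment.
        i = rng.randrange(len(A))
        proposal = list(state)
        proposal[i] = rng.choice(options(A, i))
        new_score = toy_log_score(proposal, n_variants)
        # Accept with probability min(1, exp(new_score - score)).
        if rng.random() < math.exp(min(0.0, new_score - score)):
            state, score = proposal, new_score
        samples.append(tuple(state))
    return samples
```

Calling `metropolis_sample(A, 1000)` returns a list of assignments; passing any of them to `to_matrix` yields a mapping matrix M whose entries never exceed those of A and whose rows each contain at most one 1.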