Expected contig length for various clone-based haplotyping designs. (a) Log-log plot of the maximum achievable haplotype N50 length for different values of clone length (L), assuming a distribution of heterozygous variants obtained from Complete Genomics Institute (CGI) whole genome sequencing (WGS) on chromosome 1 of sample NA20431 of the Personal Genome Project (designated PGP1). This plot suggests a power law relationship between haplotype N50 length (N50) and clone length (L), characterized by N50 ≈ L^1.42. Note that achieved haplotype lengths (filled circles) may not reach the maximum length, owing to a smaller number of pools or a low fraction of the variants recovered. (b) Simulated haplotype length versus the number of pools (p) for given values of L (shown in different colors). In all cases except one (magenta), the number of clones per pool (n) is 5,000 (n = 16,800 for magenta). The curves reach saturation when all variants that are less than distance L apart are connected in a contig. Simulations are performed using the distribution of heterozygous variants obtained from CGI WGS on chromosome 1 of PGP1. The squares represent the simulated estimate given the parameter settings of several clone-based haplotyping experiments, while the circles show the reported N50.
Lo et al. Genome Biology 2013 14:R100 doi:10.1186/gb-2013-14-9-r100
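The saturation condition in panel (b), where every pair of adjacent variants less than distance L apart is connected in a contig, can be illustrated with a minimal Python sketch. This is not the authors' simulation code: the function names are hypothetical, contig length is assumed to be the span from the first to the last variant in the contig, and N50 is assumed to follow the usual definition (the length at which the cumulative sum of contig lengths, taken from longest to shortest, reaches half the total).

```python
def max_haplotype_contigs(positions, clone_length):
    """Group variant positions into contigs, linking adjacent variants
    whose gap is strictly less than clone_length (the saturation case)."""
    positions = sorted(positions)
    contigs = []
    if not positions:
        return contigs
    start = prev = positions[0]
    for pos in positions[1:]:
        if pos - prev >= clone_length:
            # Gap too large for any clone to span: close the current contig.
            contigs.append((start, prev))
            start = pos
        prev = pos
    contigs.append((start, prev))
    return contigs


def n50(lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running * 2 >= total:
            return length
    return 0


def max_haplotype_n50(positions, clone_length):
    """Maximum achievable haplotype N50 for a given clone length,
    under the span-based contig length assumption stated above."""
    contigs = max_haplotype_contigs(positions, clone_length)
    return n50([end - begin for begin, end in contigs])


if __name__ == "__main__":
    # Toy variant positions (made up for illustration, not PGP1 data).
    toy_positions = [0, 10, 20, 100, 105, 300]
    for clone_length in (50, 100, 200):
        print(clone_length, max_haplotype_n50(toy_positions, clone_length))
```

Evaluating `max_haplotype_n50` over a range of clone lengths on real heterozygous variant positions, then plotting the results on log-log axes, would be one way to examine a power law trend of the kind described in panel (a).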