Influence of base composition and fragment length on sequence counts. (a) The proportion of (G+C) nucleotides was calculated for the 50-bp sequence centered on each annotated CCGG site in the human reference genome, and for all of the MspI sequences generated from the human ES cell line studied. For each data set, the relative proportion of sequences was calculated in 2% bins of (G+C) content and plotted as shown. The black line shows the proportions in the reference genome, and the red line shows the distribution observed in our MspI experiment. Two peaks representing base composition in repetitive sequences are apparent. The MspI distribution closely matches the expected distribution except where (G+C) content exceeds approximately 80%, where the observed sequences are slightly under-represented. (b) We calculated the relative frequencies of MspI digestion product sizes in the human reference genome. Here, shorter fragments are more likely to be sequenced than larger (≥300 bp) fragments. The three major peaks observed represent Alu short interspersed repetitive element (SINE) sequences.
Suzuki et al. Genome Biology 2010 11:R36 doi:10.1186/gb-2010-11-4-r36
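The two reference-genome calculations described in the caption can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the function names are hypothetical, the window is assumed to be centered on the midpoint of the 4-bp CCGG site, sites too close to a sequence end are simply skipped, and the in silico digestion assumes the standard MspI cut position (C^CGG).

```python
from collections import Counter

SITE = "CCGG"  # MspI recognition sequence


def find_sites(seq):
    """Return 0-based start positions of every CCGG in seq."""
    seq = seq.upper()
    positions = []
    i = seq.find(SITE)
    while i != -1:
        positions.append(i)
        i = seq.find(SITE, i + 1)
    return positions


def gc_percent(seq):
    """(G+C) content of seq as a percentage."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100 * (seq.count("G") + seq.count("C")) / len(seq)


def window_gc(seq, window=50):
    """(G+C) percentage of the window centered on each CCGG site.

    Assumption: sites whose window would run past either end of
    the sequence are skipped rather than padded or truncated.
    """
    half = window // 2
    values = []
    for pos in find_sites(seq):
        center = pos + len(SITE) // 2  # midpoint of the site
        start, end = center - half, center + half
        if start < 0 or end > len(seq):
            continue
        values.append(gc_percent(seq[start:end]))
    return values


def bin_proportions(percents, bin_width=2):
    """Relative proportion of values per bin, keyed by lower bin edge (%)."""
    if not percents:
        return {}
    n_bins = 100 // bin_width
    counts = Counter()
    for p in percents:
        idx = min(int(p // bin_width), n_bins - 1)  # 100% joins the top bin
        counts[idx * bin_width] += 1
    total = len(percents)
    return {edge: n / total for edge, n in sorted(counts.items())}


def mspi_fragment_lengths(seq):
    """Lengths of internal fragments from an in silico MspI digest.

    Assumption: the cut falls after the first C of each site (C^CGG);
    the two end fragments of the sequence are not reported.
    """
    cuts = [p + 1 for p in find_sites(seq)]
    return [b - a for a, b in zip(cuts, cuts[1:])]
```

Applied per chromosome, `bin_proportions(window_gc(chrom_seq))` would give a curve comparable in form to the reference line in panel (a), and a histogram of `mspi_fragment_lengths(chrom_seq)` would correspond to the reference size distribution in panel (b). Running the same binning on the sequenced reads gives the observed distribution for comparison.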