Detecting single nucleotide polymorphisms from Bisulfite-seq data. Hypothetical Bisulfite-seq data are shown, with the reference genome at the top, the (unobserved) genome of the sequenced individual in the middle, and the bisulfite sequencing reads at the bottom. (a) Three reference cytosine positions are shown: the first matches the reference genome, and the other two are homozygous single nucleotide polymorphisms (SNPs). The first case shows a true C:G genotype; all reads on the same strand as the C (the 'C-strand') are read as T, indicating an unmethylated state (shown in blue). Because the Illumina Bisulfite-seq protocol is 'directional', reads on the opposite strand (the 'G-strand') show the true genotype, G ('genotype' reads on the G-strand are boxed in this figure). The second case illustrates a true C>T SNP, which can be distinguished by the A reads present on the G-strand. In this case, the T reads on the C-strand are inferred to come from a true 'T' and should not be used for methylation calling (crossed out here). The third case shows a T>C SNP, which again can be identified from the G-strand reads. (b) A cytosine position with 50% unmethylated (T) and 50% methylated (C) reads can be associated with a heterozygous SNP covered by the same sequencing reads. In this case, the unmethylated reads come from the 'A' allele chromosome (shown here as maternal) and the methylated reads come from the 'T' allele chromosome.
Liu et al. Genome Biology 2012 13:R61 doi:10.1186/gb-2012-13-7-r61
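The logic of panel (a) can be sketched in a few lines of Python. This is only an illustrative sketch, not the method used in the paper: the function names `infer_genotype_from_g_strand` and `classify_site`, the 0.9 majority threshold, and the restriction to homozygous sites are all assumptions made for the example. It takes the bases observed on each strand at one position, infers the genotype from the G-strand reads (which bisulfite conversion does not alter at a C:G pair), and computes a methylation level from C-strand reads only when the inferred genotype is C.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def infer_genotype_from_g_strand(g_strand_bases, min_fraction=0.9):
    """Infer the C-strand base at a homozygous site from G-strand reads.

    min_fraction is a hypothetical threshold: the majority base must make
    up at least this share of the G-strand reads, otherwise None is returned.
    """
    counts = Counter(b for b in g_strand_bases if b in COMPLEMENT)
    total = sum(counts.values())
    if total == 0:
        return None
    base, n = counts.most_common(1)[0]
    if n / total < min_fraction:
        return None
    # G on the G-strand implies C on the C-strand; A implies T.
    return COMPLEMENT[base]


def classify_site(ref_base, c_strand_bases, g_strand_bases):
    """Classify one position as in panel (a) of the figure."""
    genotype = infer_genotype_from_g_strand(g_strand_bases)
    if genotype is None:
        return {"genotype": None, "call": "ambiguous", "methylation": None}

    call = "match" if genotype == ref_base else f"{ref_base}>{genotype} SNP"
    if genotype != "C":
        # C-strand T reads reflect a true T, so they are not used
        # for methylation calling.
        return {"genotype": genotype, "call": call, "methylation": None}

    # True cytosine: C reads are methylated, T reads are unmethylated.
    c = c_strand_bases.count("C")
    t = c_strand_bases.count("T")
    methylation = c / (c + t) if (c + t) else None
    return {"genotype": "C", "call": call, "methylation": methylation}


case1 = classify_site("C", "TTTT", "GGGG")  # unmethylated C, matches reference
case2 = classify_site("C", "TTTT", "AAAA")  # C>T SNP, no methylation call
case3 = classify_site("T", "CCTT", "GGGG")  # T>C SNP, methylation is callable
```

The three calls mirror the three cases in panel (a). A real caller would also need to handle heterozygous genotypes, as in panel (b), along with base qualities and sequencing error, all of which this sketch ignores.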