Schematic of the COMIT algorithm for identifying unusually conserved motifs in coding regions. The example illustrates how the score is calculated for the motif ACAAAG, using genome-wide coding sequence alignments for two species. Each instance of the motif is identified in species 1, and its observed conservation (that is, whether all bases are identical between the two species) is recorded. The expected conservation at each instance is modeled from genome-wide frequencies of nucleotide-level conservation patterns conditional on the aligned amino acids. For each instance, the expected conservation is calculated from all possible ways in which the motif could be conserved at that location given the amino acids in each species, using values from Table 1 (typically some of these quantities, such as (H, Y)111, will be zero). The observed and expected conservation levels are then compared and normalized to yield a conservation z-score for each motif.
Kural et al. Genome Biology 2009 10:R133 doi:10.1186/gb-2009-10-11-r133
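The calculation described in the legend can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the function names, the `TOY_FREQ` table (a stand-in for Table 1, with made-up numbers), the assumption that codons are conserved independently of one another, and the Bernoulli variance used for normalization are all assumptions introduced here. Conservation patterns are written as three-character strings over the codon positions, with `1` meaning conserved, so `("H", "Y")` has no `111` entry.

```python
from math import sqrt

# Hypothetical stand-in for Table 1: for each aligned amino-acid pair, the
# frequency of each codon conservation pattern (made-up numbers).
TOY_FREQ = {
    ("T", "T"): {"111": 0.6, "110": 0.4},
    ("K", "K"): {"111": 0.7, "110": 0.3},
    ("H", "Y"): {"011": 0.5, "010": 0.5},  # first base always differs
}

def codon_masks(offset, motif_len):
    """For each codon the motif overlaps, flag which of its 3 positions
    fall inside the motif. `offset` is the motif's start within its
    first codon (0, 1, or 2)."""
    end = offset + motif_len
    n_codons = (end + 2) // 3  # ceiling of end / 3
    return [[offset <= 3 * c + k < end for k in range(3)]
            for c in range(n_codons)]

def expected_instance(aa_pairs, offset, motif_len, pattern_freq):
    """Probability that every motif base is conserved at one instance,
    assuming (for this sketch) that codons are independent."""
    p = 1.0
    for pair, mask in zip(aa_pairs, codon_masks(offset, motif_len)):
        freqs = pattern_freq.get(pair, {})
        # Sum over all patterns in which each in-motif position is conserved;
        # positions outside the motif are unconstrained.
        p *= sum(f for pat, f in freqs.items()
                 if all(pat[k] == "1" for k in range(3) if mask[k]))
    return p

def conservation_z(observed_flags, expected_probs):
    """Compare observed and expected conservation over all instances,
    normalizing by an assumed sum-of-Bernoulli variance."""
    obs = sum(observed_flags)
    exp = sum(expected_probs)
    var = sum(p * (1.0 - p) for p in expected_probs)
    return (obs - exp) / sqrt(var) if var > 0 else float("nan")
```

For an in-frame instance of ACAAAG (codons ACA and AAG, encoding T and K in both species), `expected_instance([("T", "T"), ("K", "K")], 0, 6, TOY_FREQ)` multiplies the `111` frequencies of the two codons. For an instance shifted by one base, the first and last codons contribute sums over several patterns, because only some of their positions lie inside the motif.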