Expressed and non-expressed genes differ in amino acid identity (left) and core genome representation (right). Data are from DNA sequence sets and include the five most abundant taxa per sample, with taxon abundance determined by the proportion of total reads with top matches to protein-coding genes in each genome (BLASTX of all DNA reads against NCBI-nr). 'Core genome representation' is calculated as the percentage of each gene set (that is, expressed or non-expressed genes) falling within the core genome of each taxon, as defined in the text. All differences (left and right panels) are significant (P < 0.001), unless marked with an asterisk.
Stewart et al. Genome Biology 2011 12:R26 doi:10.1186/gb-2011-12-3-r26
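The 'core genome representation' calculation described in the caption can be sketched as a simple set overlap. This is a minimal illustration, not the authors' pipeline: it assumes the core genome of a taxon is already available as a set of gene identifiers (the actual definition is given in the main text), and the function name and gene identifiers below are hypothetical.

```python
def core_genome_representation(gene_set, core_genome):
    """Return the percentage of genes in gene_set that fall within core_genome.

    Both arguments are iterables of gene identifiers for a single taxon.
    How the core genome itself is defined is outside this sketch.
    """
    genes = set(gene_set)
    if not genes:
        raise ValueError("gene set is empty")
    return 100.0 * len(genes & set(core_genome)) / len(genes)


# Hypothetical gene identifiers for one taxon (illustrative only, not study data).
core = {"g1", "g2", "g3", "g4"}
expressed = {"g1", "g2", "g3", "g9"}
non_expressed = {"g4", "g7", "g8", "g10"}

expressed_pct = core_genome_representation(expressed, core)
non_expressed_pct = core_genome_representation(non_expressed, core)
print(expressed_pct, non_expressed_pct)
```

Applied per taxon, once to the expressed and once to the non-expressed gene set, this gives the pair of percentages that the right panel compares.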