Representation of enzymes within three large-scale datasets. (a) Coverage of enzymes and genes provided by the three datasets: the non-redundant protein database (nr), partial genomes, and complete genomes. Fifty percent of all enzymes are associated with approximately 15% of all partial genomes, approximately 60% of all complete genomes and approximately 75% of the nr categories used in this study. Enzymes are more highly represented than genes overall within the partial and complete genome datasets. (b) Relationship of enzyme coverage between the partial and complete genome datasets. Each point indicates a discrete enzyme (color indicates superclass membership; see inset key in (d)). Enzymes involved in secondary metabolism appear to be more highly represented in the partial genome datasets than in the complete genome datasets. (c,d) As for (b), but showing the relationship of enzyme coverage between the nr dataset and the complete and partial genome datasets, respectively.
Peregrín-Alvarez et al. Genome Biology 2009 10:R63 doi:10.1186/gb-2009-10-6-r63