
Community transcriptomics reveals universal patterns of protein sequence conservation in natural microbial communities

Frank J Stewart (1), Adrian K Sharma (2), Jessica A Bryant (2), John M Eppley (2) and Edward F DeLong (2)*

Author Affiliations

1 School of Biology, Georgia Institute of Technology, Ford ES&T Building, Rm 1242, 311 Ferst Drive, Atlanta, GA 30332, USA

2 Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Parsons Laboratory 48, 15 Vassar Street, Cambridge, MA 02139, USA


Genome Biology 2011, 12:R26  doi:10.1186/gb-2011-12-3-r26

Published: 22 March 2011



Combined metagenomic and metatranscriptomic datasets make it possible to study the molecular evolution of diverse microbial species recovered from their native habitats. The link between gene expression level and sequence conservation was examined using shotgun pyrosequencing of microbial community DNA and RNA from diverse marine environments and from forest soil.


Across all samples, expressed genes (those with transcripts detected in the RNA sample) were significantly more conserved than non-expressed genes, as measured by similarity to their best matches in reference databases. This discrepancy, observed for many diverse individual genomes and across entire communities, coincided with a shift in amino acid usage between the two gene fractions. Expressed genes trended toward GC-enriched amino acids, consistent with a hypothesis of higher levels of functional constraint in this gene pool. Highly expressed genes were significantly more likely to fall within an orthologous gene set shared between closely related taxa (core genes). However, non-core genes, when expressed above the level of detection, were on average significantly more highly expressed than core genes, based on transcript abundance normalized to gene abundance. Finally, expressed genes showed broad functional similarities across samples: they were relatively enriched in genes of energy metabolism and depleted in genes of cell growth.


These patterns support the hypothesis, predicated on studies of model organisms, that gene expression level is a primary correlate of evolutionary rate across diverse microbial taxa from natural environments. Despite their complexity, meta-omic datasets can reveal broad evolutionary patterns across taxonomically, functionally, and environmentally diverse communities.
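
As an illustration of what "transcript abundance normalized to gene abundance" can mean in practice, the following is a minimal sketch, not the authors' pipeline. It assumes that each gene's read counts are first scaled by the total number of reads in the RNA and DNA datasets and that the two proportions are then divided. The function name `expression_ratio` and its parameters are hypothetical, and the abstract does not specify the exact normalization used.

```python
def expression_ratio(rna_hits, rna_total, dna_hits, dna_total):
    """Per-gene expression ratio under an assumed normalization.

    rna_hits  : reads in the RNA (metatranscriptome) dataset matching the gene
    rna_total : total reads in the RNA dataset
    dna_hits  : reads in the DNA (metagenome) dataset matching the gene
    dna_total : total reads in the DNA dataset

    All names are illustrative; the scaling by dataset size is an assumption.
    """
    if rna_total <= 0 or dna_total <= 0:
        raise ValueError("dataset totals must be positive")
    if dna_hits <= 0:
        # Without any DNA reads the gene abundance is unknown,
        # so the ratio is undefined.
        raise ValueError("gene must be detected in the DNA dataset")
    rna_fraction = rna_hits / rna_total   # relative transcript abundance
    dna_fraction = dna_hits / dna_total   # relative gene abundance
    return rna_fraction / dna_fraction


# Usage: a gene with 8 of 1024 RNA reads and 2 of 1024 DNA reads
ratio = expression_ratio(8, 1024, 2, 1024)
print(ratio)
```

Under this assumed definition, a ratio above 1 indicates that a gene is proportionally more abundant in the transcript pool than in the gene pool, and a gene with zero RNA reads (a non-expressed gene) has a ratio of 0.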