Method

EMIRGE: reconstruction of full-length ribosomal genes from microbial community short read sequencing data

Christopher S Miller1*, Brett J Baker1,2, Brian C Thomas1, Steven W Singer3,4 and Jillian F Banfield1,5*

Author Affiliations

1 Department of Earth and Planetary Science, University of California, Berkeley, 307 McCone Hall #4767, Berkeley, CA 94720, USA

2 Current address: Department of Geological Sciences, University of Michigan, 1100 N. University Ave, Ann Arbor, MI 48109, USA

3 Earth Sciences Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Mail Stop 90-R1116, Berkeley, CA 94720, USA

4 Deconstruction Division, Joint BioEnergy Institute, 5885 Hollis St, Emeryville, CA 94660, USA

5 Department of Environmental Science, Policy and Management, University of California, Berkeley, 336 Hilgard Hall, Berkeley, CA 94720, USA


Genome Biology 2011, 12:R44 doi:10.1186/gb-2011-12-5-r44

Published: 19 May 2011


Recovery of ribosomal small subunit genes by assembly of short-read community DNA sequence data generally fails, making taxonomic characterization difficult. Here, we solve this problem with a novel iterative method, based on the expectation-maximization (EM) algorithm, that reconstructs full-length small subunit gene sequences and provides estimates of relative taxon abundances. We apply the method to natural and simulated microbial communities and correctly recover community structure from known and previously unreported rRNA gene sequences. An implementation of the method is freely available online.
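The abstract describes the method only as iterative and based on expectation maximization. As a rough, non-authoritative illustration of how EM can turn ambiguous read assignments into relative abundance estimates, the sketch below treats each read as drawn from a mixture of fixed candidate sequences. The function name `em_abundances`, its parameters, and the input matrix `likelihoods[r][k]` (assumed to hold P(read r | candidate k), for example from read mapping) are hypothetical and not part of the published tool. The full method also reconstructs the candidate sequences themselves; this sketch holds them fixed and estimates only the mixture weights.

```python
def em_abundances(likelihoods, n_iter=100, tol=1e-8):
    """Estimate relative abundances (mixture weights) by expectation-maximization.

    Hypothetical helper: likelihoods[r][k] is assumed to be P(read r | candidate k).
    Returns one weight per candidate; the weights sum to 1.
    """
    n_cands = len(likelihoods[0])
    # Start from uniform priors over the candidate sequences.
    pi = [1.0 / n_cands] * n_cands
    for _ in range(n_iter):
        # E-step: accumulate each read's posterior probability of
        # originating from each candidate, given the current priors.
        counts = [0.0] * n_cands
        for row in likelihoods:
            weighted = [p * lik for p, lik in zip(pi, row)]
            total = sum(weighted)
            if total == 0.0:
                continue  # read explained by no candidate; ignore it
            for k, w in enumerate(weighted):
                counts[k] += w / total
        assigned = sum(counts)
        if assigned == 0.0:
            raise ValueError("no read is explained by any candidate")
        # M-step: each new prior is the expected fraction of reads per candidate.
        new_pi = [c / assigned for c in counts]
        converged = max(abs(a - b) for a, b in zip(pi, new_pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi


# Usage: three reads fit candidate 0, one fits candidate 1, and one fits both equally.
reads = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(em_abundances(reads))  # converges to approximately [0.75, 0.25]
```

The ambiguous fifth read is not assigned to either candidate outright; each iteration splits it in proportion to the current abundance estimates, which is the core idea that lets EM resolve reads shared between closely related sequences.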