
The genome of Rhizobium leguminosarum has recognizable core and accessory components

J Peter W Young1*, Lisa C Crossman2, Andrew WB Johnston3, Nicholas R Thomson2, Zara F Ghazoui1, Katherine H Hull1, Margaret Wexler3, Andrew RJ Curson3, Jonathan D Todd3, Philip S Poole4, Tim H Mauchline4, Alison K East4, Michael A Quail2, Carol Churcher2, Claire Arrowsmith2, Inna Cherevach2, Tracey Chillingworth2, Kay Clarke2, Ann Cronin2, Paul Davis2, Audrey Fraser2, Zahra Hance2, Heidi Hauser2, Kay Jagels2, Sharon Moule2, Karen Mungall2, Halina Norbertczak2, Ester Rabbinowitsch2, Mandy Sanders2, Mark Simmonds2, Sally Whitehead2 and Julian Parkhill2

Author Affiliations

1 Department of Biology, University of York, York, UK

2 The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK

3 School of Biological Sciences, University of East Anglia, Norwich, UK

4 School of Biological Sciences, University of Reading, Reading, UK

Genome Biology 2006, 7:R34  doi:10.1186/gb-2006-7-4-r34

Published: 26 April 2006



Rhizobium leguminosarum is an α-proteobacterial N₂-fixing symbiont of legumes that has been the subject of more than a thousand publications. Genes for the symbiotic interaction with plants are well studied, but the adaptations that allow survival and growth in the soil environment are poorly understood. We have sequenced the genome of R. leguminosarum biovar viciae strain 3841.


The 7.75 Mb genome comprises a circular chromosome and six circular plasmids, with an overall G+C content of 61%. All three rRNA operons and 52 tRNA genes are on the chromosome; essential protein-encoding genes are largely chromosomal, but most functional classes occur on plasmids as well. Of the 7,263 protein-encoding genes, 2,056 had orthologs in each of three related genomes (Agrobacterium tumefaciens, Sinorhizobium meliloti, and Mesorhizobium loti), and these genes were over-represented on the chromosome and had above-average G+C content. Most supported the rRNA-based phylogeny, confirming that A. tumefaciens is the closest of these relatives, but 347 genes were incompatible with this phylogeny; these were scattered throughout the genome but were over-represented on the plasmids. An unexpectedly large number of genes were shared by all three rhizobia but were missing from A. tumefaciens.


Overall, the genome can be considered to have two main components: a 'core', which is higher in G+C, is mostly chromosomal, is shared with related organisms, and has a consistent phylogeny; and an 'accessory' component, which is sporadic in distribution, lower in G+C, and located on the plasmids and in chromosomal islands. The accessory genome has a different nucleotide composition from the core despite a long history of coexistence.
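
The G+C comparison between core and accessory components rests on a simple ratio: the number of G and C bases divided by the number of unambiguous bases. The following is a minimal Python sketch of that ratio, not the method used in the study. The function name `gc_fraction` and both toy sequences are hypothetical and chosen only for illustration; they are not taken from strain 3841.

```python
def gc_fraction(sequence: str) -> float:
    """Return the fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at  # ambiguous symbols such as N are ignored
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return gc / total


# Toy sequences invented for illustration only (assumption, not real data).
toy_core = "GCGCGGCCATGCGCGACGGC"
toy_accessory = "ATGCATTAGCATTAGGCATA"

if __name__ == "__main__":
    print(f"toy core G+C:      {gc_fraction(toy_core):.2f}")
    print(f"toy accessory G+C: {gc_fraction(toy_accessory):.2f}")
```

Applied per gene or per replicon, a ratio of this kind would allow the composition of chromosomal and plasmid sequences to be compared directly.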