
Structure and dynamics of the pan-genome of Streptococcus pneumoniae and closely related species

Claudio Donati1*, N Luisa Hiller2, Hervé Tettelin3, Alessandro Muzzi1, Nicholas J Croucher4, Samuel V Angiuoli3, Marco Oggioni5, Julie C Dunning Hotopp3, Fen Z Hu2, David R Riley3, Antonello Covacci1, Tim J Mitchell6, Stephen D Bentley4, Morgens Kilian7, Garth D Ehrlich2, Rino Rappuoli1, E Richard Moxon8 and Vega Masignani1

Author Affiliations

1 Novartis Vaccines and Diagnostics, Via Fiorentina 1, 53100 Siena, Italy

2 Allegheny General Hospital, Allegheny-Singer Research Institute, Center for Genomic Sciences, Pittsburgh, Pennsylvania 152123, USA

3 Institute for Genome Sciences, Department of Microbiology and Immunology, University of Maryland School of Medicine, 801 West Baltimore Street, MD 21201, USA

4 The Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK

5 Laboratorio di Microbiologia Molecolare e Biotecnologia, Dipartimento di Biologia Molecolare, Università di Siena, Policlinico Le Scotte, 53100 Siena, Italy

6 Division of Infection and Immunity, Glasgow Biomedical Research Centre, University of Glasgow, 120 University Place, Glasgow G12 8TA, UK

7 Institute of Medical Microbiology and Immunology, Aarhus University, DK-8000 Aarhus, Denmark

8 University of Oxford Department of Paediatrics, Medical Sciences Division, John Radcliffe Hospital, Headington OX3 9DU, UK


Genome Biology 2010, 11:R107  doi:10.1186/gb-2010-11-10-r107

Published: 29 October 2010

Abstract

Background

Streptococcus pneumoniae is one of the most important causes of microbial diseases in humans. The genomes of 44 diverse strains of S. pneumoniae were analyzed and compared with strains of non-pathogenic streptococci of the Mitis group.

Results

Despite evidence of extensive recombination, the S. pneumoniae phylogenetic tree revealed six major lineages. With the exception of serotype 1, the tree correlated poorly with capsular serotype, geographical site of isolation and disease outcome. The distribution of dispensable genes - genes present in more than one strain but not in all strains - was consistent with phylogeny, although horizontal gene transfer events attenuated this correlation in the case of ancient lineages. Homologous recombination, involving short stretches of DNA, was the dominant evolutionary process of the core genome of S. pneumoniae. Genetic exchange occurred both within and across the borders of the species, and S. mitis was the main reservoir of genetic diversity of S. pneumoniae. The pan-genome size of S. pneumoniae increased logarithmically with the number of strains and linearly with the number of polymorphic sites of the sampled genomes, suggesting that acquired genes accumulate proportionately to the age of clones. Most genes associated with pathogenicity were shared by all S. pneumoniae strains, but were also present in S. mitis, S. oralis and S. infantis, indicating that these genes are not sufficient to determine virulence.
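As an illustration of the terms used above, the following minimal sketch classifies genes as core (present in all strains), dispensable (present in more than one strain but not in all) or strain-specific, and tracks how the pan-genome grows as strains are added. It is not the pipeline used in this study: the function names, the toy strain labels and the gene identifiers are hypothetical, and it assumes that genes have already been grouped into orthologous clusters, so that each strain is simply a set of cluster identifiers.

```python
from typing import Dict, List, Set, Tuple


def classify_genes(
    strain_genes: Dict[str, Set[str]]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split genes into core, dispensable and strain-specific sets."""
    n_strains = len(strain_genes)
    # Count in how many strains each gene cluster occurs.
    counts: Dict[str, int] = {}
    for genes in strain_genes.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    core = {g for g, c in counts.items() if c == n_strains}
    dispensable = {g for g, c in counts.items() if 1 < c < n_strains}
    specific = {g for g, c in counts.items() if c == 1}
    return core, dispensable, specific


def pan_genome_curve(
    strain_genes: Dict[str, Set[str]], order: List[str]
) -> List[int]:
    """Pan-genome size after adding each strain in the given order."""
    seen: Set[str] = set()
    sizes: List[int] = []
    for strain in order:
        seen |= strain_genes[strain]  # union with the genes seen so far
        sizes.append(len(seen))
    return sizes


# Hypothetical toy data: three strains, five gene clusters.
toy = {
    "strainA": {"g1", "g2", "g3"},
    "strainB": {"g1", "g2", "g4"},
    "strainC": {"g1", "g3", "g5"},
}
core, dispensable, specific = classify_genes(toy)
curve = pan_genome_curve(toy, ["strainA", "strainB", "strainC"])
```

In this toy example only `g1` is core, `g2` and `g3` are dispensable, and `g4` and `g5` are strain-specific. With real data the curve would normally be averaged over many random strain orderings before a growth law is fitted to it.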

Conclusions

Genetic exchange with related species sharing the same ecological niche is the main mechanism of evolution of S. pneumoniae. The open pan-genome guarantees the species a quick and economical response to diverse environments.