Small variable segments constitute a major type of diversity of bacterial genomes at the species level
1 INRA, UMR1319, Micalis, Bat 222, Jouy en Josas, 78350, France
2 INSERM U722 and Université Paris 7, Faculté de Médecine, Site Xavier Bichat, Paris, 75018, France
3 CNRS-UMR 8030 & CEA/IG/Genoscope, Laboratoire d'Analyses Bioinformatiques en Génomique et Métabolisme (LABGeM), rue Gaston Crémieux, Evry, 91057, France
4 CEA, Institut de Génomique, Genoscope, rue Gaston Crémieux, Evry, 91057, France
Genome Biology 2010, 11:R45 doi:10.1186/gb-2010-11-4-r45. Published: 30 April 2010
Analysis of large-scale diversity in bacterial genomes has mainly focused on elements such as pathogenicity islands or, more generally, genomic islands. These comprise numerous genes and confer important phenotypes, and they are present or absent depending on the strain. We report that, despite this widely accepted notion, most diversity at the species level is composed of much smaller DNA segments, 20 to 500 bp in size, which we call microdiversity.
We performed a systematic analysis of the variable segments detected by multiple whole-genome alignments at the DNA level in the three species for which the greatest number of genomes have been sequenced: Escherichia coli, Staphylococcus aureus, and Streptococcus pyogenes. Among the numerous sites of variability, 62 to 73% were loci of microdiversity, many of which were located within genes. These loci contribute to phenotypic variation, as 3 to 6% of all genes harbor microdiversity, and 1 to 9% of all genes are located downstream from a microdiversity locus. Microdiversity loci are particularly abundant in genes encoding membrane proteins. In-depth analysis of the E. coli alignments shows that most of the diversity does not correspond to known mobile or repeated elements and was probably generated by illegitimate recombination. An intriguing class of microdiversity includes small blocks of highly diverged sequences, whose origin is discussed.
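To make the size criterion concrete, the following is a minimal sketch of how variable segments could be sorted into microdiversity and larger segments and how the microdiversity share could be computed. It is not the pipeline used in this study. The function names and the example lengths are hypothetical, and treating the 20 and 500 bp bounds as inclusive is an assumption; only the 20 to 500 bp range itself comes from the text.

```python
from typing import Iterable

# Size range given in the text for microdiversity; inclusive bounds are an assumption.
MICRO_MIN_BP = 20
MICRO_MAX_BP = 500


def is_microdiversity(length_bp: int) -> bool:
    """Return True if a variable segment length falls in the 20 to 500 bp range."""
    return MICRO_MIN_BP <= length_bp <= MICRO_MAX_BP


def microdiversity_fraction(segment_lengths: Iterable[int]) -> float:
    """Fraction of variable segments that qualify as microdiversity.

    Every length passed in counts toward the denominator, including
    segments shorter than 20 bp (an assumption of this sketch).
    """
    lengths = list(segment_lengths)
    if not lengths:
        return 0.0
    micro = sum(1 for n in lengths if is_microdiversity(n))
    return micro / len(lengths)


if __name__ == "__main__":
    # Hypothetical segment lengths in bp, not data from the study.
    example_lengths = [35, 120, 480, 1500, 42000, 8, 250, 60]
    share = microdiversity_fraction(example_lengths)
    print(f"Microdiversity share of variable segments: {share:.1%}")
```

In practice the segment lengths would come from the variable regions reported by a multiple whole-genome aligner; how those regions are extracted is outside the scope of this sketch.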
This analysis uncovers the importance of this small-sized genome diversity, which we expect to be present in a wide range of bacteria, and possibly also in many eukaryotic genomes.