Comparative analysis of tandem repeats from hundreds of species reveals unique insights into centromere evolution
1 Department of Molecular and Cell Biology and Genome Center, University of California, Davis, 1 Shields Ave, Davis, CA 95616, USA
2 Department of Plant Biology, University of California, Davis, 1 Shields Ave, Davis, CA 95616, USA
3 USDA-ARS, Western Regional Research Center, 800 Buchanan St, Albany, CA 94710, USA
4 Department of Evolution and Ecology, University of California, Davis, 1 Shields Ave, Davis, CA 95616, USA
5 Department of Biochemistry and Biophysics, University of California, San Francisco, 1700 4th St, San Francisco, CA 94158, USA
6 Pacific Biosciences, 1380 Willow Rd, Menlo Park, CA 94025, USA
7 Department of Animal Production and Health, Universidade Estadual Paulista, IAEA Collaborating Centre in Animal Genomics and Bioinformatics, Rua Clóvis Pestana, 793, 16050-680, Araçatuba, SP, Brazil
8 Howard Hughes Medical Institute, 4000 Jones Bridge Rd, Chevy Chase, MD 20815, USA
9 USDA-ARS, US Meat Animal Research Center, State Spur 18D, Clay Center, NE 68933, USA
10 Department of Plant Sciences, Center for Population Biology, and Genome Center, University of California, Davis, 1 Shields Ave, Davis, CA 95616, USA
Genome Biology 2013, 14:R10. doi:10.1186/gb-2013-14-1-r10. Published: 30 January 2013
Centromeres are essential for chromosome segregation, yet their DNA sequences evolve rapidly. In most animals and plants that have been studied, centromeres contain megabase-scale arrays of tandem repeats. Despite their importance, very little is known about the degree to which centromere tandem repeats share common properties between species in different phyla. We used bioinformatic methods to identify high-copy tandem repeats from 282 species, using publicly available genomic sequence data and our own data.
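The abstract does not describe how tandem repeats are detected in reads. As a minimal illustration of the underlying idea, and not the authors' pipeline, the sketch below estimates the monomer period of a read by comparing it against a shifted copy of itself. In a tandem array, bases p positions apart agree when p equals the monomer length. The function names, the 0.8 match threshold, the `min_copies` requirement, and the 2,000 bp cap on the period are hypothetical choices. The threshold is set below 1 to tolerate sequencing errors, and the cap is set above the longest monomer reported here (1,419 bp).

```python
from typing import Optional


def match_fraction(seq: str, period: int) -> float:
    """Fraction of positions i where seq[i] equals seq[i + period]."""
    n = len(seq) - period
    if n <= 0:
        return 0.0
    matches = sum(1 for i in range(n) if seq[i] == seq[i + period])
    return matches / n


def estimate_period(seq: str, max_period: int = 2000,
                    min_copies: int = 2, threshold: float = 0.8) -> Optional[int]:
    """Smallest period p whose self-match fraction reaches the threshold.

    Hypothetical heuristic: p must allow at least `min_copies` copies
    within the read. Returns None if no period qualifies, for example
    when the sequence is not repetitive.
    """
    seq = seq.upper()
    upper = min(max_period, len(seq) // min_copies)
    for p in range(1, upper + 1):
        if match_fraction(seq, p) >= threshold:
            return p
    return None
```

For example, `estimate_period("ACGTTGCAAGGCTTAC" * 10)` returns 16. This brute-force scan costs O(read length × max period). A production tool would use a faster, alignment-based scoring instead.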
Our methods are compatible with all current sequencing technologies. Long Pacific Biosciences sequence reads allowed us to find tandem repeat monomers up to 1,419 bp. We assumed that the most abundant tandem repeat is the centromere DNA. This was true for most species whose centromeres have been previously characterized, suggesting it is a general property of genomes. High-copy centromere tandem repeats were found in almost all animal and plant genomes, but repeat monomers were highly variable in sequence composition and length. Furthermore, phylogenetic analysis of sequence homology showed little evidence of sequence conservation beyond approximately 50 million years of divergence. We found that despite an overall lack of sequence conservation, centromere tandem repeats from diverse species showed similar modes of evolution.
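The "most abundant tandem repeat" assumption can be sketched as follows. The sketch assumes that tandem-repeat calls have already been reduced to hypothetical (monomer, array length in bp) pairs. The start point of a monomer within an array is arbitrary, and reads come from either strand. Each monomer is therefore first collapsed to a canonical form: the lexicographically smallest rotation of the monomer or of its reverse complement. Array lengths are then summed per canonical monomer, and the top-ranked one is the putative centromere repeat. This illustration groups only exact matches. Real monomers vary in sequence, so an actual analysis would also cluster similar monomers, which this sketch does not attempt.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_monomer(monomer: str) -> str:
    """Smallest rotation of the monomer or of its reverse complement.

    Collapses the arbitrary start point of a monomer within an array
    and the strand a read came from, so equivalent calls share one key.
    """
    monomer = monomer.upper()
    rotations = []
    for strand in (monomer, reverse_complement(monomer)):
        rotations.extend(strand[i:] + strand[:i] for i in range(len(strand)))
    return min(rotations)


def most_abundant_repeat(
    calls: Iterable[Tuple[str, int]]
) -> Optional[Tuple[str, int]]:
    """Sum array lengths (bp) per canonical monomer and return the top one."""
    totals: Counter = Counter()
    for monomer, length_bp in calls:
        totals[canonical_monomer(monomer)] += length_bp
    if not totals:
        return None
    return totals.most_common(1)[0]
```

For example, with the calls `[("AATGG", 500), ("TGGAA", 300), ("CCATT", 400), ("GATC", 600)]`, the first three are the same repeat up to rotation and strand. `most_abundant_repeat` therefore returns `("AATGG", 1200)`, even though no single call exceeds the 600 bp of `GATC`.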
While centromere position in most eukaryotes is epigenetically determined, our results indicate that tandem repeats are highly prevalent at centromeres of both animal and plant genomes. This suggests a functional role for such repeats, perhaps in promoting concerted evolution of centromere DNA across chromosomes.