The Adaptive Evolution Database (TAED)
1 Departments of Chemistry
2 Anatomy and Cell Biology, University of Florida, Gainesville, FL 32611, USA
3 Bioinformatics Division, EraGen Biosciences, 12085 Research Drive, Alachua, FL 32615, USA
4 Department of Biochemistry and Biophysics and Stockholm Bioinformatics Center, Stockholm University, 10691 Stockholm, Sweden
5 Maxygen, 515 Galveston Drive, Redwood City, CA 94063, USA
Genome Biology 2001, 2:research0028-research0028.6 doi:10.1186/gb-2001-2-8-research0028
A previous version of this manuscript was made available before peer review at http://genomebiology.com/2001/2/4/preprint/0003/
Published: 24 July 2001
The Master Catalog is a collection of evolutionary families, including multiple sequence alignments, phylogenetic trees and reconstructed ancestral sequences, for all protein-sequence modules encoded by genes in GenBank. It can therefore support large-scale genomic surveys, one of which, The Adaptive Evolution Database (TAED), we present here. In TAED, potential examples of positive adaptation are identified by high values of the normalized ratio of nonsynonymous to synonymous nucleotide substitution rates (KA/KS) on branches of an evolutionary tree between nodes representing reconstructed ancestral sequences.
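To make the branch statistic concrete, here is a minimal Python sketch of computing KA/KS for one branch, given the aligned, gap-free coding sequences at its two ends (for example a reconstructed ancestor and its descendant). The abstract does not say which estimator TAED uses; this sketch assumes simple Nei-Gojobori-style site and pathway counting with a Jukes-Cantor correction, and the functions `syn_sites`, `codon_differences`, `jukes_cantor` and `ka_ks` are hypothetical names chosen for illustration.

```python
import math
from itertools import permutations

BASES = "TCAG"
# Standard genetic code with codons in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def syn_sites(codon):
    """Synonymous sites in a codon: the fraction of single-base changes at
    each position that keep the amino acid, summed over the three positions."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1 / 3
    return s


def codon_differences(c1, c2):
    """(synonymous, nonsynonymous) differences between two sense codons,
    averaged over the mutational pathways that avoid stop codons."""
    diff = [p for p in range(3) if c1[p] != c2[p]]
    if not diff:
        return 0.0, 0.0
    syn = non = 0.0
    paths = 0
    for order in permutations(diff):
        cur, s, n, ok = c1, 0, 0, True
        for p in order:
            nxt = cur[:p] + c2[p] + cur[p + 1:]
            if CODE[nxt] == "*":  # pathway passes through a stop codon
                ok = False
                break
            if CODE[nxt] == CODE[cur]:
                s += 1
            else:
                n += 1
            cur = nxt
        if ok:
            syn, non, paths = syn + s, non + n, paths + 1
    if paths == 0:  # every pathway hit a stop: count all changes as nonsynonymous
        return 0.0, float(len(diff))
    return syn / paths, non / paths


def jukes_cantor(p):
    """Multiple-hit correction; saturated (p >= 0.75) is returned as inf."""
    return float("inf") if p >= 0.75 else -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(ancestor, descendant):
    """KA/KS on one branch, for aligned, gap-free coding sequences without stop codons."""
    if len(ancestor) != len(descendant) or len(ancestor) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    pairs = [(ancestor[i:i + 3], descendant[i:i + 3])
             for i in range(0, len(ancestor), 3)]
    # Sites are averaged over the two sequences, as in Nei-Gojobori counting.
    S = sum((syn_sites(a) + syn_sites(d)) / 2 for a, d in pairs)
    N = 3 * len(pairs) - S
    Sd = Nd = 0.0
    for a, d in pairs:
        s, n = codon_differences(a, d)
        Sd, Nd = Sd + s, Nd + n
    ka = jukes_cantor(Nd / N)
    ks = jukes_cantor(Sd / S) if S > 0 else 0.0
    if ks == 0:
        return float("inf") if ka > 0 else float("nan")
    return ka / ks
```

A value above 1 means that, on that branch, nonsynonymous changes accumulated faster per nonsynonymous site than synonymous changes per synonymous site, which is the kind of signal the survey screens for. Short sequences saturate quickly under this correction, so real analyses would use full-length alignments.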
Evolutionary trees and reconstructed ancestral sequences were extracted from the Master Catalog for every subtree containing proteins from the Chordata only or from the Embryophyta only, and branches with high KA/KS values were identified. These branches represent candidate episodes in the history of a protein family in which the protein may have undergone positive selection, that is, in which the mutant form conferred greater fitness than the ancestral form. Such episodes are frequently associated with a change in function. An unexpectedly large number of families (between 10% and 20% of those examined) had at least one branch with a KA/KS value above arbitrarily chosen cut-offs (1 and 0.6). Most of these survived a robustness test and were collected into TAED.
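The screening step described above can be sketched as follows, assuming every branch already carries a KA/KS value. The `Node` class, its fields and the helper functions are hypothetical illustrations, not the Master Catalog's actual data format, and the robustness test is not modeled. The sketch finds the largest subtrees whose leaves all come from one lineage (for example Chordata) and flags branches inside them whose KA/KS exceeds a cut-off such as 1 or 0.6.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node; `ka_ks` is the value on the branch leading
    into this node from its parent (None at the root)."""
    name: str
    children: List["Node"] = field(default_factory=list)
    lineage: Optional[str] = None  # set on leaves, e.g. "Chordata" or "Embryophyta"
    ka_ks: Optional[float] = None


def leaf_lineages(node):
    """Set of lineage labels found on the leaves below `node`."""
    if not node.children:
        return {node.lineage}
    return set().union(*(leaf_lineages(c) for c in node.children))


def maximal_subtrees(node, lineage):
    """Largest subtrees whose leaves all belong to `lineage`."""
    if leaf_lineages(node) == {lineage}:
        return [node]
    return [s for c in node.children for s in maximal_subtrees(c, lineage)]


def high_branches(subtree, cutoff):
    """Names of nodes inside `subtree` whose incoming branch has KA/KS > cutoff.
    The branch leading into the subtree root itself is not examined."""
    hits, stack = [], list(subtree.children)
    while stack:
        node = stack.pop()
        if node.ka_ks is not None and node.ka_ks > cutoff:
            hits.append(node.name)
        stack.extend(node.children)
    return hits


def family_is_candidate(tree, lineage, cutoff):
    """True if any lineage-restricted subtree has at least one high branch."""
    return any(high_branches(s, cutoff) for s in maximal_subtrees(tree, lineage))


# Toy family (invented values): two chordate leaves and one fungal outgroup.
tree = Node("root", children=[
    Node("A", ka_ks=0.3, children=[
        Node("human", lineage="Chordata", ka_ks=1.4),
        Node("mouse", lineage="Chordata", ka_ks=0.8),
    ]),
    Node("yeast", lineage="Fungi", ka_ks=0.5),
])
```

In the toy tree, the only Chordata-restricted subtree is rooted at `A`; at a cut-off of 1 only the branch to `human` is flagged, while at 0.6 the branch to `mouse` is flagged as well. Counting the fraction of families for which `family_is_candidate` returns true at each cut-off mirrors the kind of percentage reported above.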
TAED is a raw resource for bioinformaticists interested in data mining and for experimental evolutionists seeking candidate examples of adaptive evolution for further experimental study. It can be expanded to include other evolutionary information (for example changes in gene regulation or splicing) placed in a phylogenetic perspective.