Significance and context
Determining the effects of a mutation on a protein's structure and function has until recently only been possible by laborious physical and biological characterization of the mutant protein. With the advent of databases containing vast amounts of information on DNA and protein sequences and on protein structure and function, computational methods to speed up the assessment of the effects of mutation can now be envisaged. Sunyaev et al. describe a computational method for assessing the impact of amino-acid substitutions on the structure and function of a protein, using a wide range of information from the databases. This should in future enable a provisional prediction of the effects of a newly identified mutation or polymorphism to be made more rapidly.
The method (see Methodological innovations for details) was first evaluated using a set of known deleterious amino-acid replacements in human proteins. The analysis produced 10-30% false negatives; that is, known deleterious changes that were predicted not to be deleterious. On analysis of a data set consisting of human proteins and their orthologs from other mammals, the method predicted that about 9% of the substitutions are damaging, thus providing an estimate of the rate of false-positive prediction (base substitutions that have been fixed in functional proteins in different lineages cannot be considered deleterious). When the method was applied to data on well-characterized polymorphisms (polymorphisms in which the allele frequency, three-dimensional structure of the protein and disease association are all known), 73% of damaging changes were successfully predicted. The authors also estimate that an average human genome carries approximately 20,000 heterozygous changes with respect to a human consensus sequence; of these 'substitutions', about 2,000 would be predicted to be deleterious. The authors provide theoretical and experimental evidence, however, that most of the deleterious amino-acid replacements are not expected to abolish protein function. Much work is still required to improve prediction performance, even when the method takes all currently available information into account.
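The two error rates described above come from two different data sets, and the bookkeeping can be sketched as follows. This is only an illustration: the function names and the toy prediction lists are assumptions made for the example and are not data from the paper; only the 20,000 and 2,000 figures are taken from the text.

```python
def false_negative_rate(predicted_damaging):
    """Fraction of KNOWN deleterious replacements that were predicted not damaging.

    `predicted_damaging` holds one boolean per replacement in a set of
    known deleterious changes (True = predicted damaging).
    """
    missed = sum(1 for damaging in predicted_damaging if not damaging)
    return missed / len(predicted_damaging)


def false_positive_rate(predicted_damaging):
    """Fraction of substitutions fixed between species that were predicted damaging.

    Fixed substitutions are assumed not to be deleterious, so every
    'damaging' call on this set counts as a false positive.
    """
    flagged = sum(1 for damaging in predicted_damaging if damaging)
    return flagged / len(predicted_damaging)


# Toy inputs (hypothetical, chosen only to make the arithmetic visible).
known_deleterious_calls = [True] * 8 + [False] * 2
fixed_substitution_calls = [True] * 9 + [False] * 91

fn_rate = false_negative_rate(known_deleterious_calls)
fp_rate = false_positive_rate(fixed_substitution_calls)

# Figures quoted in the text: about 2,000 of roughly 20,000 heterozygous
# changes per genome would be predicted deleterious.
predicted_fraction = 2_000 / 20_000

print(f"false-negative rate: {fn_rate:.2f}")
print(f"false-positive rate: {fp_rate:.2f}")
print(f"fraction predicted deleterious per genome: {predicted_fraction:.2f}")
```

The figures in the text imply that roughly one in ten heterozygous changes would be predicted deleterious, which is close to the estimated false-positive rate of about 9%.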
The authors propose a combined prediction strategy that explores the physicochemical effects of each amino-acid change and exploits all the information available in sequence and structure databases. They take into account whether the replacement lies in an annotated active or binding site; whether it affects the interaction with ligands; whether it leads to a change in hydrophobicity or electrostatic charge in a buried site; whether it destroys a disulfide bond; whether it inserts a proline in an α-helix; and whether it is incompatible with the profile of the substitutions observed in a set of aligned homologous proteins. An evolutionary conservation analysis has also been integrated into the prediction tool. For this, a set of proteins homologous (more than 30% identity) to the ones under analysis is assembled and a sequence profile is extracted from the aligned sequences. A scoring system is then used to evaluate whether the amino-acid replacement alters protein structure and/or function.
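The combination of structural rules and a profile check described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' scoring system: the `SiteAnnotation` fields, the log-ratio profile score, the pseudocount and the threshold of 1.0 are all hypothetical choices made for the example.

```python
import math
from collections import Counter
from dataclasses import dataclass

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(alignment, position, pseudocount=1.0):
    """Pseudocount-smoothed amino-acid frequencies in one alignment column."""
    counts = Counter(seq[position] for seq in alignment if seq[position] in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def profile_score(alignment, position, wild_type, variant):
    """Log-ratio of wild-type to variant frequency (assumed scoring, for illustration).

    A large positive value means the variant residue is rarely or never
    seen at this position among the aligned homologs.
    """
    freqs = column_frequencies(alignment, position)
    return math.log(freqs[wild_type] / freqs[variant])


@dataclass
class SiteAnnotation:
    """Hypothetical per-site flags that would come from database annotation."""
    in_active_or_binding_site: bool = False
    contacts_ligand: bool = False
    buried: bool = False
    changes_charge_or_hydrophobicity: bool = False
    in_disulfide_bond: bool = False
    in_alpha_helix: bool = False


def predict_damaging(site, variant, score, threshold=1.0):
    """Return the list of reasons a replacement is flagged; empty means not flagged."""
    reasons = []
    if site.in_active_or_binding_site:
        reasons.append("annotated active or binding site")
    if site.contacts_ligand:
        reasons.append("affects interaction with a ligand")
    if site.buried and site.changes_charge_or_hydrophobicity:
        reasons.append("charge or hydrophobicity change at a buried site")
    if site.in_disulfide_bond:
        reasons.append("destroys a disulfide bond")
    if site.in_alpha_helix and variant == "P":
        reasons.append("proline introduced into an alpha-helix")
    if score > threshold:
        reasons.append("incompatible with the homolog profile")
    return reasons


# Toy alignment of homologs (hypothetical sequences); column 2 is a conserved Cys.
alignment = ["MKCLA", "MKCLA", "MRCIA", "MKCLG"]
score = profile_score(alignment, position=2, wild_type="C", variant="P")
site = SiteAnnotation(in_disulfide_bond=True)
print(predict_damaging(site, "P", score))
```

Returning the list of triggered rules rather than a bare yes/no keeps the prediction interpretable, which matters when the structural annotation for a site is incomplete and only the profile check can be applied.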
Unfortunately, ab initio predictions of the impact of an amino-acid replacement are not yet possible (if indeed they ever will be) and, at the very minimum, homologous protein sequence data are required for the method presented here to work. Although this might be a problem today, it is likely to become less so in the near future, as the sequences of more genomes become available. From the data and analyses presented in the paper, it is clear that each of us has always been, and will always be, an 'average' individual from a selective point of view.