<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2008-9-4-r69</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>Natural selection of protein structural and functional properties: a single nucleotide polymorphism perspective</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Liu</snm>
               <fnm>Jinfeng</fnm>
               <insr iid="I1"/>
               <email>jinfengl@gene.com</email>
            </au>
            <au id="A2">
               <snm>Zhang</snm>
               <fnm>Yan</fnm>
               <insr iid="I1"/>
               <email>yz5@gene.com</email>
            </au>
            <au id="A3">
               <snm>Lei</snm>
               <fnm>Xingye</fnm>
               <insr iid="I2"/>
               <email>xingyel@gene.com</email>
            </au>
            <au id="A4" ca="yes">
               <snm>Zhang</snm>
               <fnm>Zemin</fnm>
               <insr iid="I1"/>
               <email>zemin@gene.com</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Bioinformatics, Genentech Inc., 1 DNA Way, South San Francisco, CA 94080, USA</p>
            </ins>
            <ins id="I2">
               <p>Department of Biostatistics, Genentech Inc., 1 DNA Way, South San Francisco, CA 94080, USA</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2008</pubdate>
         <volume>9</volume>
         <issue>4</issue>
         <fpage>R69</fpage>
         <url>http://genomebiology.com/2008/9/4/R69</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">18397526</pubid>
               <pubid idtype="doi">10.1186/gb-2008-9-4-r69</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>20</day>
               <month>3</month>
               <year>2008</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>25</day>
               <month>3</month>
               <year>2008</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>8</day>
               <month>4</month>
               <year>2008</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>08</day>
               <month>04</month>
               <year>2008</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2008</year>
         <collab>Liu et al.; licensee BioMed Central Ltd.</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <shorttitle>
         <p>Measure of selective constraints</p>
      </shorttitle>
      <shortabs>
         <p>A large-scale survey using single nucleotide polymorphism data from dbSNP provides insights into the evolutionary selection constraints on human proteins of different structural and functional categories.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>The rates of molecular evolution for protein-coding genes depend on the stringency of functional or structural constraints. The Ka/Ks ratio has been commonly used as an indicator of selective constraints and is typically calculated from interspecies alignments. Recent accumulation of single nucleotide polymorphism (SNP) data has enabled the derivation of Ka/Ks ratios for polymorphism (SNP A/S ratios).</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Using data from the dbSNP database, we conducted the first large-scale survey of SNP A/S ratios for different structural and functional properties. We confirmed that the SNP A/S ratio is largely correlated with Ka/Ks for divergence. We observed stronger selective constraints for proteins that have high mRNA expression levels or broad expression patterns, have no paralogs, arose earlier in evolution, have natively disordered regions, are located in cytoplasm and nucleus, or are related to human diseases. On the residue level, we found higher degrees of variation for residues that are exposed to solvent, are in a loop conformation, natively disordered regions or low complexity regions, or are in the signal peptides of secreted proteins. Our analysis also revealed that histones and protein kinases are among the protein families that are under the strongest selective constraints, whereas olfactory and taste receptors are among the most variable groups.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>Our study suggests that the SNP A/S ratio is a robust measure for selective constraints. The correlations between SNP A/S ratios and other variables provide valuable insights into the natural selection of various structural or functional properties, particularly for human-specific genes and constraints within the human lineage.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010008">Evolution</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>It is well established that there are tremendous variations in rates of evolution among protein-coding genes. A central problem in molecular evolution is to identify factors that determine the rate of protein evolution. One widely accepted principle is that a major force governing the rate of amino acid substitution is the stringency of functional or structural constraints. Proteins with rigorous functional or structural requirements are subject to strong purifying (negative) selective pressure, resulting in smaller numbers of amino acid changes. Therefore, these proteins tend to evolve more slowly than proteins with weaker constraints. A classic measure for selective pressure on protein-coding genes is the Ka/Ks ratio <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>, that is, the ratio of non-synonymous (amino acid changing) substitutions per non-synonymous site to synonymous (silent) substitutions per synonymous site. The assumption is that synonymous sites are subject to only background nucleotide mutation, whereas non-synonymous sites are subject to both background mutation and amino acid selective pressure. Thus, the ratio of the observed non-synonymous mutation rate (Ka) to the synonymous mutation rate (Ks) can be utilized as an estimate of the selective pressure, where Ka/Ks &#8810; 1 suggests that most amino acid substitutions have been eliminated by selection, that is, strong purifying selection. Ka/Ks ratios for protein-coding genes are generally derived from inter-species sequence alignments and different evolutionary models have been developed to accurately estimate the ratios <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. There have been many studies using Ka/Ks ratios to measure evolutionary constraints among different classes of proteins. 
For example, it has been suggested that essential genes in bacteria evolve more slowly than non-essential genes <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, that house-keeping genes are under stronger selective constraints than tissue-specific genes <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>, and that secreted proteins are under weaker purifying selection based on Ka/Ks ratios from human-mouse sequence alignments <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>.</p>
         <p>In the past few years, advances in sequencing technology have led to a rapid accumulation of DNA variation data for human populations, including copy number variations and single nucleotide polymorphisms (SNPs). Currently, the dbSNP database <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> at the National Center for Biotechnology Information (NCBI) catalogues about 12 million human SNPs, close to half of which are validated. It has also been shown by several independent sequencing studies that dbSNP has high coverage of frequent SNPs <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>. The vast amount of SNP data can not only shed light on the variation in disease susceptibility and drug response among human populations, but also help us understand molecular evolution. In particular, these SNP data have provided us with another way of measuring evolutionary constraints, based on a prediction of the neutral theory of molecular evolution that A/S ratios should be highly correlated between intra-species polymorphism and inter-species divergence <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. In fact, SNP A/S ratios (also referred to as Ka/Ks ratios for polymorphisms) have been calculated to determine whether there is frequent positive selection on the human genome <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>, and they have been compared with Ka/Ks for human-chimpanzee divergence <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. However, it is not clear whether SNP A/S ratios are closely correlated with Ka/Ks in practice given the current volume of SNP data, and there have not been any large-scale studies of selective constraints on protein structural and functional properties using SNP data.</p>
         <p>In the present study, we conducted a large-scale survey of SNP A/S ratios using SNP data from dbSNP. We first confirmed that the SNP A/S ratio is a good measure for selective pressure by showing its correlation with Ka/Ks from inter-species alignments and protein alignment conservation. We then obtained a variety of structural and functional properties from either database annotations or computational prediction methods and analyzed SNP A/S ratios for different classes of proteins and residues in an attempt to study the natural selection of these properties from the SNP perspective. Our comprehensive analysis provides: valuable insight into some features that have not been examined previously; independent confirmation of some previously established results; and additional data for areas where previous studies have had contradictory findings.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <p>We collected 13,686 human genes that have at least one validated coding SNP according to dbSNP. The analysis was limited to validated SNPs to ensure data quality. Overall, 45,538 coding-region SNPs and 1,529,119 intronic SNPs were identified in these genes, corresponding to SNP densities of 2.0 and 2.4 SNPs, respectively, per 1,000 nucleotides. The number of non-synonymous coding SNPs per non-synonymous site (A) is 0.00123, the number of synonymous coding SNPs per synonymous site (S) is 0.00439, and the A/S ratio is 0.28. The values of A and S are both twice as high as those reported in a small study <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, but the A/S ratio is similar.</p>
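         <p>The arithmetic behind the overall ratio can be checked directly: the A/S ratio is the per-site density of non-synonymous SNPs divided by the per-site density of synonymous SNPs. The following is a minimal sketch only. The function name as_ratio is an assumption introduced for illustration, and the inputs are the rounded densities quoted above rather than the underlying SNP and site counts.</p>

```python
def as_ratio(a_density, s_density):
    """Return the SNP A/S ratio from per-site SNP densities.

    a_density: non-synonymous coding SNPs per non-synonymous site (A)
    s_density: synonymous coding SNPs per synonymous site (S)
    """
    if s_density == 0:
        raise ValueError("S must be non-zero to form the A/S ratio")
    return a_density / s_density


# Rounded densities reported in the text for the full data set.
overall = as_ratio(0.00123, 0.00439)
print(round(overall, 2))  # 0.28
```

         <p>Because the inputs are rounded, the sketch reproduces the reported ratio only to two decimal places.</p>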
         <sec>
            <st>
               <p>SNP A/S ratio as a measure for selective constraints</p>
            </st>
            <p>To assess whether SNP A/S ratios from the current large-scale SNP data set provide a good measure for selective constraints, we first compared them with Ka/Ks ratios derived from inter-species alignments. We collected 9,759 human proteins with both validated coding-region SNPs and available human-mouse Ka/Ks data from Ensembl <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>, binned them by their Ka/Ks values, and measured the SNP A/S ratios for each group. There is a strong positive correlation between these two measures (Figure <figr fid="F1">1a</figr>; Kendall's rank correlation <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> &#964; = 0.50, <it>p</it>-value &lt; 1e-04), which is in agreement with the neutral theory of molecular evolution. Analysis of data from chimpanzee and Old World monkey (<it>Macaca mulatta</it>) led to similar conclusions, although the Ka/Ks values may need to be corrected to subtract the contribution of SNPs due to relatively short evolutionary distance.</p>
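            <p>The binning and rank-correlation steps described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the per-gene record layout, the function names, and the toy values are all hypothetical; the per-bin A/S ratio is assumed to be pooled over all genes in the bin; and the simple tau-a statistic (no tie correction) stands in for whichever variant of Kendall's coefficient was applied.</p>

```python
from collections import defaultdict
from itertools import combinations


def pooled_as_ratio(genes):
    """Pooled A/S: (total A SNPs / total A sites) / (total S SNPs / total S sites)."""
    a_snps = sum(g["a_snps"] for g in genes)
    a_sites = sum(g["a_sites"] for g in genes)
    s_snps = sum(g["s_snps"] for g in genes)
    s_sites = sum(g["s_sites"] for g in genes)
    return (a_snps / a_sites) / (s_snps / s_sites)


def bin_by_kaks(genes, width=0.05):
    """Group genes into equal-width Ka/Ks bins, keyed by bin index."""
    bins = defaultdict(list)
    for g in genes:
        bins[int(g["kaks"] // width)].append(g)
    return dict(sorted(bins.items()))


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    score = 0
    pairs = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign_x = (x2 > x1) - (x1 > x2)
        sign_y = (y2 > y1) - (y1 > y2)
        score += sign_x * sign_y
        pairs += 1
    return score / pairs


# Hypothetical toy genes, not data from this study.
toy_genes = [
    {"kaks": 0.02, "a_snps": 1, "a_sites": 3000, "s_snps": 4, "s_sites": 1000},
    {"kaks": 0.04, "a_snps": 2, "a_sites": 3000, "s_snps": 5, "s_sites": 1000},
    {"kaks": 0.08, "a_snps": 3, "a_sites": 3000, "s_snps": 4, "s_sites": 1000},
    {"kaks": 0.13, "a_snps": 5, "a_sites": 3000, "s_snps": 4, "s_sites": 1000},
]
bins = bin_by_kaks(toy_genes)
midpoints = [(index + 0.5) * 0.05 for index in bins]
ratios = [pooled_as_ratio(members) for members in bins.values()]
tau = kendall_tau_a(midpoints, ratios)
print(tau)  # 1.0 for this monotonic toy example
```

            <p>Pooling counts within a bin avoids dividing by zero for individual genes that carry no synonymous SNPs, which is one reason a pooled ratio is assumed here.</p>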
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>The SNP A/S ratio is a good measure for evolutionary constraints</p>
               </caption>
               <text>
                  <p>The SNP A/S ratio is a good measure for evolutionary constraints. Error bars represent 95th percentile confidence intervals from bootstrap resampling. <b>(a) </b>SNP A/S ratios correlate with Ka/Ks ratios from human-mouse alignments. Proteins were grouped into bins of equal intervals (interval = 0.05) according to their Ka/Ks ratios, and the SNP A/S ratio was calculated for each bin. <b>(b) </b>SNP A/S ratios correlate negatively with residue conservation scores from protein sequence alignments. All residues were grouped into bins of equal intervals (interval = 0.5) according to their position specific alignment information taken from PSI-BLAST alignment profiles, and the SNP A/S ratio was obtained for each bin.</p>
               </text>
               <graphic file="gb-2008-9-4-r69-1"/>
            </fig>
            <p>We next investigated whether the conservation in protein sequences correlates with the SNP A/S ratio under the assumption that both the conservation at the protein sequence level and the SNP A/S ratio at the nucleotide level are indicators of selective constraints. Using the position-specific alignment entropy (a measure for conservation) from PSI-BLAST profiles <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, we calculated A/S ratios for residues with different conservation scores. We indeed observed a monotonic decrease of the A/S ratio with an increase in protein sequence conservation (Figure <figr fid="F1">1b</figr>). Residues with conservation scores in the range 0-0.5 have an A/S ratio of 0.33, while those with conservation scores greater than 3.5 have an A/S ratio of 0.06.</p>
         </sec>
         <sec>
            <st>
               <p>SNP A/S ratios for protein features</p>
            </st>
            <p>Many studies have been published addressing the correlation between evolutionary constraints and other variables, most of which were based on relatively small data sets. Having established the SNP A/S ratio as a good measure for selective constraints, we attempted to use the large-scale human SNP data set to revisit some of the features in the earlier studies, and also to investigate several protein properties that had not been examined before.</p>
         </sec>
         <sec>
            <st>
               <p>Selective constraints and mRNA expression</p>
            </st>
            <p>Until a few years ago, the prevalent theory in molecular evolution was that evolutionary rate is largely dependent on structural and functional constraints. More recently, a growing body of evidence has suggested that there is a strong correlation between evolutionary rate and gene expression. It has been observed that highly expressed genes evolve slowly in bacteria <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>, yeast <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, and mammals <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. In yeast, it has been shown by principal component regression that the number of translation events is the dominant determinant of evolutionary rate among several other functional attributes <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, leading to the increasingly popular 'translational robustness' hypothesis <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. However, a later study suggested that the dominant effect may result from the noise in biological data that confounded the analysis <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. Studies of human mRNA expression data showed that the breadth of expression (that is, the number of tissues in which a gene is expressed) also correlates with evolutionary rate <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp>; it is still debatable whether the breadth or the rate of expression is the stronger predictor <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. We obtained mRNA expression data for 10,885 genes in our data set that are available from a published microarray experiment (Gene Expression Atlas) <abbrgrp><abbr bid="B24">24</abbr></abbrgrp> and investigated the correlation between selective constraints and four gene expression parameters examined previously: peak expression level, mean expression level, expression breadth, and tissue specificity. Overall, this set of genes with available mRNA expression data has an SNP A/S ratio of 0.25, lower than that of our entire data set (0.28). 
We indeed observed that highly expressed genes tend to have low A/S ratios (Figure <figr fid="F2">2a,b</figr>): both mean and peak expression rate negatively correlate with the SNP A/S ratio (&#964; = -0.178 and -0.160, respectively; Table S1 in Additional data file 1). Genes with the lowest mean expression levels have an A/S ratio of 0.38, about twice as high as the ratio in the highest expression group (Figure <figr fid="F2">2a</figr>). The SNP A/S ratio also correlates well with the breadth of expression (Figure <figr fid="F2">2c</figr>; &#964; = -0.213, <it>p</it>-value &lt; 1e-04), but only marginally with tissue specificity (Figure <figr fid="F2">2d</figr>; &#964; = 0.047, <it>p</it>-value = 0.003). Since these four expression parameters correlate strongly with each other, we carried out partial correlation analysis <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> to identify the stronger predictors for evolutionary rates. The correlation between tissue specificity and the A/S ratio disappeared entirely after controlling for mean expression level (&#964; = 0.0107, <it>p</it>-value = 0.499; Table S1 in Additional data file 1) or expression breadth (&#964; = 0.0084, <it>p</it>-value = 0.596; Table S1 in Additional data file 1). Expression breadth and mean expression level both remain significantly correlated with the A/S ratio when controlling one for the other (&#964; = -0.096 and -0.064, <it>p</it>-values &lt; 1e-04 and 7e-04, respectively; Table S1 in Additional data file 1). Peak expression level is highly correlated with mean expression level and its partial correlation patterns largely resemble those of mean expression level. It has recently been recognized that it is critical to control for expression when studying the statistical relevance of other variables as predictors for evolutionary rates, since many previously reported correlations became insignificant after this control. 
As expression breadth appeared to have the strongest correlation with the SNP A/S ratio in our data set among the four parameters, we chose to control for it in the following correlation analysis between selective constraints and other variables. The results did not change qualitatively when controlling for mean expression level instead.</p>
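            <p>For readers unfamiliar with partial rank correlation, one standard form of Kendall's partial coefficient combines the three pairwise coefficients among the two variables of interest and the control variable. The sketch below illustrates that formula only. Whether the analysis above used exactly this form is an assumption, the function name is hypothetical, and the example inputs are made-up values rather than results from this study.</p>

```python
import math


def partial_kendall_tau(tau_xy, tau_xz, tau_yz):
    """Kendall's partial rank correlation of x and y, controlling for z.

    tau_xy, tau_xz, tau_yz are the three pairwise Kendall coefficients.
    """
    denominator = math.sqrt((1 - tau_xz ** 2) * (1 - tau_yz ** 2))
    if denominator == 0:
        raise ValueError("undefined when x or y is perfectly correlated with z")
    return (tau_xy - tau_xz * tau_yz) / denominator


# Hypothetical pairwise coefficients, not values from this study:
# x = A/S ratio, y = tissue specificity, z = expression breadth.
print(partial_kendall_tau(0.05, -0.2, -0.25))
```

            <p>When the raw correlation between x and y equals the product of their correlations with z, as in this made-up example, the partial coefficient is zero, which is the pattern expected when an apparent association is carried entirely by the control variable.</p>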
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Correlation between SNP A/S ratios and expression parameters</p>
               </caption>
               <text>
                  <p>Correlation between SNP A/S ratios and expression parameters. Genes were grouped into bins of roughly nine equal intervals according to several expression measurements from a microarray experiment, and the SNP A/S ratio was obtained for each bin. Error bars represent 95th percentile confidence intervals from bootstrap resampling. <b>(a) </b>Negative correlation between SNP A/S ratios and mean mRNA expression levels. <b>(b) </b>Negative correlation between SNP A/S ratios and peak mRNA expression levels. <b>(c) </b>Negative correlation between SNP A/S ratios and expression breadth. <b>(d) </b>No correlation between SNP A/S ratios and expression tissue specificity.</p>
               </text>
               <graphic file="gb-2008-9-4-r69-2"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>SNP A/S ratio and evolutionary variables</p>
            </st>
            <p>Consistent with the hypothesis that gene duplications are an important source of new protein function, it has been observed that duplicated genes evolve under weaker purifying selection than unduplicated ones <abbrgrp><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr></abbrgrp>. We collected 12,460 human genes without paralogs and 167 genes with paralogs according to the HomoloGene database <abbrgrp><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr></abbrgrp>, and found that the A/S ratio is markedly higher for genes with paralogs (0.46 versus 0.27, <it>p</it>-value &lt; 1e-04; Figure <figr fid="F3">3a</figr>, dark gray bars). To control for expression breadth, we analyzed the subset of genes with mRNA expression data from the Gene Expression Atlas <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. The two groups of genes do not differ in their distribution of expression breadth (Kolmogorov-Smirnov test, <it>p</it>-value = 0.507). The difference in the A/S ratio did not change significantly when the expression breadth was controlled by Monte Carlo sampling (Figure <figr fid="F3">3a</figr>, light gray bars and white bars). We then examined whether the higher rate could be explained solely by faster evolution of the additional paralogous copies while one copy remains stable. When we selected the fastest-evolving gene from each homology group, these genes had an A/S ratio of 0.55, compared with 0.36 for the set of slowest-evolving genes from each homology group. Both numbers are higher than the A/S ratio for genes without paralogs (0.27), suggesting that both duplicated copies are evolving faster than unduplicated genes. The much larger variation in the with-paralog group (95th percentile confidence interval = [0.38, 0.58]) reflects the small number of genes in that particular group.</p>
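            <p>One way to control for expression breadth by Monte Carlo sampling is to draw, for every gene in the smaller group, a random gene from the larger group with the same expression breadth, so that both samples share one breadth distribution. The sketch below shows that idea only. The exact sampling scheme used in the analysis above is not specified here, so the matching rule, the function name, and the record layout are all assumptions, and the toy records are hypothetical.</p>

```python
import random
from collections import defaultdict


def match_by_breadth(small_group, large_group, rng):
    """Draw one gene from large_group for each gene in small_group,
    matched on expression breadth (sampling with replacement).

    Genes in small_group with no breadth-matched partner are skipped.
    """
    by_breadth = defaultdict(list)
    for gene in large_group:
        by_breadth[gene["breadth"]].append(gene)
    matched = []
    for gene in small_group:
        candidates = by_breadth.get(gene["breadth"])
        if candidates:
            matched.append(rng.choice(candidates))
    return matched


# Hypothetical toy records: breadth = number of tissues with expression.
with_paralogs = [{"id": "p1", "breadth": 3}, {"id": "p2", "breadth": 10}]
without_paralogs = [
    {"id": "u1", "breadth": 3},
    {"id": "u2", "breadth": 3},
    {"id": "u3", "breadth": 10},
    {"id": "u4", "breadth": 25},
]
rng = random.Random(0)  # fixed seed so the draw is repeatable
sample = match_by_breadth(with_paralogs, without_paralogs, rng)
print(sorted(g["breadth"] for g in sample))  # [3, 10]
```

            <p>Repeating the draw many times and recomputing the A/S ratio of each matched sample would give a distribution against which the ratio of the smaller group can be compared.</p>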
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>SNP A/S ratios and evolutionary variables</p>
               </caption>
               <text>
                  <p>SNP A/S ratios and evolutionary variables. <b>(a) </b>Proteins with paralogs (167 proteins) are under weaker selective pressure than proteins without paralogs (12,460 proteins). The 95th percentile confidence intervals of the A/S ratio are [0.38, 0.58] for proteins with paralogs, and [0.26, 0.27] for proteins without paralogs (dark gray bars). To control for expression breadth, the subset of proteins with mRNA expression data were analyzed (65 proteins with paralogs and 10,612 without, light gray bars) and Monte Carlo samplings were performed so that the two groups had the same distribution of expression breadth. The differences in A/S ratios are significant both before (light gray bars) and after (white bars) controlling for expression. <b>(b) </b>Proteins that arose early in evolution are subject to stronger evolutionary constraints.</p>
               </text>
               <graphic file="gb-2008-9-4-r69-3"/>
            </fig>
            <p>To determine whether the SNP A/S ratio correlates with the age of proteins, we classified each protein into one of seven age groups according to their most ancient homologs. It appears that young proteins (for example, those found in human or primates only) have the highest A/S ratios (0.76 for human and 0.66 for primates), whereas proteins traceable to all animals or other eukaryotes have much lower ratios of about 0.25 (Figure <figr fid="F3">3b</figr>). This is consistent with a previous finding that proteins that arose earlier in evolution tend to have a larger proportion of sites subjected to negative selection <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>, although there was some debate about whether the observation was an artifact resulting from the inability of BLAST to detect homology for the fastest-evolving genes <abbrgrp><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr></abbrgrp>. We examined the functions of proteins in each group by their Gene Ontology (GO) <abbrgrp><abbr bid="B32">32</abbr></abbrgrp> annotation of biological process. The human-specific group is the least well annotated, with only 6% having GO annotation compared with 62% overall and 84% for proteins conserved in both eukaryotes and prokaryotes (the 'universal' group). Among the proteins with GO annotation of biological process, we observed the enrichment of 'epidermis development', 'defense response to bacterium', and 'spermatogenesis' in the human and primate groups, whereas 'amino acid metabolic process', 'glycolysis', and 'fatty acid metabolic process' are overrepresented in the 'universal' group.</p>
         </sec>
         <sec>
            <st>
               <p>SNP A/S ratios and sequence/structure variables</p>
            </st>
            <p>As an example of the many conflicting reports in the literature about correlations with evolutionary rates, for a variable as simple as protein length, it was shown that there was positive correlation <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>, negative correlation <abbrgrp><abbr bid="B34">34</abbr><abbr bid="B35">35</abbr></abbrgrp>, or no correlation <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. In addition, there was a study based on protein sequence alignments that showed that less conserved proteins are shorter than more conserved ones on average <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. In our data set, we observed a negative correlation between protein length and SNP A/S ratio (Kendall's &#964; = -0.137, <it>p</it>-value &lt; 1e-04). The correlation did not change upon controlling for expression breadth. Our analysis also showed that this correlation is only prominent for proteins shorter than 500 residues, and disappears for longer proteins (Figure <figr fid="F4">4a</figr>).</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Evolutionary constraints on protein sequence and structure features</p>
               </caption>
               <text>
                  <p>Evolutionary constraints on protein sequence and structure features. Error bars represent 95th percentile confidence intervals from bootstrap resampling. <b>(a) </b>For proteins shorter than 500 residues, short proteins have high A/S ratios. <b>(b) </b>Buried residues are under stronger selection. The 95th percentile confidence intervals of the A/S ratio are [0.23, 0.25] for buried residues, and [0.30, 0.32] for exposed residues. <b>(c) </b>Loop residues have relaxed evolutionary constraints. The 95th percentile confidence intervals of the A/S ratio are [0.25, 0.26] for residues in alpha-helices, [0.24, 0.27] for residues in beta-strands, and [0.30, 0.32] for residues in loops. <b>(d) </b>Proteins with disordered regions are more conserved, while disordered residues are under lower selective pressure. <b>(e) </b>Residues in low complexity regions evolve faster.</p>
               </text>
               <graphic file="gb-2008-9-4-r69-4"/>
            </fig>
            <p>Solvent accessibility measures the degree of an amino acid residue's exposure to the surrounding solvent. There have been a number of studies about the effect of mutations on solvent accessibility and its implication in human diseases; most of them were based on relatively small collections of SNPs in known protein structures. The general consensus was that buried residues are less likely to vary and their mutations are more likely to cause disease <abbrgrp><abbr bid="B38">38</abbr><abbr bid="B39">39</abbr></abbrgrp>. We obtained solvent accessibility predictions for all proteins in our dataset using PROFacc <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>, and compared the SNP A/S ratios. Exposed residues have an A/S ratio of 0.31, significantly higher than that of 0.24 for the buried residues (Figure <figr fid="F4">4b</figr>). The <it>p</it>-value for this difference is smaller than 1e-04 according to bootstrap analysis. Similar results were obtained when using three-state prediction (buried, intermediate, and exposed) or numeric relative accessibility values. This underscores higher selective constraints on buried residues, possibly due to their importance in maintaining protein stability.</p>
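            <p>Confidence intervals and bootstrap comparisons of A/S ratios such as those reported here can be obtained with a percentile bootstrap. The sketch below is a generic version under stated assumptions: the resampling unit is taken to be the gene, the A/S ratio is pooled over each resample, and the function names, record layout, and default of 1,000 resamples are hypothetical rather than taken from the analysis above.</p>

```python
import random


def pooled_as_ratio(genes):
    """Pooled A/S: (total A SNPs / total A sites) / (total S SNPs / total S sites)."""
    a_snps = sum(g["a_snps"] for g in genes)
    a_sites = sum(g["a_sites"] for g in genes)
    s_snps = sum(g["s_snps"] for g in genes)
    s_sites = sum(g["s_sites"] for g in genes)
    return (a_snps / a_sites) / (s_snps / s_sites)


def bootstrap_ci(genes, n_resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the pooled A/S ratio."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_resamples):
        # Resample genes with replacement, keeping the sample size fixed.
        sample = [rng.choice(genes) for _ in genes]
        if sum(g["s_snps"] for g in sample) == 0:
            continue  # ratio is undefined without synonymous SNPs
        ratios.append(pooled_as_ratio(sample))
    if not ratios:
        raise ValueError("no resample contained synonymous SNPs")
    ratios.sort()
    tail = (1 - level) / 2
    lower = ratios[int(tail * len(ratios))]
    upper = ratios[int((1 - tail) * len(ratios)) - 1]
    return lower, upper


# Hypothetical toy genes, not data from this study.
toy_genes = [
    {"a_snps": 1, "a_sites": 3000, "s_snps": 4, "s_sites": 1000},
    {"a_snps": 3, "a_sites": 3000, "s_snps": 5, "s_sites": 1000},
    {"a_snps": 2, "a_sites": 3000, "s_snps": 3, "s_sites": 1000},
]
low, high = bootstrap_ci(toy_genes)
print(low, high)
```

            <p>Two groups can then be compared by bootstrapping each and counting how often the resampled ratio of one exceeds that of the other.</p>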
            <p>We also investigated selective constraints upon different protein structure conformations. We first grouped all residues into different secondary structure conformations (alpha-helix, beta-strand, or loop) according to predictions by PSIPRED <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>. Significantly higher A/S ratios were observed for residues in the loop conformation (Figure <figr fid="F4">4c</figr>), suggesting relaxed selective pressure on these residues. There is no difference between residues in alpha-helices and beta-strands. We next examined natively disordered proteins, a class of structurally flexible proteins that have recently gained traction because of their potentially important roles in dynamic molecular recognition of macromolecules <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>. It has been estimated that one-third of eukaryotic proteins contain disordered regions <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>, and that they are more likely to be involved in regulatory functions and protein-protein interactions <abbrgrp><abbr bid="B44">44</abbr><abbr bid="B45">45</abbr></abbrgrp>. We obtained disorder predictions using DISOPRED2 <abbrgrp><abbr bid="B43">43</abbr></abbrgrp> and retained only the disordered regions longer than 30 residues. Interestingly, while proteins with disordered regions have a lower A/S ratio (Figure <figr fid="F4">4d</figr>; Figure S2b in Additional data file 1), the residues in disordered regions have a much higher A/S ratio than other residues (0.38 versus 0.22; Figure <figr fid="F4">4d</figr>). This seems to suggest that disordered proteins as a class are under stronger selective pressure, but the disordered residues are allowed to evolve much faster to explore different ways to interact with other molecules. 
Since disordered regions are often characterized by low sequence complexity <abbrgrp><abbr bid="B42">42</abbr><abbr bid="B44">44</abbr></abbrgrp>, we also examined the selective constraints on low complexity regions as defined by SEG <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>. Not surprisingly, low complexity regions have a higher A/S ratio, but the profile is different from that of the disordered regions (Figure <figr fid="F4">4e</figr>), confirming that disorder and low complexity are related but different sequence features.</p>
         </sec>
         <sec>
            <st>
               <p>SNP A/S ratios and protein subcellular localization</p>
            </st>
            <p>Subcellular localization is an important aspect of protein function. There have been conflicting reports about the correlation between protein subcellular localization and evolutionary rate. While a previous survey of human SNPs in 2002 did not find a significant correlation of selective pressure against deleterious non-synonymous SNPs with localization <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>, a more recent study of mammalian sequences found that secreted proteins evolve much faster than cytoplasmic proteins (Ka/Ks 0.27 versus 0.12), and that membrane segments are under higher selective pressure than non-membrane segments (0.07 versus 0.15) <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>. We attempted to address this issue by examining A/S ratios from several subcellular localization assignment methods. When we divide our data set into 3,064 secreted proteins and 10,622 non-secreted proteins according to SignalP <abbrgrp><abbr bid="B49">49</abbr></abbrgrp> predictions, there is a small and insignificant difference between these two classes, but the residues within the signal peptides appear under much less selective pressure (A/S ratios of 0.42 versus 0.29; Figure <figr fid="F5">5a</figr>). Interestingly, when only the subset of genes that have mRNA expression data was examined (both before and after controlling for expression), secreted proteins had significantly higher A/S ratios than non-secreted proteins (<it>p</it>-value &lt; 1e-04; Figure S3a in Additional data file 1). There is no difference between membrane proteins and non-membrane proteins, membrane segments and non-membrane segments according to TMHMM <abbrgrp><abbr bid="B50">50</abbr></abbrgrp> predictions (Figure <figr fid="F5">5b</figr>; Figure S3b in Additional data file 1). We also obtained predictions of subcellular localizations for non-membrane proteins by LOCtree <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>, a hierarchical prediction system mimicking cellular sorting mechanisms. 
Predicted extracellular proteins have an A/S ratio of 0.34 on average, significantly higher than nuclear and cytoplasmic proteins (Figure <figr fid="F5">5c</figr>). Lastly, we examined A/S ratios of 6,228 proteins that have unambiguous GO cellular component assignments. We observed the same trend as for the LOCtree predictions, although the absolute numbers are slightly lower (Figure <figr fid="F5">5d</figr>). This may be explained by the fact that more conserved proteins are more likely to receive GO annotations through sequence homology. According to the GO annotations, the selective constraints acting on membrane proteins seem to fall between those on extracellular and cytoplasmic proteins (Figure <figr fid="F5">5d</figr>). The results from both LOCtree predictions and GO annotations did not change qualitatively when controlling for expression breadth (Figure S3c,d in Additional data file 1). Overall, our analysis suggests that extracellular proteins are indeed under more relaxed selection than cytoplasmic and nuclear proteins, but the difference is not as dramatic as previously reported. The absence of a difference between membrane and non-membrane proteins according to TMHMM predictions may result from the lack of distinction between extracellular and cytoplasmic/nuclear proteins in that comparison.</p>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Selective pressures on protein subcellular localization</p>
               </caption>
               <text>
                  <p>Selective pressures on protein subcellular localization. Error bars represent 95th percentile confidence intervals from bootstrap resampling. <b>(a) </b>Analysis of SignalP predictions suggests that while there is no significant difference in selective pressure between secreted and non-secreted proteins, residues within signal peptides are evolving faster. <b>(b) </b>TMHMM predictions show no difference in A/S ratios between membrane and non-membrane proteins, or between transmembrane and non-transmembrane segments. <b>(c) </b>LOCtree predictions of protein subcellular localization indicate extracellular proteins (1,587 proteins) are under more relaxed selective pressure than cytoplasmic proteins (2,105) and nuclear proteins (5,431). <b>(d) </b>GO cellular component annotations suggest extracellular proteins (522 proteins) are under more relaxed selective pressure than cytoplasmic proteins (1,030) and nuclear proteins (1,961), while membrane proteins (2,715) fall in between. The 95th percentile confidence intervals of the A/S ratio are [0.27, 0.33] for extracellular proteins, [0.21, 0.24] for nuclear proteins, [0.22, 0.26] for cytoplasmic proteins, and [0.26, 0.29] for membrane proteins.</p>
               </text>
               <graphic file="gb-2008-9-4-r69-5"/>
            </fig>
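            <p>The bootstrap intervals quoted for the A/S ratios can be illustrated with a minimal sketch. It assumes that the A/S ratio of a group is the pooled non-synonymous SNP density divided by the pooled synonymous SNP density, and that proteins are the resampling unit; the record layout, function names, and example counts are hypothetical and are not taken from the study.</p>
            <p><![CDATA[
```python
import random
from typing import List, NamedTuple, Sequence, Tuple


class Gene(NamedTuple):
    """Per-protein SNP counts and site counts (hypothetical record layout)."""
    a_snps: int      # non-synonymous SNPs observed
    a_sites: float   # non-synonymous sites
    s_snps: int      # synonymous SNPs observed
    s_sites: float   # synonymous sites


def as_ratio(genes: Sequence[Gene]) -> float:
    """Pooled A/S ratio (assumed definition): A density over S density."""
    a_density = sum(g.a_snps for g in genes) / sum(g.a_sites for g in genes)
    s_density = sum(g.s_snps for g in genes) / sum(g.s_sites for g in genes)
    return a_density / s_density


def bootstrap_ci(genes: Sequence[Gene], n_boot: int = 1000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval, resampling proteins with replacement."""
    rng = random.Random(seed)
    stats: List[float] = []
    for _ in range(n_boot):
        sample = rng.choices(genes, k=len(genes))
        if sum(g.s_snps for g in sample) == 0:
            continue  # ratio undefined for this resample; skip it
        stats.append(as_ratio(sample))
    if not stats:
        raise ValueError("no resample had a defined A/S ratio")
    stats.sort()
    lo_idx = int((alpha / 2) * len(stats))
    hi_idx = min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))
    return stats[lo_idx], stats[hi_idx]


if __name__ == "__main__":
    # Made-up counts for one localization class, for illustration only.
    group = [Gene(3, 900.0, 5, 300.0), Gene(1, 1200.0, 4, 400.0),
             Gene(2, 600.0, 2, 200.0), Gene(4, 1500.0, 6, 500.0)]
    print("A/S:", as_ratio(group), "95% interval:", bootstrap_ci(group))
```
]]></p>
            <p>Under these assumptions, two classes would be called different when their intervals do not overlap, which is how the error bars in Figure 5 are read in the text.</p>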
         </sec>
         <sec>
            <st>
               <p>Selective constraints on functional classes and protein families</p>
            </st>
            <p>We next studied the variation in SNP distribution across functional categories based on GO annotations. A/S ratios were calculated for 176 GO biological process categories and 152 molecular function categories that have at least 20 genes in our data set. As expected, there are dramatic differences in selective constraints among different categories: A/S ratios range from 0.72 for 'sensory perception of smell' to 0.07 for 'protein kinase C activation' (Table <tblr tid="T1">1</tblr>). We compared our results with a comparative genomic study of human and chimpanzee <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. Seven of the ten categories with the highest divergence rates between human and chimpanzee are not present in our set of 176 categories due to differences in gene sets and the availability of SNP data. The three that are present all show elevated A/S ratios, and two of them are also in our top ten list (GO:0007608 sensory perception of smell and GO:0007565 female pregnancy). When GO terms were mapped to a small set of high level terms according to Gene Ontology Annotation <abbrgrp><abbr bid="B52">52</abbr></abbrgrp> (GOA slim), the biological process category with the most relaxed selective constraint was 'response to stimulus', which has an A/S ratio of 0.33, significantly higher than those of 'multicellular organismal development', 'transport', 'macromolecule metabolic process', and 'cell differentiation' (Figure <figr fid="F6">6a</figr>). In terms of molecular function, the least variable groups are 'protein transporter activity' and 'motor activity', and the most variable groups are 'receptor activity' and 'isomerase activity' (Figure <figr fid="F6">6b</figr>).</p>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Evolutionary constraints on protein functional categories</p>
               </caption>
               <text>
                  <p>Evolutionary constraints on protein functional categories. Error bars represent 95th percentile confidence intervals from bootstrap resampling. GO annotations were extracted for each protein, and the GO terms were mapped to high level GOA slim terms for <b>(a) </b>biological process and <b>(b) </b>molecular function. SNP A/S ratios were then calculated for each group.</p>
               </text>
               <graphic file="gb-2008-9-4-r69-6"/>
            </fig>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>GO biological process categories with the highest and lowest SNP A/S ratios</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="left">
                        <p>GO accession</p>
                     </c>
                     <c ca="right">
                        <p>A/S ratio</p>
                     </c>
                     <c ca="right">
                        <p>Number of proteins</p>
                     </c>
                     <c ca="left">
                        <p>GO description</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0007608</p>
                     </c>
                     <c ca="right">
                        <p>0.72</p>
                     </c>
                     <c ca="right">
                        <p>298</p>
                     </c>
                     <c ca="left">
                        <p>Sensory perception of smell</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0050896</p>
                     </c>
                     <c ca="right">
                        <p>0.54</p>
                     </c>
                     <c ca="right">
                        <p>403</p>
                     </c>
                     <c ca="left">
                        <p>Response to stimulus</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0007565</p>
                     </c>
                     <c ca="right">
                        <p>0.48</p>
                     </c>
                     <c ca="right">
                        <p>43</p>
                     </c>
                     <c ca="left">
                        <p>Female pregnancy</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0006298</p>
                     </c>
                     <c ca="right">
                        <p>0.47</p>
                     </c>
                     <c ca="right">
                        <p>29</p>
                     </c>
                     <c ca="left">
                        <p>Mismatch repair</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0031424</p>
                     </c>
                     <c ca="right">
                        <p>0.46</p>
                     </c>
                     <c ca="right">
                        <p>22</p>
                     </c>
                     <c ca="left">
                        <p>Keratinization</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0007186</p>
                     </c>
                     <c ca="right">
                        <p>0.43</p>
                     </c>
                     <c ca="right">
                        <p>600</p>
                     </c>
                     <c ca="left">
                        <p>G-protein coupled receptor protein signaling pathway</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0007131</p>
                     </c>
                     <c ca="right">
                        <p>0.42</p>
                     </c>
                     <c ca="right">
                        <p>20</p>
                     </c>
                     <c ca="left">
                        <p>Meiotic recombination</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0008033</p>
                     </c>
                     <c ca="right">
                        <p>0.40</p>
                     </c>
                     <c ca="right">
                        <p>26</p>
                     </c>
                     <c ca="left">
                        <p>tRNA processing</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0045087</p>
                     </c>
                     <c ca="right">
                        <p>0.39</p>
                     </c>
                     <c ca="right">
                        <p>57</p>
                     </c>
                     <c ca="left">
                        <p>Innate immune response</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0006633</p>
                     </c>
                     <c ca="right">
                        <p>0.37</p>
                     </c>
                     <c ca="right">
                        <p>20</p>
                     </c>
                     <c ca="left">
                        <p>Fatty acid biosynthetic process</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0006986</p>
                     </c>
                     <c ca="right">
                        <p>0.14</p>
                     </c>
                     <c ca="right">
                        <p>40</p>
                     </c>
                     <c ca="left">
                        <p>Response to unfolded protein</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0006445</p>
                     </c>
                     <c ca="right">
                        <p>0.14</p>
                     </c>
                     <c ca="right">
                        <p>26</p>
                     </c>
                     <c ca="left">
                        <p>Regulation of translation</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0006096</p>
                     </c>
                     <c ca="right">
                        <p>0.14</p>
                     </c>
                     <c ca="right">
                        <p>37</p>
                     </c>
                     <c ca="left">
                        <p>Glycolysis</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0007420</p>
                     </c>
                     <c ca="right">
                        <p>0.13</p>
                     </c>
                     <c ca="right">
                        <p>25</p>
                     </c>
                     <c ca="left">
                        <p>Brain development</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0006334</p>
                     </c>
                     <c ca="right">
                        <p>0.13</p>
                     </c>
                     <c ca="right">
                        <p>38</p>
                     </c>
                     <c ca="left">
                        <p>Nucleosome assembly</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0006816</p>
                     </c>
                     <c ca="right">
                        <p>0.12</p>
                     </c>
                     <c ca="right">
                        <p>61</p>
                     </c>
                     <c ca="left">
                        <p>Calcium ion transport</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0007411</p>
                     </c>
                     <c ca="right">
                        <p>0.12</p>
                     </c>
                     <c ca="right">
                        <p>20</p>
                     </c>
                     <c ca="left">
                        <p>Axon guidance</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0006333</p>
                     </c>
                     <c ca="right">
                        <p>0.10</p>
                     </c>
                     <c ca="right">
                        <p>22</p>
                     </c>
                     <c ca="left">
                        <p>Chromatin assembly or disassembly</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0000398</p>
                     </c>
                     <c ca="right">
                        <p>0.09</p>
                     </c>
                     <c ca="right">
                        <p>62</p>
                     </c>
                     <c ca="left">
                        <p>Nuclear mRNA splicing, via spliceosome</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0007205</p>
                     </c>
                     <c ca="right">
                        <p>0.07</p>
                     </c>
                     <c ca="right">
                        <p>21</p>
                     </c>
                     <c ca="left">
                        <p>Protein kinase C activation</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Top part: ten GO categories with the highest A/S ratios. Bottom part: ten GO categories with the lowest A/S ratios.</p>
               </tblfn>
            </tbl>
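            <p>The ranking in Table 1 can be sketched as follows: group annotated genes by category, discard categories with fewer than 20 genes, and sort the remaining categories by their pooled A/S ratio. This is an illustrative sketch only. The record layout and the function name are hypothetical, and the pooled density-ratio definition of A/S is an assumption.</p>
            <p><![CDATA[
```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One record per (category, gene) annotation; the layout is hypothetical:
# (category, a_snps, a_sites, s_snps, s_sites)
Record = Tuple[str, int, float, int, float]


def rank_categories(records: Iterable[Record],
                    min_genes: int = 20) -> List[Tuple[str, float, int]]:
    """Return (category, pooled A/S ratio, gene count), highest ratio first."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        groups[rec[0]].append(rec)

    ranked: List[Tuple[str, float, int]] = []
    for category, recs in groups.items():
        if len(recs) < min_genes:
            continue  # too few genes for a stable estimate
        a_snps = sum(r[1] for r in recs)
        a_sites = sum(r[2] for r in recs)
        s_snps = sum(r[3] for r in recs)
        s_sites = sum(r[4] for r in recs)
        if s_snps == 0:
            continue  # ratio undefined without synonymous SNPs
        ratio = (a_snps / a_sites) / (s_snps / s_sites)
        ranked.append((category, ratio, len(recs)))

    ranked.sort(key=lambda row: row[1], reverse=True)
    return ranked


if __name__ == "__main__":
    # Made-up annotations: two categories pass the size filter, one does not.
    demo = ([("category_x", 3, 1000.0, 2, 500.0)] * 20
            + [("category_y", 1, 1000.0, 4, 500.0)] * 20
            + [("category_z", 5, 1000.0, 1, 500.0)] * 5)
    ranked = rank_categories(demo)
    print("highest:", ranked[:10])
    print("lowest:", ranked[-10:])
```
]]></p>
            <p>Because a gene can carry several annotations, the same gene may contribute to more than one category in this sketch.</p>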
            <p>We also sought to quantify the selective pressure on protein families. Of the 13,686 proteins in our data set, 10,629 can be assigned to at least one Pfam <abbrgrp><abbr bid="B53">53</abbr></abbrgrp> family using the HMMER program. Among the 190 Pfam families that have at least 20 members, the families with the lowest A/S ratios include the protein kinase C-terminal domain family (PF00433) and core histones (PF00125); on the high end there are mammalian taste receptors (PF05296), the rhodopsin family (PF00001), and glutathione S-transferases (PF02798 and PF00043) (Table <tblr tid="T2">2</tblr>). We took a closer look at the G protein-coupled receptor (GPCR) family. GPCRs comprise a large protein family of seven-transmembrane receptors that play important roles in sensing environmental signals. They are the targets of more than 40% of all modern drugs. There are five Pfam GPCR families that have more than 20 proteins in our data set. Mammalian taste receptor proteins (PF05296) and the rhodopsin family (PF00001) are among the most variable protein families, each with an A/S ratio of 0.49. The other three (PF00002 secretin family, PF00003 metabotropic glutamate family, and PF01461 7TM chemoreceptor) have A/S ratios of around 0.25, similar to the overall A/S ratio of 0.28 in our entire data set. There are 558 proteins that belong to the rhodopsin family, including 286 olfactory receptors. The elevated A/S ratio in this family can be largely attributed to olfactory receptors (A/S = 0.73): the non-olfactory receptors in this family have an A/S ratio of 0.30. Therefore, it appears that among GPCRs, only olfactory and taste receptors show extraordinarily high variation, while other GPCRs behave like average human proteins.</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Pfam families with the highest and lowest SNP A/S ratios</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="left">
                        <p>Pfam accession</p>
                     </c>
                     <c ca="right">
                        <p>A/S ratio</p>
                     </c>
                     <c ca="right">
                        <p>Number of proteins</p>
                     </c>
                     <c ca="left">
                        <p>Pfam description</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PF05296</p>
                     </c>
                     <c ca="right">
                        <p>0.49</p>
                     </c>
                     <c ca="right">
                        <p>55</p>
                     </c>
                     <c ca="left">
                        <p>Mammalian taste receptor protein (TAS2R)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PF00001</p>
                     </c>
                     <c ca="right">
                        <p>0.49</p>
                     </c>
                     <c ca="right">
                        <p>558</p>
                     </c>
                     <c ca="left">
                        <p>7 transmembrane receptor (rhodopsin family)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PF02798</p>
                     </c>
                     <c ca="right">
                        <p>0.47</p>
                     </c>
                     <c ca="right">
                        <p>20</p>
                     </c>
                     <c ca="left">
                        <p>Glutathione S-transferase, amino-terminal domain</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PF00043</p>
                     </c>
                     <c ca="right">
                        <p>0.46</p>
                     </c>
                     <c ca="right">
                        <p>24</p>
                     </c>
                     <c ca="left">
                        <p>Glutathione S-transferase, carboxy-terminal domain</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PF01454</p>
                     </c>
                     <c ca="right">
                        <p>0.45</p>
                     </c>
                     <c ca="right">
                        <p>24</p>
                     </c>
                     <c ca="left">
                        <p>MAGE family</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PF09723</p>
                     </c>
                     <c ca="right">
                        <p>0.44</p>
                     </c>
                     <c ca="right">
                        <p>42</p>
                     </c>
                     <c ca="left">
                        <p>Putative regulatory protein (CxxC_CxxC_SSSS)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PF02023</p>
                     </c>
                     <c ca="right">
                        <p>0.44</p>
                     </c>
                     <c ca="right">
                        <p>39</p>
                     </c>
                     <c ca="left">
                        <p>SCAN domain</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PF00059</p>
                     </c>
                     <c ca="right">
                        <p>0.43</p>
                     </c>
                     <c ca="right">
                        <p>58</p>
                     </c>
                     <c ca="left">
                        <p>Lectin C-type domain</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PF07859</p>
                     </c>
                     <c ca="right">
                        <p>0.42</p>
                     </c>
                     <c ca="right">
                        <p>21</p>
                     </c>
                     <c ca="left">
                        <p>alpha/beta Hydrolase fold</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PF00048</p>
                     </c>
                     <c ca="right">
                        <p>0.40</p>
                     </c>
                     <c ca="right">
                        <p>23</p>
                     </c>
                     <c ca="left">
                        <p>Small cytokines (intecrine/chemokine), interleukin-8 like</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PF00105</p>
                     </c>
                     <c ca="right">
                        <p>0.14</p>
                     </c>
                     <c ca="right">
                        <p>38</p>
                     </c>
                     <c ca="left">
                        <p>Zinc finger, C4 type (two domains)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PF00536</p>
                     </c>
                     <c ca="right">
                        <p>0.14</p>
                     </c>
                     <c ca="right">
                        <p>68</p>
                     </c>
                     <c ca="left">
                        <p>SAM domain (Sterile alpha motif)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PF07649</p>
                     </c>
                     <c ca="right">
                        <p>0.13</p>
                     </c>
                     <c ca="right">
                        <p>45</p>
                     </c>
                     <c ca="left">
                        <p>C1-like domain</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PF00125</p>
                     </c>
                     <c ca="right">
                        <p>0.13</p>
                     </c>
                     <c ca="right">
                        <p>25</p>
                     </c>
                     <c ca="left">
                        <p>Core histone H2A/H2B/H3/H4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PF00535</p>
                     </c>
                     <c ca="right">
                        <p>0.13</p>
                     </c>
                     <c ca="right">
                        <p>27</p>
                     </c>
                     <c ca="left">
                        <p>Glycosyl transferase family 2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PF01437</p>
                     </c>
                     <c ca="right">
                        <p>0.13</p>
                     </c>
                     <c ca="right">
                        <p>31</p>
                     </c>
                     <c ca="left">
                        <p>Plexin repeat</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PF00335</p>
                     </c>
                     <c ca="right">
                        <p>0.13</p>
                     </c>
                     <c ca="right">
                        <p>23</p>
                     </c>
                     <c ca="left">
                        <p>Tetraspanin family</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PF00350</p>
                     </c>
                     <c ca="right">
                        <p>0.12</p>
                     </c>
                     <c ca="right">
                        <p>28</p>
                     </c>
                     <c ca="left">
                        <p>Dynamin family</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PF07707</p>
                     </c>
                     <c ca="right">
                        <p>0.11</p>
                     </c>
                     <c ca="right">
                        <p>36</p>
                     </c>
                     <c ca="left">
                        <p>BTB And C-terminal Kelch</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PF00433</p>
                     </c>
                     <c ca="right">
                        <p>0.09</p>
                     </c>
                     <c ca="right">
                        <p>33</p>
                     </c>
                     <c ca="left">
                        <p>Protein kinase C terminal domain</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Top part: ten families with the highest A/S ratios. Bottom part: ten families with the lowest A/S ratios.</p>
               </tblfn>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>Selective pressure on disease-related proteins</p>
            </st>
            <p>Knowledge about the degree of selection for disease-related genes can help us understand the etiology of human diseases. An early study found that human disease genes evolve faster at both synonymous and non-synonymous sites than non-disease genes, and Ka/Ks ratios of disease genes are 24% higher <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>. Although the elevated Ks has subsequently been confirmed by others, later studies reported no difference in Ka/Ks between disease genes and non-disease genes <abbrgrp><abbr bid="B55">55</abbr></abbrgrp> or lower Ka for disease genes <abbrgrp><abbr bid="B56">56</abbr></abbrgrp>. It has also been shown that significant differences exist between the Ka/Ks ratios of different pathophysiological classes: genes related to neurological diseases evolve much more slowly than those associated with immune, hematological and pulmonary diseases <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>. We investigated the SNP distribution of human disease genes using two cancer-related gene collections (243 genes from Cancer Gene Census (CGC) <abbrgrp><abbr bid="B57">57</abbr></abbrgrp>, and 3,103 genes from the Catalogue of Somatic Mutations in Cancer (COSMIC) <abbrgrp><abbr bid="B58">58</abbr></abbrgrp>) and the catalog of heritable human disease genes from Online Mendelian Inheritance in Man (OMIM; 2,334 genes) <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. These three data sets represent 4,649 unique human genes, and 139 genes are common to all three sets. Our analysis of the SNP data shows that disease-related genes indeed have a higher synonymous SNP density (OMIM, 5.14; COSMIC, 4.41; CGC, 4.73; non-disease, 4.19, per 1,000 synonymous sites). However, the numbers of non-synonymous SNPs per site for disease genes are lower than those for non-disease genes, resulting in significantly lower A/S ratios in disease genes (<it>p</it>-value &lt; 1e-04; Figure <figr fid="F7">7</figr>). 
The difference between our analysis and some previous studies could be explained by two factors. First, our data sets are substantially larger than those used in previous studies. For example, the Smith and Eyre-Walker study <abbrgrp><abbr bid="B54">54</abbr></abbrgrp> analyzed only 392 genes in the disease set and 2,038 genes in the non-disease set, and the Huang <it>et al</it>. study <abbrgrp><abbr bid="B55">55</abbr></abbrgrp> included 1,178 human disease genes. Second, the evolution of disease-related genes may follow different patterns in the human lineage, leading to the difference between SNP A/S ratios and Ka/Ks ratios from human-rodent alignments. It has also been suggested that when non-disease genes are partitioned into housekeeping genes and others, the evolutionary rates of disease genes lie between them <abbrgrp><abbr bid="B59">59</abbr></abbrgrp>. This is consistent with our data: the SNP A/S ratio for OMIM is 0.24, higher than that of housekeeping genes (genes with the broadest expression patterns, A/S = 0.19; Figure <figr fid="F2">2c</figr>). Moreover, when controlling for expression breadth (so that different groups have the same distribution of expression breadth, and thus the same proportion of housekeeping genes), non-disease genes still showed significantly higher A/S ratios than genes in the OMIM and COSMIC sets, while the confidence interval of A/S ratios for genes in the CGC set slightly overlapped with that for non-disease genes (Figure S2c in Additional data file 1), mostly due to the large variance in the CGC set resulting from its smaller number of genes.</p>
            <fig id="F7">
               <title>
                  <p>Figure 7</p>
               </title>
               <caption>
                  <p>Disease-related genes are under stronger selective pressure</p>
               </caption>
               <text>
                  <p>Disease-related genes are under stronger selective pressure. Disease-related genes were obtained from the CGC (243 genes), COSMIC (3,103 genes), and OMIM (2,334 genes) databases. The SNP A/S ratio was calculated for each group. The 95th percentile confidence intervals from bootstrap resampling (shown as error bars) are [0.19, 0.27] for CGC, [0.20, 0.22] for COSMIC, [0.23, 0.26] for OMIM, and [0.31, 0.33] for others.</p>
               </text>
               <graphic file="gb-2008-9-4-r69-7"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Selective constraints and protein-protein interaction</p>
            </st>
            <p>It is still debatable whether there is any correlation between protein-protein interaction and selective pressure. Most studies so far have been based on data from the budding yeast <it>Saccharomyces cerevisiae</it>. After an initial report that yeast proteins with more interaction partners evolve slowly <abbrgrp><abbr bid="B60">60</abbr></abbrgrp>, several studies suggested that the correlation is dependent on interaction data sets <abbrgrp><abbr bid="B61">61</abbr></abbrgrp>, or that it may be a secondary effect due to protein abundance <abbrgrp><abbr bid="B62">62</abbr></abbrgrp>. The latest and most conclusive study in yeast suggested that there is no correlation between connectivity and evolutionary rate in a higher quality literature curated interaction data set, while negative correlations observed in some high-throughput data sets even after controlling for expression could be artifacts of the data sets <abbrgrp><abbr bid="B63">63</abbr></abbrgrp>. We obtained human protein-protein interaction data from the IntAct database <abbrgrp><abbr bid="B64">64</abbr></abbrgrp> and examined how SNP A/S ratios are correlated with the connectivity of proteins in the protein-protein interaction network. When all types of interactions were included, proteins with more than five interaction partners appear to have significantly lower A/S ratios than proteins with no more than one partner (Figure <figr fid="F8">8a</figr>, gray bars). We also noticed that proteins with more interaction partners tend to have higher mRNA expression (Figure <figr fid="F8">8b</figr>, gray bars). The Kendall's rank correlation between connectivity and the SNP A/S ratio was -0.131 (<it>p</it>-value &lt; 1e-04), and it dropped to -0.106 (<it>p</it>-value &lt; 1e-04) after controlling for both mean expression level and expression breadth. 
The correlation between protein abundance and high connectivity in the interaction network could be either a real biological phenomenon or experimental bias; for example, mass spectrometry-based protein complex pulldown experiments are more likely to identify interaction partners for abundant proteins. When we included only yeast two-hybrid interactions in our analysis, which are supposedly less biased with respect to intrinsic expression levels, the correlation between connectivity and abundance largely disappeared, except for the proteins with no interaction partners in the database (Figure <figr fid="F8">8b</figr>, white bars); at the same time, the difference in A/S ratios between proteins with only one partner and those with more than one became smaller and lost statistical significance in some cases according to bootstrap analysis (Figure <figr fid="F8">8a</figr>, white bars). For yeast two-hybrid interactions only, the correlation between connectivity and the SNP A/S ratio was -0.100, and it dropped only slightly to -0.090 (<it>p</it>-value = 0.007) after controlling for expression. Nevertheless, the partial correlations were still statistically significant in both the yeast two-hybrid interaction set and the all interaction set. Our analysis supports the idea that the correlation between evolutionary rate and connectivity in the interaction network can, in part, be explained by protein abundance and that some of the correlation may result from experimental bias. As with the conflicting studies in yeast, this result is likely to be inconclusive and may vary from data set to data set.</p>
            <fig id="F8">
               <title>
                  <p>Figure 8</p>
               </title>
               <caption>
                  <p>Selective pressures on connectivity in protein-protein interaction networks</p>
               </caption>
               <text>
                  <p>Selective pressures on connectivity in protein-protein interaction networks. Error bars represent 95th percentile confidence intervals from bootstrap resampling. <b>(a) </b>Proteins with more interaction partners appear to have lower A/S ratios (gray bars); however, for yeast two-hybrid interactions, the differences are less significant for proteins with at least one interaction partner (white bars). <b>(b) </b>Proteins with more interaction partners tend to have higher mRNA expression levels (gray bars). This could result from experimental bias: for yeast two-hybrid interactions, the differences are not significant for proteins with at least one interaction partner (white bars).</p>
               </text>
               <graphic file="gb-2008-9-4-r69-8"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>SNP A/S ratio and splicing</p>
            </st>
            <p>A recent study suggested that protein evolution is strongly affected by mRNA splicing, in addition to the biology of the protein <abbrgrp><abbr bid="B65">65</abbr></abbrgrp>. Based on Ka and Ka/Ks from the human-mouse comparison, it was reported that the proportion of sequence near intron-exon boundaries is a strong predictor of evolutionary rates in human, in part due to splice enhancers located close to intron-exon junctions. We were able to confirm this result using SNP data. Codons within 70 bp of the intron-exon boundaries have an SNP A/S ratio of 0.22, significantly lower than 0.30 for codons that are far from the junction. At the protein level, proteins with more than 80% of their sequence within 70 bp of the boundaries have an SNP A/S ratio of 0.20, much lower than those with only 5% close to the boundaries (Figure <figr fid="F9">9</figr>). The Kendall's rank correlation between the proportion of sequence near intron-exon boundaries and SNP A/S ratio is -0.163, comparable to the correlation between mean mRNA expression levels and the SNP A/S ratios. After controlling for expression breadth, the correlation remained significant (&#964; = -0.147, <it>p</it>-value &lt; 1e-04).</p>
            <fig id="F9">
               <title>
                  <p>Figure 9</p>
               </title>
               <caption>
                  <p>The SNP A/S ratio negatively correlates with the proportion of coding sequence (CDS) within 70 bp of an exon-intron junction</p>
               </caption>
               <text>
                  <p>The SNP A/S ratio negatively correlates with the proportion of coding sequence (CDS) within 70 bp of an exon-intron junction. Genes were grouped into bins of nine equal intervals according to the proportion of sequence within 70 bp of an exon-intron junction, and the SNP A/S ratio was obtained for each bin. Error bars represent 95th percentile confidence intervals from bootstrap resampling.</p>
               </text>
               <graphic file="gb-2008-9-4-r69-9"/>
            </fig>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <sec>
            <st>
               <p>Measuring selective constraints with SNP A/S ratios</p>
            </st>
            <p>The average SNP density in our data set is about 2 SNPs per 1,000 nucleotides. Although we have limited our analysis to proteins with at least one validated coding SNP, many proteins in our set still have no non-synonymous SNPs or synonymous SNPs. Therefore, it is neither practical nor reliable to measure the selective constraints on individual proteins using the SNP A/S ratio. Nevertheless, we have demonstrated that when a group of proteins (or residues) are measured together, the measure can be quite robust and often in good agreement with Ka/Ks for divergence. For example, the SNP A/S ratio for proteins with paralogs in our data set is 0.47, very close to the Ka/Ks ratio of 0.45 for duplicated mammalian genes reported earlier <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. Although this may not seem surprising given that it is an expected prediction from the neutral theory of evolution, our analysis showed that the SNP A/S ratio can indeed be a practical and robust measure given the current volume of SNP data in public databases.</p>
            <p>There are two unique advantages of using SNP A/S ratios to measure selective constraints. First, for human-specific genes and genes whose ortholog relationships cannot be determined reliably, inter-species Ka/Ks ratios cannot be estimated. For example, among the 13,686 genes in our data set, human-mouse Ka/Ks ratios for 3,937 genes are not available through Ensembl. In those cases, SNP A/S ratios provide an alternative way of measuring selective pressure by the same evolutionary principle. Second, the SNP A/S ratio is a direct measure of the selective constraints specific to the human lineage, which cannot be obtained from Ka/Ks for species divergence.</p>
            <p>In comparison with other simpler measures used in some earlier SNP studies (for example, <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>), such as density of synonymous and non-synonymous SNPs or the fraction of SNPs in different protein classes, the SNP A/S ratio offers several advantages: its interpretation is clearer from evolutionary theories; it is not subject to data selection bias arising from the popularity of genes in sequencing efforts, which could be a problem when using SNP density; and it is normalized by the number of synonymous and non-synonymous sites, so the numbers are comparable across different protein classes or different studies.</p>
         </sec>
         <sec>
            <st>
               <p>Correlation between SNP distribution and structural/functional constraints</p>
            </st>
            <p>To address the issue of determinants of molecular evolution, many studies have been published examining the correlation between variables that characterize the evolution, expression and function of genes. However, as noted in the Results section, there are many controversies among those reports. It is clear that the validity and significance of correlations depend on many factors, including feature variables, data sets, and the correlation measure. Here, we provide the first large-scale study of SNP A/S ratios, and report correlations between the ratio and a number of variables; not surprisingly, some are not in agreement with previous studies. Although further studies are still needed to draw definitive conclusions about the major determinants, we agree with many others that it is likely that many of these variables correlate with each other in some way, and some of them are secondary effects rather than primary determinants <abbrgrp><abbr bid="B66">66</abbr></abbrgrp>. Several statistical methods, such as partial correlation and principal component regression, have been used to attempt to dissect these complex and rich connections; yet it remains an important open challenge in molecular evolution.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>Molecular evolution is a dynamic process, with strong implications for both differences between species and variations within a given species. SNPs in the human genome perhaps capture a glimpse of this dynamic process and, therefore, could offer key insights missed by conventional cross-species comparisons. Here, we first established the SNP A/S ratio as a reliable metric for studying the selective constraints for molecular evolution, and then used this metric to systematically investigate a large number of protein features that contribute to differences in molecular evolution rate. Our study provided the first such large-scale survey based on SNP A/S ratios, leading to novel insights into features that have not been examined before and clarification of findings that were contradictory in the literature.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <sec>
            <st>
               <p>Source of sequences and SNP data</p>
            </st>
            <p>We first filtered the human entries in the NCBI's Entrez Gene database <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> by the presence of RefSeq <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> records, and limited our analysis to genes that have at least one validated coding SNP according to dbSNP <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> build 127. For each gene, we chose one transcript (and its corresponding protein product) by selecting the transcript with the most validated coding SNPs, most advanced RefSeq annotation status, and the longest sequence. We also discarded proteins that are longer than 5,000 residues or shorter than 25 residues. Our final data set has 13,686 proteins with 45,538 coding SNPs.</p>
         </sec>
         <sec>
            <st>
               <p>mRNA expression data</p>
            </st>
            <p>mRNA expression data were obtained from Gene Expression Atlas <abbrgrp><abbr bid="B24">24</abbr><abbr bid="B67">67</abbr></abbrgrp>. Only normal adult samples were included in the analysis. Samples were sorted into 54 non-redundant tissue types. The expression level of each probe set in a given tissue was calculated as the mean of log (base 10) MAS5 signal intensities of all samples in that tissue. The 'mean expression level' of a probe set was defined as the mean across all tissues, while 'peak expression level' was defined as the maximum among all tissues. The tissue specificity of a probe set was defined as the heterogeneity of its expression level across all tissues. It was calculated according to <abbrgrp><abbr bid="B68">68</abbr></abbrgrp> as:</p>
            <p>
               <display-formula>
                  <m:math name="gb-2008-9-4-r69-i1" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mfrac>
                              <m:mrow>
                                 <m:mstyle displaystyle="true">
                                    <m:msubsup>
                                       <m:mo>&#8721;</m:mo>
                                       <m:mrow>
                                          <m:mi>j</m:mi>
                                          <m:mo>=</m:mo>
                                          <m:mn>1</m:mn>
                                       </m:mrow>
                                       <m:mi>n</m:mi>
                                    </m:msubsup>
                                    <m:mrow>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mn>1</m:mn>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mfrac>
                                          <m:mrow>
                                             <m:mi>log</m:mi>
                                             <m:mo>&#8289;</m:mo>
                                             <m:msub>
                                                <m:mi>S</m:mi>
                                                <m:mi>j</m:mi>
                                             </m:msub>
                                          </m:mrow>
                                          <m:mrow>
                                             <m:mi>log</m:mi>
                                             <m:mo>&#8289;</m:mo>
                                             <m:msub>
                                                <m:mi>S</m:mi>
                                                <m:mi>max</m:mi>
                                             </m:msub>
                                          </m:mrow>
                                       </m:mfrac>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                 </m:mstyle>
                              </m:mrow>
                              <m:mrow>
                                 <m:mi>n</m:mi>
                                 <m:mo>&#8722;</m:mo>
                                 <m:mn>1</m:mn>
                              </m:mrow>
                           </m:mfrac>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8GiVeY=Pipec8Eeeu0xXdbba9frFj0xb9Lqpepeea0xd9q8qiYRWxGi6xij=hbbc9s8aq0=yqpe0xbbG8A8frFve9Fve9Fj0dmeaabaqaciGacaGaaeqabaqabeGadaaakeaajuaGdaWcaaqaamaaqadabaGaaiikaiaaigdacqGHsisldaWcaaqaaiGacYgacaGGVbGaai4zaiaadofadaWgaaqaaiaadQgaaeqaaaqaaiGacYgacaGGVbGaai4zaiaadofaciGGTbGaaiyyaiaacIhaaaGaaiykaaqaaiaadQgacqGH9aqpcaaIXaaabaGaamOBaaGaeyyeIuoaaeaacaWGUbGaeyOeI0IaaGymaaaaaaa@4750@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where <it>n</it> = 54 is the number of human tissues examined here, <it>S</it><sub><it>j</it></sub> is the expression level in each tissue, and <it>S</it><sub>max</sub> is the highest expression level of the probe set across all tissues. When a gene has multiple probe sets, its expression levels and tissue specificity were represented by the probe set with the highest mean expression level.</p>
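            <p>The tissue specificity formula above can be illustrated with a short Python sketch. It is not the code used in this study: the function name and the input layout (one signal intensity per tissue, each greater than 1 so that every logarithm is positive) are assumptions made for illustration, and base-10 logarithms are assumed to match the expression levels described above.</p>
            <p><![CDATA[
```python
import math


def tissue_specificity(signals):
    """Heterogeneity of expression across tissues (hypothetical helper).

    signals: one signal intensity per tissue; values above 1 are assumed
    so that log S_j and log S_max are positive.
    """
    n = len(signals)
    if n < 2:
        raise ValueError("need at least two tissues")
    logs = [math.log10(s) for s in signals]
    log_max = max(logs)
    if log_max <= 0:
        raise ValueError("peak signal must exceed 1")
    # Sum of (1 - log S_j / log S_max) over tissues, divided by n - 1.
    return sum(1 - value / log_max for value in logs) / (n - 1)
```
]]></p>
            <p>Under these assumptions, a probe set with identical signal in every tissue scores 0, and the score grows as expression concentrates in fewer tissues.</p>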
            <p>Affymetrix present/absent calls were used to calculate the breadth of the expression. A probe set was considered 'present' in a tissue if it had 'present' calls in no less than half of the samples in that tissue, and the expression breadth of a probe set was defined as the number of tissues in which the probe set was 'present'. When a gene has multiple probe sets, its expression breadth was represented by the probe set with the highest value of breadth.</p>
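            <p>The expression breadth rule described above can be sketched as follows. This is an illustrative sketch only: the function names, the dictionary layout, and the use of the string 'P' for a 'present' call are assumptions, not details taken from the original analysis.</p>
            <p><![CDATA[
```python
def expression_breadth(calls_by_tissue):
    """Number of tissues in which a probe set is 'present'.

    calls_by_tissue: dict mapping a tissue name to the list of detection
    calls of its samples; 'P' marks a present call (assumed encoding).
    """
    breadth = 0
    for calls in calls_by_tissue.values():
        present = sum(1 for call in calls if call == "P")
        # 'Present' in a tissue: present calls in no less than half of the samples.
        if calls and 2 * present >= len(calls):
            breadth += 1
    return breadth


def gene_breadth(probe_sets):
    """A gene is represented by its probe set with the highest breadth."""
    return max(expression_breadth(calls) for calls in probe_sets)
```
]]></p>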
         </sec>
         <sec>
            <st>
               <p>Structural and functional features</p>
            </st>
            <p>Ka/Ks ratios from human-mouse alignments were downloaded from Ensembl <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. The information about human paralogs was extracted from HomoloGene database <abbrgrp><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr></abbrgrp> release 57; the existence of paralogs is indicated by the presence of other human proteins in the same homology group. To investigate the degree of conservation throughout the evolutionary history, that is, the age of a protein, we performed BLAST searches <abbrgrp><abbr bid="B15">15</abbr></abbrgrp> of each protein sequence against NCBI's RefSeq database, and collected all hits with an e-value &lt; 1e-10 and a protein length difference smaller than 30% as potential homologs. The query protein was then classified into one of the seven age groups (human, primate, mammal, vertebrate, animal, eukaryote, or universal) according to its most ancient homolog. The use of different e-values and length difference cutoffs did not change the results qualitatively. We also obtained the conservation score for each residue in all proteins in our data set by running PSI-BLAST <abbrgrp><abbr bid="B15">15</abbr></abbrgrp> against NCBI's nr database (parameters: '-j 3 -h 5e-3 -F F') and taking the values in the 'information per position' (a measure of alignment entropy) column from the ASCII format of PSI-BLAST profiles.</p>
            <p>We obtained protein structure features by the following computational methods using their default parameters: two-state (exposed or buried) solvent accessibility by PROFacc <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>, signal peptides by SignalP version 3.0 <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>, transmembrane helices by TMHMM 2.0 <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>, secondary structures by PSIPRED <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>, sequence complexity by SEG <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>, and natively disordered proteins by Disopred2 <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>. Disopred2 predictions were subsequently filtered to retain only the disordered regions longer than 30 residues.</p>
            <p>GO annotations <abbrgrp><abbr bid="B32">32</abbr></abbrgrp> were extracted from the GenBank record of each gene, and the GO terms were subsequently mapped to a selection of high-level terms according to GOA slim <abbrgrp><abbr bid="B52">52</abbr><abbr bid="B69">69</abbr></abbrgrp>. For cellular component annotations, the GOA slim terms were further collapsed into four categories (extracellular region, nucleus, cytoplasm, and membrane) where appropriate. Proteins assigned to more than one of these four categories were excluded from the analysis of subcellular localization by GO annotation. Subcellular localization predictions for non-membrane proteins (that is, no transmembrane helix predictions from TMHMM) were also obtained by using LOCtree <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>. To assign proteins to protein families, we ran hmmpfam from the HMMER package (version 2.3) against Pfam_ls <abbrgrp><abbr bid="B53">53</abbr></abbrgrp> models (release 21.0) and obtained Pfam family hits with an e-value &lt; 0.01. Numbers of protein-protein interaction partners were obtained from the IntAct database <abbrgrp><abbr bid="B64">64</abbr><abbr bid="B70">70</abbr></abbrgrp> (28 September 2007 release). We also obtained disease-related genes from CGC (243 genes) <abbrgrp><abbr bid="B57">57</abbr><abbr bid="B71">71</abbr></abbrgrp>, COSMIC <abbrgrp><abbr bid="B58">58</abbr><abbr bid="B72">72</abbr></abbrgrp> (3,103 genes), and OMIM databases <abbrgrp><abbr bid="B27">27</abbr><abbr bid="B73">73</abbr></abbrgrp> on 28 September 2007. For OMIM, only those genes with the 'confirmed' status (2,334 genes) were included in the analysis.</p>
            <p>Sequences within 70 bp of an exon-intron junction were collected in the same way as described in <abbrgrp><abbr bid="B65">65</abbr></abbrgrp>. Briefly, transcripts with fewer than three exons were excluded from the analysis. All internal exons were trimmed so that the first base was the first base of the first complete codon, and the last base the last of the final complete codon. The first and last codons were then removed from each exon, and remaining codons within 70 bp of the intron-exon boundary were defined as sequences close to the boundary.</p>
            <p>The features we considered can also be divided into two classes: protein-level features and residue-level features. A protein-level feature describes the property of an entire protein, for example, whether a protein has a transmembrane helix or not; in contrast, a residue-level feature describes the property of a subset of residues within a protein, for example, whether a residue resides in the transmembrane helix or not.</p>
         </sec>
         <sec>
            <st>
               <p>Data analysis</p>
            </st>
            <p>The SNP A/S ratio, also known as the Ka/Ks ratio for polymorphism, is defined as the ratio of the number of non-synonymous SNPs per non-synonymous site to the number of synonymous SNPs per synonymous site. The numbers of synonymous sites and non-synonymous sites were calculated using the method of Miyata and Yasunaga <abbrgrp><abbr bid="B74">74</abbr></abbrgrp>. The ratio for a set of proteins (or residues) was calculated by summing the number of SNPs and the number of sites to obtain A and S for the concatenated set before taking the ratio.</p>
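            <p>As a minimal sketch of the pooling step described above, the following Python function sums SNP counts and site counts over a set of proteins before taking the ratio. The function name and the tuple layout are hypothetical, and the per-protein site counts are taken as given inputs; the Miyata and Yasunaga site-counting method itself is not implemented here.</p>
            <p><![CDATA[
```python
def pooled_as_ratio(proteins):
    """SNP A/S ratio for a concatenated set of proteins (hypothetical helper).

    proteins: iterable of tuples
        (nonsyn_snps, nonsyn_sites, syn_snps, syn_sites)
    with site counts computed beforehand.
    """
    rows = list(proteins)
    a_snps = sum(r[0] for r in rows)
    a_sites = sum(r[1] for r in rows)
    s_snps = sum(r[2] for r in rows)
    s_sites = sum(r[3] for r in rows)
    if s_snps == 0:
        raise ValueError("no synonymous SNPs in the set; ratio undefined")
    a = a_snps / a_sites  # non-synonymous SNPs per non-synonymous site
    s = s_snps / s_sites  # synonymous SNPs per synonymous site
    return a / s
```
]]></p>
            <p>Pooling before dividing keeps the ratio defined for the set even when individual proteins carry no synonymous or no non-synonymous SNPs.</p>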
            <p>We performed bootstrap re-sampling analysis to assess the statistical significance of the differences observed in A/S ratios between different groups. We obtained 10,000 bootstrap replicates by re-sampling with replacement from the original data set. A/S ratios for different groups were calculated for each replicate, and confidence intervals and <it>p</it>-values of the differences were obtained from those 10,000 sets of A/S ratios. For protein-level feature groups, re-samplings were performed within each group, so that those re-sampled data sets had the same number of proteins in each group as the original data set had. For residue-level feature groups, the re-sampled data sets were constructed by re-sampling the 13,686 proteins in our entire data set.</p>
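            <p>The within-group bootstrap for a protein-level feature group can be sketched as below. This is an illustration under stated assumptions rather than the original implementation: the function names, the tuple layout, the fixed random seed, the choice to skip replicates without synonymous SNPs, and the simple percentile rule for the interval are all assumptions.</p>
            <p><![CDATA[
```python
import random


def as_ratio(rows):
    """Pooled A/S ratio for rows of (nonsyn_snps, nonsyn_sites, syn_snps, syn_sites)."""
    a = sum(r[0] for r in rows) / sum(r[1] for r in rows)
    s = sum(r[2] for r in rows) / sum(r[3] for r in rows)
    return a / s


def bootstrap_ci(rows, n_replicates=10000, level=0.95, seed=0):
    """Percentile confidence interval of the A/S ratio for one group."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_replicates):
        # Re-sample proteins with replacement, keeping the group size fixed.
        sample = rng.choices(rows, k=len(rows))
        if sum(r[2] for r in sample) == 0:
            continue  # ratio undefined without synonymous SNPs (assumed handling)
        ratios.append(as_ratio(sample))
    if not ratios:
        raise ValueError("no replicate had a defined A/S ratio")
    ratios.sort()
    lower = ratios[int((1 - level) / 2 * len(ratios))]
    upper = ratios[int((1 + level) / 2 * len(ratios)) - 1]
    return lower, upper
```
]]></p>
            <p>A <it>p</it>-value for the difference between two groups could be estimated in the same spirit, for example as the fraction of replicates in which the sign of the difference is reversed; that step is omitted here.</p>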
            <p>Monte Carlo samplings were used to assess the differences in A/S ratios between groups when controlling for expression parameters. Briefly, the distribution of expression breadth (or expression level) for the group with the smallest number of genes was set as the target distribution; genes in all other groups were then sampled without replacement using Monte Carlo simulation so that all groups had the same distribution of the expression breadth (or level). The effectiveness of Monte Carlo samplings was confirmed by Kolmogorov-Smirnov tests. An example is shown in Figure S1 in Additional data file 1. For each group, 100 samplings were performed, and the A/S ratio for the group was taken as the mean of A/S ratios from the 100 Monte Carlo samples.</p>
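            <p>One possible way to draw a sample whose expression breadth distribution matches a target group is sketched below. The text above does not specify the sampling procedure in detail, so the stratified scheme, the function name, and the input layout are assumptions: genes are binned by breadth, and each bin is sampled without replacement in proportion to the target counts.</p>
            <p><![CDATA[
```python
import random
from collections import Counter, defaultdict


def match_distribution(target_breadths, pool, seed=0):
    """Sample genes from pool so their breadth distribution matches the target.

    target_breadths: breadth values of the smallest group (the target).
    pool: list of (gene_id, breadth) pairs for a larger group.
    """
    rng = random.Random(seed)
    target = Counter(target_breadths)
    if not target:
        raise ValueError("target distribution is empty")
    by_breadth = defaultdict(list)
    for gene, breadth in pool:
        by_breadth[breadth].append(gene)
    # Largest multiple of the target counts that every bin can supply.
    scale = min(len(by_breadth[b]) // count for b, count in target.items())
    if scale == 0:
        raise ValueError("pool cannot reproduce the target distribution")
    sample = []
    for b, count in target.items():
        sample.extend(rng.sample(by_breadth[b], scale * count))
    return sample
```
]]></p>
            <p>Repeating the draw with different seeds, computing the A/S ratio of each sample, and averaging the ratios would correspond to the 100 samplings described above.</p>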
            <p>Since the average SNP density in our data set is about 2 SNPs per 1,000 nucleotides, and many proteins in our data set have either no non-synonymous SNPs or no synonymous SNPs, it is not possible to reliably calculate the correlation between the SNP A/S ratio and other continuous variables using each protein as a data point. We chose to randomly group every six proteins together as a data point so that, on average, each data point had roughly the same number of SNPs as the reported number of single-nucleotide substitutions (1.23%) between the human and chimpanzee genomes <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. Non-parametric Kendall's &#964; rank correlation coefficients <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> and two-tailed <it>p</it>-values were used throughout the study. When controlling for expression parameters, Kendall's partial correlation between x and y controlling for z was calculated as:</p>
            <p>
               <display-formula>
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="gb-2008-9-4-r69-i2">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>&#964;</m:mi>
                              <m:mrow>
                                 <m:mi>x</m:mi>
                                 <m:mi>y</m:mi>
                                 <m:mo>.</m:mo>
                                 <m:mi>z</m:mi>
                              </m:mrow>
                           </m:msub>
                           <m:mo>=</m:mo>
                           <m:mfrac>
                              <m:mrow>
                                 <m:msub>
                                    <m:mi>&#964;</m:mi>
                                    <m:mrow>
                                       <m:mi>x</m:mi>
                                       <m:mi>y</m:mi>
                                    </m:mrow>
                                 </m:msub>
                                 <m:mo>&#8722;</m:mo>
                                 <m:msub>
                                    <m:mi>&#964;</m:mi>
                                    <m:mrow>
                                       <m:mi>x</m:mi>
                                       <m:mi>z</m:mi>
                                    </m:mrow>
                                 </m:msub>
                                 <m:msub>
                                    <m:mi>&#964;</m:mi>
                                    <m:mrow>
                                       <m:mi>y</m:mi>
                                       <m:mi>z</m:mi>
                                    </m:mrow>
                                 </m:msub>
                              </m:mrow>
                              <m:mrow>
                                 <m:msqrt>
                                    <m:mrow>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mn>1</m:mn>
                                       <m:mo>&#8722;</m:mo>
                                       <m:msubsup>
                                          <m:mi>&#964;</m:mi>
                                          <m:mrow>
                                             <m:mi>x</m:mi>
                                             <m:mi>z</m:mi>
                                          </m:mrow>
                                          <m:mn>2</m:mn>
                                       </m:msubsup>
                                       <m:mo stretchy="false">)</m:mo>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mn>1</m:mn>
                                       <m:mo>&#8722;</m:mo>
                                       <m:msubsup>
                                          <m:mi>&#964;</m:mi>
                                          <m:mrow>
                                             <m:mi>y</m:mi>
                                             <m:mi>z</m:mi>
                                          </m:mrow>
                                          <m:mn>2</m:mn>
                                       </m:msubsup>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                 </m:msqrt>
                              </m:mrow>
                           </m:mfrac>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8GiVeY=Pipec8Eeeu0xXdbba9frFj0xb9Lqpepeea0xd9q8qiYRWxGi6xij=hbbc9s8aq0=yqpe0xbbG8A8frFve9Fve9Fj0dmeaabaqaciGacaGaaeqabaqabeGadaaakeaacqaHepaDdaWgaaWcbaGaamiEaiaadMhacaGGUaGaamOEaaqabaGccqGH9aqpjuaGdaWcaaqaaiabes8a0naaBaaaleaacaWG4bGaamyEaaqcfayabaGaeyOeI0IaeqiXdq3aaSbaaSqaaiaadIhacaWG6baajuaGbeaacqaHepaDdaWgaaWcbaGaamyEaiaadQhaaKqbagqaaaqaamaakaaabaGaaiikaiaaigdacqGHsislcqaHepaDlmaaDaaabaGaamiEaiaadQhaaeaacaaIYaaaaKqbakaacMcacaGGOaGaaGymaiabgkHiTiabes8a0TWaa0baaeaacaWG5bGaamOEaaqaaiaaikdaaaqcfaOaaiykaaqabaaaaaaa@5671@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>The random grouping was performed 100 times. The correlation and partial correlation coefficients were computed from these 100 samples, and the medians of those 100 sets of coefficients were reported.</p>
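            <p>The partial correlation formula and the random grouping can be sketched in Python as follows. This is a minimal illustration, not the original code: the tie-corrected tau-b variant of Kendall's coefficient, the quadratic-time pair counting, and the decision to drop an incomplete final group are assumptions, as are all function names.</p>
            <p><![CDATA[
```python
import math
import random
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-b computed by direct pair counting (assumed variant)."""
    concordant = discordant = untied_x = untied_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        untied_x += dx != 0
        untied_y += dy != 0
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    return (concordant - discordant) / math.sqrt(untied_x * untied_y)


def partial_kendall_tau(x, y, z):
    """Kendall's partial correlation between x and y controlling for z."""
    t_xy, t_xz, t_yz = kendall_tau(x, y), kendall_tau(x, z), kendall_tau(y, z)
    return (t_xy - t_xz * t_yz) / math.sqrt((1 - t_xz ** 2) * (1 - t_yz ** 2))


def random_groups(items, size=6, seed=0):
    """Shuffle items and split them into groups of `size`; leftovers are dropped."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    return [shuffled[i:i + size] for i in range(0, len(shuffled) - size + 1, size)]
```
]]></p>
            <p>In this sketch, each group from <it>random_groups</it> would supply one data point (a pooled A/S ratio and, for example, a mean expression value), and the median coefficient over 100 differently seeded groupings would be reported.</p>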
         </sec>
      </sec>
      <sec>
         <st>
            <p>Abbreviations</p>
         </st>
         <p>CGC, Cancer Gene Census; COSMIC, Catalogue of Somatic Mutations in Cancer; GO, Gene Ontology; GPCR, G protein-coupled receptor; Ka, non-synonymous substitutions per non-synonymous site; Ks, synonymous substitutions per synonymous site; OMIM, Online Mendelian Inheritance in Man; SNP, single nucleotide polymorphism.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>ZZ and JL designed the study. JL and YZ collected the data and performed the data analysis. XL participated in the statistical analysis. JL and ZZ drafted the manuscript. All authors read and approved the final manuscript.</p>
      </sec>
      <sec>
         <st>
            <p>Additional data files</p>
         </st>
         <p>The following additional data are available. Additional data file <supplr sid="S1">1</supplr> includes Table S1 and Figures S1-S3.</p>
         <suppl id="S1">
            <title>
               <p>Additional data file 1</p>
            </title>
            <caption>
               <p>Table S1 and Figures S1-S3</p>
            </caption>
            <text>
               <p>Table S1 presents correlations and partial correlations between SNP A/S ratios and expression parameters. Figure S1 shows an example of using Monte Carlo sampling to obtain the same distribution of expression breadth for different groups of proteins. Figures S2 and S3 demonstrate that, for most variables in our study, the differences in A/S ratios between groups do not change qualitatively after controlling for expression breadth.</p>
            </text>
            <file name="gb-2008-9-4-r69-S1.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We thank Kiran Mukhyala and Reece Hart for technical assistance, Joshua Kaminker, Peter Haverty, Shiuh-Ming Luoh, Peng Yue, and Colin Watanabe for helpful discussions, and Burkhard Rost and Rajesh Nair (Columbia University) for providing LOCtree and PROFacc predictions.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>The Ka/Ks ratio: diagnosing the form of sequence evolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Hurst</snm>
                  <fnm>LD</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <fpage>486</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(02)02722-1</pubid>
                  <pubid idtype="pmpid" link="fulltext">12175810</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>W-H</fnm>
               </au>
            </aug>
            <source>Molecular Evolution</source>
            <publisher>Sunderland, Massachusetts: Sinauer Associates, Inc.</publisher>
            <pubdate>1997</pubdate>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Essential genes are more evolutionarily conserved than are nonessential genes in bacteria.</p>
            </title>
            <aug>
               <au>
                  <snm>Jordan</snm>
                  <fnm>IK</fnm>
               </au>
               <au>
                  <snm>Rogozin</snm>
                  <fnm>IB</fnm>
               </au>
               <au>
                  <snm>Wolf</snm>
                  <fnm>YI</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>962</fpage>
            <lpage>968</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1383730</pubid>
                  <pubid idtype="pmpid" link="fulltext">12045149</pubid>
                  <pubid idtype="doi">10.1101/gr.87702</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Mammalian housekeeping genes evolve more slowly than tissue-specific genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>WH</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2004</pubdate>
            <volume>21</volume>
            <fpage>236</fpage>
            <lpage>239</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msh010</pubid>
                  <pubid idtype="pmpid" link="fulltext">14595094</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Initial sequencing and comparative analysis of the mouse genome.</p>
            </title>
            <aug>
               <au>
                  <cnm>Mouse Genome Sequencing Consortium</cnm>
               </au>
               <au>
                  <snm>Waterston</snm>
                  <fnm>RH</fnm>
               </au>
               <au>
                  <snm>Lindblad-Toh</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Rogers</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Abril</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Agarwal</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Agarwala</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Ainscough</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Alexandersson</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>An</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Antonarakis</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Attwood</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Baertsch</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Bailey</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Barlow</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Beck</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Berry</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Birren</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Bloom</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Botcherby</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Bray</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Brent</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>Bult</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Burton</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Butler</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Campbell</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Carninci</snm>
                  <fnm>P</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2002</pubdate>
            <volume>420</volume>
            <fpage>520</fpage>
            <lpage>562</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature01262</pubid>
                  <pubid idtype="pmpid" link="fulltext">12466850</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>dbSNP: the NCBI database of genetic variation.</p>
            </title>
            <aug>
               <au>
                  <snm>Sherry</snm>
                  <fnm>ST</fnm>
               </au>
               <au>
                  <snm>Ward</snm>
                  <fnm>MH</fnm>
               </au>
               <au>
                  <snm>Kholodov</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Baker</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Phan</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Smigielski</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Sirotkin</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <fpage>308</fpage>
            <lpage>311</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">29783</pubid>
                  <pubid idtype="pmpid" link="fulltext">11125122</pubid>
                  <pubid idtype="doi">10.1093/nar/29.1.308</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Genome-wide evaluation of the public SNP databases.</p>
            </title>
            <aug>
               <au>
                  <snm>Jiang</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Duan</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Windemuth</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Stephens</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Judson</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Xu</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Pharmacogenomics</source>
            <pubdate>2003</pubdate>
            <volume>4</volume>
            <fpage>779</fpage>
            <lpage>789</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1517/phgs.4.6.779.22821</pubid>
                  <pubid idtype="pmpid" link="fulltext">14596641</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Systematic investigation of genetic variability in 111 human genes-implications for studying variable drug response.</p>
            </title>
            <aug>
               <au>
                  <snm>Freudenberg-Hua</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Freudenberg</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Winantea</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kluck</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Cichon</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bruss</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Propping</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>N&#246;then</snm>
                  <fnm>MM</fnm>
               </au>
            </aug>
            <source>Pharmacogenomics J</source>
            <pubdate>2005</pubdate>
            <volume>5</volume>
            <fpage>183</fpage>
            <lpage>192</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/sj.tpj.6500306</pubid>
                  <pubid idtype="pmpid" link="fulltext">15809674</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>The neutral theory of molecular evolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Kimura</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Evolution of Genes and Proteins</source>
            <publisher>Sunderland, Massachusetts: Sinauer Associates, Inc.</publisher>
            <editor>Nei M, Koehn RK</editor>
            <pubdate>1983</pubdate>
            <fpage>208</fpage>
            <lpage>233</lpage>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Positive and negative selection on the human genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Fay</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Wyckoff</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>CI</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2001</pubdate>
            <volume>158</volume>
            <fpage>1227</fpage>
            <lpage>1234</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1461725</pubid>
                  <pubid idtype="pmpid" link="fulltext">11454770</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Human SNPs reveal no evidence of frequent positive selection.</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>WH</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2005</pubdate>
            <volume>22</volume>
            <fpage>2504</fpage>
            <lpage>2507</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msi240</pubid>
                  <pubid idtype="pmpid" link="fulltext">16107590</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Initial sequence of the chimpanzee genome and comparison with the human genome.</p>
            </title>
            <aug>
               <au>
                  <cnm>Chimpanzee Sequencing and Analysis Consortium</cnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2005</pubdate>
            <volume>437</volume>
            <fpage>69</fpage>
            <lpage>87</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature04072</pubid>
                  <pubid idtype="pmpid" link="fulltext">16136131</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Ensembl</p>
            </title>
            <url>http://www.ensembl.org</url>
         </bibl>
         <bibl id="B14">
            <aug>
               <au>
                  <snm>Gibbons</snm>
                  <fnm>JD</fnm>
               </au>
            </aug>
            <source>Nonparametric Measures of Association</source>
            <publisher>Newbury Park: Sage Publications</publisher>
            <pubdate>1993</pubdate>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Madden</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Sch&#228;ffer</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <fpage>3389</fpage>
            <lpage>3402</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">146917</pubid>
                  <pubid idtype="pmpid" link="fulltext">9254694</pubid>
                  <pubid idtype="doi">10.1093/nar/25.17.3389</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>An analysis of determinants of amino acids substitution rates in bacterial proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Rocha</snm>
                  <fnm>EP</fnm>
               </au>
               <au>
                  <snm>Danchin</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2004</pubdate>
            <volume>21</volume>
            <fpage>108</fpage>
            <lpage>116</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msh004</pubid>
                  <pubid idtype="pmpid" link="fulltext">14595100</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Highly expressed genes in yeast evolve slowly.</p>
            </title>
            <aug>
               <au>
                  <snm>P&#225;l</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Papp</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Hurst</snm>
                  <fnm>LD</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2001</pubdate>
            <volume>158</volume>
            <fpage>927</fpage>
            <lpage>931</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1461684</pubid>
                  <pubid idtype="pmpid" link="fulltext">11430355</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Gene expression intensity shapes evolutionary rates of the proteins encoded by the vertebrate genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Subramanian</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kumar</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2004</pubdate>
            <volume>168</volume>
            <fpage>373</fpage>
            <lpage>381</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1448110</pubid>
                  <pubid idtype="pmpid" link="fulltext">15454550</pubid>
                  <pubid idtype="doi">10.1534/genetics.104.028944</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>A single determinant dominates the rate of yeast protein evolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Drummond</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Raval</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Wilke</snm>
                  <fnm>CO</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2006</pubdate>
            <volume>23</volume>
            <fpage>327</fpage>
            <lpage>337</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msj038</pubid>
                  <pubid idtype="pmpid" link="fulltext">16237209</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Why highly expressed proteins evolve slowly.</p>
            </title>
            <aug>
               <au>
                  <snm>Drummond</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Bloom</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Adami</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Wilke</snm>
                  <fnm>CO</fnm>
               </au>
               <au>
                  <snm>Arnold</snm>
                  <fnm>FH</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2005</pubdate>
            <volume>102</volume>
            <fpage>14338</fpage>
            <lpage>14343</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1242296</pubid>
                  <pubid idtype="pmpid" link="fulltext">16176987</pubid>
                  <pubid idtype="doi">10.1073/pnas.0504070102</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Assessing the determinants of evolutionary rates in the presence of noise.</p>
            </title>
            <aug>
               <au>
                  <snm>Plotkin</snm>
                  <fnm>JB</fnm>
               </au>
               <au>
                  <snm>Fraser</snm>
                  <fnm>HB</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2007</pubdate>
            <volume>24</volume>
            <fpage>1113</fpage>
            <lpage>1121</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msm044</pubid>
                  <pubid idtype="pmpid" link="fulltext">17347158</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Determinants of substitution rates in mammalian genes: expression pattern affects selection intensity but not mutation rate.</p>
            </title>
            <aug>
               <au>
                  <snm>Duret</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Mouchiroud</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2000</pubdate>
            <volume>17</volume>
            <fpage>68</fpage>
            <lpage>74</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10666707</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Genomic regionality in rates of evolution is not explained by clustering of genes of comparable expression profile.</p>
            </title>
            <aug>
               <au>
                  <snm>Lercher</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Chamary</snm>
                  <fnm>JV</fnm>
               </au>
               <au>
                  <snm>Hurst</snm>
                  <fnm>LD</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <fpage>1002</fpage>
            <lpage>1013</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">419778</pubid>
                  <pubid idtype="pmpid" link="fulltext">15173108</pubid>
                  <pubid idtype="doi">10.1101/gr.1597404</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>A gene atlas of the mouse and human protein-encoding transcriptomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Su</snm>
                  <fnm>AI</fnm>
               </au>
               <au>
                  <snm>Wiltshire</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Batalov</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lapp</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Ching</snm>
                  <fnm>KA</fnm>
               </au>
               <au>
                  <snm>Block</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Soden</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hayakawa</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kreiman</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Cooke</snm>
                  <fnm>MP</fnm>
               </au>
               <au>
                  <snm>Walker</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Hogenesch</snm>
                  <fnm>JB</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2004</pubdate>
            <volume>101</volume>
            <fpage>6062</fpage>
            <lpage>6067</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">395923</pubid>
                  <pubid idtype="pmpid" link="fulltext">15075390</pubid>
                  <pubid idtype="doi">10.1073/pnas.0400782101</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Selection in the evolution of gene duplications.</p>
            </title>
            <aug>
               <au>
                  <snm>Kondrashov</snm>
                  <fnm>FA</fnm>
               </au>
               <au>
                  <snm>Rogozin</snm>
                  <fnm>IB</fnm>
               </au>
               <au>
                  <snm>Wolf</snm>
                  <fnm>YI</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <fpage>research0008.1</fpage>
            <lpage>0008.9</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1186/gb-2002-3-2-research0008</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>The evolutionary fate and consequences of duplicate genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Lynch</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Conery</snm>
                  <fnm>JS</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2000</pubdate>
            <volume>290</volume>
            <fpage>1151</fpage>
            <lpage>1155</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.290.5494.1151</pubid>
                  <pubid idtype="pmpid" link="fulltext">11073452</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Database resources of the National Center for Biotechnology Information.</p>
            </title>
            <aug>
               <au>
                  <snm>Wheeler</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Barrett</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Benson</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Bryant</snm>
                  <fnm>SH</fnm>
               </au>
               <au>
                  <snm>Canese</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Chetvernin</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Church</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>DiCuccio</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Edgar</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Federhen</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Geer</snm>
                  <fnm>LY</fnm>
               </au>
               <au>
                  <snm>Kapustin</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Khovayko</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Landsman</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Madden</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Maglott</snm>
                  <fnm>DR</fnm>
               </au>
               <au>
                  <snm>Ostell</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Pruitt</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Schuler</snm>
                  <fnm>GD</fnm>
               </au>
               <au>
                  <snm>Sequeira</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Sherry</snm>
                  <fnm>ST</fnm>
               </au>
               <au>
                  <snm>Sirotkin</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Souvorov</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Starchenko</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Tatusov</snm>
                  <fnm>RL</fnm>
               </au>
               <au>
                  <snm>Tatusova</snm>
                  <fnm>TA</fnm>
               </au>
               <au>
                  <snm>Wagner</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Yaschenko</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2007</pubdate>
            <volume>35</volume>
            <issue>Database issue</issue>
            <fpage>D5</fpage>
            <lpage>D12</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1781113</pubid>
                  <pubid idtype="pmpid" link="fulltext">17170002</pubid>
                  <pubid idtype="doi">10.1093/nar/gkl1031</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>HomoloGene</p>
            </title>
            <url>http://www.ncbi.nlm.nih.gov/sites/entrez?db=homologene</url>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Inverse relationship between evolutionary rate and age of mammalian genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Alb&#224;</snm>
                  <fnm>MM</fnm>
               </au>
               <au>
                  <snm>Castresana</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2005</pubdate>
            <volume>22</volume>
            <fpage>598</fpage>
            <lpage>606</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msi045</pubid>
                  <pubid idtype="pmpid" link="fulltext">15537804</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>On homology searches by protein Blast and the characterization of the age of genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Alb&#224;</snm>
                  <fnm>MM</fnm>
               </au>
               <au>
                  <snm>Castresana</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>BMC Evol Biol</source>
            <pubdate>2007</pubdate>
            <volume>7</volume>
            <fpage>53</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1855329</pubid>
                  <pubid idtype="pmpid" link="fulltext">17408474</pubid>
                  <pubid idtype="doi">10.1186/1471-2148-7-53</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>The "inverse relationship between evolutionary rate and age of mammalian genes" is an artifact of increased genetic distance with rate of evolution and time of divergence.</p>
            </title>
            <aug>
               <au>
                  <snm>Elhaik</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Sabath</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Graur</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2006</pubdate>
            <volume>23</volume>
            <fpage>1</fpage>
            <lpage>3</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msj006</pubid>
                  <pubid idtype="pmpid" link="fulltext">16151190</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.</p>
            </title>
            <aug>
               <au>
                  <snm>Ashburner</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ball</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Blake</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Botstein</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Butler</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Cherry</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>AP</fnm>
               </au>
               <au>
                  <snm>Dolinski</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Dwight</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>Eppig</snm>
                  <fnm>JT</fnm>
               </au>
               <au>
                  <snm>Harris</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Hill</snm>
                  <fnm>DP</fnm>
               </au>
               <au>
                  <snm>Issel-Tarver</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Kasarskis</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lewis</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Matese</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Richardson</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Ringwald</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>Sherlock</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2000</pubdate>
            <volume>25</volume>
            <fpage>25</fpage>
            <lpage>29</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/75556</pubid>
                  <pubid idtype="pmpid" link="fulltext">10802651</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Evolution of proteins and gene expression levels are coupled in <it>Drosophila</it> and are independently associated with mRNA abundance, protein length, and number of protein-protein interactions.</p>
            </title>
            <aug>
               <au>
                  <snm>Lemos</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Bettencourt</snm>
                  <fnm>BR</fnm>
               </au>
               <au>
                  <snm>Meiklejohn</snm>
                  <fnm>CD</fnm>
               </au>
               <au>
                  <snm>Hartl</snm>
                  <fnm>DL</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2005</pubdate>
            <volume>22</volume>
            <fpage>1345</fpage>
            <lpage>1354</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msi122</pubid>
                  <pubid idtype="pmpid" link="fulltext">15746013</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Evolution of protein-coding genes in <it>Drosophila</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Larracuente</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Sackton</snm>
                  <fnm>TB</fnm>
               </au>
               <au>
                  <snm>Greenberg</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Singh</snm>
                  <fnm>ND</fnm>
               </au>
               <au>
                  <snm>Sturgill</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Oliver</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Clark</snm>
                  <fnm>AG</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2008</pubdate>
            <volume>24</volume>
            <fpage>114</fpage>
            <lpage>123</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tig.2007.12.001</pubid>
                  <pubid idtype="pmpid" link="fulltext">18249460</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Protein-length distributions for the three domains of life.</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <fpage>107</fpage>
            <lpage>109</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(99)01922-8</pubid>
                  <pubid idtype="pmpid" link="fulltext">10689349</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Impacts of gene essentiality, expression pattern, and gene compactness on the evolutionary rate of mammalian proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Liao</snm>
                  <fnm>BY</fnm>
               </au>
               <au>
                  <snm>Scott</snm>
                  <fnm>NM</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2006</pubdate>
            <volume>23</volume>
            <fpage>2072</fpage>
            <lpage>2080</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msl076</pubid>
                  <pubid idtype="pmpid" link="fulltext">16887903</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>The relationship of protein conservation and sequence length.</p>
            </title>
            <aug>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Souvorov</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
               <au>
                  <snm>Panchenko</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Tatusova</snm>
                  <fnm>TA</fnm>
               </au>
            </aug>
            <source>BMC Evol Biol</source>
            <pubdate>2002</pubdate>
            <volume>2</volume>
            <fpage>20</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">137605</pubid>
                  <pubid idtype="pmpid" link="fulltext">12410938</pubid>
                  <pubid idtype="doi">10.1186/1471-2148-2-20</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Characterization of disease-associated single amino acid polymorphisms in terms of sequence and structure properties.</p>
            </title>
            <aug>
               <au>
                  <snm>Ferrer-Costa</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Orozco</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>de la Cruz</snm>
                  <fnm>X</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2002</pubdate>
            <volume>315</volume>
            <fpage>771</fpage>
            <lpage>786</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2001.5255</pubid>
                  <pubid idtype="pmpid" link="fulltext">11812146</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>SNPs, protein structure, and disease.</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Moult</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Hum Mutat</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <fpage>263</fpage>
            <lpage>270</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/humu.22</pubid>
                  <pubid idtype="pmpid" link="fulltext">11295823</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>How to use protein 1D structure predicted by PROFphd.</p>
            </title>
            <aug>
               <au>
                  <snm>Rost</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>The Proteomics Protocols Handbook</source>
            <publisher>Totowa NJ: Humana</publisher>
            <editor>Walker JM</editor>
            <pubdate>2005</pubdate>
            <fpage>875</fpage>
            <lpage>901</lpage>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Protein secondary structure prediction based on position-specific scoring matrices.</p>
            </title>
            <aug>
               <au>
                  <snm>Jones</snm>
                  <fnm>DT</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1999</pubdate>
            <volume>292</volume>
            <fpage>195</fpage>
            <lpage>202</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1999.3091</pubid>
                  <pubid idtype="pmpid" link="fulltext">10493868</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Intrinsically disordered protein.</p>
            </title>
            <aug>
               <au>
                  <snm>Dunker</snm>
                  <fnm>AK</fnm>
               </au>
               <au>
                  <snm>Lawson</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Williams</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Romero</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Oh</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Oldfield</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Campen</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Ratliff</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Hipps</snm>
                  <fnm>KW</fnm>
               </au>
               <au>
                  <snm>Ausio</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Nissen</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Reeves</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Kang</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Kissinger</snm>
                  <fnm>CR</fnm>
               </au>
               <au>
                  <snm>Bailey</snm>
                  <fnm>RW</fnm>
               </au>
               <au>
                  <snm>Griswold</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Chiu</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Garner</snm>
                  <fnm>EC</fnm>
               </au>
               <au>
                  <snm>Obradovic</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>J Mol Graph Model</source>
            <pubdate>2001</pubdate>
            <volume>19</volume>
            <fpage>26</fpage>
            <lpage>59</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S1093-3263(00)00138-8</pubid>
                  <pubid idtype="pmpid" link="fulltext">11381529</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Prediction and functional analysis of native disorder in proteins from the three kingdoms of life.</p>
            </title>
            <aug>
               <au>
                  <snm>Ward</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Sodhi</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>McGuffin</snm>
                  <fnm>LJ</fnm>
               </au>
               <au>
                  <snm>Buxton</snm>
                  <fnm>BF</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>DT</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2004</pubdate>
            <volume>337</volume>
            <fpage>635</fpage>
            <lpage>645</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.jmb.2004.02.002</pubid>
                  <pubid idtype="pmpid" link="fulltext">15019783</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Loopy proteins appear conserved in evolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Liu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Tan</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Rost</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2002</pubdate>
            <volume>322</volume>
            <fpage>53</fpage>
            <lpage>64</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0022-2836(02)00736-2</pubid>
                  <pubid idtype="pmpid" link="fulltext">12215414</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Flexible nets. The roles of intrinsic disorder in protein interaction networks.</p>
            </title>
            <aug>
               <au>
                  <snm>Dunker</snm>
                  <fnm>AK</fnm>
               </au>
               <au>
                  <snm>Cortese</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Romero</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Iakoucheva</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Uversky</snm>
                  <fnm>VN</fnm>
               </au>
            </aug>
            <source>FEBS J</source>
            <pubdate>2005</pubdate>
            <volume>272</volume>
            <fpage>5129</fpage>
            <lpage>5148</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1742-4658.2005.04948.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">16218947</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Analysis of compositionally biased regions in sequence databases.</p>
            </title>
            <aug>
               <au>
                  <snm>Wootton</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Federhen</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Methods Enzymol</source>
            <pubdate>1996</pubdate>
            <volume>266</volume>
            <fpage>554</fpage>
            <lpage>571</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8743706</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Human non-synonymous SNPs: server and survey.</p>
            </title>
            <aug>
               <au>
                  <snm>Ramensky</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Sunyaev</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>3894</fpage>
            <lpage>3900</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">137415</pubid>
                  <pubid idtype="pmpid" link="fulltext">12202775</pubid>
                  <pubid idtype="doi">10.1093/nar/gkf493</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Protein evolution is faster outside the cell.</p>
            </title>
            <aug>
               <au>
                  <snm>Julenius</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Pedersen</snm>
                  <fnm>AG</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2006</pubdate>
            <volume>23</volume>
            <fpage>2039</fpage>
            <lpage>2048</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msl081</pubid>
                  <pubid idtype="pmpid" link="fulltext">16891379</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Improved prediction of signal peptides: SignalP 3.0.</p>
            </title>
            <aug>
               <au>
                  <snm>Bendtsen</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Nielsen</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>von Heijne</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Brunak</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2004</pubdate>
            <volume>340</volume>
            <fpage>783</fpage>
            <lpage>795</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.jmb.2004.05.028</pubid>
                  <pubid idtype="pmpid" link="fulltext">15223320</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <title>
               <p>Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Krogh</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Larsson</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>von Heijne</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Sonnhammer</snm>
                  <fnm>EL</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2001</pubdate>
            <volume>305</volume>
            <fpage>567</fpage>
            <lpage>580</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2000.4315</pubid>
                  <pubid idtype="pmpid" link="fulltext">11152613</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <title>
               <p>Mimicking cellular sorting improves prediction of subcellular localization.</p>
            </title>
            <aug>
               <au>
                  <snm>Nair</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Rost</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2005</pubdate>
            <volume>348</volume>
            <fpage>85</fpage>
            <lpage>100</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.jmb.2005.02.025</pubid>
                  <pubid idtype="pmpid" link="fulltext">15808855</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B52">
            <title>
               <p>The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology.</p>
            </title>
            <aug>
               <au>
                  <snm>Camon</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Magrane</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Barrell</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Dimmer</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Maslen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Binns</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Harte</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Lopez</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <issue>Database issue</issue>
            <fpage>D262</fpage>
            <lpage>D266</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">308756</pubid>
                  <pubid idtype="pmpid" link="fulltext">14681408</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh021</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B53">
            <title>
               <p>Pfam: clans, web tools and services.</p>
            </title>
            <aug>
               <au>
                  <snm>Finn</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Mistry</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Schuster-Bockler</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Griffiths-Jones</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hollich</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Lassmann</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Moxon</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Marshall</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Khanna</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Durbin</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Sonnhammer</snm>
                  <fnm>EL</fnm>
               </au>
               <au>
                  <snm>Bateman</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <issue>Database issue</issue>
            <fpage>D247</fpage>
            <lpage>D251</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1347511</pubid>
                  <pubid idtype="pmpid" link="fulltext">16381856</pubid>
                  <pubid idtype="doi">10.1093/nar/gkj149</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B54">
            <title>
               <p>Human disease genes: patterns and predictions.</p>
            </title>
            <aug>
               <au>
                  <snm>Smith</snm>
                  <fnm>NG</fnm>
               </au>
               <au>
                  <snm>Eyre-Walker</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>2003</pubdate>
            <volume>318</volume>
            <fpage>169</fpage>
            <lpage>175</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0378-1119(03)00772-8</pubid>
                  <pubid idtype="pmpid" link="fulltext">14585509</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B55">
            <title>
               <p>Evolutionary conservation and selection of human disease gene orthologs in the rat and mouse genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Huang</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Winter</snm>
                  <fnm>EE</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Weinstock</snm>
                  <fnm>KG</fnm>
               </au>
               <au>
                  <snm>Xing</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Goodstadt</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Stenson</snm>
                  <fnm>PD</fnm>
               </au>
               <au>
                  <snm>Cooper</snm>
                  <fnm>DN</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Alba</snm>
                  <fnm>MM</fnm>
               </au>
               <au>
                  <snm>Ponting</snm>
                  <fnm>CP</fnm>
               </au>
               <au>
                  <snm>Fechtel</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>R47</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">463309</pubid>
                  <pubid idtype="pmpid" link="fulltext">15239832</pubid>
                  <pubid idtype="doi">10.1186/gb-2004-5-7-r47</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B56">
            <title>
               <p>Bioinformatical assay of human gene morbidity.</p>
            </title>
            <aug>
               <au>
                  <snm>Kondrashov</snm>
                  <fnm>FA</fnm>
               </au>
               <au>
                  <snm>Ogurtsov</snm>
                  <fnm>AY</fnm>
               </au>
               <au>
                  <snm>Kondrashov</snm>
                  <fnm>AS</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>1731</fpage>
            <lpage>1737</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">390328</pubid>
                  <pubid idtype="pmpid" link="fulltext">15020709</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh330</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B57">
            <title>
               <p>A census of human cancer genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Futreal</snm>
                  <fnm>PA</fnm>
               </au>
               <au>
                  <snm>Coin</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Marshall</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Down</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Hubbard</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Wooster</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Rahman</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Stratton</snm>
                  <fnm>MR</fnm>
               </au>
            </aug>
            <source>Nat Rev Cancer</source>
            <pubdate>2004</pubdate>
            <volume>4</volume>
            <fpage>177</fpage>
            <lpage>183</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrc1299</pubid>
                  <pubid idtype="pmpid" link="fulltext">14993899</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B58">
            <title>
               <p>COSMIC 2005.</p>
            </title>
            <aug>
               <au>
                  <snm>Forbes</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Clements</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Dawson</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Bamford</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Webb</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Dogan</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Flanagan</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Teague</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wooster</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Futreal</snm>
                  <fnm>PA</fnm>
               </au>
               <au>
                  <snm>Stratton</snm>
                  <fnm>MR</fnm>
               </au>
            </aug>
            <source>Br J Cancer</source>
            <pubdate>2006</pubdate>
            <volume>94</volume>
            <fpage>318</fpage>
            <lpage>322</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/sj.bjc.6602928</pubid>
                  <pubid idtype="pmpid" link="fulltext">16421597</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B59">
            <title>
               <p>Further understanding human disease genes by comparing with housekeeping genes and other genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Tu</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Xu</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Zhou</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Sun</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>31</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1397819</pubid>
                  <pubid idtype="pmpid" link="fulltext">16504025</pubid>
                  <pubid idtype="doi">10.1186/1471-2164-7-31</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B60">
            <title>
               <p>Evolutionary rate in the protein interaction network.</p>
            </title>
            <aug>
               <au>
                  <snm>Fraser</snm>
                  <fnm>HB</fnm>
               </au>
               <au>
                  <snm>Hirsh</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Steinmetz</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Scharfe</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Feldman</snm>
                  <fnm>MW</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2002</pubdate>
            <volume>296</volume>
            <fpage>750</fpage>
            <lpage>752</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1068696</pubid>
                  <pubid idtype="pmpid" link="fulltext">11976460</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B61">
            <title>
               <p>No simple dependence between protein evolution rate and the number of protein-protein interactions: only the most prolific interactors tend to evolve slowly.</p>
            </title>
            <aug>
               <au>
                  <snm>Jordan</snm>
                  <fnm>IK</fnm>
               </au>
               <au>
                  <snm>Wolf</snm>
                  <fnm>YI</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>BMC Evol Biol</source>
            <pubdate>2003</pubdate>
            <volume>3</volume>
            <fpage>1</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">140311</pubid>
                  <pubid idtype="pmpid" link="fulltext">12515583</pubid>
                  <pubid idtype="doi">10.1186/1471-2148-3-1</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B62">
            <title>
               <p>Apparent dependence of protein evolutionary rate on number of interactions is linked to biases in protein-protein interactions data sets.</p>
            </title>
            <aug>
               <au>
                  <snm>Bloom</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Adami</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>BMC Evol Biol</source>
            <pubdate>2003</pubdate>
            <volume>3</volume>
            <fpage>21</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">270031</pubid>
                  <pubid idtype="pmpid" link="fulltext">14525624</pubid>
                  <pubid idtype="doi">10.1186/1471-2148-3-21</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B63">
            <title>
               <p>Evolutionary and physiological importance of hub proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Batada</snm>
                  <fnm>NN</fnm>
               </au>
               <au>
                  <snm>Hurst</snm>
                  <fnm>LD</fnm>
               </au>
               <au>
                  <snm>Tyers</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>PLoS Comput Biol</source>
            <pubdate>2006</pubdate>
            <volume>2</volume>
            <fpage>e88</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1500817</pubid>
                  <pubid idtype="pmpid" link="fulltext">16839197</pubid>
                  <pubid idtype="doi">10.1371/journal.pcbi.0020088</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B64">
            <title>
               <p>IntAct - open source resource for molecular interaction data.</p>
            </title>
            <aug>
               <au>
                  <snm>Kerrien</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Alam-Faruque</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Aranda</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Bancarz</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Bridge</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Derow</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Dimmer</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Feuermann</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Friedrichsen</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Huntley</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Kohler</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Khadake</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Leroy</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Liban</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lieftink</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Montecchi-Palazzi</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Orchard</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Risse</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Robbe</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Roechert</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Thorneycroft</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hermjakob</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2007</pubdate>
            <volume>35</volume>
            <issue>Database issue</issue>
            <fpage>D561</fpage>
            <lpage>D565</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1751531</pubid>
                  <pubid idtype="pmpid" link="fulltext">17145710</pubid>
                  <pubid idtype="doi">10.1093/nar/gkl958</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B65">
            <title>
               <p>Splicing and the evolution of proteins in mammals.</p>
            </title>
            <aug>
               <au>
                  <snm>Parmley</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Urrutia</snm>
                  <fnm>AO</fnm>
               </au>
               <au>
                  <snm>Potrzebowski</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Kaessmann</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Hurst</snm>
                  <fnm>LD</fnm>
               </au>
            </aug>
            <source>PLoS Biol</source>
            <pubdate>2007</pubdate>
            <volume>5</volume>
            <fpage>e14</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1790955</pubid>
                  <pubid idtype="pmpid" link="fulltext">17298171</pubid>
                  <pubid idtype="doi">10.1371/journal.pbio.0050014</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B66">
            <title>
               <p>Evolutionary systems biology: links between gene evolution and function.</p>
            </title>
            <aug>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
               <au>
                  <snm>Wolf</snm>
                  <fnm>YI</fnm>
               </au>
            </aug>
            <source>Curr Opin Biotechnol</source>
            <pubdate>2006</pubdate>
            <volume>17</volume>
            <fpage>481</fpage>
            <lpage>487</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.copbio.2006.08.003</pubid>
                  <pubid idtype="pmpid" link="fulltext">16962765</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B67">
            <title>
               <p>Gene Expression Atlas</p>
            </title>
            <url>http://wombat.gnf.org/index.html</url>
         </bibl>
         <bibl id="B68">
            <title>
               <p>Genome-wide midrange transcription profiles reveal expression level relationships in human tissue specification.</p>
            </title>
            <aug>
               <au>
                  <snm>Yanai</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Benjamin</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Shmoish</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Chalifa-Caspi</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Shklar</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ophir</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Bar-Even</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Horn-Saban</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Safran</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Domany</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Lancet</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Shmueli</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>650</fpage>
            <lpage>659</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti042</pubid>
                  <pubid idtype="pmpid" link="fulltext">15388519</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B69">
            <title>
               <p>GOA slim</p>
            </title>
            <url>ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/goslim/goaslim.map</url>
         </bibl>
         <bibl id="B70">
            <title>
               <p>IntAct</p>
            </title>
            <url>ftp://ftp.ebi.ac.uk/pub/databases/intact/</url>
         </bibl>
         <bibl id="B71">
            <title>
               <p>Cancer Gene Census</p>
            </title>
            <url>http://www.sanger.ac.uk/genetics/CGP/Census/</url>
         </bibl>
         <bibl id="B72">
            <title>
               <p>Catalogue Of Somatic Mutations In Cancer</p>
            </title>
            <url>http://www.sanger.ac.uk/genetics/CGP/cosmic/</url>
         </bibl>
         <bibl id="B73">
            <title>
               <p>OMIM</p>
            </title>
            <url>http://www.ncbi.nlm.nih.gov/sites/entrez?db=OMIM</url>
         </bibl>
         <bibl id="B74">
            <title>
               <p>Molecular evolution of mRNA: a method for estimating evolutionary rates of synonymous and amino acid substitutions from homologous nucleotide sequences and its application.</p>
            </title>
            <aug>
               <au>
                  <snm>Miyata</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Yasunaga</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1980</pubdate>
            <volume>16</volume>
            <fpage>23</fpage>
            <lpage>36</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/BF01732067</pubid>
                  <pubid idtype="pmpid">6449605</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>

