<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2003-4-9-r56</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>Different evolutionary patterns between young duplicate genes in the human genome</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Zhang</snm>
               <fnm>Peng</fnm>
               <insr iid="I1"/>
            </au>
            <au id="A2">
               <snm>Gu</snm>
               <fnm>Zhenglong</fnm>
               <insr iid="I1"/>
            </au>
            <au id="A3" ca="yes">
               <snm>Li</snm>
               <fnm>Wen-Hsiung</fnm>
               <insr iid="I1"/>
               <email>whli@uchicago.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Ecology and Evolution, University of Chicago, East 57th Street, Chicago, IL 60637, USA</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2003</pubdate>
         <volume>4</volume>
         <issue>9</issue>
         <fpage>R56</fpage>
         <url>http://genomebiology.com/2003/4/9/R56</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="doi">10.1186/gb-2003-4-9-r56</pubid>
               <pubid idtype="pmpid">12952535</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>15</day>
               <month>5</month>
               <year>2003</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>24</day>
               <month>6</month>
               <year>2003</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>24</day>
               <month>7</month>
               <year>2003</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>1</day>
               <month>9</month>
               <year>2003</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2003</year>
         <collab>Zhang et al.; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.</collab>
      </cpyrt>
      <shorttitle>
         <p>Different evolutionary patterns between young duplicate genes in the human genome</p>
      </shorttitle>
      <shortabs>
         <p>Duplicate genes tend to evolve in different patterns following the duplication event. One copy evolves faster than the other and accumulates amino-acid substitutions evenly across the sequence, whereas the other copy evolves more slowly and accumulates amino-acid substitutions unevenly across the sequence.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Following gene duplication, two duplicate genes may experience relaxed functional constraints or acquire different mutations, and may also diverge in function. Whether the two copies will evolve in different patterns remains unclear, however, because previous studies have reached conflicting conclusions. To resolve this issue and provide a general picture, we studied 250 independent pairs of young duplicate genes from the whole human genome.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We showed that nearly 60% of the young duplicate gene pairs have evolved at the amino-acid level at significantly different rates from each other. More than 25% of these gene pairs also showed significantly different ratios of nonsynonymous to synonymous rates (K<sub>a</sub>/K<sub>s </sub>ratios). Moreover, duplicate pairs with different rates of amino-acid substitution also tend to differ in the K<sub>a</sub>/K<sub>s </sub>ratio, with the fast-evolving copy tending to have a slightly higher K<sub>s </sub>than the slow-evolving one. Lastly, a substantial portion of fast-evolving copies have accumulated amino-acid substitutions evenly across the protein sequences, whereas most of the slow-evolving copies exhibit uneven substitution patterns.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusions</p>
               </st>
               <p>Our results suggest that duplicate genes tend to evolve in different patterns following the duplication event. One copy evolves faster than the other and accumulates amino-acid substitutions evenly across the sequence, whereas the other copy evolves more slowly and accumulates amino-acid substitutions unevenly across the sequence. Such different evolutionary patterns may be largely due to different functional constraints on the two copies.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010008">Evolution</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010009">Genetics</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Since Ohno's work <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>, gene duplication has been widely believed to be the major source of genetic novelty. However, how the two duplicate genes evolve after the duplication event, and which factors chiefly determine their fate, remain poorly understood and are under intense investigation.</p>
         <p>Lynch and Conery <abbrgrp><abbr bid="B2">2</abbr></abbrgrp> conducted a study of several eukaryotic genomes and concluded that duplicate genes often experience relaxed functional constraints and accumulate mutations at an accelerated rate. However, since their study used within-genome data without an outgroup, it could not reveal differences in evolutionary patterns between two duplicates. Hughes and Hughes <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> used human genes as outgroups to examine 17 pairs of duplicated frog genes, most of which have been duplicated recently. They found that the two duplicate copies of a gene had evolved at approximately the same rate. However, human genes are only distantly related to frog genes and may not be suitable outgroups for young duplicate frog genes. In contrast, Robinson-Rechavi and Laudet <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> and Van de Peer <it>et al</it>. <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, who used human or mammalian genes as outgroups to zebrafish genes, found evidence of unequal evolutionary rates between duplicate genes in zebrafish, although the percentage of pairs with such a pattern differed between the two studies. Furthermore, in a study of young duplicate genes in humans and rodents, Kondrashov <it>et al</it>. <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> found only two out of 49 duplicate genes showed different rates of evolution. Thus, the issue remains to be resolved.</p>
         <p>We are interested in the questions of whether duplicated genes in general undergo different evolutionary patterns and what the possible causes for this could be. To address these questions, young human duplicate genes, which are defined as duplicate genes with K<sub>s </sub>&lt; 0.3, are excellent materials for several reasons. Firstly, in young duplicates the K<sub>s </sub>(the number of synonymous substitutions per synonymous site) and K<sub>a </sub>(the number of nonsynonymous substitutions per nonsynonymous site) values are small and can be estimated more accurately than in older duplicates. Secondly, human genes usually have no strong codon usage bias, so the K<sub>s </sub>values are not strongly distorted by this effect. Thirdly, the mouse genome provides a suitable outgroup. With the use of outgroup sequences, the maximum likelihood method <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> can be applied, which allows the comparison of various rate models <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. Different models for amino-acid sequence evolution, with outgroups incorporated, can be compared to judge whether the amino-acid substitution rates are the same in two duplicate copies (Figure <figr fid="F1">1</figr>). Similarly, the models for coding sequences can also be compared to judge if the K<sub>a</sub>/K<sub>s </sub>ratios are the same in the two copies after a gene duplication. Traditionally, K<sub>a</sub>/K<sub>s </sub>is taken as an index for the strength of functional constraints. Different K<sub>a</sub>/K<sub>s </sub>ratios usually suggest different functional constraints on two duplicate copies.</p>
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p>Two models of protein sequence evolution</p>
            </caption>
            <text>
               <p>Two models of protein sequence evolution. By comparing the likelihood values of these two models, one can judge whether r1 = r2 (r1 and r2 are the amino-acid substitution rates on branches H1 and H2, respectively). <b>(a) </b>The model assumes r1 = r2. <b>(b) </b>The model allows r1 and r2 to be different. H1 and H2: two human duplicate genes. Outgroup: the mouse ortholog.</p>
            </text>
            <graphic file="gb-2003-4-9-r56-1"/>
         </fig>
         <p>Another way to examine whether two duplicate copies have experienced different functional constraints is to see whether the distributions of substitutions along their sequences are the same. If a duplicate copy is free of functional constraints, then amino-acid substitutions should occur evenly across the sequence. On the other hand, if a duplicate copy is still under considerable functional constraints, then functionally important regions should be subject to stronger constraints than functionally less important regions, and will accumulate fewer substitutions, thus yielding an uneven substitution pattern. Tang and Lewontin <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> described a statistical method that gives a quantitative measure for distinguishing between even and uneven substitution patterns. The rationale of this method is that if substitutions occur evenly across a sequence, a cluster of short spaces (lengths between two consecutive substitutions) should not be extremely long. To test the significance, the longest stretch where every space is short is compared to a simulated distribution generated under the hypothesis of an even substitution pattern. We combined Tang and Lewontin's <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> method and the maximum likelihood method of locating substitutions <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> to test the evenness of the substitution patterns of two duplicate genes.</p>
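         <p>The idea behind this test can be illustrated with a minimal sketch. It is a simplified stand-in, not the published procedure of Tang and Lewontin: it assumes that a space counts as "short" when it does not exceed the mean spacing expected under an even pattern, that the statistic is the number of consecutive short spaces in the longest run, and that the even pattern is simulated by placing the same number of substitutions uniformly at random. The function names and the threshold rule are hypothetical.</p>

```python
import random


def longest_short_run(positions, threshold):
    """Count the spaces in the longest run of consecutive short spaces."""
    ordered = sorted(positions)
    best = current = 0
    for left, right in zip(ordered, ordered[1:]):
        if threshold >= right - left:  # this space is "short"
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


def evenness_p_value(positions, seq_len, n_sim=2000, seed=0):
    """Compare the observed longest run with runs simulated under an even pattern."""
    rng = random.Random(seed)
    k = len(positions)
    # Assumption: "short" means no longer than the mean spacing under evenness.
    threshold = seq_len / (k + 1)
    observed = longest_short_run(positions, threshold)
    hits = 0
    for _ in range(n_sim):
        # Even pattern: k distinct sites drawn uniformly along the sequence.
        simulated = rng.sample(range(1, seq_len + 1), k)
        if longest_short_run(simulated, threshold) >= observed:
            hits += 1
    return observed, hits / n_sim


# Hypothetical example: 20 substitutions packed into the first 20 sites
# of a 400-residue protein, i.e. a strongly uneven pattern.
observed, p_value = evenness_p_value(list(range(1, 21)), seq_len=400)
print(observed, p_value)
```

         <p>A small simulated <it>p</it>-value indicates that the observed cluster of short spaces is longer than expected under an even pattern, so the sequence would be classified as having an uneven substitution pattern.</p>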
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <sec>
            <st>
               <p>Amino-acid substitution rates</p>
            </st>
            <p>We first examined whether the amino-acid substitution rates in two duplicate copies are the same. Among the 250 pairs of young human duplicates studied, 145 pairs showed significant evidence (at the 5% level) that one copy had evolved faster than the other at the amino-acid level. Among them, 130 pairs had significantly different rates at the 1% significance level.</p>
            <p>Hughes and Hughes <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> found similar evolutionary rates in the 17 frog duplicates they studied, probably because the human outgroup they used was too distant to make the statistical test powerful. Using fairly closely related outgroups, we found that the majority of young human duplicates evolve at different rates. Our results are consistent with those of Van de Peer <it>et al</it>. <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, but the proportion of gene pairs with significantly unequal rates is much higher than that found by Robinson-Rechavi and Laudet <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> (four out of 19) and by Kondrashov <it>et al</it>. <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> (two out of 49).</p>
         </sec>
         <sec>
            <st>
               <p>K<sub>a</sub>/K<sub>s </sub>ratio</p>
            </st>
            <p>To discover if the functional constraints were the same on two duplicate genes, we examined the K<sub>a</sub>/K<sub>s </sub>ratio on each branch leading to the two copies. Among the 250 pairs, 65 pairs showed significantly different K<sub>a</sub>/K<sub>s </sub>ratios at the 5% level and 31 pairs showed a significant difference at the 1% level. As mentioned earlier, the K<sub>a</sub>/K<sub>s </sub>ratio is an important index of functional constraints. The smaller the K<sub>a</sub>/K<sub>s </sub>ratio is, the stronger the functional constraints are. Our result suggests that after gene duplication, a substantial proportion (65/250 = 26%) of the duplicate pairs have experienced different functional constraints.</p>
            <p>Among the 65 pairs that have different K<sub>a</sub>/K<sub>s </sub>ratios, 54 pairs also differ between the two copies in their amino-acid substitution rates. Among the 185 pairs that showed no significant difference in K<sub>a</sub>/K<sub>s </sub>ratios, less than 50% showed significantly different amino-acid substitution rates. A 2 &#215; 2 chi-square test (Table <tblr tid="T1">1</tblr>, &#967;<sup>2 </sup>= 22.675, df = 1, <it>p </it>&lt; 0.001) reveals a significant correlation between different K<sub>a</sub>/K<sub>s </sub>ratios and different amino-acid substitution rates. Therefore, duplicate pairs with different K<sub>a</sub>/K<sub>s </sub>ratios tend to evolve at different rates, suggesting that different functional constraints might be largely responsible for the unequal evolutionary rates, although, as mentioned below, some duplicate genes have apparently undergone positive selection.</p>
            <tbl id="T1" hint_layout="single">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Substitution rates versus K<sub>a</sub>/K<sub>s </sub>ratios in duplicate genes</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Different amino-acid substitution rates*</p>
                     </c>
                     <c ca="center">
                        <p>Equal amino-acid substitution rate<sup>&#8224;</sup></p>
                     </c>
                     <c ca="center">
                        <p>Total</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Different K<sub>a</sub>/K<sub>s </sub>ratios<sup>&#8225;</sup></p>
                     </c>
                     <c ca="center">
                        <p>54</p>
                     </c>
                     <c ca="center">
                        <p>11</p>
                     </c>
                     <c ca="center">
                        <p>65</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Equal K<sub>a</sub>/K<sub>s </sub>ratio<sup>&#167;</sup></p>
                     </c>
                     <c ca="center">
                        <p>91</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                     <c ca="center">
                        <p>185</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Total</p>
                     </c>
                     <c ca="center">
                        <p>145</p>
                     </c>
                     <c ca="center">
                        <p>105</p>
                     </c>
                     <c ca="center">
                        <p>250</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>A 2 &#215; 2 chi-square test. &#967;<sup>2 </sup>= 22.675, df = 1, <it>p </it>&lt; 0.001. The null hypothesis is that the number of pairs with different K<sub>a</sub>/K<sub>s </sub>ratios is independent of the number of pairs with different amino-acid substitution rates. The values are the observed number of pairs for each category; for example, there are 54 pairs with both different K<sub>a</sub>/K<sub>s </sub>ratios and different amino-acid substitution rates. The amino-acid substitution rates (or the K<sub>a</sub>/K<sub>s </sub>ratios) in the two duplicate genes are considered different only if the difference is statistically significant. *Gene pairs with different amino-acid substitution rates between the two duplicates. <sup>&#8224;</sup>Gene pairs with equal amino-acid substitution rates between the two duplicates. <sup>&#8225;</sup>Gene pairs with different K<sub>a</sub>/K<sub>s </sub>ratios between the two duplicates. <sup>&#167;</sup>Gene pairs with equal K<sub>a</sub>/K<sub>s </sub>ratios between the two duplicates.</p>
               </tblfn>
            </tbl>
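            <p>The chi-square value for Table 1 can be reproduced from the four observed counts. The sketch below assumes an uncorrected Pearson 2 &#215; 2 test (no continuity correction); the function name is hypothetical.</p>

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Counts from Table 1: rows are different/equal Ka/Ks ratios,
# columns are different/equal amino-acid substitution rates.
chi2 = chi_square_2x2(54, 11, 91, 94)
print(round(chi2, 3))  # prints 22.675
```

            <p>With one degree of freedom, a statistic of this size lies well beyond the 0.001 critical value of 10.83.</p>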
            <p>One reason we detected fewer pairs with different K<sub>a</sub>/K<sub>s </sub>ratios than pairs with different amino-acid substitution rates may be that fast-evolving sequences tend to have a higher K<sub>s </sub>than slow-evolving ones. To see whether this was true, we calculated the K<sub>s </sub>difference between the two copies of each pair, that is, the K<sub>s </sub>of the fast-evolving copy minus the K<sub>s </sub>of the slow-evolving copy. Figure <figr fid="F2">2</figr> shows that most of the pairs have a positive K<sub>s </sub>difference, which means that in most of the pairs the fast-evolving copy has a higher K<sub>s </sub>than the slow-evolving copy.</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Distribution of K<sub>s </sub>differences between duplicate genes for gene pairs with different amino-acid substitution rates</p>
               </caption>
               <text>
                  <p>Distribution of K<sub>s </sub>differences between duplicate genes for gene pairs with different amino-acid substitution rates. The <it>x </it>axis is the K<sub>s </sub>difference between duplicate genes (the K<sub>s </sub>of the fast-evolving copy minus the K<sub>s </sub>of the slow-evolving one). The <it>y </it>axis is the number of gene pairs within a K<sub>s </sub>bin. This figure shows that most pairs have a positive K<sub>s </sub>difference, which suggests that the fast-evolving copy usually has a higher K<sub>s </sub>than the slow-evolving copy.</p>
               </text>
               <graphic file="gb-2003-4-9-r56-2"/>
            </fig>
            <p>Two duplicate copies may differ significantly in the number of amino-acid substitutions, which reflects a significant difference between two K<sub>a </sub>values at the nucleotide level. However, the two K<sub>s </sub>values are also different (usually the copy with a higher K<sub>a </sub>also has a higher K<sub>s</sub>) which reduces the chance for the K<sub>a</sub>/K<sub>s </sub>ratios of the two copies to be significantly different. This weak correlation between K<sub>a </sub>and K<sub>s </sub>is consistent with several previous studies <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr></abbrgrp> and may be largely explained by the fact that silent sites in some genes are also under purifying selection (that is, codon usage bias) <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. So, although we found nearly 60% of pairs with different amino-acid substitution rates, we found far fewer pairs having different K<sub>a</sub>/K<sub>s </sub>ratios.</p>
            <p>We also looked for evidence of positive selection. Figure <figr fid="F3">3a</figr> shows that most of the genes have a K<sub>a</sub>/K<sub>s </sub>ratio of less than one, although there are still 113 genes with a K<sub>a</sub>/K<sub>s </sub>ratio greater than one. K<sub>a</sub>/K<sub>s </sub>&gt; 1 suggests positive selection, but evidence for positive selection requires the ratio to be significantly greater than one. Among the genes with K<sub>a</sub>/K<sub>s </sub>&gt; 1, many ratios are only slightly greater than one, and only seven genes have a K<sub>a</sub>/K<sub>s </sub>ratio significantly greater than one. However, this does not imply that only seven pairs of duplicate genes were subject to positive selection because, in many cases, the number of substitutions between two young duplicates may be too small for the test to be statistically significant, even if some of the substitutions have occurred by positive selection.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>The K<sub>a</sub>/K<sub>s </sub>ratio distribution of young human duplicates</p>
               </caption>
               <text>
                  <p>The K<sub>a</sub>/K<sub>s </sub>ratio distribution of young human duplicates. The <it>x </it>axis is the K<sub>a</sub>/K<sub>s </sub>ratio on the branch leading to one human duplicated gene. The <it>y </it>axis is the number of genes within a K<sub>a</sub>/K<sub>s </sub>bin. <b>(a) </b>All genes from the 250 pairs, a total of 500 sequences. <b>(b) </b>The fast-evolving duplicate copies of 250 pairs, a total of 250 sequences. <b>(c) </b>The slow-evolving duplicate copies of 250 pairs, a total of 250 sequences.</p>
               </text>
               <graphic file="gb-2003-4-9-r56-3"/>
            </fig>
            <p>Most of the fast-evolving duplicate copies have higher K<sub>a</sub>/K<sub>s </sub>ratios (Figure <figr fid="F3">3b</figr>) than slow-evolving duplicate copies (Figure <figr fid="F3">3c</figr>). This supports the view that after gene duplication, one duplicate copy may have undergone purifying selection, while the functional constraints on the other copy may have been relaxed to some extent.</p>
         </sec>
         <sec>
            <st>
               <p>Different substitution patterns</p>
            </st>
            <p>Among the 145 fast-evolving young human duplicates, 109 have an even amino-acid substitution pattern across the sequence between the human and mouse orthologs. In other words, these 109 sequences show no large highly-conserved regions. On the other hand, 65 of the 145 slow-evolving copies show evidence of an uneven substitution pattern between human and mouse orthologs, which suggests that they have some slow-evolving regions and some fast-evolving regions at the protein level.</p>
            <p>In order to infer the position of each amino-acid substitution in the sequence, we inferred the ancestral sequences by using PAML (Phylogenetic Analysis by Maximum Likelihood) <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> standard settings, which assume constant rates across sites. It is possible, therefore, that our estimated substitutions may be more evenly distributed than they actually are. However, because we are comparing the percentage of sequences with even patterns in fast-evolving copies to those in slow-evolving copies, this potential bias should be on both sides of the comparison and should not change our conclusion.</p>
            <p>Figure <figr fid="F4">4</figr> with a chi-square test (&#967;<sup>2 </sup>= 12.78, df = 1, <it>p </it>&lt; 0.01) shows that fast-evolving duplicates have a significantly higher proportion of sequences with an even substitution pattern. This finding suggests that most of the fast-evolving copies have more relaxed functional constraints than slow-evolving copies and tend to accumulate substitutions evenly across the sequence. The suggestion of relaxed functional constraints for young duplicates is consistent with the observation of Lynch and Conery <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. Of course, we cannot exclude the possibility that some of the amino-acid substitutions in fast-evolving copies might have been due to positive Darwinian selection.</p>
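            <p>As a check on the arithmetic, the chi-square value quoted here can be reproduced from the counts given in this section. This assumes, as a reading of the text rather than something it states, that the test is an uncorrected Pearson 2 &#215; 2 test on 145 fast-evolving and 145 slow-evolving copies. The labels are introduced only for illustration: a = 109 (fast, even), b = 36 (fast, uneven), c = 80 (slow, even), d = 65 (slow, uneven), and N = 290.</p>

```latex
\chi^2 = \frac{N\,(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}
       = \frac{290\,(109 \times 65 - 36 \times 80)^2}{145 \times 145 \times 189 \times 101}
       \approx 12.78
```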
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Comparison between fast-evolving copies and slow-evolving copies</p>
               </caption>
               <text>
                  <p>Comparison between fast-evolving copies and slow-evolving copies. The figure shows that fast-evolving copies have more cases with substitutions distributed evenly along the sequence than slow-evolving copies. Fast-evolving: the copy that has evolved faster than the other in each duplicate pair. Slow-evolving: the copy that has evolved slower than the other in each duplicate pair. Even pattern: a sequence that has evenly-distributed substitutions along the sequence. Uneven pattern: a sequence that has unevenly-distributed substitutions.</p>
               </text>
               <graphic file="gb-2003-4-9-r56-4"/>
            </fig>
            <p>Our finding is very different from that of Kondrashov <it>et al</it>. <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, who found only two pairs with unequal evolutionary rates out of 49 pairs studied in mammals. Since they also focused on young duplications (0.05 &lt; K<sub>s </sub>&lt; 0.5) and the approach they used to identify duplicate genes is similar to ours, this may be due to the different datasets used. Since the neutral pattern found in the fast-evolving copies in our study is to some extent similar to the evolution of pseudogenes, we examined the possibility of the inclusion of many pseudogenes in our sample.</p>
            <p>The gene predictions in the Ensembl database <abbrgrp><abbr bid="B15">15</abbr></abbrgrp> we used always produce a translation for each gene and a stringent criterion (near full-length similarity) was used in our grouping method; consequently, our dataset does not include pseudogenes with premature stop codons. Since we limited our set of duplicated genes to K<sub>s </sub>&gt; 0.05, a pseudogene in our sample would be likely to have lost its function only very recently, otherwise it would have gained one or more premature stop codons since the time of nonfunctionalization. In the Ensembl database we used only those genes ('known' genes) with experimental support and those genes ('novel' genes) with high similarity to known genes in human and other organisms. Genes purely from Genscan predictions were not used in this analysis. These approaches would have effectively reduced the proportion of pseudogenes in our dataset. If the functional constraints on a gene are largely relaxed, the evolutionary pattern of this gene may be similar to that of pseudogenes. So it is possible that some of the fast-evolving genes may be on their way to becoming pseudogenes, although it is still possible that they may evolve new functions. Kondrashov <it>et al. </it><abbrgrp><abbr bid="B6">6</abbr></abbrgrp> used a cDNA-based dataset and found only a few duplicated pairs with different evolutionary rates, which may have represented those genes that survived well through selection and were still functioning. In other words, the cDNA-based genes which they used are normally expressed, meaning these genes may still be under strong selection pressure. Our dataset might be more appropriate for providing a general scenario of how two duplicate genes evolve after gene duplication.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>We used conservative criteria to select young human duplicate pairs and applied a stringent statistical method to test whether two duplicate copies exhibit different evolutionary patterns. Our results suggest that, in most cases, during the early stage of evolution following gene duplication, the two duplicates evolve at different rates, which could affect the fate of the two copies. Different functional constraints on the two copies may have been largely responsible for the different rates. One copy may have relaxed functional constraints, while the other could still be under strong constraints. The stringent statistical tests used in this study might have underestimated the proportion of pairs with this pattern, but this could only strengthen our argument.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <sec>
            <st>
               <p>Processing data and selecting independent young human duplicate genes</p>
            </st>
            <p>Human genes were downloaded from the Ensembl human database <abbrgrp><abbr bid="B15">15</abbr></abbrgrp> version 11.31 (28 February 2003). The original dataset is available from the authors on request. Only known and novel genes were used in this analysis; those sequences containing repetitive elements detected by FASTA (E = 10<sup>-5</sup>) searching against Repbase <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp> were removed from the dataset. If two genes overlapped at a chromosomal position, the gene with the longer protein was retained. The protein sequences selected were grouped into families by the method used by Gu <it>et al</it>. <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. Within each gene family, the selection of independent duplicate gene pairs proceeded with increasing K<sub>s</sub>. That is, within each gene family, we selected the gene pair with the smallest K<sub>s </sub>and excluded it from the family and then selected the gene pair with the smallest K<sub>s </sub>from among the remaining genes. We repeated this until no gene pairs could be selected. This method ensured that a gene in one pair would not appear in another pair. Among the duplicate genes we selected, those pairs with 0.05 &lt; K<sub>s </sub>&lt; 0.3 were used in this study. We used K<sub>s </sub>&lt; 0.3 as a cutoff to define young human duplicates. With the K<sub>s </sub>between human duplicate genes less than 0.3, the duplication would have occurred less than 50 million years ago. Pairs with K<sub>s </sub>&lt; 0.05 have too few substitutions to make a statistical test meaningful. For each pair we selected, both copies were used to search the mouse database. Mouse genes were also obtained from the Ensembl database <abbrgrp><abbr bid="B15">15</abbr></abbrgrp> version 11.3 (28 February 2003) and were cleaned using the same procedure that was used to clean the human database. The pairs in which the two copies had the same best hits and a human-mouse K<sub>s </sub>&lt; 1 were kept for this study. We chose K<sub>s </sub>&lt; 1 as a cutoff point because a distant outgroup makes it harder to detect rate differences. A set of 250 young human duplicate pairs and their mouse orthologs was thus retained.</p>
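            <p>The pair-selection step within one gene family can be sketched as a greedy procedure. This is a minimal illustration, assuming the pairwise K<sub>s </sub>values have already been computed; the function names, gene identifiers and K<sub>s </sub>values are hypothetical, and the mouse-outgroup filter is not included.</p>

```python
def select_independent_pairs(pair_ks):
    """Greedily pick gene pairs in increasing Ks so that no gene is reused.

    pair_ks maps (gene_a, gene_b) tuples to the Ks between the two genes.
    """
    used = set()
    selected = []
    for (gene_a, gene_b), ks in sorted(pair_ks.items(), key=lambda item: item[1]):
        if gene_a in used or gene_b in used:
            continue  # one of the genes already belongs to a selected pair
        selected.append((gene_a, gene_b, ks))
        used.update((gene_a, gene_b))
    return selected


def young_pairs(selected, low=0.05, high=0.3):
    """Keep only pairs whose Ks lies strictly between the two cutoffs."""
    return [pair for pair in selected if high > pair[2] > low]


# Hypothetical family of four genes with made-up Ks values.
family = {
    ("g1", "g2"): 0.10,
    ("g1", "g3"): 0.02,
    ("g2", "g3"): 0.12,
    ("g2", "g4"): 0.25,
    ("g3", "g4"): 0.40,
    ("g1", "g4"): 0.50,
}
independent = select_independent_pairs(family)
print(independent)               # [('g1', 'g3', 0.02), ('g2', 'g4', 0.25)]
print(young_pairs(independent))  # [('g2', 'g4', 0.25)]
```

            <p>Sorting once and skipping any pair that shares a gene with an earlier pair gives the same result as repeatedly removing the pair with the smallest K<sub>s </sub>from the family.</p>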
         </sec>
         <sec>
            <st>
               <p>Statistical methods to compare evolutionary patterns between two copies</p>
            </st>
            <p>To test whether the evolutionary rates and the K<sub>a</sub>/K<sub>s</sub> ratios are the same between the two duplicate copies, the likelihood-ratio test <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> was applied to each selected pair. To test the hypothesis of equal evolutionary rates between the two duplicate copies at the amino-acid level, a two-rate model and a free-rate model were compared. The two-rate model assumes the same evolutionary rate on the two branches leading to the two duplicates but allows a different rate on the outgroup branch, whereas the free-rate model imposes no rate equality among branches. The codeml program (with seqtype = 2 for amino-acid sequences) in the PAML package was run for each of the two models with all parameters set to their defaults except the 'model' parameter for amino-acid substitution, which was set to 'Poisson'. We also set this parameter to the 'Jones-Taylor-Thornton model', and the conclusions were essentially the same. The maximum log-likelihood values of the two models were obtained, and twice their difference was compared to a chi-square distribution. A significant result suggests that the two branches have evolved at unequal rates. To test whether the K<sub>a</sub>/K<sub>s</sub> ratios differ between the coding sequences of the two duplicates, a two-ratio model, which assumes the same K<sub>a</sub>/K<sub>s</sub> ratio on the branches leading to the two duplicates but an independent K<sub>a</sub>/K<sub>s</sub> ratio on the branch leading to the outgroup, was compared to the free-ratio model, which assumes an independent K<sub>a</sub>/K<sub>s</sub> ratio for each branch.</p>
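            <p>The comparison of two nested models by a likelihood-ratio test can be illustrated with a short Python sketch. It assumes one degree of freedom, on the reasoning that the unconstrained model has one more free branch parameter than the constrained model for a three-sequence tree; the text above does not state the degrees of freedom, so this is an assumption. The function name and the log-likelihood values are hypothetical and are not output from codeml.</p>
            <p><![CDATA[
```python
import math


def lrt_one_df(lnl_constrained: float, lnl_free: float):
    """Likelihood-ratio test between nested models differing by one parameter.

    Returns (statistic, p_value). The p-value uses the chi-square survival
    function for 1 degree of freedom, which equals erfc(sqrt(x / 2)).
    """
    statistic = 2.0 * (lnl_free - lnl_constrained)
    # Numerical noise can make the statistic slightly negative; clamp at zero.
    statistic = max(statistic, 0.0)
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Made-up log-likelihoods, for illustration only.
stat, p = lrt_one_df(lnl_constrained=-1234.56, lnl_free=-1231.20)
print(f"2*dlnL = {stat:.2f}, p = {p:.4f}")
```
]]></p>
            <p>A p-value below the chosen significance level would lead to rejection of the constrained model, that is, of equal rates (or equal K<sub>a</sub>/K<sub>s</sub> ratios) on the two duplicate branches.</p>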
         </sec>
         <sec>
            <st>
               <p>Substitution patterns and differential selection</p>
            </st>
            <p>Using the PAML package <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, the ancestral sequence of each human duplicate gene pair was reconstructed and the position of each substitution was located. Tang and Lewontin's <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> method was then applied to calculate the T statistic of each human sequence. For each human sequence, 100,000 pseudo-sequences were generated under the null hypothesis that substitutions are distributed evenly along the sequence, and the distribution of the T statistic across these 100,000 pseudo-sequences was obtained. The T statistic of the real sequence was then compared to this distribution. If the T statistic was extremely large or small (rejection level 0.05), we considered the substitutions to be unevenly distributed. The program was written in Perl and is available upon request.</p>
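            <p>The simulation step can be sketched in Python as a generic Monte Carlo test. This is not the authors' Perl program, and the statistic used here is a simple stand-in (the variance of the gaps between consecutive substitutions), not Tang and Lewontin's T statistic, whose definition should be taken from the original reference. All function names, the default of 10,000 simulations, the way the two tails are combined and the example positions are assumptions made for illustration.</p>
            <p><![CDATA[
```python
import random
from typing import Sequence


def gap_variance(positions: Sequence[int], length: int) -> float:
    """Stand-in clustering statistic (NOT Tang and Lewontin's T): variance of
    the gaps between consecutive substitution positions, including both ends."""
    points = [0] + sorted(positions) + [length + 1]
    gaps = [b - a for a, b in zip(points, points[1:])]
    mean = sum(gaps) / len(gaps)
    return sum((g - mean) ** 2 for g in gaps) / len(gaps)


def monte_carlo_p(positions, length, statistic=gap_variance, n_sim=10_000, seed=0):
    """Two-tailed empirical p-value under evenly (uniformly) placed substitutions."""
    rng = random.Random(seed)
    observed = statistic(positions, length)
    sites = range(1, length + 1)
    # Each pseudo-sequence places the same number of substitutions at
    # distinct sites chosen uniformly at random.
    null = [statistic(rng.sample(sites, len(positions)), length) for _ in range(n_sim)]
    upper = sum(s >= observed for s in null) / n_sim
    lower = sum(s <= observed for s in null) / n_sim
    return observed, min(1.0, 2.0 * min(upper, lower))


# Hypothetical sequence of 300 sites with six tightly clustered substitutions.
observed, p_value = monte_carlo_p([10, 11, 12, 13, 14, 15], 300, n_sim=2000)
print(observed, p_value)
```
]]></p>
            <p>An observed value in either extreme tail of the simulated distribution counts against the null hypothesis of evenly distributed substitutions, which mirrors the two-sided criterion described above.</p>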
         </sec>
      </sec>
      <sec>
         <st>
            <p>Additional data files</p>
         </st>
         <p>The amino-acid alignments (Additional data file <supplr sid="s1">1</supplr>) and coding sequence alignments (Additional data file <supplr sid="s2">2</supplr>) are available in PAML format.</p>
         <suppl id="s1">
            <title>
               <p>Additional data file 1</p>
            </title>
            <caption>
               <p>The amino-acid alignments</p>
            </caption>
            <text>
               <p>The amino-acid alignments</p>
            </text>
            <file name="gb-2003-4-9-r56-s1.paml">
               <p>Click here for additional data file</p>
            </file>
         </suppl>
         <suppl id="s2">
            <title>
               <p>Additional data file 2</p>
            </title>
            <caption>
               <p>The coding sequence alignments</p>
            </caption>
            <text>
               <p>The coding sequence alignments</p>
            </text>
            <file name="gb-2003-4-9-r56-s2.paml">
               <p>Click here for additional data file</p>
            </file>
         </suppl>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>This study was supported by NIH grants.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <aug>
               <au>
                  <snm>Ohno</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Evolution by Gene Duplication.</source>
            <publisher>Berlin: Springer-Verlag</publisher>
            <pubdate>1970</pubdate>
         </bibl>
         <bibl id="B2">
            <title>
               <p>The evolutionary fate and consequences of duplicate genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Lynch</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Conery</snm>
                  <fnm>JS</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2000</pubdate>
            <volume>290</volume>
            <fpage>1151</fpage>
            <lpage>1155</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.290.5494.1151</pubid>
                  <pubid idtype="pmpid" link="fulltext">11073452</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Evolution of duplicate genes in a tetraploid animal, <it>Xenopus laevis</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Hughes</snm>
                  <fnm>MK</fnm>
               </au>
               <au>
                  <snm>Hughes</snm>
                  <fnm>AL</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1993</pubdate>
            <volume>10</volume>
            <fpage>1360</fpage>
            <lpage>1369</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8277859</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Evolutionary rates of duplicate genes in fish and mammals.</p>
            </title>
            <aug>
               <au>
                  <snm>Robinson-Rechavi</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Laudet</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2001</pubdate>
            <volume>18</volume>
            <fpage>681</fpage>
            <lpage>683</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11264421</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>The ghost of selection past: rates of evolution and functional divergence of anciently duplicated genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Van de Peer</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Braasch</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Meyer</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>2001</pubdate>
            <volume>53</volume>
            <fpage>436</fpage>
            <lpage>446</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s002390010233</pubid>
                  <pubid idtype="pmpid" link="fulltext">11675603</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Selection in the evolution of gene duplications.</p>
            </title>
            <aug>
               <au>
                  <snm>Kondrashov</snm>
                  <fnm>FA</fnm>
               </au>
               <au>
                  <snm>Rogozin</snm>
                  <fnm>IB</fnm>
               </au>
               <au>
                  <snm>Wolf</snm>
                  <fnm>YI</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <fpage>research0008.1</fpage>
            <lpage>0008.9</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/gb-2002-3-2-research0008</pubid>
                  <pubid idtype="pmpid" link="fulltext">11864370</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>PAML: a program package for phylogenetic analysis by maximum likelihood.</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>CABIOS</source>
            <pubdate>1997</pubdate>
            <volume>13</volume>
            <fpage>555</fpage>
            <lpage>556</lpage>
            <xrefbib>
               <pubid idtype="pmpid">9367129</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Synonymous and nonsynonymous rate variation in nuclear genes of mammals.</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Nielsen</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1998</pubdate>
            <volume>46</volume>
            <fpage>409</fpage>
            <lpage>418</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9541535</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Locating regions of differential variability in DNA and protein sequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Tang</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Lewontin</snm>
                  <fnm>RC</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1999</pubdate>
            <volume>153</volume>
            <fpage>485</fpage>
            <lpage>495</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10471728</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Amino acid composition and the evolutionary rates of protein-coding genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Graur</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1985</pubdate>
            <volume>22</volume>
            <fpage>53</fpage>
            <lpage>62</lpage>
            <xrefbib>
               <pubid idtype="pmpid">3932664</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Mammalian gene evolution: nucleotide sequence divergence between mouse and rat.</p>
            </title>
            <aug>
               <au>
                  <snm>Wolfe</snm>
                  <fnm>KH</fnm>
               </au>
               <au>
                  <snm>Sharp</snm>
                  <fnm>PM</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1993</pubdate>
            <volume>37</volume>
            <fpage>441</fpage>
            <lpage>456</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8308912</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Frequencies of synonymous substitutions in mammals are gene-specific and correlated with frequencies of nonsynonymous substitutions.</p>
            </title>
            <aug>
               <au>
                  <snm>Mouchiroud</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Gautier</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Bernardi</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1995</pubdate>
            <volume>40</volume>
            <fpage>107</fpage>
            <lpage>113</lpage>
            <xrefbib>
               <pubid idtype="pmpid">7714909</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Variation in synonymous substitution rates among mammalian genes and the correlation between synonymous and nonsynonymous divergence.</p>
            </title>
            <aug>
               <au>
                  <snm>Ohta</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Ina</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1995</pubdate>
            <volume>41</volume>
            <fpage>717</fpage>
            <lpage>720</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8587116</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Selection on human genes as revealed by comparisons to chimpanzee cDNA.</p>
            </title>
            <aug>
               <au>
                  <snm>Hellmann</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Zollner</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Enard</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Ebersberger</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Nickel</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Paabo</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>831</fpage>
               <lpage>837</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.944903</pubid>
                  <pubid idtype="pmpid" link="fulltext">12727903</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>The Ensembl genome database project.</p>
            </title>
            <aug>
               <au>
                  <snm>Hubbard</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Barker</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Cameron</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Clark</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Cox</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Cuff</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Curwen</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Down</snm>
                  <fnm>T</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>38</fpage>
            <lpage>41</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/30.1.38</pubid>
                  <pubid idtype="pmpid" link="fulltext">11752248</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Repbase Update: a database and an electronic journal of repetitive elements.</p>
            </title>
            <aug>
               <au>
                  <snm>Jurka</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <fpage>418</fpage>
            <lpage>420</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(00)02093-X</pubid>
                  <pubid idtype="pmpid" link="fulltext">10973072</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Repeats in genomic DNA: mining and meaning.</p>
            </title>
            <aug>
               <au>
                  <snm>Jurka</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Curr Opin Struct Biol</source>
            <pubdate>1998</pubdate>
            <volume>8</volume>
            <fpage>333</fpage>
            <lpage>337</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0959-440X(98)80067-5</pubid>
                  <pubid idtype="pmpid">9666329</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Repbase</p>
            </title>
            <url>http://www.girinst.org/Repbase_Update.html</url>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Extent of gene duplication in the genomes of <it>Drosophila</it>, nematode, and yeast.</p>
            </title>
            <aug>
               <au>
                  <snm>Gu</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Cavalcanti</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>FC</fnm>
               </au>
               <au>
                  <snm>Bouman</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>WH</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2002</pubdate>
            <volume>19</volume>
            <fpage>256</fpage>
            <lpage>262</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11861885</pubid>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>

