<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2009-10-1-r10</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>Coding region structural heterogeneity and turnover of transcription start sites contribute to divergence in expression between duplicate genes</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Park</snm>
               <fnm>Chungoo</fnm>
               <insr iid="I1"/>
               <email>cxp440@psu.edu</email>
            </au>
            <au id="A2" ca="yes">
               <snm>Makova</snm>
               <mi>D</mi>
               <fnm>Kateryna</fnm>
               <insr iid="I1"/>
               <email>kdm16@psu.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Center for Comparative Genomics and Bioinformatics, Department of Biology, The Pennsylvania State University, University Park, PA 16802, USA</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2009</pubdate>
         <volume>10</volume>
         <issue>1</issue>
         <fpage>R10</fpage>
         <url>http://genomebiology.com/2009/10/1/R10</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">19175934</pubid>
               <pubid idtype="doi">10.1186/gb-2009-10-1-r10</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>11</day>
               <month>10</month>
               <year>2008</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>24</day>
               <month>12</month>
               <year>2008</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>28</day>
               <month>1</month>
               <year>2009</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>28</day>
               <month>1</month>
               <year>2009</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2009</year>
         <collab>Park and Makova; licensee BioMed Central Ltd.</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <shorttitle>
         <p>Divergent expression of duplicated genes</p>
      </shorttitle>
      <shortabs>
         <p>Gene expression data for duplicated gene pairs in humans provide insights into the regulatory factors affecting the expression divergence of these genes, with implications for their evolution.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Gene expression divergence is one manifestation of functional differences between duplicate genes. Although rapid accumulation of expression divergence between duplicate gene copies has been observed, the driving mechanisms behind this phenomenon have not been explored in detail.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We examine which factors influence expression divergence between human duplicate genes, utilizing the latest genome-wide data sets. We conclude that the turnover of transcription start sites between duplicate genes occurs rapidly after gene duplication and that gene pairs with shared transcription start sites have significantly higher expression similarity than those without shared transcription start sites. Moreover, we find that most (55%) duplicate gene pairs do not retain the same coding sequence structure between the two duplicate copies and this also contributes to divergence in their expression. Furthermore, the proportion of aligned sequences in <it>cis</it>-regulatory regions between the two copies is positively correlated with expression similarity. Surprisingly, we find no effect of copy-specific transposable element insertions on the divergence of duplicate gene expression.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusions</p>
               </st>
               <p>Our results suggest that turnover of transcription start sites, structural heterogeneity of coding sequences, and divergence of <it>cis</it>-regulatory regions between copies play a pivotal role in determining the expression divergence of duplicate genes.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010008">Evolution</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Because of the importance of gene duplication in evolution <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr></abbrgrp>, it is crucial to know how duplicate genes diverge and which factors determine their destiny. Recently, genome-wide analyses of microarray data <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> have revealed patterns of expression divergence in duplicate genes, which are necessary for understanding the emergence of new functions after gene duplication. Numerous studies have indicated that genes diverge rapidly in their expression after duplication <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr></abbrgrp>. Population genetic models have proposed directional selection and relaxation of selective constraints as possible forces driving the evolution of expression in duplicate genes, although the relative frequency of these two scenarios in the evolution of paralogs is still being debated <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B13">13</abbr></abbrgrp>. These population genetic models have been implemented under the assumption that two duplicated gene copies are structurally and functionally identical immediately after duplication. However, this assumption is sometimes violated. First, genes duplicated via retrotransposition lose regulatory sequences and include additional sequences at each end (for example, poly(A) tails at the 3' terminus and short direct repeats at both termini), so that retrotransposed copies differ from the corresponding parental genes <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>. Second, tandem duplication by unequal crossing over might not include the entire coding sequence and/or regulatory elements specifying expression of a parental gene. Indeed, Katju and Lynch <abbrgrp><abbr bid="B15">15</abbr></abbrgrp> demonstrated that more than half of newborn duplicate genes in <it>Caenorhabditis elegans </it>represent not complete, but rather partial or chimeric duplications. Such structural heterogeneity may play an important role in rapid expression divergence between human duplicate genes as well; however, it has not been considered in detail in previous studies.</p>
         <p>Transposable elements (TEs) represent another factor that might account for the expression divergence of duplicate genes, since several studies provided evidence of TEs altering gene expression. Jordan and colleagues <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> showed that almost 25% of human promoter regions as well as many other <it>cis</it>-regulatory elements contain, or at least overlap with, TE-derived sequences. This result was later confirmed by another study <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. A specific example of the importance of TEs in the regulation of gene expression comes from the <it>CYP19 </it>gene, which encodes the aromatase enzyme, important for estrogen biosynthesis <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. Because of the recent insertion of a long terminal repeat into the first exon of one of the isoforms of human <it>CYP19</it>, the gene gained expression in placenta, while its mouse ortholog has no long terminal repeat and is not expressed there <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>.</p>
         <p>Finally, alternative promoter usage by duplicate genes should be considered as a mechanism for rapid expression divergence. Recent comprehensive studies concluded that many known genes in the human genome are expressed from alternative promoters <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp>. Similarly, approximately 22% of genes in the ENCODE regions have functional alternative promoters <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. Alternative promoters provide heterogeneity in tissue-specific expression patterns and levels, developmental activity, and translational efficiency <abbrgrp><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr></abbrgrp>. As a result, the use of alternative promoters might be one of the major sources of transcriptome diversity and one of the routes by which duplicate genes acquire divergence in their expression.</p>
         <p>To investigate what drives expression divergence of human paralogs on a genome-wide scale, we addressed the following three questions in the present study: how frequently the turnover of transcription start sites (TSSs) occurs between duplicate genes; how often duplicate gene copies (their coding sequences) differ from each other structurally; and whether the density of copy-specific TEs within <it>cis</it>-regulatory regions influences expression divergence in duplicated genes. We utilized the gene expression profile available for 61 non-redundant and non-pathogenic human tissues <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>, the largest comprehensive expression profile of human genes available to date, and assessed the contributions of TSS turnover, coding sequence structural heterogeneity, and TE integration to divergence in duplicate gene expression.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Identification of duplicate genes</p>
            </st>
            <p>Utilizing two different methods, FASTA and TRIBE-MCL, we identified 6,536 and 7,027 non-redundant human duplicate gene pairs, respectively (see Materials and methods for details). These pairs represented 3,313 and 3,555 gene families, respectively. After filtering out duplicate gene pairs with a synonymous substitution rate (<it>K</it><sub><it>S</it></sub>) >2 and/or lacking a start codon, we obtained 2,790 and 2,750 duplicate gene pairs with FASTA and TRIBE-MCL, respectively. A total of 1,600 duplicate gene pairs overlapped between these two data sets (Additional data file 2). All subsequent analyses were carried out for duplicate genes identified with each of the two methods. Because the results were similar, we present the results only for duplicate genes identified with the FASTA method (2,790 gene pairs in group A), as this method is stricter in clustering proteins into families than the TRIBE-MCL method <abbrgrp><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr></abbrgrp>.</p>
            <p>From human U133A and GNF1H oligonucleotide arrays <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>, we defined 14,505 genes that mapped to probes with a one-to-one correspondence (see Materials and methods), thus minimizing cross-hybridization. Among these genes, we were able to detect 2,924 non-redundant duplicate gene pairs belonging to 1,792 multiple gene families. After filtering out duplicate gene pairs with <it>K</it><sub><it>S </it></sub>>2 and/or lacking a start codon, we obtained 1,015 duplicate gene pairs (group B, representing a subset of group A). In the remainder of the manuscript, we consider duplicate genes of group B when gene expression is investigated and duplicate genes of group A otherwise.</p>
         </sec>
         <sec>
            <st>
               <p>Turnover of TSSs between duplicate genes</p>
            </st>
            <p>Initially, we analyzed the divergence in the position of TSSs between copies in each duplicate gene pair. Putative TSSs of each gene were identified (see Materials and methods) using tag clusters, which were built by grouping overlapping tags (namely, 5'-end sequences) on the same strand, from large-scale tag clustering of cap analysis of gene expression (CAGE) <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> and paired-end ditags (PETs) <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. From 2,790 duplicate gene pairs in group A, we excluded duplicate gene pairs that were duplicated by retrotransposition or for which at least one copy lacked a TSS identified by either CAGE or PETs. As a result, 1,124 duplicate gene pairs were retained. To evaluate sharing of TSSs between duplicate genes, we compared the sequences of genomic regions surrounding putative TSSs (as identified by CAGE or PETs) between the two copies for each of these 1,124 duplicate gene pairs. We considered 110 bp (-20 bp to +90 bp) surrounding each TSS (later called the 'TSS region'), because there was a clear peak in the average sequence similarity between TSSs of duplicate genes in this region (Additional data file 3) and because several studies indicated that a region of this size surrounding TSSs was well conserved between human and mouse orthologs <abbrgrp><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr></abbrgrp>. Sequence similarity between all possible combinations of TSS regions from each duplicate gene pair was considered. If at least one pair of TSS regions had an identity greater than 60%, the two duplicate copies were defined as sharing a TSS. By this criterion, 13.6% (153 out of 1,124) of duplicate gene pairs had shared TSSs.</p>
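            <p>For illustration only, the sharing criterion described above can be sketched in Python. This is a minimal sketch rather than the pipeline used in this study: the function names percent_identity and shares_tss are hypothetical, the TSS regions are assumed to be already aligned to equal length, and identity is assumed to be the number of matching positions divided by the alignment length.</p>
            <p>
```python
from itertools import product


def percent_identity(a, b):
    """Percent identity of two pre-aligned, equal-length sequences ('-' marks a gap).

    Assumption: identity = matching non-gap columns / alignment length.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def shares_tss(regions_copy1, regions_copy2, threshold=60.0):
    """True if any pair of TSS regions (one from each copy) exceeds the threshold."""
    return any(
        percent_identity(r1, r2) > threshold
        for r1, r2 in product(regions_copy1, regions_copy2)
    )


if __name__ == "__main__":
    # Toy 10-bp strings stand in for the 110-bp TSS regions.
    copy1 = ["ACGTACGTAC", "TTTTTTTTTT"]
    copy2 = ["ACGTACGTTT"]
    print(shares_tss(copy1, copy2))
```
            </p>
            <p>Because all combinations of TSS regions are compared, a single pair above the threshold is sufficient for the two copies to count as sharing a TSS; the threshold argument makes it straightforward to repeat the classification at other identity cutoffs.</p>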
            <p>We observed that the relative frequency of gene pairs with shared TSSs decreases with increasing <it>K</it><sub><it>S</it></sub>, a proxy for time since duplication (Figure <figr fid="F1">1</figr>). The L-shaped distribution observed in Figure <figr fid="F1">1</figr> implies a rapid turnover of TSSs after gene duplication. Already at <it>K</it><sub><it>S </it></sub>= 0.1, corresponding to only about 33 million years since duplication <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>, only 64% of duplicate gene pairs share TSSs. Considering an instantaneous <it>K</it><sub><it>S </it></sub>rate according to <abbrgrp><abbr bid="B35">35</abbr></abbrgrp> did not alter our results (Additional data file 4).</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>The decline in the proportion of group A duplicate gene pairs with shared TSSs (shown in black) depending on the time since duplication (approximated by <it>K</it><sub><it>S</it></sub>)</p>
               </caption>
               <text>
                  <p>The decline in the proportion of group A duplicate gene pairs with shared TSSs (shown in black) depending on the time since duplication (approximated by <it>K</it><sub><it>S</it></sub>). The proportion of human-mouse orthologous genes with conserved TSSs is shown for comparison (in gray); in this case variation in <it>K</it><sub><it>S </it></sub>is due to regional variation in substitution rates.</p>
               </text>
               <graphic file="gb-2009-10-1-r10-1"/>
            </fig>
            <p>Interestingly, the turnover of TSSs between human duplicate genes was much more rapid than between human-mouse orthologs. Indeed, for 1,610 human-mouse orthologs considered (see Materials and methods), the mean <it>K</it><sub><it>S </it></sub>was 0.61 (with a 95% confidence interval of 0.60-0.63), while the proportion of orthologs with shared TSSs was 0.71, several fold higher than the proportion of human duplicate genes with similar <it>K</it><sub><it>S </it></sub>(Figure <figr fid="F1">1</figr>).</p>
            <p>To estimate the relationship between TSS usage patterns (shared versus non-shared TSSs) and gene duplication mechanisms, the duplicate genes were divided into three classes: retrotransposed, tandem, and nontandem duplicates (see Materials and methods for details). Duplicate gene pairs in which one copy has a single exon and the other copy has multiple exons were classified as retrotransposed. The relative frequencies of gene pairs with shared TSSs in each class were calculated (thus, we analyzed the 1,124 non-retrotransposed gene pairs described above plus 220 retrotransposed gene pairs). We found that among paralogs with shared TSSs, the majority of pairs represented tandem duplicates (Additional data file 1).</p>
            <p>Interestingly, about 30% (67 out of 220) of retrotransposed duplicate gene pairs retained the same TSSs (Additional data file 1). To evaluate whether the retrotransposed gene pairs with shared TSSs tend to undergo stronger purifying selection than those without shared TSSs, the median nonsynonymous-to-synonymous rate ratios (<it>K</it><sub><it>A</it></sub>/<it>K</it><sub><it>S</it></sub>) were compared between these two groups of genes; however, no significant difference was detected (0.475 versus 0.499; <it>P </it>> 0.1, Mann-Whitney U test).</p>
            <p>Next, to test whether the turnover of TSSs may contribute to the expression divergence in duplicate genes, the Pearson correlation coefficient of expression values (<it>R</it><sub><it>expression</it></sub>; calculated for 61 non-redundant tissues) between the two copies in each pair was computed and compared among group B duplicate gene pairs with shared TSSs versus those without shared TSSs (a total of 581 group B pairs with available TSS data were included in the analysis). Duplicate genes with shared TSSs had significantly higher <it>R</it><sub><it>expression </it></sub>values than those without shared TSSs (0.437 versus 0.080; <it>P </it>&lt; 0.01, Mann-Whitney U test). It is conceivable that the significant difference in <it>R</it><sub><it>expression </it></sub>values is due to different synonymous rates in genes with shared TSSs versus those without shared TSSs. Indeed, we observed that all duplicate genes (belonging to group B) with shared TSSs had <it>K</it><sub><it>S </it></sub>&lt;0.4, while more than 97% of gene pairs without shared TSSs had <it>K</it><sub><it>S </it></sub>&#8805; 0.4. However, if only genes with <it>K</it><sub><it>S </it></sub>&lt;0.4 were considered, the gene pairs with shared TSSs still had higher (but not significantly so) <it>R</it><sub><it>expression </it></sub>values than those without shared TSSs (0.437 versus 0.140; <it>P </it>> 0.05, Mann-Whitney U test).</p>
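            <p>As a minimal sketch of how <it>R</it><sub><it>expression</it></sub> can be obtained for one duplicate gene pair, the following Python function computes the Pearson correlation coefficient between two expression profiles measured over the same tissues. The function name pearson_r and the toy values are hypothetical; in this study each profile would contain 61 tissue values.</p>
            <p>
```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or 2 > n:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Toy profiles over four tissues (illustrative values only).
    copy_a = [1.0, 2.0, 3.0, 4.0]
    copy_b = [2.0, 4.0, 6.0, 8.0]
    print(pearson_r(copy_a, copy_b))
```
            </p>
            <p>Values near 1 indicate that the two copies have similar expression profiles across tissues, whereas values near 0 indicate that the profiles are unrelated.</p>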
            <p>The 60% identity threshold for TSS regions, which was tentatively inferred from substitution rates between human and mouse ortholog core promoters <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>, may be inadequate for estimating the sharing of TSSs among human paralogous genes. Thus, we reclassified the sharing of TSSs between copies of duplicate genes using several identity thresholds (40%, 50%, 70%, and 80%). Although the numbers of duplicate genes with shared TSSs in each bin varied with the threshold, the frequency of gene pairs with shared TSSs decreased with divergence time regardless of the threshold used (Additional data file 5), consistent with the pattern observed with the 60% identity threshold (Figure <figr fid="F1">1</figr>). Moreover, regardless of the identity threshold, the <it>R</it><sub><it>expression </it></sub>values were significantly higher in duplicate genes with shared TSSs versus those without shared TSSs (data not shown).</p>
         </sec>
         <sec>
            <st>
               <p>Structural heterogeneity in coding regions of human duplicate genes</p>
            </st>
            <p>We reconstructed the full-length coding sequence of each gene by concatenating exons from its multiple splicing variants, and then classified each pair of duplicate genes into one of two structural categories: completely similar and incompletely similar. If the proportion of aligned sequences was greater than 0.9, a duplicate gene pair was categorized as completely similar; otherwise, it was categorized as incompletely similar. For some analyses, incompletely similar duplicate gene pairs were further classified into one of three non-overlapping groups: 5' similar, 3' similar, and neither 5' nor 3' similar. If alignments between the two copies started at the start codons of both copies, then such duplicates were classified as 5' similar. Alternatively, if the alignments ended at the stop codons of both copies, we classified the duplicate genes as 3' similar. The remaining duplicate gene pairs were labeled as neither 5' nor 3' similar.</p>
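            <p>The classification rules above can be summarized in a short Python sketch. The function name classify_structure and its arguments are hypothetical, and the inputs (the proportion of aligned sequence and whether the alignment reaches the start or stop codons of both copies) are assumed to have been computed beforehand. Checking the 5' rule before the 3' rule is an assumption based on the order in which the rules are stated.</p>
            <p>
```python
def classify_structure(aligned_fraction, starts_at_start_codons,
                       ends_at_stop_codons, cutoff=0.9):
    """Assign a duplicate gene pair to one structural category.

    aligned_fraction: proportion of aligned coding sequence (0 to 1).
    starts_at_start_codons: alignment begins at the start codons of both copies.
    ends_at_stop_codons: alignment ends at the stop codons of both copies.
    """
    if aligned_fraction > cutoff:
        return "completely similar"
    # Remaining pairs are incompletely similar; split them into three groups.
    if starts_at_start_codons:
        return "5' similar"
    if ends_at_stop_codons:
        return "3' similar"
    return "neither 5' nor 3' similar"


if __name__ == "__main__":
    print(classify_structure(0.95, True, True))
    print(classify_structure(0.60, True, False))
    print(classify_structure(0.60, False, False))
```
            </p>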
            <p>After excluding genes that lacked start/stop codons or consensus splice sites, 2,591 duplicate gene pairs were retained (from 2,790 pairs of group A; for group B, 889 duplicate gene pairs were retained). We found that 55% (1,429 out of 2,591) of duplicate gene pairs had incompletely similar structures. As expected from the divergence of the coding sequence over time, the proportion of duplicate gene pairs with completely similar structures decreased gradually with divergence between the two duplicate copies, approximated by <it>K</it><sub><it>S </it></sub>(Figure <figr fid="F2">2</figr>). Considering an instantaneous <it>K</it><sub><it>S </it></sub>rate according to <abbrgrp><abbr bid="B35">35</abbr></abbrgrp> did not alter our results (Additional data file 6). Interestingly, even at the smallest duplicate gene divergence (<it>K</it><sub><it>S </it></sub>&lt;0.1), the proportion of genes with completely similar structures was only 80% (Figure <figr fid="F2">2</figr>). Although this finding might be affected by misannotations, our results suggest that some duplicate genes might have acquired structural differences during duplication.</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Proportion of group A duplicate gene pairs classified by coding sequence structural heterogeneity</p>
               </caption>
               <text>
                  <p>Proportion of group A duplicate gene pairs classified by coding sequence structural heterogeneity.</p>
               </text>
               <graphic file="gb-2009-10-1-r10-2"/>
            </fig>
            <p>To analyze whether the incompletely similar structures of duplicate genes can lead to expression divergence, we compared the relationship between <it>R</it><sub><it>expression </it></sub>and <it>K</it><sub><it>S </it></sub>for duplicate genes with completely versus incompletely similar structures. Before addressing this issue, retrotransposed duplicate gene pairs (a total of 108 of the 889 gene pairs retained in group B) were excluded because a retrotransposed copy does not include the parental promoter, which can lead to expression divergence regardless of structural heterogeneity in coding sequence between duplicates. We found, first, that the correlation coefficient between <it>R</it><sub><it>expression </it></sub>and <it>K</it><sub><it>S </it></sub>for duplicate gene pairs with completely similar structures was significantly lower than that for pairs with incompletely similar structures (R = -0.315 versus R = -0.001; Fisher's z test, z = -4.028, <it>P </it>&lt; 0.001; Kolmogorov-Smirnov test for normality, <it>P </it>&lt; 0.010; Figure <figr fid="F3">3</figr> and Table <tblr tid="T1">1</tblr>) and, second, that duplicate genes with completely similar structures had significantly higher y-intercepts of regression lines than duplicate genes with incompletely similar structures (0.407 versus 0.134; z = 2.672, <it>P </it>&lt; 0.01). These observations suggest that, immediately after duplication, the expression pattern is more similar for duplicate gene pairs that retain the same coding sequence structure than for those that acquire different structures, and that divergence of gene expression depends more strongly on evolutionary time for duplicate gene pairs with completely similar structures than for those with incompletely similar structures. To estimate the importance of sharing of 5' regions of coding sequences between duplicate gene copies, which can be an indirect indicator of common transcription regulation mechanisms, we separately considered duplicate gene pairs completely similar at the 5' end only (a total of 24 gene pairs from group B that were otherwise classified as incompletely similar) and calculated the correlation coefficient between their <it>R</it><sub><it>expression </it></sub>and <it>K</it><sub><it>S</it></sub>. The correlation was negative, but not significant (Table <tblr tid="T1">1</tblr>). When duplicate gene pairs having completely similar and 5' similar structures were considered together, the correlation between <it>R</it><sub><it>expression </it></sub>and <it>K</it><sub><it>S </it></sub>was somewhat weaker than that for duplicate gene pairs with completely similar structures alone (R = -0.307 versus R = -0.315; Table <tblr tid="T1">1</tblr>), although the difference was not significant (z = -0.093, <it>P </it>> 0.1). We observed no correlation between <it>R</it><sub><it>expression </it></sub>and <it>K</it><sub><it>S </it></sub>for duplicate genes with 3' similar structure or with neither 5' nor 3' similar structure (Table <tblr tid="T1">1</tblr>). These results suggest that maintenance of the entire coding region (and not just of its 5' or 3' portion) is important for determining gene expression profile after duplication.</p>
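            <p>The comparison of two correlation coefficients reported above can be illustrated with the standard Fisher r-to-z transformation for independent samples. The text names Fisher's z test but does not give the formula, so the following Python sketch is an assumption about the form of the test, and the function names are hypothetical. The inputs are the correlation coefficients and numbers of gene pairs listed in Table 1.</p>
            <p>
```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """z statistic for the difference between two independent correlations.

    Assumption: standard Fisher transformation, z_i = atanh(r_i),
    with standard error sqrt(1/(n1 - 3) + 1/(n2 - 3)).
    """
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z1 - z2) / se


def two_sided_p(z):
    """Two-sided P-value of z under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Completely similar (R = -0.315, 214 pairs) versus
    # incompletely similar (R = -0.001, 567 pairs), from Table 1.
    z = fisher_z_difference(-0.315, 214, -0.001, 567)
    print(round(z, 3), two_sided_p(z))
```
            </p>
            <p>With the Table 1 values this sketch gives z of approximately -4.03, in line with the reported z = -4.028.</p>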
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>The relationship between <it>K</it><sub><it>S </it></sub>and <it>R</it><sub><it>expression </it></sub>for group B duplicate genes with (a) completely similar structures and (b) incompletely similar structures</p>
               </caption>
               <text>
                  <p>The relationship between <it>K</it><sub><it>S </it></sub>and <it>R</it><sub><it>expression </it></sub>for group B duplicate genes with <b>(a)</b> completely similar structures and <b>(b)</b> incompletely similar structures.</p>
               </text>
               <graphic file="gb-2009-10-1-r10-3"/>
            </fig>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>The relationship between <it>K</it><sub><it>S </it></sub>and <it>R</it><sub><it>expression </it></sub>in each structural category using group B duplicate gene pairs</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c ca="left">
                        <p>Structural categories</p>
                     </c>
                     <c ca="center">
                        <p>Number of gene pairs</p>
                     </c>
                     <c ca="center">
                        <p><it>K</it><sub><it>A</it></sub>/<it>K</it><sub><it>S</it></sub>*</p>
                     </c>
                     <c ca="center">
                        <p><it>K</it><sub><it>S</it></sub>*</p>
                     </c>
                     <c ca="center">
                        <p><it>R</it><sub><it>expression</it></sub>*</p>
                     </c>
                     <c ca="center">
                        <p>Pearson correlation coefficient of <it>K</it><sub><it>S </it></sub>versus <it>R</it><sub><it>expression </it></sub>(<it>P</it>-value)</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Completely similar</p>
                     </c>
                     <c ca="center">
                        <p>214</p>
                     </c>
                     <c ca="center">
                        <p>0.296 (0.237)</p>
                     </c>
                     <c ca="center">
                        <p>1.153 (1.225)</p>
                     </c>
                     <c ca="center">
                        <p>0.213 (0.162)</p>
                     </c>
                     <c ca="center">
                        <p>-0.315 (&lt;0.001)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>5' similar</p>
                     </c>
                     <c ca="center">
                        <p>24</p>
                     </c>
                     <c ca="center">
                        <p>0.391 (0.311)</p>
                     </c>
                     <c ca="center">
                        <p>1.292 (1.501)</p>
                     </c>
                     <c ca="center">
                        <p>0.053 (0.026)</p>
                     </c>
                     <c ca="center">
                        <p>-0.157 (NS)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>3' similar</p>
                     </c>
                     <c ca="center">
                        <p>23</p>
                     </c>
                     <c ca="center">
                        <p>0.302 (0.311)</p>
                     </c>
                     <c ca="center">
                        <p>1.365 (1.610)</p>
                     </c>
                     <c ca="center">
                        <p>0.346 (0.249)</p>
                     </c>
                     <c ca="center">
                        <p>0.019 (NS)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Neither 5' nor 3' similar</p>
                     </c>
                     <c ca="center">
                        <p>520</p>
                     </c>
                     <c ca="center">
                        <p>0.551 (0.456)</p>
                     </c>
                     <c ca="center">
                        <p>1.565 (1.658)</p>
                     </c>
                     <c ca="center">
                        <p>0.126 (0.063)</p>
                     </c>
                     <c ca="center">
                        <p>0.017 (NS)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Incompletely similar (the sum of the above three categories)</p>
                     </c>
                     <c ca="center">
                        <p>567</p>
                     </c>
                     <c ca="center">
                        <p>0.534 (0.444)</p>
                     </c>
                     <c ca="center">
                        <p>1.545 (1.646)</p>
                     </c>
                     <c ca="center">
                        <p>0.132 (0.068)</p>
                     </c>
                     <c ca="center">
                        <p>-0.001 (NS)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Completely and 5' similar</p>
                     </c>
                     <c ca="center">
                        <p>238</p>
                     </c>
                     <c ca="center">
                        <p>0.307 (0.246)</p>
                     </c>
                     <c ca="center">
                        <p>1.167 (1.263)</p>
                     </c>
                     <c ca="center">
                        <p>0.197 (0.151)</p>
                     </c>
                     <c ca="center">
                        <p>-0.307 (&lt;0.001)</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>*Values are mean (median). NS, not significant.</p>
               </tblfn>
            </tbl>
            <p>To estimate differences in selective pressure among duplicate genes in different structural categories, their <it>K</it><sub><it>A</it></sub>/<it>K</it><sub><it>S </it></sub>ratios were compared (Table <tblr tid="T1">1</tblr>). We observed that <it>K</it><sub><it>A</it></sub>/<it>K</it><sub><it>S </it></sub>was significantly lower for duplicate genes with completely similar structures than for those with incompletely similar structures (<it>P </it>&lt; 0.001, Mann-Whitney U test; Table <tblr tid="T1">1</tblr>), suggesting that the former genes are subject to stronger purifying selection than the latter genes.</p>
         </sec>
         <sec>
            <st>
               <p>Divergence of <it>cis</it>-regulatory sequences between duplicate genes</p>
            </st>
            <p>Next, we evaluated the relative contribution of <it>cis</it>-regulatory divergence to differences in expression between copies of duplicate genes in each pair. The 2-kb (from -1.5 kb to +0.5 kb) genomic regions surrounding TSSs were used as putative <it>cis</it>-regulatory sequences and their divergence was estimated with REALIGNER <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. For genes with multiple TSSs, the TSS supported by the highest number of CAGE/PET tags was selected. This analysis was limited to group B duplicate genes with completely similar structures (a total of 158 duplicate gene pairs). We found a significant positive correlation (R = 0.242, <it>P </it>&lt; 0.01) between the proportion of aligned sequences in the <it>cis</it>-regulatory region (<it>P</it><sub><it>cis</it></sub>) and <it>R</it><sub><it>expression</it></sub>. This suggests that the divergence of <it>cis</it>-regulatory regions contributes to expression divergence in duplicate genes. After duplicate genes created by retrotransposition (a total of 23 gene pairs) were excluded, the correlation coefficient was slightly higher (R = 0.252, <it>P </it>&lt; 0.01). To assess whether the <it>cis</it>-regulatory regions evolved neutrally, we compared, for each non-retrotransposed duplicate gene pair, <it>K</it><sub><it>S </it></sub>(which may serve as a neutral proxy, although see <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>) with the proportion of aligned sequences in the <it>cis</it>-regulatory region (corrected for multiple hits using the HKY85 model). We found that for 107 out of 135 duplicate gene pairs compared, <it>K</it><sub><it>S </it></sub>was significantly higher (<it>P </it>&lt; 0.001, Wilcoxon signed-rank test) than the proportion of aligned sequences in the <it>cis</it>-regulatory region, suggesting that purifying selection acts at <it>cis</it>-regulatory regions.</p>
            <p>To investigate whether copy-specific TEs influence divergence in duplicate gene expression, we identified such TEs (TEs that integrated in the <it>cis</it>-regulatory region of only one duplicate gene copy of a pair after duplication) in the same 2-kb regions surrounding TSSs of the above 158 duplicate gene pairs (excluding 23 retrotransposed duplicate pairs; see Materials and methods). However, no significant correlation was found between the proportion of copy-specific TEs and either <it>P</it><sub><it>cis </it></sub>for duplicate genes or <it>R</it><sub><it>expression </it></sub>(data not shown). This suggests that the effect of copy-specific TEs on divergence in duplicate gene expression may be at best minor, although this issue requires additional studies.</p>
         </sec>
         <sec>
            <st>
               <p>Interplay of multiple predictors in explaining divergence of paralogous gene expression</p>
            </st>
            <p>Because several factors studied above might be interrelated, we conducted multiple regression analysis to estimate the relative contribution of each factor to explaining the total variability in <it>R</it><sub><it>expression</it></sub>. A total of four continuous predictors (<it>K</it><sub><it>A</it></sub>, <it>K</it><sub><it>S</it></sub>, the <it>K</it><sub><it>A</it></sub>/<it>K</it><sub><it>S </it></sub>ratio, and divergence of <it>cis</it>-regulatory sequences (labeled 'Cis')) and three categorical predictors (shared versus not shared TSSs (labeled 'TSS'); completely versus incompletely similar gene structure (labeled 'Structure'); and tandem versus non-tandem gene organization (labeled 'Tandem')) as well as all possible pairwise interaction terms were used to build a regression model. After pruning nonsignificant terms, the final multiple regression model explained approximately 9% of the variation in <it>R</it><sub><it>expression </it></sub>(R<sup>2 </sup>= 0.093) and consisted of eight predictors (Table <tblr tid="T2">2</tblr>). Five of these predictors remained significant after applying Bonferroni correction for multiple tests (Table <tblr tid="T2">2</tblr>). These predictors were Tandem, TSS, and the interaction terms between Structure and Tandem, between TSS and Tandem, and between the <it>K</it><sub><it>A</it></sub>/<it>K</it><sub><it>S </it></sub>ratio and Cis (Table <tblr tid="T2">2</tblr>). Our computation of the relative contribution to the variability explained (RCVE) for significant predictors (see Materials and methods for details) indicated that each of them contributes substantially to the model.</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Multiple regression models for expression divergence in duplicate genes</p>
               </caption>
               <tblbdy cols="3">
                  <r>
                     <c ca="left">
                        <p>Predictors</p>
                     </c>
                     <c ca="center">
                        <p><it>P</it>-value</p>
                     </c>
                     <c ca="center">
                        <p>RCVE*</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Cis<sup>&#8224;</sup></p>
                     </c>
                     <c ca="center">
                        <p>4.2 &#215; 10<sup>-2 </sup>(NS<sup>&#8225;</sup>)</p>
                     </c>
                     <c ca="center">
                        <p>0.075</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>TSS<sup>&#167;</sup></p>
                     </c>
                     <c ca="center">
                        <p>9.9 &#215; 10<sup>-5</sup></p>
                     </c>
                     <c ca="center">
                        <p>0.277</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Tandem<sup>&#182;</sup></p>
                     </c>
                     <c ca="center">
                        <p>2.7 &#215; 10<sup>-6</sup></p>
                     </c>
                     <c ca="center">
                        <p>0.405</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>K</it><sub><it>A </it></sub>&#215; Cis</p>
                     </c>
                     <c ca="center">
                        <p>1.1 &#215; 10<sup>-2 </sup>(NS)</p>
                     </c>
                     <c ca="center">
                        <p>0.118</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>K</it><sub><it>S </it></sub>&#215; Cis</p>
                     </c>
                     <c ca="center">
                        <p>2.7 &#215; 10<sup>-2 </sup>(NS)</p>
                     </c>
                     <c ca="center">
                        <p>0.088</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Structure<sup>&#165; </sup>&#215; Tandem</p>
                     </c>
                     <c ca="center">
                        <p>1.7 &#215; 10<sup>-3</sup></p>
                     </c>
                     <c ca="center">
                        <p>0.180</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>TSS &#215; Tandem</p>
                     </c>
                     <c ca="center">
                        <p>1.1 &#215; 10<sup>-5</sup></p>
                     </c>
                     <c ca="center">
                        <p>0.354</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>&#969;<sup># </sup>&#215; Cis</p>
                     </c>
                     <c ca="center">
                        <p>3.1 &#215; 10<sup>-3</sup></p>
                     </c>
                     <c ca="center">
                        <p>0.159</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>R<sup>2</sup></p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>0.093</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>*RCVE: relative contribution to the variability explained (see Materials and methods for more details). <sup>&#8224;</sup>Cis: divergence of <it>cis</it>-regulatory sequences in the 2-kb region surrounding the TSS (see Materials and methods for more details). <sup>&#8225;</sup>NS: not significant after Bonferroni correction for multiple tests. <sup>&#167;</sup>TSS: shared versus not shared TSSs. <sup>&#182;</sup>Tandem: tandem versus non-tandem organization of duplicate genes. <sup>&#165;</sup>Structure: structural heterogeneity in coding sequences. <sup>#</sup>&#969;: <it>K</it><sub><it>A</it></sub>/<it>K</it><sub><it>S </it></sub>ratio.</p>
               </tblfn>
            </tbl>
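            <p>As an illustration of the Bonferroni step, the following minimal Python sketch (not the authors' code) re-derives which terms in Table 2 remain significant. It assumes a family-wise alpha of 0.05 divided by the eight terms of the final model; neither value is stated explicitly in the text, and the dictionary labels and function name are illustrative.</p>
            <p><![CDATA[
```python
# P-values copied from Table 2; the dictionary keys are illustrative labels.
p_values = {
    "Cis": 4.2e-2,
    "TSS": 9.9e-5,
    "Tandem": 2.7e-6,
    "KA x Cis": 1.1e-2,
    "KS x Cis": 2.7e-2,
    "Structure x Tandem": 1.7e-3,
    "TSS x Tandem": 1.1e-5,
    "omega x Cis": 3.1e-3,
}

ALPHA = 0.05  # assumed family-wise error rate (not stated in the text)


def bonferroni_significant(pvals, alpha=ALPHA):
    """Return the names of terms whose P-value is below alpha / number of tests."""
    threshold = alpha / len(pvals)
    return [name for name, p in pvals.items() if p < threshold]


significant = bonferroni_significant(p_values)
print(significant)
```
]]></p>
            <p>Under these assumptions the per-test threshold is 0.00625 and five terms pass it, which agrees with the five predictors reported as significant in Table 2.</p>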
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>Although it has been shown that duplicate genes diverge rapidly in their expression <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B39">39</abbr><abbr bid="B40">40</abbr><abbr bid="B41">41</abbr></abbrgrp>, little is known about which factors influence their expression divergence at the genomic level <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>. In this study, we investigated three such factors: structural heterogeneity of coding sequences, turnover of TSSs, and divergence of <it>cis</it>-regulatory regions (including insertions of copy-specific TEs).</p>
         <p>Our results indicate that structural differences in coding sequences are common among human duplicate genes. We observed a high proportion of duplicate genes with structural differences even among young duplicates (<it>K</it><sub><it>S </it></sub>&lt;0.1), which is consistent with the findings for <it>C. elegans </it>duplicate genes <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. Thus, genes might already be structurally different at the point of duplication. In general, a duplication generated by unequal crossing over might not encompass the entire coding sequence of the parental gene, and indeed, for the majority of individual young duplicate gene pairs with incompletely similar structures in our data set (for approximately 90% of duplicate pairs of group A), both copies reside on the same chromosome. Over time, duplicate genes accumulate mutations leading to amino acid changes, premature stop codons, and atypical splicing <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B14">14</abbr><abbr bid="B43">43</abbr></abbrgrp>. These mutations might reduce the number of duplicate genes that retain their ancestral structure and accelerate divergence in expression and function.</p>
         <p>Alteration of TSSs between duplicate gene copies is likely to have a direct impact on expression divergence. Using sequence similarity analysis, we examined whether duplicate genes share their TSSs. A large number of duplicate genes with distinct TSSs between the two copies were observed and these duplicate gene copies usually had different expression patterns. Although we did not directly estimate the fitness effects of turnover of TSSs on retention of duplicate genes, alteration of TSSs provides a means for the realization of several models of gene duplication evolution (for example, subfunctionalization and neofunctionalization <abbrgrp><abbr bid="B44">44</abbr><abbr bid="B45">45</abbr></abbrgrp>).</p>
         <p>Additionally, we observed that <it>cis</it>-regulatory regions of duplicate genes diverge with time since duplication. This is consistent with several previous reports <abbrgrp><abbr bid="B46">46</abbr><abbr bid="B47">47</abbr><abbr bid="B48">48</abbr></abbrgrp>. We investigated a potential impact of the density of copy-specific TEs on the divergence of duplicate gene expression and, surprisingly, found no major effect. This result corroborates recent findings regarding orthologous mammalian promoters; in human core promoters, the density of most observed repeat classes was significantly below the genomic average, suggesting that insertion of TEs in <it>cis</it>-regulatory regions is prevented by purifying selection <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>.</p>
         <p>Using multiple regression analysis, we observed that shared versus not shared TSS ('TSS'), completely versus incompletely similar structure ('Structure'), divergence of <it>cis</it>-regulatory sequences ('Cis'), the <it>K</it><sub><it>A</it></sub>/<it>K</it><sub><it>S </it></sub>ratio, and tandem versus non-tandem duplicate gene organization played an important role in determining divergence in duplicate gene expression. It is worth noting that all three novel predictors introduced in this manuscript (TSS, Structure, and Cis) significantly influence divergence in duplicate gene expression alone and/or through interaction with other predictors. Interestingly, <it>K</it><sub><it>S</it></sub>, a proxy of evolutionary time, was not a significant predictor in our model. However, as noted above, evolutionary time influences alterations in other predictors and, therefore, the influence of <it>K</it><sub><it>S </it></sub>on <it>R</it><sub><it>expression </it></sub>might be observed through significance of predictors dependent on <it>K</it><sub><it>S</it></sub>. While interaction terms are not straightforward to interpret, the finding that several of them significantly contributed to the model suggests that considering multiple correlated factors might be essential for understanding patterns of duplicate gene expression divergence.</p>
         <p>In this study, expression pattern was used as an indicator of evolution of biological functions after gene duplication. Several studies have suggested that gene expression density and breadth (for example, in housekeeping versus tissue-specific genes) have significantly influenced the evolution of proteins <abbrgrp><abbr bid="B49">49</abbr><abbr bid="B50">50</abbr><abbr bid="B51">51</abbr><abbr bid="B52">52</abbr></abbrgrp>. In addition to gene expression, which is likely a strong predictor <abbrgrp><abbr bid="B53">53</abbr><abbr bid="B54">54</abbr></abbrgrp>, several additional factors have been implicated in protein evolution. Such factors include gene dispensability <abbrgrp><abbr bid="B55">55</abbr><abbr bid="B56">56</abbr></abbrgrp>, protein stability and interaction network <abbrgrp><abbr bid="B57">57</abbr><abbr bid="B58">58</abbr></abbrgrp> as well as codon usage <abbrgrp><abbr bid="B54">54</abbr><abbr bid="B59">59</abbr></abbrgrp>. Although these variables individually explain only a small fraction of variation in the rate of protein evolution, studying them might provide important insights into divergence between duplicate genes.</p>
         <p>Most gene evolution models have assumed that two duplicate gene copies are expressed equally immediately after duplication. However, similarly to coding sequences, promoter regions might also be incompletely duplicated between copies; this possibility needs to be evaluated in future studies. Frequently, because of the complex evolutionary dynamics of promoter sequences <abbrgrp><abbr bid="B47">47</abbr><abbr bid="B60">60</abbr><abbr bid="B61">61</abbr></abbrgrp>, it is difficult to distinguish incomplete promoter duplication from rapid promoter evolution after duplication.</p>
         <p>Reconstruction of ancestral gene expression state can be performed using a parsimony-based procedure in multi-gene families <abbrgrp><abbr bid="B62">62</abbr></abbrgrp>, instead of using the pairwise analysis employed here. However, rigorous filtering for potential cross-hybridization of transcripts of genes from the same multi-gene family in our study makes such ancestral reconstruction difficult. Thus, additional studies using different types of expression data may allow us to decompose the expression divergence of genes in multi-gene families and thus provide us with additional methodological insights for understanding gene expression divergence.</p>
         <p>In the present study, as expected, we observed a significant negative correlation between the synonymous rate and Pearson correlation coefficient of expression values between duplicate gene copies; however, the resulting correlation was weaker than in our previous study <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. There might be several potential reasons explaining this difference (for example, different <it>K</it><sub><it>S </it></sub>thresholds used in the two studies and a greater number of tissues used in the present study). However, the major advance of the present study compared with the previous one <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> is a more rigorous filtering for potential cross-hybridization of transcripts of two duplicate gene copies to the same probe, and thus we consider the present results more robust.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>The present study represents the first report of the effects of structural differences in coding region and of unique TSSs on the divergence of duplicate gene expression. Our observations of frequent turnover of TSSs between duplicate genes and a high proportion of young duplicate genes with incompletely similar structures contradict the assumptions of classic gene duplication models, according to which duplicate genes are considered to be equal both structurally and functionally at the point of duplication <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>. Although potential incomplete duplication of promoters will be the subject of future studies, our investigation of factors contributing to expression divergence of duplicate genes provides important information for understanding human transcriptome heterogeneity, complexity, and evolution.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <sec>
            <st>
               <p>Identification of duplicate gene pairs</p>
            </st>
            <p>To cluster genes into families, we downloaded 48,218 protein sequences of consensus coding sequences, known and novel genes from Ensembl (release 38 of NCBI build 36) and independently used the FASTA <abbrgrp><abbr bid="B63">63</abbr></abbrgrp> and TRIBE-MCL <abbrgrp><abbr bid="B64">64</abbr></abbrgrp> methods to define duplicate gene families. Briefly, for the FASTA method, each protein sequence was used as a query to search against all other protein sequences using FASTA <abbrgrp><abbr bid="B65">65</abbr></abbrgrp> with E &lt; 10. Two protein sequences formed a link if: the aligned region was >80% of the longer protein; and the identity between two proteins was &#8805; 30% for alignments longer than 150 amino acids or &#8805; (0.01n + 4.8L<sup>-0.32 [1+exp(-L/1000)]</sup>) otherwise, where <it>L </it>is the alignable length between two proteins and n = 6. The formula above was derived from empirical data, which suggested that a higher sequence identity was required for shorter proteins <abbrgrp><abbr bid="B66">66</abbr></abbrgrp>. These gene pairs were grouped into gene families according to the single linkage clustering algorithm. For gene families derived by TRIBE-MCL, we downloaded the gene annotations through BioMart in the Ensembl database, and considered gene families with at least two members.</p>
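            <p>The link criteria and the single linkage step can be sketched in Python as follows. This is a minimal illustration rather than the pipeline actually used: it assumes that identity is expressed as a fraction (so that 30% corresponds to 0.30 and the formula with n = 6 yields a fraction), and the function names are hypothetical.</p>
            <p><![CDATA[
```python
import math


def identity_threshold(aligned_length, n=6):
    """Minimum fractional identity for a link, given the alignable length L."""
    if aligned_length > 150:
        return 0.30
    exponent = -0.32 * (1.0 + math.exp(-aligned_length / 1000.0))
    return 0.01 * n + 4.8 * aligned_length ** exponent


def forms_link(identity, aligned_length, longer_protein_length):
    """Apply both link criteria: coverage above 80% and sufficient identity."""
    coverage = aligned_length / longer_protein_length
    return coverage > 0.80 and identity >= identity_threshold(aligned_length)


def single_linkage(links):
    """Group genes into families with union-find over pairwise links."""
    parent = {}

    def find(gene):
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving
            gene = parent[gene]
        return gene

    for gene_a, gene_b in links:
        root_a, root_b = find(gene_a), find(gene_b)
        if root_a != root_b:
            parent[root_b] = root_a

    families = {}
    for gene in parent:
        families.setdefault(find(gene), set()).add(gene)
    return sorted(sorted(members) for members in families.values())
```
]]></p>
            <p>With these definitions the length-dependent threshold is roughly 0.30 at an alignable length of 150 amino acids, so it joins the fixed 30% cut-off used for longer alignments, and it rises for shorter alignments.</p>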
            <p>To identify independent pairs of duplicate genes within each gene family, we sorted gene pairs in ascending order of <it>K</it><sub><it>S </it></sub>and selected the pair with the lowest <it>K</it><sub><it>S</it></sub>. After excluding genes that had been picked, we chose the next gene pair with the lowest <it>K</it><sub><it>S</it></sub>. These steps were repeated for each gene family. All genes encoding proteins were realigned using CLUSTALW <abbrgrp><abbr bid="B67">67</abbr></abbrgrp>, and the yn00 module <abbrgrp><abbr bid="B68">68</abbr></abbrgrp> of PAML <abbrgrp><abbr bid="B69">69</abbr></abbrgrp> was used to calculate <it>K</it><sub><it>S</it></sub>. We counted duplicate gene pairs in intervals of size <it>K</it><sub><it>S </it></sub>= 0.01 to derive the instantaneous rate of <it>K</it><sub><it>S</it></sub> according to <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>.</p>
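            <p>The selection of independent pairs described above amounts to a greedy procedure, sketched below in Python. The input format (a list of gene-pair tuples with a precomputed <it>K</it><sub><it>S</it></sub>) and the function name are assumptions made for illustration.</p>
            <p><![CDATA[
```python
def select_independent_pairs(pairs):
    """Greedy selection of non-overlapping gene pairs by ascending K_S.

    `pairs` is an iterable of (gene_a, gene_b, ks) tuples for one gene family.
    """
    used = set()
    selected = []
    for gene_a, gene_b, ks in sorted(pairs, key=lambda pair: pair[2]):
        if gene_a in used or gene_b in used:
            continue  # one of the genes was already picked in an earlier pair
        selected.append((gene_a, gene_b, ks))
        used.update((gene_a, gene_b))
    return selected
```
]]></p>
            <p>Each gene therefore appears in at most one selected pair, and pairs with lower <it>K</it><sub><it>S</it></sub> take precedence.</p>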
            <p>Duplicate gene pairs in which one copy has a single exon and the other copy has multiple exons were classified as retrotransposed duplicates. In addition, duplicate gene pairs were classified as tandem duplicates if no other genes separated the two copies.</p>
         </sec>
         <sec>
            <st>
               <p>Expression data analysis</p>
            </st>
            <p>Expression data for 61 non-redundant and nonpathogenic human tissues in U133A and GNF1H Affymetrix arrays were obtained from <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. To validate mapping between probe sets and genes, we aligned the transcripts of consensus coding sequences, known genes, and novel genes downloaded from Ensembl (release 38 of NCBI build 36) with the exemplar and consensus sequences for each array using BLAST <abbrgrp><abbr bid="B70">70</abbr></abbrgrp> with E &lt; 10<sup>-20</sup>. According to the criteria described in <abbrgrp><abbr bid="B71">71</abbr><abbr bid="B72">72</abbr></abbrgrp>, the acceptable alignments were selected if: the identity was 100% and the length was greater than 49 bp; or the identity was higher than 94% and the length was at least either 99 bp or 90% of the length of the query. We considered three scenarios for mapping relationships: a single probe set hitting one gene (9,508 probe sets); multiple probe sets hitting one gene (13,186 probe sets and 4,997 genes); and a single probe set hitting multiple genes (4,493 probe sets and 6,764 genes). All genes following the first two scenarios were utilized in the present study. For each gene following the second scenario, the probe set with the highest expression value (defined by average difference) was selected. All genes following the third scenario were removed from the analysis due to potential cross-hybridization. Following <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>, genes with average difference >200 in a particular tissue were considered to be expressed in this tissue.</p>
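            <p>The alignment acceptance rule and the handling of the three mapping scenarios can be sketched in Python as follows. This is an illustrative reading of the text, not the original scripts: identity is assumed to be given in percent, any gene hit by a multi-gene probe set is assumed to be removed entirely, and all function and variable names are hypothetical.</p>
            <p><![CDATA[
```python
from collections import defaultdict


def alignment_accepted(identity_pct, length_bp, query_length_bp):
    """Acceptance rule for a probe-set alignment, as described in the text."""
    if identity_pct == 100 and length_bp > 49:
        return True
    long_enough = length_bp >= 99 or length_bp >= 0.9 * query_length_bp
    return identity_pct > 94 and long_enough


def usable_probe_sets(hits):
    """Map each retained gene to its probe sets (first and second scenarios).

    `hits` is an iterable of (probe_set, gene) pairs from accepted alignments.
    """
    genes_per_probe = defaultdict(set)
    for probe_set, gene in hits:
        genes_per_probe[probe_set].add(gene)

    # Assumption: genes touched by any multi-gene probe set (third scenario)
    # are removed because of potential cross-hybridization.
    excluded = set()
    for genes in genes_per_probe.values():
        if len(genes) > 1:
            excluded.update(genes)

    probes_per_gene = defaultdict(list)
    for probe_set, genes in genes_per_probe.items():
        if len(genes) == 1:
            gene = next(iter(genes))
            if gene not in excluded:
                probes_per_gene[gene].append(probe_set)
    return {gene: sorted(probes) for gene, probes in probes_per_gene.items()}
```
]]></p>
            <p>For a gene that retains several probe sets, the text then selects the probe set with the highest expression value; that step is omitted here because it depends on the expression data.</p>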
         </sec>
         <sec>
            <st>
               <p>Identification of putative TSSs</p>
            </st>
            <p>The putative TSSs were identified using the method described in the ENCODE pilot project <abbrgrp><abbr bid="B73">73</abbr></abbrgrp>. Briefly, we utilized tag clusters from two sets of 5'-end-tag-capture technologies: CAGE <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> and PETs <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. If two tag clusters were located on the same strand and within 60 bp of each other (a distance derived from analyzing the distribution of distances between tag clusters in <abbrgrp><abbr bid="B73">73</abbr></abbrgrp>), they were merged into one tag cluster. To map tag clusters to genes, the following two criteria were considered. First, the strand of a tag cluster was required to be identical to the strand of a gene. Second, a tag cluster was required to be located in the 5' upstream region from the most upstream start codon of a gene. Because we constructed artificial coding regions of genes by including all their exons, our analysis is not affected by alternative start codons. To confirm the reliability of the tag data, RefSeq <abbrgrp><abbr bid="B74">74</abbr></abbrgrp>, H-Invitational <abbrgrp><abbr bid="B75">75</abbr></abbrgrp>, and human EST <abbrgrp><abbr bid="B76">76</abbr></abbrgrp> RNA data from the UCSC Genome Browser <abbrgrp><abbr bid="B77">77</abbr></abbrgrp> were utilized. We excluded tag clusters with a single tag as well as those whose coordinates did not overlap with the genomic coordinates of the 5' end of cDNAs or ESTs. To define a representative tag site (to be used as a putative TSS) for each tag cluster, we selected the tag site that was supported by the highest number of 5' start sites. If several sites in a tag cluster were tied for the highest number of 5' start sites, the central coordinate of this tag cluster was defined as the representative tag site.</p>
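            <p>The merging of tag clusters and the choice of a representative tag site can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the 60-bp distance is measured from the end of one cluster to the start of the next, the central coordinate is rounded down to an integer, and the data layout and function names are hypothetical.</p>
            <p><![CDATA[
```python
from collections import Counter

MERGE_DISTANCE = 60  # bp, taken from the text


def merge_tag_clusters(clusters, max_gap=MERGE_DISTANCE):
    """Merge same-strand clusters lying within `max_gap` bp of each other.

    `clusters` holds (strand, start, end) tuples with start <= end.
    """
    merged = []
    for strand, start, end in sorted(clusters):
        if merged:
            last_strand, last_start, last_end = merged[-1]
            if strand == last_strand and start - last_end <= max_gap:
                merged[-1] = (strand, last_start, max(last_end, end))
                continue
        merged.append((strand, start, end))
    return merged


def representative_site(start_sites, cluster_start, cluster_end):
    """Pick the site with the most 5' start sites; on a tie use the centre."""
    counts = Counter(start_sites)
    best = max(counts.values())
    top_sites = [site for site, count in counts.items() if count == best]
    if len(top_sites) == 1:
        return top_sites[0]
    return (cluster_start + cluster_end) // 2
```
]]></p>
            <p>Here <it>start_sites</it> stands for the list of 5' start coordinates of the cDNAs or ESTs overlapping a cluster, with one entry per supporting sequence.</p>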
         </sec>
         <sec>
            <st>
               <p>Analysis of turnover of TSSs between human-mouse orthologous gene pairs</p>
            </st>
            <p>To evaluate conservation of TSSs between human-mouse orthologous genes, we obtained two distinct classes of orthologous genes from <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. Briefly, 'conserved promoter regions' means that upstream sequences of TSSs between human and mouse orthologous genes were aligned; otherwise, 'non-conserved promoter regions' means there were no significant alignments. We excluded orthologous genes that were classified into both classes because alternatively spliced variants of each gene had different conservation patterns of promoter regions. As a result, 1,610 orthologous gene pairs that were classified into just one class in a mutually exclusive manner were retained. We downloaded human and mouse protein sequences from Ensembl (release 38 of NCBI build 36). All genes were aligned using CLUSTALW <abbrgrp><abbr bid="B67">67</abbr></abbrgrp>, and the yn00 module <abbrgrp><abbr bid="B68">68</abbr></abbrgrp> of PAML <abbrgrp><abbr bid="B69">69</abbr></abbrgrp> was used to calculate <it>K</it><sub><it>S </it></sub>between orthologous genes.</p>
         </sec>
         <sec>
            <st>
               <p>Classification of the type of gene duplication into structural categories</p>
            </st>
            <p>Structural categorization of duplicate genes was performed using reconstructed full-length coding sequences. We downloaded annotated human genome data from Ensembl (release 38 of NCBI build 36). Alternatively spliced variants lacking start or stop codons or lacking canonical exon boundaries (5'-GT...AG-3', 5'-GC...AG-3', or 5'-AT...AC-3') were excluded. For each gene with several alternatively spliced variants, all exons were aligned against each other, and, if some exons overlapped, they were merged into a single exon. Next, exons were sorted by their genomic coordinates and were reassembled to form reconstructed full-length coding sequences.</p>
            <p>The reconstructed full-length coding sequences were aligned using AVID <abbrgrp><abbr bid="B78">78</abbr></abbrgrp> with default parameters. Each pair of duplicate genes was classified into one of the four structural categories: completely similar, 5' similar, 3' similar, and neither 5' nor 3' similar. If the proportion of aligned sequences was greater than 0.9, duplicate gene pairs were categorized as completely similar. The other duplicate gene pairs were exclusively classified in just one category of 5' similar, 3' similar, or neither 5' nor 3' similar. If alignments between the two copies started at the start codons of both copies, then such duplicates were classified as 5' similar. Alternatively, if the alignments ended at the stop codons of both copies, we classified the duplicate genes as 3' similar. Finally, the remaining duplicate gene pairs were labeled as neither 5' nor 3' similar.</p>
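            <p>The decision rule for the four structural categories can be written as a short Python function. This sketch assumes that the proportion of aligned sequence and the two boundary checks have already been derived from the AVID alignment; the function and argument names are hypothetical, and the order of the checks follows the order in which the categories are described above.</p>
            <p><![CDATA[
```python
def classify_structure(aligned_fraction, starts_at_both_start_codons,
                       ends_at_both_stop_codons, threshold=0.9):
    """Assign a duplicate gene pair to one of the four structural categories."""
    if aligned_fraction > threshold:
        return "completely similar"
    if starts_at_both_start_codons:
        return "5' similar"
    if ends_at_both_stop_codons:
        return "3' similar"
    return "neither 5' nor 3' similar"
```
]]></p>
            <p>Because the 5' check precedes the 3' check, a pair that satisfies both boundary conditions but falls at or below the 0.9 threshold is labeled 5' similar in this sketch; the text does not state how such cases were handled.</p>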
         </sec>
         <sec>
            <st>
               <p><it>Cis</it>-regulatory regions analysis</p>
            </st>
            <p>To detect homologous sequences in <it>cis</it>-regulatory regions, we used a modified version of REALIGNER <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. Using BL2SEQ (part of the BLAST suite <abbrgrp><abbr bid="B70">70</abbr></abbrgrp>) with mismatch penalty equal to -2 and word size equal to 7, we constructed alignments of 2-kb (-1.5 kb to +0.5 kb) genomic regions surrounding putative TSSs between copies in each duplicate gene pair. We selected alignments satisfying three criteria: hit length >7 bp; identity >70%; and identical hit strand. If two local alignments overlapped, the alignment with the higher bit score was retained. If the bit scores of the two overlapping alignments were identical, the longer alignment or the one closest to the TSS was retained. If the two local alignments were not syntenic (the order of blocks in each alignment was inconsistent), the alignment with the lower bit score was removed. Finally, all local alignments ordered by their genomic coordinates were used as a conserved <it>cis</it>-regulatory region for a duplicate gene pair.</p>
            <p>TEs within <it>cis</it>-regulatory regions were classified into two sets: those inserted into the ancestral sequence before duplication of the genomic region, and those inserted in only one duplicate copy after the duplication event. We used the RepeatMasker <abbrgrp><abbr bid="B79">79</abbr></abbrgrp> tables at the UCSC Genome Browser <abbrgrp><abbr bid="B77">77</abbr></abbrgrp> to map the coordinates of TEs into <it>cis</it>-regulatory regions.</p>
         </sec>
         <sec>
            <st>
               <p>Multiple regression analysis</p>
            </st>
            <p>Linear multiple regression analysis was performed in the R statistical package. The original model included all seven predictors and their interaction terms, but was pruned to include only significant predictors (and significant interaction terms). RCVE <abbrgrp><abbr bid="B80">80</abbr><abbr bid="B81">81</abbr></abbrgrp> was utilized to assess the contribution of each predictor to explaining the total variability:</p>
            <p>
               <display-formula>
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="gb-2009-10-1-r10-i1">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>R</m:mi>
                           <m:mi>C</m:mi>
                           <m:mi>V</m:mi>
                           <m:mi>E</m:mi>
                           <m:mo>=</m:mo>
                           <m:mfrac>
                              <m:mrow>
                                 <m:msubsup>
                                    <m:mi>R</m:mi>
                                    <m:mrow>
                                       <m:mi>f</m:mi>
                                       <m:mi>u</m:mi>
                                       <m:mi>l</m:mi>
                                       <m:mi>l</m:mi>
                                    </m:mrow>
                                    <m:mn>2</m:mn>
                                 </m:msubsup>
                                 <m:mo>&#8722;</m:mo>
                                 <m:msubsup>
                                    <m:mi>R</m:mi>
                                    <m:mrow>
                                       <m:mi>r</m:mi>
                                       <m:mi>e</m:mi>
                                       <m:mi>d</m:mi>
                                       <m:mi>u</m:mi>
                                       <m:mi>c</m:mi>
                                       <m:mi>e</m:mi>
                                       <m:mi>d</m:mi>
                                    </m:mrow>
                                    <m:mn>2</m:mn>
                                 </m:msubsup>
                              </m:mrow>
                              <m:mrow>
                                 <m:msubsup>
                                    <m:mi>R</m:mi>
                                    <m:mrow>
                                       <m:mi>f</m:mi>
                                       <m:mi>u</m:mi>
                                       <m:mi>l</m:mi>
                                       <m:mi>l</m:mi>
                                    </m:mrow>
                                    <m:mn>2</m:mn>
                                 </m:msubsup>
                              </m:mrow>
                           </m:mfrac>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8GiVeY=Pipec8Eeeu0xXdbba9frFj0xb9Lqpepeea0xd9q8qiYRWxGi6xij=hbbc9s8aq0=yqpe0xbbG8A8frFve9Fve9Fj0dmeaabaqaciGacaGaaeqabaqabeGadaaakeaacaWGsbGaam4qaiaadAfacaWGfbGaeyypa0tcfa4aaSaaaeaacaWGsbWaa0baaeaacaWGMbGaamyDaiaadYgacaWGSbaabaGaaGOmaaaacqGHsislcaWGsbWaa0baaeaacaWGYbGaamyzaiaadsgacaWG1bGaam4yaiaadwgacaWGKbaabaGaaGOmaaaaaeaacaWGsbWaa0baaeaacaWGMbGaamyDaiaadYgacaWGSbaabaGaaGOmaaaaaaaaaa@4946@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="gb-2009-10-1-r10-i2"><m:semantics><m:mrow><m:msubsup><m:mi>R</m:mi><m:mrow><m:mi>f</m:mi><m:mi>u</m:mi><m:mi>l</m:mi><m:mi>l</m:mi></m:mrow><m:mn>2</m:mn></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8viVeY=Nipec8Eeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaamOuamaaDaaaleaacaWGMbGaamyDaiaadYgacaWGSbaabaGaaGOmaaaaaaa@352A@</m:annotation></m:semantics></m:math></inline-formula> and <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="gb-2009-10-1-r10-i3"><m:semantics><m:mrow><m:msubsup><m:mi>R</m:mi><m:mrow><m:mi>r</m:mi><m:mi>e</m:mi><m:mi>d</m:mi><m:mi>u</m:mi><m:mi>c</m:mi><m:mi>e</m:mi><m:mi>d</m:mi></m:mrow><m:mn>2</m:mn></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8viVeY=Nipec8Eeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaamOuamaaDaaaleaacaWGYbGaamyzaiaadsgacaWG1bGaam4yaiaadwgacaWGKbaabaGaaGOmaaaaaaa@37E2@</m:annotation></m:semantics></m:math></inline-formula> are the <it>R</it><sup>2</sup> values for the full model and for the model lacking the predictor of interest, respectively. In addition, variance inflation factors <abbrgrp><abbr bid="B82">82</abbr></abbrgrp> were calculated for each predictor to diagnose multicollinearity. All predictors and interaction terms included in the final model had variance inflation factors below 2 (data not shown), suggesting that multicollinearity did not adversely affect the model.</p>
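            <p>As an illustration of the two diagnostics defined above, the following Python fragment computes RCVE and a variance inflation factor from ordinary least-squares fits. It is a sketch under stated assumptions and not the R code used in the analysis: the function names (<it>solve</it>, <it>r_squared</it>, <it>rcve</it>, <it>vif</it>) and the dictionary of named predictor columns are hypothetical, every model includes an intercept, and each column (including any interaction term supplied as its own column) is dropped on its own when the reduced model is formed.</p>
            <p><![CDATA[
```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(columns, y):
    """R^2 of an OLS fit of y on the given columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)]
           for j in range(p)]
    xty = [sum(row[j] * y[i] for i, row in enumerate(design))
           for j in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def rcve(predictors, y, name):
    """(R2_full - R2_reduced) / R2_full for the predictor called `name`."""
    full = r_squared(list(predictors.values()), y)
    reduced = r_squared([v for k, v in predictors.items() if k != name], y)
    return (full - reduced) / full


def vif(predictors, name):
    """1 / (1 - R2) from regressing one predictor on all the others."""
    others = [v for k, v in predictors.items() if k != name]
    return 1.0 / (1.0 - r_squared(others, predictors[name]))
```
]]></p>
            <p>Solving the normal equations directly keeps the sketch free of external dependencies; for real data, a numerically more robust routine (for example, a QR-based least-squares solver) would usually be preferable.</p>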
         </sec>
      </sec>
      <sec>
         <st>
            <p>Abbreviations</p>
         </st>
         <p>CAGE: cap analysis of gene expression; <it>K</it><sub><it>A</it></sub>: nonsynonymous divergence; <it>K</it><sub><it>S</it></sub>: synonymous rate; PET: paired-end ditag; RCVE: relative contribution to variability explained; TE: transposable element; TSS: transcription start site.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>CP and KDM designed the experiments and wrote the manuscript. CP performed data analyses.</p>
      </sec>
      <sec>
         <st>
            <p>Additional data files</p>
         </st>
         <p>The following additional data are available with the online version of this paper. Additional data file <supplr sid="S1">1</supplr> is a table listing the classification of duplicate gene pairs based on the absence or presence of shared TSSs and different duplication mechanisms. Additional data file <supplr sid="S2">2</supplr> is a Venn diagram depicting the number of duplicate gene pairs that were identified by the FASTA and TRIBE-MCL methods. Additional data file <supplr sid="S3">3</supplr> shows the average sequence identity between TSS regions of duplicate genes. Additional data file <supplr sid="S4">4</supplr> shows the number of duplicate gene pairs with shared TSSs (A) and without shared TSSs (B) plotted against the instantaneous rate of <it>K</it><sub><it>S</it></sub>. Additional data file <supplr sid="S5">5</supplr> shows the proportions of group A duplicate gene pairs with shared TSSs at different identity thresholds. Additional data file <supplr sid="S6">6</supplr> shows the number of duplicate gene pairs in different structure categories plotted against the instantaneous rate of <it>K</it><sub><it>S</it></sub>.</p>
         <suppl id="S1">
            <title>
               <p>Additional data file 1</p>
            </title>
            <caption>
               <p>Classification of duplicate gene pairs based on the absence or presence of shared TSSs and different duplication mechanisms</p>
            </caption>
            <text>
               <p>Classification of duplicate gene pairs based on the absence or presence of shared TSSs and different duplication mechanisms.</p>
            </text>
            <file name="gb-2009-10-1-r10-S1.eps">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S2">
            <title>
               <p>Additional data file 2</p>
            </title>
            <caption>
               <p>Number of duplicate gene pairs that were identified by the FASTA and TRIBE-MCL methods</p>
            </caption>
            <text>
               <p>Number of duplicate gene pairs that were identified by the FASTA and TRIBE-MCL methods.</p>
            </text>
            <file name="gb-2009-10-1-r10-S2.eps">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S3">
            <title>
               <p>Additional data file 3</p>
            </title>
            <caption>
               <p>Average sequence identity between TSS regions of duplicate genes</p>
            </caption>
            <text>
               <p>The identities were obtained using BL2SEQ <abbrgrp><abbr bid="B70">70</abbr></abbrgrp> with default parameters. Black bars represent the 110 bp (-20 bp to +90 bp) surrounding each TSS.</p>
            </text>
            <file name="gb-2009-10-1-r10-S3.eps">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S4">
            <title>
               <p>Additional data file 4</p>
            </title>
            <caption>
               <p>Number of duplicate gene pairs with shared TSSs and without shared TSSs plotted against the instantaneous rate of <it>K</it><sub><it>S</it></sub></p>
            </caption>
            <text>
               <p>Number of duplicate gene pairs (A) with shared TSSs and (B) without shared TSSs plotted against the instantaneous rate of <it>K</it><sub><it>S</it></sub>.</p>
            </text>
            <file name="gb-2009-10-1-r10-S4.eps">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S5">
            <title>
               <p>Additional data file 5</p>
            </title>
            <caption>
               <p>Proportions of group A duplicate gene pairs with shared TSSs depending on different identity thresholds</p>
            </caption>
            <text>
               <p>Proportions of group A duplicate gene pairs with shared TSSs depending on different identity thresholds.</p>
            </text>
            <file name="gb-2009-10-1-r10-S5.eps">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S6">
            <title>
               <p>Additional data file 6</p>
            </title>
            <caption>
               <p>Number of duplicate gene pairs in different structure categories plotted against the instantaneous rate of <it>K</it><sub><it>S</it></sub></p>
            </caption>
            <text>
               <p>Number of duplicate gene pairs in different structure categories plotted against the instantaneous rate of <it>K</it><sub><it>S</it></sub>.</p>
            </text>
            <file name="gb-2009-10-1-r10-S6.eps">
               <p>Click here for file</p>
            </file>
         </suppl>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We thank Ross Hardison, Webb Miller, Francesca Chiaromonte, Laura Carrel, and Claude dePamphilis for valuable discussions. We are grateful to Melissa Wilson for comments on the manuscript. This work was supported by start-up funds from Penn State (to KDM).</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <aug>
               <au>
                  <snm>Ohno</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Evolution by Gene Duplication</source>
            <publisher>New York: Springer Verlag</publisher>
            <pubdate>1970</pubdate>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Duplication and divergence: the evolution of new genes and old ideas.</p>
            </title>
            <aug>
               <au>
                  <snm>Taylor</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Raes</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Annu Rev Genet</source>
            <pubdate>2004</pubdate>
            <volume>38</volume>
            <fpage>615</fpage>
            <lpage>643</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.genet.38.072902.092831</pubid>
                  <pubid idtype="pmpid" link="fulltext">15568988</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Selection and gene duplication: a view from the genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Wagner</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <fpage>reviews1012</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">139360</pubid>
                  <pubid idtype="pmpid" link="fulltext">12049669</pubid>
                  <pubid idtype="doi">10.1186/gb-2002-3-5-reviews1012</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Evolution by gene duplication: an update.</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Trends Ecol Evol</source>
            <pubdate>2003</pubdate>
            <volume>18</volume>
            <fpage>292</fpage>
            <lpage>298</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/S0169-5347(03)00033-8</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>The evolutionary fate and consequences of duplicate genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Lynch</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Conery</snm>
                  <fnm>JS</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2000</pubdate>
            <volume>290</volume>
            <fpage>1151</fpage>
            <lpage>1155</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.290.5494.1151</pubid>
                  <pubid idtype="pmpid" link="fulltext">11073452</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>The next generation of microarray research: applications in evolutionary and ecological genomics.</p>
            </title>
            <aug>
               <au>
                  <snm>Shiu</snm>
                  <fnm>SH</fnm>
               </au>
               <au>
                  <snm>Borevitz</snm>
                  <fnm>JO</fnm>
               </au>
            </aug>
            <source>Heredity</source>
            <pubdate>2008</pubdate>
            <volume>100</volume>
            <fpage>141</fpage>
            <lpage>149</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/sj.hdy.6800916</pubid>
                  <pubid idtype="pmpid" link="fulltext">17091126</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Asymmetric sequence divergence of duplicate genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Conant</snm>
                  <fnm>GC</fnm>
               </au>
               <au>
                  <snm>Wagner</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>2052</fpage>
            <lpage>2058</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">403682</pubid>
                  <pubid idtype="pmpid" link="fulltext">12952876</pubid>
                  <pubid idtype="doi">10.1101/gr.1252603</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Rapid evolution of expression and regulatory divergences after yeast gene duplication.</p>
            </title>
            <aug>
               <au>
                  <snm>Gu</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2005</pubdate>
            <volume>102</volume>
            <fpage>707</fpage>
            <lpage>712</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">545572</pubid>
                  <pubid idtype="pmpid" link="fulltext">15647348</pubid>
                  <pubid idtype="doi">10.1073/pnas.0409186102</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Rapid divergence in expression between duplicate genes inferred from microarray data.</p>
            </title>
            <aug>
               <au>
                  <snm>Gu</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Nicolae</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Lu</snm>
                  <fnm>HH</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>WH</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <fpage>609</fpage>
            <lpage>613</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(02)02837-8</pubid>
                  <pubid idtype="pmpid" link="fulltext">12446139</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Divergence in the spatial pattern of gene expression between human duplicate genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Makova</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>WH</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>1638</fpage>
            <lpage>1645</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">403737</pubid>
                  <pubid idtype="pmpid" link="fulltext">12840042</pubid>
                  <pubid idtype="doi">10.1101/gr.1133803</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Decoupled evolution of coding region and mRNA expression patterns after gene duplication: implications for the neutralist-selectionist debate.</p>
            </title>
            <aug>
               <au>
                  <snm>Wagner</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2000</pubdate>
            <volume>97</volume>
            <fpage>6579</fpage>
            <lpage>6584</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">18666</pubid>
                  <pubid idtype="pmpid" link="fulltext">10823904</pubid>
                  <pubid idtype="doi">10.1073/pnas.110147097</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Different evolutionary patterns between young duplicate genes in the human genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Gu</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>WH</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2003</pubdate>
            <volume>4</volume>
            <fpage>R56</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">193656</pubid>
                  <pubid idtype="pmpid" link="fulltext">12952535</pubid>
                  <pubid idtype="doi">10.1186/gb-2003-4-9-r56</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>The altered evolutionary trajectories of gene duplicates.</p>
            </title>
            <aug>
               <au>
                  <snm>Lynch</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Katju</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <fpage>544</fpage>
            <lpage>549</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tig.2004.09.001</pubid>
                  <pubid idtype="pmpid" link="fulltext">15475113</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Gene duplication: the genomic trade in spare parts.</p>
            </title>
            <aug>
               <au>
                  <snm>Hurles</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>PLoS Biol</source>
            <pubdate>2004</pubdate>
            <volume>2</volume>
            <fpage>E206</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">449868</pubid>
                  <pubid idtype="pmpid" link="fulltext">15252449</pubid>
                  <pubid idtype="doi">10.1371/journal.pbio.0020206</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>The structure and early evolution of recently arisen gene duplicates in the <it>Caenorhabditis elegans </it>genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Katju</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Lynch</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2003</pubdate>
            <volume>165</volume>
            <fpage>1793</fpage>
            <lpage>1803</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1462873</pubid>
                  <pubid idtype="pmpid" link="fulltext">14704166</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Origin of a substantial fraction of human regulatory sequences from transposable elements.</p>
            </title>
            <aug>
               <au>
                  <snm>Jordan</snm>
                  <fnm>IK</fnm>
               </au>
               <au>
                  <snm>Rogozin</snm>
                  <fnm>IB</fnm>
               </au>
               <au>
                  <snm>Glazko</snm>
                  <fnm>GV</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>68</fpage>
            <lpage>72</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(02)00006-9</pubid>
                  <pubid idtype="pmpid" link="fulltext">12547512</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Transposable elements as a significant source of transcription regulating signals.</p>
            </title>
            <aug>
               <au>
                  <snm>Thornburg</snm>
                  <fnm>BG</fnm>
               </au>
               <au>
                  <snm>Gotea</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Makalowski</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>2006</pubdate>
            <volume>365</volume>
            <fpage>104</fpage>
            <lpage>110</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.gene.2005.09.036</pubid>
                  <pubid idtype="pmpid" link="fulltext">16376497</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Mechanisms in tissue-specific regulation of estrogen biosynthesis in humans.</p>
            </title>
            <aug>
               <au>
                  <snm>Kamat</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Hinshelwood</snm>
                  <fnm>MM</fnm>
               </au>
               <au>
                  <snm>Murry</snm>
                  <fnm>BA</fnm>
               </au>
               <au>
                  <snm>Mendelson</snm>
                  <fnm>CR</fnm>
               </au>
            </aug>
            <source>Trends Endocrinol Metab</source>
            <pubdate>2002</pubdate>
            <volume>13</volume>
            <fpage>122</fpage>
            <lpage>128</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S1043-2760(02)00567-2</pubid>
                  <pubid idtype="pmpid" link="fulltext">11893526</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Transposable elements in mammals promote regulatory variation and diversification of genes with specialized functions.</p>
            </title>
            <aug>
               <au>
                  <snm>Lagemaat</snm>
                  <mnm>van de</mnm>
                  <fnm>LN</fnm>
               </au>
               <au>
                  <snm>Landry</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Mager</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Medstrand</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>530</fpage>
            <lpage>536</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tig.2003.08.004</pubid>
                  <pubid idtype="pmpid" link="fulltext">14550626</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Genome-wide analysis of mammalian promoter architecture and evolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Carninci</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Sandelin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lenhard</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Katayama</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Shimokawa</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Ponjavic</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Semple</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Engstrom</snm>
                  <fnm>PG</fnm>
               </au>
               <au>
                  <snm>Frith</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Forrest</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Alkema</snm>
                  <fnm>WB</fnm>
               </au>
               <au>
                  <snm>Tan</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Plessy</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Kodzius</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Ravasi</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Kasukawa</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Fukuda</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kanamori-Katayama</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kitazume</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Kawaji</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Kai</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Nakamura</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Konno</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Nakano</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Mottagui-Tabar</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Arner</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Chesi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gustincich</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Persichetti</snm>
                  <fnm>F</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2006</pubdate>
            <volume>38</volume>
            <fpage>626</fpage>
            <lpage>635</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng1789</pubid>
                  <pubid idtype="pmpid" link="fulltext">16645617</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>A high-resolution map of active promoters in the human genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Kim</snm>
                  <fnm>TH</fnm>
               </au>
               <au>
                  <snm>Barrera</snm>
                  <fnm>LO</fnm>
               </au>
               <au>
                  <snm>Zheng</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Qu</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Singer</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Richmond</snm>
                  <fnm>TA</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Ren</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2005</pubdate>
            <volume>436</volume>
            <fpage>876</fpage>
            <lpage>880</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1895599</pubid>
                  <pubid idtype="pmpid" link="fulltext">15988478</pubid>
                  <pubid idtype="doi">10.1038/nature03877</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Diversification of transcriptional modulation: large-scale identification and characterization of putative alternative promoters of human genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Kimura</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Wakamatsu</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Suzuki</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Ota</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Nishikawa</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Yamashita</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Yamamoto</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Sekine</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Tsuritani</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Wakaguri</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Ishii</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sugiyama</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Saito</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Isono</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Irie</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Kushida</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Yoneyama</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Otsuka</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Kanda</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Yokoi</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Kondo</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Wagatsuma</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Murakawa</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Ishida</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ishibashi</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Takahashi-Fujii</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Tanase</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Nagai</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Kikuchi</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Nakai</snm>
                  <fnm>K</fnm>
               </au>
               <etal/>
            </aug>
            <source>Genome Res</source>
            <pubdate>2006</pubdate>
            <volume>16</volume>
            <fpage>55</fpage>
            <lpage>65</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1356129</pubid>
                  <pubid idtype="pmpid" link="fulltext">16344560</pubid>
                  <pubid idtype="doi">10.1101/gr.4039406</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Distinct class of putative "non-conserved" promoters in humans: comparative studies of alternative promoters of human and mouse genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Tsuritani</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Irie</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Yamashita</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Sakakibara</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Wakaguri</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Kanai</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Mizushima-Sugano</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Sugano</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Nakai</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Suzuki</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2007</pubdate>
            <volume>17</volume>
            <fpage>1005</fpage>
            <lpage>1014</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1899111</pubid>
                  <pubid idtype="pmpid" link="fulltext">17567985</pubid>
                  <pubid idtype="doi">10.1101/gr.6030107</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Comprehensive analysis of transcriptional promoter structure and function in 1% of the human genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Cooper</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Trinklein</snm>
                  <fnm>ND</fnm>
               </au>
               <au>
                  <snm>Anton</snm>
                  <fnm>ED</fnm>
               </au>
               <au>
                  <snm>Nguyen</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>RM</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2006</pubdate>
            <volume>16</volume>
            <fpage>1</fpage>
            <lpage>10</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1356123</pubid>
                  <pubid idtype="pmpid" link="fulltext">16344566</pubid>
                  <pubid idtype="doi">10.1101/gr.4222606</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Complex controls: the role of alternative promoters in mammalian genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Landry</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Mager</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Wilhelm</snm>
                  <fnm>BT</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>640</fpage>
            <lpage>648</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tig.2003.09.014</pubid>
                  <pubid idtype="pmpid" link="fulltext">14585616</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Promoting transcriptome diversity.</p>
            </title>
            <aug>
               <au>
                  <snm>Strausberg</snm>
                  <fnm>RL</fnm>
               </au>
               <au>
                  <snm>Levy</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2007</pubdate>
            <volume>17</volume>
            <fpage>965</fpage>
            <lpage>968</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.6499807</pubid>
                  <pubid idtype="pmpid" link="fulltext">17606983</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Identification and functional analysis of human transcriptional promoters.</p>
            </title>
            <aug>
               <au>
                  <snm>Trinklein</snm>
                  <fnm>ND</fnm>
               </au>
               <au>
                  <snm>Aldred</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Saldanha</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>RM</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>308</fpage>
            <lpage>312</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">420378</pubid>
                  <pubid idtype="pmpid" link="fulltext">12566409</pubid>
                  <pubid idtype="doi">10.1101/gr.794803</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>A gene atlas of the mouse and human protein-encoding transcriptomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Su</snm>
                  <fnm>AI</fnm>
               </au>
               <au>
                  <snm>Wiltshire</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Batalov</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lapp</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Ching</snm>
                  <fnm>KA</fnm>
               </au>
               <au>
                  <snm>Block</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Soden</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hayakawa</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kreiman</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Cooke</snm>
                  <fnm>MP</fnm>
               </au>
               <au>
                  <snm>Walker</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Hogenesch</snm>
                  <fnm>JB</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2004</pubdate>
            <volume>101</volume>
            <fpage>6062</fpage>
            <lpage>6067</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">395923</pubid>
                  <pubid idtype="pmpid" link="fulltext">15075390</pubid>
                  <pubid idtype="doi">10.1073/pnas.0400782101</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Genome cluster database. A sequence family analysis platform for Arabidopsis and rice.</p>
            </title>
            <aug>
               <au>
                  <snm>Horan</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Lauricha</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Bailey-Serres</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Raikhel</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Girke</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Plant Physiol</source>
            <pubdate>2005</pubdate>
            <volume>138</volume>
            <fpage>47</fpage>
            <lpage>54</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1104159</pubid>
                  <pubid idtype="pmpid" link="fulltext">15888677</pubid>
                  <pubid idtype="doi">10.1104/pp.104.059048</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Organismal complexity, protein complexity, and gene duplicability.</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lusk</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>WH</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2003</pubdate>
            <volume>100</volume>
            <fpage>15661</fpage>
            <lpage>15665</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">307624</pubid>
                  <pubid idtype="pmpid" link="fulltext">14660792</pubid>
                  <pubid idtype="doi">10.1073/pnas.2536672100</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Gene identification signature (GIS) analysis for transcriptome characterization and genome annotation.</p>
            </title>
            <aug>
               <au>
                  <snm>Ng</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Wei</snm>
                  <fnm>CL</fnm>
               </au>
               <au>
                  <snm>Sung</snm>
                  <fnm>WK</fnm>
               </au>
               <au>
                  <snm>Chiu</snm>
                  <fnm>KP</fnm>
               </au>
               <au>
                  <snm>Lipovich</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Ang</snm>
                  <fnm>CC</fnm>
               </au>
               <au>
                  <snm>Gupta</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Shahab</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ridwan</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>CH</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>ET</fnm>
               </au>
               <au>
                  <snm>Ruan</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Nat Methods</source>
            <pubdate>2005</pubdate>
            <volume>2</volume>
            <fpage>105</fpage>
            <lpage>111</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nmeth733</pubid>
                  <pubid idtype="pmpid" link="fulltext">15782207</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Sequence comparison of human and mouse genes reveals a homologous block structure in the promoter regions.</p>
            </title>
            <aug>
               <au>
                  <snm>Suzuki</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Yamashita</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Shirota</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sakakibara</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Chiba</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Mizushima-Sugano</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Nakai</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Sugano</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <fpage>1711</fpage>
            <lpage>1718</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">515316</pubid>
                  <pubid idtype="pmpid" link="fulltext">15342556</pubid>
                  <pubid idtype="doi">10.1101/gr.2435604</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Genome-wide analysis of core promoter elements from conserved human and mouse orthologous pairs.</p>
            </title>
            <aug>
               <au>
                  <snm>Jin</snm>
                  <fnm>VX</fnm>
               </au>
               <au>
                  <snm>Singer</snm>
                  <fnm>GA</fnm>
               </au>
               <au>
                  <snm>Agosto-Perez</snm>
                  <fnm>FJ</fnm>
               </au>
               <au>
                  <snm>Liyanarachchi</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Davuluri</snm>
                  <fnm>RV</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>114</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1475891</pubid>
                  <pubid idtype="pmpid" link="fulltext">16522199</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-7-114</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Slow molecular clocks in Old World monkeys, apes, and humans.</p>
            </title>
            <aug>
               <au>
                  <snm>Yi</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ellsworth</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>WH</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2002</pubdate>
            <volume>19</volume>
            <fpage>2191</fpage>
            <lpage>2198</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12446810</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>The pattern of evolution of smaller-scale gene duplicates in mammalian genomes is more consistent with neo- than subfunctionalisation.</p>
            </title>
            <aug>
               <au>
                  <snm>Hughes</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Liberles</snm>
                  <fnm>DA</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>2007</pubdate>
            <volume>65</volume>
            <fpage>574</fpage>
            <lpage>588</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s00239-007-9041-9</pubid>
                  <pubid idtype="pmpid" link="fulltext">17957399</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Heterotachy in mammalian promoter evolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Taylor</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Kai</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Kawai</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Carninci</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Hayashizaki</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Semple</snm>
                  <fnm>CA</fnm>
               </au>
            </aug>
            <source>PLoS Genet</source>
            <pubdate>2006</pubdate>
            <volume>2</volume>
            <fpage>e30</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1449885</pubid>
                  <pubid idtype="pmpid" link="fulltext">16683025</pubid>
                  <pubid idtype="doi">10.1371/journal.pgen.0020030</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Highly conserved upstream sequences for transcription factor genes and implications for the regulatory network.</p>
            </title>
            <aug>
               <au>
                  <snm>Iwama</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Gojobori</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2004</pubdate>
            <volume>101</volume>
            <fpage>17156</fpage>
            <lpage>17161</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">534610</pubid>
                  <pubid idtype="pmpid" link="fulltext">15572454</pubid>
                  <pubid idtype="doi">10.1073/pnas.0407670101</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Hearing silence: non-neutral evolution at synonymous sites in mammals.</p>
            </title>
            <aug>
               <au>
                  <snm>Chamary</snm>
                  <fnm>JV</fnm>
               </au>
               <au>
                  <snm>Parmley</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Hurst</snm>
                  <fnm>LD</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>98</fpage>
            <lpage>108</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">16418745</pubid>
                  <pubid idtype="doi">10.1038/nrg1770</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Expression divergence between duplicate genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>WH</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Gu</snm>
                  <fnm>X</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>602</fpage>
            <lpage>607</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tig.2005.08.006</pubid>
                  <pubid idtype="pmpid" link="fulltext">16140417</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>A burst of protein sequence evolution and a prolonged period of asymmetric evolution follow gene duplication in yeast.</p>
            </title>
            <aug>
               <au>
                  <snm>Scannell</snm>
                  <fnm>DR</fnm>
               </au>
               <au>
                  <snm>Wolfe</snm>
                  <fnm>KH</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2008</pubdate>
            <volume>18</volume>
            <fpage>137</fpage>
            <lpage>147</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2134778</pubid>
                  <pubid idtype="pmpid" link="fulltext">18025270</pubid>
                  <pubid idtype="doi">10.1101/gr.6341207</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Preferential subfunctionalization of slow-evolving genes after allopolyploidization in <it>Xenopus laevis</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Semon</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Wolfe</snm>
                  <fnm>KH</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2008</pubdate>
            <volume>105</volume>
            <fpage>8333</fpage>
            <lpage>8338</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2448837</pubid>
                  <pubid idtype="pmpid" link="fulltext">18541921</pubid>
                  <pubid idtype="doi">10.1073/pnas.0708705105</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>External factors accelerate expression divergence between duplicate genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Ha</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>WH</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>ZJ</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2007</pubdate>
            <volume>23</volume>
            <fpage>162</fpage>
            <lpage>166</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2065749</pubid>
                  <pubid idtype="pmpid" link="fulltext">17320239</pubid>
                  <pubid idtype="doi">10.1016/j.tig.2007.02.005</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Splitting pairs: the diverging fates of duplicated genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Prince</snm>
                  <fnm>VE</fnm>
               </au>
               <au>
                  <snm>Pickett</snm>
                  <fnm>FB</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <fpage>827</fpage>
            <lpage>837</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">12415313</pubid>
                  <pubid idtype="doi">10.1038/nrg928</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>The probability of duplicate gene preservation by subfunctionalization.</p>
            </title>
            <aug>
               <au>
                  <snm>Lynch</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Force</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2000</pubdate>
            <volume>154</volume>
            <fpage>459</fpage>
            <lpage>473</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1460895</pubid>
                  <pubid idtype="pmpid" link="fulltext">10629003</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Role of positive selection in the retention of duplicate genes in mammalian genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Shiu</snm>
                  <fnm>SH</fnm>
               </au>
               <au>
                  <snm>Byrnes</snm>
                  <fnm>JK</fnm>
               </au>
               <au>
                  <snm>Pan</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>WH</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2006</pubdate>
            <volume>103</volume>
            <fpage>2232</fpage>
            <lpage>2236</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1413713</pubid>
                  <pubid idtype="pmpid" link="fulltext">16461903</pubid>
                  <pubid idtype="doi">10.1073/pnas.0510388103</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Evolution of cis-regulatory elements in duplicated genes of yeast.</p>
            </title>
            <aug>
               <au>
                  <snm>Papp</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Pal</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Hurst</snm>
                  <fnm>LD</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>417</fpage>
            <lpage>422</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(03)00174-4</pubid>
                  <pubid idtype="pmpid" link="fulltext">12902158</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>cis-Regulatory and protein evolution in orthologous and duplicate genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Castillo-Davis</snm>
                  <fnm>CI</fnm>
               </au>
               <au>
                  <snm>Hartl</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Achaz</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <fpage>1530</fpage>
            <lpage>1536</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">509261</pubid>
                  <pubid idtype="pmpid" link="fulltext">15256508</pubid>
                  <pubid idtype="doi">10.1101/gr.2662504</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>The role of cis-regulatory motifs and genetical control of expression in the divergence of yeast duplicate genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Leach</snm>
                  <fnm>LJ</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Lu</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Kearsey</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Luo</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2007</pubdate>
            <volume>24</volume>
            <fpage>2556</fpage>
            <lpage>2565</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msm188</pubid>
                  <pubid idtype="pmpid" link="fulltext">17846103</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Mammalian housekeeping genes evolve more slowly than tissue-specific genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>WH</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2004</pubdate>
            <volume>21</volume>
            <fpage>236</fpage>
            <lpage>239</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msh010</pubid>
                  <pubid idtype="pmpid" link="fulltext">14595094</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <title>
               <p>Gene expression intensity shapes evolutionary rates of the proteins encoded by the vertebrate genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Subramanian</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kumar</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2004</pubdate>
            <volume>168</volume>
            <fpage>373</fpage>
            <lpage>381</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1448110</pubid>
                  <pubid idtype="pmpid" link="fulltext">15454550</pubid>
                  <pubid idtype="doi">10.1534/genetics.104.028944</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <title>
               <p>Highly expressed genes in yeast evolve slowly.</p>
            </title>
            <aug>
               <au>
                  <snm>Pal</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Papp</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Hurst</snm>
                  <fnm>LD</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2001</pubdate>
            <volume>158</volume>
            <fpage>927</fpage>
            <lpage>931</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1461684</pubid>
                  <pubid idtype="pmpid" link="fulltext">11430355</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B52">
            <title>
               <p>Determinants of substitution rates in mammalian genes: expression pattern affects selection intensity but not mutation rate.</p>
            </title>
            <aug>
               <au>
                  <snm>Duret</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Mouchiroud</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2000</pubdate>
            <volume>17</volume>
            <fpage>68</fpage>
            <lpage>74</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10666707</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B53">
            <title>
               <p>A single determinant dominates the rate of yeast protein evolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Drummond</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Raval</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Wilke</snm>
                  <fnm>CO</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2006</pubdate>
            <volume>23</volume>
            <fpage>327</fpage>
            <lpage>337</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msj038</pubid>
                  <pubid idtype="pmpid" link="fulltext">16237209</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B54">
            <title>
               <p>An analysis of determinants of amino acids substitution rates in bacterial proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Rocha</snm>
                  <fnm>EP</fnm>
               </au>
               <au>
                  <snm>Danchin</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2004</pubdate>
            <volume>21</volume>
            <fpage>108</fpage>
            <lpage>116</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msh004</pubid>
                  <pubid idtype="pmpid" link="fulltext">14595100</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B55">
            <title>
               <p>Protein dispensability and rate of evolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Hirsh</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Fraser</snm>
                  <fnm>HB</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2001</pubdate>
            <volume>411</volume>
            <fpage>1046</fpage>
            <lpage>1049</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35082561</pubid>
                  <pubid idtype="pmpid" link="fulltext">11429604</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B56">
            <title>
               <p>Significant impact of protein dispensability on the instantaneous rate of protein evolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>He</snm>
                  <fnm>X</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2005</pubdate>
            <volume>22</volume>
            <fpage>1147</fpage>
            <lpage>1155</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msi101</pubid>
                  <pubid idtype="pmpid" link="fulltext">15689524</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B57">
            <title>
               <p>Evolutionary rate in the protein interaction network.</p>
            </title>
            <aug>
               <au>
                  <snm>Fraser</snm>
                  <fnm>HB</fnm>
               </au>
               <au>
                  <snm>Hirsh</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Steinmetz</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Scharfe</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Feldman</snm>
                  <fnm>MW</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2002</pubdate>
            <volume>296</volume>
            <fpage>750</fpage>
            <lpage>752</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1068696</pubid>
                  <pubid idtype="pmpid" link="fulltext">11976460</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B58">
            <title>
               <p>Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Drummond</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Wilke</snm>
                  <fnm>CO</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2008</pubdate>
            <volume>134</volume>
            <fpage>341</fpage>
            <lpage>352</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.cell.2008.05.042</pubid>
                  <pubid idtype="pmpid" link="fulltext">18662548</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B59">
            <title>
               <p>Functional genomic analysis of the rates of protein evolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Wall</snm>
                  <fnm>DP</fnm>
               </au>
               <au>
                  <snm>Hirsh</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Fraser</snm>
                  <fnm>HB</fnm>
               </au>
               <au>
                  <snm>Kumm</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Giaever</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Feldman</snm>
                  <fnm>MW</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2005</pubdate>
            <volume>102</volume>
            <fpage>5483</fpage>
            <lpage>5488</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">555735</pubid>
                  <pubid idtype="pmpid" link="fulltext">15800036</pubid>
                  <pubid idtype="doi">10.1073/pnas.0501761102</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B60">
            <title>
               <p>Not born equal: increased rate asymmetry in relocated and retrotransposed rodent gene duplicates.</p>
            </title>
            <aug>
               <au>
                  <snm>Cusack</snm>
                  <fnm>BP</fnm>
               </au>
               <au>
                  <snm>Wolfe</snm>
                  <fnm>KH</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2007</pubdate>
            <volume>24</volume>
            <fpage>679</fpage>
            <lpage>686</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msl199</pubid>
                  <pubid idtype="pmpid" link="fulltext">17179139</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B61">
            <title>
               <p>Fast evolution of core promoters in primate genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Liang</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>YS</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>WH</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2008</pubdate>
            <volume>25</volume>
            <fpage>1239</fpage>
            <lpage>1244</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msn072</pubid>
                  <pubid idtype="pmpid" link="fulltext">18367463</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B62">
            <title>
               <p>Phylogenetic reconstruction of ancestral character states for gene expression and mRNA splicing data.</p>
            </title>
            <aug>
               <au>
                  <snm>Rossnes</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Eidhammer</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Liberles</snm>
                  <fnm>DA</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>127</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1166541</pubid>
                  <pubid idtype="pmpid" link="fulltext">15921519</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-6-127</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B63">
            <title>
               <p>Extent of gene duplication in the genomes of Drosophila, nematode, and yeast.</p>
            </title>
            <aug>
               <au>
                  <snm>Gu</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Cavalcanti</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>FC</fnm>
               </au>
               <au>
                  <snm>Bouman</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>WH</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2002</pubdate>
            <volume>19</volume>
            <fpage>256</fpage>
            <lpage>262</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11861885</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B64">
            <title>
               <p>An efficient algorithm for large-scale detection of protein families.</p>
            </title>
            <aug>
               <au>
                  <snm>Enright</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Van Dongen</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ouzounis</snm>
                  <fnm>CA</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>1575</fpage>
            <lpage>1584</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">101833</pubid>
                  <pubid idtype="pmpid" link="fulltext">11917018</pubid>
                  <pubid idtype="doi">10.1093/nar/30.7.1575</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B65">
            <title>
               <p>Improved tools for biological sequence comparison.</p>
            </title>
            <aug>
               <au>
                  <snm>Pearson</snm>
                  <fnm>WR</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1988</pubdate>
            <volume>85</volume>
            <fpage>2444</fpage>
            <lpage>2448</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">280013</pubid>
                  <pubid idtype="pmpid" link="fulltext">3162770</pubid>
                  <pubid idtype="doi">10.1073/pnas.85.8.2444</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B66">
            <title>
               <p>Twilight zone of protein sequence alignments.</p>
            </title>
            <aug>
               <au>
                  <snm>Rost</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Protein Eng</source>
            <pubdate>1999</pubdate>
            <volume>12</volume>
            <fpage>85</fpage>
            <lpage>94</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/protein/12.2.85</pubid>
                  <pubid idtype="pmpid" link="fulltext">10195279</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B67">
            <title>
               <p>CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.</p>
            </title>
            <aug>
               <au>
                  <snm>Thompson</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Higgins</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Gibson</snm>
                  <fnm>TJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1994</pubdate>
            <volume>22</volume>
            <fpage>4673</fpage>
            <lpage>4680</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">308517</pubid>
                  <pubid idtype="pmpid" link="fulltext">7984417</pubid>
                  <pubid idtype="doi">10.1093/nar/22.22.4673</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B68">
            <title>
               <p>Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models.</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Nielsen</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2000</pubdate>
            <volume>17</volume>
            <fpage>32</fpage>
            <lpage>43</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10666704</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B69">
            <title>
               <p>PAML: a program package for phylogenetic analysis by maximum likelihood.</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>Comput Appl Biosci</source>
            <pubdate>1997</pubdate>
            <volume>13</volume>
            <fpage>555</fpage>
            <lpage>556</lpage>
            <xrefbib>
               <pubid idtype="pmpid">9367129</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B70">
            <title>
               <p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Madden</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Schaffer</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <fpage>3389</fpage>
            <lpage>3402</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">146917</pubid>
                  <pubid idtype="pmpid" link="fulltext">9254694</pubid>
                  <pubid idtype="doi">10.1093/nar/25.17.3389</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B71">
            <title>
               <p>Rapid and asymmetric divergence of duplicate genes in the human gene coexpression network.</p>
            </title>
            <aug>
               <au>
                  <snm>Chung</snm>
                  <fnm>WY</fnm>
               </au>
               <au>
                  <snm>Albert</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Albert</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Nekrutenko</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Makova</snm>
                  <fnm>KD</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>46</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1403810</pubid>
                  <pubid idtype="pmpid" link="fulltext">16441884</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-7-46</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B72">
            <title>
               <p>Congruence of tissue expression profiles from Gene Expression Atlas, SAGEmap and TissueInfo databases.</p>
            </title>
            <aug>
               <au>
                  <snm>Huminiecki</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Lloyd</snm>
                  <fnm>AT</fnm>
               </au>
               <au>
                  <snm>Wolfe</snm>
                  <fnm>KH</fnm>
               </au>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2003</pubdate>
            <volume>4</volume>
            <fpage>31</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">183867</pubid>
                  <pubid idtype="pmpid" link="fulltext">12885301</pubid>
                  <pubid idtype="doi">10.1186/1471-2164-4-31</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B73">
            <title>
               <p>Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project.</p>
            </title>
            <aug>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Stamatoyannopoulos</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Dutta</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Guigo</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Gingeras</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Margulies</snm>
                  <fnm>EH</fnm>
               </au>
               <au>
                  <snm>Weng</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Snyder</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Dermitzakis</snm>
                  <fnm>ET</fnm>
               </au>
               <au>
                  <snm>Thurman</snm>
                  <fnm>RE</fnm>
               </au>
               <au>
                  <snm>Kuehn</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Neph</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Koch</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Asthana</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Malhotra</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Adzhubei</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Greenbaum</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Andrews</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Flicek</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Boyle</snm>
                  <fnm>PJ</fnm>
               </au>
               <au>
                  <snm>Cao</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Carter</snm>
                  <fnm>NP</fnm>
               </au>
               <au>
                  <snm>Clelland</snm>
                  <fnm>GK</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Day</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Dhami</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Dillon</snm>
                  <fnm>SC</fnm>
               </au>
               <au>
                  <snm>Dorschner</snm>
                  <fnm>MO</fnm>
               </au>
               <au>
                  <snm>Fiegler</snm>
                  <fnm>H</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2007</pubdate>
            <volume>447</volume>
            <fpage>799</fpage>
            <lpage>816</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2212820</pubid>
                  <pubid idtype="pmpid">17571346</pubid>
                  <pubid idtype="doi">10.1038/nature05874</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B74">
            <title>
               <p>NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Pruitt</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Tatusova</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Maglott</snm>
                  <fnm>DR</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2007</pubdate>
            <volume>35</volume>
            <fpage>D61</fpage>
            <lpage>65</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1716718</pubid>
                  <pubid idtype="pmpid" link="fulltext">17130148</pubid>
                  <pubid idtype="doi">10.1093/nar/gkl842</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B75">
            <title>
               <p>Integrative annotation of 21,037 human genes validated by full-length cDNA clones.</p>
            </title>
            <aug>
               <au>
                  <snm>Imanishi</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Itoh</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Suzuki</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>O'Donovan</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Fukuchi</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Koyanagi</snm>
                  <fnm>KO</fnm>
               </au>
               <au>
                  <snm>Barrero</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Tamura</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Yamaguchi-Kabata</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Tanino</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Yura</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Miyazaki</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ikeo</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Homma</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Kasprzyk</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Nishikawa</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Hirakawa</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Thierry-Mieg</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Thierry-Mieg</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Ashurst</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Jia</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Nakao</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Thomas</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Mulder</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Karavidopoulou</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Jin</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Yasuda</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Lenhard</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Eveno</snm>
                  <fnm>E</fnm>
               </au>
               <etal/>
            </aug>
            <source>PLoS Biol</source>
            <pubdate>2004</pubdate>
            <volume>2</volume>
            <fpage>e162</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">393292</pubid>
                  <pubid idtype="pmpid" link="fulltext">15103394</pubid>
                  <pubid idtype="doi">10.1371/journal.pbio.0020162</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B76">
            <title>
               <p>GenBank: update.</p>
            </title>
            <aug>
               <au>
                  <snm>Benson</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Karsch-Mizrachi</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Ostell</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wheeler</snm>
                  <fnm>DL</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>D23</fpage>
            <lpage>26</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">308779</pubid>
                  <pubid idtype="pmpid" link="fulltext">14681350</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh045</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B77">
            <title>
               <p>The UCSC Genome Browser Database: 2008 update.</p>
            </title>
            <aug>
               <au>
                  <snm>Karolchik</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Kuhn</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Baertsch</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Barber</snm>
                  <fnm>GP</fnm>
               </au>
               <au>
                  <snm>Clawson</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Diekhans</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Giardine</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Harte</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Hinrichs</snm>
                  <fnm>AS</fnm>
               </au>
               <au>
                  <snm>Hsu</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Kober</snm>
                  <fnm>KM</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Pedersen</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Pohl</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Raney</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Rhead</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Rosenbloom</snm>
                  <fnm>KR</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>KE</fnm>
               </au>
               <au>
                  <snm>Stanke</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Thakkapallayil</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Trumbower</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Zweig</snm>
                  <fnm>AS</fnm>
               </au>
               <au>
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2008</pubdate>
            <volume>36</volume>
            <fpage>D773</fpage>
            <lpage>779</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2238835</pubid>
                  <pubid idtype="pmpid" link="fulltext">18086701</pubid>
                  <pubid idtype="doi">10.1093/nar/gkm966</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B78">
            <title>
               <p>AVID: A global alignment program.</p>
            </title>
            <aug>
               <au>
                  <snm>Bray</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Dubchak</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Pachter</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>97</fpage>
            <lpage>102</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">430967</pubid>
                  <pubid idtype="pmpid" link="fulltext">12529311</pubid>
                  <pubid idtype="doi">10.1101/gr.789803</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B79">
            <title>
               <p>Repbase update: a database and an electronic journal of repetitive elements.</p>
            </title>
            <aug>
               <au>
                  <snm>Jurka</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <fpage>418</fpage>
            <lpage>420</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(00)02093-X</pubid>
                  <pubid idtype="pmpid" link="fulltext">10973072</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B80">
            <title>
               <p>A macaque's-eye view of human insertions and deletions: differences in mechanisms.</p>
            </title>
            <aug>
               <au>
                  <snm>Kvikstad</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Tyekucheva</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Chiaromonte</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Makova</snm>
                  <fnm>KD</fnm>
               </au>
            </aug>
            <source>PLoS Comput Biol</source>
            <pubdate>2007</pubdate>
            <volume>3</volume>
            <fpage>1772</fpage>
            <lpage>1782</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1976337</pubid>
                  <pubid idtype="pmpid" link="fulltext">17941704</pubid>
                  <pubid idtype="doi">10.1371/journal.pcbi.0030176</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B81">
            <title>
               <p>The genome-wide determinants of human and chimpanzee microsatellite evolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Kelkar</snm>
                  <fnm>YD</fnm>
               </au>
               <au>
                  <snm>Tyekucheva</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Chiaromonte</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Makova</snm>
                  <fnm>KD</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2008</pubdate>
            <volume>18</volume>
            <fpage>30</fpage>
            <lpage>38</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2134767</pubid>
                  <pubid idtype="pmpid" link="fulltext">18032720</pubid>
                  <pubid idtype="doi">10.1101/gr.7113408</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B82">
            <aug>
               <au>
                  <snm>Kutner</snm>
                  <fnm>MH</fnm>
               </au>
               <au>
                  <snm>Nachtsheim</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Neter</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Applied Linear Statistical Models</source>
            <publisher>New York: McGraw-Hill</publisher>
            <pubdate>2005</pubdate>
         </bibl>
      </refgrp>
   </bm>
</art>

