<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2001-2-12-research0051</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p><it>Quod erat demonstrandum?</it> The mystery of experimental validation of apparently erroneous computational analyses of protein sequences</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Iyer</snm>
               <mi>M</mi>
               <fnm>Lakshminarayan</fnm>
               <insr iid="I1"/>
            </au>
            <au id="A2">
               <snm>Aravind</snm>
               <fnm>L</fnm>
               <insr iid="I1"/>
            </au>
            <au id="A3">
               <snm>Bork</snm>
               <fnm>Peer</fnm>
               <insr iid="I2"/>
            </au>
            <au id="A4">
               <snm>Hofmann</snm>
               <fnm>Kay</fnm>
               <insr iid="I3"/>
            </au>
            <au id="A5">
               <snm>Mushegian</snm>
               <mi>R</mi>
               <fnm>Arcady</fnm>
               <insr iid="I4"/>
            </au>
            <au id="A6">
               <snm>Zhulin</snm>
               <mi>B</mi>
               <fnm>Igor</fnm>
               <insr iid="I5"/>
            </au>
            <au id="A7" ca="yes">
               <snm>Koonin</snm>
               <mi>V</mi>
               <fnm>Eugene</fnm>
               <insr iid="I1"/>
               <email>koonin@ncbi.nlm.nih.gov</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA</p>
            </ins>
            <ins id="I2">
               <p>EMBL, Biocomputing, Meyerhofstrasse 1, 69117 Heidelberg, Germany</p>
            </ins>
            <ins id="I3">
               <p>MEMOREC Stoffel GmbH, K&#246;ln D-50829, Germany</p>
            </ins>
            <ins id="I4">
               <p>Stowers Institute for Medical Research, 1000 E 50th Street, Kansas City, MO 64410, USA</p>
            </ins>
            <ins id="I5">
               <p>School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2001</pubdate>
         <volume>2</volume>
         <issue>12</issue>
         <fpage>research0051.1</fpage>
         <lpage>research0051.11</lpage>
         <url>http://genomebiology.com/2001/2/12/research/0051</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="doi">10.1186/gb-2001-2-12-research0051</pubid>
               <pubid idtype="pmpid">11790254</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>3</day>
               <month>7</month>
               <year>2001</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>7</day>
               <month>9</month>
               <year>2001</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>4</day>
               <month>10</month>
               <year>2001</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>13</day>
               <month>11</month>
               <year>2001</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2001</year>
         <collab>Iyer et al, licensee BioMed Central Ltd</collab>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Computational predictions are critical for directing the experimental study of protein functions. Therefore it is paradoxical when an apparently erroneous computational prediction seems to be supported by experiment.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We analyzed six cases where application of novel or conventional computational methods for protein sequence and structure analysis led to non-trivial predictions that were subsequently supported by direct experiments. We show that, on all six occasions, the original prediction was unjustified, and in at least three cases, an alternative, well-supported computational prediction, incompatible with the original one, could be derived. The most unusual cases involved the identification of an archaeal cysteinyl-tRNA synthetase, a dihydropteroate synthase and a thymidylate synthase, for which experimental verifications of apparently erroneous computational predictions were reported. Using sequence-profile analysis, multiple alignment and secondary-structure prediction, we have identified the unique archaeal 'cysteinyl-tRNA synthetase' as a homolog of extracellular polygalactosaminidases, and the 'dihydropteroate synthase' as a member of the &#946;-lactamase-like superfamily of metal-dependent hydrolases.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusions</p>
               </st>
               <p>In each of the analyzed cases, the original computational predictions could be refuted and, in some instances, alternative strongly supported predictions were obtained. The nature of the experimental evidence that appears to support these predictions remains an open question. Some of these experiments might signify discovery of extremely unusual forms of the respective enzymes, whereas the results of others could be due to artifacts.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010008">Evolution</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010002">Bioinformatics</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>The availability of a large number of protein sequences, including complete protein sets encoded in diverse genomes, and the rapidly growing database of protein structures have already greatly impacted on our understanding of the evolution of protein structure and function [<abbr bid="B1">1</abbr>,<abbr bid="B2">2</abbr>]. This process has been aided by the development of powerful algorithms and sensitive computational tools for detecting sequence and structural similarities between proteins. In particular, methods that extract information from multiple alignments to construct various types of sequence profiles and use the resulting sequence profiles for iterative database searching, such as PSI-BLAST and Hidden-Markov-Model (HMM)-based approaches, have substantially improved the detection of subtle similarities between proteins that previously were amenable only to direct structural comparison [<abbr bid="B3">3</abbr>,<abbr bid="B4">4</abbr>]. The sensitivity and accuracy of these methods have been extensively tested and statistical approaches for validating the observed similarities are available [<abbr bid="B5">5</abbr>,<abbr bid="B6">6</abbr>,<abbr bid="B7">7</abbr>,<abbr bid="B8">8</abbr>,<abbr bid="B9">9</abbr>,<abbr bid="B10">10</abbr>,<abbr bid="B11">11</abbr>].</p>
         <p>Despite these achievements, detection and interpretation of relationships between homologous proteins that have limited sequence similarity remains a major challenge. Such studies typically require a case-by-case approach that is guided by a detailed understanding of protein sequence-structure patterns and is rooted in the biology of the proteins analyzed. Prediction of structures and function(s) of uncharacterized proteins is one of the principal outcomes of these analyses, and experimental verification of such predictions tends to increase confidence in the validity of sequence-structure comparative approaches. The negative feedback from experiments that failed to confirm a computational prediction is potentially even more important, because it could result in revision and refinement of the computational methods.</p>
         <p>When examining cases of reported prediction followed by experimental validation, however, we encountered several paradoxical situations. In each of these, a prediction that has been reportedly confirmed by experiment was incompatible with results obtained with several standard computational procedures. More importantly, alternative predictions, supported by statistically significant sequence and/or structural similarity, were made in some of these cases. Here we present several such mysteries, describe the refutation of the original predictions and the new predictions, wherever feasible, and discuss the discrepancy between the computational and experimental results. The choice of the cases was not systematic; rather, those chosen were notable because they relied on novel computational techniques, exploited particularly subtle sequence or structural motifs, and dealt with crucial biological problems.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>MJ1477: a predicted archaeal cysteinyl-tRNA synthetase</p>
            </st>
            <p>Aminoacyl-tRNA synthetases (aaRSs) specific for 17 of the 20 amino acids are universally present in cellular life forms. The three exceptions are GlnRS, AsnRS and CysRS. GlnRS and AsnRS are missing in many bacteria and archaea because glutamine and asparagine are incorporated into proteins through transamidation of glutamate and aspartate, respectively. CysRS is missing in two archaeal methanogens whose genomes have been sequenced: <it>Methanobacterium thermoautotrophicum</it> and <it>Methanococcus jannaschii</it> [<abbr bid="B12">12</abbr>]. No alternative mechanism for cysteine incorporation into proteins is known; hence the absence of CysRS in these organisms was an enigma.</p>
            <p>Two solutions to this puzzle, both unusual, have recently been proposed and experimentally validated. One involves non-orthologous gene displacement, a situation in which the same essential function is carried out by distantly related or even unrelated proteins in different organisms [<abbr bid="B13">13</abbr>,<abbr bid="B14">14</abbr>]. It has been shown that <it>M. jannaschii</it> ProRS, a class II synthetase that is unrelated to the class I CysRS, substituted for the missing CysRS activity [<abbr bid="B15">15</abbr>,<abbr bid="B16">16</abbr>,<abbr bid="B17">17</abbr>]. The other solution involved a new candidate for the role of CysRS, the MJ1477 protein from <it>M. jannaschii.</it> This protein and its orthologs (direct evolutionary counterparts related by vertical descent from a common ancestor) from the bacteria <it>Thermotoga maritima</it> and <it>Deinococcus radiodurans</it> were identified as 'distant orthologs' of the <it>Bacillus subtilis</it> CysRS by using a computational method specifically designed to detect distantly related orthologs [<abbr bid="B18">18</abbr>]. The method is based on application of discriminant analysis to alignment scores, in order to separate the scores for pairs of functionally identical proteins from different genomes from the scores for proteins with different functions. This prediction was then validated experimentally by showing that MJ1477 had CysRS activity <it>in vitro</it> and that an ortholog of MJ1477 from <it>D. radiodurans,</it> DR0705, complemented a CysRS-deficient, temperature-sensitive, lethal <it>E. coli</it> mutant strain [<abbr bid="B18">18</abbr>]. An important corollary of these surprising findings is a rapid divergence of the MJ1477 family from CysRS, such that all the catalytic and otherwise functionally important residues characteristic of this enzyme, and also present in other class I aaRSs, have changed. 
Furthermore, MJ1477 and its orthologs do not have the accessory domains found in all known CysRS, namely the DALR domain (named after a distinct amino-acid signature), which is shared by aaRSs of several specificities, and another domain specific to CysRS [<abbr bid="B19">19</abbr>].</p>
            <p>We examined the protein sequences of MJ1477 and its homologs using more traditional computational techniques. Almost all these proteins contain amino-terminal signal peptides readily identifiable by using the SignalP program [<abbr bid="B20">20</abbr>], but do not contain any predicted transmembrane segments, and, accordingly, are predicted to be secreted from the cells (Figure <figr fid="F1">1</figr>). Furthermore, iterative database searches using the PSI-BLAST program [<abbr bid="B9">9</abbr>] showed statistically significant sequence similarity between these proteins and an experimentally characterized endo &#945;-1,4-polygalactosaminidase from <it>Pseudomonas</it> species [<abbr bid="B21">21</abbr>]. For example, in a search initiated with the sequence of MJ1477 and a profile inclusion cut-off of 0.01, the polygalactosaminidase sequence was retrieved from the database in the second iteration, followed by other bacterial proteins predicted to possess the same activity. This protein family has several conserved motifs, including a characteristic Dxhp signature (h, hydrophobic residue; p, polar residue), in which the conserved aspartate is likely to directly participate in catalysis (Figure <figr fid="F1">1</figr>). The hybrid-fold-recognition method, which combines sequence-profile analysis with alignment-based secondary-structure prediction [<abbr bid="B22">22</abbr>], and the 3D-PSSM method [<abbr bid="B23">23</abbr>] both suggested a likely &#945;-amylase-like triosephosphate isomerase (TIM) barrel structure for this protein family. 
Thus, although the identification of MJ1477 as a secreted polygalactosaminidase or a related polysaccharide hydrolase with a different specificity awaits experimental verification, it shows all the signs of a correct computational prediction: statistically significant similarity between the analyzed protein and an experimentally characterized enzyme; conservation of distinct motifs implicated in catalysis; potential presence of a structural fold compatible with the experimentally demonstrated enzymatic activity; and confident prediction of the extracellular localization that is, again, compatible with a polysaccharide hydrolase activity involved in environmental carbohydrate utilization or capsular metabolism. None of this evidence is offered by the analysis that led to the CysRS prediction for MJ1477.</p>
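            <p>The Dxhp signature mentioned above can be written as a simple pattern, sketched here in Python. This is an illustration only, not the procedure used in the analysis: the residue classes are taken from the legend to Figure 1, whereas the function name find_dxhp and the toy sequence are hypothetical. A four-residue pattern of this kind matches by chance in many proteins, so a hit is meaningful only in the context of an alignment such as that in Figure 1.</p>

```python
import re

# Residue classes as listed in the legend to Figure 1.
HYDROPHOBIC = "LIYFMWACV"   # h
POLAR = "STEDRKHNQ"         # p

# D-x-h-p: aspartate, any residue, a hydrophobic residue, a polar residue.
# The lookahead lets overlapping occurrences be reported as well.
DXHP = re.compile(r"(?=(D[A-Z][%s][%s]))" % (HYDROPHOBIC, POLAR))


def find_dxhp(sequence):
    """Return (1-based position, matched tetrapeptide) for each Dxhp hit."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in DXHP.finditer(seq)]


# Hypothetical toy sequence, not taken from any real protein.
toy = "MKDGLSAADDVTW"
print(find_dxhp(toy))  # [(3, 'DGLS'), (9, 'DDVT')]
```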
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Multiple alignment of the polygalactosaminidase family that includes MJ1477, the alleged archaeal CysRS.</p>
               </caption>
               <text>
                  <p>Multiple alignment of the polygalactosaminidase family that includes MJ1477, the alleged archaeal CysRS. Proteins are denoted by their gene name, followed by their species abbreviations and GenBank identifier (GI) numbers. The coloring reflects the 100% consensus. The consensus abbreviations and coloring scheme used in this and subsequent figures are as follows. Hydrophobic residues (h; LIYFMWACV) and aliphatic (l; LIAV) residues are shaded yellow. Colored magenta are alcohol (o; ST), charged (c; KERDH), basic (+; KRH), acidic (-; DE), and polar (p; STEDRKHNQ) residues. Small (s; SAGDNPVT) residues are colored green and big (b; LIFMWYERKQ) residues are shaded gray. The hydrophobic residues of the signal peptide are highlighted in yellow. In the Secondary Structure line, H indicates a helix and E indicates extended conformation (&#946; strand). Aqa, <it>Aquifex aeolicus</it>; Dr, <it>Deinococcus radiodurans</it>; Mj, <it>Methanococcus jannaschii</it>; Pa, <it>Pseudomonas aeruginosa</it>; Ps, <it>Pseudomonas</it> species; Scoe, <it>Streptomyces coelicolor</it>; Strgi, <it>Streptomyces griseus</it>; Tm, <it>Thermotoga maritima.</it></p>
               </text>
               <graphic file="gb-2001-2-12-research0051-1"/>
            </fig>
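            <p>The consensus classes listed in the legend to Figure 1 can be expressed as a small lookup table, sketched here in Python. This is not the program used to color the figures: the class memberships come from the legend, whereas the function name column_classes, the threshold parameter and the treatment of gap characters as non-members are assumptions made for illustration. The function reports every class that covers a column at the chosen threshold and does not attempt to pick a single consensus symbol.</p>

```python
# Residue classes exactly as given in the legend to Figure 1.
CLASSES = {
    "h": "LIYFMWACV",   # hydrophobic
    "l": "LIAV",        # aliphatic
    "o": "ST",          # alcohol
    "c": "KERDH",       # charged
    "+": "KRH",         # basic
    "-": "DE",          # acidic
    "p": "STEDRKHNQ",   # polar
    "s": "SAGDNPVT",    # small
    "b": "LIFMWYERKQ",  # big
}


def column_classes(column, threshold=1.0):
    """List the class symbols that cover at least `threshold` of a column.

    `column` is a string holding one alignment column (one character per
    sequence). Gap characters belong to no class (an assumption), so they
    count against every class.
    """
    total = len(column)
    found = []
    for symbol, members in CLASSES.items():
        covered = sum(1 for residue in column if residue in members)
        if total and covered / total >= threshold:
            found.append(symbol)
    return found


print(column_classes("LIVA"))             # 100% consensus
print(column_classes("KKKKKKKKKD", 0.9))  # 90% consensus
```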
            <p>Therefore we are forced to conclude that MJ1477 and its homologs are not related to CysRS and there is nothing in the computational analysis of these proteins that would point to an aaRS activity. In contrast, we predict these proteins to be extracellular polygalactosaminidases or similar polysaccharide hydrolases. The polysaccharide hydrolase and aaRS functions seem to be essentially incompatible. First, a secreted enzyme is unlikely to function as an aaRS whose site of action is, by definition, intracellular. Second, even if an entirely new class of aaRSs is postulated, the reaction catalyzed by this new aaRS does not resemble polysaccharide hydrolysis or its reversal. Aminoacyl-tRNA synthetases catalyze a succession of reactions, which involve: hydrolysis of the &#945;-&#946; phosphate bond in ATP; condensation of AMP with the cognate amino acid, resulting in the formation of an aminoacyl-adenylate; displacement of the AMP moiety of the aminoacyl-adenylate with the cognate tRNA, producing aminoacyl-tRNA. Even if the two condensation reactions, in very general terms, could be considered a reversal of the polysaccharide hydrolysis reaction, there is no indication that polysaccharide hydrolases could bind and hydrolyze ATP, and the multiple alignment of the MJ1477 family did not include any conserved signatures typical of potential phosphate-binding loops (Figure <figr fid="F1">1</figr>). Neither does this family contain any recognizable RNA-binding domains. Finally, <it>M. thermoautotrophicum</it> does not encode any homologs of MJ1477, ruling out the possibility that this family encompasses CysRS of both archaeal methanogens. Taken together, these observations appear to effectively refute the prediction of a CysRS activity, thus pitting computational results against experimental data.</p>
         </sec>
         <sec>
            <st>
               <p>MJ0301: a predicted dihydropteroate synthase</p>
            </st>
            <p>Dihydropteroate synthase (DHPS) catalyzes the condensation of <it>p</it>-aminobenzoic acid with 7,8-dihydro-6-hydroxymethylpterin pyrophosphate to give 7,8-dihydropteroate, an intermediate in folate metabolism. The protein from <it>Staphylococcus,</it> a Gram-positive bacterium, has been crystallized and shown to adopt a TIM-barrel structure [<abbr bid="B24">24</abbr>]. Although it has been indicated that no DHPS could be detected in archaeal genomes [<abbr bid="B25">25</abbr>], orthologs of bacterial DHPS are readily identifiable in all archaea; this enzyme is missing only in animals and in several intracellular bacterial pathogens, such as <it>Rickettsia prowazekii,</it> spirochetes and mycoplasmas (COG0294 in the database of Clusters of Orthologous Groups of proteins (COGs)) [<abbr bid="B26">26</abbr>]. Most archaea have a distinct version of DHPS that shows relatively low sequence similarity to the bacterial orthologs and contains an additional uncharacterized carboxy-terminal domain. This previously undetected domain is also present in some other enzymes of pterin biosynthesis, such as tetrahydromethanopterin-S-methyltransferase from <it>Streptomyces</it> (L.M.I., L.A. and E.V.K., unpublished observation). Some archaeal species, including <it>Thermoplasma</it> and <it>Halobacterium,</it> have the bacterial-type DHPS, which was probably acquired by horizontal gene transfer and displaced the original archaeal version. Despite the relatively low sequence similarity to bacterial DHPS, all archaeal orthologs have the conserved catalytic residues identified in DHPS (Figure <figr fid="F2">2</figr>) and are confidently predicted, by the hybrid-fold-recognition method, to assume the same fold as DHPS from <it>Pneumocystis carinii</it> and <it>Staphylococcus aureus</it> whose crystal structures have been determined.</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Multiple alignment of predicted archaeal dihydropteroate synthases.</p>
               </caption>
               <text>
                  <p>Multiple alignment of predicted archaeal dihydropteroate synthases. The scheme for displaying multiple alignments is as described in the legend to Figure <figr fid="F1">1</figr>. The consensus secondary structure was derived from the crystal structures of the <it>Staphylococcus aureus</it>, <it>Mycobacterium tuberculosis</it> and <it>Escherichia coli</it> DHPS (Protein Data Bank ID: 1AD1, 1EYE, 1AJ0). Residues are colored at 90% consensus. Af, <it>Archaeoglobus fulgidus</it>; Ape, <it>Aeropyrum pernix</it>; At, <it>Arabidopsis thaliana</it>; Ec, <it>Escherichia coli</it>; Mj, <it>Methanococcus jannaschii</it>; Mt, <it>Mycobacterium tuberculosis</it>; Mth, <it>Methanobacterium thermoautotrophicum</it>; Sa, <it>Staphylococcus aureus</it>; Sc, <it>Saccharomyces cerevisiae</it>; Pab, <it>Pyrococcus abyssi</it>.</p>
               </text>
               <graphic file="gb-2001-2-12-research0051-2"/>
            </fig>
            <p>An analysis using ORF, a program developed to recognize folds by comparing predicted secondary structures of proteins ([<abbr bid="B27">27</abbr>]; we are unaware of a published detailed description of this method), identified MJ0301 as a homolog of DHPS, although, given the low sequence similarity, a convergent origin of the relationship between MJ0301 and DHPS was deemed likely (there seems to be a terminological confusion involved here, but we are quoting the results of the original computational analysis of this protein as they have been presented). It was acknowledged that MJ0107 (a member of COG0294) could be identified as a possible homolog of DHPS by sequence-based methods, and this protein was assayed for dihydropteroate synthase activity, but none was detected [<abbr bid="B25">25</abbr>]. In contrast, DHPS activity (albeit relatively low) was shown <it>in vitro</it> for the partially purified MJ0301 protein [<abbr bid="B25">25</abbr>]. However, MJ0301 has been shown to belong to the metallo-&#946;-lactamase superfamily of enzymes and, in the evolutionary classification of metallo-&#946;-lactamases, belongs to an archaea-specific family (Figure <figr fid="F3">3</figr>; COG1237) [<abbr bid="B28">28</abbr>]. Metallo-&#946;-lactamases encompass a wide range of metal-dependent hydrolytic and oxidoreductase activities with a variety of substrates and are particularly abundant in archaea where some of them are involved in RNA processing [<abbr bid="B28">28</abbr>]. None of these enzymes catalyzes a reaction resembling the condensation reaction catalyzed by DHPS. The characteristic motifs of metallo-&#946;-lactamases, which mostly include metal-binding histidines, are highly conserved in MJ0301 and its orthologs (Figure <figr fid="F3">3</figr>). 
In contrast, most of the MJ0301 residues described as equivalent to the functionally important residues of <it>Escherichia coli</it> dihydropteroate synthase are not conserved, even among the archaeal orthologs of this protein. Finally, the &#946;-lactamase fold consists of two subdomains of the &#946;4-&#945;-&#946;-&#945; topology whose &#946; sheets are sandwiched against each other; in structural terms, these domains are completely different from the TIM-barrel, with which the ORF program matched the MJ0301 structural prediction. Taken together, these observations are sufficient to reject the proposed relationship between MJ0301 and dihydropteroate synthases.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Multiple alignment of the archaea-specific family of predicted metallo-&#946;-lactamase superfamily hydrolases that includes the alleged archaeal dihydropteroate synthase, MJ0301.</p>
               </caption>
               <text>
                  <p>Multiple alignment of the archaea-specific family of predicted metallo-&#946;-lactamase superfamily hydrolases that includes the alleged archaeal dihydropteroate synthase, MJ0301. The scheme for displaying multiple alignments is as described in the legend to Figure <figr fid="F1">1</figr>. A consensus secondary structure was derived from the crystal structure metallo-&#946;-lactamases from <it>Stenotrophomonas maltophilia</it> (1SML) and <it>Bacteroides fragilis</it> (1A7T). Residues are colored at 90% consensus. Bfr, <it>Bacteroides fragilis</it>; Bsp, <it>Bacillus</it> species 170; Mj, <it>M. jannaschii</it>; Mth, <it>M. thermoautotrophicum</it>; Pab, <it>P. abyssi</it>; Ph, <it>P. horikoshii</it>; Stma, <it>S. maltophilia</it>; Tm, <it>Thermotoga maritima.</it></p>
               </text>
               <graphic file="gb-2001-2-12-research0051-3"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>MJ0757: a predicted thymidylate synthase</p>
            </st>
            <p>Thymidylate synthase is a central enzyme of pyrimidine metabolism that catalyzes the formation of deoxythymidine monophosphate (dTMP) from deoxyuridine monophosphate (dUMP) by transfer of a methyl group to its pyrimidine ring. This reaction is catalyzed by at least two unrelated enzymes. The canonical thymidylate synthase (TS), such as the <it>E. coli</it> ThyA, is a protein with a distinct &#945;/&#946;-fold that transfers a methyl group to dUMP from 5,10-methylenetetrahydrofolate [<abbr bid="B29">29</abbr>]. This classic TS is readily identifiable in many (but not all) bacteria, eukaryotes and three archaeal species, <it>Archaeoglobus fulgidus, M. jannaschii,</it> and <it>M. thermoautotrophicum</it> (COG0207). The archaeal members of the TS family share with their bacterial orthologs all the conserved residues involved in catalysis (Figure <figr fid="F4">4</figr>).</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Multiple alignment of predicted archaeal thymidylate synthases (TS).</p>
               </caption>
               <text>
                  <p>Multiple alignment of predicted archaeal thymidylate synthases (TS). The scheme for displaying multiple alignments is as described in the legend to Figure <figr fid="F1">1</figr>. Residues are colored at 90% consensus. A consensus secondary structure was derived using known TS structures from <it>R. norvegicus</it>, <it>E. coli</it> and bacteriophage T4 deoxycytidylate hydroxymethyltransferase (1B5D). The <it>Archaeoglobus fulgidus</it> TS has a duplication of the TS domain and the amino-terminal domain (N.TS_Af; shaded gray) is predicted to be inactive. Af, <it>Archaeoglobus fulgidus</it>; At, <it>Arabidopsis thaliana</it>; BPSP1, bacteriophage SP1; Bs, <it>B. subtilis</it>; Dm, <it>Drosophila melanogaster</it>; Dr, <it>D. radiodurans</it>; Ec, <it>E. coli</it>; Mj, <it>M. jannaschii</it>; Mt, <it>M. tuberculosis</it>; Mth, <it>M. thermoautotrophicum</it>; Nm, <it>Neisseria meningitidis</it>; Rn, <it>R. norvegicus</it>; T2, bacteriophage T2; Xf, <it>Xylella fastidiosa</it>.</p>
               </text>
               <graphic file="gb-2001-2-12-research0051-4"/>
            </fig>
            <p>An alternative TS or its subunit is predicted to be encoded by a gene from <it>Dictyostelium</it> that rescues a slime mold mutant auxotrophic for thymidylate [<abbr bid="B30">30</abbr>]. This protein is not homologous to the canonical TS, but its orthologs in bacteria and archaea show an almost perfect complementary phyletic distribution (COG1351).</p>
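            <p>A complementary phyletic distribution means that most genomes encode one of the two protein families but not both. The idea can be made concrete with the following Python sketch. The function name phyletic_complementarity and the genome labels in the usage lines are hypothetical placeholders; they are not data from the COG database and do not reproduce the analysis reported here.</p>

```python
def phyletic_complementarity(genomes, with_a, with_b):
    """Summarize how two protein families are distributed over genomes.

    Returns the fraction of genomes that encode exactly one of the two
    families, the genomes encoding both, and the genomes encoding neither.
    """
    genomes = list(genomes)
    a, b = set(with_a), set(with_b)
    exactly_one = [g for g in genomes if (g in a) != (g in b)]
    both = [g for g in genomes if g in a and g in b]
    neither = [g for g in genomes if g not in a and g not in b]
    fraction = len(exactly_one) / len(genomes) if genomes else 0.0
    return fraction, both, neither


# Hypothetical placeholder genomes, not data from the COG database.
genomes = ["g1", "g2", "g3", "g4", "g5"]
fraction, both, neither = phyletic_complementarity(
    genomes, with_a={"g1", "g2"}, with_b={"g3", "g4"})
print(fraction, both, neither)  # 0.8 [] ['g5']
```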
            <p>In a screen for the TS in <it>M. jannaschii,</it> the ORF method picked the MJ0757 protein as the most likely homolog of the canonical TS family [<abbr bid="B27">27</abbr>]. In the validation experiment, MJ0757 overexpressed in <it>E. coli</it> was shown to possess TS activity [<abbr bid="B25">25</abbr>]. Sequence searches show that MJ0757 belongs to a small family of euryarchaea-specific proteins of uncharacterized function (COG1810). Of the 17 residues reported to be conserved between MJ0757 and the TS family, only seven were conserved throughout the MJ0757 family (Figure <figr fid="F5">5</figr>). Moreover, a comparison of the secondary structure elements derived from the reported three-dimensional model of MJ0757 [<abbr bid="B27">27</abbr>] and those derived from a prediction generated using a multiple alignment query with the structure-prediction program PHD (such predictions typically exceed 70% accuracy), showed an overlap of just two of the 16 or so secondary structural elements (Figure <figr fid="F5">5</figr>). Conversely, several sequence motifs that are characteristic of the MJ0757 family did not overlap with the conserved regions in the MJ0757-TS alignment (Figure <figr fid="F5">5</figr>). Furthermore, some, but not all, members of the MJ0757 family contain an amino-terminal insertion of a small, metal-chelating module (Figure <figr fid="F5">5</figr>), which was used to improve the alignment with the <it>E. coli</it> TS [<abbr bid="B25">25</abbr>], although this region was variable even within the MJ0757 family itself. On the basis of these observations, a relationship between MJ0757 and the canonical TS has to be rejected. The actual fold and function of MJ0757 and its homologs cannot be predicted at present. However, these proteins have several features that suggest that they might be metal-dependent enzymes potentially involved in redox reactions. These suggestive features include the fusion with a ferredoxin domain seen in the <it>M. thermoautotrophicum</it> member MTH601, the insertion of the metal-binding module in certain members, including MJ0757 (see above), and the presence of three cysteines that are conserved throughout this family.</p>
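            <p>The comparison of secondary structure elements described above can be illustrated with a short Python sketch. It assumes two per-residue strings of equal length in the same coordinate frame, with H for helix, E for strand and any other character for coil. The function names, the min_shared parameter and the toy strings are hypothetical, and the overlap criterion (at least one shared residue of the same state) is an assumption rather than the criterion applied in the analysis.</p>

```python
from itertools import groupby


def elements(ss):
    """Split a per-residue string into (state, start, end) tuples.

    Only helices (H) and strands (E) are kept; coordinates are 0-based
    and end-exclusive.
    """
    out, pos = [], 0
    for state, run in groupby(ss):
        n = len(list(run))
        if state in "HE":
            out.append((state, pos, pos + n))
        pos += n
    return out


def count_matching(ss_a, ss_b, min_shared=1):
    """Count elements of ss_a that share at least `min_shared` residues
    with an element of the same state in ss_b.

    Returns (number of matching elements, total elements in ss_a).
    """
    elems_a, elems_b = elements(ss_a), elements(ss_b)
    hits = 0
    for state, start, end in elems_a:
        for state_b, start_b, end_b in elems_b:
            shared = min(end, end_b) - max(start, start_b)
            if state == state_b and shared >= min_shared:
                hits += 1
                break
    return hits, len(elems_a)


# Hypothetical toy strings, not the MJ0757 predictions.
model = "--HHHH--EEE--HHH"
predicted = "--HHH---HHH--EEE"
print(count_matching(model, predicted))  # (1, 3)
```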
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Multiple alignment of the uncharacterized archaeal protein family that includes the alleged archaeal thymidylate synthase, MJ0757.</p>
               </caption>
               <text>
                  <p>Multiple alignment of the uncharacterized archaeal protein family that includes the alleged archaeal thymidylate synthase, MJ0757. The scheme for displaying multiple alignments is as described in the legend to Figure <figr fid="F1">1</figr>. Residues are colored at 100% consensus. In addition, metal-chelating residues in an inserted module shared by orthologs of MJ0757 are shaded blue. The asterisks denote residues in MJ0757 that were predicted to be conserved between MJ0757 and TS. Also shown are predicted secondary structures for the MJ0757 family that were obtained by using the PHD program, and the TS-like secondary structure predicted for MJ0757 in [<abbr bid="B25">25</abbr>]. Af, <it>A. fulgidus</it>; Mj, <it>M. jannaschii</it>; Mth, <it>M. thermoautotrophicum</it>.</p>
               </text>
               <graphic file="gb-2001-2-12-research0051-5"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Cmpp16: a plant 'paralog' of plant viral movement proteins</p>
            </st>
            <p>Viral movement proteins (MPs) are encoded by diverse, unrelated families of plant viruses, such as positive-strand RNA, negative-strand RNA, single-stranded DNA and double-stranded DNA viruses, and are essential for cell-to-cell movement of all these viruses [<abbr bid="B31">31</abbr>,<abbr bid="B32">32</abbr>]. To isolate potential host homologs of the red clover necrotic mosaic virus (RCNMV) MP, antibodies to this protein were used to screen phloem extracts of <it>Cucurbita maxima,</it> resulting in the detection of a protein designated Cmpp16. This protein was identified as a 'paralog' (generally, this term refers to homologous genes related by duplication within the same genome) of the viral MPs on the basis of sequence similarity detected using the Megalign program [<abbr bid="B33">33</abbr>]. Subsequently, Cmpp16 was shown to bind RNA, which is a common property of viral MPs, and to induce an increase of the size-exclusion limit of plasmodesmata, also a mechanism associated with the MPs [<abbr bid="B33">33</abbr>].</p>
            <p>However, computational analysis of the Cmpp16 sequence reveals a picture that is incompatible with a homologous relationship with MPs. Cmpp16 consists mostly of a C2 domain that is readily detected by PSI-BLAST or by profile-searching engines such as the CD-search [<abbr bid="B34">34</abbr>]. The Cmpp16 sequence contains all critical residues of the C2 domain (Figure <figr fid="F6">6</figr>). C2 domains bind a variety of substrates, such as Ca<sup>2+</sup>, phospholipids, inositol polyphosphates and other proteins, but apparently not RNA [<abbr bid="B35">35</abbr>]. There is no detectable similarity between C2 domains and the MPs, and conserved motifs in the published alignment of Cmpp16 and the RCNMV MP do not correspond to those in C2 domains; moreover, many of the residues described as conserved in Cmpp16 and MP are not conserved within the viral movement protein family itself. Thus, we conclude that viral MPs and Cmpp16, a C2-domain protein, are not homologs. Subsequently, a similar methodology was employed to detect a relationship between Cmpp36 (a cytochrome B5 reductase), Cmpp16 and the RCNMV movement protein [<abbr bid="B36">36</abbr>]. As in the above case of Cmpp16, this relationship of a cytochrome B5 reductase with the viral movement proteins appears to be spurious (data not shown).</p>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Multiple alignment of a selection of C2 domains including the alleged 'paralog' of plant virus movement proteins, Cmpp16.</p>
               </caption>
               <text>
                  <p>Multiple alignment of a selection of C2 domains including the alleged 'paralog' of plant virus movement proteins, Cmpp16. The scheme for displaying multiple alignments is as described in the legend to Figure <figr fid="F1">1</figr>. Residues are colored at 100% consensus. A consensus secondary structure was derived from known structures of the C2 domains in phospholipase C-&#948;1 (1QAT), synaptotagmin (1RSY), and protein kinase C (1A25). At: <it>A. thaliana</it>, Cm: <it>Cucurbita maxima</it>, Le: <it>Lycopersicon esculentum</it>, Os: <it>Oryza sativa</it>, Rn: <it>R. norvegicus</it>.</p>
               </text>
               <graphic file="gb-2001-2-12-research0051-6"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Human activating transcription factor-2 (ATF-2): a predicted histone acetyltransferase</p>
            </st>
            <p>Histone acetyltransferases (HATs) are key regulators of eukaryotic transcription. GCN5-like HATs, which modulate chromatin-associated transcription, belong to a vast superfamily of amino-group acetyl- and myristoyl-transferases with extremely diverse functions [<abbr bid="B37">37</abbr>]. ATF-2 is a basic leucine zipper (b-ZIP) family transcription factor that binds to cyclic AMP-response elements (CRE) and activates transcription [<abbr bid="B38">38</abbr>]. Vertebrate ATF-2 also has an amino-terminal zinc finger, which is involved in transcription activation [<abbr bid="B39">39</abbr>]. Non-vertebrate orthologs of ATF-2, in <it>Drosophila</it>, <it>Caenorhabditis elegans</it> and yeasts, lack the zinc finger. In experiments designed to isolate an ATF-2-associated HAT, ATF-2 alone was shown to be sufficient for the acetyltransferase activity. Examining the region of ATF-2 that showed HAT activity, the authors found some sequence similarity and at least one motif resembling the acetyltransferase superfamily and concluded that ATF-2 contained a GCN5-like acetyltransferase domain [<abbr bid="B40">40</abbr>]. Subsequent site-directed mutagenesis supported the importance of the reported acetyltransferase motifs for the HAT activity of ATF-2.</p>
            <p>However, profile-based sequence searches and attempts at fold recognition failed to detect any relationship between ATF-2 and the acetyltransferase superfamily. The region designated as having HAT activity and containing the acetyltransferase domain shows poor conservation between orthologs and closely related paralogs of the ATF-2 family, especially in the sequence identified as the most prominent A motif of the acetyltransferase family (Figure <figr fid="F7">7</figr>). Furthermore, complexity analysis using the SEG program, with the parameters adjusted for decomposition of a protein into globular and non-globular regions [<abbr bid="B41">41</abbr>], predicted that the entire region of the ATF-2 protein between the amino-terminal zinc finger and the carboxy-terminal helical b-ZIP was unstructured. This is consistent with the structural prediction derived using the PHD program that indicated no regular secondary structure in this region. Thus, the relationship between ATF-2 and the GCN5-like acetyltransferase superfamily seems to be invalid, leaving the structural basis for the reported acetyltransferase activity of ATF-2 an open issue.</p>
            <fig id="F7">
               <title>
                  <p>Figure 7</p>
               </title>
               <caption>
                  <p>Multiple alignment of the region of the ATF-2 transcription factor and its homologs identified as a GCN5-like acetyltransferase domain.</p>
               </caption>
               <text>
                  <p>Multiple alignment of the region of the ATF-2 transcription factor and its homologs identified as a GCN5-like acetyltransferase domain. The scheme for displaying multiple alignments is as described in the legend to Figure <figr fid="F1">1</figr>. Residues are colored at 100% consensus. Ce: <it>Caenorhabditis elegans</it>, Hs: <it>Homo sapiens</it>, Sp: <it>Schizosaccharomyces pombe</it>.</p>
               </text>
               <graphic file="gb-2001-2-12-research0051-7"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Predicted PAS domain in the phytochrome-interacting transcription factor PIF3</p>
            </st>
            <p>PAS domains are sensory modules in various signal transduction proteins from all major lineages of cellular life [<abbr bid="B42">42</abbr>]. PAS domains are typically implicated in sensing oxygen, redox potential, light and small ligands [<abbr bid="B43">43</abbr>]. In addition, PAS domains are sites for protein-protein interactions and are responsible for the formation of homo- and hetero-dimers in several signal transduction pathways that involve transcriptional activation. A PAS domain has been reported in the transcription factor PIF3 from <it>Arabidopsis</it>, which interacts with a phytochrome photoreceptor and transduces light signals to photoresponsive plant genes [<abbr bid="B44">44</abbr>]. It has been hypothesized that the purported PAS domain of PIF3 directly interacts with the PAS domains of the phytochrome [<abbr bid="B44">44</abbr>]. This hypothesis was later tested experimentally and evidence was presented that the PAS domain of PIF3 indeed was a major contributor to the interaction between the two proteins [<abbr bid="B45">45</abbr>].</p>
            <p>PIF3 belongs to a plant-specific family of basic helix-loop-helix (bHLH)-domain-containing proteins that, in addition to the bHLH domain, have an uncharacterized conserved domain at the amino terminus present in single or duplicate copies (L.M.I., I.Z., L.A. and E.V.K., unpublished observations). The PIF3 family currently consists of about eight paralogous proteins in <it>Arabidopsis</it> and an ortholog from rice. The region predicted to be a PAS domain is poorly conserved in the rice ortholog of PIF3 and the paralogs from <it>Arabidopsis</it>. An alignment with the rice ortholog indicated that the proposed PAS domain was a rapidly diverging, compositionally biased sequence (Figure <figr fid="F8">8</figr>). Complexity analysis using the SEG program showed that the reported PAS domain mapped to a region that was predicted to be entirely non-globular. All attempts to objectively detect a PAS domain in PIF3 using sensitive profile methods based on PSI-BLAST-derived scoring matrices or Hidden Markov Models (HMM) failed. Additionally, secondary-structure prediction for the proposed PAS region using PHD indicated that this region is largely unstructured. These observations appear to be sufficient to reject the presence of a PAS domain in PIF3 although the region thought to be a PAS domain could indeed be involved in the interaction with phytochrome.</p>
            <fig id="F8">
               <title>
                  <p>Figure 8</p>
               </title>
               <caption>
                  <p>A comparison of the multiple alignments of PIF3, its rice ortholog, and PAS domain proteins.</p>
               </caption>
               <text>
                  <p>A comparison of the multiple alignments of PIF3, its rice ortholog, and PAS domain proteins. The scheme for displaying multiple alignments is as described in the legend to Figure <figr fid="F1">1</figr>. Residues are colored at 90% consensus. A consensus secondary structure was derived from those available for FixL (1EW0) and photoactive yellow protein (3PYP). Aa, <it>A. aeolicus</it>; Af, <it>A. fulgidus</it>; At, <it>A. thaliana</it>; Av, <it>Azotobacter vinelandii</it>; Bs, <it>B. subtilis</it>; Dm, <it>D. melanogaster</it>; Ec, <it>E. coli</it>; Eh, <it>Ectothiorhodospira halophila</it>; Nc, <it>Neurospora crassa</it>; Os, <it>O. sativa</it>; Rm, <it>Rhizobium meliloti</it>.</p>
               </text>
               <graphic file="gb-2001-2-12-research0051-8"/>
            </fig>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion and conclusions</p>
         </st>
         <p>In the six cases described above, we provide evidence for rejecting the homologous relationships and functional predictions inferred for the proteins in question by using computational methods. The number of examples in this category could be increased, and some have already been considered in the literature, for example the spurious discovery of a 'functional PDZ domain' in the molecular chaperone ClpA ([<abbr bid="B46">46</abbr>], see refutation in [<abbr bid="B47">47</abbr>]) or the finding of an ATPase domain and death effector domains in the apoptosis-associated protein FLASH ([<abbr bid="B48">48</abbr>], see refutation in [<abbr bid="B49">49</abbr>]). The common and most striking aspect of all these cases is that the predictions based on apparently erroneous computational analysis were supported by experiments. What are the solutions to this clash between computational and experimental evidence?</p>
         <p>We envisage three main possibilities. The first, experiment-centered view would hold that experimental evidence always has the upper hand and that, even if the alternative computational solutions that we describe here seem more plausible than the original predictions, the latter are correct insofar as they are supported by experiment. Epistemologically, this argument is not sound because hypotheses (computational predictions in this case) cannot be proved by the success of the experiments they prompt. They can only be falsified by experiments producing results incompatible with the predictions [<abbr bid="B50">50</abbr>]. Simply put, the experiments could have worked for a wrong reason. For example, this seems particularly likely in the case of the site-directed mutagenesis of the transcription factor ATF-2 discussed above. The mutagenized residues probably are indeed important for the function of this protein, but not because they are part of a GCN5-like acetyltransferase domain, which this protein does not contain. Similar logic applies to the case of the predicted, but apparently nonexistent, PAS domain in the transcription factor PIF3. More important, however, computational predictions are falsifiable within the realm of computational analysis itself. Falsification is offered by alternative, unequivocally supported predictions that are incompatible with the original ones. In four of the six cases described (CysRS, DHPS, TS and MP), such evidence was obtained by computational methods.</p>
         <p>The second possibility is that, although the computational predictions described here are correct, whereas the original ones are wrong, the experimental evidence is also solid. In each of the described cases, this would elevate the biochemical activities identified through these experiments to the status of major, unexpected discoveries, because the chemistry underlying them would have to be extremely unusual. In particular, if the identification of the <it>M. jannaschii</it> cysteinyl-tRNA synthetase is indeed correct, this enzyme would have to be a derivative of a specific family of polysaccharide hydrolases containing a signal peptide but no recognizable ATP-binding or RNA-binding domains.</p>
         <p>The third explanation is that the original computational predictions triggered over-interpretation of experimental results that, in reality, might have been obtained as a result of nonspecific activities, contamination or other artifacts. In this regard, it is important to realize that not only computational predictions but also biological experiments are intrinsically error-prone and open to conflicting interpretations. The probabilistic nature of computational analyses is well recognized (and at times, perhaps, overrated) by most researchers, probably because explicit calculation of probability or likelihood is at the core of most widely used computer methods for sequence and structure analyses. It is therefore prudent to note that the alternative computational predictions presented here should be considered to be 'more likely' than the original ones, rather than to contradict the latter in an absolute sense. As we attempted to show above, however, the difference in the likelihood of two mutually incompatible predictions can be overwhelming, with one supported by multiple lines of evidence as opposed to the other. In contrast to computational studies, experimental ones are often, consciously or unconsciously, treated as a demonstration of 'final truth'. In reality, however, probabilistic inference is inherent in practically any interpretation of experimental results when questions are asked such as "How likely is it that the protein under study has a particular biochemical activity <it>in vivo</it>?" or "How central is this activity for the <it>in vivo</it> function of the protein under study, given the results of a surrogate <it>in vitro</it> assay?" Thus, certain experimental designs may not be appropriate to ascertain the actual <it>in vivo</it> biochemistry of a protein.
Furthermore, even if the particular activities detected under these conditions are genuine, the likelihood of these being relevant <it>in vivo</it> needs to be additionally assessed. Accordingly, when strong computational predictions seem not to be borne out by experiment, the conditions and design of the experiments deserve special scrutiny: they might have given a negative result for a wrong reason. A case in point is the MJ0107 protein, the apparent archaeal ortholog of DHPS, which failed to show dihydropteroate synthase activity [<abbr bid="B25">25</abbr>]. We strongly believe that this issue needs to be revisited. All this considered, the results of independent application of computational and experimental techniques tend to be complementary, and useful in adding or reducing confidence in the biological conclusions of a particular study.</p>
         <p>Finally, it should be emphasized that these cautionary notes on the application of computational methods in protein function prediction in no way suggest that new computational approaches that depart sharply from more established ones are doomed to failure. Indeed, the most popular advanced search methods based on sequence profiles - PSI-BLAST and Hidden Markov Model (HMM) search - are rather recent innovations [<abbr bid="B11">11</abbr>,<abbr bid="B51">51</abbr>,<abbr bid="B52">52</abbr>]. Furthermore, methods based on a different principle, such as protein sequence-structure threading, have a recent history of success despite uncertainties in their statistical foundations [<abbr bid="B22">22</abbr>,<abbr bid="B53">53</abbr>,<abbr bid="B54">54</abbr>,<abbr bid="B55">55</abbr>,<abbr bid="B56">56</abbr>]. It does seem, however, that when a structurally and functionally plausible prediction is produced, with high confidence, by a well-tested, statistically sound computational method, an incompatible prediction yielded by a new method without a clear statistical foundation is most likely to be incorrect.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <p>The non-redundant protein-sequence database at the National Center for Biotechnology Information (NCBI) was searched using the gapped version of the BLAST program [<abbr bid="B9">9</abbr>]. Sequence-profile searches were carried out using the PSI-BLAST program, with the cut-off for inclusion of sequences into the profile set at <it>E</it> = 0.01 [<abbr bid="B3">3</abbr>,<abbr bid="B9">9</abbr>], and the HMMer program package [<abbr bid="B57">57</abbr>]. Multiple alignments of amino-acid sequences were generated using the T_Coffee program [<abbr bid="B58">58</abbr>]. Protein secondary-structure predictions were generated using the PHD program [<abbr bid="B59">59</abbr>,<abbr bid="B60">60</abbr>], with multiple alignments of individual protein families used as queries. Sequence-structure threading was carried out using the combined-fold-prediction algorithm [<abbr bid="B22">22</abbr>] or the 3D-PSSM algorithm based on the use of a three-dimensional position-specific scoring matrix [<abbr bid="B23">23</abbr>]. Signal peptides in protein sequences were predicted using the SignalP program [<abbr bid="B61">61</abbr>]. The COG database [<abbr bid="B62">62</abbr>,<abbr bid="B63">63</abbr>] was used as a source of information on orthologous relationships between proteins.</p>
      </sec>
   </bdy>
   <bm>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Predicting function: from genes to genomes and back.</p>
            </title>
            <aug>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Dandekar</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Diaz-Lazcoz</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Eisenhaber</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Huynen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Yuan</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1998</pubdate>
            <volume>283</volume>
            <fpage>707</fpage>
            <lpage>725</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1998.2144</pubid>
                  <pubid idtype="pmpid" link="fulltext">9790834</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>The impact of comparative genomics on our understanding of evolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
               <au>
                  <snm>Aravind</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Kondrashov</snm>
                  <fnm>AS</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2000</pubdate>
            <volume>101</volume>
            <fpage>573</fpage>
            <lpage>576</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10892642</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Gleaning non-trivial structural, functional and evolutionary information about proteins by iterative database searches.</p>
            </title>
            <aug>
               <au>
                  <snm>Aravind</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1999</pubdate>
            <volume>287</volume>
            <fpage>1023</fpage>
            <lpage>1040</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1999.2653</pubid>
                  <pubid idtype="pmpid" link="fulltext">10222208</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Progress in protein structure prediction.</p>
            </title>
            <aug>
               <au>
                  <snm>Murzin</snm>
                  <fnm>AG</fnm>
               </au>
            </aug>
            <source>Nat Struct Biol</source>
            <pubdate>2001</pubdate>
            <volume>8</volume>
            <fpage>110</fpage>
            <lpage>112</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/84088</pubid>
                  <pubid idtype="pmpid" link="fulltext">11175896</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Statistical methods and insights for protein and DNA sequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Karlin</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bucher</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Brendel</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
            </aug>
            <source>Annu Rev Biophys Biophys Chem</source>
            <pubdate>1991</pubdate>
            <volume>20</volume>
            <fpage>175</fpage>
            <lpage>203</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.bb.20.060191.001135</pubid>
                  <pubid idtype="pmpid">1867715</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Chance and statistical significance in protein and DNA sequence analysis.</p>
            </title>
            <aug>
               <au>
                  <snm>Karlin</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Brendel</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1992</pubdate>
            <volume>257</volume>
            <fpage>39</fpage>
            <lpage>49</lpage>
            <xrefbib>
               <pubid idtype="pmpid">1621093</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Applications and statistics for multiple high-scoring segments in molecular sequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Karlin</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1993</pubdate>
            <volume>90</volume>
            <fpage>5873</fpage>
            <lpage>5877</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">46825</pubid>
                  <pubid idtype="pmpid" link="fulltext">8390686</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Statistical studies of biomolecular sequences: score-based methods.</p>
            </title>
            <aug>
               <au>
                  <snm>Karlin</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Phil Trans R Soc Lond B</source>
            <pubdate>1994</pubdate>
            <volume>344</volume>
            <fpage>391</fpage>
            <lpage>402</lpage>
            <xrefbib>
               <pubid idtype="pmpid">7800709</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Madden</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Schaffer</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <fpage>3389</fpage>
            <lpage>3402</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">146917</pubid>
                  <pubid idtype="pmpid" link="fulltext">9254694</pubid>
                  <pubid idtype="doi">10.1093/nar/25.17.3389</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>The estimation of statistical parameters for local alignment score distributions.</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Bundschuh</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Olsen</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hwa</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <fpage>351</fpage>
            <lpage>361</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">29669</pubid>
                  <pubid idtype="pmpid" link="fulltext">11139604</pubid>
                  <pubid idtype="doi">10.1093/nar/29.2.351</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <aug>
               <au>
                  <snm>Durbin</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Eddy</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Krogh</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Mitchison</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge, UK: Cambridge University Press;</source>
            <pubdate>1998</pubdate>
         </bibl>
         <bibl id="B12">
            <title>
               <p>The renaissance of aminoacyl-tRNA synthesis.</p>
            </title>
            <aug>
               <au>
                  <snm>Ibba</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Soll</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>EMBO Rep</source>
            <pubdate>2001</pubdate>
            <volume>2</volume>
            <fpage>382</fpage>
            <lpage>387</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11375928</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Non-orthologous gene displacement.</p>
            </title>
            <aug>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
               <au>
                  <snm>Mushegian</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>1996</pubdate>
            <volume>12</volume>
            <fpage>334</fpage>
            <lpage>336</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0168-9525(96)20010-1</pubid>
                  <pubid idtype="pmpid" link="fulltext">8855656</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Analogous enzymes: independent inventions in enzyme evolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Galperin</snm>
                  <fnm>MY</fnm>
               </au>
               <au>
                  <snm>Walker</snm>
                  <fnm>DR</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>1998</pubdate>
            <volume>8</volume>
            <fpage>779</fpage>
            <lpage>790</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9724324</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>One polypeptide with two aminoacyl-tRNA synthetase activities.</p>
            </title>
            <aug>
               <au>
                  <snm>Stathopoulos</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Longman</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Vothknecht</snm>
                  <fnm>UC</fnm>
               </au>
               <au>
                  <snm>Becker</snm>
                  <fnm>HD</fnm>
               </au>
               <au>
                  <snm>Ibba</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Soll</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2000</pubdate>
            <volume>287</volume>
            <fpage>479</fpage>
            <lpage>482</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.287.5452.479</pubid>
                  <pubid idtype="pmpid" link="fulltext">10642548</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Synthesis of cysteinyl-tRNA(Cys) by a genome that lacks the normal cysteine-tRNA synthetase.</p>
            </title>
            <aug>
               <au>
                  <snm>Lipman</snm>
                  <fnm>RS</fnm>
               </au>
               <au>
                  <snm>Sowers</snm>
                  <fnm>KR</fnm>
               </au>
               <au>
                  <snm>Hou</snm>
                  <fnm>YM</fnm>
               </au>
            </aug>
            <source>Biochemistry</source>
            <pubdate>2000</pubdate>
            <volume>39</volume>
            <fpage>7792</fpage>
            <lpage>7798</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/bi0004955</pubid>
                  <pubid idtype="pmpid" link="fulltext">10869184</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p><it>Methanococcus jannaschii</it> prolyl-cysteinyl-tRNA synthetase possesses overlapping amino acid binding sites.</p>
            </title>
            <aug>
               <au>
                  <snm>Stathopoulos</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Jacquin-Becker</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Becker</snm>
                  <fnm>HD</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Ambrogelly</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Longman</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Soll</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Biochemistry</source>
            <pubdate>2001</pubdate>
            <volume>40</volume>
            <fpage>46</fpage>
            <lpage>52</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/bi002108x</pubid>
                  <pubid idtype="pmpid" link="fulltext">11141055</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>An aminoacyl tRNA synthetase whose sequence fits into neither of the two known classes.</p>
            </title>
            <aug>
               <au>
                  <snm>Fabrega</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Farrow</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Mukhopadhyay</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>de Crecy-Lagard</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Ortiz</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Schimmel</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2001</pubdate>
            <volume>411</volume>
            <fpage>110</fpage>
            <lpage>114</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35075121</pubid>
                  <pubid idtype="pmpid" link="fulltext">11333988</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Evolution of aminoacyl-tRNA synthetases - analysis of unique domain architectures and phylogenetic trees reveals a complex history of horizontal gene transfer events.</p>
            </title>
            <aug>
               <au>
                  <snm>Wolf</snm>
                  <fnm>YI</fnm>
               </au>
               <au>
                  <snm>Aravind</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Grishin</snm>
                  <fnm>NV</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>1999</pubdate>
            <volume>9</volume>
            <fpage>689</fpage>
            <lpage>710</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10447505</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Machine learning approaches for the prediction of signal peptides and other protein sorting signals.</p>
            </title>
            <aug>
               <au>
                  <snm>Nielsen</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Brunak</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>von Heijne</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Protein Eng</source>
            <pubdate>1999</pubdate>
            <volume>12</volume>
            <fpage>3</fpage>
            <lpage>9</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/protein/12.1.3</pubid>
                  <pubid idtype="pmpid" link="fulltext">10065704</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Molecular cloning and sequence analysis of the gene encoding an endo &#945;-1,4 polygalactosaminidase of <it>Pseudomonas</it> sp. 881.</p>
            </title>
            <aug>
               <au>
                  <snm>Tamura</snm>
                  <fnm>J-I</fnm>
               </au>
               <au>
                  <snm>Kaname</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Kadowaki</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Igarashi</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Kodama</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>J Ferment Bioeng</source>
            <pubdate>1995</pubdate>
            <volume>80</volume>
            <fpage>305</fpage>
            <lpage>310</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/0922-338X(95)94196-X</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Hybrid fold recognition: combining sequence derived properties with evolutionary information.</p>
            </title>
            <aug>
               <au>
                  <snm>Fischer</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Pac Symp Biocomput</source>
            <pubdate>2000</pubdate>
            <fpage>119</fpage>
            <lpage>130</lpage>
            <xrefbib>
               <pubid idtype="pmpid">10902162</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Enhanced genome annotation using structural profiles in the program 3D-PSSM.</p>
            </title>
            <aug>
               <au>
                  <snm>Kelley</snm>
                  <fnm>LA</fnm>
               </au>
               <au>
                  <snm>MacCallum</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Sternberg</snm>
                  <fnm>MJ</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2000</pubdate>
            <volume>299</volume>
            <fpage>499</fpage>
            <lpage>520</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2000.3741</pubid>
                  <pubid idtype="pmpid" link="fulltext">10860755</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Structure and function of the dihydropteroate synthase from <it>Staphylococcus aureus</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Hampele</snm>
                  <fnm>IC</fnm>
               </au>
               <au>
                  <snm>D'Arcy</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Dale</snm>
                  <fnm>GE</fnm>
               </au>
               <au>
                  <snm>Kostrewa</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Nielsen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Oefner</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Page</snm>
                  <fnm>MG</fnm>
               </au>
               <au>
                  <snm>Schonfeld</snm>
                  <fnm>HJ</fnm>
               </au>
               <au>
                  <snm>Stuber</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Then</snm>
                  <fnm>RL</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1997</pubdate>
            <volume>268</volume>
            <fpage>21</fpage>
            <lpage>30</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1997.0944</pubid>
                  <pubid idtype="pmpid" link="fulltext">9149138</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Identifying two ancient enzymes in Archaea using predicted secondary structure alignment.</p>
            </title>
            <aug>
               <au>
                  <snm>Xu</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Aurora</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Rose</snm>
                  <fnm>GD</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>RH</fnm>
               </au>
            </aug>
            <source>Nat Struct Biol</source>
            <pubdate>1999</pubdate>
            <volume>6</volume>
            <fpage>750</fpage>
            <lpage>754</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/11525</pubid>
                  <pubid idtype="pmpid" link="fulltext">10426953</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>COG: Phylogenetic classification of proteins encoded in complete genomes</p>
            </title>
            <url>http://www.ncbi.nlm.nih.gov/COG/</url>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Seeking an ancient enzyme in <it>Methanococcus jannaschii</it> using ORF, a program based on predicted secondary structure comparisons.</p>
            </title>
            <aug>
               <au>
                  <snm>Aurora</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Rose</snm>
                  <fnm>GD</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1998</pubdate>
            <volume>95</volume>
            <fpage>2818</fpage>
            <lpage>2823</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">19652</pubid>
                  <pubid idtype="pmpid" link="fulltext">9501173</pubid>
                  <pubid idtype="doi">10.1073/pnas.95.6.2818</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>An evolutionary classification of the metallo-&#946;-lactamase fold proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Aravind</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>In Silico Biology</source>
            <pubdate>1998</pubdate>
            <volume>1</volume>
            <fpage>8</fpage>
            <url>http://www.bioinfo.de/isb/1998/01/0008/</url>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11471246</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Crystal structure of <it>Escherichia coli</it> thymidylate synthase with FdUMP and 10-propargyl-5,8-dideazafolate.</p>
            </title>
            <aug>
               <au>
                  <snm>Matthews</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Appelt</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Oatley</snm>
                  <fnm>SJ</fnm>
               </au>
            </aug>
            <source>Adv Enzyme Regul</source>
            <pubdate>1989</pubdate>
            <volume>29</volume>
            <fpage>47</fpage>
            <lpage>60</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0065-2571(89)90093-9</pubid>
                  <pubid idtype="pmpid">2699154</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Molecular complementation of a genetic marker in <it>Dictyostelium</it> using a genomic DNA library.</p>
            </title>
            <aug>
               <au>
                  <snm>Dynes</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Firtel</snm>
                  <fnm>RA</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1989</pubdate>
            <volume>86</volume>
            <fpage>7966</fpage>
            <lpage>7970</lpage>
            <xrefbib>
               <pubid idtype="pmpid">2813371</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Cell-to-cell movement of plant viruses. Insights from amino acid sequence comparisons of movement proteins and from analogies with cellular transport systems.</p>
            </title>
            <aug>
               <au>
                  <snm>Mushegian</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Arch Virol</source>
            <pubdate>1993</pubdate>
            <volume>133</volume>
            <fpage>239</fpage>
            <lpage>257</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8257287</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>The '30K' superfamily of viral movement proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Melcher</snm>
                  <fnm>U</fnm>
               </au>
            </aug>
            <source>J Gen Virol</source>
            <pubdate>2000</pubdate>
            <volume>81</volume>
            <fpage>257</fpage>
            <lpage>266</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10640565</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Plant paralog to viral movement protein that potentiates transport of mRNA into the phloem.</p>
            </title>
            <aug>
               <au>
                  <snm>Xoconostle-Cazares</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Xiang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Ruiz-Medrano</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>HL</fnm>
               </au>
               <au>
                  <snm>Monzer</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Yoo</snm>
                  <fnm>BC</fnm>
               </au>
               <au>
                  <snm>McFarland</snm>
                  <fnm>KC</fnm>
               </au>
               <au>
                  <snm>Franceschi</snm>
                  <fnm>VR</fnm>
               </au>
               <au>
                  <snm>Lucas</snm>
                  <fnm>WJ</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1999</pubdate>
            <volume>283</volume>
            <fpage>94</fpage>
            <lpage>98</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.283.5398.94</pubid>
                  <pubid idtype="pmpid" link="fulltext">9872750</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>CD-search</p>
            </title>
            <url>http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi</url>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Extending the C2 domain family: C2s in PKCs delta, epsilon, eta, theta, phospholipases, GAPs, and perforin.</p>
            </title>
            <aug>
               <au>
                  <snm>Ponting</snm>
                  <fnm>CP</fnm>
               </au>
               <au>
                  <snm>Parker</snm>
                  <fnm>PJ</fnm>
               </au>
            </aug>
            <source>Protein Sci</source>
            <pubdate>1996</pubdate>
            <volume>5</volume>
            <fpage>162</fpage>
            <lpage>166</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8771209</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Proteolytic processing of CmPP36, a protein from the cytochrome b(5) reductase family, is required for entry into the phloem translocation pathway.</p>
            </title>
            <aug>
               <au>
                  <snm>Xoconostle-Cazares</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Ruiz-Medrano</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Lucas</snm>
                  <fnm>WJ</fnm>
               </au>
            </aug>
            <source>Plant J</source>
            <pubdate>2000</pubdate>
            <volume>24</volume>
            <fpage>735</fpage>
            <lpage>747</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1046/j.1365-313X.2000.00916.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">11135108</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>GCN5-related histone N-acetyl-transferases belong to a diverse superfamily that includes the yeast SPT10 protein.</p>
            </title>
            <aug>
               <au>
                  <snm>Neuwald</snm>
                  <fnm>AF</fnm>
               </au>
               <au>
                  <snm>Landsman</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Trends Biochem Sci</source>
            <pubdate>1997</pubdate>
            <volume>22</volume>
            <fpage>154</fpage>
            <lpage>155</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0968-0004(97)01034-7</pubid>
                  <pubid idtype="pmpid" link="fulltext">9175471</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Transcription factor ATF cDNA clones: an extensive family of leucine zipper proteins able to selectively form DNA-binding heterodimers.</p>
            </title>
            <aug>
               <au>
                  <snm>Hai</snm>
                  <fnm>TW</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Coukos</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>MR</fnm>
               </au>
            </aug>
            <source>Genes Dev</source>
            <pubdate>1989</pubdate>
            <volume>3</volume>
            <fpage>2083</fpage>
            <lpage>2090</lpage>
            <xrefbib>
               <pubid idtype="pmpid">2516827</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Solution structure of the transactivation domain of ATF-2 comprising a zinc finger-like subdomain and a flexible subdomain.</p>
            </title>
            <aug>
               <au>
                  <snm>Nagadoi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Nakazawa</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Uda</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Okuno</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Maekawa</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Ishii</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Nishimura</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1999</pubdate>
            <volume>287</volume>
            <fpage>593</fpage>
            <lpage>607</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1999.2620</pubid>
                  <pubid idtype="pmpid" link="fulltext">10092462</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>ATF-2 has intrinsic histone acetyltransferase activity which is modulated by phosphorylation.</p>
            </title>
            <aug>
               <au>
                  <snm>Kawasaki</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Schiltz</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Chiu</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Itakura</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Taira</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Nakatani</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Yokoyama</snm>
                  <fnm>KK</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2000</pubdate>
            <volume>405</volume>
            <fpage>195</fpage>
            <lpage>200</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35012097</pubid>
                  <pubid idtype="pmpid" link="fulltext">10821277</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Non-globular domains in protein sequences: automated segmentation using complexity measures.</p>
            </title>
            <aug>
               <au>
                  <snm>Wootton</snm>
                  <fnm>JC</fnm>
               </au>
            </aug>
            <source>Comput Chem</source>
            <pubdate>1994</pubdate>
            <volume>18</volume>
            <fpage>269</fpage>
            <lpage>285</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0097-8485(94)85023-2</pubid>
                  <pubid idtype="pmpid">7952898</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>PAS domains: internal sensors of oxygen, redox potential, and light.</p>
            </title>
            <aug>
               <au>
                  <snm>Taylor</snm>
                  <fnm>BL</fnm>
               </au>
               <au>
                  <snm>Zhulin</snm>
                  <fnm>IB</fnm>
               </au>
            </aug>
            <source>Microbiol Mol Biol Rev</source>
            <pubdate>1999</pubdate>
            <volume>63</volume>
            <fpage>479</fpage>
            <lpage>506</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">98974</pubid>
                  <pubid idtype="pmpid" link="fulltext">10357859</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Regulatory potential, phyletic distribution and evolution of ancient, intracellular small-molecule-binding domains.</p>
            </title>
            <aug>
               <au>
                  <snm>Anantharaman</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
               <au>
                  <snm>Aravind</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2001</pubdate>
            <volume>307</volume>
            <fpage>1271</fpage>
            <lpage>1292</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2001.4508</pubid>
                  <pubid idtype="pmpid" link="fulltext">11292341</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>PIF3, a phytochrome-interacting factor necessary for normal photoinduced signal transduction, is a novel basic helix-loop-helix protein.</p>
            </title>
            <aug>
               <au>
                  <snm>Ni</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Tepperman</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Quail</snm>
                  <fnm>PH</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1998</pubdate>
            <volume>95</volume>
            <fpage>657</fpage>
            <lpage>667</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9845368</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Phytochrome B binds with greater apparent affinity than phytochrome A to the basic helix-loop-helix factor PIF3 in a reaction requiring the PAS domain of PIF3.</p>
            </title>
            <aug>
               <au>
                  <snm>Zhu</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Tepperman</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Fairchild</snm>
                  <fnm>CD</fnm>
               </au>
               <au>
                  <snm>Quail</snm>
                  <fnm>PH</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2000</pubdate>
            <volume>97</volume>
            <fpage>13419</fpage>
            <lpage>13424</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">27239</pubid>
                  <pubid idtype="pmpid" link="fulltext">11069292</pubid>
                  <pubid idtype="doi">10.1073/pnas.230433797</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>PDZ-like domains mediate binding specificity in the Clp/Hsp100 family of chaperones and protease regulatory subunits.</p>
            </title>
            <aug>
               <au>
                  <snm>Levchenko</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>CK</fnm>
               </au>
               <au>
                  <snm>Walsh</snm>
                  <fnm>NP</fnm>
               </au>
               <au>
                  <snm>Sauer</snm>
                  <fnm>RT</fnm>
               </au>
               <au>
                  <snm>Baker</snm>
                  <fnm>TA</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1997</pubdate>
            <volume>91</volume>
            <fpage>939</fpage>
            <lpage>947</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9428517</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>AAA+: A class of chaperone-like ATPases associated with the assembly, operation, and disassembly of protein complexes.</p>
            </title>
            <aug>
               <au>
                  <snm>Neuwald</snm>
                  <fnm>AF</fnm>
               </au>
               <au>
                  <snm>Aravind</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Spouge</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>1999</pubdate>
            <volume>9</volume>
            <fpage>27</fpage>
            <lpage>43</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9927482</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>The CED-4-homologous protein FLASH is involved in Fas-mediated activation of caspase-8 during apoptosis.</p>
            </title>
            <aug>
               <au>
                  <snm>Imai</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Kimura</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Murakami</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Yajima</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Sakamaki</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Yonehara</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1999</pubdate>
            <volume>398</volume>
            <fpage>777</fpage>
            <lpage>785</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/19709</pubid>
                  <pubid idtype="pmpid" link="fulltext">10235259</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Apoptosis. Searching for FLASH domains.</p>
            </title>
            <aug>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
               <au>
                  <snm>Aravind</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Hofmann</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Tschopp</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Dixit</snm>
                  <fnm>VM</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1999</pubdate>
            <volume>401</volume>
            <fpage>662</fpage>
            <lpage>663</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/44317</pubid>
                  <pubid idtype="pmpid" link="fulltext">10537104</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <aug>
               <au>
                  <snm>Popper</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>The Logic of Scientific Discovery. New York/London: Routledge</source>
            <pubdate>1999</pubdate>
         </bibl>
         <bibl id="B51">
            <title>
               <p>Iterated profile searches with PSI-BLAST - a tool for discovery in protein databases.</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Trends Biochem Sci</source>
            <pubdate>1998</pubdate>
            <volume>23</volume>
            <fpage>444</fpage>
            <lpage>447</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0968-0004(98)01298-5</pubid>
                  <pubid idtype="pmpid" link="fulltext">9852764</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B52">
            <title>
               <p>Profile hidden Markov models.</p>
            </title>
            <aug>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>1998</pubdate>
            <volume>14</volume>
            <fpage>755</fpage>
            <lpage>763</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/14.9.755</pubid>
                  <pubid idtype="pmpid" link="fulltext">9918945</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B53">
            <title>
               <p>Statistics of sequence-structure threading.</p>
            </title>
            <aug>
               <au>
                  <snm>Bryant</snm>
                  <fnm>SH</fnm>
               </au>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
            </aug>
            <source>Curr Opin Struct Biol</source>
            <pubdate>1995</pubdate>
            <volume>5</volume>
            <fpage>236</fpage>
            <lpage>244</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0959-440X(95)80082-4</pubid>
                  <pubid idtype="pmpid">7648327</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B54">
            <title>
               <p>GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Jones</snm>
                  <fnm>DT</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1999</pubdate>
            <volume>287</volume>
            <fpage>797</fpage>
            <lpage>815</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1999.2583</pubid>
                  <pubid idtype="pmpid" link="fulltext">10191147</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B55">
            <title>
               <p>Threading with explicit models for evolutionary conservation of structure and sequence.</p>
            </title>
            <aug>
               <au>
                  <snm>Panchenko</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Marchler-Bauer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bryant</snm>
                  <fnm>SH</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>1999</pubdate>
            <volume>37</volume>
            <fpage>133</fpage>
            <lpage>140</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/(SICI)1097-0134(1999)37:3+&lt;133::AID-PROT18&gt;3.3.CO;2-4</pubid>
                  <pubid idtype="pmpid" link="fulltext">10553140</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B56">
            <title>
               <p>Combination of threading potentials and sequence profiles improves fold recognition.</p>
            </title>
            <aug>
               <au>
                  <snm>Panchenko</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Marchler-Bauer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bryant</snm>
                  <fnm>SH</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2000</pubdate>
            <volume>296</volume>
            <fpage>1319</fpage>
            <lpage>1331</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2000.3541</pubid>
                  <pubid idtype="pmpid" link="fulltext">10698636</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B57">
            <title>
               <p>Hidden Markov models.</p>
            </title>
            <aug>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
            </aug>
            <source>Curr Opin Struct Biol</source>
            <pubdate>1996</pubdate>
            <volume>6</volume>
            <fpage>361</fpage>
            <lpage>365</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0959-440X(96)80056-X</pubid>
                  <pubid idtype="pmpid">8804822</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B58">
            <title>
               <p>T-Coffee: A novel method for fast and accurate multiple sequence alignment.</p>
            </title>
            <aug>
               <au>
                  <snm>Notredame</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Higgins</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Heringa</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2000</pubdate>
            <volume>302</volume>
            <fpage>205</fpage>
            <lpage>217</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2000.4042</pubid>
                  <pubid idtype="pmpid" link="fulltext">10964570</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B59">
            <title>
               <p>Protein fold recognition by prediction-based threading.</p>
            </title>
            <aug>
               <au>
                  <snm>Rost</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Schneider</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Sander</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1997</pubdate>
            <volume>270</volume>
            <fpage>471</fpage>
            <lpage>480</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1997.1101</pubid>
                  <pubid idtype="pmpid" link="fulltext">9237912</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B60">
            <title>
               <p>PHD - an automatic mail server for protein secondary structure prediction.</p>
            </title>
            <aug>
               <au>
                  <snm>Rost</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Sander</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Schneider</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Comput Appl Biosci</source>
            <pubdate>1994</pubdate>
            <volume>10</volume>
            <fpage>53</fpage>
            <lpage>60</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8193956</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B61">
            <title>
               <p>A neural network method for identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites.</p>
            </title>
            <aug>
               <au>
                  <snm>Nielsen</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Engelbrecht</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Brunak</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>von Heijne</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Int J Neural Syst</source>
            <pubdate>1997</pubdate>
            <volume>8</volume>
            <fpage>581</fpage>
            <lpage>599</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1142/S0129065797000537</pubid>
                  <pubid idtype="pmpid">10065837</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B62">
            <title>
               <p>A genomic perspective on protein families.</p>
            </title>
            <aug>
               <au>
                  <snm>Tatusov</snm>
                  <fnm>RL</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1997</pubdate>
            <volume>278</volume>
            <fpage>631</fpage>
            <lpage>637</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.278.5338.631</pubid>
                  <pubid idtype="pmpid" link="fulltext">9381173</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B63">
            <title>
               <p>The COG database: new developments in phylogenetic classification of proteins from complete genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Tatusov</snm>
                  <fnm>RL</fnm>
               </au>
               <au>
                  <snm>Natale</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Garkavtsev</snm>
                  <fnm>IV</fnm>
               </au>
               <au>
                  <snm>Tatusova</snm>
                  <fnm>TA</fnm>
               </au>
               <au>
                  <snm>Shankavaram</snm>
                  <fnm>UT</fnm>
               </au>
               <au>
                  <snm>Rao</snm>
                  <fnm>BS</fnm>
               </au>
               <au>
                  <snm>Kiryutin</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Galperin</snm>
                  <fnm>MY</fnm>
               </au>
               <au>
                  <snm>Fedorova</snm>
                  <fnm>ND</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <fpage>22</fpage>
            <lpage>28</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">29819</pubid>
                  <pubid idtype="pmpid" link="fulltext">11125040</pubid>
                  <pubid idtype="doi">10.1093/nar/29.1.22</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>

