Significance and context
Large-scale genomic sequencing has shown that horizontal (lateral) gene transfer (HGT) is a major force in the evolution of bacterial genomes. HGT is the transmission of genetic material among organisms that are not directly related by descent. Mechanisms accounting for this process in bacteria include conjugation, transformation, transposition and viral hitchhiking. These events can be detected, for instance, by biases in the nucleotide composition or codon usage of the transferred region (for recent events) or by the existence of homologous sequences in very distant taxa without homologs in the intervening taxa.
Imagine the case of vertebrate proteins that are similar to bacterial ones but are absent from non-vertebrate species. A previous analysis of the human genome draft sequence had revealed 223 protein sequences that share similarity with bacterial proteins but not with proteins of yeast, worm, fly and Arabidopsis (and other non-vertebrates) and thus were candidates for an origin by horizontal gene transfer from bacteria. A more detailed analysis retained only 113 of these genes. Their existence could imply transmission of bacterial DNA to the vertebrate germ line. This is hard to believe; the initial list even included monoamine oxidase, an enzyme that can alter our mood (and is a target of antidepressants). Stanhope et al. have now carried out a phylogenetic analysis of 28 putative cases of horizontally transferred genes whose presence in the human genome had been confirmed by PCR. They used parsimony and distance-based methods (the PHYLIP package) applied to alignments of homologous sequences detected by BLAST searches of the usual sequence databases.
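The compositional criterion mentioned above can be illustrated with a minimal sketch. It is not the method used in any of the studies discussed here; it only assumes that a recently transferred region may differ in G+C content from the rest of the genome. The function names, window size and threshold are hypothetical choices made for illustration.

```python
def gc_content(seq):
    """Fraction of G and C bases in a DNA string (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def atypical_windows(genome, window=1000, step=500, threshold=0.1):
    """Return the genome-wide G+C fraction and the windows deviating from it.

    A window is flagged when its G+C fraction differs from the genome-wide
    value by more than `threshold` (an arbitrary cutoff, for illustration).
    """
    background = gc_content(genome)
    flagged = []
    for start in range(0, len(genome) - window + 1, step):
        gc = gc_content(genome[start:start + window])
        if abs(gc - background) > threshold:
            flagged.append((start, gc))
    return background, flagged


# Toy sequence: an AT-rich "host" with a GC-rich insert in the middle.
toy_genome = "AT" * 100 + "GC" * 20 + "AT" * 100
background, flagged = atypical_windows(toy_genome, window=40, step=40, threshold=0.5)
```

A compositional signal of this kind fades as a transferred region adapts to its new genome, which is why it is useful mainly for recent events; codon-usage bias could be screened with the same windowing idea.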
Stanhope et al. used a variety of phylogenetic criteria to discount the possibility of bacterial-vertebrate HGT events. One such criterion was the clustering of the eukaryotic sequences, including vertebrates and non-vertebrates, supporting monophyly (common origin) of Eucarya. Another of the criteria used was evidence of monophyly for the Archaea, the Bacteria and the Eucarya in a rooted tree with a paralog as outgroup, even in the absence of non-vertebrate homologs. The analysis showed that in more than a third of the putative HGT events, homologous sequences were present in more ancient eukaryotes and could have been transmitted to vertebrates by descent through common ancestry. One case received phylogenetic support as being due to HGT but, ironically, in the direction vertebrate-to-bacterium. Four cases remained ambiguous because of the absence of non-vertebrate eukaryotic sequences.
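The monophyly criterion can be made concrete with a small sketch. It assumes a rooted tree written as nested tuples with taxon names at the leaves, and it ignores branch support, which a real analysis would have to take into account. The representation and the function names are hypothetical and are not taken from Stanhope et al. or from any phylogenetics package.

```python
def leaves(tree):
    """Set of leaf names under a node; a leaf is a plain string."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def is_monophyletic(tree, group):
    """True if some clade of the rooted tree contains exactly the taxa in `group`."""
    group = frozenset(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Toy trees with made-up taxon labels.
# Eukaryotes cluster together: consistent with vertical descent.
descent_tree = ((("human", "mouse"), "fly"), ("ecoli", "bsubtilis"))
# The human sequence nests among the bacterial ones: the pattern expected under HGT.
hgt_like_tree = (("human", ("ecoli", "bsubtilis")), ("fly", "yeast"))

eukaryotes_cluster = is_monophyletic(descent_tree, {"human", "mouse", "fly"})
eukaryotes_split = is_monophyletic(hgt_like_tree, {"human", "fly", "yeast"})
```

In the first toy tree the eukaryotic sequences form a single clade, so the bacterial similarity needs no HGT to explain it; in the second they do not, which is the topology a genuine bacterium-to-vertebrate transfer would be expected to produce.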
In some cases, the authors obtained these insightful results simply by including the 'EST others' division of GenBank in the scan for homologous sequences. In publishing the human genome draft sequence, the Human Genome Project (HGP) consortium seems to have overlooked these homologous EST sequences. In addition, phylogenetic analysis (as performed by Stanhope et al.) was not used in the investigation of HGT in the human genome report. Instead, the HGP consortium derived its conclusions from the top BLAST hits. Stanhope et al. explain that in some cases the top BLAST hits using a human sequence as query are indeed bacterial proteins, but phylogenetic analysis of 'less similar' proteins (further down the list) from non-vertebrate eukaryotes supported monophyly for the eukaryotes, ruling out HGT.
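The difference between a top-hit criterion and a scan of the whole hit list can be sketched as follows. The hit records, taxonomic labels, E-value cutoff and function names are all hypothetical; this is neither the HGP consortium's pipeline nor that of Stanhope et al., only an illustration of why weaker hits further down the list matter.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    subject: str   # made-up sequence identifier
    group: str     # "bacteria", "archaea", "vertebrate" or "nonvertebrate_eukaryote"
    evalue: float  # smaller means a more significant match


def top_nonvertebrate_group(hits):
    """Taxonomic group of the best-scoring hit outside the vertebrates."""
    ranked = sorted((h for h in hits if h.group != "vertebrate"),
                    key=lambda h: h.evalue)
    return ranked[0].group if ranked else None


def nonvertebrate_eukaryote_hits(hits, max_evalue=1e-5):
    """All hits from non-vertebrate eukaryotes under an (arbitrary) E-value cutoff."""
    return [h for h in hits
            if h.group == "nonvertebrate_eukaryote" and h.evalue <= max_evalue]


def classify(hits):
    """Coarse triage of one human query based on its hit list."""
    if top_nonvertebrate_group(hits) != "bacteria":
        return "no HGT signal"
    if nonvertebrate_eukaryote_hits(hits):
        # A weaker eukaryotic homolog exists: only a tree can settle the question.
        return "needs phylogenetic analysis"
    return "HGT candidate by top hit"


# Toy hit list: bacterial top hit, with a weaker EST hit from a fly further down.
toy_hits = [
    Hit("mouse_protein", "vertebrate", 1e-120),
    Hit("bacterial_protein", "bacteria", 1e-80),
    Hit("fly_est", "nonvertebrate_eukaryote", 1e-40),
]
verdict = classify(toy_hits)
```

A query whose best non-vertebrate hit is bacterial is only a candidate; once any non-vertebrate eukaryotic homolog turns up, the sketch defers to a tree-based test rather than to the ranking of the hits.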
A complementary paper from another team of researchers that comes to a similar conclusion is reported in Genome Biology 2(3):reports0027.
Stanhope et al. show how even the simplest phylogenetic analysis can be instrumental in validating evolutionary hypotheses - HGT in this instance. The statistical output of BLAST, as used in the HGP report, cannot be directly related to phylogenetic distances or even to the fraction of amino-acid identity. The HGP consortium is not to blame for this - just think of the enormous task of sequencing 3 gigabases and the speed at which the sequence was analyzed to make this information available to the research community. Consider also the excitement of detecting human sequences closely related to bacterial ones and apparently absent from non-vertebrate eukaryotes, HGT being a hot topic in bacterial genomics. The information given in the HGP consortium's preliminary paper will be revisited many times for more careful analyses. The cases left unresolved by Stanhope et al. will need further analysis, once more genomic sequences from non-vertebrate eukaryotes are available, to confirm or disprove the possibility of HGT.