Orthology prediction methods. (a-c) Pairwise-based and (d, e) phylogeny-based methods. Circles of different colors indicate proteins encoded in the genomes of different species. Black arrows represent reciprocal BLAST hits. Proteins within dashed ovals are predicted by the method to belong to the same orthologous group. (a) Best bidirectional hit (BBH). All pairs of proteins that are reciprocal best hits are considered orthologs. Note that this method cannot predict the orthology of yellow protein 2. (b) COG-like approach. Proteins at the nodes of triangular networks of BBHs are considered orthologs (green, red and yellow protein 1 in the example). New proteins are added to the orthologous group if they belong to BBH triangles that share an edge with a given cluster; for example, the gray protein is added to the orthologous group because it forms a BBH triangle with the red and green proteins. Note that a BBH link with yellow protein 1 is not required. The COG-like approach can also add further proteins from the same genome if they are more similar to each other than to proteins in other genomes, or if they form BBH triangles with members of the cluster. This is not the case for yellow protein 2, which is, again, misclassified. (c) Inparanoid approach. This is similar to (a), but other proteins within a proteome (yellow protein 2 in this example) are included as 'in-paralogs' if they are more similar to each other than to their corresponding hits in the other species. (d) Tree-reconciliation phylogenetic approach. Duplication nodes (marked with a D) are defined by comparing the gene tree (small tree at the top) with the species tree (small tree at the bottom) to derive a reconciled tree (big tree on the right) that includes the minimal number of duplication and gene-loss events (losses shown as dashed lines) needed to explain the gene tree.
In this case, both yellow proteins are included in the orthologous group, but the red and gray proteins are excluded. (e) Species-overlap phylogenetic approach. All proteins that derive from a common ancestor by speciation are considered members of the same orthologous group. A node is inferred to be a duplication when the partitions it defines share at least one species. In this example, the one-to-many orthology relationship results from a recent duplication in the lineage leading to the yellow proteome.
Gabaldón Genome Biology 2008 9:235 doi:10.1186/gb-2008-9-10-235
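The reciprocal best-hit rule of panel (a) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of any published tool: it assumes the all-against-all similarity searches (for example, BLAST) have already been run and parsed into a hypothetical dictionary format that maps each query protein to a list of (target protein, score) pairs. The protein names and scores in the example are invented to mirror the figure.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical parsed search results: query protein -> [(target protein, score), ...]
HitTable = Dict[str, List[Tuple[str, float]]]


def best_hits(hits: HitTable) -> Dict[str, str]:
    """Map each query protein to its single highest-scoring target.

    Ties are resolved by keeping the first maximal hit in the list.
    """
    best = {}
    for query, targets in hits.items():
        if targets:
            best[query] = max(targets, key=lambda t: t[1])[0]
    return best


def bbh_pairs(a_vs_b: HitTable, b_vs_a: HitTable) -> Set[Tuple[str, str]]:
    """Return (a, b) pairs where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}


# Invented scores mirroring panel (a): two yellow paralogs in species A,
# a single red protein in species B.
a_vs_b = {"yellow1": [("red1", 250.0)], "yellow2": [("red1", 200.0)]}
b_vs_a = {"red1": [("yellow1", 250.0), ("yellow2", 200.0)]}
print(bbh_pairs(a_vs_b, b_vs_a))  # {('yellow1', 'red1')}
```

Because red1 can have only one best hit, yellow2 never forms a reciprocal pair, which is exactly the limitation of BBH noted for yellow protein 2.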
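The species-overlap rule of panel (e) can likewise be sketched in Python. This is a minimal sketch, assuming a rooted gene tree whose leaves are labeled with their species; the `Node` class, helper names, and the species codes "R", "G" and "Y" are hypothetical. An internal node is marked as a duplication when its child partitions share at least one species, and leaves separated by a speciation node are reported as orthologs. Note that the rule uses only the species labels of the leaves, not a species tree.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Node:
    name: str = ""
    species: Optional[str] = None              # set only on leaves (hypothetical)
    children: List["Node"] = field(default_factory=list)


def leaves(node: Node) -> List[Node]:
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def species_below(node: Node) -> Set[str]:
    return {leaf.species for leaf in leaves(node)}


def is_duplication(node: Node) -> bool:
    """Species-overlap rule: duplication if any two child partitions share a species."""
    seen: Set[str] = set()
    for child in node.children:
        sp = species_below(child)
        if seen & sp:
            return True
        seen |= sp
    return False


def orthologous_pairs(node: Node) -> Set[Tuple[str, str]]:
    """Leaf pairs split by a speciation node are orthologs."""
    pairs: Set[Tuple[str, str]] = set()
    if node.children and not is_duplication(node):
        groups = [leaves(child) for child in node.children]
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                for a in groups[i]:
                    for b in groups[j]:
                        pairs.add(tuple(sorted((a.name, b.name))))
    for child in node.children:
        pairs |= orthologous_pairs(child)
    return pairs


def leaf(name: str, species: str) -> Node:
    return Node(name=name, species=species)


def join(*kids: Node) -> Node:
    return Node(children=list(kids))


# Hypothetical tree mirroring panel (e): a recent duplication in species Y.
yellow_dup = join(leaf("yellow1", "Y"), leaf("yellow2", "Y"))
root = join(join(leaf("red", "R"), leaf("green", "G")), yellow_dup)
print(is_duplication(root))        # False: {R, G} and {Y} share no species
print(is_duplication(yellow_dup))  # True: both partitions contain Y
```

In this toy tree, red and green are each orthologous to both yellow1 and yellow2 (a one-to-many relationship), while yellow1 and yellow2 are paralogs because they are split only by the duplication node.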