This article is part of the supplement: Quantitative inference of gene function from diverse large-scale datasets
Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function
1 Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Longwood Avenue, Boston, Massachusetts 02115, USA
2 Department of Genetics, School of Medicine, Stanford University, Stanford, California 94305-5120, USA
3 Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Jimmy Fund Way, Boston, Massachusetts 02115, USA
4 McKinsey and Company, Hansen Way, Palo Alto, California 94304, USA
5 Merrimack Pharmaceuticals, Kendall Square, Cambridge, Massachusetts 02139, USA
6 Boston Biomedical Research Institute (BBRI), Grove St., Watertown, Massachusetts 02472, USA
7 Massachusetts Institute of Technology, Massachusetts Ave, Cambridge, Massachusetts 02139, USA
Genome Biology 2008, 9(Suppl 1):S7 doi:10.1186/gb-2008-9-s1-s7
Published: 27 June 2008
Learning the function of genes is a major goal of computational genomics. Methods for inferring gene function have typically fallen into two categories: 'guilt-by-profiling', which exploits correlation between function and other gene characteristics; and 'guilt-by-association', which transfers function from one gene to another via biological relationships.
We have developed a strategy ('Funckenstein') that performs guilt-by-profiling and guilt-by-association and combines the results. Using a benchmark set of functional categories and input data for protein-coding genes in Saccharomyces cerevisiae, Funckenstein was compared with a previous combined strategy. Subsequently, we applied Funckenstein to 2,455 Gene Ontology terms. In the process, we developed 2,455 guilt-by-profiling classifiers based on 8,848 gene characteristics and 12 functional linkage graphs based on 23 biological relationships.
Funckenstein outperforms a previous combined strategy on a common benchmark dataset. Combining 'guilt-by-profiling' and 'guilt-by-association' significantly improved performance over the component classifiers, with the greatest synergy for the most specific functions. Performance was evaluated by cross-validation and by literature examination of the top-scoring novel predictions. These quantitative predictions should help prioritize experimental study of yeast gene functions.
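To make the two inference styles concrete, the following minimal sketch scores one gene for one function in both ways and then merges the two scores. It is an illustration only, not the Funckenstein implementation: the logistic profiling scorer, the weighted-neighbor association scorer, the averaging rule with its `alpha` parameter, and all gene names and numbers are assumptions introduced here.

```python
import math
from typing import Dict, List


def profiling_score(features: List[float], weights: List[float]) -> float:
    """Guilt-by-profiling: score a gene from its own characteristics.

    Assumed form: a linear score squashed into [0, 1] by a logistic function.
    """
    z = sum(f * w for f, w in zip(features, weights))
    return 1.0 / (1.0 + math.exp(-z))


def association_score(
    gene: str,
    graph: Dict[str, Dict[str, float]],
    annotated: Dict[str, bool],
) -> float:
    """Guilt-by-association: transfer function from linked genes.

    Assumed form: the link-weighted fraction of neighbors in a functional
    linkage graph that are already annotated with the function.
    """
    neighbors = graph.get(gene, {})
    total = sum(neighbors.values())
    if total == 0:
        return 0.0  # no linked genes, so nothing to transfer
    hits = sum(w for n, w in neighbors.items() if annotated.get(n, False))
    return hits / total


def combined_score(profiling: float, association: float, alpha: float = 0.5) -> float:
    """Merge the two scores. A weighted average is an assumption made here."""
    return alpha * profiling + (1.0 - alpha) * association


# Hypothetical toy data: one query gene linked to three annotated or unannotated genes.
graph = {"geneX": {"geneA": 1.0, "geneB": 1.0, "geneC": 2.0}}
annotated = {"geneA": True, "geneB": False, "geneC": True}

p = profiling_score([0.0, 0.0], [1.5, -0.7])   # all-zero features give a neutral score
a = association_score("geneX", graph, annotated)
score = combined_score(p, a)
```

In this toy case the profiling score is neutral, so the combined score is pulled upward only by the annotated neighbors. A learned combination rule could weight the two sources differently for each functional category.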