FLN-based disease gene prioritization significantly outperforms random control. Performance is compared between FLN-based disease gene prioritization (with or without text-mining data) and a random control. The random control is generated by using the FLN to prioritize randomly assembled disease gene sets (see Materials and methods). (a) Box plots of the AUCs of disease gene prioritization for 110 diseases, based on disease-centric assessment (see Materials and methods). For each box plot, the bottom, middle, and top lines of the box represent the first quartile, the median, and the third quartile, respectively; whiskers represent 1.5 times the inter-quartile range; red plus signs represent outliers. (b) Disease gene prioritization performance based on gene-centric assessment using the artificial chromosomal region background (see Materials and methods). Gene-centric assessment treats each known gene-disease association as a test case. For each test case, the task is to assess how well the known disease (seed) gene ranks relative to a background gene set according to the disease-association score (S_i; Equation 4). The S_i for each gene in each test case is calculated in a leave-one-out setting, based on the gene's connectivity to the remaining seed genes. The background gene set, referred to as the artificial chromosomal region, is composed of the 100 nearest genes physically flanking the tested disease gene on the chromosome. Finally, once the rank of the tested disease gene has been determined for each test case, all test cases are pooled and the overall performance is assessed as the fraction of tested disease genes ranked above various rank cutoffs. (c) The same evaluation as in (b), but with a background gene set composed of all genes represented in the FLN rather than only those nearby on the chromosome.
Linghu et al. Genome Biology 2009 10:R91 doi:10.1186/gb-2009-10-9-r91
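
The gene-centric assessment described for panels (b) and (c) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Equation 4 is not reproduced in this caption, so the sketch assumes, purely for illustration, that the disease-association score is the sum of FLN link weights between a gene and the remaining seed genes. The function names, the toy network, the exclusion of seed genes from the background, and the tie handling (ties resolved in favor of the tested gene) are likewise assumptions.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical network representation: undirected weighted links keyed by gene pair.
Network = Dict[FrozenSet[str], float]


def association_score(gene: str, seeds: Iterable[str], weights: Network) -> float:
    """Stand-in for the disease-association score (ASSUMPTION: sum of link
    weights to the seed genes; the paper's Equation 4 is not shown here)."""
    return sum(weights.get(frozenset((gene, s)), 0.0) for s in seeds if s != gene)


def leave_one_out_rank(test_gene: str, disease_genes: Iterable[str],
                       background: Iterable[str], weights: Network) -> int:
    """Rank of the held-out disease gene among the background genes (1 = best)."""
    seeds = set(disease_genes) - {test_gene}          # remaining seed genes
    test_score = association_score(test_gene, seeds, weights)
    # ASSUMPTION: seed genes and the tested gene are not part of the background.
    candidates = set(background) - seeds - {test_gene}
    # ASSUMPTION: only strictly higher scores push the tested gene down.
    better = sum(1 for g in candidates
                 if association_score(g, seeds, weights) > test_score)
    return better + 1


def fraction_above_cutoffs(ranks: List[int], cutoffs: Iterable[int]) -> Dict[int, float]:
    """Pool all test cases: fraction of tested genes ranked at or above each cutoff."""
    return {c: sum(r <= c for r in ranks) / len(ranks) for c in cutoffs}


# Toy data (entirely made up): one disease with three known genes.
toy_weights: Network = {
    frozenset(("A", "B")): 0.9, frozenset(("A", "C")): 0.8,
    frozenset(("B", "C")): 0.7, frozenset(("A", "X")): 0.2,
    frozenset(("B", "Y")): 0.95, frozenset(("C", "Y")): 0.9,
}
toy_disease = ["A", "B", "C"]
toy_background = ["X", "Y", "Z"]   # stands in for the 100 flanking genes

ranks = [leave_one_out_rank(g, toy_disease, toy_background, toy_weights)
         for g in toy_disease]
curve = fraction_above_cutoffs(ranks, cutoffs=[1, 2])
```

For panel (b), `toy_background` would be replaced by the 100 genes flanking each tested gene. For panel (c), it would be every gene represented in the FLN. In both cases the ranks from all diseases are pooled before the fractions are computed.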