Schematic representation of the data integration and data mining methodology. (a) Public databases with heterogeneous biomedical relations are integrated into a common network. (b) As an illustration, genes (green circles), diseases (red boxes) and protein domains (blue diamonds) are related through gene-disease associations, gene-gene interactions and gene-domain annotations, and are integrated into a unified graph. (c) The a priori accessibility of each concept is computed by performing random walks over the network, which detects highly connected hubs (area of a node scales with its rank score). (d) The a posteriori rank of each concept with respect to a source concept, in this case disease A, is computed by performing random walks with restart at the source. (e) The posterior probabilities are adjusted using the prior probabilities to score the importance of each concept specifically for the source concept (area of a node scales with the log of its rank score). Genes (green circles) are ranked according to this score, gene 1 being most specific to disease A and gene 8 least specific.
Liekens et al. Genome Biology 2011 12:R57 doi:10.1186/gb-2011-12-6-r57
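
Steps (c) to (e) of the caption can be illustrated with a minimal sketch on a toy graph. This is not the authors' implementation. The graph, the node names, the damping factor of 0.85, the uniform restart used for the prior, and the use of the log ratio of posterior to prior as the adjusted score are all assumptions made for illustration, since the caption does not give the exact adjustment formula. The sketch also assumes every node has at least one neighbour.

```python
import math

def random_walk(adjacency, restart, damping=0.85, iterations=200):
    """Power iteration for a walker that follows a random edge with
    probability `damping` and otherwise jumps according to `restart`."""
    nodes = list(adjacency)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Mass that arrives through the restart (teleport) step.
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        # Mass that arrives by following edges; each node splits its
        # rank evenly over its neighbours.
        for n in nodes:
            share = damping * rank[n] / len(adjacency[n])
            for m in adjacency[n]:
                nxt[m] += share
        rank = nxt
    return rank

# Hypothetical toy network: "gene 1" is linked only to the disease,
# while "gene hub" is linked to the disease and to many other genes.
edges = [
    ("disease A", "gene 1"),
    ("disease A", "gene hub"),
    ("gene hub", "gene 2"),
    ("gene hub", "gene 3"),
    ("gene hub", "gene 4"),
]
adjacency = {}
for u, v in edges:  # undirected graph
    adjacency.setdefault(u, []).append(v)
    adjacency.setdefault(v, []).append(u)

nodes = list(adjacency)
source = "disease A"

# (c) Prior: restart uniformly over all nodes, so hubs score high.
uniform = {n: 1.0 / len(nodes) for n in nodes}
prior = random_walk(adjacency, uniform)

# (d) Posterior: always restart at the source concept.
at_source = {n: 1.0 if n == source else 0.0 for n in nodes}
posterior = random_walk(adjacency, at_source)

# (e) Adjusted score: log ratio of posterior to prior (an assumption).
score = {n: math.log(posterior[n] / prior[n]) for n in nodes}

genes = [n for n in nodes if n.startswith("gene")]
ranked = sorted(genes, key=score.get, reverse=True)
for gene in ranked:
    print(f"{gene}: posterior={posterior[gene]:.3f} "
          f"prior={prior[gene]:.3f} score={score[gene]:.3f}")
```

In this toy graph the hub gene has the highest posterior probability among the genes, but its prior is also high. After the adjustment, gene 1 ranks first, which illustrates why the posterior alone would favour highly connected concepts over those specific to the source.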