Any concept in the biomedical literature, for instance a protein or a disease, can be treated as a source concept (depicted as a blue ball throughout the picture and the system). Authoritative databases such as UMLS or UniProtKB/Swiss-Prot may hold curated information about the concept and its factual relationships with other concepts. This information is captured, and every concept that has a 'factual' relationship with the source concept in any of the participating databases is included in the Knowlet of that concept. These 'factually associated concepts' are depicted in the Knowlet visualisation as solid green balls. In addition, the source concept may be mentioned together with other concepts in the same sentence in the literature. In that case, especially when the two concepts co-occur in multiple sentences, there is a high chance of a meaningful, sometimes causal, relationship between them. Most concepts that have a factual relationship are likely to be mentioned together in one or more sentences in the literature at large, but as we have mined only PubMed so far, there may be many factual associations that are not easy to recover from PubMed abstracts alone. For instance, many protein-protein interactions described in UniProtKB/Swiss-Prot cannot be found as co-occurrences in PubMed. Target concepts that co-occur at least once in the same sentence as the source concept are depicted as green rings in the visualisation of the Knowlet. The last category is formed by concepts that never co-occur with the source concept in a sentence in the indexed resources but whose own Knowlets have sufficient concepts in common with that of the source concept to be of potential interest. These concepts are depicted as yellow rings and could represent implicit associations. Over one million Knowlets have been created so far.
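The three categories of target concept described above can be illustrated with a minimal sketch. It is not the system's actual implementation: the names `Link`, `classify_target`, `factual_pairs`, `sentence_cooccurrence` and `knowlets`, the `min_shared` threshold standing in for "sufficient concepts in common", and the order in which the categories are tested are all assumptions made for illustration.

```python
from enum import Enum


class Link(Enum):
    """Visual category of a target concept within a source concept's Knowlet."""
    FACTUAL = "solid green ball"
    CO_OCCURRENCE = "green ring"
    ASSOCIATIVE = "yellow ring"


def classify_target(source, target, factual_pairs, sentence_cooccurrence,
                    knowlets, min_shared=2):
    """Return the Link category for (source, target), or None if unrelated.

    factual_pairs:         set of frozenset({a, b}) pairs taken from curated databases
    sentence_cooccurrence: dict mapping frozenset({a, b}) to a sentence count
    knowlets:              dict mapping a concept to the set of concepts in its Knowlet
    min_shared:            hypothetical threshold for "sufficient concepts in common"
    """
    pair = frozenset((source, target))
    # Assumption: a curated factual relationship takes precedence over co-occurrence.
    if pair in factual_pairs:
        return Link.FACTUAL
    # At least one shared sentence is enough for the co-occurrence category.
    if sentence_cooccurrence.get(pair, 0) >= 1:
        return Link.CO_OCCURRENCE
    # No direct link: compare the concepts found in the two Knowlets.
    shared = knowlets.get(source, set()) & knowlets.get(target, set())
    if len(shared) >= min_shared:
        return Link.ASSOCIATIVE
    return None
```

With this sketch, a pair that is listed in a curated database is reported as factual even if it also co-occurs in sentences; a real system could just as well record both kinds of evidence for the same pair.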
Each source concept has a relationship of varying strength with other (target) concepts, and each of these relationships has been assigned a value for the factual (F), co-occurrence (C) and associative (A) parameters. All Knowlets are dynamically coupled into the concept space. The semantic association between each concept pair is computed from these values. In the near future additional data will be added, such as co-expression statistics between genes.
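How F, C and A are combined is not specified here. Purely as an illustration, the sketch below assumes a weighted sum; the `Edge` type, the `semantic_association` function and the weights `w_f`, `w_c` and `w_a` are hypothetical and do not describe the published method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """F, C and A values stored for one source-target concept pair."""
    factual: float        # F: evidence from curated databases
    cooccurrence: float   # C: evidence from sentence co-occurrence
    associative: float    # A: evidence from overlap between the two Knowlets


def semantic_association(edge, w_f=1.0, w_c=1.0, w_a=1.0):
    """Combine F, C and A into one score (assumed form: a weighted sum)."""
    return (w_f * edge.factual
            + w_c * edge.cooccurrence
            + w_a * edge.associative)
```

Keeping the three values separate on each edge, rather than storing only a combined score, would let further evidence types such as co-expression statistics be added later as extra fields with their own weights.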
Mons et al. Genome Biology 2008 9:R89 doi:10.1186/gb-2008-9-5-r89