Boolean implication extraction process. The expression levels of each probeset are sorted, and a step function is fitted (using StepMiner) to the sorted expression levels so as to minimize the squared error between the original and the fitted values. A threshold t is chosen at the point where the step crosses the original data. The region between t - 0.5 and t + 0.5 is classified as 'intermediate', the region below t - 0.5 as 'low', and the region above t + 0.5 as 'high'. The examples show probesets for two genes, CDH1 and CDC2. CDH1 has a sharp rise between 6 and 9, and the StepMiner algorithm assigns a threshold in this region. The sorted CDC2 expression, however, is nearly linear, and the StepMiner algorithm assigns the threshold approximately in the middle of the line. A scatter plot illustrates the analysis. Each point in the scatter plot corresponds to a microarray experiment, where the x-axis value is CDC2 expression and the y-axis value is CDH1 expression. Boolean implication discovery is performed on a pair of probesets: it ignores all points that lie in the intermediate region of either probeset and analyzes the four quadrants of the scatter plot. Four asymmetric relationships (low ⇒ low, low ⇒ high, high ⇒ low, high ⇒ high) are discovered, each corresponding to exactly one sparse quadrant in the scatter plot; and two symmetric relationships (equivalent and opposite) are discovered, each corresponding to two diagonally opposite sparse quadrants.
Sahoo et al. Genome Biology 2008 9:R157 doi:10.1186/gb-2008-9-10-r157
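The steps in this caption can be sketched in a few lines of Python. This is an illustrative sketch, not the StepMiner or BooleanNet implementation. The function names (fit_step_threshold, classify, quadrant_counts, find_relationship) are hypothetical. Two simplifications are assumptions: the threshold is taken as the midpoint between the two fitted step levels, and a quadrant is called sparse when its count is at most a cutoff (max_sparse), because the caption does not say how sparseness is decided.

```python
def fit_step_threshold(values):
    """Fit a one-step function to the sorted values by least squares.

    Assumption: the threshold is the midpoint of the two fitted levels.
    """
    xs = sorted(values)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two expression values")
    total = sum(xs)
    total_sq = sum(v * v for v in xs)
    best = None
    left_sum = 0.0
    left_sq = 0.0
    for i in range(1, n):  # step placed between xs[i-1] and xs[i]
        left_sum += xs[i - 1]
        left_sq += xs[i - 1] ** 2
        right_sum = total - left_sum
        right_sq = total_sq - left_sq
        # Squared error of each side around its own mean.
        sse = (left_sq - left_sum ** 2 / i) + (right_sq - right_sum ** 2 / (n - i))
        if best is None or sse < best[0]:
            best = (sse, left_sum / i, right_sum / (n - i))
    _, low_level, high_level = best
    return (low_level + high_level) / 2


def classify(value, t, margin=0.5):
    """Label a value as 'low', 'high' or 'intermediate' around threshold t."""
    if value < t - margin:
        return "low"
    if value > t + margin:
        return "high"
    return "intermediate"


def quadrant_counts(xs, ys, tx, ty):
    """Count points per quadrant, skipping any point with an intermediate value."""
    counts = {(a, b): 0 for a in ("low", "high") for b in ("low", "high")}
    for x, y in zip(xs, ys):
        cx, cy = classify(x, tx), classify(y, ty)
        if "intermediate" in (cx, cy):
            continue
        counts[(cx, cy)] += 1
    return counts


def find_relationship(counts, max_sparse=0):
    """Map sparse quadrants to a relationship name, or None if none applies.

    Assumption: 'sparse' means count <= max_sparse (a stand-in criterion).
    """
    sparse = {q for q, c in counts.items() if c <= max_sparse}
    if sparse == {("low", "high"), ("high", "low")}:
        return "equivalent"
    if sparse == {("low", "low"), ("high", "high")}:
        return "opposite"
    asymmetric = {
        ("low", "high"): "x low => y low",
        ("low", "low"): "x low => y high",
        ("high", "high"): "x high => y low",
        ("high", "low"): "x high => y high",
    }
    if len(sparse) == 1:
        return asymmetric[next(iter(sparse))]
    return None
```

In use, fit_step_threshold would be called once per probeset to obtain tx and ty, after which quadrant_counts and find_relationship are applied to each pair of probesets. Each quadrant key is (x label, y label), so an empty ("low", "high") quadrant means that no experiment has x low together with y high, which is read as x low ⇒ y low.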