Principle of the DDM-MDS approach. A color code is used consistently in this figure to indicate the status of the TFBSs predicted by a PWM. In the first set of promoters a CRM of three TFBSs is present (reddish), whereas the second set of promoters contains a CRM of two TFBSs (greenish). TFBSs not relevant for the differential expression between the genes corresponding to the two promoter sets are indicated in gray. (a) Two matrices, each of which contains the number of predicted TFBSs per PWM and per promoter (counts) for one set of promoters of differentially regulated genes. These counts are obtained by scanning the promoters with a precompiled library of PWMs. The number of promoters is the same in both sets in this artificial example, but need not be (see normalization in Materials and methods). Two PWMs are considered associated at the TFBS level if their corresponding columns (PWM-vectors) in the matrix are similar. This similarity can be measured using a distance function. (b) Distance matrices summarizing all PWM associations are constructed for both sets of promoters. (c) Subtraction of those distance matrices gives the DDM. PWMs that predict TFBSs to the same extent in both promoter sets (false positives as well as true positives: gray), and hence are not involved in differential expression, will show low DD values among each other. The DD values among the PWMs with associated and overrepresented TFBS predictions (greenish and reddish) will be just as low, but the DD values between those PWMs and the non-involved ones will be much higher. By performing MDS on the DDM, we can map the PWMs onto two-dimensional space and distinguish PWMs whose TFBSs do not contribute to the observed differential gene expression, which will be mapped to the origin, from 'deviating' PWMs whose TFBSs are likely responsible for the observed differential gene expression.
(d) In the DDM-MDS plot, PWMs whose predicted TFBSs are strongly associated lie closer together than PWMs whose predicted TFBSs are less strongly associated.
De Bleser et al. Genome Biology 2007 8:R83 doi:10.1186/gb-2007-8-5-r83
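The steps in panels (a) to (d) can be sketched on toy data as follows. This is a minimal illustration, not the authors' implementation. The count values, the Euclidean distance, the use of absolute DD values as dissimilarities, and the choice of classical (Torgerson) MDS are all assumptions made here, as are the helper names `pwm_distance_matrix` and `classical_mds`. The normalization for promoter sets of unequal size is omitted because both toy sets have four promoters.

```python
import numpy as np

def pwm_distance_matrix(counts):
    """Pairwise Euclidean distances between the PWM columns of a
    promoters x PWMs count matrix (the distance function is an assumption)."""
    cols = counts.T.astype(float)               # one row per PWM-vector
    diff = cols[:, None, :] - cols[None, :, :]  # all pairwise differences
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(dissim, n_components=2):
    """Classical (Torgerson) MDS by eigendecomposition of the
    double-centered squared dissimilarity matrix."""
    n = dissim.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dissim ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Negative eigenvalues (non-Euclidean input) are clipped to zero.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# (a) Hypothetical counts: rows are promoters, columns are PWMs.
# PWMs 0 and 1 co-occur in set 1 only; PWMs 2 to 4 behave identically
# in both sets and stand in for the "gray" predictions.
set1 = np.array([[3, 3, 1, 0, 1],
                 [2, 2, 0, 1, 1],
                 [3, 3, 1, 0, 0],
                 [2, 2, 0, 1, 0]])
set2 = np.array([[0, 0, 1, 0, 1],
                 [0, 0, 0, 1, 1],
                 [0, 0, 1, 0, 0],
                 [0, 0, 0, 1, 0]])

# (b) One distance matrix per promoter set.
dist1 = pwm_distance_matrix(set1)
dist2 = pwm_distance_matrix(set2)

# (c) The DDM is the difference of the two distance matrices.
ddm = dist1 - dist2

# (d) Map the PWMs onto two dimensions. Taking |DD| as the
# dissimilarity is an assumption of this sketch.
coords = classical_mds(np.abs(ddm), n_components=2)

print(np.round(ddm, 3))
print(np.round(coords, 3))
```

In this toy case the DD values within the gray group and within the associated pair are zero, while the DD values between the two groups are clearly larger, so the associated pair lands on the same point in the plot, away from the gray PWMs. With only five PWMs the gray group does not sit exactly at the origin, because classical MDS centers the configuration on the mean of all points.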