Method summary. Figure 1A: Data preprocessing. High-confidence TF target genes of four primary classes are collected from several datasets (Steps 1-2) and merged into composite lists, with four extra classes for genes supported by multiple lines of evidence (Step 3). Process-specific gene lists (Step 4) and TF target genes are assembled into a regulatory matrix (Step 5) in which unrelated genes are assigned to an additional "baseline" class. Figure 1B: TF significance tests with multinomial logistic regression models. For a given TF, the alternative model H1 is a univariate multinomial regression model that relates the response (process genes) to one predictor (TF target genes), such that TF targets are linearly associated with the probabilities of process gene classes (Step 6). The null model H0 relates the response (process genes) only to the relative frequencies of its classes in the dataset (Step 7). A log-likelihood ratio test assesses whether H1 fits the data better than the simpler H0 model (Step 8). All TFs are tested independently (Step 9) and the results are then corrected for multiple testing (Step 10). ΔTF, transcription factor knockout strain; ChIP, chromatin immunoprecipitation; TF, transcription factor; TFBS, transcription factor binding site.
Reimand et al. Genome Biology 2012 13:R55 doi:10.1186/gb-2012-13-6-r55