PACK flowchart. (a) A schematic diagram of PACK, as used in this study. For each gene expression profile, an unbiased estimate of its kurtosis, K, is computed. Genes with negative kurtosis are selected because only these define large subgroups (of sizes >22% of the total sample size). Further unsupervised clustering may then be performed on this subset of negative-kurtosis profiles to find novel tumor subclasses. Alternatively, to find robust prognostic markers, negative-kurtosis profiles are filtered further based on whether there is evidence of bimodality (C = 2). This step requires a cluster inference algorithm and a model selection criterion to discard those profiles that are best described by a single Gaussian (C = 1; by random chance, Gaussian profiles may have negative kurtosis). Correlation to phenotypes (here categorical) is done with Fisher's test to evaluate whether the distribution of the categorical phenotype across the two clusters is significantly different from random. (b) Density curves of typical bimodal negative- and positive-kurtosis gene expression profiles. The x-axis shows gene expression on a log2 scale. PACK, Profile Analysis using Clustering and Kurtosis.
Teschendorff et al. Genome Biology 2007 8:R157 doi:10.1186/gb-2007-8-8-r157
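
The steps in panel (a) can be illustrated for a single gene with a minimal pure-Python sketch. This is not the authors' implementation: the caption does not specify the cluster inference algorithm or the model selection criterion, so a two-component EM fit compared by BIC is used here purely as a stand-in. All function names (`excess_kurtosis`, `fit_two`, `bic_single`, `fisher_exact_two_sided`, `pack_gene`) are hypothetical, and the phenotype is assumed to be binary.

```python
import math
from statistics import mean

def excess_kurtosis(x):
    """Bias-corrected sample excess kurtosis (G2); needs >= 4 non-constant values."""
    n, m = len(x), mean(x)
    m2 = sum((v - m) ** 2 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    g2 = m4 / m2 ** 2 - 3.0
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)

def log_normal(v, mu, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (v - mu) ** 2 / var)

def bic_single(x):
    """BIC of one Gaussian (C = 1, 2 free parameters); lower is better."""
    n, mu = len(x), mean(x)
    var = max(sum((v - mu) ** 2 for v in x) / n, 1e-12)
    return -2 * sum(log_normal(v, mu, var) for v in x) + 2 * math.log(n)

def fit_two(x, iters=100, floor=1e-6):
    """EM for a two-component 1-D Gaussian mixture (C = 2, 5 free parameters).
    Assumed stand-in for the cluster inference step. Returns (BIC, hard labels)."""
    n, s = len(x), sorted(x)
    mu = [mean(s[: n // 2]), mean(s[n // 2:])]  # deterministic start: median split
    var = [max(sum((v - mean(x)) ** 2 for v in x) / n, floor)] * 2
    w = [0.5, 0.5]
    for _ in range(iters):
        resp, ll = [], 0.0
        for v in x:  # E-step, computed in log space to avoid underflow
            la = math.log(w[0]) + log_normal(v, mu[0], var[0])
            lb = math.log(w[1]) + log_normal(v, mu[1], var[1])
            top = max(la, lb)
            lse = top + math.log(math.exp(la - top) + math.exp(lb - top))
            resp.append(math.exp(la - lse))  # responsibility of component 0
            ll += lse
        r = [resp, [1 - p for p in resp]]
        for k in range(2):  # M-step with floors on weights and variances
            nk = max(sum(r[k]), 1e-9)
            w[k] = min(max(nk / n, 1e-9), 1 - 1e-9)
            mu[k] = sum(p * v for p, v in zip(r[k], x)) / nk
            var[k] = max(sum(p * (v - mu[k]) ** 2 for p, v in zip(r[k], x)) / nk, floor)
    # ll and resp come from the final E-step
    return -2 * ll + 5 * math.log(n), [0 if p >= 0.5 else 1 for p in resp]

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d
    def prob(k):  # hypergeometric probability of k in the top-left cell
        return math.comb(r1, k) * math.comb(r2, c1 - k) / math.comb(n, c1)
    p_obs = prob(a)
    ks = range(max(0, c1 - r2), min(r1, c1) + 1)
    return sum(p for p in map(prob, ks) if p <= p_obs * (1 + 1e-9))

def pack_gene(profile, phenotype):
    """Fisher p-value if the profile passes both filters, else None."""
    if excess_kurtosis(profile) >= 0:        # keep negative kurtosis only
        return None
    bic2, labels = fit_two(profile)
    if bic2 >= bic_single(profile):          # best described by a single Gaussian
        return None
    pairs = list(zip(labels, phenotype))
    a = sum(1 for l, p in pairs if l == 0 and p)
    b = sum(1 for l, p in pairs if l == 0 and not p)
    c = sum(1 for l, p in pairs if l == 1 and p)
    d = sum(1 for l, p in pairs if l == 1 and not p)
    return fisher_exact_two_sided(a, b, c, d)
```

For example, `pack_gene(profile, phenotype)` takes one gene's log2 expression values and a matching list of binary phenotype labels. It returns `None` when the profile is rejected at either filter, and otherwise the p-value for the association between the two inferred clusters and the phenotype.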