The first global analysis of human microarray data, appearing in the October issue of Nature Genetics, maps which gene expression patterns are shared among different types of tumors and which are unique to particular ones (Nat Genet 2004, DOI: 10.1038/ng1434).
The cancer map is publicly available, as is GeneXPress, the data analysis and visualization engine used to create it. "The tool allows researchers to look not just at single genes, but gives a higher order view by looking at modules of multiple genes at the same time," coauthor Aviv Regev of Harvard University told us. "And it's very easy to use."
"The whole thing is done in an automated fashion and can work on gene sets from other species besides humans and processes that are not cancer," added Daphne Koller from Stanford University, who also worked on the project.
The researchers analyzed a cancer compendium of expression profile data from the Stanford Microarray Database and the Whitehead Institute Center for Genomic Research, measuring the expression of 14,145 genes across 1,975 published DNA microarrays spanning 22 tumor types. They then organized the genes into modules: sets of genes that act in concert to carry out a specific function. Lead author Eran Segal, from Stanford University, and colleagues started with 2,849 gene sets as candidate modules and ran an algorithm across all arrays to identify concordant, statistically significant expression patterns, weeding out redundant and spurious candidates.
Overall, the research group identified 456 statistically significant modules that span various processes and functions. The researchers then identified conditions in which different modules are induced or repressed, using extensive annotation of the arrays with 263 biological and clinical conditions, including tissue and tumor type, diagnostic and prognostic information, and molecular markers.
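To make the cross-referencing of modules with conditions more concrete, here is a minimal Python sketch of one standard way to ask whether a condition is over-represented among the arrays in which a module is active: a one-sided hypergeometric test. It assumes the set of arrays where a module is induced has already been determined. The function names, array IDs and toy counts are hypothetical illustrations, not the authors' code, statistics or data.

```python
from math import comb


def hypergeom_upper_tail(total, marked, drawn, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(total, marked, drawn).

    total   -- number of arrays in the compendium
    marked  -- arrays annotated with the condition
    drawn   -- arrays in which the module is active
    overlap -- arrays that are both annotated and active
    """
    upper = min(marked, drawn)
    # Count the ways to draw at least `overlap` annotated arrays.
    hits = sum(
        comb(marked, i) * comb(total - marked, drawn - i)
        for i in range(overlap, upper + 1)
    )
    return hits / comb(total, drawn)


def condition_enrichment(active_arrays, condition_arrays, all_arrays):
    """Hypothetical helper: p-value that a condition is enriched among the
    arrays where a module is active (all inputs are sets of array IDs)."""
    active = active_arrays & all_arrays
    condition = condition_arrays & all_arrays
    return hypergeom_upper_tail(
        len(all_arrays), len(condition), len(active), len(active & condition)
    )


# Toy example with 10 hypothetical arrays: the module is active in exactly
# the 4 arrays annotated with the condition.
arrays = {f"array{i}" for i in range(10)}
condition = {"array0", "array1", "array2", "array3"}
module_on = {"array0", "array1", "array2", "array3"}
p = condition_enrichment(module_on, condition, arrays)
print(p)  # p == 1/210, about 0.0048
```

Scoring hundreds of modules against hundreds of conditions this way would also call for a multiple-testing correction; the statistics the authors actually used are described in the paper itself.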
The resulting map cross-references modules with conditions and shows that activation of a number of modules could prove specific to particular tumor types. For instance, a growth-inhibitory module is significantly repressed in acute leukemia. The module consists primarily of growth suppressors (11 of its 16 genes), some of which repress ERK1, an activator of cell proliferation known to be constitutively active in acute leukemia, while others activate the apoptosis repressor p38. Only one of the suppressor genes, DUSP2, had previously been implicated in acute leukemia; the other genes in the module now offer new therapeutic targets, Regev said.
Other modules are shared across a wide range of malignancies, suggesting common tumor progression mechanisms. For example, the bone osteoblastic module, which includes genes associated with the proliferation and differentiation of bone-building cells, was also implicated in breast cancer, lung cancer, hepatocellular carcinoma, and acute lymphoblastic leukemia. Bone-related clinical conditions are associated with all of these malignancies, and bone metastasis in particular is a key feature of breast cancer.
"We formulate a hypothesis that primary breast cancer tumors have hijacked a module in the cell usually used by bone cells that gives an advantage for primary tumors as well," Regev said. "But this is a computational prediction that still needs to be validated experimentally."
Study author Nir Friedman from the Hebrew University of Jerusalem warned that the results should be interpreted with some caution, because the team combined diverse data sets whose raw data may lack directly comparable reference points. "We were very careful with our normalization approach, but there are other normalization approaches that can have different interpretations," Regev added.
Naftali Kaminski of the University of Pittsburgh Medical Center, who did not participate in this project, told us: "I was very doubtful this approach would work at first, because my intuition was the cross-laboratory batch effects would affect the data so much, they would only find noise. But they proved me wrong. I find it very exciting work."
"One problem with genomics studies is [that] a single group is limited by the number of samples they do. They present for us a unifying computational framework for multiple data sets together, where we can learn something about general mechanisms for lots of cancers or specific gene modules relevant to specific malignancies," Kaminski added. "It's a hypothesis-generating tool. Each marker needs to be checked, but I think it's a major advance to this field."
Arul Chinnaiyan at the University of Michigan in Ann Arbor, whose lab developed the public cancer bioinformatics resource Oncomine but who did not participate in this study, said: "There could be ways for our projects to potentially work together on this effort, to look at cancer with a systems biology perspective."