Genome-driven integrated classification of breast cancer validated in over 7,500 samples

H Raza Ali1,2,4, Oscar M Rueda1, Suet-Feung Chin1, Christina Curtis5, Mark J Dunning1, Samuel AJR Aparicio6 and Carlos Caldas1,3,4*

Author Affiliations

1 Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK

2 Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QP, UK

3 Department of Oncology, University of Cambridge, Addenbrooke's Hospital, Hills Road, Cambridge, CB2 0QQ, UK

4 Cambridge Experimental Cancer Medicine Centre and NIHR Cambridge Biomedical Research Centre, Cambridge University Hospitals NHS, Hills Road, Cambridge, CB2 0QQ, UK

5 Keck School of Medicine, University of Southern California, California, 90033, CA, USA

6 Department of Molecular Oncology, British Columbia Cancer Research Centre, Vancouver, British Columbia, V5Z 1L3, Canada

Genome Biology 2014, 15:431  doi:10.1186/s13059-014-0431-1

Published: 28 August 2014



IntClust is a classification of breast cancer comprising ten subtypes based on molecular drivers identified through the integration of genomic and transcriptomic data from 1,000 breast tumors and validated in a further 1,000. We present a reliable method for assigning breast tumors to the IntClust subtypes based on gene expression and demonstrate the clinical and biological validity of the IntClust classification.


We developed a gene expression-based approach for classifying breast tumors into the ten IntClust subtypes by using the ensemble profile of the index discovery dataset. We evaluate this approach in 983 independent samples for which the combined copy-number and gene expression IntClust classification was available. Only 24 of these samples (2.4%) are discordantly classified. Next, we compile a consolidated external dataset composed of a further 7,544 breast tumors and use our approach to classify all samples into the IntClust subtypes. All ten subtypes are observable in most studies at comparable frequencies. The IntClust subtypes are significantly associated with relapse-free survival and recapitulate patterns of survival observed previously. In studies of neo-adjuvant chemotherapy, IntClust reveals distinct patterns of chemosensitivity. Finally, patterns of expression of genomic drivers reported by TCGA (The Cancer Genome Atlas) are better explained by IntClust than by the PAM50 classifier.
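
The abstract does not say which algorithm maps an expression profile to a subtype, so the following is only an illustrative sketch of one common approach to this kind of task: nearest-centroid assignment by Pearson correlation, followed by a discordance check against reference labels. The function names, the toy centroids and the four-gene profiles are all hypothetical and are not the authors' method or data. Only the figures 24 and 983 come from the text above.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors.

    Assumes neither vector is constant (non-zero variance).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def classify(sample, centroids):
    """Return the label whose centroid correlates best with the sample."""
    return max(centroids, key=lambda label: pearson(sample, centroids[label]))


def discordance_rate(reference, predicted):
    """Fraction of samples whose predicted label differs from the reference."""
    mismatches = sum(r != p for r, p in zip(reference, predicted))
    return mismatches / len(reference)


# Made-up centroids over four hypothetical genes (not real IntClust profiles).
toy_centroids = {
    "A": [2.0, 1.0, -1.0, -2.0],
    "B": [-2.0, -1.0, 1.0, 2.0],
}
toy_label = classify([1.5, 0.5, -0.5, -1.0], toy_centroids)

# Figures reported in the abstract: 24 of 983 samples were discordant.
reported_rate = 24 / 983
print(toy_label, round(reported_rate, 4))
```

In a real evaluation, `discordance_rate` would be applied to the expression-only calls and the combined copy-number and expression calls for the same tumors; a rate of 24/983 corresponds to roughly 97.6% agreement.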


IntClust subtypes are reproducible in a large meta-analysis, show clinical validity and best capture variation in genomic drivers. IntClust is a driver-based breast cancer classification and is likely to become increasingly relevant as more targeted biological therapies become available.