Journal Article
PLoS One. 2011 Mar;6(3).
doi: 10.1371/journal.pone.0017795.

Do two machine-learning based prognostic signatures for breast cancer capture the same biological processes?

Yotam Drier, Eytan Domany
  • PMID: 21423753


It is well documented that different prognostic signatures for early-discovery breast cancer share very few genes, if any. This apparent discrepancy has been attributed to the limits of simple machine-learning identification and ranking techniques, and the biological relevance and meaning of the prognostic gene lists were questioned. Subsequently, proponents of the prognostic gene lists claimed that different lists do capture similar underlying biological processes and pathways. The present study scrutinizes the validity of this claim for two important gene lists that are the focus of current large-scale validation efforts. We performed careful enrichment analysis, controlling the effects of multiple testing in a manner that takes into account the nested, dependent structure of gene ontologies. In contradiction to several previous publications, we find that the only biological process or pathway for which statistically significant concordance can be claimed is cell proliferation, a process whose relevance and prognostic value were well known long before gene expression profiling. The claims reported by others, of wider concordance between the biological processes captured by the two prognostic signatures studied, either lacked statistical rigor or in fact addressed a different question.
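To make the kind of test described above concrete, the following is a minimal sketch of an enrichment-and-concordance check, not the authors' actual procedure. It assumes a one-sided hypergeometric test per annotation term and a plain Bonferroni correction as a conservative stand-in; the study itself describes a correction that accounts for the nested, dependent structure of gene ontologies, which this sketch does not attempt. All function names, gene identifiers, and term names are hypothetical toy values.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the annotation."""
    total = comb(N, n)
    top = min(n, K)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible draws contribute nothing to the sum.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1)) / total


def enriched_terms(signature, annotations, background, alpha=0.05):
    """Return {term: adjusted p} for terms enriched in one signature.

    Bonferroni adjustment (assumption): multiply each raw p-value by the
    number of terms tested. This ignores the nested structure of ontologies.
    """
    background = set(background)
    sig = set(signature) & background
    n_tests = len(annotations)
    significant = {}
    for term, genes in annotations.items():
        annotated = set(genes) & background
        k = len(sig & annotated)
        p = hypergeom_upper_tail(k, len(sig), len(annotated), len(background))
        p_adj = min(1.0, p * n_tests)
        if p_adj <= alpha:
            significant[term] = p_adj
    return significant


def concordant_terms(sig_a, sig_b, annotations, background, alpha=0.05):
    """Terms significantly enriched in both signatures."""
    a = enriched_terms(sig_a, annotations, background, alpha)
    b = enriched_terms(sig_b, annotations, background, alpha)
    return sorted(set(a) & set(b))


# Toy data (hypothetical): 100 background genes, two annotation terms,
# and two signatures that share no genes at all.
background = [f"g{i}" for i in range(100)]
annotations = {
    "proliferation": [f"g{i}" for i in range(0, 10)],
    "immune": [f"g{i}" for i in range(10, 20)],
}
sig_a = [f"g{i}" for i in range(0, 5)] + [f"g{i}" for i in range(50, 55)]
sig_b = [f"g{i}" for i in range(5, 10)] + ["g10"] + [f"g{i}" for i in range(60, 64)]

shared = concordant_terms(sig_a, sig_b, annotations, background)
print(shared)
```

In this toy setup the two signatures have no gene in common, yet both draw half of their genes from the same annotated set. This mirrors the question the abstract poses: whether disjoint gene lists can still agree at the level of biological processes, and whether that agreement survives multiple-testing correction.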
