Journal Article
. 2020 Aug; 10(1):14071.
doi: 10.1038/s41598-020-70832-2.

Machine learning for RNA sequencing-based intrinsic subtyping of breast cancer

Silvia Cascianelli 1, Ivan Molineris 2, Claudio Isella 2, Marco Masseroli 3, Enzo Medico 2
PMID: 32826944
27 references · 2 citations


Stratification of breast cancer (BC) into molecular subtypes by multigene expression assays is of demonstrated clinical utility. In principle, global RNA sequencing (RNA-seq) should enable the reconstruction of existing transcriptional classifications of BC samples. Yet it is not clear whether it is preferable to adapt to RNA-seq the classifiers originally developed with PCR or microarrays, or to reconstruct them through machine learning (ML). Hence, we focused on the robustness and portability of PAM50, a nearest-centroid classifier developed on microarray data to identify five BC "intrinsic subtypes". We found that standard PAM50 is profoundly affected by the composition of the sample cohort used for reference construction, and we propose a strategy, named AWCA, that mitigates this issue and improves both classification robustness (over 90% concordance) and prognostic ability; we also show that AWCA-based PAM50 can even be applied as a single-sample method. Furthermore, we explored five supervised learners to build robust, single-sample intrinsic subtype callers from RNA-seq data. In our ML-based survey, regularized multiclass logistic regression (mLR) displayed the best performance, which was further increased by ad hoc gene selection on the global transcriptome. On external test sets, mLR classifications reached 90% concordance with PAM50-based calls without the need for a reference sample; mLR's proven robustness and prognostic ability make it an equally valuable single-sample method to strengthen BC subtyping.
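The claim that standard PAM50 calls depend on the cohort used for reference construction can be illustrated with a toy nearest-centroid classifier. This is a minimal sketch assuming a PAM50-like procedure: each gene is median-centered over the cohort, then each sample is assigned to the centroid with the highest Spearman correlation. The four genes, the two centroids (`LumA`, `Basal`) and all expression values are hypothetical; real PAM50 uses 50 genes and five subtype centroids, and AWCA itself is not modeled here.

```python
from statistics import median

# Toy centroids over four HYPOTHETICAL genes (g0, g1 "luminal-like", g2, g3
# "basal-like"). Illustrative values only, not the published PAM50 centroids.
CENTROIDS = {
    "LumA": [1.0, 1.0, -1.0, -1.0],
    "Basal": [-1.0, -1.0, 1.0, 1.0],
}


def rank(values):
    """1-based ranks; tied values share their average rank, as Spearman requires."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values that starts at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the rank vectors."""
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no rank information
    return sxy / (sxx * syy) ** 0.5


def median_center(cohort):
    """Subtract each gene's median over the cohort: the cohort-dependent reference step."""
    n_genes = len(cohort[0])
    medians = [median(s[g] for s in cohort) for g in range(n_genes)]
    return [[s[g] - medians[g] for g in range(n_genes)] for s in cohort]


def classify_cohort(cohort, centroids):
    """Call each sample as the centroid with the highest Spearman correlation."""
    return [
        max(centroids, key=lambda name: spearman(sample, centroids[name]))
        for sample in median_center(cohort)
    ]


# Hypothetical raw expression profiles (same four genes as the centroids).
LUMINAL = [8.0, 8.0, 2.0, 2.0]
BASAL = [2.0, 2.0, 8.0, 8.0]
AMBIGUOUS = [5.0, 5.0, 5.0, 5.0]

print(classify_cohort([LUMINAL, LUMINAL, LUMINAL, AMBIGUOUS], CENTROIDS)[-1])  # Basal
print(classify_cohort([BASAL, BASAL, BASAL, AMBIGUOUS], CENTROIDS)[-1])  # LumA
```

The `AMBIGUOUS` profile is identical in both runs; only the cohort used to compute the per-gene medians changes, yet its call flips from `Basal` to `LumA`. A single-sample caller such as the mLR model, which needs no reference sample, is intended to avoid this kind of dependence.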

References

Cross comparison and prognostic assessment of breast cancer multigene signatures in a large population-based contemporary clinical series.
Johan Vallon-Christersson, Jari Häkkinen, +12 authors, Johan Staaf.
Sci Rep, 2019 Aug 23; 9(1). PMID: 31434940    Free PMC article.
Repeated observation of breast tumor subtypes in independent gene expression data sets.
Therese Sorlie, Robert Tibshirani, +13 authors, David Botstein.
Proc Natl Acad Sci U S A, 2003 Jun 28; 100(14). PMID: 12829800    Free PMC article.
Highly Cited.
Outcome signature genes in breast cancer: is there a unique set?
Liat Ein-Dor, Itai Kela, +2 authors, Eytan Domany.
Bioinformatics, 2004 Aug 17; 21(2). PMID: 15308542
Highly Cited.
Deep-learning approach to identifying cancer subtypes using high-dimensional genomic data.
Runpu Chen, Le Yang, Steve Goodison, Yijun Sun.
Bioinformatics, 2019 Oct 12; 36(5). PMID: 31603461    Free PMC article.
Breast cancer intrinsic subtype classification, clinical use and future trends.
Xiaofeng Dai, Ting Li, +4 authors, Bozhi Shi.
Am J Cancer Res, 2015 Dec 23; 5(10). PMID: 26693050    Free PMC article.
Highly Cited. Review.
US incidence of breast cancer subtypes defined by joint hormone receptor and HER2 status.
Nadia Howlader, Sean F Altekruse, +4 authors, Kathleen A Cronin.
J Natl Cancer Inst, 2014 Apr 30; 106(5). PMID: 24777111    Free PMC article.
Highly Cited.
Comprehensive Molecular Portraits of Invasive Lobular Breast Cancer.
Giovanni Ciriello, Michael L Gatza, +30 authors, Charles M Perou.
Cell, 2015 Oct 10; 163(2). PMID: 26451490    Free PMC article.
Highly Cited.
An Update on Breast Cancer Multigene Prognostic Tests-Emergent Clinical Biomarkers.
André Filipe Vieira, Fernando Schmitt.
Front Med (Lausanne), 2018 Sep 21; 5. PMID: 30234119    Free PMC article.
Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation.
Cole Trapnell, Brian A Williams, +6 authors, Lior Pachter.
Nat Biotechnol, 2010 May 04; 28(5). PMID: 20436464    Free PMC article.
Highly Cited.
Comprehensive molecular portraits of human breast tumours.
Cancer Genome Atlas Network.
Nature, 2012 Sep 25; 490(7418). PMID: 23000897    Free PMC article.
Highly Cited.
Molecular portraits of human breast tumours.
C M Perou, T Sørlie, +15 authors, D Botstein.
Nature, 2000 Aug 30; 406(6797). PMID: 10963602
Highly Cited.
limma powers differential expression analyses for RNA-sequencing and microarray studies.
Matthew E Ritchie, Belinda Phipson, +4 authors, Gordon K Smyth.
Nucleic Acids Res, 2015 Jan 22; 43(7). PMID: 25605792    Free PMC article.
Highly Cited.
Prognostic value of PAM50 and risk of recurrence score in patients with early-stage breast cancer with long-term follow-up.
Hege O Ohnstad, Elin Borgen, +11 authors, Bjørn Naume.
Breast Cancer Res, 2017 Nov 16; 19(1). PMID: 29137653    Free PMC article.
Assessment of Breast Cancer Risk Factors Reveals Subtype Heterogeneity.
Johanna Holm, Louise Eriksson, +5 authors, Kamila Czene.
Cancer Res, 2017 May 18; 77(13). PMID: 28512241
Machine learning applications in cancer prognosis and prediction.
Konstantina Kourou, Themis P Exarchos, +2 authors, Dimitrios I Fotiadis.
Comput Struct Biotechnol J, 2015 Mar 10; 13. PMID: 25750696    Free PMC article.
Highly Cited. Review.
Diagnosis of multiple cancer types by shrunken centroids of gene expression.
Robert Tibshirani, Trevor Hastie, Balasubramanian Narasimhan, Gilbert Chu.
Proc Natl Acad Sci U S A, 2002 May 16; 99(10). PMID: 12011421    Free PMC article.
Highly Cited.
DeepCC: a novel deep learning-based framework for cancer molecular subtype classification.
Feng Gao, Wei Wang, +5 authors, Xin Wang.
Oncogenesis, 2019 Aug 20; 8(9). PMID: 31420533    Free PMC article.
Supervised risk predictor of breast cancer based on intrinsic subtypes.
Joel S Parker, Michael Mullins, +17 authors, Philip S Bernard.
J Clin Oncol, 2009 Feb 11; 27(8). PMID: 19204204    Free PMC article.
Highly Cited.
Absolute assignment of breast cancer intrinsic molecular subtype.
Eric R Paquet, Michael T Hallett.
J Natl Cancer Inst, 2014 Dec 07; 107(1). PMID: 25479802
PCA-PAM50 improves consistency between breast cancer intrinsic and clinical subtyping reclassifying a subset of luminal A tumors as luminal B.
Praveen-Kumar Raj-Kumar, Jianfang Liu, +4 authors, Hai Hu.
Sci Rep, 2019 May 30; 9(1). PMID: 31138829    Free PMC article.
RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome.
Bo Li, Colin N Dewey.
BMC Bioinformatics, 2011 Aug 06; 12. PMID: 21816040    Free PMC article.
Highly Cited.
Development and verification of the PAM50-based Prosigna breast cancer gene signature assay.
Brett Wallden, James Storhoff, +16 authors, Joel S Parker.
BMC Med Genomics, 2015 Aug 25; 8. PMID: 26297356    Free PMC article.
Highly Cited.
Analytical validation of the PAM50-based Prosigna Breast Cancer Prognostic Gene Signature Assay and nCounter Analysis System using formalin-fixed paraffin-embedded breast tumor specimens.
Torsten Nielsen, Brett Wallden, +7 authors, James Storhoff.
BMC Cancer, 2014 Mar 15; 14. PMID: 24625003    Free PMC article.
Highly Cited.
Biological subtypes of breast cancer: Prognostic and therapeutic implications.
Ozlem Yersal, Sabri Barutca.
World J Clin Oncol, 2014 Aug 13; 5(3). PMID: 25114856    Free PMC article.
Highly Cited. Review.
Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications.
T Sørlie, C M Perou, +14 authors, A L Børresen-Dale.
Proc Natl Acad Sci U S A, 2001 Sep 13; 98(19). PMID: 11553815    Free PMC article.
Highly Cited.
Proteomic analysis of breast tumors confirms the mRNA intrinsic molecular subtypes using different classifiers: a large-scale analysis of fresh frozen tissue samples.
Sofia Waldemarson, Emila Kurbasic, +4 authors, Peter James.
Breast Cancer Res, 2016 Jul 01; 18(1). PMID: 27357824    Free PMC article.
Parity, hormones and breast cancer subtypes - results from a large nested case-control study in a national screening program.
Merete Ellingjord-Dale, Linda Vos, +3 authors, Giske Ursin.
Breast Cancer Res, 2017 Jan 25; 19(1). PMID: 28114999    Free PMC article.

Cited by

A Histone Acetylation Modulator Gene Signature for Classification and Prognosis of Breast Cancer.
Mengping Long, Wei Hou, Yiqiang Liu, Taobo Hu.
Curr Oncol, 2021 Feb 23; 28(1). PMID: 33617509    Free PMC article.
Artificial Intelligence in Bulk and Single-Cell RNA-Sequencing Data to Foster Precision Oncology.
Marco Del Giudice, Serena Peirone, +5 authors, Matteo Cereda.
Int J Mol Sci, 2021 May 01; 22(9). PMID: 33925407    Free PMC article.