Journal Article
PLoS One. 2014 Feb;9(1).
doi: 10.1371/journal.pone.0086309.

Integrative gene network construction to analyze cancer recurrence using semi-supervised learning

Chihyun Park, Jaegyoon Ahn, Hyunjin Kim, Sanghyun Park
PMID: 24497942


Background: Predicting cancer recurrence is an important research area in bioinformatics. It is challenging because the number of samples is small compared with the vast number of genes. There have been several attempts to predict cancer recurrence, but most studies employed a supervised approach, which can use only the few labeled samples available. Semi-supervised learning, which can also exploit unlabeled samples, is a promising alternative. However, there have been few attempts based on manifold assumptions to reveal the detailed roles of identified cancer genes in recurrence.

Results: To predict cancer recurrence, we proposed a novel semi-supervised learning algorithm based on a graph regularization approach. We transformed the gene expression data into a graph structure suitable for semi-supervised learning and integrated protein interaction data with the gene expression data to select functionally related gene pairs. We then predicted cancer recurrence by applying graph regularization to the constructed graph, which contains both labeled and unlabeled nodes.
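The pipeline described above (expression data turned into a graph, protein interactions used to pick gene pairs, regularization over labeled and unlabeled nodes) can be illustrated with a minimal NumPy sketch. The abstract does not specify the graph construction, edge weights, or regularizer, so everything here is an assumption: patients are nodes, genes are kept only if they belong to an interacting pair whose expression is also correlated, edges use a Gaussian kernel, and the regularization step is the closed-form label spreading of Zhou et al. ("Learning with local and global consistency"), swapped in as a generic graph-regularization method. The function names and the thresholds `min_abs_corr`, `sigma`, and `alpha` are hypothetical; this is not the authors' C++ implementation.

```python
import numpy as np


def select_gene_pairs(X, ppi_edges, min_abs_corr=0.5):
    """Keep protein-interaction pairs whose genes are also co-expressed.

    X is a samples x genes expression matrix; ppi_edges holds (i, j) column
    indices of interacting genes. The correlation filter and its threshold
    are illustrative assumptions, not the published selection rule.
    """
    C = np.corrcoef(X, rowvar=False)
    return [(i, j) for i, j in ppi_edges if abs(C[i, j]) >= min_abs_corr]


def build_sample_graph(X, sigma=1.0):
    """Dense Gaussian-kernel affinity matrix between samples (rows of X)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def propagate_labels(W, y, alpha=0.9):
    """Label spreading: y holds +1 (recurrence), -1 (no recurrence), 0 (unlabeled).

    Solves (I - alpha * S) F = y with S = D^{-1/2} W D^{-1/2}. F is proportional
    to the minimiser of a normalised graph-smoothness term plus a term pulling
    F towards y (which is 0 on unlabeled nodes).
    """
    inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = inv_sqrt[:, None] * W * inv_sqrt[None, :]
    F = np.linalg.solve(np.eye(len(y)) - alpha * S, y.astype(float))
    return np.where(F >= 0, 1, -1)


def predict_recurrence(X, y, ppi_edges, sigma=1.0, alpha=0.9):
    """Hypothetical end-to-end pipeline: select pairs, build graph, propagate."""
    pairs = select_gene_pairs(X, ppi_edges)
    genes = sorted({g for pair in pairs for g in pair})
    W = build_sample_graph(X[:, genes], sigma)
    return propagate_labels(W, y, alpha)
```

Every sample receives a predicted label, including the unlabeled ones; that is what lets the unlabeled patients shape the decision through the graph rather than being discarded as in a purely supervised model.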

Conclusions: Across three different cancer datasets, accuracy improved by 24.9% on average compared with existing supervised and semi-supervised methods. Functional enrichment analysis of the gene networks used for learning showed that these networks are significantly associated with biological functions related to cancer recurrence. Our algorithm was implemented in standard C++ using the Standard Template Library (STL) and is available for Linux and MS Windows. The executable program is freely available at:
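The abstract does not state which statistic the functional enrichment analysis used. A common choice for this kind of analysis is a one-sided hypergeometric test of whether an annotation term (for example, a Gene Ontology category) is overrepresented among a network's genes relative to a background gene set. The sketch below assumes that test; the function names `hypergeom_sf` and `enrichment` and the dictionary-of-gene-sets input format are hypothetical.

```python
from math import comb


def hypergeom_sf(N, K, n, k):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background, K: background genes annotated with the term,
    n: genes in the network, k: network genes annotated with the term.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enrichment(network_genes, annotations, background):
    """Return {term: p-value} for each annotation term (no multiple-testing correction)."""
    bg = set(background)
    net = set(network_genes) & bg
    pvals = {}
    for term, genes in annotations.items():
        ann = set(genes) & bg
        k = len(net & ann)  # overlap between the network and the term
        pvals[term] = hypergeom_sf(len(bg), len(ann), len(net), k)
    return pvals
```

Because many terms are usually tested at once, the resulting p-values would normally be adjusted for multiple testing (for example, with the Benjamini-Hochberg procedure) before a term is called significant.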
