Journal Article
2018 Sep; 2018:7538204.
doi: 10.1155/2018/7538204.

An Efficient Feature Selection Strategy Based on Multiple Support Vector Machine Technology with Gene Expression Data

Ying Zhang 1, Qingchun Deng 2, Wenbin Liang 3, Xianchun Zou 1
PMID: 30228989


The use of gene expression data for cancer diagnosis and classification has become an active research topic. Gene expression data sets usually contain a large number of tumor-free samples and are high-dimensional. To select determinant genes related to breast cancer from the initial gene expression data, we propose a new feature selection method: support vector machine based on recursive feature elimination and parameter optimization (SVM-RFE-PO). The grid search (GS) algorithm, the particle swarm optimization (PSO) algorithm, and the genetic algorithm (GA) are each applied to search for the optimal parameters during feature selection. The new method therefore comprises three algorithms: support vector machine based on recursive feature elimination and grid search (SVM-RFE-GS), support vector machine based on recursive feature elimination and particle swarm optimization (SVM-RFE-PSO), and support vector machine based on recursive feature elimination and genetic algorithm (SVM-RFE-GA). The selected optimal feature subsets are then used to train an SVM classifier for cancer classification. As baselines for comparison with SVM-RFE-PO, we also use random forest feature selection (RFFS), random forest feature selection with grid search (RFFS-GS), and the minimal redundancy maximal relevance (mRMR) algorithm. The results show that the feature subset selected by the SVM-RFE-PSO algorithm achieves a better area under the curve (AUC) on the testing data set. The algorithm is not only time-saving but also capable of extracting more representative and useful genes.
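To make the SVM-RFE-GS idea more concrete, the following is a minimal sketch and not the authors' implementation. It assumes scikit-learn is available and substitutes a synthetic data set for the breast cancer expression data. The function name `run_sketch`, the target of 10 selected features, the elimination step, and the parameter grid are all illustrative assumptions. Only the grid search variant is shown; the PSO and GA variants would replace the exhaustive grid with their own search over the same parameters.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def run_sketch(n_selected=10, seed=0):
    """Illustrative SVM-RFE with grid-searched parameters (assumed setup)."""
    # Synthetic stand-in for expression data: 150 samples, 200 "genes",
    # of which 10 carry class signal. Real data would be loaded instead.
    X, y = make_classification(
        n_samples=150, n_features=200, n_informative=10,
        n_redundant=0, class_sep=1.5, random_state=seed,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )

    pipeline = Pipeline([
        ("scale", StandardScaler()),
        # RFE ranks features by the squared weights of a linear SVM and
        # drops the lowest-ranked 20% of the original features per round.
        ("rfe", RFE(SVC(kernel="linear"),
                    n_features_to_select=n_selected, step=0.2)),
        # Final classifier trained on the selected subset.
        ("clf", SVC(kernel="rbf")),
    ])

    # Hypothetical grid; the values are assumptions for illustration only.
    param_grid = {
        "rfe__estimator__C": [0.1, 1.0],
        "clf__C": [1.0, 10.0],
        "clf__gamma": ["scale", 0.01],
    }
    search = GridSearchCV(pipeline, param_grid, scoring="roc_auc", cv=3)
    search.fit(X_train, y_train)

    best = search.best_estimator_
    support = best.named_steps["rfe"].support_  # boolean mask of kept features
    test_auc = roc_auc_score(y_test, best.decision_function(X_test))
    return {
        "n_selected": int(support.sum()),
        "best_params": search.best_params_,
        "test_auc": float(test_auc),
    }


result = run_sketch()
print(result)
```

Placing the scaler, the elimination step, and the classifier in one pipeline means feature selection is repeated inside every cross-validation fold, so the held-out fold never influences which features are kept. That is a design choice of this sketch, not something stated in the abstract.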
