Journal Article
BMC Bioinformatics. 2013 Dec;14 Suppl 14:S4.
doi: 10.1186/1471-2105-14-S14-S4.

Towards the integration, annotation and association of historical microarray experiments with RNA-seq

Shweta S Chavan, Michael A Bauer, Erich A Peterson, Christoph J Heuck, Donald J Johann
  • PMID: 24268045


Background: Transcriptome analysis by microarrays has produced important advances in biomedicine. For instance, in multiple myeloma (MM), microarray approaches led to the development of effective disease subtyping via cluster assignment and a 70-gene risk score. Both enabled an improved molecular understanding of MM and have provided prognostic information for clinical management. Many researchers are now transitioning to Next Generation Sequencing (NGS) approaches, and RNA-seq in particular, because of its discovery-based nature, improved sensitivity, and dynamic range. Additionally, RNA-seq allows for the analysis of gene isoforms, splice variants, and novel gene fusions. Given the large volume of historical microarray data, there is now a need to associate and integrate microarray and RNA-seq data via advanced bioinformatic approaches.

Methods: Custom software was developed following a model-view-controller (MVC) approach to integrate Affymetrix probe set IDs and gene annotation information from a variety of sources. The tool employs an assortment of strategies to integrate, cross-reference, and associate microarray and RNA-seq datasets.

Results: Output from a variety of transcriptome reconstruction and quantitation tools (e.g., Cufflinks) can be directly integrated and/or associated with Affymetrix probe set data, as well as with the necessary gene identifiers and/or symbols from a variety of sources. Strategies are employed to maximize annotation and cross-referencing coverage. Custom gene sets (e.g., the MM 70-gene risk score (GEP-70)) can be specified, and the tool can be incorporated directly into an RNA-seq pipeline.
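
The association step described above can be illustrated with a minimal Python sketch. It is not the authors' software: the function names, the alias table, the probe set IDs, and the gene symbols are all hypothetical toy values. The only assumption borrowed from a real tool is that the RNA-seq table carries `gene_short_name` and `FPKM` columns, as Cufflinks gene-level output does. The sketch matches RNA-seq genes to probe sets by gene symbol, falls back to an alias lookup when the symbol is not found, and can restrict the result to a custom gene set.

```python
import csv
import io
from collections import defaultdict

# Toy probe set annotation (made-up IDs and symbols, not real Affymetrix content).
PROBE_ANNOTATION = """probe_set_id\tgene_symbol
1001_at\tGENE1
1002_s_at\tGENE1
1003_at\tGENE2
1004_at\tGENE4
"""

# Toy RNA-seq quantitation table; column names follow Cufflinks gene-level output.
RNASEQ_TABLE = """gene_short_name\tFPKM
GENE1\t12.5
GENE2_OLD\t3.0
GENE3\t7.25
"""

# Hypothetical alias table mapping outdated symbols to current ones.
ALIASES = {"GENE2_OLD": "GENE2"}


def read_tsv(text):
    """Parse tab-separated text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def associate(probe_rows, rnaseq_rows, aliases, gene_set=None):
    """Attach probe set IDs to each RNA-seq gene, optionally filtered by gene_set."""
    # Index probe sets by gene symbol; one gene may have several probe sets.
    probes_by_symbol = defaultdict(list)
    for row in probe_rows:
        probes_by_symbol[row["gene_symbol"]].append(row["probe_set_id"])

    records = []
    for row in rnaseq_rows:
        symbol = row["gene_short_name"]
        if symbol in probes_by_symbol:
            strategy = "symbol"      # direct symbol match
        elif aliases.get(symbol) in probes_by_symbol:
            symbol = aliases[symbol]
            strategy = "alias"       # matched after alias lookup
        else:
            strategy = "unmatched"   # RNA-seq gene with no probe set
        if gene_set is not None and symbol not in gene_set:
            continue
        records.append({
            "gene_symbol": symbol,
            "fpkm": float(row["FPKM"]),
            "probe_set_ids": sorted(probes_by_symbol.get(symbol, [])),
            "strategy": strategy,
        })
    return records


if __name__ == "__main__":
    probes = read_tsv(PROBE_ANNOTATION)
    rnaseq = read_tsv(RNASEQ_TABLE)
    for record in associate(probes, rnaseq, ALIASES, gene_set={"GENE1", "GENE2"}):
        print(record)
```

Recording which strategy produced each match keeps the cross-referencing auditable, and unmatched genes are retained rather than dropped so that RNA-seq findings without a microarray counterpart remain visible.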

Conclusion: A novel bioinformatic approach that facilitates both the annotation of historical microarray data and its association with richer RNA-seq data is now assisting with the study of MM cancer biology.
