Journal Article
. 2019 May; 75(1):7-12.
doi: 10.1016/j.crad.2019.04.002.

Open access image repositories: high-quality data to enable machine learning research

F Prior, J Almeida, P Kathiravelu, T Kurc, K Smith, T J Fitzgerald, J Saltz
PMID: 31040006


Originally motivated by the need for research reproducibility and data reuse, large-scale, open-access information repositories have become key resources for training and testing advanced machine learning applications in biomedical and clinical research. To be of value, such repositories must provide large, high-quality data sets, where quality is defined as minimising variance due to data collection protocols and data misrepresentations. Curation is the key to quality. We have constructed a large public-access image repository, The Cancer Imaging Archive, dedicated to promoting open science in support of the global effort to diagnose and treat cancer. Drawing on this experience, and on our experience applying machine learning techniques to the analysis of radiology and pathology image data, we review the requirements that state-of-the-art machine learning applications place on such information repositories and how these requirements can be met.
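The abstract defines quality as minimising variance due to data collection protocols. As a rough illustration only, and not a description of the authors' curation workflow, the sketch below flags image series whose acquisition parameters differ from the most common value in a collection. It assumes metadata has already been extracted into plain dictionaries keyed by DICOM attribute keywords (SeriesInstanceUID, SliceThickness, KVP, ConvolutionKernel); the helper names and sample values are hypothetical.

```python
from collections import Counter


def dominant_protocol(series, keys):
    """Return the most common value of each acquisition parameter.

    Hypothetical helper: ties are resolved in favour of the value seen first,
    because Counter.most_common preserves insertion order among equal counts.
    """
    if not series:
        return {}
    return {key: Counter(s.get(key) for s in series).most_common(1)[0][0] for key in keys}


def flag_protocol_outliers(series, keys):
    """List (series UID, parameter, observed value, dominant value) for each deviation."""
    expected = dominant_protocol(series, keys)
    flags = []
    for s in series:
        for key in keys:
            if s.get(key) != expected[key]:
                flags.append((s["SeriesInstanceUID"], key, s.get(key), expected[key]))
    return flags


# Made-up metadata for three CT series; real values would come from DICOM headers.
keys = ["SliceThickness", "KVP", "ConvolutionKernel"]
collection = [
    {"SeriesInstanceUID": "series-1", "SliceThickness": 1.25, "KVP": 120, "ConvolutionKernel": "STANDARD"},
    {"SeriesInstanceUID": "series-2", "SliceThickness": 1.25, "KVP": 120, "ConvolutionKernel": "STANDARD"},
    {"SeriesInstanceUID": "series-3", "SliceThickness": 5.0, "KVP": 120, "ConvolutionKernel": "LUNG"},
]

for flag in flag_protocol_outliers(collection, keys):
    print(flag)
```

Here series-3 is flagged for its slice thickness and reconstruction kernel. Whether such a series should be excluded, harmonised, or simply annotated is a curation decision the code does not make; it only surfaces the variance.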
