Journal Article
. 2017 May; 22(1):244-251.
doi: 10.1109/JBHI.2017.2700722.

Deep Learning for Automated Extraction of Primary Sites From Cancer Pathology Reports

John X Qiu, Hong-Jun Yoon, Paul A Fearn, Georgia D Tourassi
PMID: 28475069
25 citations


Pathology reports are a primary source of information for cancer registries, which process high volumes of free-text reports annually. Information extraction and coding is a manual, labor-intensive process. In this study, we investigated deep learning, in particular a convolutional neural network (CNN), for extracting ICD-O-3 topographic codes from a corpus of breast and lung cancer pathology reports. We performed two experiments to assess the effects of class prevalence and inter-class transfer learning, comparing CNN performance against a more conventional term frequency vector space approach. The experiments were based on a set of 942 pathology reports with human expert annotations as the gold standard. In the class prevalence experiment, the deep learning models consistently outperformed the conventional approaches, with micro- and macro-F score increases of up to 0.132 and 0.226, respectively, when class labels were well populated. Specifically, the best-performing CNN achieved a micro-F score of 0.722 over 12 ICD-O-3 topography codes. Transfer learning provided a consistent but modest performance boost for the deep learning methods, although the trends depended on the CNN method and cancer site. These encouraging results demonstrate the potential of deep learning for automated abstraction of pathology reports.
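The abstract reports both micro- and macro-F scores. As a minimal sketch of how the two averages differ, assuming single-label classification (one topography code per report), the snippet below computes both from scratch. The helper name `f1_scores` and the toy labels are hypothetical illustrations, not the study's data or evaluation code; C50.9 (breast, NOS), C34.1 (upper lobe, lung) and C34.9 (lung, NOS) are used only as example ICD-O-3 topography codes.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label, multi-class predictions.

    Hypothetical helper for illustration; not the study's evaluation code.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1  # correct code assigned to this report
        else:
            fp[p] += 1  # the predicted code was wrong
            fn[t] += 1  # the true code was missed

    # Micro-F1: pool TP/FP/FN over all classes, so frequent codes dominate.
    total_tp = sum(tp.values())
    total_fp = sum(fp.values())
    total_fn = sum(fn.values())
    denom = 2 * total_tp + total_fp + total_fn
    micro = 2 * total_tp / denom if denom else 0.0

    # Macro-F1: compute F1 per class, then average with equal weight,
    # so poorly predicted rare codes pull the score down.
    per_class = []
    for c in labels:
        d = 2 * tp[c] + fp[c] + fn[c]
        per_class.append(2 * tp[c] / d if d else 0.0)
    macro = sum(per_class) / len(per_class) if per_class else 0.0
    return micro, macro


# Toy labels (hypothetical): breast (C50.9) is far more common
# than the two lung subsites.
y_true = ["C50.9"] * 6 + ["C34.1"] * 2 + ["C34.9"] * 2
y_pred = ["C50.9"] * 6 + ["C34.1", "C50.9", "C34.9", "C34.1"]

micro, macro = f1_scores(y_true, y_pred)
print(round(micro, 3), round(macro, 3))  # prints: 0.8 0.697
```

In this single-label setting, micro-F equals plain accuracy (8 of 10 correct), while macro-F weights each code equally, so errors on sparsely populated codes lower it more. This is one reason the two scores can shift by different amounts when class prevalence changes.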

Association of Pathological Fibrosis With Renal Survival Using Deep Neural Networks.
Vijaya B Kolachalama, Priyamvada Singh, +6 authors, Vipul C Chitalia.
Kidney Int Rep, 2018 May 05; 3(2). PMID: 29725651    Free PMC article.
Opportunities and challenges in developing deep learning models using electronic health records data: a systematic review.
Cao Xiao, Edward Choi, Jimeng Sun.
J Am Med Inform Assoc, 2018 Jun 13; 25(10). PMID: 29893864    Free PMC article.
Clinical Data Extraction and Normalization of Cyrillic Electronic Health Records Via Deep-Learning Natural Language Processing.
Boyang Zhao.
JCO Clin Cancer Inform, 2019 Oct 03; 3. PMID: 31577448    Free PMC article.
AI Meets Exascale Computing: Advancing Cancer Research With Large-Scale High Performance Computing.
Tanmoy Bhattacharya, Thomas Brettin, +13 authors, George Zaki.
Front Oncol, 2019 Oct 22; 9. PMID: 31632915    Free PMC article.
The Current Research Landscape on the Artificial Intelligence Application in the Management of Depressive Disorders: A Bibliometric Analysis.
Bach Xuan Tran, Roger S McIntyre, +6 authors, Roger C M Ho.
Int J Environ Res Public Health, 2019 Jun 21; 16(12). PMID: 31216619    Free PMC article.
Artificial Intelligence-Driven Structurization of Diagnostic Information in Free-Text Pathology Reports.
Pericles S Giannaris, Zainab Al-Taie, +11 authors, Dmitriy Shin.
J Pathol Inform, 2020 Mar 14; 11. PMID: 32166042    Free PMC article.
Identifying Acute Low Back Pain Episodes in Primary Care Practice From Clinical Notes: Observational Study.
Riccardo Miotto, Bethany L Percha, +4 authors, Ismail Nabeel.
JMIR Med Inform, 2020 Mar 05; 8(2). PMID: 32130159    Free PMC article.
Use of Natural Language Processing to Extract Clinical Cancer Phenotypes from Electronic Medical Records.
Guergana K Savova, Ioana Danciu, +5 authors, Jeremy L Warner.
Cancer Res, 2019 Aug 10; 79(21). PMID: 31395609    Free PMC article.
Machine learning mortality classification in clinical documentation with increased accuracy in visual-based analyses.
Susan M Slattery, Daniel C Knight, +3 authors, Karna Murthy.
Acta Paediatr, 2019 Nov 26; 109(7). PMID: 31762098    Free PMC article.
Cross-registry neural domain adaptation to extract mutational test results from pathology reports.
Anthony Rios, Eric B Durbin, +5 authors, Ramakanth Kavuluru.
J Biomed Inform, 2019 Aug 12; 97. PMID: 31401235    Free PMC article.
Using case-level context to classify cancer pathology reports.
Shang Gao, Mohammed Alawad, +6 authors, Georgia Tourassi.
PLoS One, 2020 May 13; 15(5). PMID: 32396579    Free PMC article.
Hierarchical attention networks for information extraction from cancer pathology reports.
Shang Gao, Michael T Young, +5 authors, Arvind Ramanathan.
J Am Med Inform Assoc, 2017 Nov 21; 25(3). PMID: 29155996    Free PMC article.
From Patient Engagement to Precision Oncology: Leveraging Informatics to Advance Cancer Care.
Ashley C Griffin, Umit Topaloglu, Sean Davis, Arlene E Chung.
Yearb Med Inform, 2020 Aug 22; 29(1). PMID: 32823322    Free PMC article.
Natural Language Processing to Ascertain Cancer Outcomes From Medical Oncologist Notes.
Kenneth L Kehl, Wenxin Xu, +5 authors, Deborah Schrag.
JCO Clin Cancer Inform, 2020 Aug 07; 4. PMID: 32755459    Free PMC article.
Automatic extraction of cancer registry reportable information from free-text pathology reports using multitask convolutional neural networks.
Mohammed Alawad, Shang Gao, +7 authors, Georgia Tourassi.
J Am Med Inform Assoc, 2019 Nov 12; 27(1). PMID: 31710668    Free PMC article.
The Use of Screencasts with Embedded Whole-Slide Scans and Hyperlinks to Teach Anatomic Pathology in a Supervised Digital Environment.
Mary Wong, Joseph Frye, Stacey Kim, Alberto M Marchevsky.
J Pathol Inform, 2019 Jan 05; 9. PMID: 30607306    Free PMC article.
Integration of Cancer Registry Data into the Text Information Extraction System: Leveraging the Structured Data Import Tool.
Faina Linkov, Jonathan C Silverstein, +8 authors, Michael J Becich.
J Pathol Inform, 2019 Jan 22; 9. PMID: 30662793    Free PMC article.
Segmentation of Glomeruli Within Trichrome Images Using Deep Learning.
Shruti Kannan, Laura A Morgan, +9 authors, Vijaya B Kolachalama.
Kidney Int Rep, 2019 Jul 19; 4(7). PMID: 31317118    Free PMC article.
Scalable deep text comprehension for Cancer surveillance on high-performance computing.
John X Qiu, Hong-Jun Yoon, +6 authors, Georgia D Tourassi.
BMC Bioinformatics, 2018 Dec 24; 19(Suppl 18). PMID: 30577743    Free PMC article.
Findings from the 2019 International Medical Informatics Association Yearbook Section on Health Information Management.
Meryl Bloomrosen, Eta S Berner, Section Editors for the IMIA Yearbook Section on Health Information Management.
Yearb Med Inform, 2019 Aug 17; 28(1). PMID: 31419817    Free PMC article.
Deep active learning for classifying cancer pathology reports.
Kevin De Angeli, Shang Gao, +9 authors, Georgia Tourassi.
BMC Bioinformatics, 2021 Mar 23; 22(1). PMID: 33750288    Free PMC article.
Deep Learning Approaches Substantially Improve Automated Extraction of Information from Free-Text Medical Reports.
Tiffany Ting Liu.
Radiol Artif Intell, 2019 Aug 07; 1(5). PMID: 33939786    Free PMC article.
Accelerated training of bootstrap aggregation-based deep information extraction systems from cancer pathology reports.
Hong-Jun Yoon, Hilda B Klasky, +10 authors, Georgia D Tourassi.
J Biomed Inform, 2020 Sep 13; 110. PMID: 32919043    Free PMC article.
An unsupervised style normalization method for cytopathology images.
Xihao Chen, Jingya Yu, +7 authors, Shaoqun Zeng.
Comput Struct Biotechnol J, 2021 Jul 22; 19. PMID: 34285783    Free PMC article.
Limitations of Transformers on Clinical Text Classification.
Shang Gao, Mohammed Alawad, +9 authors, Georgia D Tourassi.
IEEE J Biomed Health Inform, 2021 Feb 27; PP. PMID: 33635801    Free PMC article.