
Development of an algorithm using natural language processing to identify metastatic breast cancer patients from clinical notes.


Background: Determining a patient's metastatic status is important for outcomes research and for assessing candidacy for clinical trials. Structured data in the EMR may not always capture metastatic status, so it is useful to extract it automatically from physician notes. Contextual understanding of the notes is needed to resolve issues such as a) local vs distal metastasis; b) statements about a family history of metastasis, or a physician instructing the patient to watch for certain signs of metastasis; c) text indicating suspicion of metastasis or absence of metastasis; d) indirect utterances, e.g. "cancer has spread to the bone"; and e) corrections to previous findings.

Methods: We used a set of 20,138 breast cancer patients from the Concerto HealthAI real-world oncology dataset, which includes data from CancerLinQ Discovery, to build and validate a set of NLP algorithms. A total of 5,300 sentences from 1,500 patients were annotated, and the algorithms were manually validated by data abstractors for 500 patients. The following algorithms were developed: 1) classification of a sentence into 3 classes: Distal/Local metastasis, Suspicious, and Other; 2) classification of a sentence into 2 classes: Distal or Local; 3) classification of a patient into 2 classes: Distal metastasis or Not distal metastasis; and 4) multi-label classification for detecting sites of metastasis. Sentence-level algorithms were built using deep learning, and sentence-level predictions were aggregated to the patient level using machine learning (ML) approaches that included temporal features. A pretrained ULMFiT model was fine-tuned on Concerto HealthAI's corpus for the sentence classification tasks.

Results: At the sentence level, we obtained an accuracy of 0.85 for the distal/local vs suspicious vs other (irrelevant) model and 0.97 for the distal vs not distal metastasis model. Our patient-level metrics are shown in the table. The classes used for sites of metastasis are Brain, Bone, Lung, Liver, Distant Lymph nodes, and Unknown sites.
A subset accuracy (mean fraction of labels that match) of 0.93 was obtained on the held-out test set at the patient level.

Conclusions: Metastatic status and site of metastasis can be reliably extracted automatically from clinical notes using deep learning techniques. This information will be valuable for clinical trial matching, outcomes research, and other applications.

Table. Patient-level metrics
Distal Metastasis       0.89   0.77   0.82
No Distal Metastasis    0.92   0.97   0.94
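The Background lists contexts that decide whether a sentence actually asserts metastasis. As a toy illustration of those contexts and of the sentence-level labels in the Methods, the Python sketch below applies hand-written cue patterns and chains the 3-class task (Distal/Local, Suspicious, Other) with the 2-class task (Distal vs Local). This is not the authors' approach, which used a fine-tuned deep learning model. The cue lists, the function names, and the rule that an unqualified metastasis mention counts as distal are hypothetical, and chaining the two tasks is an assumption, since the abstract lists them as separate algorithms. Corrections to previous findings (issue e) cannot be handled sentence by sentence and are left to patient-level aggregation.

```python
import re

# Hypothetical cue patterns, for illustration only; not the authors' lexicon or model.
NON_PATIENT = re.compile(r"\b(family history|mother|father|sister|brother|aunt|instructed|signs? of)\b")
NEGATED = re.compile(r"\b(no|negative for|without|free of)\b")
SUSPECTED = re.compile(r"\b(suspicious|concerning for|cannot rule out|possible|questionable)\b")
METASTASIS = re.compile(r"metasta|spread to|\bmets\b")
DISTAL_SITE = re.compile(r"\b(bone|brain|lungs?|liver|hepatic|osseous|pulmonary)\b")
LOCAL_SITE = re.compile(r"\b(axillary|regional|local|sentinel|chest wall)\b")


def classify_sentence(sentence: str) -> str:
    """Stage 1: return 'Distal/Local', 'Suspicious' or 'Other'."""
    s = sentence.lower()
    # No metastasis mention, someone other than the patient, an instruction, or a negation -> Other.
    if not METASTASIS.search(s) or NON_PATIENT.search(s) or NEGATED.search(s):
        return "Other"
    if SUSPECTED.search(s):
        return "Suspicious"
    return "Distal/Local"


def classify_extent(sentence: str) -> str:
    """Stage 2, for 'Distal/Local' sentences only: return 'Distal' or 'Local'."""
    s = sentence.lower()
    if LOCAL_SITE.search(s) and not DISTAL_SITE.search(s):
        return "Local"
    # Assumption: a metastasis mention without a local-only site is treated as distal.
    return "Distal"


def cascade(sentence: str) -> str:
    """Run stage 1, then stage 2 only on sentences that stage 1 labels 'Distal/Local'."""
    label = classify_sentence(sentence)
    return classify_extent(sentence) if label == "Distal/Local" else label


print(cascade("Cancer has spread to the bone."))                        # Distal
print(cascade("Metastatic carcinoma in 2 of 3 axillary lymph nodes."))  # Local
print(cascade("Lesion in liver is concerning for metastasis."))         # Suspicious
print(cascade("Mother had metastatic ovarian cancer."))                 # Other
```

Rules like these break easily, which is why contextual understanding matters: cascade("Biopsy showed metastatic carcinoma; no further imaging planned.") returns "Other", because the unrelated "no" triggers the negation pattern.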
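The abstract does not give the fine-tuning settings for the pretrained ULMFiT model. For orientation, ULMFiT as published (Howard and Ruder, 2018) fine-tunes with discriminative learning rates, where each layer's rate is the rate of the layer above divided by 2.6, and with a slanted triangular learning-rate schedule: a short linear warm-up followed by a long linear decay. The sketch computes both using the defaults suggested in that paper (cut_frac = 0.1, ratio = 32). The value lr_max = 0.01 and the function names are assumptions, not settings reported for this study.

```python
import math


def slanted_triangular_lr(t, total_steps, lr_max=0.01, cut_frac=0.1, ratio=32):
    """ULMFiT slanted triangular learning rate at iteration t (0 <= t <= total_steps)."""
    # Iteration at which the warm-up ends and the decay begins.
    cut = max(1, math.floor(total_steps * cut_frac))
    if t < cut:
        p = t / cut  # linear increase
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))  # linear decrease
    # p goes roughly 0 -> 1 -> 0, so the rate goes lr_max / ratio -> lr_max -> lr_max / ratio.
    return lr_max * (1 + p * (ratio - 1)) / ratio


def discriminative_lrs(top_lr, n_layers, factor=2.6):
    """Per-layer learning rates, lowest layer first; each layer gets the rate above it / factor."""
    return [top_lr / factor ** (n_layers - 1 - i) for i in range(n_layers)]


schedule = [slanted_triangular_lr(t, 1000) for t in range(1001)]
print(schedule.index(max(schedule)))  # 100
print(discriminative_lrs(0.01, 3))    # lowest layer smallest, top layer 0.01
```

With total_steps = 1000 the rate peaks at step 100 (cut_frac times total_steps) and falls back to lr_max / 32 at the final step.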
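The Methods state that sentence-level predictions were aggregated per patient with ML approaches using temporal features, but they do not list the features. Below is a minimal sketch of what such aggregation could look like, assuming each sentence carries a note date, a predicted label, and a probability of distal metastasis. The SentencePrediction type, the index_date parameter, and every feature name are hypothetical. A feature such as latest_is_distal is one way to let a downstream classifier account for later notes that correct earlier findings.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SentencePrediction:
    """Hypothetical record for one sentence-level prediction."""
    note_date: date
    label: str        # e.g. "Distal", "Local", "Suspicious", "Other"
    p_distal: float   # model probability that the sentence asserts distal metastasis


def patient_features(preds, index_date):
    """Aggregate one patient's sentence predictions into hypothetical temporal features."""
    distal = [p for p in preds if p.label == "Distal"]
    relevant = [p for p in preds if p.label != "Other"]
    latest = max(relevant, key=lambda p: p.note_date) if relevant else None
    last_distal = max((p.note_date for p in distal), default=None)
    return {
        "n_distal": len(distal),
        "max_p_distal": max((p.p_distal for p in preds), default=0.0),
        "frac_distal": len(distal) / len(relevant) if relevant else 0.0,
        # -1 is a sentinel for "no distal mention", since many classifiers cannot take None.
        "days_since_last_distal": (index_date - last_distal).days if last_distal is not None else -1,
        # Lets a classifier weigh the most recent relevant statement, e.g. a later correction.
        "latest_is_distal": latest is not None and latest.label == "Distal",
    }


preds = [
    SentencePrediction(date(2019, 1, 10), "Distal", 0.9),
    SentencePrediction(date(2019, 3, 1), "Other", 0.05),
    SentencePrediction(date(2019, 5, 2), "Suspicious", 0.4),
]
print(patient_features(preds, date(2019, 6, 1)))
```

The resulting dictionary would then be passed to a patient-level classifier (for example, logistic regression or gradient boosting); the abstract does not say which model was used.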
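Site detection is a multi-label problem: each patient can have any subset of the six site classes. The sketch below encodes site sets as multi-hot vectors and computes two metrics. The abstract's definition of subset accuracy, the mean fraction of labels that match, reads most naturally as per-patient label agreement averaged over patients (equivalently 1 minus the Hamming loss). Note that scikit-learn uses "subset accuracy" for the stricter exact-match ratio, where a patient's whole label set must match. Both are computed here for comparison; the function names are illustrative.

```python
SITES = ["Brain", "Bone", "Lung", "Liver", "Distant Lymph nodes", "Unknown sites"]


def to_multi_hot(sites):
    """Encode a set of site names as a 0/1 vector in SITES order."""
    return [1 if site in sites else 0 for site in SITES]


def exact_match_ratio(y_true, y_pred):
    """Fraction of patients whose full predicted label vector equals the true one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_label_agreement(y_true, y_pred):
    """Per-patient fraction of matching labels, averaged over patients (1 - Hamming loss)."""
    per_patient = [sum(a == b for a, b in zip(t, p)) / len(t) for t, p in zip(y_true, y_pred)]
    return sum(per_patient) / len(per_patient)


y_true = [to_multi_hot({"Bone"}), to_multi_hot({"Bone", "Liver"})]
y_pred = [to_multi_hot({"Bone"}), to_multi_hot({"Bone"})]
print(exact_match_ratio(y_true, y_pred))               # 0.5
print(round(mean_label_agreement(y_true, y_pred), 3))  # 0.917
```

Missing one site for one patient halves the exact-match ratio here but costs only one label in twelve under per-label agreement, so the two definitions can give quite different numbers on the same predictions.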
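The table gives three figures per class without column names. Per-class results of this kind are often reported as precision, recall, and F1-score, but that reading is an assumption. The sketch shows how those three quantities are computed for one class from patient-level labels; the label strings and the per_class_metrics name are hypothetical.

```python
def per_class_metrics(y_true, y_pred, cls):
    """Precision, recall and F1 for one class, treating `cls` as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)  # correctly predicted as cls
    fp = sum(1 for t, p in pairs if t != cls and p == cls)  # wrongly predicted as cls
    fn = sum(1 for t, p in pairs if t == cls and p != cls)  # cls cases that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


y_true = ["Distal", "Distal", "Distal", "No distal", "No distal", "No distal"]
y_pred = ["Distal", "Distal", "No distal", "No distal", "No distal", "No distal"]
for cls in ("Distal", "No distal"):
    print(cls, tuple(round(x, 3) for x in per_class_metrics(y_true, y_pred, cls)))
# Distal (1.0, 0.667, 0.8)
# No distal (0.75, 1.0, 0.857)
```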

Information & Authors


Published In

Journal of Clinical Oncology
Pages: e14056


Published in print: May 20, 2020
Published online: May 25, 2020





Krishna Kumar Swaminathan
Concerto HealthAI, Bangalore, India;
Emma Mendonca
Concerto HealthAI, Bengaluru, India;
Pranay Mukherjee
Concerto HealthAI, Bengaluru, India;
Karpagavalli Thirumalai
Concerto HealthAI, Bangalore, India;
Rachel Newsome
Concerto HealthAI, Boston, MA;
Babu Narayanan
Concerto HealthAI, Bangalore, India;

Article Citation

Krishna Kumar Swaminathan, Emma Mendonca, Pranay Mukherjee, Karpagavalli Thirumalai, Rachel Newsome, Babu Narayanan
Journal of Clinical Oncology 2020 38:15_suppl, e14056-e14056
