Background: Determining a patient's metastatic status is important for outcomes research and for assessing candidacy for clinical trials. Structured data in electronic medical records (EMR) may not always capture metastatic status, so it is useful to extract it automatically from physician notes. Contextual understanding of the notes is needed to resolve issues such as a) local vs distal metastasis; b) statements involving family history of metastasis, or a physician instructing the patient to look for certain signs of metastasis; c) text indicating suspicion or absence of metastasis; d) indirect utterances, e.g. "cancer has spread to the bone"; and e) corrections to previous findings. Methods: We used a set of 20138 breast cancer patients from the Concerto HealthAI real-world oncology dataset, which includes data from CancerLinQ Discovery, to build and validate a set of NLP algorithms. A total of 5300 sentences from 1500 patients were annotated, and the algorithms were manually validated by data abstractors on 500 patients. The following algorithms were developed: 1) classification of a sentence into 3 classes: Distal/Local metastasis, Suspicious, and Other; 2) classification of a sentence into 2 classes: Distal or Local; 3) classification of a patient into 2 classes: Distal metastasis or not distal metastasis; 4) multi-label classification for detecting sites of metastasis. Sentence-level algorithms were built using deep learning, and patient-level aggregation of sentence-level predictions was done using machine learning approaches that included temporal features. A pretrained ULMFiT model was fine-tuned on Concerto HealthAI's corpus for the sentence classification tasks. Results: At the sentence level, we obtained an accuracy of 0.85 for the distal/local vs suspicious vs irrelevant model and 0.97 for the distal vs not distal metastasis model. Our patient-level metrics are shown in the table. The classes used for sites of metastasis are Brain, Bone, Lung, Liver, Distant Lymph nodes, and Unknown sites. 
A subset accuracy (mean fraction of labels which match) of 0.93 was obtained on the hold-out test set at the patient level. Conclusions: Metastatic status and site of metastasis can be reliably extracted automatically from clinical notes using deep learning techniques. This information will be valuable for clinical trial matching, outcomes research, and other applications.
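The abstract does not give the formula behind the reported multi-label metric. As a minimal sketch under stated assumptions, the Python below shows two common readings of such a score for the six site classes: exact-match subset accuracy (a patient counts only if the full predicted set of sites equals the annotated set) and the mean per-patient fraction of labels that agree. The function names and the example patients are hypothetical illustrations, not the authors' code or data.

```python
from typing import Sequence, Set

# Site classes as listed in the abstract.
SITES = ["Brain", "Bone", "Lung", "Liver", "Distant Lymph nodes", "Unknown"]


def _check(true_sets: Sequence[Set[str]], pred_sets: Sequence[Set[str]]) -> None:
    if len(true_sets) != len(pred_sets):
        raise ValueError("true_sets and pred_sets must have the same length")
    if not true_sets:
        raise ValueError("at least one patient is required")


def subset_accuracy(true_sets: Sequence[Set[str]],
                    pred_sets: Sequence[Set[str]]) -> float:
    """Fraction of patients whose predicted site set equals the true set exactly."""
    _check(true_sets, pred_sets)
    exact = sum(1 for t, p in zip(true_sets, pred_sets) if set(t) == set(p))
    return exact / len(true_sets)


def label_match_fraction(true_sets: Sequence[Set[str]],
                         pred_sets: Sequence[Set[str]],
                         labels: Sequence[str] = SITES) -> float:
    """Mean over patients of the fraction of labels whose presence/absence agrees."""
    _check(true_sets, pred_sets)
    total = 0.0
    for t, p in zip(true_sets, pred_sets):
        agree = sum(1 for lab in labels if (lab in t) == (lab in p))
        total += agree / len(labels)
    return total / len(true_sets)


if __name__ == "__main__":
    # Hypothetical patients, for illustration only.
    true_sites = [{"Bone"}, {"Bone", "Liver"}, {"Brain"}, set()]
    pred_sites = [{"Bone"}, {"Bone"}, {"Brain"}, set()]
    print(subset_accuracy(true_sites, pred_sites))
    print(label_match_fraction(true_sites, pred_sites))
```

In this toy example one patient is missing a single site, so the exact-match score drops by a full patient while the per-label score drops only slightly; the two readings can therefore differ noticeably on the same predictions.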