Assessing Large Language Models for Oncology Data Inference From Radiology Reports.
APA
Chen LC, Zack T, et al. (2024). Assessing large language models for oncology data inference from radiology reports. JCO Clinical Cancer Informatics, 8, e2400126. https://doi.org/10.1200/CCI.24.00126
MLA
Chen LC, et al. "Assessing Large Language Models for Oncology Data Inference From Radiology Reports." JCO Clinical Cancer Informatics, vol. 8, 2024, p. e2400126.
PMID
39661914
Abstract
[PURPOSE] We examined the effectiveness of proprietary and open large language models (LLMs) in detecting disease presence, location, and treatment response in pancreatic cancer from radiology reports.
[METHODS] We analyzed 203 deidentified radiology reports, manually annotated for disease status, location, and indeterminate nodules needing follow-up. Using generative pre-trained transformer (GPT)-4, GPT-3.5-turbo, and open models such as Gemma-7B and Llama3-8B, we employed strategies such as ablation and prompt engineering to boost accuracy. Discrepancies between human and model interpretations were reviewed by a secondary oncologist.
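The zero-shot prompt-engineering strategy described above can be sketched as a simple template-assembly step. This is a hypothetical illustration only: the study's actual prompt wording and label set are not reproduced in the abstract, so the option list and phrasing below are assumptions.

```python
# Hypothetical label set; the paper's actual annotation categories
# are not given in the abstract.
DISEASE_STATUS_OPTIONS = [
    "no evidence of disease",
    "stable disease",
    "improved disease",
    "worsened disease",
]

def build_status_prompt(report_text: str) -> str:
    """Assemble a zero-shot disease-status classification prompt
    (illustrative wording, not the study's verbatim prompt)."""
    options = "; ".join(DISEASE_STATUS_OPTIONS)
    return (
        "You are assisting with oncology chart review. Read the radiology "
        f"report below and classify the patient's disease status as exactly "
        f"one of the following: {options}.\n\n"
        f"Report:\n{report_text}\n\n"
        "Answer with one option and nothing else."
    )
```

The same string could then be sent unchanged to a proprietary API or a locally hosted open model, which is what makes zero-shot comparisons across GPT-4, Mistral-7B, and Llama3-8B straightforward.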
[RESULTS] Among 164 patients with pancreatic tumors, GPT-4 showed the highest accuracy in inferring disease status, achieving 75.5% correctness (micro-averaged F1). The open models Mistral-7B and Llama3-8B performed comparably, with accuracies of 68.6% and 61.4%, respectively. Mistral-7B excelled at deriving correct inferences directly from objective findings. Most tested models proficiently identified disease-containing anatomic locations from a list of choices, with GPT-4 and Llama3-8B showing near-parity in precision and recall for disease-site identification. However, open models struggled to differentiate benign from malignant postsurgical changes, which reduced their precision in identifying findings indeterminate for cancer. A secondary review occasionally favored GPT-3.5's interpretations, indicating variability in human judgment.
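The "correctness (F1-micro)" phrasing above reflects a standard identity: for single-label multiclass classification, micro-averaged F1 pools true positives, false positives, and false negatives across all classes, and because every misclassification counts as exactly one false positive and one false negative, micro-F1 equals plain accuracy. A minimal stdlib-only sketch (label names are illustrative, not the study's):

```python
def f1_micro(y_true, y_pred):
    """Micro-averaged F1 for single-label multiclass predictions.

    Pools TP/FP/FN over all classes. In the single-label setting each
    error is simultaneously one FP (for the predicted class) and one
    FN (for the true class), so precision == recall == accuracy, and
    therefore micro-F1 == accuracy.
    """
    tp = sum(t == p for t, p in zip(y_true, y_pred))
    fp = fn = len(y_true) - tp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

This is why a single "percent correct" number can legitimately be reported as F1-micro when each report receives exactly one disease-status label.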
[CONCLUSION] LLMs, especially GPT-4, are proficient in deriving oncologic insights from radiology reports. Their performance is enhanced by effective summarization strategies, demonstrating their potential in clinical support and health care analytics. This study also underscores the possibility of zero-shot open model utility in environments where proprietary models are restricted. Finally, by providing a set of annotated radiology reports, this paper presents a valuable data set for further LLM research in oncology.
MeSH Terms
Humans; Natural Language Processing; Pancreatic Neoplasms; Radiology; Electronic Health Records; Algorithms