Artificial intelligence for immunotherapy response assessment in lung cancer using PET/CT reports.
APA
Ismayilov R, Altundag O, et al. (2025). Artificial intelligence for immunotherapy response assessment in lung cancer using PET/CT reports. Japanese journal of radiology, 43(12), 2042-2050. https://doi.org/10.1007/s11604-025-01840-3
MLA
Ismayilov R, et al. "Artificial intelligence for immunotherapy response assessment in lung cancer using PET/CT reports." Japanese journal of radiology, vol. 43, no. 12, 2025, pp. 2042-2050.
PMID
41091339
Abstract
[BACKGROUND] Accurate and timely assessment of immunotherapy response is vital for optimizing lung cancer management. This study evaluates the efficacy of large language models (LLMs) in automating response assessment using positron emission tomography/computed tomography (PET/CT) reports based on the European Organization for Research and Treatment of Cancer (EORTC) criteria.
[METHODS] An effective prompting strategy was developed using Google Gemini 2.5 Pro Experimental 03-25, with explicit instructions for applying EORTC criteria via few-shot prompting. This prompt was then tested with both Gemini 2.5 Pro and OpenAI ChatGPT 4o to assess cross-model performance. Pre- and post-immunotherapy PET-CT reports in text format from 36 lung cancer patients were independently classified by the LLMs and an experienced nuclear medicine specialist. Performance metrics, including precision, recall, F1-score, and support, were calculated for each response category. Inter-rater agreement was assessed using Cohen's Kappa.
[RESULTS] The nuclear medicine specialist classified 5, 21, 6, and 4 reports as complete metabolic response (CMR), progressive metabolic disease (PMD), partial metabolic response (PMR), and stable metabolic disease (SMD), respectively, while Gemini 2.5 Pro classified 4, 21, 8, and 3 of them. Gemini achieved an overall accuracy of 94% and demonstrated strong agreement with the expert (overall Cohen's Kappa: 0.907). F1-scores were 0.86 for PMR and SMD, 0.89 for CMR, and 1.00 for PMD, with per-label Kappa scores ranging from 0.824 (PMR) to 1.00 (PMD). In comparison, ChatGPT 4o achieved perfect agreement with the expert across all 36 cases (accuracy = 100%, Cohen's Kappa = 1.000).
[CONCLUSIONS] When guided by a structured and task-specific prompt, both Gemini 2.5 Pro and ChatGPT 4o demonstrated strong capability for automating accurate immunotherapy response assessment in lung cancer using PET-CT reports. These results underscore the potential of LLMs to streamline clinical workflows and improve efficiency. Validation with larger data sets is warranted to support clinical implementation.
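The per-category metrics and Cohen's Kappa named in the methods can be sketched in a few lines of Python. The abstract reports only the per-rater category counts, not the case-level labels, so the (expert, model) pairs below are an assumption: they are one cross-tabulation consistent with the reported Gemini 2.5 Pro counts and F1-scores (one CMR and one SMD case assigned to PMR). The helper names `per_class_metrics` and `cohen_kappa` are illustrative and not taken from the study.

```python
from collections import Counter

LABELS = ["CMR", "PMD", "PMR", "SMD"]

# ASSUMPTION: (expert label, model label, number of cases).
# Inferred from the reported counts and F1-scores; the abstract
# does not publish this cross-tabulation.
PAIRS = [
    ("CMR", "CMR", 4), ("CMR", "PMR", 1),
    ("PMD", "PMD", 21),
    ("PMR", "PMR", 6),
    ("SMD", "SMD", 3), ("SMD", "PMR", 1),
]

# Expand the counts into two parallel lists of 36 labels each.
expert = [e for e, m, n in PAIRS for _ in range(n)]
model = [m for e, m, n in PAIRS for _ in range(n)]


def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, F1, support) for one response category."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    predicted = sum(p == label for p in y_pred)
    support = sum(t == label for t in y_true)
    precision = tp / predicted if predicted else 0.0
    recall = tp / support if support else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1, support


def cohen_kappa(a, b):
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in set(a) | set(b)) / n ** 2
    return (observed - expected) / (1 - expected)


accuracy = sum(e == m for e, m in zip(expert, model)) / len(expert)
print(f"accuracy = {accuracy:.2f}, kappa = {cohen_kappa(expert, model):.3f}")

for label in LABELS:
    p, r, f1, support = per_class_metrics(expert, model, label)
    # Per-label Kappa: treat the category as a one-vs-rest binary rating.
    k = cohen_kappa([e == label for e in expert], [m == label for m in model])
    print(f"{label}: P={p:.2f} R={r:.2f} F1={f1:.2f} n={support} kappa={k:.3f}")
```

Under these assumed pairs the sketch gives 34 of 36 cases in agreement and an overall Kappa of about 0.907, in line with the figures in the abstract. It does not confirm which individual reports were misclassified.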
MeSH Terms
Humans; Positron Emission Tomography Computed Tomography; Lung Neoplasms; Immunotherapy; Artificial Intelligence; Male; Female; Aged; Middle Aged; Treatment Outcome
Highly cited papers by the same first author (4)
- Methodological insights regarding the prognostic value of lncRNA PGM5P4-AS1 in breast cancer.
- Beyond linearity and static risk: re-evaluating core prognostic factors in diffuse large B-cell lymphoma.
- Staging Prostate Cancer with AI: A Comparative Study of Large Language Models and Expert Interpretation on PSMA PET-CT Reports.
- First-line immunotherapy-based regimens for metastatic non-small cell lung cancer: A network meta-analysis of landmark trials.