Staging Prostate Cancer with AI: A Comparative Study of Large Language Models and Expert Interpretation on PSMA PET-CT Reports.
APA
Ismayilov R, Aktas A, et al. (2026). Staging Prostate Cancer with AI: A Comparative Study of Large Language Models and Expert Interpretation on PSMA PET-CT Reports. Molecular imaging and biology, 28(1), 93-105. https://doi.org/10.1007/s11307-025-02072-7
MLA
Ismayilov R, et al. "Staging Prostate Cancer with AI: A Comparative Study of Large Language Models and Expert Interpretation on PSMA PET-CT Reports." Molecular imaging and biology, vol. 28, no. 1, 2026, pp. 93-105.
PMID
41350971
Abstract
[PURPOSE] Accurate staging of prostate cancer is essential for therapeutic decision-making. While PSMA PET-CT reports offer rich clinical data, their unstructured format hinders large-scale analysis. Recent advances in large language models (LLMs) offer new opportunities to extract structured information from narrative radiology reports. However, their ability to perform multi-step clinical reasoning, particularly for cancer staging, remains underexplored.
[METHODS] In this feasibility study, 80 anonymized, Turkish-language PSMA PET-CT reports were independently interpreted by two LLMs: Gemini 2.5 Pro (Google) and ChatGPT 4o (OpenAI). Using a structured prompt containing an embedded knowledge base (AJCC/CHAARTED criteria) and few-shot examples, both LLMs generated classifications for T, N, M, and overall clinical stage/disease volume. Outputs were benchmarked against expert classifications by a senior nuclear medicine specialist. Performance was evaluated using accuracy, precision, recall, F1-score, and Cohen's kappa.
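As an illustration of two of the metrics named above, the following is a minimal sketch of accuracy and unweighted Cohen's kappa for one model compared with an expert reference. It is not the authors' analysis code: the function names and the six toy N-stage labels are hypothetical, chosen only to show the arithmetic.

```python
from collections import Counter


def accuracy(reference, predicted):
    """Fraction of cases where the predicted label equals the reference label."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(r == p for r, p in zip(reference, predicted)) / len(reference)


def cohens_kappa(reference, predicted):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(reference)
    p_o = accuracy(reference, predicted)  # observed agreement
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    # Agreement expected by chance, from the two marginal label distributions.
    p_e = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / n ** 2
    if p_e == 1.0:
        return float("nan")  # both raters used a single identical label; kappa is undefined
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy labels (not data from the study).
expert = ["N0", "N0", "N1", "N1", "N0", "N1"]
model = ["N0", "N0", "N1", "N0", "N0", "N1"]

print(f"accuracy = {accuracy(expert, model):.3f}")
print(f"kappa    = {cohens_kappa(expert, model):.3f}")
```

Kappa corrects the raw agreement for the agreement expected from the label frequencies alone, which is why it is reported alongside accuracy when classes are imbalanced.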
[RESULTS] For the composite task of classifying clinical stage and disease volume, Gemini 2.5 Pro achieved an accuracy of 93.8% (95% CI: 86.0-97.9) and a Cohen's kappa of 0.910 (95% CI: 0.834-0.986), while ChatGPT 4o achieved 91.3% accuracy (95% CI: 82.8-96.4) with a kappa of 0.874 (95% CI: 0.786-0.962). For T staging, Gemini showed a higher accuracy point estimate (95.0% [95% CI: 87.7-98.6] vs. 91.3% [95% CI: 82.8-96.4]), while both models excelled at the binary N and M classifications, achieving accuracies above 95% and kappa values indicating near-perfect agreement (κ > 0.900).
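The abstract does not state how the 95% confidence intervals for accuracy were computed. As a hedged sketch, the code below computes an exact (Clopper-Pearson) binomial interval by bisection using only the standard library. The count of 75 correct out of 80 is an assumption inferred from the reported 93.8% accuracy on 80 reports, and the function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for a proportion with k successes in n trials."""

    def solve(cdf_at, target):
        # cdf_at(p) decreases as p grows, so bisect for cdf_at(p) == target.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if cdf_at(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


# Assumed counts: 75 of 80 reports classified correctly (inferred, not stated in the abstract).
correct, total = 75, 80
low, high = clopper_pearson(correct, total)
print(f"accuracy = {correct / total:.1%}, 95% CI = {low:.1%} to {high:.1%}")
```

With only 80 reports, intervals of this kind are wide enough that the two models' accuracy estimates overlap, which is consistent with the abstract describing Gemini's T-staging result as a higher point estimate rather than a significant difference.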
[CONCLUSIONS] LLMs, when guided by expert-informed prompt engineering, can accurately stage prostate cancer from free-text PSMA PET-CT reports and may serve as a powerful assistive tool for data automation, research acceleration, and quality assurance.
MeSH Terms
Humans; Male; Prostatic Neoplasms; Positron Emission Tomography Computed Tomography; Neoplasm Staging; Artificial Intelligence; Glutamate Carboxypeptidase II; Language; Aged; Antigens, Surface; Large Language Models
Highly cited papers by the same first author (4)
- Methodological insights regarding the prognostic value of lncRNA PGM5P4-AS1 in breast cancer.
- Beyond linearity and static risk: re-evaluating core prognostic factors in diffuse large B-cell lymphoma.
- Artificial intelligence for immunotherapy response assessment in lung cancer using PET/CT reports.
- First-line immunotherapy-based regimens for metastatic non-small cell lung cancer: A network meta-analysis of landmark trials.