Staging Prostate Cancer with AI: A Comparative Study of Large Language Models and Expert Interpretation on PSMA PET-CT Reports.

Ismayilov R; Aktas A; Gencoglu EA; Oguz A; Altundag O; Akcali Z

doi:10.1007/s11307-025-02072-7

Molecular imaging and biology 2026 Vol.28(1) p. 93-105


Cite this paper

APA Ismayilov R, Aktas A, et al. (2026). Staging Prostate Cancer with AI: A Comparative Study of Large Language Models and Expert Interpretation on PSMA PET-CT Reports. Molecular imaging and biology, 28(1), 93-105. https://doi.org/10.1007/s11307-025-02072-7

MLA Ismayilov R, et al. "Staging Prostate Cancer with AI: A Comparative Study of Large Language Models and Expert Interpretation on PSMA PET-CT Reports." Molecular imaging and biology, vol. 28, no. 1, 2026, pp. 93-105.

PMID 41350971

DOI 10.1007/s11307-025-02072-7

Abstract

[PURPOSE] Accurate staging of prostate cancer is essential for therapeutic decision-making. While PSMA PET-CT reports offer rich clinical data, their unstructured format hinders large-scale analysis. Recent advances in large language models (LLMs) offer new opportunities to extract structured information from narrative radiology reports. However, their ability to perform multi-step clinical reasoning, particularly for cancer staging, remains underexplored.

[METHODS] In this feasibility study, 80 anonymized, Turkish-language PSMA PET-CT reports were independently interpreted by two LLMs: Gemini 2.5 Pro (Google) and ChatGPT 4o (OpenAI). Using a structured prompt containing an embedded knowledge base (AJCC/CHAARTED criteria) and few-shot examples, both LLMs generated classifications for T, N, M, and overall clinical stage/disease volume. Outputs were benchmarked against expert classifications by a senior nuclear medicine specialist. Performance was evaluated using accuracy, precision, recall, F1-score, and Cohen's kappa.
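As a rough illustration of the evaluation metrics named above, the following sketch computes accuracy, per-label precision/recall/F1, and Cohen's kappa for a model's labels against an expert's. It is not the authors' code: the function names and the toy `expert`/`model` label lists are hypothetical, and the abstract does not say how per-class metrics were averaged.

```python
from collections import Counter

def accuracy(expert, model):
    """Fraction of reports where the model label equals the expert label."""
    return sum(e == m for e, m in zip(expert, model)) / len(expert)

def precision_recall_f1(expert, model, label):
    """One-vs-rest precision, recall, and F1 for a single label."""
    pairs = list(zip(expert, model))
    tp = sum(e == label and m == label for e, m in pairs)
    fp = sum(e != label and m == label for e, m in pairs)
    fn = sum(e == label and m != label for e, m in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def cohens_kappa(expert, model):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(expert)
    p_o = accuracy(expert, model)
    expert_counts, model_counts = Counter(expert), Counter(model)
    # Chance agreement from the two raters' marginal label frequencies.
    p_e = sum(expert_counts[c] * model_counts[c] for c in expert_counts) / n**2
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)

# Toy data (hypothetical, not from the study): binary M classification.
expert = ["M0", "M0", "M1", "M1"]
model = ["M0", "M0", "M1", "M0"]
print(accuracy(expert, model))
print(precision_recall_f1(expert, model, "M1"))
print(cohens_kappa(expert, model))
```

Kappa is reported alongside accuracy because it discounts the agreement two raters would reach by chance given their label frequencies, which matters when one class dominates.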

[RESULTS] For the composite task of classifying clinical stage and disease volume, Gemini 2.5 Pro achieved an accuracy of 93.8% (95% CI: 86.0-97.9) and a Cohen's kappa of 0.910 (95% CI: 0.834-0.986), while ChatGPT 4o achieved 91.3% accuracy (95% CI: 82.8-96.4) with a kappa of 0.874 (95% CI: 0.786-0.962). For T staging, Gemini showed a higher accuracy point estimate (95.0% [95% CI: 87.7-98.6] vs. 91.3% [95% CI: 82.8-96.4]), while both models excelled at the binary N and M classifications, achieving accuracies above 95% and kappa values indicating near-perfect agreement (κ > 0.900).
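The accuracy intervals above can be approximated with an exact binomial (Clopper-Pearson) interval. The sketch below is an assumption on two counts: the abstract does not name the interval method, and the count of 75 correct out of 80 is inferred from the reported 93.8% accuracy rather than stated. The helper names are hypothetical and only the standard library is used.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def _bisect_decreasing(f, lo=0.0, hi=1.0, iters=100):
    """Root of a function that decreases from positive to negative on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for k successes in n trials."""
    # Lower bound: the p at which P(X >= k | p) equals alpha / 2.
    lower = 0.0 if k == 0 else _bisect_decreasing(
        lambda p: alpha / 2 - (1 - binom_cdf(k - 1, n, p)))
    # Upper bound: the p at which P(X <= k | p) equals alpha / 2.
    upper = 1.0 if k == n else _bisect_decreasing(
        lambda p: binom_cdf(k, n, p) - alpha / 2)
    return lower, upper

# Assumed count: 75/80 correct, inferred from the reported 93.8% accuracy.
low, high = clopper_pearson(75, 80)
print(f"accuracy = {75 / 80:.1%}, 95% CI = {low:.1%} to {high:.1%}")
```

With only 80 reports, intervals of this kind are wide enough that the Gemini and ChatGPT estimates overlap substantially, which is consistent with the abstract describing the T-staging difference as a higher point estimate rather than a significant gap.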

[CONCLUSIONS] LLMs, when guided by expert-informed prompt engineering, can accurately stage prostate cancer from free-text PSMA PET-CT reports and may serve as a powerful assistive tool for data automation, research acceleration, and quality assurance.

MeSH Terms

Humans; Male; Prostatic Neoplasms; Positron Emission Tomography Computed Tomography; Neoplasm Staging; Artificial Intelligence; Glutamate Carboxypeptidase II; Language; Aged; Antigens, Surface; Large Language Models

Highly cited papers by the same first author (4)

Methodological insights regarding the prognostic value of lncRNA PGM5P4-AS1 in breast cancer.
Cancer biology & therapy 2026
Beyond linearity and static risk: re-evaluating core prognostic factors in diffuse large B-cell lymphoma.
Expert review of hematology 2026
Artificial intelligence for immunotherapy response assessment in lung cancer using PET/CT reports.
Japanese journal of radiology 2025
First-line immunotherapy-based regimens for metastatic non-small cell lung cancer: A network meta-analysis of landmark trials.
Tuberkuloz ve toraks 2025