The In-depth Comparative Analysis of Four Large Language AI Models for Risk Assessment and Information Retrieval from Multi-Modality Prostate Cancer Work-up Reports.

Yuan LH; Huang SW; Chou D; Tsai CY

doi:10.5534/wjmh.240173

The World Journal of Men's Health, 2025, Vol. 43(4), pp. 918-933

Free full text: PMCID PMC12505481

Cite this article

APA: Yuan LH, Huang SW, Chou D, Tsai CY (2025). The In-depth Comparative Analysis of Four Large Language AI Models for Risk Assessment and Information Retrieval from Multi-Modality Prostate Cancer Work-up Reports. The World Journal of Men's Health, 43(4), 918-933. https://doi.org/10.5534/wjmh.240173

MLA: Yuan LH, et al. "The In-depth Comparative Analysis of Four Large Language AI Models for Risk Assessment and Information Retrieval from Multi-Modality Prostate Cancer Work-up Reports." The World Journal of Men's Health, vol. 43, no. 4, 2025, pp. 918-933.

PMID: 39743220

Abstract

[PURPOSE] Information retrieval (IR) and risk assessment (RA) from multi-modality imaging and pathology reports are critical to prostate cancer (PC) treatment. This study aims to evaluate the performance of four general-purpose large language models (LLMs) in IR and RA tasks.

[MATERIALS AND METHODS] We conducted a study using simulated text reports from computed tomography, magnetic resonance imaging, bone scans, and biopsy pathology for stage IV PC patients. We assessed four LLMs (ChatGPT-4-turbo, Claude-3-opus, Gemini-Pro-1.0, ChatGPT-3.5-turbo) on three RA tasks (LATITUDE, CHAARTED, TwNHI) and seven IR tasks. The tasks included TNM staging and the detection and quantification of bone and visceral metastases, providing a broad evaluation of the models' capabilities in handling diverse clinical data. We queried the LLMs with multi-modality reports using zero-shot chain-of-thought prompting via an application programming interface. With the consensus of three adjudicators as the gold standard, the models' performance was assessed through repeated single-round queries and ensemble voting, using six outcome metrics.
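
The RA tasks amount to rule-based classifications applied to findings retrieved from the reports. As a rough illustration only, the sketch below encodes the commonly cited LATITUDE high-risk rule (at least two of: Gleason score 8 or higher, three or more bone lesions, visceral metastasis) and the CHAARTED high-volume rule (visceral metastasis, or four or more bone lesions with at least one beyond the vertebral bodies and pelvis). The abstract does not state these thresholds, so they are an assumption taken from the trial definitions, and the abstract's "CHAARTED high-risk" is read here as high-volume disease. TwNHI is left out because its criteria are not given here. The `ReportFindings` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReportFindings:
    """Hypothetical container for values retrieved from the work-up reports."""
    gleason_score: int               # from biopsy pathology
    bone_lesion_count: int           # from the bone scan
    bone_lesion_beyond_axial: bool   # any lesion outside vertebral bodies and pelvis
    visceral_metastasis: bool        # from CT or MRI


def latitude_high_risk(f: ReportFindings) -> bool:
    """Assumed LATITUDE rule: at least two of three risk factors present."""
    risk_factors = [
        f.gleason_score >= 8,
        f.bone_lesion_count >= 3,
        f.visceral_metastasis,
    ]
    return sum(risk_factors) >= 2


def chaarted_high_volume(f: ReportFindings) -> bool:
    """Assumed CHAARTED rule: visceral disease or extensive bone disease."""
    extensive_bone = f.bone_lesion_count >= 4 and f.bone_lesion_beyond_axial
    return f.visceral_metastasis or extensive_bone


# Toy example, not taken from the study data.
example = ReportFindings(
    gleason_score=9,
    bone_lesion_count=5,
    bone_lesion_beyond_axial=True,
    visceral_metastasis=False,
)
print(latitude_high_risk(example), chaarted_high_volume(example))
```

Writing the rules out this way shows why RA depends on IR: an error in a retrieved lesion count or Gleason score propagates directly into the risk label.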

[RESULTS] Among 350 stage IV PC patients with simulated reports, 115 (32.9%), 128 (36.6%), and 94 (26.9%) were classified as high-risk under the LATITUDE, CHAARTED, and TwNHI criteria, respectively. Ensemble voting, based on three repeated single-round queries, consistently enhanced accuracy, with a higher likelihood of achieving non-inferior results compared with a single query. The four models showed minimal differences in IR tasks, with high accuracy (87.4%-94.2%) and consistency (ICC>0.8) in TNM staging. However, RA performance differed significantly, with the models ranking in the following order: ChatGPT-4-turbo, Claude-3-opus, Gemini-Pro-1.0, and ChatGPT-3.5-turbo. ChatGPT-4-turbo achieved the highest accuracy (90.1%, 90.7%, 91.6%) and consistency (ICC 0.86, 0.93, 0.76) across the three RA tasks.
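
The ensemble voting over three repeated queries can be sketched as a simple majority vote scored against the gold standard. This is a minimal sketch under assumptions, not the authors' implementation: the abstract does not say how a three-way disagreement was resolved, so the hypothetical `ensemble_vote` below returns `None` in that case, and the toy labels are invented for illustration. The `percent` helper only reproduces the subgroup percentages reported above.

```python
from collections import Counter
from typing import Optional, Sequence


def ensemble_vote(answers: Sequence[str]) -> Optional[str]:
    """Return the strict-majority answer among repeated queries, else None."""
    if not answers:
        return None
    label, count = Counter(answers).most_common(1)[0]
    return label if count > len(answers) / 2 else None


def accuracy(predictions: Sequence[Optional[str]], gold: Sequence[str]) -> float:
    """Fraction of predictions matching the gold standard (None counts as wrong)."""
    if not gold or len(predictions) != len(gold):
        raise ValueError("predictions and gold must be non-empty and equal in length")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


def percent(count: int, total: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


# Toy data: three repeated answers per simulated patient (invented labels).
repeated_answers = [
    ["high", "low", "high"],
    ["low", "low", "high"],
    ["high", "high", "high"],
]
gold_standard = ["high", "low", "low"]

voted = [ensemble_vote(run) for run in repeated_answers]
print(voted, accuracy(voted, gold_standard))

# Check the reported high-risk proportions among the 350 patients.
print(percent(115, 350), percent(128, 350), percent(94, 350))
```

A strict majority is used so that a single divergent response cannot change the voted answer, which is one plausible reason voting would be at least as accurate as a single query.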

[CONCLUSIONS] ChatGPT-4-turbo demonstrated satisfactory accuracy and outcomes in RA and IR for stage IV PC, suggesting its potential for clinical decision support. However, the risks of misinterpretation impacting decision-making cannot be overlooked. Further research is necessary to validate these findings in other cancers.

Open-access articles with the same keywords (based on this article's MeSH terms and keywords)

Comparison of artificial intelligence and multidisciplinary team recommendations in the management of colorectal cancer liver metastases.
Scientific reports, 2026, Yılmaz M et al.
Why aren't they used? Systematic review of barriers to implementation of clinical decision support systems for early cancer detection in primary care.
The British journal of general practice : the journal of the Royal College of General Practitioners, 2026, Derksen C et al.
Evaluation of Artificial Intelligence as a Decision-Support Tool in Urological Tumor Boards: A Study in Real Clinical Practice.
Journal of clinical medicine, 2026, De la Torre-Trillo J et al.
Comparative informative capacity of artificial intelligence (AI)-powered chatbots in colorectal cancer: ChatGPT-4 versus DeepSeek.
Digital health, 2026, Kızıltoprak N et al.
Enhancement of Patient-Centered Lung Cancer Screening: The MyLungHealth Randomized Clinical Trial.
JAMA oncology, 2026, Kukhareva PV et al.
Performance of latest AI models, RAG, and MCP on lung cancer-related questions.
Digital health, 2026, Zhao X et al.
