Limitations of Large Language Models in Assisting PI-RADS Scoring on Prostate Biparametric MRI Text Reports.

Siying Zhang; Zhenping Wu; Mingyang Guo; Chang Liu; Mingyong Cui; Shaojun Yang; Feng Chen

doi:10.1016/j.acra.2025.12.020

Academic Radiology, 2026, Vol. 33(4), pp. 1565-1576

TL;DR While LLMs demonstrated high sensitivity in detecting PCa and csPCa, they had significant limitations in specificity and PPV, particularly in the transition and peripheral zones, and the superior clinical utility of the PI-RADS ≥4 threshold was confirmed.

PICO (automatically extracted, heuristic; confidence 3/4)

P · Population

210 patients who underwent transperineal cognitive fusion-targeted biopsy for clinically suspected prostate cancer between December 2024 and July 2025.

I · Intervention

transperineal cognitive fusion-targeted biopsy for clinically suspected prostate cancer between December 2024 and July 2025

C · Comparison

Not extracted

O · Outcome

Experienced radiologists achieved better diagnostic performance, highlighting the need for cautious clinical application of LLMs. Future research should focus on optimizing LLMs to improve specificity and reliability, and combining them with human radiologists' expertise to enhance diagnostic accuracy and efficiency.

OpenAlex topics: Prostate Cancer Diagnosis and Treatment; Artificial Intelligence in Healthcare and Education; Machine Learning in Healthcare

Key clinical statistics (auto-extracted from the abstract; verify against the original)

p-value P<0.001
95% CI 14.89-1000.00
OR 109.49

Cite this article

APA Siying Zhang, Zhenping Wu, et al. (2026). Limitations of Large Language Models in Assisting PI-RADS Scoring on Prostate Biparametric MRI Text Reports. Academic radiology, 33(4), 1565-1576. https://doi.org/10.1016/j.acra.2025.12.020

MLA Siying Zhang, et al. "Limitations of Large Language Models in Assisting PI-RADS Scoring on Prostate Biparametric MRI Text Reports." Academic radiology, vol. 33, no. 4, 2026, pp. 1565-1576.

PMID 41521112

DOI 10.1016/j.acra.2025.12.020

Abstract

[BACKGROUND] Prostate cancer (PCa) is a significant global health challenge, and the prostate imaging reporting and data system (PI-RADS) is crucial for risk stratification using MRI. However, inter-reader variability, especially in the transition zone and among practitioners with differing experience levels, compromises diagnostic consistency. Large language models (LLMs) show potential in medical image analysis, particularly in standardizing reports to improve diagnostic consistency and efficiency.

[OBJECTIVE] To evaluate the performance of LLMs in assisting PI-RADS scoring based on biparametric MRI text reports and to compare them with radiologists of varying experience levels; additionally, to identify independent predictors of PCa and clinically significant PCa (csPCa) using multivariable logistic regression analysis.

[METHODS] This retrospective single-center study included 210 patients who underwent transperineal cognitive fusion-targeted biopsy for clinically suspected prostate cancer between December 2024 and July 2025. Three radiologists and two LLMs (DeepSeek and ChatGPT-4.1) independently reviewed anonymized reports and assigned PI-RADS v2.1 scores. Diagnostic performance was assessed using biopsy pathological results as the gold standard. Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and area under the receiver operating characteristic curve (AUC) were calculated at both lesion-level (PI-RADS ≥3 as positive) and participant-level (PI-RADS ≥3 and ≥4 as positive thresholds). Decision curve analysis was performed to evaluate clinical utility. Subgroup analyses were conducted based on lesion location (peripheral zone vs. transition zone). Multivariable logistic regression analysis identified independent predictors of PCa and csPCa.
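
The threshold-based metrics named in the methods can be illustrated with a minimal Python sketch. It assumes each case has a PI-RADS score and a biopsy result; the function name `diagnostic_metrics` and the eight example cases are hypothetical and are not the study's data or code.

```python
from dataclasses import dataclass


@dataclass
class DiagnosticMetrics:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def diagnostic_metrics(scores, has_cancer, threshold=3):
    """Compute metrics when a PI-RADS score >= threshold counts as positive.

    scores: PI-RADS scores (1-5), one per case.
    has_cancer: biopsy results (True = cancer), used as the reference standard.
    """
    tp = fp = tn = fn = 0
    for score, truth in zip(scores, has_cancer):
        predicted_positive = score >= threshold
        if predicted_positive and truth:
            tp += 1
        elif predicted_positive and not truth:
            fp += 1
        elif not predicted_positive and truth:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # Return NaN when a metric is undefined (empty denominator).
        return numerator / denominator if denominator else float("nan")

    return DiagnosticMetrics(
        sensitivity=ratio(tp, tp + fn),
        specificity=ratio(tn, tn + fp),
        ppv=ratio(tp, tp + fp),
        npv=ratio(tn, tn + fn),
    )


# Hypothetical scores and biopsy outcomes, for illustration only.
example_scores = [2, 3, 3, 4, 4, 5, 5, 1]
example_truth = [False, False, True, False, True, True, True, False]

for cutoff in (3, 4):
    print(cutoff, diagnostic_metrics(example_scores, example_truth, cutoff))
```

In this toy example, raising the cutoff from 3 to 4 trades sensitivity for specificity, which is the kind of trade-off the participant-level analysis examines.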

[RESULTS] The senior radiologist demonstrated the highest diagnostic performance, with AUC values of 0.847 for PCa and 0.859 for csPCa. The attending physician achieved perfect sensitivity but had the lowest specificity and PPV. The resident physician had comparable sensitivity but lower specificity and PPV, resulting in the lowest AUC values. Both LLMs exhibited high sensitivity but extremely low specificity, leading to lower PPV than human readers. DeepSeek outperformed ChatGPT-4.1 in AUC but still fell short of the senior radiologist's performance. In region-specific analyses, the senior radiologist significantly outperformed LLMs in the transition zone, while LLMs showed high sensitivity but low specificity in the peripheral zone. At the participant level, raising the threshold to PI-RADS ≥4 substantially improved specificity for all readers. Decision curve analysis confirmed the superior clinical utility of the PI-RADS ≥4 threshold, with the senior radiologist's ratings achieving the highest net benefit. Multivariable logistic regression analysis identified PSA density as the strongest independent predictor for both PCa (OR = 109.49, 95% CI: 14.89-1000.00, P<0.001) and csPCa (OR = 152.16, 95% CI: 21.06-1000.00, P<0.001). Among all PI-RADS ratings, only the senior radiologist's scores retained independent predictive value for both PCa (OR = 17.94, P<0.001) and csPCa (OR = 22.69, P = 0.001).
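
For readers unfamiliar with decision curve analysis, the net benefit it reports is conventionally defined as TP/n - (FP/n) × pt/(1 - pt), where pt is the threshold probability at which a clinician would act. The Python sketch below applies that standard formula to hypothetical counts; the abstract does not give the authors' exact implementation, and the function name `net_benefit` and all numbers are assumptions for illustration.

```python
def net_benefit(true_positives, false_positives, n_patients, threshold_probability):
    """Standard decision-curve net benefit for one strategy at one threshold probability.

    Each false positive is weighted by the odds of the threshold probability,
    pt / (1 - pt), which expresses the harm of an unnecessary biopsy
    relative to the benefit of detecting a cancer.
    """
    if not 0.0 < threshold_probability < 1.0:
        raise ValueError("threshold_probability must be strictly between 0 and 1")
    if n_patients <= 0:
        raise ValueError("n_patients must be positive")
    odds = threshold_probability / (1.0 - threshold_probability)
    return true_positives / n_patients - (false_positives / n_patients) * odds


# Hypothetical counts for 8 patients, 4 of whom have cancer (illustration only).
pt = 0.2
strategies = {
    "biopsy if PI-RADS >= 3": net_benefit(4, 2, 8, pt),
    "biopsy if PI-RADS >= 4": net_benefit(3, 1, 8, pt),
    "biopsy everyone": net_benefit(4, 4, 8, pt),
    "biopsy no one": 0.0,
}
for name, value in strategies.items():
    print(f"{name}: {value:.4f}")
```

A decision curve repeats this calculation across a range of threshold probabilities and plots one line per strategy; the strategy with the highest line at a clinically reasonable pt is preferred.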

[CONCLUSION] While LLMs demonstrated high sensitivity in detecting PCa and csPCa, they had significant limitations in specificity and PPV, particularly in the transition and peripheral zones. The optimal utilization strategy involves deploying LLMs as adjuncts for indeterminate cases or when using higher diagnostic thresholds (PI-RADS ≥4). Experienced radiologists achieved better diagnostic performance, highlighting the need for cautious clinical application of LLMs. Future research should focus on optimizing LLMs to improve specificity and reliability, and combining them with human radiologists' expertise to enhance diagnostic accuracy and efficiency.
