Clinical utility of large language models in metastatic prostate cancer: A multicenter expert validation for decision support.
TL;DR
Current LLMs can support mPCa care as drafting assistants, but ∼15%-21% of outputs breached safety thresholds under a strict gate, precluding unsupervised use at initial treatment-planning encounters.
OpenAlex topics
Artificial Intelligence in Healthcare and Education
Topic Modeling
Machine Learning in Healthcare
APA
Yiqun Chen, Kai He, et al. (2026). Clinical utility of large language models in metastatic prostate cancer: A multicenter expert validation for decision support. European journal of cancer (Oxford, England : 1990), 238, 116667. https://doi.org/10.1016/j.ejca.2026.116667
MLA
Yiqun Chen, et al. "Clinical utility of large language models in metastatic prostate cancer: A multicenter expert validation for decision support." European journal of cancer (Oxford, England : 1990), vol. 238, 2026, pp. 116667.
PMID
41831267
Abstract
[BACKGROUND] Initial systemic treatment planning for metastatic prostate cancer (mPCa) requires rapid synthesis of heterogeneous clinical and biomarker information. Large language models (LLMs) could assist clinicians, but their safety and acceptability in this high-stakes setting remain uncertain.
[METHODS] We conducted a multicenter retrospective evaluation of 238 consecutive mPCa cases from three tertiary centers (2018-2025). Five contemporary LLMs were tested via publicly available web interfaces under a locked, zero-shot prompting protocol to generate a clinical summary, a first-line systemic treatment recommendation, and a rationale. Outputs underwent two-stage assessment: (1) multidisciplinary team (MDT) binary safety adjudication using a one-strike gate with a prespecified taxonomy of critical errors; unsafe outputs were assigned a Likert score of 1 for all domains; (2) three senior medical oncologists independently rated safety-passed outputs on 5-point Likert scales for summary accuracy, guideline-concordant and patient-tailored recommendations, and rationale quality. Paired ordinal outcomes were analyzed with Friedman tests and Holm-adjusted post hoc comparisons, and binary safety outcomes with Cochran's Q and McNemar tests.
[RESULTS] Safety rates ranged from 79.0% to 84.9%. Among safety-passed outputs, mean utility scores (5-point Likert) were in the low-to-mid 4 range. Between-model differences were most apparent for summarization, whereas treatment recommendations and rationales showed modest separation after multiplicity adjustment. Failures clustered in hard cases with incomplete documentation and were dominated by missingness-related extraction errors, disease-state/pathway errors, guideline logic deviations, and safety-check omissions.
[CONCLUSIONS] Current LLMs can support mPCa care as drafting assistants, but ∼15%-21% of outputs breached safety thresholds under a strict gate, precluding unsupervised use at initial treatment-planning encounters.
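The two-stage assessment and the paired safety comparison described in the methods can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The function names (`gate_scores`, `mcnemar_exact`, `holm_adjust`) and the domain labels in the tests are hypothetical, and the abstract does not say whether the McNemar tests were exact or chi-square based; the exact binomial form is assumed here for simplicity.

```python
from math import comb


def gate_scores(is_safe, ratings):
    """One-strike gate: an output judged unsafe gets a score of 1 in every domain.

    `ratings` maps a domain name to a 1-5 Likert score (hypothetical structure).
    """
    if not is_safe:
        return {domain: 1 for domain in ratings}
    return dict(ratings)


def mcnemar_exact(safe_a, safe_b):
    """Two-sided exact McNemar test for paired binary safety outcomes.

    Only discordant pairs (safe under one model, unsafe under the other)
    contribute. Assumption: exact binomial variant, not the chi-square one.
    """
    b = sum(1 for x, y in zip(safe_a, safe_b) if x and not y)
    c = sum(1 for x, y in zip(safe_a, safe_b) if y and not x)
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted
```

With five models there are ten pairwise safety comparisons, so in a sketch like this `holm_adjust` would be applied to the ten `mcnemar_exact` p-values. The abstract names Holm adjustment explicitly only for the ordinal post hoc comparisons.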
Keywords / MeSH
- Humans
- Male
- Prostatic Neoplasms
- Retrospective Studies
- Aged
- Middle Aged
- Decision Support Techniques
- Neoplasm Metastasis
- Decision Support Systems, Clinical
- Clinical Decision-Making
- Large Language Models
- clinical decision support
- expert validation
- guideline concordance
- large language models
- metastatic prostate cancer
- real-world evaluation
- safety gating