
Evaluating the Clinical Competence of Large Language Models in Prostate Cancer Management: A Comparative Study of DeepSeek-R1 and ChatGPT.

Annals of Surgical Oncology, 2026, Vol. 33(2), pp. 1858-1869

Li R, Zhao A, Peng L, Shi H, Zhao J, Li Z

Cite this paper

APA: Li R, Zhao A, et al. (2026). Evaluating the Clinical Competence of Large Language Models in Prostate Cancer Management: A Comparative Study of DeepSeek-R1 and ChatGPT. Annals of Surgical Oncology, 33(2), 1858-1869. https://doi.org/10.1245/s10434-025-18492-2
MLA: Li R, et al. "Evaluating the Clinical Competence of Large Language Models in Prostate Cancer Management: A Comparative Study of DeepSeek-R1 and ChatGPT." Annals of Surgical Oncology, vol. 33, no. 2, 2026, pp. 1858-1869.
PMID: 41094286

Abstract

[BACKGROUND] Large language models (LLMs) have gained prominence in medical applications, yet their performance in specialized clinical tasks remains underexplored. Prostate cancer, a complex malignancy requiring guideline-based management, presents a rigorous testbed for evaluating artificial intelligence (AI)-assisted decision-making. This study compared the clinical accuracy, reasoning ability, and language quality of DeepSeek-R1 and ChatGPT variants in addressing prostate cancer diagnosis and treatment.

[METHODS] A dataset of 98 prostate cancer multiple-choice questions from MedQA, MedMCQA, and China's National Medical Licensing Examination was constructed, alongside three real-world clinical cases. Responses were generated by five LLMs (DeepSeek-V3, DeepSeek-R1, ChatGPT-4o, ChatGPT-o3, and ChatGPT-o4-mini) and evaluated for accuracy across three repeated runs. For the case-based simulations, only R1 and o3 were compared with practicing urologists. A Clinical Decision Quality Assessment Scale (CDQAS) assessed outputs across four domains: readability, medical knowledge accuracy, diagnostic test appropriateness, and logical coherence. Blinded scoring was performed by senior urologic oncologists. Statistical analyses used one-way ANOVA in GraphPad Prism v10.1.2 (GraphPad Software, Boston, MA, USA).
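
To make the evaluation pipeline concrete, here is a minimal Python sketch of the accuracy comparison. The per-run scores are hypothetical (the abstract does not publish them), and `scipy.stats.f_oneway` stands in for the one-way ANOVA the authors ran in GraphPad Prism.

```python
# Minimal sketch of the described accuracy analysis. Per-run values are
# made up for illustration; the authors used GraphPad Prism, not Python.
from scipy import stats

# Hypothetical fraction of the 98 MCQs answered correctly in each of the
# three repeated runs, one list per model.
runs = {
    "DeepSeek-V3": [0.88, 0.87, 0.89],
    "DeepSeek-R1": [0.97, 0.96, 0.97],
    "ChatGPT-4o": [0.84, 0.85, 0.83],
    "ChatGPT-o3": [0.91, 0.90, 0.92],
    "ChatGPT-o4-mini": [0.86, 0.85, 0.87],
}

# Mean accuracy per model across the three repeated runs.
for model, accs in runs.items():
    print(f"{model}: mean accuracy = {sum(accs) / len(accs):.2%}")

# One-way ANOVA across the five models, mirroring the reported analysis.
f_stat, p_value = stats.f_oneway(*runs.values())
print(f"F = {f_stat:.2f}, p = {p_value:.4g}")
```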

[RESULTS] DeepSeek-R1 achieved the highest accuracy (96.60%) on the multiple-choice tasks, significantly outperforming the other models (p < 0.05 to p < 0.0001). In the simulated case evaluations, both R1 and o3 performed comparably with physicians in overall readability and diagnostic appropriateness. Whereas R1 demonstrated superior guideline compliance and evidence-based reasoning, o3 showed advantages in workflow clarity, sequencing, and response fluency; o3 also generated fewer explicit errors than R1. Human clinicians retained strengths in terminology precision and logical reasoning.
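
The pairwise significance range above (p < 0.05 to p < 0.0001) implies post-hoc comparisons following the ANOVA. The abstract does not name the post-hoc test, so the sketch below assumes Tukey's HSD (a common choice after one-way ANOVA, available as `scipy.stats.tukey_hsd` in SciPy >= 1.8) and reuses the hypothetical per-run accuracies from the previous sketch.

```python
# Hedged sketch: pairwise post-hoc comparisons after the one-way ANOVA.
# The abstract does not name the post-hoc test; Tukey's HSD is assumed.
from scipy import stats

runs = {  # same hypothetical per-run accuracies as in the earlier sketch
    "DeepSeek-V3": [0.88, 0.87, 0.89],
    "DeepSeek-R1": [0.97, 0.96, 0.97],
    "ChatGPT-4o": [0.84, 0.85, 0.83],
    "ChatGPT-o3": [0.91, 0.90, 0.92],
    "ChatGPT-o4-mini": [0.86, 0.85, 0.87],
}

names = list(runs)
result = stats.tukey_hsd(*runs.values())  # pairwise p-value matrix
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(f"{names[i]} vs {names[j]}: p = {result.pvalue[i, j]:.4g}")
```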

[CONCLUSION] DeepSeek-R1 and ChatGPT-o3 exhibit complementary strengths in prostate cancer clinical decision-making, with R1 favoring factual accuracy and o3 excelling in expressive clarity. Although both models approach human-level performance in structured evaluations, human oversight and continued domain-specific optimization remain essential for their safe and effective integration into clinical workflows.
