St. Gallen International Breast Cancer Consensus-Based Clinical Decision Validation: Concordance Assessment Between Deep Large Language Model Outputs and Global Expert Panel Recommendations.

Yi Pan; Chenglong Duan; Jinsui Du; Jianing Zhang; Keyuan Du; Chenrong Zhang; Zhihao Liu; Wei Zhang; Bin Wang; Yu Ren; Zhao Sun; Lizhe Zhu

doi:10.1245/s10434-026-19176-1

Annals of Surgical Oncology, 2026, Vol. 33(5), pp. 4518-4529

OpenAlex topics: Radiomics and Machine Learning in Medical Imaging; Explainable Artificial Intelligence (XAI); Artificial Intelligence in Healthcare and Education

Cite this paper

APA Pan, Y., Duan, C., Du, J., Zhang, J., Du, K., Zhang, C., Liu, Z., Zhang, W., Wang, B., Ren, Y., Sun, Z., & Zhu, L. (2026). St. Gallen International Breast Cancer Consensus-Based Clinical Decision Validation: Concordance Assessment Between Deep Large Language Model Outputs and Global Expert Panel Recommendations. Annals of Surgical Oncology, 33(5), 4518-4529. https://doi.org/10.1245/s10434-026-19176-1

MLA Pan, Yi, et al. "St. Gallen International Breast Cancer Consensus-Based Clinical Decision Validation: Concordance Assessment Between Deep Large Language Model Outputs and Global Expert Panel Recommendations." Annals of Surgical Oncology, vol. 33, no. 5, 2026, pp. 4518-4529.

PMID 41667891

DOI 10.1245/s10434-026-19176-1

Abstract

[BACKGROUND] The newly developed large language model (LLM) DeepSeek has shown potential for application in other medical fields. However, few systematic studies have assessed its concordance with international expert consensus in breast cancer or compared its performance there with that of leading models such as Gemini 2.0 Pro and ChatGPT-4o.

[MATERIALS AND METHODS] A total of 139 consensus questions from the 19th St. Gallen International Breast Cancer Conference (SG-BCC) were included in the analysis. Each model was queried five times with each consensus question. DeepSeek's answers were compared with the expert panel consensus in terms of concordance rate, answer robustness, the Pearson correlation coefficient (r) for non-binary questions, and the absolute proportion difference for binary questions. DeepSeek was also compared head-to-head with two existing LLMs, Gemini 2.0 Pro and ChatGPT-4o.

[RESULTS] The overall concordance rate between DeepSeek-V3 and the expert panel consensus was 63.31%, and DeepSeek-V3's average answer robustness (i.e., its self-consistency across repeated queries) was 86.69%. DeepSeek-V3 performed similarly to Gemini 2.0 Pro and ChatGPT-4o in the concordance rate of their most frequent answers (p = 0.849). Robustness differed significantly among the models (p < 0.001), with DeepSeek-V3 significantly outperforming both Gemini 2.0 Pro (p = 0.005) and ChatGPT-4o (p < 0.001).
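
As a consistency check on the reported figures, the concordance rate can be converted back into a question count. Assuming the 63.31% is a simple proportion over all 139 questions (the abstract does not state the count directly), only a count of 88 concordant questions rounds to that value. The helper name below is hypothetical.

```python
N_QUESTIONS = 139      # consensus questions analysed (from the abstract)
REPORTED_RATE = 63.31  # DeepSeek-V3 overall concordance in percent (from the abstract)


def matching_counts(rate_pct, n, decimals=2):
    """Return every integer k in 0..n whose percentage 100*k/n rounds to rate_pct."""
    return [k for k in range(n + 1) if round(100 * k / n, decimals) == rate_pct]


if __name__ == "__main__":
    print(matching_counts(REPORTED_RATE, N_QUESTIONS))  # [88]
    # Neighbouring counts land on clearly different percentages:
    # prints "87 62.59", then "88 63.31", then "89 64.03".
    for k in (87, 88, 89):
        print(k, round(100 * k / N_QUESTIONS, 2))
```

The check only holds if every one of the 139 questions entered the denominator; if some questions were excluded, the implied count would differ.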

[CONCLUSIONS] DeepSeek models showed moderate concordance with the breast cancer expert panel consensus and a significant advantage in answer robustness, suggesting that DeepSeek has great potential for supporting clinical decision-making in breast cancer.

MeSH Terms

Humans; Breast Neoplasms; Female; Consensus; Clinical Decision-Making; Deep Learning; Practice Guidelines as Topic; Large Language Models

Highly cited papers by the same first author (5)

Pathologic response and nodal status guide adjuvant immunotherapy in non-small cell lung cancer after neoadjuvant chemoimmunotherapy: An eastern Asian cohort study.
The Journal of thoracic and cardiovascular surgery 2026
A screening strategy based on machine learning for diagnostic biomarkers in small cell lung cancer.
PloS one 2026
Multimodal treatment of radiation-associated laryngeal angiosarcoma: A case report and literature review.
Medicine 2026
p38 inhibition restores chemosensitivity of tumor cells by disrupting oligomerized breast cancer resistance protein membrane trafficking.
iScience 2026
Inhibition of glycosphingolipid synthesis overcomes the steric hindrance of CD30 N-glycans to augment CD30-targeted immunotherapeutic efficacy.
Cellular & molecular immunology 2026