DeepSeek-R1 vs. OpenAI o1 in colorectal cancer screening: a binational evaluation.
APA
Chen Y, Deng S, et al. (2026). DeepSeek-R1 vs. OpenAI o1 in colorectal cancer screening: a binational evaluation. BMC Gastroenterology, 26(1). https://doi.org/10.1186/s12876-026-04810-9
MLA
Chen Y, et al. "DeepSeek-R1 vs. OpenAI o1 in colorectal cancer screening: a binational evaluation." BMC Gastroenterology, vol. 26, no. 1, 2026.
PMID
41957577
Abstract
[BACKGROUND] Large language models (LLMs), such as OpenAI o1 and DeepSeek-R1, demonstrate promising applications in healthcare through structured reasoning and decision support. This study evaluates the responses and chain-of-thought (CoT) outputs of OpenAI o1 and DeepSeek-R1 in answering questions about colorectal cancer (CRC) screening.
[METHODS] Fifteen questions about CRC screening were posed to OpenAI o1 and DeepSeek-R1. Four experts rated the responses for accuracy and comprehensiveness, and three additional experts evaluated the CoT reasoning output for logical coherence and for error types and handling, using the National Comprehensive Cancer Network (NCCN) guidelines as the primary reference standard.
[RESULTS] Both LLMs demonstrated high accuracy without significant differences (median accuracy scores: OpenAI o1 = 4.5, DeepSeek-R1 = 5; p = 0.5243). However, DeepSeek-R1 significantly outperformed OpenAI o1 in comprehensiveness (p < 0.0001), logical coherence (p = 0.0001), and error types and handling (p = 0.0149). DeepSeek-R1 generated more detailed responses (word count: 110 ± 40 vs. 57 ± 24, p = 0.0001), with longer response times (25 ± 10 s vs. 7 ± 4 s, p < 0.0001).
[CONCLUSION] DeepSeek-R1 and OpenAI o1 both offer high accuracy for CRC screening guidance, with DeepSeek-R1 providing more comprehensive responses and a more logically coherent reasoning process with more robust error handling than OpenAI o1. Context-specific evaluation is critical for practical clinical integration.
[SUPPLEMENTARY INFORMATION] The online version contains supplementary material available at 10.1186/s12876-026-04810-9.
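The abstract reports median accuracy scores and p-values but does not name the statistical test behind them. As a minimal illustrative sketch only, assuming the ratings were compared with a rank-based test such as the Mann-Whitney U test (a common choice for ordinal rating scales, not confirmed by the source), the Python below shows how four raters' scores can produce a half-point median such as 4.5 and how the U statistic is computed with midranks for tied ratings. The helper names `midranks` and `mann_whitney_u` and the rating lists are hypothetical placeholders, not study data.

```python
from statistics import median


def midranks(values):
    """Assign 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return the Mann-Whitney U statistic of sample x relative to sample y."""
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[: len(x)])
    return rank_sum_x - len(x) * (len(x) + 1) / 2


if __name__ == "__main__":
    # Hypothetical 1-5 ratings from four raters for one question (NOT study data).
    model_a = [4, 5, 4, 5]
    model_b = [5, 5, 4, 5]
    print(median(model_a), median(model_b))  # 4.5 5.0
    print(mann_whitney_u(model_a, model_b))  # 6.0
```

The sketch stops at the U statistic; turning U into a p-value requires the exact null distribution or a normal approximation with a tie correction, which library routines such as SciPy's `scipy.stats.mannwhitneyu` provide.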
Highly cited papers by the same first author (5)
- A New Algorithm for Secondary Repair of Unilateral Cleft Lip Nasal Deformity.
- Machine-Learning Prediction of Capsular Contraction after Two-Stage Breast Reconstruction.
- DIAPH3 is a multifaceted prognostic biomarker that links immunotherapy response to tumor microenvironment in prostate cancer.
- The Exosome-Lactate-Lactylation Axis: A Metabolic-Epigenetic Circuit Driving Tumor Immune Evasion.
- LncRNAs: key regulators and molecular mechanisms in lung cancer radiosensitivity.