
DeepSeek-R1 vs. OpenAI o1 in colorectal cancer screening: a binational evaluation.

BMC gastroenterology 2026 Vol.26(1)

Chen Y, Deng S, Hakenberg P, Li C, Zhan T, Fan D, Lin X, Li X, Yang C, Hu J

Cite this paper

APA Chen Y, Deng S, et al. (2026). DeepSeek-R1 vs. OpenAI o1 in colorectal cancer screening: a binational evaluation. BMC gastroenterology, 26(1). https://doi.org/10.1186/s12876-026-04810-9
MLA Chen Y, et al. "DeepSeek-R1 vs. OpenAI o1 in colorectal cancer screening: a binational evaluation." BMC gastroenterology, vol. 26, no. 1, 2026.
PMID 41957577

Abstract

[BACKGROUND] Large language models (LLMs), such as OpenAI o1 and DeepSeek-R1, demonstrate promising applications in healthcare through structured reasoning and decision support. This study evaluates the responses and chain-of-thought (CoT) outputs of OpenAI o1 and DeepSeek-R1 in answering questions about colorectal cancer (CRC) screening.

[METHODS] Fifteen questions about CRC screening were posed to OpenAI o1 and DeepSeek-R1. Four experts rated the responses for accuracy and comprehensiveness, and three further experts evaluated the CoT reasoning output for logical coherence and for error types and handling, using the National Comprehensive Cancer Network (NCCN) guidelines as the primary reference standard.

[RESULTS] Both LLMs demonstrated high accuracy without significant differences (median accuracy scores: OpenAI o1 = 4.5, DeepSeek-R1 = 5; p = 0.5243). However, DeepSeek-R1 significantly outperformed OpenAI o1 in comprehensiveness (p < 0.0001), logical coherence (p = 0.0001), and error types and handling (p = 0.0149). DeepSeek-R1 generated more detailed responses (word count: 110 ± 40 vs. 57 ± 24, p = 0.0001), with longer response times (25 ± 10 s vs. 7 ± 4 s, p < 0.0001).

[CONCLUSION] DeepSeek-R1 and OpenAI o1 both offer high accuracy for CRC screening guidance. Compared with OpenAI o1, DeepSeek-R1 provides more comprehensive responses and a more logically coherent reasoning process with more robust error handling. Context-specific evaluation is critical for practical clinical integration.

[SUPPLEMENTARY INFORMATION] The online version contains supplementary material available at 10.1186/s12876-026-04810-9.
