DeepSeek-R1 vs. OpenAI o1 in colorectal cancer screening: a binational evaluation.

Chen Y; Deng S; Hakenberg P; Li C; Zhan T; Fan D; Lin X; Li X; Yang C; Hu J

doi:10.1186/s12876-026-04810-9

BMC gastroenterology 2026 Vol.26(1)

Cite this paper

APA Chen Y, Deng S, et al. (2026). DeepSeek-R1 vs. OpenAI o1 in colorectal cancer screening: a binational evaluation. BMC gastroenterology, 26(1). https://doi.org/10.1186/s12876-026-04810-9

MLA Chen Y, et al. "DeepSeek-R1 vs. OpenAI o1 in colorectal cancer screening: a binational evaluation." BMC gastroenterology, vol. 26, no. 1, 2026.

PMID 41957577

DOI 10.1186/s12876-026-04810-9

Abstract

[BACKGROUND] Large language models (LLMs), such as OpenAI o1 and DeepSeek-R1, show promise for healthcare applications through structured reasoning and decision support. This study evaluates the responses and chain-of-thought (CoT) outputs of OpenAI o1 and DeepSeek-R1 when answering questions about colorectal cancer (CRC) screening.

[METHODS] Fifteen questions about CRC screening were posed to OpenAI o1 and DeepSeek-R1. Four experts rated the responses for accuracy and comprehensiveness, and three additional experts evaluated the CoT reasoning output for logical coherence and for error types and handling, using the National Comprehensive Cancer Network (NCCN) guidelines as the primary reference standard.

[RESULTS] Both LLMs demonstrated high accuracy, with no significant difference between them (median accuracy scores: OpenAI o1 = 4.5, DeepSeek-R1 = 5; p = 0.5243). However, DeepSeek-R1 significantly outperformed OpenAI o1 in comprehensiveness (p < 0.0001), logical coherence (p = 0.0001), and error types and handling (p = 0.0149). DeepSeek-R1 generated more detailed responses (word count: 110 ± 40 vs. 57 ± 24, p = 0.0001) but had longer response times (25 ± 10 s vs. 7 ± 4 s, p < 0.0001).

[CONCLUSION] DeepSeek-R1 and OpenAI o1 both offer high accuracy for CRC screening guidance. Compared with OpenAI o1, DeepSeek-R1 provides more comprehensive responses and a more logically coherent reasoning process with more robust error handling. Context-specific evaluation is critical for practical clinical integration.

[SUPPLEMENTARY INFORMATION] The online version contains supplementary material available at 10.1186/s12876-026-04810-9.

Most-cited papers by the same first author (5)

A New Algorithm for Secondary Repair of Unilateral Cleft Lip Nasal Deformity.
The Laryngoscope 2024 cited 1
Machine-Learning Prediction of Capsular Contraction after Two-Stage Breast Reconstruction.
JPRAS open 2023 cited 1
DIAPH3 is a multifaceted prognostic biomarker that links immunotherapy response to tumor microenvironment in prostate cancer.
Discover oncology 2026
The Exosome-Lactate-Lactylation Axis: A Metabolic-Epigenetic Circuit Driving Tumor Immune Evasion.
Comprehensive Physiology 2026
LncRNAs: key regulators and molecular mechanisms in lung cancer radiosensitivity.
Open medicine (Warsaw, Poland) 2026