Guideline adherence in surgical decisions for T1 colorectal cancer after endoscopic resection: large language models vs clinicians.
[BACKGROUND] Patients with T1 colorectal cancer (CRC) often show poor adherence to guideline-recommended treatment strategies after endoscopic resection.
APA
Zeng L, Cao Q, et al. (2026). Guideline adherence in surgical decisions for T1 colorectal cancer after endoscopic resection: large language models vs clinicians.. International journal of surgery (London, England), 112(1), 1886-1890. https://doi.org/10.1097/JS9.0000000000003492
MLA
Zeng L, et al.. "Guideline adherence in surgical decisions for T1 colorectal cancer after endoscopic resection: large language models vs clinicians.." International journal of surgery (London, England), vol. 112, no. 1, 2026, pp. 1886-1890.
PMID
40928382
Abstract
[BACKGROUND] Patients with T1 colorectal cancer (CRC) often show poor adherence to guideline-recommended treatment strategies after endoscopic resection. To address this challenge and improve clinical decision-making, this study aims to compare the accuracy of surgical management recommendations between large language models (LLMs) and clinicians.
[METHODS] This retrospective study enrolled 202 patients with T1 CRC who underwent endoscopic resection at three hospitals. We compared the decision-making accuracy of two representative LLMs (ChatGPT-4o and DeepSeek) with that of 29 clinicians in determining whether additional surgery was required after endoscopic resection of T1 CRC. To optimize the inputs for the LLMs, we applied a prompt-engineering strategy that combined role prompting, in-context learning, and few-shot learning.
[RESULTS] In clinical practice, we found that the guideline adherence rate after endoscopic resection for T1 CRC was below 80%. We analyzed the pathology reports of 200 patients with T1 CRC, and the results showed that both ChatGPT-4o and DeepSeek significantly outperformed clinical physicians. Subgroup analyses demonstrated that LLMs outperformed doctors regardless of years of experience or professional background. Additionally, the use of Chinese or English input had no significant impact on the performance of the LLMs.
[CONCLUSION] This study highlights the potential of LLMs to improve guideline adherence in the post-endoscopic resection management of T1 CRC.
[METHODS] This retrospective study enrolled 202 patients with T1 CRC who underwent endoscopic resection at three hospitals. We compared the decision-making accuracy of two representative LLMs (ChatGPT-4o and DeepSeek) with that of 29 clinicians in determining whether additional surgery was required after endoscopic resection of T1 CRC. To optimize the inputs for the LLMs, we applied a prompt-engineering strategy that combined role prompting, in-context learning, and few-shot learning.
[RESULTS] In clinical practice, we found that the guideline adherence rate after endoscopic resection for T1 CRC was below 80%. We analyzed the pathology reports of 200 patients with T1 CRC, and the results showed that both ChatGPT-4o and DeepSeek significantly outperformed clinical physicians. Subgroup analyses demonstrated that LLMs outperformed doctors regardless of years of experience or professional background. Additionally, the use of Chinese or English input had no significant impact on the performance of the LLMs.
[CONCLUSION] This study highlights the potential of LLMs to improve guideline adherence in the post-endoscopic resection management of T1 CRC.
MeSH Terms
Humans; Colorectal Neoplasms; Guideline Adherence; Retrospective Studies; Female; Male; Aged; Middle Aged; Clinical Decision-Making; Aged, 80 and over; Language; Practice Guidelines as Topic; Large Language Models
같은 제1저자의 인용 많은 논문 (5)
- Cadonilimab (PD-1/CTLA-4 bispecific antibody) combination therapy for driver gene-negative advanced NSCLC: a single-center retrospective real-world study.
- Propensity-matched study of liposomal doxorubicin vs. doxorubicin in first-line DLBCL treatment: efficacy and safety.
- NSUN6 deficiency drives immune suppression in pancreatic cancer via the KDM5A-CCL2-macrophage axis.
- Comparative study on the functions of LDHA and LDHC in triple-negative breast cancer.
- Nicotine and tar-multiple targets synergize to alter the immune micro-environment to induce prostate cancer.