Generative artificial intelligence for patient education material on gastric cancer prevention.

Endoscopy 2026

Rizkala T, Muench N, Hassan C, Dinis-Ribeiro M


Cite this article

APA Rizkala T, Muench N, et al. (2026). Generative artificial intelligence for patient education material on gastric cancer prevention. Endoscopy. https://doi.org/10.1055/a-2780-0664
MLA Rizkala T, et al. "Generative artificial intelligence for patient education material on gastric cancer prevention." Endoscopy, 2026.
PMID 41688051
DOI 10.1055/a-2780-0664

Abstract

[BACKGROUND]  This study assessed the effectiveness of large language models (LLMs) in generating lay summaries for patient education on the management of precancerous lesions and early neoplasia in the stomach.

[METHODS]  In this pilot study, we used a two-period, crossover, blinded design to compare a ChatGPT-4o summary versus a Digestive Cancers Europe (DiCE) summary. Two panels rated the materials: expert physicians and DiCE Patient Advisory Committee members. Experts scored accuracy, completeness, comprehensibility, and satisfaction (across five sections); patients rated overall completeness, comprehensibility, and satisfaction. Paired comparisons used mixed-effects estimates. Readability was assessed with Flesch-Kincaid grade level (FKGL) and SMOG index.
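The two readability indices named above follow widely published formulas. The sketch below is illustrative only: the abstract does not say which software the authors used, the function names `fkgl` and `smog` are hypothetical, and the counts passed in are made-up inputs, not data from the study. Counts are taken as arguments because automated syllable counting is itself heuristic.

```python
import math


def fkgl(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid grade level from raw text counts (standard formula)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def smog(polysyllables: int, sentences: int) -> float:
    """SMOG index; polysyllables = words with three or more syllables."""
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291


# Hypothetical counts for a short passage (assumed, not from the study).
print(round(fkgl(words=100, sentences=5, syllables=150), 2))
print(round(smog(polysyllables=30, sentences=30), 2))
```

Both indices return an approximate school grade level, so lower values indicate text that is easier to read.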

[RESULTS]  Median expert ratings were similar between materials across metrics. For the overall summary, median (range; IQR) scores were: accuracy 5 (4-6; 1) for ChatGPT-4o vs. 5 (3-6; 1) for DiCE (P = 0.10); completeness 4 (3-5; 1) vs. 4 (2-5; 1; P = 0.27); comprehensibility 4 (3-5; 1) vs. 4 (2-5; 1; P = 0.33); and satisfaction 4 (2-5; 1) vs. 3 (1-5; 2; P = 0.53). Patient ratings mirrored experts, with very similar results. Readability failed to meet guideline recommendations for both summaries on both FKGL and SMOG scores.

[CONCLUSION]  ChatGPT-4o produced patient materials comparable to DiCE, but both require readability optimization; a human-in-the-loop workflow and future tests across prompts and models are warranted.