Generative artificial intelligence for patient education material on gastric cancer prevention.
APA
Rizkala, T., Muench, N., et al. (2026). Generative artificial intelligence for patient education material on gastric cancer prevention. Endoscopy. https://doi.org/10.1055/a-2780-0664
MLA
Rizkala, T., et al. "Generative Artificial Intelligence for Patient Education Material on Gastric Cancer Prevention." Endoscopy, 2026.
PMID
41688051
Abstract
[BACKGROUND] This study assessed the effectiveness of large language models (LLMs) in generating lay summaries for patient education on the management of precancerous lesions and early neoplasia in the stomach.
[METHODS] In this pilot study, we used a two-period, crossover, blinded design to compare a ChatGPT-4o-generated summary with a Digestive Cancers Europe (DiCE) summary. Two panels rated the materials: expert physicians and members of the DiCE Patient Advisory Committee. Experts scored accuracy, completeness, comprehensibility, and satisfaction across five sections; patients rated overall completeness, comprehensibility, and satisfaction. Paired comparisons used mixed-effects estimates. Readability was assessed with the Flesch-Kincaid grade level (FKGL) and the SMOG index.
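The abstract states that paired comparisons used mixed-effects estimates but gives no modeling detail. Below is a minimal sketch of one plausible setup, a linear mixed model with a fixed effect for material and a random intercept per rater, fitted with statsmodels on synthetic data; all column names and values are hypothetical, and treating the 1-6 Likert scores as continuous is a simplification.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Hypothetical long-format ratings: one row per rater x material x section.
# The rater/material/section/score columns are illustrative, not the
# study's actual dataset.
raters = [f"R{i}" for i in range(10)]
sections = list("ABCDE")
rows = [
    {"rater": r, "material": m, "section": s,
     "score": int(np.clip(rng.normal(4.2 if m == "gpt" else 4.0, 0.8), 1, 6))}
    for r in raters for m in ("gpt", "dice") for s in sections
]
df = pd.DataFrame(rows)

# Fixed effect for material; random intercept per rater captures the
# paired (within-rater) structure of the comparison.
model = smf.mixedlm("score ~ material", df, groups=df["rater"])
fit = model.fit()
print(fit.summary())
```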
[RESULTS] Median expert ratings were similar between the two materials across all metrics. For the overall summary, median (range; IQR) scores were: accuracy 5 (4-6; 1) for ChatGPT-4o vs. 5 (3-6; 1) for DiCE (P = 0.10); completeness 4 (3-5; 1) vs. 4 (2-5; 1; P = 0.27); comprehensibility 4 (3-5; 1) vs. 4 (2-5; 1; P = 0.33); and satisfaction 4 (2-5; 1) vs. 3 (1-5; 2; P = 0.53). Patient ratings closely mirrored those of the experts. Neither summary met guideline readability recommendations on either the FKGL or the SMOG index.
[CONCLUSION] ChatGPT-4o produced patient education material comparable to the DiCE summary, but both require readability optimization; a human-in-the-loop workflow and further testing across prompts and models are warranted.
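The FKGL and SMOG scores against which both summaries fell short are computed from standard published formulas. The sketch below shows how each index is derived from sentence, word, and syllable counts; the syllable counter is a crude vowel-group heuristic (the study presumably used validated tooling), and the sample sentence is illustrative only.

```python
import re
import math

def count_syllables(word: str) -> int:
    """Rough vowel-group heuristic; real tools use pronunciation dictionaries."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1  # drop a silent trailing 'e'
    return max(n, 1)

def readability(text: str) -> tuple[float, float]:
    """Return (FKGL, SMOG) for a plain-text passage."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = [count_syllables(w) for w in words]

    n_sent, n_words = len(sentences), len(words)
    n_syll = sum(syllables)
    polysyllables = sum(1 for s in syllables if s >= 3)

    # Flesch-Kincaid grade level: higher means harder to read.
    fkgl = 0.39 * (n_words / n_sent) + 11.8 * (n_syll / n_words) - 15.59
    # SMOG index: the formula assumes a sample of at least 30 sentences.
    smog = 1.0430 * math.sqrt(polysyllables * (30 / n_sent)) + 3.1291
    return fkgl, smog

print(readability("Doctors remove small growths before they can turn into cancer."))
```

Patient education guidelines typically target roughly a sixth-grade reading level on such indices, which is the benchmark both summaries reportedly exceeded.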