Evaluating the Quality and Reliability of Large Language Models for Plastic Surgery Patient Education: A Comparative Analysis of ChatGPT and OpenEvidence.

Aesthetic Surgery Journal 2026, Vol. 46(2), pp. 160-167

Perez Rivera LR, Gursky AK, Elmer N, Boyd CJ, Karp NS

Abstract

[BACKGROUND] Concerns regarding information inaccuracy when using general-purpose large language models have prompted the quest for alternative tools. OpenEvidence has emerged as a healthcare-focused large language model trained exclusively on data from peer-reviewed medical literature.

[OBJECTIVES] This study compared the quality, accuracy, and readability of aesthetic surgery patient education materials generated by OpenEvidence and ChatGPT.

[METHODS] A standardized prompt requesting comprehensive postoperative discharge instructions for 20 of the most common aesthetic surgery procedures was entered into OpenEvidence and ChatGPT-5. Outputs were evaluated using 4 validated assessment tools: the DISCERN instrument for information quality (1-5), the Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P) for information understandability and actionability (0-100), the Flesch-Kincaid scale for estimated grade level (fifth grade to professional level) and reading ease (0-100), and a Likert scale for citation accuracy (1-4).

[RESULTS] OpenEvidence scored significantly higher than ChatGPT-5 on the DISCERN instrument (3.3 ± 0.4 vs 1.7 ± 0.4, P < .001) and the citation accuracy scale (2.4 ± 1.3 vs 1.5 ± 0.7, P = .007). Scores were comparable between the two tools in PEMAT-P understandability (71 ± 5 vs 69 ± 0, P = .3) and actionability (52 ± 12 vs 54 ± 5, P = .6), as well as on the Flesch-Kincaid Grade Level (9.3 ± 1.0 vs 9.2 ± 0.6, P = .8) and the Flesch Reading Ease Score (40.0 ± 6.6 vs 41.0 ± 5.5, P = .6).

[CONCLUSIONS] OpenEvidence generated materials of significantly higher quality and reliability than ChatGPT, suggesting it may serve as a more reliable alternative for patient education in aesthetic surgery practice.

MeSH Terms

Humans; Patient Education as Topic; Reproducibility of Results; Comprehension; Language; Plastic Surgery Procedures; Surgery, Plastic; Large Language Models; Generative Artificial Intelligence