Evaluation of ChatGPT-4o, Claude 3.5 Sonnet, and Google Gemini 2.0 Flash as Patient Education Resources for Upper Blepharoplasty Patients.

The Journal of Craniofacial Surgery. 2025; Vol. 36(8): pp. e1261-e1264

Demir S, Türkeş İC

Abstract

[PURPOSE] This study aimed to evaluate the effectiveness, accuracy, and readability of the leading large language models (LLMs) from 3 different companies, ChatGPT-4o, Claude 3.5 Sonnet, and Google Gemini 2.0 Flash, as patient education resources for upper blepharoplasty.

[METHODS] Twenty frequently asked questions about upper blepharoplasty were posed to the 3 LLMs. Two ophthalmologists recorded the responses and independently rated their accuracy on a 5-point Likert scale (1 to 5). The readability of the responses was assessed using the SMOG (Simple Measure of Gobbledygook) index and the Coleman-Liau index.
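
For readers unfamiliar with the two readability measures, the following is a minimal sketch of their standard published formulas, not the tool the authors used (the abstract does not name one). The function names, the naive sentence splitter, and the vowel-group syllable counter are all assumptions made for illustration, so scores will differ somewhat from those of dedicated readability calculators. SMOG was also designed for samples of about 30 sentences, so values for short texts are rough.

```python
import math
import re


def split_sentences(text):
    # Assumption: sentences end at '.', '!' or '?'. Abbreviations will be miscounted.
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def split_words(text):
    # Alphabetic tokens, allowing one internal apostrophe (e.g. "patient's").
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)


def count_syllables(word):
    # Heuristic (assumption): one syllable per vowel group,
    # minus a trailing silent 'e' that is not part of "-le".
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def coleman_liau(text):
    # CLI = 0.0588 * L - 0.296 * S - 15.8
    # L: letters per 100 words, S: sentences per 100 words.
    words = split_words(text)
    sentences = split_sentences(text)
    letters = sum(len(w.replace("'", "")) for w in words)
    letters_per_100 = letters / len(words) * 100
    sentences_per_100 = len(sentences) / len(words) * 100
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


def smog(text):
    # SMOG grade = 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291
    # A polysyllable is a word with three or more syllables.
    sentences = split_sentences(text)
    polysyllables = sum(1 for w in split_words(text) if count_syllables(w) >= 3)
    return 1.0430 * math.sqrt(polysyllables * 30 / len(sentences)) + 3.1291


if __name__ == "__main__":
    # Hypothetical sample text, not taken from the study.
    sample = (
        "Upper blepharoplasty removes excess skin from the upper eyelid. "
        "Recovery usually takes one to two weeks."
    )
    print(f"Coleman-Liau: {coleman_liau(sample):.2f}")
    print(f"SMOG: {smog(sample):.2f}")
```

Both indices approximate the U.S. school grade level needed to understand a text, so higher values indicate harder reading.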

[RESULTS] All models demonstrated high accuracy, with mean Likert scores exceeding 4.5. No statistically significant difference in Likert scores was observed among the 3 models (P = 0.097). Claude 3.5 Sonnet generated the most complex responses (Coleman-Liau index: 17.34; SMOG index: 23.82 points), whereas Google Gemini 2.0 Flash produced the most comprehensible texts (Coleman-Liau index: 13.27; SMOG index: 15.04 points).

[CONCLUSION] Large language models hold great promise as tools to educate patients about upper blepharoplasty. Future research should focus on simplifying language models without compromising accuracy, keeping models up-to-date, and minimizing bias to improve patient care and safety.

MeSH Terms

Humans; Blepharoplasty; Patient Education as Topic; Comprehension; Internet; Generative Artificial Intelligence
