Evaluation of ChatGPT-4o, Claude 3.5 Sonnet, and Google Gemini 2.0 Flash as Patient Education Resources for Upper Blepharoplasty Patients.
Abstract
[PURPOSE] This study aimed to evaluate the effectiveness, accuracy, and readability of the leading large language models (LLMs) from 3 different companies (ChatGPT-4o, Claude 3.5 Sonnet, and Google Gemini 2.0 Flash) as patient education resources for upper blepharoplasty.
[METHODS] Twenty frequently asked questions about upper blepharoplasty were posed to each of the 3 LLMs. Two ophthalmologists recorded the responses and independently rated their accuracy on a 5-point Likert scale (1 to 5). The readability of the responses was assessed using the Simple Measure of Gobbledygook (SMOG) index and the Coleman-Liau index.
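The two readability formulas named above are short enough to state directly. The sketch below uses the published coefficients for the Coleman-Liau index (CLI = 0.0588L - 0.296S - 15.8, where L is letters per 100 words and S is sentences per 100 words) and for SMOG (1.0430 × sqrt(polysyllables × 30 / sentences) + 3.1291, where polysyllables are words of 3 or more syllables). The sentence splitter and the vowel-group syllable counter are simplifying assumptions, and the helper names are hypothetical. The abstract does not say which software the authors used, and dedicated tools tokenize and count syllables differently, so scores from this sketch will not match the reported values exactly.

```python
import math
import re

# Simplifying assumptions (not from the study): sentences end at ., ! or ?,
# and a word is a run of letters with an optional apostrophe suffix.
SENTENCE_END = re.compile(r"[.!?]+")
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def count_sentences(text: str) -> int:
    # Count non-empty fragments between sentence-ending punctuation.
    return max(1, sum(1 for part in SENTENCE_END.split(text) if part.strip()))


def count_syllables(word: str) -> int:
    # Rough heuristic: each run of vowels (including y) is one syllable,
    # minus one for a silent final "e" ("make"), except "-le" ("table").
    w = word.lower()
    n = len(re.findall(r"[aeiouy]+", w))
    if w.endswith("e") and not w.endswith("le") and n > 1:
        n -= 1
    return max(1, n)


def _words(text: str) -> list[str]:
    words = WORD.findall(text)
    if not words:
        raise ValueError("text contains no words")
    return words


def coleman_liau(text: str) -> float:
    words = _words(text)
    letters = sum(sum(ch.isalpha() for ch in w) for w in words)
    per_100_words = 100 / len(words)
    L = letters * per_100_words                # letters per 100 words
    S = count_sentences(text) * per_100_words  # sentences per 100 words
    return 0.0588 * L - 0.296 * S - 15.8


def smog(text: str) -> float:
    # Words with 3+ syllables, scaled to a 30-sentence sample.
    polysyllables = sum(1 for w in _words(text) if count_syllables(w) >= 3)
    return 1.0430 * math.sqrt(polysyllables * 30 / count_sentences(text)) + 3.1291
```

Each model's score could then be summarized by applying `coleman_liau` and `smog` to every response and averaging; whether the authors scored responses individually or as pooled text is not stated in the abstract.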
[RESULTS] All models demonstrated high accuracy, with mean Likert scores exceeding 4.5. No statistically significant difference in Likert scores was observed among the 3 models (P = 0.097). Claude 3.5 Sonnet generated the most complex responses (Coleman-Liau index: 17.34; SMOG index: 23.82 points), whereas Google Gemini 2.0 Flash produced the most comprehensible texts (Coleman-Liau index: 13.27; SMOG index: 15.04 points).
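The abstract reports P = 0.097 for the comparison of Likert scores but does not name the test. Likert ratings are ordinal, and a common nonparametric choice for three independent groups is the Kruskal-Wallis H test; the sketch below assumes that test and is not necessarily the authors' analysis (if the three answers to each question were treated as paired, a Friedman test would be the analogous choice). With three groups, H is referred to a chi-square distribution with 2 degrees of freedom, whose survival function is exp(-H/2), so no statistics library is needed. The p-value is the large-sample approximation, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import chain


def average_ranks(values: list[float]) -> list[float]:
    # 1-based ranks; tied values share the mean of the ranks they occupy.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_3(groups: list[list[float]]) -> tuple[float, float]:
    """Tie-corrected Kruskal-Wallis H and its asymptotic p-value for 3 groups."""
    if len(groups) != 3 or any(len(g) == 0 for g in groups):
        raise ValueError("expected exactly 3 non-empty groups")
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of (rank sum)^2 / group size, walking the pooled ranks group by group.
    weighted, start = 0.0, 0
    for g in groups:
        weighted += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)

    # Tie correction matters for Likert data, where many scores are equal.
    correction = 1 - sum(t**3 - t for t in Counter(pooled).values()) / (n**3 - n)
    if correction == 0:  # every score identical: no evidence of a difference
        return 0.0, 1.0
    h /= correction

    # Chi-square survival function with 2 degrees of freedom is exp(-x / 2).
    return h, math.exp(-h / 2)
```

Each inner list would hold one accuracy score per question for one model (20 values per model in this study). How the two raters' scores were combined is not stated in the abstract, so that step is left to the caller.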
[CONCLUSION] Large language models hold great promise as tools to educate patients about upper blepharoplasty. Future research should focus on simplifying the language of model responses without compromising accuracy, keeping models up-to-date, and minimizing bias to improve patient care and safety.
MeSH Terms
Humans; Blepharoplasty; Patient Education as Topic; Comprehension; Internet; Generative Artificial Intelligence
Related papers
- Implications of Dermatologic Disorders in Facial Cosmetic Surgery: A Systematic Review.
- Mohs Surgery Defect Closure Using Blepharoplasty.
- Red and near-infrared photobiomodulation for burn, hypertrophic, and post-surgical scars: a scoping review of clinical trials.
- Correction of tear trough deformity in young patients without eyebags using orbital fat reposition and release of tear trough ligament.
- The Efficacy of Lower Blepharoplasty With Subobicularis Oculi Fat Lift and Fat Pad Transposition in Middle-Aged Patients.