Accuracy of ChatGPT, Gemini, Copilot, and Claude to Blepharoplasty-Related Questions.
TL;DR
ChatGPT demonstrated superior performance in both medical accuracy and clinical relevance among evaluated LLMs regarding upper eyelid blepharoplasty, particularly excelling in postoperative monitoring and follow-up categories.
📈 Citations by year (2025–2026) · Total: 3
OpenAlex topics
Artificial Intelligence in Healthcare and Education
Meta-analysis and systematic reviews
Pain Management and Placebo Effect
APA
Köksaldı, S., Kayabaşı, M., et al. (2025). Accuracy of ChatGPT, Gemini, Copilot, and Claude to blepharoplasty-related questions. Aesthetic Plastic Surgery, 49(17), 4775-4785. https://doi.org/10.1007/s00266-025-05071-9
MLA
Köksaldı, Seher, et al. "Accuracy of ChatGPT, Gemini, Copilot, and Claude to Blepharoplasty-Related Questions." Aesthetic Plastic Surgery, vol. 49, no. 17, 2025, pp. 4775-4785.
PMID
40691658
Abstract
[BACKGROUND] This study aimed to evaluate the performance of four large language models (LLMs), ChatGPT, Gemini, Copilot, and Claude, in responding to upper eyelid blepharoplasty-related questions, focusing on medical accuracy, clinical relevance, response length, and readability.
[METHODS] A set of queries regarding upper eyelid blepharoplasty, covering six categories (anatomy, surgical procedure, additional intraoperative procedures, postoperative monitoring, follow-up, and postoperative complications), was posed to each LLM. An identical prompt establishing clinical context was provided before each question. Responses were evaluated by three ophthalmologists using a 5-point Likert scale for medical accuracy and a 3-point Likert scale for clinical relevance. The length of the responses was assessed. Readability was also evaluated using the Flesch Reading Ease Score, Flesch-Kincaid Grade Level, Coleman-Liau Index, Gunning Fog Index, and Simple Measure of Gobbledygook (SMOG) grade.
[RESULTS] A total of 30 standardized questions were presented to each LLM. None of the responses from any LLM received a score of 1 regarding medical accuracy for any question. ChatGPT achieved an 80% 'highly accurate' response rate, followed by Claude (60%), Gemini (40%), and Copilot (20%). None of the responses from ChatGPT and Claude received a score of 1 regarding clinical relevance, whereas 10% of Gemini's responses and 26.7% of Copilot's responses received a score of 1. ChatGPT also provided the most clinically 'relevant' responses (86.7%), outperforming the other LLMs. Copilot generated the shortest responses, while ChatGPT generated the longest. Readability analyses revealed that all responses required advanced reading skills at a 'college graduate' level or higher, with Copilot's responses being the most complex.
[CONCLUSION] ChatGPT demonstrated superior performance in both medical accuracy and clinical relevance among evaluated LLMs regarding upper eyelid blepharoplasty, particularly excelling in postoperative monitoring and follow-up categories. While all models generated complex texts requiring advanced literacy, ChatGPT's detailed responses offer valuable guidance for ophthalmologists managing upper eyelid blepharoplasty cases.
[LEVEL OF EVIDENCE V] This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors (www.springer.com/00266).
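As a point of reference for the five readability metrics named in the Methods, the sketch below computes them in Python with the textstat package. This is a hedged illustration, not the authors' pipeline: the paper does not state which tool was used, and `sample_response` is an invented stand-in for an LLM answer.

```python
# A minimal sketch of computing the abstract's five readability metrics with
# the textstat package (pip install textstat). The authors do not report
# their tooling, so this choice is an assumption; sample_response is invented.
import textstat

sample_response = (
    "Upper eyelid blepharoplasty involves excision of redundant skin and, "
    "when indicated, conservative resection of preaponeurotic fat. "
    "Postoperative monitoring should include assessment for retrobulbar "
    "hemorrhage, lagophthalmos, and wound dehiscence. "
    "Follow-up visits are typically scheduled at one week and one month."
)

metrics = {
    # Flesch Reading Ease: higher is easier; 'college graduate' text scores
    # roughly 0-30 on this scale.
    "Flesch Reading Ease": textstat.flesch_reading_ease(sample_response),
    # The remaining indices approximate the US school grade level required.
    "Flesch-Kincaid Grade Level": textstat.flesch_kincaid_grade(sample_response),
    "Coleman-Liau Index": textstat.coleman_liau_index(sample_response),
    "Gunning Fog Index": textstat.gunning_fog(sample_response),
    "SMOG grade": textstat.smog_index(sample_response),
}

for name, value in metrics.items():
    print(f"{name}: {value:.1f}")
```

Dense clinical prose like this sample typically scores at or beyond the college level on all five indices, which is the pattern the abstract reports for every model's responses.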
Extracted Medical Entities (NER)
| Type | English term | Korean / gloss | UMLS CUI | Source | Count |
|---|---|---|---|---|---|
| Procedure | blepharoplasty | 안검성형술 | | dict | 5 |
| Anatomy | upper eyelid | 눈꺼풀 | | dict | 4 |
| Complication | eyelid | | | scispacy | 1 |
| Drug | ChatGPT | | | scispacy | 1 |
| Drug | Claude-in | | | scispacy | 1 |
| Drug | [RESULTS] A | | | scispacy | 1 |
| Drug | Gemini | | | scispacy | 1 |
| Disease | LLM | | | scispacy | 1 |
| Other | Gemini | | | scispacy | 1 |
| Other | ChatGPT | | | scispacy | 1 |
| Other | Copilot | | | scispacy | 1 |
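The rows sourced "scispacy" above are automated biomedical NER output, which explains oddities such as chatbot names filed under "Drug" and the "[RESULTS] A" fragment. The sketch below shows how such output is typically produced; the page does not name the model used, so en_ner_bc5cdr_md (whose only labels are CHEMICAL and DISEASE) is an assumption.

```python
# Minimal scispaCy NER sketch. Assumes the en_ner_bc5cdr_md model, which
# only knows the labels CHEMICAL and DISEASE -- the page does not say which
# model produced its entity table, so this choice is an assumption.
# Setup: pip install scispacy, then install the en_ner_bc5cdr_md package
# from the scispaCy releases page.
import spacy

nlp = spacy.load("en_ner_bc5cdr_md")

abstract_snippet = (
    "This study evaluated four large language models, ChatGPT, Gemini, "
    "Copilot, and Claude, in responding to upper eyelid "
    "blepharoplasty-related questions."
)

doc = nlp(abstract_snippet)
for ent in doc.ents:
    # A two-label model must force every detection into CHEMICAL or DISEASE,
    # so product names can surface as CHEMICAL (rendered as "Drug" above).
    print(ent.text, ent.label_)
```

The "dict" rows, by contrast, appear to come from direct lookups against a Korean-English medical term list, which would explain why only those rows carry Korean glosses.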
MeSH Terms
Blepharoplasty; Humans; Surveys and Questionnaires; Female; Male; Language; Comprehension; Generative Artificial Intelligence
🔗 Co-occurring domains
Categories that frequently appear alongside this paper's category in other papers
Highly cited papers by the same first author (1)
Related papers
- Penetrating globe injury following periocular hyaluronic acid filler injection: A case report.
- Implications of Dermatologic Disorders in Facial Cosmetic Surgery: A Systematic Review.
- Mohs Surgery Defect Closure Using Blepharoplasty.
- Are large language models consistent with the ASPS and AAPS guidelines? A comparison of AI chatbot recommendations and plastic surgery clinical guidance.
- Application of the SCIA-Pure Skin Perforator Flap in Bilateral Upper Eyelid Reconstruction: A Case Report and Review of the Literature.