
Accuracy of ChatGPT, Gemini, Copilot, and Claude to Blepharoplasty-Related Questions.

Aesthetic Plastic Surgery, 2025, Vol. 49(17), pp. 4775–4785 · cited 3 · Artificial Intelligence in Healthcare and Education
TL;DR ChatGPT demonstrated superior performance in both medical accuracy and clinical relevance among evaluated LLMs regarding upper eyelid blepharoplasty, particularly excelling in postoperative monitoring and follow-up categories.
📈 Citations by year (2025–2026) · total 3
OpenAlex topics · Artificial Intelligence in Healthcare and Education · Meta-analysis and Systematic Reviews · Pain Management and Placebo Effect

Köksaldı S, Kayabaşı M, Durmaz Engin C, Grzybowski A



Cite this paper

APA Köksaldı, S., Kayabaşı, M., Durmaz Engin, C., & Grzybowski, A. (2025). Accuracy of ChatGPT, Gemini, Copilot, and Claude to Blepharoplasty-Related Questions. Aesthetic Plastic Surgery, 49(17), 4775–4785. https://doi.org/10.1007/s00266-025-05071-9
MLA Köksaldı, Seher, et al. "Accuracy of ChatGPT, Gemini, Copilot, and Claude to Blepharoplasty-Related Questions." Aesthetic Plastic Surgery, vol. 49, no. 17, 2025, pp. 4775-4785.
PMID 40691658

Abstract

[BACKGROUND] This study aimed to evaluate the performance of four large language models (LLMs)-ChatGPT, Gemini, Copilot, and Claude-in responding to upper eyelid blepharoplasty-related questions, focusing on medical accuracy, clinical relevance, response length, and readability.

[METHODS] A set of queries regarding upper eyelid blepharoplasty, covering six categories (anatomy, surgical procedure, additional intraoperative procedures, postoperative monitoring, follow-up, and postoperative complications), was posed to each LLM. An identical prompt establishing clinical context was provided before each question. Responses were evaluated by three ophthalmologists using a 5-point Likert scale for medical accuracy and a 3-point Likert scale for clinical relevance. The length of the responses was assessed. Readability was also evaluated using the Flesch Reading Ease Score, Flesch-Kincaid Grade Level, Coleman-Liau Index, Gunning Fog Index, and Simple Measure of Gobbledygook grade.
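The Flesch metrics listed above are simple functions of average sentence length and average syllables per word. As a minimal sketch (not the study's actual tooling; the vowel-group syllable counter is a rough approximation, and published analyses typically use dedicated software), the two Flesch scores can be computed like this:

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: one syllable per contiguous vowel group (approximation only).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_scores(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for `text`."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)           # average words per sentence
    spw = syllables / len(words)                # average syllables per word
    fre = 206.835 - 1.015 * wps - 84.6 * spw    # Flesch Reading Ease
    fkgl = 0.39 * wps + 11.8 * spw - 15.59      # Flesch-Kincaid Grade Level
    return round(fre, 1), round(fkgl, 1)

# Example: a complex, clinical-register sentence scores low on FRE / high on FKGL.
fre, fkgl = flesch_scores(
    "Postoperative ecchymosis typically resolves within two weeks. "
    "Patients should apply cold compresses intermittently."
)
print(fre, fkgl)
```

Longer sentences and polysyllabic vocabulary drive the Reading Ease score down and the grade level up, which is why the LLM responses in this study landed at a "college graduate" level or higher.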

[RESULTS] A total of 30 standardized questions were presented to each LLM. None of the responses from any LLM received a score of 1 regarding medical accuracy for any question. ChatGPT achieved an 80% 'highly accurate' response rate, followed by Claude (60%), Gemini (40%), and Copilot (20%). None of the responses from ChatGPT and Claude received a score of 1 regarding clinical relevance, whereas 10% of Gemini's responses and 26.7% of Copilot's responses received a score of 1. ChatGPT also provided the most clinically 'relevant' responses (86.7%), outperforming the other LLMs. Copilot generated the shortest responses, while ChatGPT generated the longest. Readability analyses revealed that all responses required advanced reading skills at a 'college graduate' level or higher, with Copilot's responses being the most complex.

[CONCLUSION] ChatGPT demonstrated superior performance in both medical accuracy and clinical relevance among evaluated LLMs regarding upper eyelid blepharoplasty, particularly excelling in postoperative monitoring and follow-up categories. While all models generated complex texts requiring advanced literacy, ChatGPT's detailed responses offer valuable guidance for ophthalmologists managing upper eyelid blepharoplasty cases.

[LEVEL OF EVIDENCE V] This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors at www.springer.com/00266.

Extracted Medical Entities (NER)

Type | English term | Korean / gloss | UMLS CUI | Source | Occurrences
Procedure | blepharoplasty | 안검성형술 | – | dict | 5
Anatomy | upper eyelid | 눈꺼풀 | – | dict | 4
Complication | eyelid | – | – | scispacy | 1
Drug | ChatGPT | – | – | scispacy | 1
Drug | Claude-in | – | – | scispacy | 1
Drug | [RESULTS] A | – | – | scispacy | 1
Drug | Gemini | – | – | scispacy | 1
Disease | LLM | – | – | scispacy | 1
Other | Gemini | – | – | scispacy | 1
Other | ChatGPT | – | – | scispacy | 1
Other | Copilot | – | – | scispacy | 1

MeSH Terms

Blepharoplasty; Humans; Surveys and Questionnaires; Female; Male; Language; Comprehension; Generative Artificial Intelligence
