Evaluating Plastic Surgery Chatbot Performance: Insights into Medical Triage, Classification Accuracy, and Escalation Trends.
Abstract
[BACKGROUND] The integration of AI chatbots into plastic surgery websites is now standard, providing asynchronous, real-time engagement for patients. Although promoted as scheduling and medical guidance tools, their contribution to clinical workflow improvement and patient satisfaction remains unclear.
[OBJECTIVES] The aim of this study was to evaluate the accuracy of AI chatbot performance in clinical triage of plastic surgery patients, focusing on triage accuracy and quality of patient interactions.
[METHODS] The responses of chatbots on top-ranking plastic surgery websites, identified by search engine optimization (SEO) rankings, were analyzed with standardized clinical scenarios representing emergent, urgent, and elective patient inquiries. Responses were analyzed by the chatbot's triage sensitivity and specificity, classification accuracy, escalation metrics, and content quality. Patient experience was quantified with a chatbot usability questionnaire and a visual analog scale. Subgroup analysis by chatbot platform and thematic analysis was performed to identify tonal patterns in chatbot language.
[RESULTS] Performance varied significantly across 60 clinical scenarios, particularly in urgency classification. Emergent scenarios were most often mislabeled as urgent, with low sensitivity (20%), a low negative predictive value (0.71), and a high false-negative rate (80.0%). Agreement with physician-determined classifications was moderate (Cohen's kappa = 0.47), and over half of conversations required escalation to a human provider. Misclassified interactions were associated with lower patient usability scores than correct classifications (49.1 vs 60.8, P < .05). Thematic analysis revealed reliance on templated, administrative language.
[CONCLUSIONS] Chatbots are practical and useful tools for managing elective plastic surgery inquiries but are ill-equipped to handle urgent and emergent patient needs. To move beyond utilization as basic administrative assistants, deployment of more clinically adept chatbots is needed.
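The binary screening metrics reported above (sensitivity, negative predictive value, false-negative rate) all derive from a standard 2x2 confusion table for the emergent class. The sketch below is purely illustrative: the counts are hypothetical, chosen only so the arithmetic reproduces the reported figures (the study's actual distribution of the 60 scenarios is not stated in the abstract).

```python
# Illustrative computation of binary triage metrics for the "emergent" class.
# The 2x2 counts below are HYPOTHETICAL, picked so the arithmetic matches the
# reported values; the study's true contingency table is not given here.

def triage_metrics(tp: int, fn: int, tn: int, fp: int):
    """Return (sensitivity, specificity, npv, fnr) from 2x2 table counts."""
    sensitivity = tp / (tp + fn)   # emergent cases correctly flagged as emergent
    specificity = tn / (tn + fp)   # non-emergent cases correctly not flagged
    npv = tn / (tn + fn)           # proportion of "not emergent" calls that are right
    fnr = fn / (tp + fn)           # emergent cases missed (1 - sensitivity)
    return sensitivity, specificity, npv, fnr

# Hypothetical split of the 60 scenarios: 20 emergent, 40 non-emergent.
sens, spec, npv, fnr = triage_metrics(tp=4, fn=16, tn=40, fp=0)
print(f"sensitivity={sens:.0%}, NPV={npv:.2f}, FNR={fnr:.1%}")
```

With these assumed counts, sensitivity is 4/20 = 20%, NPV is 40/56 ≈ 0.71, and the false-negative rate is 16/20 = 80.0%, matching the figures in the results.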
MeSH Terms
Humans; Triage; Surgery, Plastic; Patient Satisfaction; Internet; Artificial Intelligence; Surveys and Questionnaires; Plastic Surgery Procedures; Workflow; Generative Artificial Intelligence