A comprehensive evaluation of ChatGPT consultation quality for augmentation mammoplasty: A comparative analysis between plastic surgeons and laypersons.
Abstract
[OBJECTIVES] ChatGPT has gained significant popularity as a source of healthcare information among the general population. Evaluating the quality of chatbot responses is crucial, requiring comprehensive and qualitative analysis. This study aims to assess the answers provided by ChatGPT during hypothetical breast augmentation consultations across various categories and depths. The evaluation involves the utilization of validated tools and a comparison of scores between plastic surgeons and laypersons.
[METHODS] A panel consisting of five plastic surgeons and five laypersons evaluated ChatGPT's responses to 25 questions spanning consultation, procedure, recovery, and sentiment categories. The DISCERN and PEMAT tools were employed to assess the responses, while emotional context was examined through ten specific questions. Additionally, readability was measured using the Flesch Reading Ease score. Qualitative analysis was performed to identify the overall strengths and weaknesses.
[RESULTS] Plastic surgeons generally scored lower than laypersons across most domains. Scores for each evaluation domain varied by category, with the consultation category demonstrating lower scores in terms of DISCERN reliability, information quality, and DISCERN score. Plastic surgeons assigned significantly lower overall quality ratings to the procedure category compared to other question categories. They also gave lower emotion scores in the procedure category compared to laypersons. The depth of the questions did not impact the scoring.
[CONCLUSIONS] Existing health information evaluation tools may not be entirely suitable for comprehensively evaluating the quality of individual responses generated by ChatGPT. Consequently, the development and implementation of appropriate evaluation tools to assess the appropriateness and quality of AI consultations are necessary.
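The Flesch Reading Ease score named in the methods is computed from average sentence length and average syllables per word: 206.835 − 1.015 × (words / sentences) − 84.6 × (syllables / words). Higher scores indicate easier text. The abstract does not say which implementation the authors used, so the following is only a minimal sketch. The function names `count_syllables` and `flesch_reading_ease` are hypothetical, and the vowel-group syllable counter is a rough assumption rather than a dictionary-based count.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count vowel groups, drop a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a final 'e' as silent unless the word ends in 'le' or 'ee'.
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    # Sentences are approximated by splitting on ., ! and ?
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


if __name__ == "__main__":
    plain = "The cat sat on the mat."
    dense = "Augmentation mammoplasty necessitates comprehensive preoperative consultation."
    print(round(flesch_reading_ease(plain), 3))
    print(round(flesch_reading_ease(dense), 3))
```

Short sentences built from one-syllable words score high, while a sentence dense with polysyllabic clinical terms scores far lower, which is why the metric is a common proxy for how accessible a chatbot answer is to lay readers.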
Extracted medical entities (NER)
| Type | English term | Korean term / gloss | UMLS CUI | Source | Occurrences |
|---|---|---|---|---|---|
| Procedure | augmentation mammoplasty | 유방성형술 (mammoplasty) | | dict | 1 |
| Procedure | breast augmentation | 유방성형술 (mammoplasty) | | dict | 1 |
| Anatomy | breast | 유방 (breast) | | dict | 1 |
| Drug | ChatGPT | | | scispacy | 1 |
| Drug | [OBJECTIVES] ChatGPT | | | scispacy | 1 |
| Drug | [CONCLUSIONS] | | | scispacy | 1 |
MeSH Terms
Humans; Female; Mammaplasty; Surgeons; Referral and Consultation; Surgery, Plastic; Adult; Surveys and Questionnaires; Generative Artificial Intelligence
Related papers
- The impact of three-dimensional simulation and virtual reality technologies on surgical decision-making and postoperative satisfaction in aesthetic surgery: a preliminary study.
- Cutaneous fistula of the breast: A complication of cosmetic autologous fat transfer.
- Epidermal inclusion cyst after breast reduction mammoplasty.
- Clinical outcomes of synthetic absorbable mesh use in breast surgery: First case series in reconstruction and aesthetic mastopexy.
- Implant-based versus autologous mastopexy after massive weight loss: Complications and patient satisfaction.