A Comparative Study on the Use of DeepSeek-R1 and ChatGPT-4.5 in Different Aspects of Plastic Surgery.
Abstract
[BACKGROUND] Artificial intelligence (AI) has the potential to enhance medical practice, but its application in plastic surgery remains underexplored. DeepSeek-R1 and ChatGPT-4.5 are AI models that can assist with clinical tasks, but their performance in plastic surgery-related queries needs evaluation. This study compares the two models in providing clinically relevant, detailed, and accurate responses.
[OBJECTIVE] The objective of this study is to evaluate and compare the performance of DeepSeek-R1 and ChatGPT-4.5 across 10 plastic surgery-related tasks, focusing on accuracy, detail, and clinical relevance.
[METHODS] Two senior plastic surgeons reviewed the AI-generated responses for each task and rated them on a 1-10 scale for accuracy, completeness, and clinical relevance. The tasks included general knowledge questions as well as more complex, clinically relevant tasks such as medical history notes and hospital admission/discharge slips. After scoring, the mean and standard deviation (SD) were calculated for each model to evaluate overall performance and consistency.
[RESULTS] The results revealed that DeepSeek-R1 consistently outperformed ChatGPT-4.5 across all tasks, with higher average scores for both evaluators. DeepSeek-R1 excelled in tasks requiring high clinical detail, comprehensive explanations, and professional-level accuracy, particularly in tasks involving botulinum toxin, medical documentation, and novel research topics. In contrast, ChatGPT-4.5 was rated higher for tasks requiring concise responses, providing accurate but less detailed overviews. The mean scores for DeepSeek-R1 were significantly higher, with lower standard deviations, indicating greater consistency in its responses. ChatGPT-4.5, though performing well for general inquiries, showed more variability and scored lower in complex clinical tasks.
[CONCLUSION] DeepSeek-R1 is better suited for tasks needing clinical detail and professional-level accuracy, while ChatGPT-4.5 excels in providing quick, concise responses. Both models show promise in supporting plastic surgery practice and education, but should complement, not replace, human expertise.
[LEVEL OF EVIDENCE V] This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors www.springer.com/00266.
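The per-model summary described in the Methods (mean and SD of the 1-10 ratings) can be sketched as follows. This is a minimal illustration only: the ratings are hypothetical placeholders, not data from the study, the names `summarize`, `model_a`, and `model_b` are invented for the example, and the use of the sample SD (rather than the population SD) is an assumption, since the abstract does not say which was used.

```python
from statistics import mean, stdev


def summarize(scores):
    """Return (mean, sample SD) for a list of 1-10 ratings."""
    return mean(scores), stdev(scores)


# Hypothetical placeholder ratings for 10 tasks; NOT data from the study.
ratings = {
    "model_a": [9, 8, 9, 9, 8, 9, 9, 8, 9, 9],
    "model_b": [7, 8, 6, 9, 7, 5, 8, 7, 6, 8],
}

# Compute the summary once per model, then report it.
summary = {name: summarize(vals) for name, vals in ratings.items()}
for name, (m, sd) in summary.items():
    print(f"{name}: mean={m:.2f}, SD={sd:.2f}")
```

Under this reading, a higher mean indicates better overall ratings and a lower SD indicates more consistent ratings across tasks, which is how the abstract interprets the two statistics.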
Extracted medical entities (NER)
| Type | English term | Korean / gloss | UMLS CUI | Source | Occurrences |
|---|---|---|---|---|---|
| Procedure | botulinum toxin | 보툴리눔독소 주사 (botulinum toxin injection) | | dict | 1 |
| Anatomy | ChatGPT-4.5 | | | scispacy | 1 |
| Drug | [BACKGROUND] Artificial | | | scispacy | 1 |
| Drug | [OBJECTIVE] | | | scispacy | 1 |
| Disease | DeepSeek-R1 | | | scispacy | 1 |
| Disease | ChatGPT-4.5 | | | scispacy | 1 |
| Other | human | | | scispacy | 1 |
MeSH Terms
Humans; Surgery, Plastic; Artificial Intelligence; Plastic Surgery Procedures; Female; Clinical Competence; Male; Generative Artificial Intelligence