Performance of ChatGPT-4 on the French Board of Plastic Reconstructive and Aesthetic Surgery written exam: a descriptive study.

Journal of Educational Evaluation for Health Professions. 2025;22:27.

Dejean-Bouyer E, Kanlagna A, Thuau F, Perrot P, Lancien U

Abstract

[PURPOSE] This study aims to evaluate the performance of Chat Generative Pre-Trained Transformer 4 (ChatGPT-4) on the French Board of Plastic, Reconstructive, and Aesthetic Surgery written examination and to assess its role as a supplementary resource in helping medical students prepare for the qualification examination in plastic surgery.

[METHODS] This descriptive study evaluated ChatGPT-4's performance on 213 items from the October 2024 French Board of Plastic, Reconstructive, and Aesthetic Surgery written examination. Responses were assessed for accuracy, logical reasoning, and use of internal and external information, and were categorized for fallacies by independent reviewers. Statistical analyses included chi-square tests and Fisher's exact test for significance.

[RESULTS] ChatGPT-4 answered all questions across the 10 modules, achieving an overall accuracy rate of 77.5%. The model applied logical reasoning in 98.1% of the questions, used internal information in 94.4%, and incorporated external information in 91.1%.

[CONCLUSION] ChatGPT-4 performs satisfactorily on the French Board of Plastic, Reconstructive, and Aesthetic Surgery written examination. Its accuracy met the minimum passing standards for the exam. While responses generally align with expected knowledge, careful verification remains necessary, particularly for questions involving image interpretation. As artificial intelligence continues to evolve, ChatGPT-4 is expected to become an increasingly reliable tool for medical education. At present, it remains a valuable resource for assisting plastic surgery residents in their training.

MeSH Terms

Humans; Educational Measurement; Surgery, Plastic; France; Plastic Surgery Procedures; Students, Medical; Education, Medical, Undergraduate; Clinical Competence; Generative Artificial Intelligence