Token-splitting improves GPT-4.1 performance on plastic surgery exams: implications for AI-Assisted medical education.

Medical education online 2025 Vol.30(1) p. 2602788

Lei YH, Chen CC, Shen CJ

Abstract

Large language models (LLMs), such as ChatGPT, have demonstrated impressive performance on general medical examinations; however, their effectiveness significantly declines in specialized board examinations due to limited domain-specific training data and computational constraints inherent to their self-attention mechanisms. This study investigates a novel token-splitting strategy informed by Cognitive Load Theory (CLT), aimed at overcoming these limitations by optimizing cognitive processing and enhancing knowledge retention in specialized educational contexts. We implemented a token-splitting approach by segmenting Taiwan plastic surgery board examination materials and associated textbook content into cognitively manageable segments ranging from 4,000 to 20,000 tokens. These segmented inputs were provided to GPT-4.1 via its standard ChatGPT web interface. Model performance was rigorously evaluated, comparing accuracy and efficiency across various token lengths and question complexities classified according to Bloom's taxonomy.The GPT-4.1 model utilizing the token-splitting strategy significantly outperformed the baseline (unmodified) model, achieving notably higher accuracy. The optimal segmentation length was determined to be 6,000 tokens, effectively balancing cognitive coherence with information retention and model attention. Errors observed at this optimal length primarily resulted from content absent from textual materials or requiring multimodal interpretation (e.g., image-based reasoning). Provided relevant textual content was adequately segmented, GPT-4.1 consistently demonstrated high accuracy (From 75.88% to 92.93%). The findings highlight that a token-splitting approach, grounded in Cognitive Load Theory, significantly enhances LLM performance on specialized medical board examinations. This accessible, user-friendly strategy provides educators and clinicians with a practical means to improve AI-assisted education outcomes without requiring complex technical skills or infrastructure. Future research and development integrating multimodal capabilities and adaptive segmentation strategies promise to further optimize educational applications and clinical decision-making support.

추출된 의학 개체 (NER)

유형영어 표현한국어 / 풀이UMLS CUI출처등장
약물 ChatGPT scispacy 1
질환 Bloom scispacy 1
질환 LLM scispacy 1

MeSH Terms

Humans; Taiwan; Educational Measurement; Surgery, Plastic; Artificial Intelligence; Cognition; Education, Medical