Thyroid nodule segmentation in ultrasound images using transformer models with masked autoencoder pre-training.
APA
Xiang Y, Acharya R, et al. (2025). Thyroid nodule segmentation in ultrasound images using transformer models with masked autoencoder pre-training. Frontiers in Artificial Intelligence, 8, 1618426. https://doi.org/10.3389/frai.2025.1618426
MLA
Xiang Y, et al. "Thyroid nodule segmentation in ultrasound images using transformer models with masked autoencoder pre-training." Frontiers in Artificial Intelligence, vol. 8, 2025, article 1618426.
PMID
40777517
Abstract
[INTRODUCTION] Thyroid nodule segmentation in ultrasound (US) images is a valuable yet challenging task, playing a critical role in diagnosing thyroid cancer. The difficulty arises from factors such as the absence of prior knowledge about the thyroid region, low contrast between anatomical structures, and speckle noise, all of which obscure boundary detection and introduce variability in nodule appearance across different images.
[METHODS] To address these challenges, we propose a transformer-based model for thyroid nodule segmentation. Unlike traditional convolutional neural networks (CNNs), transformers capture global context from the first layer, enabling a more comprehensive image representation, which is crucial for identifying subtle nodule boundaries. In this study, we first pre-train a Masked Autoencoder (MAE) to reconstruct masked image patches, then fine-tune it on thyroid US data, and further explore a cross-attention mechanism to enhance information flow between the encoder and decoder.
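For readers unfamiliar with MAE pre-training, the masking step can be sketched as follows. This is a generic NumPy illustration of the standard MAE recipe (split the image into patches, hide a random subset, feed only the visible patches to the encoder, and compute the reconstruction loss on the hidden patches only); it is not the authors' code. The 224 by 224 grayscale input, 16-pixel patches, 0.75 mask ratio, and the helper names `patchify`, `random_masking`, and `masked_mse` are assumptions chosen for illustration.

```python
import numpy as np

def patchify(images: np.ndarray, patch: int) -> np.ndarray:
    """Split (N, H, W) grayscale images into (N, L, patch*patch) flattened patches."""
    n, h, w = images.shape
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    gh, gw = h // patch, w // patch
    x = images.reshape(n, gh, patch, gw, patch)
    x = x.transpose(0, 1, 3, 2, 4)  # (N, gh, gw, patch, patch): group pixels by patch
    return x.reshape(n, gh * gw, patch * patch)

def random_masking(num_patches: int, mask_ratio: float, rng: np.random.Generator):
    """Pick a random subset of patches to keep; mask[i] == 1 means patch i is hidden."""
    num_keep = int(num_patches * (1 - mask_ratio))
    perm = rng.permutation(num_patches)
    keep_idx = np.sort(perm[:num_keep])  # indices of visible patches, in original order
    mask = np.ones(num_patches, dtype=np.float32)
    mask[keep_idx] = 0.0
    return keep_idx, mask

def masked_mse(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """Per-patch mean squared error, averaged over the hidden (masked) patches only."""
    per_patch = ((pred - target) ** 2).mean(axis=-1)  # shape (L,)
    return float((per_patch * mask).sum() / mask.sum())

# Hypothetical usage on random data standing in for ultrasound images.
rng = np.random.default_rng(0)
imgs = rng.random((2, 224, 224)).astype(np.float32)
patches = patchify(imgs, 16)                 # (2, 196, 256)
keep_idx, mask = random_masking(patches.shape[1], 0.75, rng)
visible = patches[0, keep_idx]               # the encoder sees only these 49 patches
# A decoder would predict all 196 patches; a zero prediction stands in for it here.
loss = masked_mse(np.zeros_like(patches[0]), patches[0], mask)
```

Restricting the loss to hidden patches is what forces the encoder to infer missing structure from context rather than copy visible pixels; after pre-training, the decoder used for reconstruction is typically replaced by a segmentation head for fine-tuning.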
[RESULTS] Our experiments on the public AIMI, TN3K, and DDTI datasets show that MAE pre-training accelerates convergence. However, overall improvements are modest: the model achieves Dice Similarity Coefficient (DSC) scores of 0.63, 0.64, and 0.65 on AIMI, TN3K, and DDTI, respectively, highlighting limitations under small-sample conditions. Furthermore, adding cross-attention did not yield consistent gains, suggesting that data volume and diversity may be more critical than additional architectural complexity.
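The Dice Similarity Coefficient reported above measures the overlap between a predicted mask P and the ground-truth mask G as DSC = 2|P ∩ G| / (|P| + |G|), ranging from 0 (no overlap) to 1 (perfect agreement). A minimal NumPy sketch follows; the function name, the small `eps` smoothing term, and the convention that two empty masks score 1.0 are assumptions, and the abstract does not say whether the reported scores average per-image Dice or pool pixels across a dataset.

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """DSC = 2|P ∩ G| / (|P| + |G|) for binary masks (hypothetical helper)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    # eps avoids division by zero; with this convention two empty masks score 1.0.
    return float((2.0 * intersection + eps) / (total + eps))

gt = np.zeros((4, 4), dtype=np.uint8)
gt[1:3, 1:3] = 1   # 4 ground-truth nodule pixels
pr = np.zeros((4, 4), dtype=np.uint8)
pr[1:3, 2:4] = 1   # 4 predicted pixels, 2 of which overlap the ground truth
print(round(dice_coefficient(pr, gt), 4))  # 0.5
```

For a model that outputs per-pixel probabilities, the mask would first be binarized, for example `dice_coefficient(probs >= 0.5, gt)`, where the 0.5 threshold is likewise an assumption.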
[DISCUSSION] MAE pre-training notably reduces training time and helps the model learn transferable features, yet overall accuracy remains constrained by limited data and nodule variability. Future work will focus on scaling up data, pre-training cross-attention layers, and exploring hybrid architectures to further boost segmentation performance.
Highly cited papers by the same first author (5)
- Co-delivery nanoparticle targeting CAF for simultaneous activating T cell plus NKT cell attack in solid tumor.
- In-depth Evaluation of the Olink Target 48 Cytokine Panel: Inter-Laboratory Evaluation of Performance and Reliability for Biomarker Studies in Oncology.
- Short-chain fatty acids in the tumor microenvironment: from molecular mechanisms to cancer therapy.
- Construction of circadian clock signature for tumor microenvironment in predicting survival of esophageal squamous cell carcinoma.
- A prospective phase II trial of 10-fraction whole-breast radiotherapy following breast-conserving surgery.