
Multimodal sparse fusion transformer network with spatio-temporal decoupling for breast tumor classification.

Medical Image Analysis, 2026, Vol. 110, p. 103966 · Cited by 1 · Topic: AI in cancer detection
TL;DR: A novel Multimodal Sparse Fusion Transformer Network (MSFT-Net) achieves superior performance in multimodal breast tumor classification compared to state-of-the-art methods, providing fast and reliable support for radiologists in diagnostic tasks.
OpenAlex topics: AI in cancer detection · Ultrasound Imaging and Elastography · Brain Tumor Detection and Classification

Xu J, Zhuang S, He Y, Wang H, Zhuang Z, Zeng H


Cite this paper

APA: Xu, J., Zhuang, S., He, Y., Wang, H., Zhuang, Z., & Zeng, H. (2026). Multimodal sparse fusion transformer network with spatio-temporal decoupling for breast tumor classification. Medical Image Analysis, 110, 103966. https://doi.org/10.1016/j.media.2026.103966
MLA: Xu, Jiahao, et al. "Multimodal sparse fusion transformer network with spatio-temporal decoupling for breast tumor classification." Medical Image Analysis, vol. 110, 2026, p. 103966.
PMID 41643362

Abstract

Accurate analysis of tumor morphology, vascularity, and tissue stiffness under multimodal ultrasound imaging plays a critical role in the diagnosis of breast cancer. However, manual interpretation across multiple modalities is time-consuming and heavily dependent on the radiologist's expertise. Computer-aided classification offers an efficient alternative, yet remains challenging due to significant modality heterogeneity, inconsistent image quality, and redundant information across modalities. To address these issues, we propose a novel Multimodal Sparse Fusion Transformer Network (MSFT-Net). First, a Spatio-Temporal Decoupling Attention architecture (STDA) is introduced to disentangle and extract dynamic and static features from different modalities along spatial and temporal dimensions, capturing modality-specific motion and morphological characteristics independently. Second, the Mixed-Scale Convolution Module (MSCM) obtains tumor features at multiple scales, enhancing geometric detail representation and improving receptive field coverage. Third, the Sparse Cross-Attention Module (SCAM) adaptively retains the most effective query-key interactions between modalities, thereby facilitating the aggregation of high-quality features for robust multimodal information fusion. MSFT-Net is trained and tested on a curated dataset comprising multimodal breast tumor videos collected from 458 patients, including ultrasound (US), superb microvascular imaging (SMI), and strain elastography (SE), and its generalizability is further validated on the public BraTS'21 MRI dataset. Extensive experiments demonstrate that MSFT-Net achieves superior performance in multimodal breast tumor classification compared to state-of-the-art methods, providing fast and reliable support for radiologists in diagnostic tasks.
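The paper does not detail the SCAM here, but the core idea it names — adaptively retaining only the most effective query-key interactions between modalities — can be illustrated with a generic top-k sparse cross-attention step. The following NumPy sketch is an assumption-laden illustration (the function name, tensor shapes, and `keep_ratio` parameter are hypothetical, not the authors' implementation):

```python
import numpy as np

def sparse_cross_attention(q, k, v, keep_ratio=0.5):
    """Generic sparse cross-attention sketch: for each query, keep only
    the top-k strongest query-key scores, mask the rest to -inf, then
    softmax over the survivors and aggregate the values.

    q: (Nq, D) query tokens from one modality (e.g. US features)
    k, v: (Nk, D) key/value tokens from another modality (e.g. SMI)
    keep_ratio: fraction of keys each query is allowed to attend to
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                       # (Nq, Nk)
    topk = max(1, int(round(scores.shape[-1] * keep_ratio)))
    # Per-query cutoff: the topk-th largest score in each row.
    cutoff = np.sort(scores, axis=-1)[:, -topk][:, None]
    masked = np.where(scores >= cutoff, scores, -np.inf)
    # Numerically stable row-wise softmax; exp(-inf) contributes 0.
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                                  # (Nq, D)

# Toy fusion of two modality token sequences.
rng = np.random.default_rng(0)
us_tokens = rng.standard_normal((16, 32))    # stand-in ultrasound tokens
smi_tokens = rng.standard_normal((24, 32))   # stand-in SMI tokens
fused = sparse_cross_attention(us_tokens, smi_tokens, smi_tokens)
```

Compared with dense cross-attention, masking low-scoring pairs before the softmax both suppresses redundant inter-modality interactions and concentrates the attention mass on the highest-quality feature correspondences, which is the motivation the abstract gives for SCAM.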

MeSH Terms

Humans; Breast Neoplasms; Female; Multimodal Imaging; Ultrasonography, Mammary; Image Interpretation, Computer-Assisted; Neural Networks, Computer
