Multimodal sparse fusion transformer network with spatio-temporal decoupling for breast tumor classification.
TL;DR
MSFT-Net, a novel Multimodal Sparse Fusion Transformer Network, achieves superior performance in multimodal breast tumor classification compared to state-of-the-art methods, providing fast and reliable support for radiologists in diagnostic tasks.
OpenAlex topics ·
AI in cancer detection
Ultrasound Imaging and Elastography
Brain Tumor Detection and Classification
APA
Jiahao Xu, Shuxin Zhuang, et al. (2026). Multimodal sparse fusion transformer network with spatio-temporal decoupling for breast tumor classification. Medical Image Analysis, 110, 103966. https://doi.org/10.1016/j.media.2026.103966
MLA
Jiahao Xu, et al.. "Multimodal sparse fusion transformer network with spatio-temporal decoupling for breast tumor classification.." Medical image analysis, vol. 110, 2026, pp. 103966.
PMID
41643362
Abstract
Accurate analysis of tumor morphology, vascularity, and tissue stiffness under multimodal ultrasound imaging plays a critical role in the diagnosis of breast cancer. However, manual interpretation across multiple modalities is time-consuming and heavily dependent on the radiologist's expertise. Computer-aided classification offers an efficient alternative, yet remains challenging due to significant modality heterogeneity, inconsistent image quality, and redundant information across modalities. To address these issues, we propose a novel Multimodal Sparse Fusion Transformer Network (MSFT-Net). First, a Spatio-Temporal Decoupling Attention architecture (STDA) is introduced to disentangle and extract dynamic and static features from different modalities along spatial and temporal dimensions, capturing modality-specific motion and morphological characteristics independently. Second, the Mixed-Scale Convolution Module (MSCM) obtains tumor features at multiple scales, enhancing geometric detail representation and improving receptive field coverage. Third, the Sparse Cross-Attention Module (SCAM) adaptively retains the most effective query-key interactions between modalities, thereby facilitating the aggregation of high-quality features for robust multimodal information fusion. MSFT-Net is trained and tested on a curated dataset comprising multimodal breast tumor videos collected from 458 patients, including ultrasound (US), superb microvascular imaging (SMI), and strain elastography (SE), and its generalizability is further validated on the public BraTS'21 MRI dataset. Extensive experiments demonstrate that MSFT-Net achieves superior performance in multimodal breast tumor classification compared to state-of-the-art methods, providing fast and reliable support for radiologists in diagnostic tasks.
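The abstract describes the Sparse Cross-Attention Module (SCAM) as adaptively retaining only the most effective query-key interactions between modalities. The paper's exact formulation is not reproduced here, but a minimal top-k sparse cross-attention sketch conveys the idea: queries from one modality attend only to their k highest-scoring keys from another modality, and all other interactions are masked out before the softmax. The function names and the simple top-k selection rule below are illustrative assumptions, not the authors' implementation.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def sparse_cross_attention(queries, keys, values, k=2):
    """Toy sparse cross-attention: for each query vector (e.g. from the US
    modality), keep only the top-k dot-product scores against the keys
    (e.g. from the SMI modality) and mask the rest to -inf, so the softmax
    assigns them zero weight. Returns one attended value per query."""
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, key)) for key in keys]
        topk = sorted(range(len(scores)),
                      key=lambda i: scores[i], reverse=True)[:k]
        masked = [scores[i] if i in topk else float("-inf")
                  for i in range(len(scores))]
        weights = softmax(masked)
        # weighted sum of value vectors, dimension by dimension
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(len(values[0]))])
    return out
```

In a real network the queries, keys, and values would be learned linear projections of per-modality feature maps, and the sparsity pattern could itself be learned rather than fixed top-k; this sketch only shows why discarding low-scoring cross-modal interactions suppresses redundant information during fusion.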
MeSH Terms
Humans; Breast Neoplasms; Female; Multimodal Imaging; Ultrasonography, Mammary; Image Interpretation, Computer-Assisted; Neural Networks, Computer
Highly cited papers by the same first author (5)
- CRISPR/Cas9 Screening Reveals that UBE2L3 Modulates Autophagic Flux through TSC2 Ubiquitination and Potentiates PD-1 Blockade in Triple-Negative Breast Cancer.
- The TRIM3/TLR3 axis overrides IFN-β feedback inhibition to suppress NSCLC progression.
- Tumor cell-intrinsic PD-1 in malignant ascites drives ovarian cancer progression via MAPK/ERK signaling.
- Molecular Imaging of Hepatocellular Carcinoma with Third-Generation US Contrast Agents: Toward Clinical Translation.
- Pulmonary sclerosing pneumocytoma with lymph node metastasis and high FDG uptake in PET/CT: a rare case report and literature review.