
Hybrid Multi-View MRI Fusion for csPCa Diagnosis via Intra- and Inter-View Transformers.

IEEE Journal of Biomedical and Health Informatics (journal), 2026, Vol. PP

Zhao Y, Li D, Zhang T, Wang X, Xu J, Xuan K


Cite this paper

APA: Zhao Y, Li D, et al. (2026). Hybrid Multi-View MRI Fusion for csPCa Diagnosis via Intra- and Inter-View Transformers. IEEE Journal of Biomedical and Health Informatics, PP. https://doi.org/10.1109/JBHI.2026.3672887
MLA: Zhao Y, et al. "Hybrid Multi-View MRI Fusion for csPCa Diagnosis via Intra- and Inter-View Transformers." IEEE Journal of Biomedical and Health Informatics, vol. PP, 2026.
PMID: 41805501

Abstract

Accurate diagnosis of clinically significant prostate cancer (csPCa) from multi-view MRI scans (axial, sagittal, and coronal) is essential for effective treatment planning and improved outcomes. Although deep learning has advanced prostate MRI analysis, many existing approaches adopt late fusion strategies that aggregate one-dimensional feature vectors extracted independently from each view, resulting in loss of spatial information and anatomical correspondence across views, ultimately limiting diagnostic performance. While Vision Transformers offer flexibility in processing multi-view patches, their memory requirements scale quadratically with the number of patches, hindering efficient concurrent processing. In contrast, Swin Transformers efficiently capture local features but are typically restricted to single-view processing due to their reliance on regular-grid input constraints. To overcome these limitations, we propose a hybrid fusion framework that decomposes multi-view information integration into iterative intra-view and inter-view interactions across multiple resolutions. The framework preserves spatial coherence and enables fine-grained feature integration while maintaining computational efficiency. Specifically, the inter-view feature exchange module, based on the Vision Transformer, employs bridge tokens to summarize information from localized patch windows, reducing memory usage while preserving spatial relationships across views. The intra-view feature extraction module, built on the Swin Transformer, facilitates dynamic, attention-driven interactions among image patches and bridge tokens within each window. Moreover, shared positional embeddings are explicitly incorporated to enhance spatial correspondence across views. Extensive experiments on a public dataset demonstrate the superiority of our method in csPCa classification. 
Ablation studies highlight the contributions of the individual components, and attention-map visualizations confirm that anatomical structures are integrated across views.
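The bridge-token idea described in the abstract can be illustrated with a toy sketch: within each view, window patches and a bridge token attend to each other (intra-view), then the bridge tokens from all views attend among themselves to exchange summarized window information (inter-view). This is a minimal plain-NumPy illustration under our own assumptions (one window per view, plain dot-product attention, no learned projections); the paper's actual modules are Swin/ViT blocks with shared positional embeddings, and names such as `bridges` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention over token rows
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v

# toy setup: 3 views (axial, sagittal, coronal), each with one
# window of 4 patch tokens of dimension 8, plus one bridge token
rng = np.random.default_rng(0)
d, n_patch = 8, 4
views = [rng.normal(size=(n_patch, d)) for _ in range(3)]
bridges = [rng.normal(size=(1, d)) for _ in range(3)]

# intra-view step: patches and their bridge token attend jointly
# within the window, so the bridge token summarizes the window
intra, new_bridges = [], []
for x, b in zip(views, bridges):
    tokens = np.vstack([b, x])            # bridge token + patches
    out = attention(tokens, tokens, tokens)
    new_bridges.append(out[:1])           # updated bridge summary
    intra.append(out[1:])                 # updated patch features

# inter-view step: only the bridge tokens attend across views,
# exchanging window summaries at far lower memory cost than
# full cross-view patch attention
all_b = np.vstack(new_bridges)            # shape (3, d)
exchanged = attention(all_b, all_b, all_b)

print(exchanged.shape)  # (3, 8)
```

Because only the bridge tokens participate in the cross-view exchange, the attention cost of that step grows with the number of windows rather than the number of patches, which reflects the memory argument made in the abstract.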
