
Multiparametric MRI-based habitat analysis integrating deep learning and radiomics for predicting preoperative Ki-67 expression level in breast cancer.

BMC Medical Imaging, 2026, Vol. 26(1), p. 80


Wang Y, Zhang Y, Liu Z, Xiong Y, Li M, Zhang L


PMID: 41545974 · DOI: 10.1186/s12880-026-02151-3

Abstract

[BACKGROUND] Breast cancer (BC) is the most common malignant tumor in women globally. Ki-67, a vital marker for prognosis, is currently detected invasively. Non-invasive magnetic resonance imaging (MRI) prediction faces challenges due to intratumoral heterogeneity.

[MATERIALS AND METHODS] This retrospective study included 254 breast cancer patients from two centers, divided into a training set (142 patients), an internal validation set (60 patients), and an external test set (52 patients). T2-weighted imaging (T2WI) and dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) were analyzed. Traditional radiomics features were extracted from intratumoral regions, habitat subregions, 5/10-mm peritumoral rings, and fused images. A pre-trained ResNet-50 model extracted 2.5D deep learning features. Feature selection used the intraclass correlation coefficient (ICC), Z-score normalization, t-tests, Pearson correlations, and the least absolute shrinkage and selection operator (LASSO). A baseline clinical model was constructed using clinical and qualitative MRI semantic features. Models were built using Support Vector Machine (SVM), Random Forest (RF), and Extra-Trees (ET) classifiers. Model performance was evaluated via the area under the curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1 score. Gradient-weighted Class Activation Mapping (Grad-CAM) was applied to the final convolutional layer of ResNet-50 to spatially localize decision-critical regions. Shapley Additive Explanations (SHAP) analysis enhanced interpretability.

[RESULTS] The best clinical model achieved an AUC of 0.666 in the validation set. The best-performing traditional radiomics model achieved an AUC of 0.825 in the internal validation set. The optimal deep learning model obtained an AUC of 0.804 in the internal validation set. The combined model, utilizing the best features from both traditional radiomics and deep learning, demonstrated superior performance with an AUC of 0.885 in the internal validation set and 0.839 in the external test set.

[CONCLUSION] The integrated model combining traditional radiomics and deep learning from MRI significantly predicts Ki-67 expression in breast cancer, enhancing preoperative prediction accuracy and interpretability for personalized treatment.

[SUPPLEMENTARY INFORMATION] The online version contains supplementary material available at 10.1186/s12880-026-02151-3.


Introduction
Breast cancer (BC) is the most common malignant tumor in women worldwide [1]. Accurate risk stratification and timely treatment are essential for improving prognosis. Routinely used immunohistochemistry (IHC) markers include Progesterone Receptor (PR), Estrogen Receptor (ER), Human Epidermal Growth Factor Receptor 2 (HER-2), and Ki-67. Among them, the proliferation index Ki-67 has been demonstrated to correlate strongly with tumor invasiveness [2], early metastasis [3], recurrence rate [4], and overall survival [5, 6]. However, Ki-67 assessment currently relies on core-needle or surgical biopsy followed by IHC, an invasive procedure that is painful, may delay therapy, and occasionally provokes adverse reactions [7]. Consequently, magnetic resonance imaging (MRI) has become the preferred imaging modality for breast cancer evaluation. T2-weighted imaging (T2WI) provides high-resolution anatomical information and reflects the tumor microenvironment, whereas dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) reveals detailed hemodynamic characteristics [8, 9]. Nevertheless, conventional qualitative interpretation fails to exploit the rich quantitative information latent in these images, particularly the quantitative features related to the Ki-67 proliferation index [10].
Radiomics can extract high-throughput imaging features to build predictive models, yet traditional whole-tumor radiomics analyses ignore intratumoral heterogeneity (ITH) [11–13]. Habitat analysis partitions tumors into biologically distinct subregions, thereby explicitly quantifying ITH and improving model performance. Deep learning, particularly convolutional neural networks (CNNs), offers another powerful approach by automatically capturing complex imaging patterns. Although fusion of radiomics and deep learning is promising, the relative merits of image-level versus feature-level fusion for Ki-67 prediction remain unexplored [14]. Furthermore, model interpretability remains a major barrier to clinical translation; Shapley Additive Explanations (SHAP) analysis has demonstrated utility in identifying key features and explaining single-sample predictions across multiple medical scenarios [15–17].
Therefore, the aim of this study was two-fold: (1) to develop habitat-based radiomics and convolutional neural network (CNN) models, and (2) to systematically compare their efficacy—alongside image- and feature-level fusion strategies—for the non-invasive prediction of Ki-67 expression in breast cancer.

Materials and methods

Study sample
We retrospectively included 254 patients with histologically confirmed breast cancer from two institutions, allocated to a training set (n = 142) and an internal validation set (n = 60) from Center A, and an external test set (n = 52) from Center B, after applying inclusion/exclusion criteria (Fig. 1). MRI protocols differed between centers (T2WI: slice thickness 4.0 vs. 5.0 mm; DCE: 9 vs. 16 phases, slice thickness 1.5 vs. 2.0 mm), with detailed parameters in Appendix 1. Pathological assessment used IHC for ER, PR, HER-2, and Ki-67 (> 20% threshold). Two radiologists (5 and 10 years of experience), blinded to clinical data, independently evaluated MRI features (tumor size, morphology, enhancement) per BI-RADS guidelines; discrepancies were resolved by a senior radiologist (25 years of experience).

Image acquisition and mask segmentation
All images underwent N4 bias field correction and resampling to 1 × 1 × 1 mm isotropic voxels prior to segmentation. Tumor regions of interest (ROI) were manually delineated slice-by-slice by an attending radiologist (5-year experience) and quality-controlled by a senior radiologist (25-year experience) who independently re-segmented 30 random cases, retaining only radiomics features with intraclass correlation coefficient (ICC) ≥ 0.80. For radiomics analysis, cohort-level habitat clustering was performed by pooling multi-parametric voxel intensities (T2WI and DCE-MRI) from all training patients and applying K-means clustering (scikit-learn v1.2.0, K = 3–10); the optimal K was determined by the maximum Calinski-Harabasz (C-H) index to generate patient-specific habitat masks. Peritumoral ROIs were expanded by 5 mm and 10 mm in 3D using Onekey AI’s Mask Filling Tool. Image fusion in radiomics referred to concatenating intratumoral and peritumoral regions for unified feature extraction. For deep learning, two strategies were employed: (1) multi-slice 2D analysis using ROIs cropped from the largest tumor cross-section and adjacent superior/inferior slices from a single sequence; and (2) image fusion via stacking multi-parametric sequences as multichannel inputs after Z-score normalization. The complete workflow is shown in Fig. 2.
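The cohort-level habitat step above (pool multi-parametric voxel intensities, run K-means for K = 3–10, keep the K with the highest Calinski-Harabasz index) can be sketched as follows. This is an illustrative toy example with synthetic two-channel "voxel" data, not the authors' implementation; all variable names are assumptions.

```python
# Sketch of cohort-level habitat clustering: pool voxel intensities,
# then select K in 3..10 by the maximum Calinski-Harabasz index.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

rng = np.random.default_rng(0)
# Toy stand-in for pooled (T2WI, DCE-MRI) intensity pairs from all training tumors:
# three well-separated synthetic intensity clusters.
voxels = np.vstack([
    rng.normal(loc=[0, 0], scale=0.3, size=(300, 2)),
    rng.normal(loc=[2, 2], scale=0.3, size=(300, 2)),
    rng.normal(loc=[0, 3], scale=0.3, size=(300, 2)),
])

best_k, best_ch, best_labels = None, -np.inf, None
for k in range(3, 11):  # K = 3-10 as in the study
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(voxels)
    ch = calinski_harabasz_score(voxels, km.labels_)
    if ch > best_ch:
        best_k, best_ch, best_labels = k, ch, km.labels_

print(best_k)  # → 3 for this toy data with three distinct clusters
```

In the paper the fitted cluster centroids are then reused to assign each patient's voxels to a habitat, producing patient-specific habitat masks.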

Feature extraction and selection
Per the IBSI (Image Biomarker Standardisation Initiative) guidelines, traditional radiomics features were extracted from five types of ROI (intratumoral, habitat subregions, 5-mm/10-mm peritumoral rings, and fused regions) using Pyradiomics (v3.0.1) [18]. With a fixed bin width of 5 and the application of LoG and wavelet filters, a total of 1,197 features were initially generated. These features were subsequently screened based on an intra-class correlation coefficient (ICC) threshold of ≥ 0.80 to ensure stability. For 2.5D deep learning, an ImageNet-pretrained ResNet-50 processed three adjacent slices (224 × 224 × 3) [19]. This 2.5D approach effectively integrates spatial information while maintaining the computational efficiency of 2D architectures [20]. Following ImageNet-based normalization, we performed full-parameter fine-tuning using a Stochastic Gradient Descent (SGD) optimizer. After fine-tuning, 2,048-dimensional features were extracted and reduced to 8 principal components per slice via PCA. All feature sets were Z-score normalized and filtered using t-tests (p < 0.05) and Pearson correlation (r > 0.9). The least absolute shrinkage and selection operator (LASSO) regression was subsequently applied only to radiomics and deep learning features, with clinical variables excluded from this final selection step. The fusion model concatenated clinical, radiomics (post-ICC), and deep learning (post-PCA) features, reprocessed through the same pipeline.
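The selection cascade described above (Z-score normalization, two-sample t-test at p < 0.05, Pearson redundancy pruning at r > 0.9, then LASSO) can be sketched on synthetic data. This is a minimal illustration assuming the steps run in that order; the feature matrix, thresholds applied to toy data, and names are illustrative, not the authors' code.

```python
# Hedged sketch of the reported cascade: z-score -> two-sample t-test
# (p < 0.05) -> prune one of each pair with |r| > 0.9 -> LASSO.
import numpy as np
from scipy.stats import ttest_ind
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n, p = 120, 40
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] * 0.99 + rng.normal(scale=0.05, size=n)  # redundant pair (0, 1)
y = (X[:, 0] + X[:, 2] + rng.normal(scale=0.5, size=n) > 0).astype(int)

X = StandardScaler().fit_transform(X)  # Z-score normalization

# 1) univariate two-sample t-test between high/low Ki-67 groups
keep = [j for j in range(p)
        if ttest_ind(X[y == 1, j], X[y == 0, j]).pvalue < 0.05]

# 2) Pearson redundancy pruning: |r| > 0.9 keeps only the first of each pair
pruned = []
for j in keep:
    if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) <= 0.9 for k in pruned):
        pruned.append(j)

# 3) LASSO retains features with nonzero coefficients
lasso = LassoCV(cv=5, random_state=0).fit(X[:, pruned], y)
selected = [pruned[i] for i, c in enumerate(lasso.coef_) if c != 0]
print(selected)  # the informative features 0 and 2 survive; the duplicate 1 is pruned
```

In the study this pipeline reduced thousands of candidate features to 12 predictors; here it demonstrates the mechanics on 40 synthetic features.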

Model construction and interpretability
A baseline clinical model from all variables was constructed using three algorithms—Support Vector Machine (SVM), Random Forest (RF), and Extra-Trees (ET)—optimized via grid search and five-fold cross-validation. This framework was extended to develop 48 radiomics models across regions/sequences and 12 deep learning models using Principal Component Analysis (PCA)-reduced features; Gradient-weighted Class Activation Mapping (Grad-CAM) applied to ResNet-50’s fourth layer localized decision-critical regions [21]. Discriminative features from optimal sub-models were integrated into three ensemble fusion predictors. The best configuration from internal validation underwent independent external testing for multicenter generalizability, with SHAP analysis quantifying feature contributions to enhance interpretability.
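The model-construction step (three classifiers, each tuned by grid search with five-fold cross-validation) can be sketched as below. The hyperparameter grids and dataset are illustrative assumptions; the paper's actual grids are not reported in this section.

```python
# Minimal sketch of the model-building step: SVM, Random Forest, and
# Extra-Trees, each tuned by grid search with 5-fold CV on AUC.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=12, random_state=0)

# Illustrative grids (not the paper's)
candidates = {
    "SVM": (SVC(probability=True, random_state=0), {"C": [0.1, 1, 10]}),
    "RF": (RandomForestClassifier(random_state=0), {"n_estimators": [50, 100]}),
    "ET": (ExtraTreesClassifier(random_state=0), {"n_estimators": [50, 100]}),
}

results = {}
for name, (est, grid) in candidates.items():
    gs = GridSearchCV(est, grid, scoring="roc_auc", cv=5).fit(X, y)
    results[name] = gs.best_score_

for name, auc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name}: CV AUC = {auc:.3f}")
```

The study repeats this framework across 48 radiomics and 12 deep learning configurations, then carries the best sub-models forward into the fusion predictors.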

Statistical analysis
Statistical analysis used Python (v3.9.13). Clinical characteristics were compared using independent t-tests or Mann-Whitney U tests for continuous variables and Chi-square or Fisher’s exact tests for categorical data. Model performance was quantified via the area under the curve (AUC), 95% CI, sensitivity, specificity, and F1 score. DeLong’s test compared AUCs across models, while calibration curves and decision curve analysis (DCA) evaluated model reliability and clinical utility. A two-tailed P < 0.05 denoted statistical significance.
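The evaluation metrics named above all derive from the confusion matrix plus predicted probabilities; a small worked example makes the definitions concrete. The labels and probabilities are toy values, not study data.

```python
# Sketch of the reported evaluation metrics computed from predictions:
# AUC, accuracy, sensitivity, specificity, PPV, NPV, and F1.
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 0])            # toy labels
y_prob = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.1, 0.85, 0.35])
y_pred = (y_prob >= 0.5).astype(int)                          # 0.5 threshold

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
metrics = {
    "AUC": roc_auc_score(y_true, y_prob),
    "accuracy": (tp + tn) / (tp + tn + fp + fn),
    "sensitivity": tp / (tp + fn),   # recall for the high-Ki-67 class
    "specificity": tn / (tn + fp),
    "PPV": tp / (tp + fp),
    "NPV": tn / (tn + fn),
    "F1": f1_score(y_true, y_pred),
}
for k, v in metrics.items():
    print(f"{k}: {v:.3f}")  # e.g. sensitivity = 0.800 for this toy split
```

AUC is threshold-free (computed from the probabilities), while the remaining metrics depend on the chosen operating threshold.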

Results

Patient characteristics
The clinical and MRI characteristics of the 254 patients are summarized in Table 1. In the internal training and internal validation sets, PR, ER, and HER-2 status showed significant differences between high and low Ki-67 groups (p < 0.05); in the external test set, only PR and HER-2 status differed (p < 0.05).

Habitat identification
Unsupervised K-means clustering was applied to pooled multi-parametric voxel intensities from all training set patients to identify three optimal intratumoral habitats (k = 3) based on the peak C-H index (Fig. 3A). Using the cohort-derived cluster centroids, each patient’s tumor volume was partitioned into three subregions: Habitat 1 comprising a mean of 58.87% of voxels, Habitat 2 comprising 12.50%, and Habitat 3 comprising 28.63% across the cohort (Fig. 3B). These habitats effectively quantified intratumoral heterogeneity and were visualized as three ROIs on both T2WI and DCE-MRI sequences (Fig. 3C). Standalone habitat-based models achieved internal validation AUCs of 0.770 for T2WI and 0.791 for DCE-MRI, outperforming whole-tumor ROI models (AUCs: 0.762 and 0.772).

Feature selection
The development of the optimal fusion model utilized a systematic, multi-stage selection pipeline that refined an initial pool of 3,474 features (18 clinical features, 3,432 traditional radiomics features, and 24 deep learning features) down to 12 key predictors. For the radiomics component, 1,197 features were extracted from each of four regions (three habitats and the 5-mm peritumoral zone); after stability assessment via ICC (≥ 0.80), 3,432 stable features (858 per region) were retained. Simultaneously, the 2,048-dimensional deep learning features from each of three T2WI slices were reduced to 8 principal components per slice via PCA, totaling 24 features. Following Z-score normalization, two-sample t-tests (p < 0.05) narrowed the multimodal pool to 195 significant features, and redundancy reduction via Pearson correlation (r > 0.9) further distilled the selection to 54 candidates. Finally, LASSO regression identified 12 discriminative predictors: 10 traditional radiomics features (primarily from DCE-MRI habitat and peritumoral regions) and 2 deep learning features from T2WI. This systematic approach ensures effective dimensionality reduction while prioritizing model interpretability. Detailed reduction pathways for each best sub-model are documented in Appendix 1.

Model evaluation
The best clinical baseline model achieved a validation AUC of 0.666. The performance of 48 traditional radiomics and 12 deep learning sub-models across various sequences and ROIs is summarized in the Fig. 4 heatmap, with exhaustive metrics provided in Appendix 1. Among standalone configurations, the ET algorithm consistently outperformed SVM and RF. The top-performing traditional model was the DCE-MRI habitat + 5 mm peritumoral (ET) (AUC: 0.825), while the T2WI multi-slice model led the deep learning category (AUC: 0.804). The optimal performance achieved by each model category is detailed in Table 2. The multi-modal fusion model outperformed all individual modalities, reaching an internal validation AUC of 0.885 (95% CI: 0.787–0.984) and maintaining robust generalizability in the external cohort (AUC: 0.839, 95% CI: 0.727–0.951). Decision curve analysis demonstrated superior clinical utility across 30–95% threshold probabilities (Fig. 5), and calibration curves showed strong agreement between predicted and actual Ki-67 expression levels. DeLong tests (Fig. 6) confirmed that the fusion model outperformed all individual sub-models in the training set (all p < 0.05). In the internal validation set, the fusion model maintained a significant advantage over the clinical model (P = 0.039), but showed no statistically significant difference compared to the standalone radiomics (P = 0.301) and deep learning (P = 0.327) models.
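The study compares paired AUCs with DeLong's test; as a self-contained stand-in for the same question (is model A's AUC significantly higher than model B's on the same cases?), the sketch below bootstraps the AUC difference. This is a substitute technique for illustration, not the authors' method, and the two models are synthetic.

```python
# Bootstrap CI for the difference in AUC between two models scored on the
# same cases (a simple stand-in for DeLong's paired AUC test).
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
n = 200
y = rng.integers(0, 2, size=n)
# Model A tracks the label closely; model B is noisier (both synthetic).
score_a = y + rng.normal(scale=0.5, size=n)
score_b = y + rng.normal(scale=2.0, size=n)

diffs = []
for _ in range(1000):
    idx = rng.integers(0, n, size=n)      # resample cases with replacement
    if len(np.unique(y[idx])) < 2:        # need both classes in a resample
        continue
    diffs.append(roc_auc_score(y[idx], score_a[idx]) -
                 roc_auc_score(y[idx], score_b[idx]))

lo, hi = np.percentile(diffs, [2.5, 97.5])
print(f"95% CI for AUC(A) - AUC(B): [{lo:.3f}, {hi:.3f}]")
```

A confidence interval excluding zero indicates a significant AUC difference, mirroring the role of the DeLong p-values reported for Fig. 6.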

Interpretability analysis
SHAP analysis identified 12 key features (10 radiomics, 2 deep learning) driving the fusion model (Fig. 7). Radiomics features originated primarily from DCE-MRI habitat and peritumoral regions, with Gray Level Non-Uniformity contributing most, followed by peritumoral Size Zone Non-Uniformity and Kurtosis. Habitat features (Kurtosis, Busyness, Contrast) and three additional radiomics features captured intratumoral heterogeneity and enhanced stability. Two deep learning features from T2WI multi-slice inputs (DL 3.1, DL 3.2) ranked among the top five, providing complementary CNN-derived patterns. Grad-CAM visualization of the T2WI deep learning model confirmed spatial overlap between its decision-critical regions and these radiomic-defined habitats/peritumoral zones, validating that the fusion model prioritized heterogeneous subregions for Ki-67 prediction.
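SHAP itself requires the dedicated `shap` package; as a self-contained illustration of the same idea, ranking fused features by their contribution to the model, the sketch below uses scikit-learn's permutation importance, a coarser but related attribution method. The dataset and feature layout are synthetic assumptions.

```python
# Feature-attribution sketch: rank "fused" features by permutation importance
# (a lighter-weight alternative to SHAP used here for self-containment).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.inspection import permutation_importance

# 12 synthetic "fused" features; with shuffle=False the informative ones
# are columns 0-2 by construction.
X, y = make_classification(n_samples=300, n_features=12, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
model = ExtraTreesClassifier(random_state=0).fit(X, y)

imp = permutation_importance(model, X, y, scoring="roc_auc",
                             n_repeats=20, random_state=0)
ranking = np.argsort(imp.importances_mean)[::-1]
print(ranking[:3])  # informative features should dominate the top of the ranking
```

SHAP refines this picture by attributing each individual prediction to features with signed contributions, which is what the beeswarm-style plot in Fig. 7 conveys.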

Discussion
In this study, traditional radiomics models outperformed the deep learning models in predicting Ki-67 expression within the internal validation set. This may be due to the direct pathophysiological relevance of hand-designed imaging features, which capture spatial heterogeneity of the tumor microenvironment linked to Ki-67 activity. Deep learning models might lose key spatial information during feature extraction, especially with limited data [22, 23].
Imaging models significantly outperformed the clinical baseline. Previous studies highlight the importance of peritumoral features in tumor behavior and treatment response. This study employed three machine learning algorithms to integrate radiomics features from varying tumor region sizes, revealing spatial heterogeneity characteristics. Models combining intratumoral and 5 mm peritumoral regions outperformed those with 10 mm extensions. For instance, in T2WI, the best model integrating intratumoral and 5 mm peritumoral features achieved an AUC of 0.811 (95% CI: 0.680–0.942) versus 0.786 (95% CI: 0.652–0.921) for 10 mm peritumoral models. Wang et al. [24] reported that a 5 mm peritumoral model achieved an AUC of 0.820 (95% CI: 0.760–0.880), compared to an AUC of 0.798 (95% CI: 0.730–0.866) for a 10 mm peritumoral model. Additionally, Zhang et al. [25] found that 5 mm peritumoral models also outperformed 10 mm models in predicting HER2 status. These findings suggest that between 5 mm and 10 mm, the shorter distance appears to better capture the heterogeneous distribution of the tumor microenvironment. Future studies should systematically evaluate incremental distances to identify an optimal, potentially tumor-size-dependent, threshold.
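The 5-mm and 10-mm peritumoral rings compared above can be constructed from a binary tumor mask by morphological dilation; at the 1 mm isotropic resampling used in this study, one dilation iteration corresponds to roughly 1 mm. This is a toy 2D sketch of the geometry, not the authors' implementation (they used Onekey AI's Mask Filling Tool in 3D).

```python
# Sketch: build 5-mm and 10-mm peritumoral rings from a binary tumor mask
# by dilation minus the original mask (1 iteration ~ 1 mm at 1 mm spacing).
import numpy as np
from scipy.ndimage import binary_dilation

mask = np.zeros((40, 40), dtype=bool)
mask[15:25, 15:25] = True  # toy "tumor"

ring_5mm = binary_dilation(mask, iterations=5) & ~mask
ring_10mm = binary_dilation(mask, iterations=10) & ~mask

print(ring_5mm.sum(), ring_10mm.sum())  # the 10-mm ring contains the 5-mm ring
```

Radiomics features are then extracted from each ring separately (or from the fused intratumoral + ring region), which is how the 5-mm vs. 10-mm comparison in this paragraph arises.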
In this study, a K-means clustering method optimized by C-H index was used for habitat analysis, identifying three subregions likely reflecting differences in tumor metabolism, blood flow, and cell density. The model based solely on intra-tumor habitat features achieved AUCs of 0.770 (95% CI: 0.621–0.919) for T2WI and 0.791 (95% CI: 0.667–0.914) for DCE-MRI on the internal validation set, outperforming the entire tumor internal ROI (AUC: 0.762 and 0.772). Ye et al. [26] found the habitat region model for predicting pCR in Non-Small Cell Lung Cancer(NSCLC) had an AUC of 0.781 compared to 0.723 for the whole-tumor model. Similarly, Wang et al. [27] showed the habitat region model for High-Grade Serous Ovarian Cancer(HGSOC) achieved an AUC of 0.808 versus 0.749 for the whole-tumor model.
We systematically compared feature fusion and image fusion strategies for predicting Ki-67 expression in invasive breast cancer. Feature fusion models achieved a mean validation AUC of 0.791, a 10.8% improvement over image fusion’s 0.714. This aligns with the ensemble learning principle demonstrated by Liu et al. [28], where classifier fusion improved breast cancer grading accuracy. The training-validation gap decreased from 0.077 to 0.017, showcasing enhanced generalization. Specifically, for T2WI “intra-tumoral + 10 mm peritumoral,” ET demonstrated superior performance: feature fusion achieved a validation AUC of 0.786 (95% CI: 0.652–0.921) and training AUC of 0.839 (95% CI: 0.771–0.906), while image fusion had a validation AUC of 0.650 (95% CI: 0.474–0.826) and training AUC of 0.831 (95% CI: 0.761–0.902), representing a 20.9% improvement. Overfitting reduced from 0.181 to 0.053. In DCE-MRI, ET achieved a validation AUC of 0.816 (95% CI: 0.653–0.978) with feature fusion, compared to 0.746 (95% CI: 0.584–0.909) with image fusion—a 9.4% improvement.
Deep learning features, capturing tumor microstructure and high-dimensional nonlinear patterns, comprised two of the top nine key features, enhancing prediction robustness and accuracy. This aligns with Bhuiyan et al.’s glioma research [29], where T2-FLAIR deep learning features achieved 0.91 prediction accuracy for Ki-67. Traditional DCE-MRI radiomics features, such as volume ratios and texture, are crucial for assessing tumor invasiveness and treatment response, as shown by Xv et al. [30] in clear cell renal cell carcinoma studies, where their integrated model achieved AUCs of 0.858 and 0.849. Similarly, our integrated model demonstrated superior performance with an internal validation AUC of 0.885 and an external test AUC of 0.839, confirming its clinical potential.
This study had some limitations. The external test cohort (n = 52) was relatively small, yielding wide 95% confidence intervals (0.727–0.951) that reflect statistical uncertainty. Despite using a multicenter dataset, bias and patient heterogeneity persist, impacting model accuracy and generalization. Manual tumor segmentation, though precise, is time-consuming. Future work should focus on developing semi-automated or fully automated ROI delineation algorithms and integrating deep learning-based adaptive noise reduction to enhance accuracy. Prospective validation in larger multicenter trials (≥ 200 patients) is warranted to confirm clinical utility and narrow confidence intervals. Deep learning models require extensive training data; thus, incorporating more patient imaging data is crucial for better generalization. The retrospective design limits causal inference: prospective studies are needed for stronger evidence. Standardizing image quality across multi-center MRI data with varied scanner settings remains challenging, potentially affecting radiomics feature consistency.
In conclusion, we developed and validated an explainable machine learning model integrating traditional radiomics and deep learning features. This model accurately predicts Ki-67 expression in breast cancer patients, enhancing clinicians’ understanding of its decision-making process and aiding in personalized treatment planning.

Supplementary Information
Below is the link to the electronic supplementary material.
