
A deep learning model based on multiphase DCE-MRI for preoperative prediction of Ki-67 expression in breast cancer.

Frontiers in Oncology, 2026, Vol. 16, article 1776121

Fu XM, Zhang WG, Wen L, Li W, Yang Y, Zhang D


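Key clinical statistics (automatically extracted from the abstract; verify against the original):
  • Sample size (n): 404 (training set n = 282; test set n = 122)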

Cite this article

APA Fu XM, Zhang WG, et al. (2026). A deep learning model based on multiphase DCE-MRI for preoperative prediction of Ki-67 expression in breast cancer. Frontiers in oncology, 16, 1776121. https://doi.org/10.3389/fonc.2026.1776121
MLA Fu XM, et al. "A deep learning model based on multiphase DCE-MRI for preoperative prediction of Ki-67 expression in breast cancer." Frontiers in oncology, vol. 16, 2026, pp. 1776121.
PMID 41924598

Abstract

[OBJECTIVE] This retrospective study aimed to develop and validate a deep learning model based on multi-phase dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) for non-invasive and accurate prediction of Ki-67 expression, a key proliferation biomarker critical for treatment decision-making and prognostic evaluation in breast cancer.

[METHODS] A total of 404 breast cancer patients who underwent preoperative DCE-MRI within 1 week of surgery were enrolled and randomly split into training (n = 282) and test (n = 122) sets at a 7:3 ratio. Multi-phase DCE-MRI was acquired at 3.0 T, and four phases were analyzed: the pre-contrast phase and the early (64 seconds), peak (128 seconds), and late (320 seconds) phases after contrast agent administration. DenseNet-121 was used to build four single-phase deep learning models (SP_DL1–SP_DL4). Their output probabilities (DL signatures) were combined using gradient boosting decision trees (GBDT) to create a multi-phase model (MP_GBDT). Clinical predictors were integrated with the DL signatures to build a fused model (CMP_GBDT). Model interpretability was assessed using Grad-CAM and SHAP. Continuous variables were compared with the independent samples t-test or the Mann-Whitney U test, categorical variables with the χ² test or Fisher's exact test, and AUCs with the DeLong test. A p value ≤ 0.05 was considered statistically significant.

[RESULTS] In the test set, single-phase DL models achieved AUCs of 0.712 (SP_DL1), 0.671 (SP_DL2), 0.761 (SP_DL3), and 0.664 (SP_DL4). The multi-phase DL model (MP_GBDT) achieved an AUC of 0.810, outperforming all single-phase models. The fused model (CMP_GBDT) reached a comparable AUC of 0.814, with no statistically significant improvement over MP_GBDT. SHAP identified the SP_DL3 signature as the top contributor in both the MP_GBDT and CMP_GBDT models.

[CONCLUSIONS] The MP_GBDT model accurately and non-invasively predicted Ki-67 expression in breast cancer, with the SP_DL3 signature as the main contributor.


1 Introduction
Breast cancer remains the most commonly diagnosed malignancy and the leading cause of cancer-related mortality among women worldwide (1–3). It is markedly heterogeneous, which leads to considerable variation in biological behavior, clinical prognosis, and response to treatment. As a well-established marker of cell proliferation, Ki-67 is closely linked to tumor proliferative activity and aggressiveness in breast cancer (4). Consistent evidence has shown that elevated Ki-67 expression is associated with poorer clinical outcomes, a higher risk of recurrence, and adverse pathological characteristics, including poor differentiation and axillary lymph node metastasis (5–7). The St Gallen International Expert Consensus (8) and the Chinese Anti-Cancer Association (CACA) Guidelines for Breast Cancer Diagnosis and Treatment (2024 Edition) emphasize its clinical relevance by designating Ki-67 as a key biomarker for guiding personalized therapy, enabling the stratification of “low-risk” tumors suitable for less intensive treatment and “high-risk” tumors requiring intensive interventions. A widely adopted threshold of 20% separates low from high Ki-67 expression: tumors with Ki-67 ≤20% are classified as low-risk, whereas those with Ki-67 >20% are considered high-risk (9). For low-risk tumors, overtreatment should be avoided: endocrine monotherapy is often sufficient for hormone receptor-positive disease, and adjuvant chemotherapy may be omitted in early-stage cases. For high-risk tumors, more intensive interventions, such as combined chemoendocrine therapy, neoadjuvant chemotherapy, or dose-dense regimens, are needed to reduce the risk of recurrence. High Ki-67 expression (Ki-67 >20%) is also significantly associated with an increased risk of endocrine therapy resistance and disease recurrence (10, 11), which further supports intensive treatment for this subgroup.
However, Ki-67 expression is currently assessed mainly on core needle biopsy (CNB) specimens and at postoperative histopathological examination. CNB is invasive and prone to sampling bias, while postoperative histopathology, although the gold standard, delivers its result only after surgery and therefore cannot guide preoperative treatment planning (12). Accurate preoperative determination of Ki-67 expression is therefore crucial for evaluating the aggressiveness of breast cancer. In contrast, Ki-67 prediction based on DCE-MRI is a non-invasive preoperative imaging approach that avoids the invasiveness and sampling bias of CNB and enables timely, personalized treatment decisions before surgery, such as the formulation of neoadjuvant therapy regimens. Such an assessment provides vital information for developing personalized treatment strategies and has the potential to improve patient outcomes.
Magnetic resonance imaging (MRI) has a well-established role in personalizing breast cancer management (13, 14). In particular, dynamic contrast-enhanced MRI (DCE-MRI) provides a non-invasive means of visualizing tumor hemodynamics by tracking tissue perfusion and microvascular permeability. This functional imaging technique yields critical information for characterizing lesions and evaluating treatment response, thereby complementing clinical assessment (15). Numerous studies have applied machine learning (ML) methods to DCE-MRI to predict Ki-67 expression, molecular subtypes, response to neoadjuvant chemotherapy (NAC), and patient prognosis (16, 17). For instance, Zhang et al. (18) developed a multitask machine learning model based on intratumoral and peritumoral radiomics features from early-phase DCE-MRI that effectively distinguished Luminal from non-Luminal molecular subtypes of breast cancer. Shi et al. (9) constructed a multivariable logistic regression model integrating an intratumoral heterogeneity (ITH) index extracted from multiparametric MRI, a traditional radiomics score, and clinicopathologic variables to predict pathologic complete response (pCR) after neoadjuvant chemotherapy; the model performed well in multi-center validation. However, many prior investigations have depended on subjective imaging features (19) that vary with radiologist experience (20), or on manually designed radiomics features (21, 22).
Deep learning (DL) extracts high-level, abstract features automatically through multi-layered neural network architectures, minimizing dependence on hand-crafted feature engineering and capturing complex, task-relevant patterns in visual data. For Ki-67 prediction, a radiomics-deep learning fusion model based on multiparametric MRI has been reported to outperform traditional radiomics methods preoperatively (23). However, most existing studies rely solely on single-phase DCE-MRI and therefore do not capture the temporal dynamics of contrast agent passage, which are critical for evaluating tumor microenvironment heterogeneity (24). Further research is thus warranted into the non-invasive assessment of Ki-67 expression in breast cancer using deep learning models combined with multi-phase DCE-MRI. Among deep learning architectures, DenseNet121 is particularly suitable for this study because of its dense connectivity design (25). Within each dense block, every convolutional layer receives as input the feature maps of all preceding layers, which promotes feature reuse, alleviates the vanishing gradient problem, and strengthens the model's capacity to capture subtle image patterns. These properties make DenseNet121 well suited to clinical scenarios with limited data, as it can exploit the rich spatiotemporal information in multi-phase DCE-MRI without requiring excessively large datasets.
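To make the dense connectivity concrete, the sketch below traces channel counts through the standard DenseNet-121 configuration: growth rate 32, dense blocks of 6, 12, 24 and 16 layers, 64 stem channels, and transition layers that halve the channel count. These values come from the original DenseNet-121 design (they are also the torchvision defaults); the paper does not list them, so treat them as an assumption about the backbone that was used.

```python
# Standard DenseNet-121 settings (assumed here; not stated in the paper).
GROWTH_RATE = 32                 # k: feature maps each dense layer adds
BLOCK_CONFIG = (6, 12, 24, 16)   # dense layers per dense block
INIT_FEATURES = 64               # channels produced by the stem
COMPRESSION = 0.5                # transition layers halve the channels


def layer_input_channels(block_input: int, layer: int, k: int = GROWTH_RATE) -> int:
    """Channels seen by the `layer`-th dense layer (1-based) of a block.

    Dense connectivity: the layer receives the concatenation of the block
    input and the outputs of all (layer - 1) earlier layers in the block.
    """
    return block_input + (layer - 1) * k


def densenet_channel_trace(init=INIT_FEATURES, k=GROWTH_RATE,
                           blocks=BLOCK_CONFIG, compression=COMPRESSION):
    """Return (channels into block, channels out of block) for each dense block."""
    channels = init
    trace = []
    for i, n_layers in enumerate(blocks):
        block_in = channels
        # Block output = block input plus k new maps from each of its n_layers layers.
        channels = layer_input_channels(block_in, n_layers + 1, k)
        trace.append((block_in, channels))
        if i < len(blocks) - 1:          # a transition follows every block but the last
            channels = int(channels * compression)
    return trace


def weighted_layer_count(blocks=BLOCK_CONFIG) -> int:
    """1 stem conv + 2 convs per dense layer (1x1 bottleneck, 3x3)
    + 1 conv per transition + 1 fully connected classifier."""
    return 1 + 2 * sum(blocks) + (len(blocks) - 1) + 1


for block_in, block_out in densenet_channel_trace():
    print(f"{block_in:4d} -> {block_out:4d} channels")
print("last layer of block 1 sees", layer_input_channels(INIT_FEATURES, 6), "channels")
print("weighted layers:", weighted_layer_count())
```

The trace ends at 1024 channels, the feature width a binary classification head would sit on, and the layer count reproduces the "121" in the name. Because later layers read all earlier feature maps, they can reuse features instead of relearning them, which is the property the paragraph above relies on.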
This study aims to develop a preoperative prediction model for Ki-67 expression levels in breast cancer using multi-phase DCE-MRI and the DenseNet121 deep learning architecture. By leveraging the rich spatiotemporal information inherent in multi-phase DCE-MRI sequences, we seek to establish an accurate and reliable non-invasive method for evaluating Ki-67 expression, thereby facilitating personalized treatment planning in breast cancer clinical management.

2 Materials and methods
2.1 Patients
This retrospective study was approved by the Ethics Committee of Xinqiao Hospital, affiliated with Army Medical University, and the requirement for informed consent was waived. Female patients with breast cancer diagnosed between June 2017 and December 2024 were enrolled. Inclusion criteria were: (1) age ≥ 18 years; (2) histopathologically confirmed breast cancer with available Ki-67 expression levels; and (3) MRI examination performed within one week before initial treatment. Exclusion criteria were: (1) poor image quality; (2) other primary malignancies; and (3) anti-cancer therapy for breast cancer before MRI, including neoadjuvant treatment and adjuvant therapy for previous or recurrent breast cancer (e.g., chemotherapy, radiotherapy, endocrine therapy, targeted therapy). A total of 404 patients were ultimately included and randomly assigned to the training and test sets at a 7:3 ratio, based on the time of their MRI examination. The patient enrollment flowchart is presented in Figure 1.
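The reported split sizes follow directly from the 7:3 ratio. A minimal sketch of a shuffled split is below; the seed and the convention of rounding the test size up (what scikit-learn's `train_test_split` does for a float `test_size`) are assumptions, but they reproduce the reported 282/122 partition. If the assignment was in fact temporal, as "based on the time of their MRI examination" may suggest, one would sort patients by examination date instead of shuffling and cut at the same index.

```python
import math
import random


def split_7_3(patient_ids, test_fraction=0.3, seed=42):
    """Shuffle patient IDs and split them into (train, test).

    The test size is rounded up, e.g. ceil(0.3 * 404) = 122.
    `seed` is a hypothetical value chosen only for reproducibility.
    """
    ids = list(patient_ids)
    n_test = math.ceil(test_fraction * len(ids))
    random.Random(seed).shuffle(ids)   # local RNG; global random state untouched
    return ids[n_test:], ids[:n_test]


train_ids, test_ids = split_7_3(range(404))
print(len(train_ids), len(test_ids))  # 282 122
```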
Baseline data were collected, including age, menopausal status, lesion location (left or right breast), number of lesions (single or multiple), maximum tumor diameter, and axillary lymph node (ALN) status.

2.2 Ki-67 assessment
The Ki-67 proliferation index was assessed by immunohistochemistry (IHC). Tissue specimens were fixed in 10% neutral buffered formalin and embedded in paraffin, and serial 4-μm sections were prepared. The sections were incubated with a primary anti-Ki-67 antibody (Maxim Biomedical, Inc., Rockville, Maryland, USA), followed by a horseradish peroxidase (HRP)-conjugated secondary antibody. Immunodetection was performed with 3,3’-diaminobenzidine (DAB) as the chromogen, yielding a brown nuclear precipitate; nuclei with distinct brown staining were considered positive for Ki-67 expression. The Ki-67 index was calculated as the proportion of positively stained cells among the total number of cells counted. Following the 2015 St. Gallen Consensus (8), patients were stratified into low Ki-67 expression (Ki-67 index ≤ 20%) and high Ki-67 expression (Ki-67 index > 20%) groups. For patients with multiple lesions, the Ki-67 index was assessed on tissue from the largest lesion.
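The index calculation and the 20% cutoff amount to a one-line rule. In the sketch below the cell counts are hypothetical; note that an index of exactly 20% falls in the low-expression group.

```python
KI67_CUTOFF_PERCENT = 20.0  # low: index <= 20 %, high: index > 20 %


def ki67_index(positive_cells: int, total_cells: int) -> float:
    """Percentage of counted cells with brown (DAB-positive) nuclear staining."""
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    if not 0 <= positive_cells <= total_cells:
        raise ValueError("positive_cells must lie in [0, total_cells]")
    return 100.0 * positive_cells / total_cells


def ki67_group(index_percent: float) -> str:
    """Binary label used as the prediction target."""
    return "high" if index_percent > KI67_CUTOFF_PERCENT else "low"


# Hypothetical counts from three specimens.
for positive, total in [(150, 1000), (200, 1000), (201, 1000)]:
    idx = ki67_index(positive, total)
    print(f"{positive}/{total}: {idx:.1f}% -> {ki67_group(idx)}")
```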

2.3 MR examination and image preprocessing
MR scans were performed on a 3.0-T scanner (Philips Ingenia) with a 10-channel phased-array breast coil. Patients were placed prone, with both breasts hanging naturally into the coil and the chest wall closely attached to the coil. The scan range covered both breasts and axillae. Before contrast injection, an axial T1-weighted turbo spin-echo sequence was acquired with the following parameters: TR = 540 ms, TE = 8.0 ms, flip angle = 90°, FOV = 340 × 340 mm, matrix = 528 × 528, slice thickness = 5 mm, bandwidth = 364 Hz, acquisition time approximately 2 min. Dynamic contrast-enhanced imaging used a fat-suppressed eTHRIVE sequence with the following parameters: TR = 4.4 ms, TE = 2.2 ms, flip angle = 12°, FOV = 300 × 300 mm, matrix = 768 × 768, slice thickness = 2 mm, bandwidth = 538 Hz, temporal resolution = 64 s, yielding 6–8 post-contrast phases. Gadopentetate dimeglumine (Kangchen, Guangzhou, China) was administered at 0.1 mmol/kg via the antecubital vein at 2 mL/s, followed by a 20 mL saline flush.
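Two quantities implied by these parameters are useful during preprocessing: the in-plane pixel spacing (FOV divided by matrix size) and the index of the dynamic phase that corresponds to each selected time point. The sketch assumes that the k-th post-contrast phase is labeled k × 64 s. That matches the 64, 128 and 320 s time points used in this study but is not stated explicitly; under that assumption the peak and late phases are the 2nd and 5th post-contrast acquisitions, within the 6–8 available.

```python
def in_plane_spacing_mm(fov_mm: float, matrix: int) -> float:
    """Reconstructed pixel size along one in-plane axis."""
    return fov_mm / matrix


def phase_index(seconds_after_injection: int, temporal_resolution_s: int = 64) -> int:
    """Post-contrast phase number, assuming phase k is labeled k * 64 s (an assumption)."""
    index, remainder = divmod(seconds_after_injection, temporal_resolution_s)
    if index < 1 or remainder:
        raise ValueError("time point is not a positive multiple of the temporal resolution")
    return index


t1_spacing = in_plane_spacing_mm(340, 528)   # about 0.644 mm
dce_spacing = in_plane_spacing_mm(300, 768)  # exactly 0.390625 mm
print(f"T1 TSE: {t1_spacing:.3f} mm, DCE: {dce_spacing:.6f} mm")
for name, t in [("early", 64), ("peak", 128), ("late", 320)]:
    print(name, "phase ->", phase_index(t))
```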
Four key time points were selected from the multi-phase DCE-MRI sequences in this study: pre-contrast phase, early phase (64 seconds), peak phase (128 seconds), and late phase (320 seconds) after contrast agent administration. Following anonymization of the raw DICOM data, isotropic resampling to 1 mm³ voxels, and N4 bias field correction, image registration was performed using the pre-contrast phase as the reference. This procedure minimized variations due to scanning parameters and ensured consistency and comparability of subsequent quantitative imaging features.
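Resampling to 1 mm isotropic voxels changes the grid size while preserving physical extent: each axis gets round(size × spacing / 1 mm) voxels. The sketch below computes that target grid for the DCE acquisition; the slice count is hypothetical because it is not reported. In practice the grid would be handed to a resampling routine (for example SimpleITK's `ResampleImageFilter`), and N4 correction and registration would likewise be run with an imaging toolkit; this snippet only shows the geometry.

```python
def isotropic_grid(size, spacing_mm, new_spacing_mm=1.0):
    """Voxel counts per axis that keep the physical extent after resampling."""
    return tuple(int(round(n * s / new_spacing_mm)) for n, s in zip(size, spacing_mm))


# DCE geometry from the protocol; 100 slices is a hypothetical value.
dce_size = (768, 768, 100)
dce_spacing = (300 / 768, 300 / 768, 2.0)    # mm: FOV/matrix in-plane, 2 mm slices
print(isotropic_grid(dce_size, dce_spacing))  # (300, 300, 200)
```

In-plane, this coarsens the 0.39 mm pixels to 1 mm, while the 2 mm slices are interpolated to 1 mm.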

2.4 Tumor segmentation
Volumes of interest (VOIs) were segmented using ITK-SNAP (http://www.itksnap.org, version 3.8.0), and the maximum cross-sectional area of each lesion was selected as the model input. Two radiologists, with 7 and 8 years of radiological experience respectively, independently delineated the VOIs manually while blinded to clinical and pathological information. To test the reliability of the segmentation protocol, 30 cases were randomly selected from the study cohort for inter-observer and intra-observer agreement assessment; for the latter, one radiologist repeated the segmentation after a 4-week interval. Agreement between segmentations was evaluated with the Dice similarity coefficient (DSC). The DSC was 0.91 for inter-observer agreement and 0.89 for intra-observer agreement, indicating that the VOI segmentation was stable and reproducible. When multiple suspicious breast lesions were present on MRI, only the largest one was analyzed.
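The Dice similarity coefficient compares two masks A and B as DSC = 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical). A minimal sketch on masks stored as sets of voxel coordinates; the toy masks are hypothetical.

```python
def dice(mask_a: set, mask_b: set) -> float:
    """Dice similarity coefficient between two binary masks (sets of voxels)."""
    if not mask_a and not mask_b:
        return 1.0                      # two empty masks agree perfectly
    return 2 * len(mask_a & mask_b) / (len(mask_a) + len(mask_b))


# Two hypothetical 10-pixel delineations of one slice, shifted by one pixel.
reader_1 = {(5, x) for x in range(0, 10)}
reader_2 = {(5, x) for x in range(1, 11)}
print(dice(reader_1, reader_2))  # 0.9
```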

2.5 Model development and validation
Figure 2 illustrates the overall workflow. We constructed four single-phase deep learning (SP_DL) models using DenseNet121, pretrained on ImageNet, as the backbone network. Each model processed the maximum cross-sectional lesion area from one of the four phases: SP_DL1 (pre-contrast), SP_DL2 (early phase), SP_DL3 (peak phase), and SP_DL4 (late phase). All models were trained for 50 epochs with a batch size of 32, with data augmentation (random horizontal flipping, ±10° rotation, Gaussian noise injection, and brightness and contrast adjustment) and normalization applied to mitigate overfitting. Optimization used stochastic gradient descent (SGD) with an initial learning rate of 0.01 and a momentum of 0.90, minimizing the binary cross-entropy with logits loss (BCEWithLogitsLoss). The output probability of each SP_DL model constituted its DL signature. The four signatures were then integrated into a multi-phase predictive model using gradient boosting decision trees (GBDT) with the following hyperparameters: n_estimators=100, max_depth=3, learning_rate=0.05, subsample=0.8, max_features='sqrt', min_samples_leaf=10, min_samples_split=20, random_state=42. Finally, we applied gradient-weighted class activation mapping (Grad-CAM) to the DenseNet121 models to visualize their decision-making process and identify the image regions most critical for classification.
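The loss and optimizer named above can be written out for a single scalar logit. The sketch below is not the authors' training code: it shows the numerically stable form of binary cross-entropy on logits, the sigmoid that turns a logit into the probability used as a DL signature, and SGD with momentum in the convention PyTorch uses (velocity v ← μv + g, parameter θ ← θ − ηv, no dampening or Nesterov), with the reported η = 0.01 and μ = 0.9.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function; turns a logit into the probability used as a DL signature."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)                      # avoids overflow for very negative z
    return e / (1.0 + e)


def bce_with_logits(z: float, y: float) -> float:
    """Binary cross-entropy on a raw logit z with label y in {0, 1}.

    Equal to -[y*log(sigmoid(z)) + (1 - y)*log(1 - sigmoid(z))], rearranged
    so that exp() never receives a large positive argument.
    """
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


def bce_grad(z: float, y: float) -> float:
    """Derivative of bce_with_logits with respect to the logit."""
    return sigmoid(z) - y


def sgd_momentum_step(param, grad, velocity, lr=0.01, momentum=0.9):
    """One SGD step with classical momentum (no dampening, no Nesterov)."""
    velocity = momentum * velocity + grad
    return param - lr * velocity, velocity


# Toy run: drive a single logit toward the label "high Ki-67" (y = 1).
z, v = 0.0, 0.0
history = []
for _ in range(100):
    history.append(bce_with_logits(z, 1.0))
    z, v = sgd_momentum_step(z, bce_grad(z, 1.0), v)
print(f"loss {history[0]:.4f} -> {history[-1]:.4f}, signature {sigmoid(z):.3f}")
```

Fusing the sigmoid into the loss is why a logits-based loss is preferred over applying a sigmoid and then a plain cross-entropy: the combined form stays finite even for very confident logits.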
Diagnostic performance was evaluated with receiver operating characteristic (ROC) curves. The area under the curve (AUC) and its 95% confidence interval (CI) were computed to quantify discriminatory power, and AUCs were compared between models with the DeLong test. Accuracy (ACC), sensitivity (SEN), specificity (SPE), and the F1 score were computed as supplementary performance metrics. Calibration and clinical utility were further assessed using calibration curves and decision curve analysis, respectively. To interpret model decisions, we applied SHapley Additive exPlanations (SHAP) analysis. The computed SHAP values were visualized in two ways: a summary bar plot of global feature importance and a beeswarm plot showing the distribution and impact of individual feature values.
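The AUC has a direct probabilistic reading that is handy for checking results: it equals the probability that a randomly chosen high-Ki-67 case receives a higher score than a randomly chosen low-Ki-67 case, with ties counting one half (the Mann-Whitney formulation). A minimal sketch with hypothetical scores:

```python
def auc_mann_whitney(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative case")
    credit = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                credit += 1.0
            elif p == n:
                credit += 0.5          # ties count as half a correct ranking
    return credit / (len(pos_scores) * len(neg_scores))


high_ki67 = [0.9, 0.8, 0.4]   # hypothetical model outputs
low_ki67 = [0.7, 0.3]
print(auc_mann_whitney(high_ki67, low_ki67))  # 5 of 6 pairs ordered correctly
```

The pairwise loop is O(m·n), which is negligible for a 122-case test set; it gives the same value as the trapezoidal area under the empirical ROC curve.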

2.6 Statistical analysis
Continuous variables were summarized as mean ± standard deviation and compared using either the independent samples t-test or the Mann-Whitney U test, with the choice based on data distribution. Categorical variables, expressed as frequencies and percentages, were compared using the χ² test or Fisher’s exact test, depending on expected cell frequencies. Differences in the area under the curve (AUC) between models were evaluated with the DeLong test. A two-tailed p-value of ≤ 0.05 defined statistical significance for all tests. All analyses were conducted in Python 3.10, utilizing key libraries including pandas, numpy, and scipy.stats.
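The DeLong test compares two AUCs measured on the same patients while accounting for the correlation between them. The sketch below implements DeLong's structural-component variance estimate with the standard library; it is an illustration, not the authors' implementation. Inputs are the two models' scores for the same positive and negative cases in the same order, and the two-sided p value uses the normal approximation.

```python
import math


def _psi(x: float, y: float) -> float:
    """Kernel: 1 if the positive outranks the negative, 0.5 for a tie, else 0."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _components(pos, neg):
    """DeLong structural components V10 (per positive) and V01 (per negative)."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a, b):
    """Sample covariance with an (n - 1) denominator."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(pos_1, neg_1, pos_2, neg_2):
    """Compare two correlated AUCs; returns (auc_1, auc_2, z, two-sided p)."""
    m, n = len(pos_1), len(neg_1)
    if m < 2 or n < 2 or len(pos_2) != m or len(neg_2) != n:
        raise ValueError("need >= 2 positives and >= 2 negatives scored by both models")
    v10_1, v01_1 = _components(pos_1, neg_1)
    v10_2, v01_2 = _components(pos_2, neg_2)
    auc_1, auc_2 = sum(v10_1) / m, sum(v10_2) / m
    # Variance of (AUC_1 - AUC_2) from the positive- and negative-case components.
    var = ((_cov(v10_1, v10_1) + _cov(v10_2, v10_2) - 2 * _cov(v10_1, v10_2)) / m
           + (_cov(v01_1, v01_1) + _cov(v01_2, v01_2) - 2 * _cov(v01_1, v01_2)) / n)
    if var <= 0:
        return auc_1, auc_2, 0.0, 1.0      # identical rankings: no detectable difference
    z = (auc_1 - auc_2) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return auc_1, auc_2, z, p


# Hypothetical scores for 3 high-Ki-67 and 2 low-Ki-67 cases from two models.
result = delong_test([0.9, 0.8, 0.4], [0.7, 0.3], [0.6, 0.5, 0.2], [0.7, 0.1])
print(result)
```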

3 Results
3.1 Patient characteristics
A total of 404 breast cancer patients were randomly allocated to a training set (n = 282) and a test set (n = 122). The proportion of patients with high Ki-67 expression was comparable between the training set (74.1%, 209/282) and the test set (73.8%, 90/122). As summarized in Table 1, no statistically significant differences were observed between the two sets regarding age, menopausal status, lesion location, number of lesions, maximum tumor diameter, or axillary lymph node (ALN) status (training set: p > 0.05; test set: p > 0.05). Univariate and multivariate logistic regression analyses identified ALN status as an independent predictor of high Ki-67 expression (p < 0.05; Table 2), which was subsequently used to construct the clinical model using GBDT (C_GBDT), achieving an AUC of 0.587 on the test set.

3.2 Performance of single-phase deep learning models
We compared three architectures, DenseNet121, ResNet101, and GoogLeNet, to identify the optimal backbone. DenseNet121 achieved the highest test-set AUC in every single phase and the smallest gap between training and test performance (Table 3), so we chose it as the base architecture. The test-set performance of the SP_DL models is summarized in Table 4. Among them, SP_DL3 showed the best overall performance (all p < 0.05; Table 5), with the highest AUC of 0.761 (95% CI: 0.657–0.854), the highest accuracy of 0.787, and well-balanced sensitivity (0.811) and specificity (0.719).
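The reported SP_DL3 rates are internally consistent with the test-set composition (90 high and 32 low Ki-67 cases, from Section 3.1). Back-calculating the confusion matrix from sensitivity and specificity, as in the sketch below, gives 73 true positives and 23 true negatives, which reproduce the reported accuracy of 0.787. These counts, and the F1 score the sketch prints, are derived arithmetic rather than numbers reported by the authors.

```python
N_HIGH, N_LOW = 90, 32          # test-set class sizes reported in Section 3.1


def metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Threshold-based metrics from a 2x2 confusion matrix."""
    return {
        "SEN": tp / (tp + fn),
        "SPE": tn / (tn + fp),
        "ACC": (tp + tn) / (tp + fn + tn + fp),
        "F1": 2 * tp / (2 * tp + fp + fn),
    }


# Back-calculate counts from the reported SP_DL3 rates (derived, not reported).
tp = round(0.811 * N_HIGH)      # 73 true positives
tn = round(0.719 * N_LOW)       # 23 true negatives
sp_dl3 = metrics(tp, N_HIGH - tp, tn, N_LOW - tn)
print({name: f"{value:.4f}" for name, value in sp_dl3.items()})
```

The same check applied to the CMP_GBDT figures in Section 3.3 (sensitivity 0.800, specificity 0.750) gives 72/90 and 24/32, again 96 correct out of 122.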
Grad-CAM heatmaps confirmed that the models’ decision-making process was focused on the intratumoral region. Figure 3 shows representative Grad-CAM visualizations for two patients: Patient A with high Ki-67 expression and Patient B with low Ki-67 expression.
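Grad-CAM weights each feature map A^k of a chosen convolutional layer by the spatial average of the class-score gradient, α_k = mean(∂y/∂A^k), and keeps the positive part of the weighted sum, ReLU(Σ_k α_k A^k), before upsampling it onto the image. The toy sketch below applies those two steps to hand-written 2 × 2 maps and gradients. In a real DenseNet121 the maps and gradients would come from a forward and backward pass through the network, and the paper does not state which layer was used.

```python
def grad_cam(feature_maps, gradients):
    """Grad-CAM heatmap from K feature maps and their gradients (each an H x W list)."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    # Channel weight alpha_k: global average of d(score)/d(A^k).
    alphas = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    cam = [[0.0] * w for _ in range(h)]
    for alpha, fmap in zip(alphas, feature_maps):
        for i in range(h):
            for j in range(w):
                cam[i][j] += alpha * fmap[i][j]
    # ReLU keeps only regions that raise the class score; then scale to [0, 1].
    cam = [[max(v, 0.0) for v in row] for row in cam]
    peak = max(max(row) for row in cam)
    return [[v / peak for v in row] for row in cam] if peak > 0 else cam


maps = [[[1.0, 0.0], [0.0, 0.0]],          # channel 1 fires top-left
        [[0.0, 0.0], [0.0, 1.0]]]          # channel 2 fires bottom-right
grads = [[[1.0, 1.0], [1.0, 1.0]],         # raising channel 1 raises the score
         [[-1.0, -1.0], [-1.0, -1.0]]]     # raising channel 2 lowers it
print(grad_cam(maps, grads))  # [[1.0, 0.0], [0.0, 0.0]]
```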

3.3 Performance of fusion deep learning models
All four DL signatures differed significantly between the high and low Ki-67 expression groups (p < 0.001; Supplementary Table 1). Integrating the four DL signatures yielded a multi-phase fusion model based on gradient boosting decision trees (MP_GBDT), which achieved an AUC of 0.810 in the independent test set; the DeLong test (Table 5) confirmed that this AUC was significantly higher than those of all SP_DL models (all p < 0.05). The four DL signatures were then combined with ALN status to develop a clinical multi-phase GBDT (CMP_GBDT) model. As summarized in Table 4, the CMP_GBDT model attained an AUC of 0.814 (95% CI: 0.724–0.896), an accuracy of 0.787, a sensitivity of 0.800, and a specificity of 0.750 on the test set, performance comparable to that of the MP_GBDT model.
The calibration curves demonstrated good agreement between the predictions and actual outcomes for both the CMP_GBDT (ECE = 0.045) and MP_GBDT (ECE = 0.054) models. Decision curve analysis (DCA) indicated that both models provided higher net benefits than the “treat all” and “treat none” strategies across a threshold probability range of 0.3 to 0.9. The ROC, calibration, and decision curves for the models are presented in Figure 4.
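Both quantities in this paragraph have short definitions. The expected calibration error (ECE) bins predictions by predicted probability and averages |observed rate − mean prediction|, weighted by bin size; the number of bins is not reported, so 10 equal-width bins are an assumption here. The net benefit plotted in a decision curve at threshold probability p_t is TP/N − (FP/N)·p_t/(1 − p_t), compared against treating everyone (prevalence − (1 − prevalence)·p_t/(1 − p_t)) and treating no one (0). Toy data only:

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Bin-size-weighted mean |observed positive rate - mean predicted probability|."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))  # p = 1.0 joins the last bin
    ece = 0.0
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            rate = sum(y for _, y in members) / len(members)
            ece += len(members) / len(probs) * abs(rate - mean_p)
    return ece


def net_benefit(probs, labels, threshold):
    """Decision-curve net benefit when treating cases with prob >= threshold."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: treat every patient as high Ki-67."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


probs, labels = [0.1, 0.4, 0.8, 0.9], [0, 0, 1, 1]   # hypothetical
print(expected_calibration_error(probs, labels))
print(net_benefit(probs, labels, 0.5), net_benefit_treat_all(labels, 0.5))
```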
Evaluation of feature importance via SHAP analysis (Figure 5) demonstrated that the SP_DL3 signature was the most significant contributor to the predictions of both the MP_GBDT and CMP_GBDT models. The global feature importance, as ranked by the mean absolute SHAP values in the bar plot, shows the relative contribution of each feature (Figures 5a, c). Furthermore, the beeswarm plot provides a detailed view of how the value of each feature affects the model prediction, displaying the density and direction of their effects across all samples (Figures 5b, d).

4 Discussion
This study developed and validated predictive models for preoperatively estimating Ki-67 expression in breast cancer by integrating clinical data with deep learning (DL) signatures extracted from DCE-MRI. Among the clinical variables assessed, axillary lymph node (ALN) status was identified as an independent predictor. The single-phase DL model trained on peak-phase DCE-MRI images (SP_DL3) outperformed other single-phase models. Moreover, both the multi-phase DL fusion model (MP_GBDT) and the clinical-multi-phase fusion model (CMP_GBDT) demonstrated superior performance compared to all single-phase DL models, with no significant difference in performance between the two fusion approaches. SHAP analysis revealed that the SP_DL3 signature was the most influential feature in the fusion models.
We employed DenseNet121 as the backbone network, leveraging its dense connectivity to promote feature reuse and mitigate vanishing gradients, properties that are especially beneficial in data-scarce scenarios (26) such as ours. We compared its performance with that of two other architectures common in medical imaging, ResNet101 and GoogLeNet. On our small dataset, DenseNet-121 achieved higher predictive performance and greater training-test stability than ResNet101 and GoogLeNet (Table 3; Supplementary Figures S1–S8), consistent with the reduced overfitting risk conferred by its dense connectivity; the alternative architectures did not show this advantage in our setting. DenseNet-121 was therefore selected as the backbone network for subsequent multi-phase fusion. To verify the models' focus, we generated Grad-CAM visualizations, which confirmed that the attention maps of all single-phase models were concentrated on the intratumoral region, providing interpretable support for their decisions.
Among the single-phase DL models, the model trained on peak-phase DCE-MRI images (SP_DL3) performed best, underscoring the distinctive information embedded in the peak enhancement phase. The greater predictive power of peak-phase deep learning features for Ki-67 expression may be explained by their ability to capture tumor microvascular permeability and perfusion at their maximal state, two pathological characteristics closely linked to Ki-67-associated cellular proliferation. High Ki-67 expression reflects active tumor cell proliferation, which typically promotes vigorous angiogenesis and increased microvascular density, resulting in peak contrast agent uptake at approximately 128 seconds. In contrast, the early phase at 64 seconds primarily represents initial contrast perfusion, before tumor enhancement is fully achieved, whereas the late phase at 320 seconds coincides with contrast washout and thus fails to accurately reflect proliferation-related perfusion dynamics. The peak phase therefore serves as the most informative temporal window for assessing Ki-67-associated tumor biological behavior in DCE-MRI-based studies. This aligns with previous work indicating that the peak enhancement phase of DCE-MRI is particularly effective in capturing tumor angiogenesis (27); Xiao et al. (28) systematically demonstrated that quantitative parameters extracted from the peak phase strongly correlate with angiogenic activity in invasive breast cancer. However, these studies mostly relied on imaging features or traditional radiomics. In contrast, this study combines this physiological mechanism with deep learning, enabling SP_DL3 to achieve higher accuracy in Ki-67 prediction.
The multi-phase deep learning model achieved a statistically significant performance improvement (AUC = 0.810) over all single-phase models (all p < 0.01). This result underscores the value of integrating temporal information, as the distinct phases collectively provide a more holistic representation of the tumor microenvironment, leading to more robust predictions. Our findings align with the growing consensus on multi-phase approaches. For example, Ma et al. (29) reported that a multi-phase DCE-MRI radiomics model outperformed single-phase versions in predicting lymphovascular invasion. Likewise, studies by Luo et al. (13) and Zhang et al. (30) demonstrated that fusing dynamic or multi-parametric data enables superior tumor characterization by capturing complementary aspects of vascularity and treatment response.
Multivariate logistic regression identified axillary lymph node (ALN) status as an independent predictor of Ki-67 expression, consistent with previous studies suggesting that high proliferative activity is associated with early metastatic dissemination (5, 31) and further supporting the biological link between tumor proliferation and metastatic potential (10). However, the clinical model based solely on ALN status (C_GBDT) showed only limited predictive performance (AUC = 0.587). In contrast, the fusion model integrating ALN status with the four single-phase DL signatures (CMP_GBDT) achieved an AUC of 0.814, comparable to that of the model using only those signatures (MP_GBDT, AUC = 0.810). The difference between the two models was not statistically significant (DeLong test, p = 0.915), suggesting that ALN status added no predictive value beyond the multi-phase DL signatures. This implies that DL signatures derived from multi-phase DCE-MRI may already capture the proliferation-related biological information reflected in ALN status. Both fusion models were well calibrated and provided a higher net clinical benefit than the treat-all and treat-none strategies across a wide range of decision thresholds. By identifying patients with high Ki-67 expression, these models enable risk stratification to guide treatment decisions: patients classified as low-risk may be candidates for de-escalated therapy, whereas those classified as high-risk should be considered for intensified interventions. In summary, our findings support individualized adjustment of neoadjuvant and adjuvant treatment intensity based on predicted proliferative activity (32).
SHAP analysis identified SP_DL3 as the primary driver of predictions in the integrated models, highlighting the dominance of peak-phase information. Grounded in cooperative game theory, SHAP attributes each prediction to the input features by averaging each deep learning signature's marginal contribution over all possible orderings of the features. Predictive information shared by correlated DCE-MRI phase signatures is therefore divided among them rather than counted more than once, which helps separate the contribution of each phase, although it does not fully remove the effects of multicollinearity. The beeswarm plots further illustrate how feature values influence the predicted probability of high Ki-67 expression: high feature values (red dots, signature probabilities close to 1) consistently push predictions toward high Ki-67 expression, whereas low feature values (blue dots, probabilities near 0) push them toward low expression. Together, SHAP explanations and Grad-CAM attention maps make the system transparent and interrogable, mitigating the “black box” problem. Such interpretability is essential for building clinician confidence and supports the translation of our artificial intelligence (AI) methodology into real-world practice.
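The game-theoretic definition behind SHAP can be computed exactly when there are only four features. The sketch below enumerates every coalition of the four DL signatures for a hypothetical value function; it is not the fitted MP_GBDT model. Each signature has its own effect, and SP_DL2 and SP_DL3 share an extra 0.04 only when both are present. The exact Shapley values split that shared credit equally, and they sum to the full model's value minus the empty baseline. For tree ensembles such as GBDT, the shap library's `TreeExplainer` computes such values with a dedicated tree algorithm rather than by enumeration.

```python
from itertools import combinations
from math import factorial

FEATURES = ("SP_DL1", "SP_DL2", "SP_DL3", "SP_DL4")

# Hypothetical value function v(S): model output using only the signatures in S.
MAIN_EFFECT = {"SP_DL1": 0.10, "SP_DL2": 0.05, "SP_DL3": 0.30, "SP_DL4": 0.02}
SHARED = 0.04  # extra signal available only when SP_DL2 and SP_DL3 are both present


def value(coalition: frozenset) -> float:
    total = sum(MAIN_EFFECT[f] for f in coalition)
    if {"SP_DL2", "SP_DL3"} <= coalition:
        total += SHARED
    return total


def shapley_values(players, v):
    """Exact Shapley values: weighted average of marginal contributions."""
    n = len(players)
    phi = {}
    for player in players:
        others = [p for p in players if p != player]
        contribution = 0.0
        for size in range(n):
            # Fraction of feature orderings in which exactly this coalition precedes player.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                contribution += weight * (v(s | {player}) - v(s))
        phi[player] = contribution
    return phi


phi = shapley_values(FEATURES, value)
print({name: round(x, 3) for name, x in phi.items()})
```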
DCA confirmed that both fusion models provided a favorable net clinical benefit across threshold probabilities from 0.3 to 0.9. To support clinical decision-making for patients with borderline biopsy results, we derived a preliminary clinical cutoff by maximizing Youden's index on the training set for the CMP_GBDT model; the maximum Youden's index was 0.739 (sensitivity = 0.890, specificity = 0.849). When the model's predicted probability of high Ki-67 expression exceeds this cutoff, clinicians may give greater weight to the model prediction than to a borderline CNB result. The predefined cutoff was further validated in the independent test set and showed stable performance. However, prospective multi-center validation is still needed before this approach can be adopted in routine clinical practice. Finally, it is important to clarify the clinical role of our multi-phase DCE-MRI deep learning model. The model is not designed to replace histopathology, the gold standard for Ki-67 assessment, but to serve as a valuable preoperative auxiliary tool. On the one hand, it enables non-invasive and rapid evaluation of Ki-67 expression. On the other hand, because it analyzes the largest tumor cross-section rather than a localized tissue sample, it provides a relatively comprehensive view of tumor heterogeneity. In this way, it helps address the invasiveness, sampling bias, and delayed availability of pathological assessment. Our approach can improve preoperative risk stratification for breast cancer patients and support more rational personalized treatment decisions in clinical practice; the model's good calibration and high net benefit further support its potential for clinical translation.
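Youden's index for a cutoff is J = sensitivity + specificity − 1, and the chosen cutoff is the score threshold that maximizes J on the training data. With the reported training-set values, J = 0.890 + 0.849 − 1 = 0.739. A minimal search over candidate thresholds on hypothetical scores is below; taking each observed score as a candidate and predicting "high" when score ≥ threshold are conventions assumed here.

```python
def youden_cutoff(scores, labels):
    """Return (J, threshold, sensitivity, specificity) maximizing Youden's J.

    Candidates are scanned from high to low, so ties in J keep the higher threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        sen, spe = tp / n_pos, tn / n_neg
        j = sen + spe - 1
        if best is None or j > best[0]:
            best = (j, t, sen, spe)
    return best


scores = [0.1, 0.2, 0.6, 0.55, 0.7, 0.8, 0.9]   # hypothetical training-set outputs
labels = [0, 0, 0, 1, 1, 1, 1]
best = youden_cutoff(scores, labels)
print(best)
print(round(0.890 + 0.849 - 1, 3))  # 0.739, the reported maximum J
```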
This study has several limitations. First, the sample size was relatively small; thus, expanding the cohort in future studies is essential to improve statistical power and generalizability. Second, the study was conducted retrospectively at a single institution. The use of historical data may introduce selection bias, and future efforts should involve prospective, multi-center validation. Third, the deep learning analysis relied only on DCE-MRI images. Incorporating additional MRI sequences, such as DWI and T2WI, could provide a more comprehensive tumor characterization and potentially improve diagnostic accuracy.
In conclusion, our study shows that a deep learning fusion model based on multi-phase DCE-MRI enables non-invasive and accurate preoperative prediction of Ki-67 status in breast cancer. Two key innovations are central to our model design. First, the deep learning module was specifically tailored to extract temporal features from multi-phase DCE-MRI. Second, the automatically learned deep features were fed into a GBDT model, which retains the strong feature-learning capability of deep learning while mitigating overfitting and enhancing model interpretability. The integration of explainable AI techniques such as Grad-CAM and SHAP further improves the transparency of the model's decision-making process. Overall, this model helps address some of the clinical limitations of pathological Ki-67 evaluation and has strong potential as a preoperative auxiliary tool for the personalized management of breast cancer.

Source: PubMed Central (JATS). Licensing follows the original publisher's policy; please credit the original article when citing.
