
Machine Learning Model Based on Multiparametric MRI for Distinguishing HER2 Expression Level in Breast Cancer.

Current Oncology (Toronto, Ont.), 2026, Vol. 33(1)


Chen Y, Liu W, Tang W, Kong Q, Chen S, Liu S


Cite this article

APA: Chen Y, Liu W, et al. (2026). Machine Learning Model Based on Multiparametric MRI for Distinguishing HER2 Expression Level in Breast Cancer. Current oncology (Toronto, Ont.), 33(1). https://doi.org/10.3390/curroncol33010053
MLA: Chen Y, et al. "Machine Learning Model Based on Multiparametric MRI for Distinguishing HER2 Expression Level in Breast Cancer." Current oncology (Toronto, Ont.), vol. 33, no. 1, 2026.
PMID: 41590373

Abstract

This study aimed to develop machine learning models based on conventional MRI features to classify HER2 expression levels in invasive breast cancer and explore their association with disease-free survival (DFS). A total of 678 patients from two centers were included, with Center 1 divided into training and internal test sets and Center 2 serving as an external test set. Random Forest models were trained to distinguish HER2-positive vs. HER2-negative (Task 1) and HER2-low vs. HER2-zero tumors (Task 2) using BI-RADS-based MRI features. SHapley Additive exPlanations were applied to rank feature importance, assist feature selection, and enhance model interpretability. DFS was analyzed using Kaplan-Meier curves and log-rank tests. In Task 1, key features included tumor size, axillary lymph nodes, fibroglandular tissue, peritumoral edema, and multifocality, achieving AUCs of 0.75 and 0.73 in the internal and external test sets, respectively. In Task 2, tumor size, peritumoral edema, and multifocality yielded AUCs of 0.73 and 0.72, respectively. Higher task-specific model scores were associated with shorter DFS in Task 1 (p = 0.037) and longer DFS in Task 2 (p = 0.046). MRI-based machine learning models can noninvasively stratify HER2 expression levels, with potential for prognostic stratification and clinical application.


1. Introduction

Human epidermal growth factor receptor 2 (HER2) expression levels are critical in shaping treatment strategies and prognostic outcomes in invasive breast cancer. Traditionally, HER2 status has been classified into HER2-positive, treated with HER2-targeted therapies, and HER2-negative, where treatment primarily relies on hormone receptor status [1,2]. Recent studies, however, suggest a shift in this paradigm. HER2-low tumors [immunohistochemistry (IHC) score 1+ or 2+ without fluorescence in situ hybridization (FISH) amplification], which make up nearly half of invasive breast cancers and were previously grouped with HER2-negative, have shown potential for meaningful survival benefits when treated with novel antibody-drug conjugates [3,4]. As a result, accurate HER2 stratification into three categories—HER2-positive, HER2-low, and HER2-zero—has gained increasing significance in clinical practice.
HER2 status is typically evaluated using IHC and/or FISH on core needle biopsy (CNB) or surgical specimens [5,6]. However, discrepancies in HER2 expression between CNB and surgical specimens have been reported, particularly in HER2-low and HER2-zero tumors [7,8]. These inconsistencies may require repeated testing, leading to increased physical and emotional burdens for patients [6]. Thus, a non-invasive preoperative method for assessing HER2 expression levels is urgently needed.
Magnetic resonance imaging (MRI) plays an increasingly vital role in breast disease management, including diagnosis, categorization, and treatment monitoring [9,10]. Several studies have applied MRI-based radiomics to classify HER2 expression levels, achieving AUC values of 0.63 to 0.82 [11,12,13,14,15,16,17]. While radiomics shows promise, concerns remain regarding its robustness and generalizability across datasets and MRI systems [18]. Conventional MRI features, assessed by experienced radiologists using BI-RADS, have demonstrated strong performance in benign-malignant classification models [19]. Zhou et al. reported that machine learning (ML) models using conventional MRI features and Random Forest (RF) for feature selection achieved AUC values of 0.69 to 0.79 for classifying HER2 expression into three categories [20]. These findings suggest that more advanced, nonlinear feature selection methods could enhance the performance of conventional MRI models, making them comparable to radiomics-based approaches. However, the increased complexity of such methods may reduce interpretability and limit clinical trust [21,22].
SHapley Additive exPlanation (SHAP) is a game-theory-based interpretation method [23] that can be applied to a wide range of ML models, offering both local and global explanations [24]. By quantifying the marginal contribution of each feature to the model’s predictions, SHAP has the potential to provide a more refined assessment of feature importance, even when variables are not completely independent [25]. Additionally, SHAP analysis may enable more intuitive visualizations of the decision-making process, enhancing model interpretability and offering deeper insights into feature contributions [26].
Based on these findings, this study aimed to construct ML models using conventional MRI features to distinguish the HER2 triple classifications, utilize SHAP to evaluate feature contributions and improve interpretability, and explore the model’s clinical relevance through survival analysis.

2. Materials and Methods

2.1. Study Sample
This retrospective study enrolled 796 patients with invasive breast cancer (Center 1: n = 625; Center 2: n = 171), whose HER2 status was assessed by IHC and/or FISH. All patients underwent multiparametric MRI within two weeks prior to surgery, performed between June 2018 and March 2024 at Center 1, and between June 2022 and March 2024 at Center 2. A total of 118 patients were excluded due to (i) incomplete pathological or clinical data, (ii) prior breast-related treatments before MRI, or (iii) poor image quality or incomplete sequences. Ultimately, 678 patients were included in the analysis (Center 1: n = 534; Center 2: n = 144; Figure 1).
This study was designed for two tasks: Task 1 aimed to distinguish HER2-positive from HER2-negative tumors, and Task 2 aimed to differentiate HER2-low from HER2-zero tumors. For each task, data from Center 1 were split by time into a training set (June 2020 to March 2024) and an internal test set (June 2018 to May 2020), which was also used for survival analysis; data from Center 2 served as an independent external test set. The study design flow is shown in Figure 2.
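The time-based split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `assign_split`, the use of a single reference date per patient, and the exact boundary day (1 June 2020) are assumptions inferred from the month ranges given in the text.

```python
from datetime import date

# Assumed boundary: the internal test set covers June 2018 to May 2020,
# the training set covers June 2020 to March 2024 (both at Center 1).
TRAIN_START = date(2020, 6, 1)


def assign_split(center: int, reference_date: date) -> str:
    """Return 'train', 'internal_test', or 'external_test' for one patient.

    `reference_date` is a hypothetical per-patient date (for example the
    MRI date); the text does not state which date defined the split.
    """
    if center == 2:
        # All Center 2 patients form the independent external test set.
        return "external_test"
    return "train" if reference_date >= TRAIN_START else "internal_test"
```

Splitting by time rather than at random places the earlier patients, who have the longest follow-up, in the internal test set that is later used for survival analysis.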
The present study was conducted at the same two institutions as our previous radiomics-based work on HER2 prediction [14]. Using unique patient identifiers, 445 patients overlapped between the two studies, and 233 patients were newly included in the current cohort due to the extended accrual period. While the prior study combined radiomics with conventional MRI descriptors, the present study focuses on machine-learning models based solely on conventional MRI descriptors and SHAP-assisted interpretation. All analyses in the current work were performed de novo, without reusing any radiomics features, segmentation-derived radiomics outputs, model parameters, or results from the prior publication.

2.2. Clinicopathologic Data Collection
Two pathologists reviewed HER2 expression levels from both centers according to the American Society of Clinical Oncology/College of American Pathologists recommendations [6]. HER2 expression was classified as HER2-zero (IHC 0), HER2-low (IHC 1+, or IHC 2+ with a negative FISH result), or HER2-positive (IHC 3+, or IHC 2+ with a positive FISH result) [3]. HER2-zero and HER2-low tumors were together considered HER2-negative.
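The three-level rule above maps directly onto a small lookup. The sketch below is illustrative only; the function names and the string encoding of IHC scores are assumptions, not part of the study.

```python
from typing import Optional


def classify_her2(ihc: str, fish_positive: Optional[bool] = None) -> str:
    """Map an IHC score (and FISH result where needed) to a HER2 category.

    ihc: one of "0", "1+", "2+", "3+" (assumed encoding).
    fish_positive: required only when ihc == "2+".
    """
    if ihc == "0":
        return "HER2-zero"
    if ihc == "1+":
        return "HER2-low"
    if ihc == "2+":
        if fish_positive is None:
            raise ValueError("IHC 2+ requires a FISH result")
        return "HER2-positive" if fish_positive else "HER2-low"
    if ihc == "3+":
        return "HER2-positive"
    raise ValueError(f"unknown IHC score: {ihc!r}")


def is_her2_negative(category: str) -> bool:
    """HER2-zero and HER2-low together form the HER2-negative group."""
    return category in {"HER2-zero", "HER2-low"}
```

Task 1 then uses `is_her2_negative` to form the binary label, while Task 2 is restricted to cases for which it returns True.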
Clinical and pathological variables, including age, menopausal status, tumor location, histological type, estrogen receptor (ER) status, progesterone receptor (PR) status, and Ki-67 status, were retrieved from electronic medical records. ER and PR status were considered positive when nuclear staining was observed in ≥1% of tumor cells; otherwise, results were classified as negative. Hormone receptor (HR) positivity was defined as positivity for ER and/or PR according to these criteria. Ki-67 status was classified as high when >14% and low otherwise [9].
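The receptor thresholds above can be written as simple predicates. This is a sketch with hypothetical function names; inputs are assumed to be percentages of stained tumor cells.

```python
def receptor_positive(stained_pct: float) -> bool:
    """ER or PR is positive when at least 1% of tumor cells stain."""
    return stained_pct >= 1.0


def hr_positive(er_pct: float, pr_pct: float) -> bool:
    """HR positivity: ER and/or PR positive."""
    return receptor_positive(er_pct) or receptor_positive(pr_pct)


def ki67_high(ki67_pct: float) -> bool:
    """Ki-67 is high when strictly greater than 14%."""
    return ki67_pct > 14.0
```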

2.3. Breast MRI Acquisition
MRI examinations were performed on different scanners at the two institutions. At Center 1, examinations were performed on a 1.5-T system (uMR 560, United Imaging, Shanghai, China) using a 4-channel dedicated breast coil. At Center 2, MRI was performed on 3.0-T systems (DiscoveryC750 and Architect, GE Healthcare, Chicago, IL, USA) with an 8-channel breast coil. The protocol comprised T1-weighted images (T1WI), fat-suppressed T2-weighted images (T2WI), axial single-shot diffusion-weighted imaging, and dynamic contrast-enhanced (DCE) images. A pre-contrast fat-suppressed T1WI acquisition was obtained prior to DCE imaging. A gadolinium-based contrast agent (Gd-DTPA, Magnevist; Bayer HealthCare, Leverkusen, Germany) was injected intravenously at a dose of 0.2 mL/kg and a rate of 1.5 mL/s, followed by a 20 mL saline flush. Additional MRI sequence parameters are summarized in Table S1.

2.4. Conventional MRI Features Assessment
MRI features were independently evaluated by two radiologists blinded to HER2 expression. Features included tumor size, fibroglandular tissue (FGT) (fatty/scattered vs. heterogeneous/extremely dense), background parenchymal enhancement (BPE) (minimal/mild vs. moderate/marked), multifocal (single vs. multiple), lesion type (NST vs. ILC and other), shape (round/oval vs. irregular), margin (circumscribed vs. not circumscribed), internal enhancement (homogeneous vs. heterogeneous), enhancement curve (ascendant and/or plateau vs. washout), peritumoral edema (absent vs. present), and abnormal axillary lymph nodes (ALNs) (absent vs. present). Tumor size was defined as the maximum diameter measured on the early phase of DCE [27]. Peritumoral edema was defined as a hyperintense signal adjacent to the tumor on axial or sagittal T2WI [20]. ALNs were considered abnormal if any of the following were present [28]: absent fatty hilum, short-axis diameter > 10 mm, long-to-short-axis ratio < 2, cortical thickening, irregular margins, or asymmetry in number or size compared to the contralateral axilla. All remaining descriptors were evaluated in accordance with the BI-RADS Atlas 5th edition. For multifocal lesions, assessments were based on the largest lesion.
Tumor size was calculated as the mean of the diameters measured by the two radiologists. For other features, disagreements were resolved first through discussion between the two radiologists. If consensus could not be reached, a third radiologist made the final determination.
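The axillary lymph node criteria and the size-averaging rule above can be expressed compactly. The sketch below is an illustration only: the `NodeFindings` fields and function names are hypothetical, and in the study these judgments were made visually by radiologists rather than computed.

```python
from dataclasses import dataclass


@dataclass
class NodeFindings:
    """Hypothetical per-patient record of axillary node observations."""
    fatty_hilum_absent: bool = False
    short_axis_mm: float = 0.0
    long_axis_mm: float = 0.0
    cortical_thickening: bool = False
    irregular_margin: bool = False
    asymmetric_vs_contralateral: bool = False


def is_abnormal_aln(n: NodeFindings) -> bool:
    """Abnormal if ANY listed criterion is met."""
    # A long-to-short-axis ratio below 2 is only meaningful when measured.
    low_ratio = n.short_axis_mm > 0 and n.long_axis_mm / n.short_axis_mm < 2
    return any([
        n.fatty_hilum_absent,
        n.short_axis_mm > 10,
        low_ratio,
        n.cortical_thickening,
        n.irregular_margin,
        n.asymmetric_vs_contralateral,
    ])


def consensus_size(reader1_mm: float, reader2_mm: float) -> float:
    """Tumor size is the mean of the two readers' measurements."""
    return (reader1_mm + reader2_mm) / 2
```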

2.5. Model Construction and Evaluation
For both tasks, the modeling process consisted of three steps:

Step 1: Five ML models were selected, including RF, support vector machine (SVM), extreme gradient boosting (XGBoost), K-nearest neighbors (K-NN), and logistic regression (LR). Before model construction, continuous variables were standardized. To mitigate class imbalance, SMOTE (Synthetic Minority Over-sampling Technique) was applied to the training data by generating synthetic samples for the minority class. Hyperparameters were optimized using a combination of grid search and manual fine-tuning. Each model was validated using 10-fold cross-validation, and the model with the highest mean AUC was selected for the next step.

Step 2: Feature selection was based on the contribution of each feature in the selected ML model, ranking them by importance [24]. Features were progressively removed in ascending order of importance, with the AUC recalculated at each step. The process was halted when the AUC reduction became statistically significant compared to the model with all features, as determined by the DeLong test [24]. The number of features at this point was finalized for the model, balancing predictive performance and feature reduction.

Step 3: Using the selected features from Step 2, the final ML model was developed and validated through 10-fold cross-validation. Performance was evaluated using several commonly applied metrics, including the area under the receiver operating characteristic (ROC) curve, accuracy (ACC), specificity (SPE), sensitivity (SEN), positive predictive value (PPV), and negative predictive value (NPV).
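The oversampling used in Step 1 can be illustrated with a bare-bones version of the SMOTE idea: each synthetic sample lies on the line segment between a minority-class sample and one of its nearest minority-class neighbours. This is a simplified sketch for continuous features, not the implementation used in the study (which is not specified); the function name, `k`, and `seed` are assumptions.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points from a list of minority-class vectors.

    Each new point interpolates between a randomly chosen minority sample
    and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Neighbours of `base` within the minority class, nearest first.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

Oversampling of this kind is applied to the training data only, as the text states, so that the test sets keep their original class distribution.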

2.6. SHAP-Based Interpretability Analysis
The SHAP package in Python (v3.12.3) was used to provide both global and local interpretations for the ML models in both tasks. Global interpretation aimed to assign importance values to each model feature. This was achieved by calculating SHAP values for each feature and generating SHAP summary plots, which display the mean absolute SHAP values across all patients to illustrate the overall contribution of each feature to the model. In addition, swarm plots were used to show the relationship between each feature's value and the model predictions. Local interpretation focused on understanding individual predictions by constructing waterfall plots, which highlight the features that contributed most to the model’s predicted probability for a specific patient.
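To make the quantities behind these plots concrete, the sketch below computes exact Shapley values by brute force for a small model and then averages their absolute values across samples, which is the number a summary bar plot displays. It is not the SHAP package and not the study's code: it enumerates all feature coalitions (exponential cost), and it assumes that features outside a coalition are replaced by a fixed `baseline` value. All names are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of each feature for one sample `x`.

    predict: function taking a feature list and returning a number.
    baseline: reference values used for features outside a coalition.
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(rest, size):
                s = set(combo)
                # Marginal contribution of feature i to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def mean_abs_shap(predict, samples, baseline):
    """Global importance: mean absolute Shapley value per feature."""
    totals = [0.0] * len(baseline)
    for x in samples:
        for i, v in enumerate(shapley_values(predict, x, baseline)):
            totals[i] += abs(v)
    return [t / len(samples) for t in totals]
```

For a single patient, the values returned by `shapley_values` sum to the difference between that patient's prediction and the baseline prediction, which is the decomposition a waterfall plot draws.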

2.7. Survival Analysis
Survival analysis was performed in the internal test set to explore the association between task-specific model outputs and disease-free survival (DFS). DFS was defined as the time interval from the date of surgery to the occurrence of tumor recurrence, distant metastasis, or death [29]. Follow-up and recurrence data for the internal test set (June 2018 to May 2020) were obtained from electronic medical records, with 30 June 2023 as the data cut-off date. Consequently, follow-up duration varied among patients based on their date of surgery. Patients who did not experience recurrence or were lost to follow-up by the last follow-up date were treated as censored observations.
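The DFS definition and censoring rule above can be sketched as a function that turns dates into a (time, event) pair. This is an illustration under stated assumptions: the function and argument names are hypothetical, the conversion of days to months (30.4375 days per month) is an assumed convention, and events recorded after the cut-off date are treated as censored at the cut-off.

```python
from datetime import date
from typing import Optional, Tuple

CUTOFF = date(2023, 6, 30)  # data cut-off date stated in the text
DAYS_PER_MONTH = 30.4375    # assumed conversion factor


def dfs_observation(
    surgery: date, event: Optional[date], last_contact: date
) -> Tuple[float, bool]:
    """Return (DFS in months, event_observed) for one patient.

    event: date of recurrence, distant metastasis, or death; None if none.
    last_contact: last follow-up date, used for censored patients.
    """
    if event is not None and event <= CUTOFF:
        end, observed = event, True
    else:
        # No event before the cut-off: censor at the earlier of the last
        # contact and the cut-off date.
        end, observed = min(last_contact, CUTOFF), False
    return (end - surgery).days / DAYS_PER_MONTH, observed
```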
The model-predicted probabilities were used as task-specific model scores. The Task 1 model score was defined as the predicted probability of HER2-positive status (Task 1: HER2-positive vs. HER2-negative), and the Task 2 model score was defined as the predicted probability of HER2-low status within the pathologically HER2-negative subset (Task 2: HER2-low vs. HER2-zero). For each task, an optimal cut-off value was determined using the Youden index and used to stratify patients into high- and low-score groups. DFS was compared using Kaplan–Meier curves with the log-rank test. Associations between the model score and DFS were evaluated using univariable and multivariable Cox proportional hazards models, adjusted for age, menopausal status, tumor location, histologic type, ER status, PR status, and Ki-67.
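The Youden-index cut-off used for stratification can be sketched as below. The code scans every observed score as a candidate threshold and keeps the one that maximises J = sensitivity + specificity - 1. The function name is hypothetical, and the text does not state which binary label the index was computed against, so `labels` here is a generic 0/1 outcome.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximising sensitivity + specificity - 1.

    A case is called positive when its score is >= cutoff.
    labels: 1 for the positive class, 0 for the negative class.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cut and y == 0)
        j = tp / pos + tn / neg - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j
```

Patients with a score at or above the returned cut-off would form the high-score group, and the remainder the low-score group, before the Kaplan–Meier comparison.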

2.8. Statistical Analysis
Analysis was conducted using Python v3.12.3 (https://www.python.org) and SPSS statistical software v26.0 (https://www.ibm.com/spss, accessed on 10 December 2025). Continuous variables were compared using the t-test or Mann–Whitney U test, while categorical variables were analyzed using the Chi-square test or Fisher’s exact test. For inter-reader agreement, the intraclass correlation coefficient (ICC) was used for continuous variables, categorized as poor (<0.75), good (0.75–0.90), or excellent (>0.90) [30]. Categorical variables were evaluated using the kappa coefficient, classified as poor (<0.00), slight (0.00–0.20), fair (0.21–0.40), moderate (0.41–0.60), substantial (0.61–0.80), or almost perfect (0.81–1.00) [31]. Model performance was evaluated using metrics such as the AUC, ACC, SEN, SPE, PPV, and NPV. The DeLong test was employed to compare differences in AUCs as features were progressively reduced during model construction. Statistical significance was defined as a two-tailed p-value of < 0.05.
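As a concrete reading of the agreement analysis, the sketch below computes Cohen's kappa for two readers' categorical ratings and maps it onto the bands listed above. It is an illustrative implementation with hypothetical function names, not the SPSS procedure used in the study.

```python
from collections import Counter


def cohen_kappa(reader_a, reader_b):
    """Cohen's kappa for two equally long sequences of categorical ratings."""
    n = len(reader_a)
    if n == 0 or n != len(reader_b):
        raise ValueError("ratings must be non-empty and of equal length")
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    count_a, count_b = Counter(reader_a), Counter(reader_b)
    # Agreement expected by chance from the readers' marginal frequencies.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both readers used one identical category throughout
    return (observed - expected) / (1 - expected)


def kappa_label(kappa: float) -> str:
    """Map a kappa value onto the bands given in the text."""
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"
```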

3. Results

3.1. Patients
A total of 678 patients were included in the study, comprising 377 in the training set (HER2-zero: 84 [22.3%], HER2-low: 195 [51.7%], HER2-positive: 98 [26.0%]), 157 in the internal test set (HER2-zero: 34 [21.7%], HER2-low: 68 [43.3%], HER2-positive: 55 [35.0%]), and 144 in the external test set (HER2-zero: 29 [20.1%], HER2-low: 73 [50.7%], HER2-positive: 42 [29.2%]). Clinicopathological characteristics did not differ significantly across the three datasets (p > 0.05) (Table 1). The distribution of receptor-defined subgroups is summarized in Table S2.
Table S3 presents inter-reader agreement for conventional MRI features. Tumor size demonstrated excellent agreement (ICC = 0.93), while categorical features showed substantial to almost perfect agreement, with kappa values ranging from 0.80 to 0.88.
Table 2 and Table 3 summarize the univariable analysis of conventional MRI features for Task 1 and Task 2. For Task 1, tumor size, peritumoral edema, and abnormal ALNs differed significantly between HER2-positive and HER2-negative tumors across the training, internal test, and external test sets (all p < 0.05). By contrast, in Task 2, none of the conventional MRI features showed a statistically significant difference between HER2-low and HER2-zero tumors in any of the sets (all p > 0.05).

3.2. Model Construction
Five ML models were evaluated for each task to identify the optimal approach. In Task 1, 10-fold cross-validation showed that RF achieved the highest mean AUC at 0.81 (95% CI: 0.78–0.83), while LR had the lowest mean AUC of 0.63 (95% CI: 0.58–0.69). Similarly, in Task 2, RF yielded the highest mean AUC of 0.77 (95% CI: 0.73–0.81), whereas LR yielded the lowest mean AUC of 0.64 (95% CI: 0.61–0.68) (Figure S1; Tables S4 and S5).
SHAP analysis was used to visualize and rank feature contributions in the RF model (Figure 3a,b). Features were progressively removed in ascending order of importance, and the corresponding AUC was calculated (Figure 3c,d). In Task 1, the model with 5 features (tumor size, abnormal ALNs, peritumoral edema, FGT, and multifocal) demonstrated similar performance to the model with 11 features (AUC: 0.77 vs. 0.79, p = 0.51). In Task 2, the model with 3 features (peritumoral edema, tumor size, and multifocal) performed similarly to the model with 11 features (AUC: 0.71 vs. 0.74, p = 0.36). The DeLong test results are provided in Tables S6 and S7.
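The elimination procedure reported above can be sketched as a loop that drops the least important feature one at a time and stops before the first subset whose AUC is significantly lower than that of the full model. This is one interpretation of the described stopping rule, not the authors' code. The two callables are hypothetical placeholders: `auc_of` would refit and score a model on a feature subset, and `p_value_vs_full` would run a DeLong test against the full model.

```python
def backward_select(ranked_features, auc_of, p_value_vs_full, alpha=0.05):
    """Return the smallest feature subset not significantly worse than full.

    ranked_features: feature names ordered from most to least important
        (for example by mean absolute SHAP value).
    auc_of(features) -> float: AUC of a model using `features`.
    p_value_vs_full(features) -> float: p-value comparing that model's AUC
        with the full model's AUC.
    """
    full_auc = auc_of(ranked_features)
    kept = list(ranked_features)
    while len(kept) > 1:
        candidate = kept[:-1]  # drop the least important remaining feature
        lower = auc_of(candidate) < full_auc
        if lower and p_value_vs_full(candidate) < alpha:
            break  # removing this feature costs too much; keep `kept`
        kept = candidate
    return kept
```

Under this reading, the 5-feature Task 1 model and the 3-feature Task 2 model are the subsets at which the loop stopped, since neither differed significantly from the corresponding 11-feature model.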

3.3. Model Performance
For Task 1, the final RF model yielded AUCs of 0.97 (95% CI: 0.96–0.98) in the training set, 0.75 (95% CI: 0.67–0.82) in the internal test set, and 0.73 (95% CI: 0.64–0.82) in the external test set (Figure 4a–c). For Task 2, the corresponding AUCs were 0.93 (95% CI: 0.90–0.95), 0.73 (95% CI: 0.61–0.83), and 0.72 (95% CI: 0.60–0.83), respectively (Figure 4d–f). Detailed performance metrics for both tasks across the three datasets are summarized in Table 4. Confusion matrices for both tasks in the internal and external test sets are also provided in Figure S2.
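The threshold-based metrics reported alongside the AUC follow directly from a confusion matrix. The sketch below shows the definitions; the function name is hypothetical and the counts in the usage note are made-up illustrative numbers, not values from Table 4 or Figure S2.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute ACC, SEN, SPE, PPV, and NPV from confusion-matrix counts."""

    def ratio(num: int, den: int) -> float:
        # Return NaN instead of raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "ACC": ratio(tp + tn, tp + fp + tn + fn),
        "SEN": ratio(tp, tp + fn),  # true-positive rate
        "SPE": ratio(tn, tn + fp),  # true-negative rate
        "PPV": ratio(tp, tp + fp),
        "NPV": ratio(tn, tn + fn),
    }
```

For example, `binary_metrics(tp=40, fp=10, tn=45, fn=5)` gives an accuracy of 0.85, a PPV of 0.80, and an NPV of 0.90.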

3.4. Interpretability Analysis
SHAP summary and swarm plots were used to provide global explanations of the models for both tasks (Figure 5). In Task 1, tumor size and abnormal ALNs ranked as the top contributors, as shown by the long bars at the top of the summary plot, indicating their high mean impact on the model’s predictions (Figure 5a). The swarm plot further revealed that larger tumor sizes and the presence of abnormal ALNs were more strongly associated with HER2-positive tumors (Figure 5b). In Task 2, tumor size remained the most influential feature, with the swarm plot demonstrating a stronger association between larger tumor size and HER2-low tumors (Figure 5c,d). Additionally, peritumoral edema was highly correlated with HER2-positive tumors in Task 1 but showed a greater association with HER2-low tumors in Task 2.
SHAP waterfall plots were used to provide local explanations based on individual predictions (Figure 6 and Figures S3 and S4). In Figure 6, for Task 1, most features reduced the likelihood of HER2-positive classification for this patient, while peritumoral edema showed an opposite trend (predicted probability: 4.1%). In contrast, for Task 2, the presence of peritumoral edema contributed the most to the predicted HER2-low probability of 91.7% for the same patient.

3.5. Task-Specific Survival Analysis
Survival analysis was conducted in the internal test set. In Task 1, the median follow-up time was 34.0 months (IQR: 22.5–45.2 months). During the follow-up period, 28 patients (17.83%) experienced recurrence. Patients were stratified into low vs. high Task 1 model score groups using a cutoff value of 0.420. The Kaplan–Meier curves showed that a high Task 1 model score (predicted HER2-positive) was significantly associated with shorter DFS (p = 0.037; Figure 7a). In Cox regression analysis, the Task 1 model score was significantly associated with DFS in univariable analysis; after adjustment for clinicopathologic factors in the multivariable model, a high Task 1 model score remained an independent predictor of poorer DFS (HR 2.26, 95% CI 1.06–4.84; p = 0.035; Table S8).
In Task 2, the median follow-up time was 36.0 months (IQR: 27.7–45.3 months). During the follow-up period, 17 patients (16.67%) experienced recurrence. Patients were stratified into low vs. high Task 2 model score groups using a cutoff value of 0.496. The Kaplan–Meier curves showed that a high Task 2 model score (predicted HER2-low) was significantly associated with longer DFS (p = 0.046; Figure 7b). However, in Cox analyses adjusted for clinicopathologic variables, the association between the Task 2 model score and DFS did not reach statistical significance, showing only a borderline trend (HR 2.423, 95% CI 0.930–6.317; p = 0.070; Table S9).

4. Discussion

This study utilized ML models based on conventional MRI features combined with SHAP to visualize feature importance rankings, aiding in feature selection for distinguishing the three levels of HER2 expression. The results demonstrated that this approach achieved AUCs of 0.75 and 0.73 in distinguishing HER2-positive from HER2-negative tumors in the internal and external test sets, respectively, and AUCs of 0.73 and 0.72 in distinguishing HER2-low from HER2-zero tumors. In exploratory survival analyses, DFS differed between the model-defined groups, suggesting that the task-specific MRI-derived model scores may be associated with clinical outcomes.

4.1. Comparison with Prior Studies
Recent studies on HER2 triple classification primarily focus on the application of MRI radiomics [12,15,16,32]. Several studies have reported robust performance in distinguishing HER2-positive from HER2-negative tumors. Zheng et al. [9] developed a radiomics model using T2WI, DCE, diffusion-weighted imaging, and apparent diffusion coefficient (ADC) images, achieving an AUC of 0.725; Bian et al. [13] constructed a combined model using T1WI contrast-enhanced and ADC imaging features, achieving an AUC of 0.76. Luo et al. used T2WI and DCE-MRI radiomics in a machine learning model and achieved an AUC of 0.777 [17]. For the more clinically challenging task of distinguishing HER2-low from HER2-zero tumors, these studies also showed similar performance, with AUC values ranging from 0.71 to 0.77 [13,17]. Overall, these radiomics models achieved performance that was in a comparable range, with some reporting slightly higher AUCs. However, radiomics-based performance may be more sensitive to differences in datasets, MRI systems, and preprocessing pipelines, as it relies on standardized image acquisition and feature-extraction procedures, which can limit reproducibility and routine clinical implementation [18].
In contrast, studies using conventional MRI features for HER2 classification remain limited [20,33]. Zhou et al. developed ML models based on BI-RADS MRI features and reported an AUC of 0.79 (KNN model) for distinguishing HER2-zero from non-zero tumors, and an AUC of 0.69 (DT model) for differentiating HER2-low from HER2-positive tumors [20]. Their overall performance was slightly better than ours, which may be related to differences in study design and the feature selection strategies. Despite these differences, their findings, together with ours, underscore the potential of ML methods based on conventional MRI features for HER2 classification, as these methods are capable of capturing complex, non-linear relationships between imaging features and HER2 status.

4.2. Interpretability and Feature Relevance
Previous studies have shown that in HER2-positive tumors, abnormal ALNs and larger tumor size are commonly associated with more aggressive biological features, while peritumoral edema indicates vascular invasion around the tumor, reflecting the invasive characteristics of HER2-positive tumors [9,13,20]. These findings are consistent with those of our study. However, when distinguishing HER2-low from HER2-zero tumors, we did not observe significant differences in feature distribution, which is consistent with previous findings [14]. Due to the lack of distribution differences, traditional univariable analysis failed to effectively select features, making it challenging to construct an effective classification model. In contrast, our study utilized SHAP to visualize feature contributions during the ML model construction process, aiding feature selection. This approach demonstrated good performance, highlighting the potential of SHAP-based interpretability analysis for this specific task. Hu et al. applied a similar SHAP-assisted method for feature selection and demonstrated that it achieved consistent performance across internal and external validations [24]. By comparing the performance of different feature subsets during model construction, this approach retained the most meaningful features, reduced model complexity, and maintained good classification accuracy.
In the SHAP analysis, tumor size and abnormal ALNs were identified as the most important features for distinguishing HER2-positive from HER2-negative tumors. These findings are consistent with our feature distribution analysis and align with the results from traditional univariable analysis, further confirming the importance of these conventional MRI features in HER2 classification. When distinguishing HER2-low from HER2-zero tumors, tumor size remained the most important feature in SHAP global analysis, although previous studies have not reached a consensus on the distribution differences in tumor size between HER2-low and HER2-zero tumors [34,35,36]. Interestingly, peritumoral edema, as a binary variable, played a significant role in both tasks. In Task 1, the presence of peritumoral edema was strongly associated with HER2-positive tumors, while in Task 2, it showed a stronger association with HER2-low tumors. This finding aligns with those of Zhou et al., who observed that the incidence of peritumoral edema increased with higher HER2 scores [20]. Although no significant differences in the distribution of peritumoral edema were observed between HER2-low and HER2-zero tumors, SHAP analysis uncovered a complex relationship between peritumoral edema and HER2-negative tumors. By evaluating the marginal effects of features, SHAP analysis provided a more nuanced assessment of feature contributions, facilitating model optimization and deeper insights into tumor biology.

4.3. Exploratory Survival Analysis
Survival analysis in the internal test set suggested that the task-specific model outputs derived from conventional MRI features may carry prognostic information. In Task 1, patients predicted as HER2-positive had shorter DFS, and this association remained significant after adjustment for clinicopathologic factors, indicating that the imaging-based HER2 phenotype captured by the model may be biologically relevant and correlates with the aggressive behavior of HER2-positive tumors. Within the HER2-negative population (Task 2), predicted HER2-low patients were associated with longer DFS on Kaplan–Meier analysis, which is consistent with previous reports suggesting that HER2-low tumors may have a more favorable prognosis than HER2-zero tumors [11,37]. However, this association did not remain statistically significant after multivariable Cox adjustment, possibly reflecting the limited number of events and the complex interplay between HER2 expression and other biological and treatment-related factors. Therefore, these survival findings should be interpreted with caution and require validation in future studies with larger sample sizes.

4.4. Limitations
This study has several limitations. First, because of its retrospective design, sample selection bias cannot be excluded, although we compared the populations included in the different datasets. Future multi-center, prospective studies are therefore required to validate these results. Second, the conventional MRI features were evaluated by radiologists trained at the same center, so the applicability of the model when features are assessed by radiologists from other centers or with different levels of experience remains uncertain. Further validation involving radiologists from diverse clinical settings and with varying years of experience is needed to assess the model’s generalizability.

5. Conclusions

The ML model based on conventional MRI features, assisted by SHAP analysis, can help distinguish different levels of HER2 expression and may contribute to prognostic stratification, offering valuable insights for personalized patient management.

Source: PubMed Central (JATS). Licensing follows the original publisher's policy; please cite the original article when quoting.
