Stratified prediction of HER2 status in breast cancer by integrating intratumoral and peritumoral radiomics from DCE-MRI.
Citation: Gao Y, Gong J, et al. (2026). Stratified prediction of HER2 status in breast cancer by integrating intratumoral and peritumoral radiomics from DCE-MRI. BMC Cancer, 26(1). https://doi.org/10.1186/s12885-026-15688-x
PMID: 41688957
Abstract
[OBJECTIVE] This study aimed to investigate the value of Dynamic Contrast-Enhanced Magnetic Resonance Imaging (DCE-MRI) radiomics in predicting HER2 expression status in breast cancer and to develop stratified prediction models.
[METHODS] We enrolled 210 breast cancer patients, categorized into HER2-negative and HER2-positive groups, as well as HER2-low and HER2-zero expression groups based on their HER2 status. Radiomic features were extracted from the intratumoral region and various peritumoral regions (1 mm, 3 mm, 5 mm) on DCE-MRI. Predictive models were constructed for distinguishing HER2-negative from HER2-positive cases and for differentiating HER2-low from HER2-zero expression. Feature selection was performed using LASSO regression, followed by classification via logistic regression.
[RESULTS] The model based on the intratumoral region alone demonstrated robust performance in classifying HER2-negative and HER2-positive status. For distinguishing HER2-low from HER2-zero expression, the combined model incorporating both intratumoral and 3 mm peritumoral features showed superior performance. We introduced the Generalization Decay Index (GDI) as a novel metric for evaluating model generalizability. Analysis using GDI revealed that relying solely on the AUC could be misleading for stability assessment. The model based on the 5 mm peritumoral region exhibited a high GDI, suggesting potential overfitting, whereas the intratumoral model achieved the lowest GDI, indicating the highest stability. The combined intratumoral and 3 mm peritumoral model not only showed good diagnostic efficacy but was also validated by GDI as the most stable and generalizable model among all configurations for the HER2-low vs. HER2-zero classification task.
[CONCLUSION] DCE-MRI-based radiomics can effectively predict HER2 expression status and facilitate the construction of stratified prediction models in breast cancer. The intratumoral region-based model demonstrates stability in classifying HER2-negative and HER2-positive status, while the combined intratumoral and 3 mm peritumoral model offers advantages for distinguishing HER2-low from HER2-zero expression. The proposed GDI serves as a new indicator for assessing model generalizability, complementing the non-invasive evaluation of HER2 status with an additional view of model performance.
Introduction
Breast cancer is a highly heterogeneous malignancy [1]. The status of the human epidermal growth factor receptor 2 (HER2) serves as a critical molecular marker for determining treatment strategies and predicting prognosis [2]. HER2 promotes cancer cell proliferation, invasion, and survival by triggering the activation of various signaling pathways through homodimerization or heterodimerization [2, 3]. Consequently, the development and application of targeted therapies against HER2 have significantly improved treatment outcomes and prognosis for patients with HER2-positive breast cancer [4]. These agents primarily include monoclonal antibodies, antibody-drug conjugates (ADCs), and small-molecule tyrosine kinase inhibitors.
The 2018 guidelines from the American Society of Clinical Oncology/College of American Pathologists (ASCO/CAP) further subclassified HER2-negative breast cancer into HER2-zero and HER2-low subtypes [5]. Among these subtypes, HER2-low has emerged as a key therapeutic target for novel antibody-drug conjugates (ADCs) [6, 7]. In 2020, Tarantino et al. [8] performed a systematic review of the pathological features and current clinical management of HER2-low breast cancer, which defined this subtype as a distinct disease entity with clear clinical relevance and provided critical evidence for the development of targeted treatment strategies. This clinical relevance was further validated by Modi et al. in 2022, who demonstrated that a novel trastuzumab-based ADC significantly improved survival outcomes in a cohort of 557 patients with advanced HER2-low breast cancer [9]. These findings highlight the imperative of distinguishing this independent disease entity from conventional HER2-negative breast cancer [10].
Although immunohistochemistry (IHC) and fluorescence in situ hybridization (FISH) performed on biopsy specimens remain the clinical standard for assessing HER2 status, these methods are inherently invasive, labor-intensive, and expensive. An additional challenge is the risk of under-detecting HER2-low expression caused by the limited tissue sample from a core needle biopsy and underlying tumor heterogeneity. These significant drawbacks underscore the critical demand for a non-invasive and reliable approach to evaluate HER2 expression.
Radiomics bridges medical imaging and precision medicine [11] by extracting high-throughput quantitative features that decode tumor phenotypes [12]. It has shown substantial potential for predicting the molecular subtypes of breast cancer. Deep learning has advanced breast imaging analysis considerably over the past decade [13], yet traditional radiomics models retain distinct advantages in feature interpretability and clinical translational feasibility. Early MRI-based radiomics models for evaluating HER2 status focused primarily on HER2-overexpressing breast cancer, whereas recent research has shifted to the HER2-low subtype [14–16]. Despite this shift, the predictive value of the peritumoral region remains largely underexplored. This gap matters because the peritumoral stroma is a critical component of the tumor microenvironment [17, 18], and specific signaling pathways, including those related to tumor necrosis factor, are closely correlated with peritumoral features [19].
To this end, the present study seeks to construct and evaluate a stratified prediction model for HER2 expression status by leveraging intratumoral and multi-shell peritumoral (1 mm, 3 mm, 5 mm) radiomic features derived from DCE-MRI.
This retrospective study was approved by the Ethics Committee of the First Affiliated Hospital of Guangxi Medical University (Approval No.: 2025-E0420). In line with the ethical review criteria of the committee, the requirement for obtaining written informed consent from study participants was formally waived. All procedures involving human participants and human-derived data in this study were conducted in strict adherence to the ethical principles set forth in the World Medical Association Declaration of Helsinki (adopted June 1964, amended October 2024; https://www.wma.net/policies-post/wma-declaration-of-helsinki/).
Study sample
This retrospective study analyzed data from female patients with breast cancer at two clinical centers: Center A (The First Affiliated Hospital of Guangxi Medical University) and Center B (The Second Affiliated Hospital of Guangxi Medical University). The collected clinical information included age, height, weight, body mass index (BMI), menopausal status, clinical T stage, N stage, and molecular subtype.
The inclusion criteria were as follows:
- preoperative breast DCE-MRI performed within one month before surgery
- postoperative pathological confirmation of breast cancer
- availability of complete clinical records
The exclusion criteria comprised:
- incomplete or inadequate imaging data
- receipt of neoadjuvant endocrine therapy, radiotherapy, chemotherapy, or other anticancer treatments prior to MRI and surgery
- absence of HER2 status as determined by IHC or FISH, or lack of further FISH testing when the IHC result was 2+
Based on these criteria, a total of 210 eligible patients were included in the final cohort. A flowchart illustrating the patient selection process is provided in Fig. 1.
HER2 score testing
HER2 expression was evaluated by specialized breast pathologists in accordance with the 2018 ASCO/CAP guideline [5]. HER2 status was categorized as follows: HER2-zero was defined as IHC 0; HER2-low was defined as IHC 1+ or IHC 2+ with negative FISH results; and HER2-positive was defined as IHC 3+ or IHC 2+ with positive FISH results. A detailed flowchart of the HER2 testing algorithm is provided in Fig. 2.
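The categorization rule above can be written as a small lookup. The sketch below is illustrative only: the function names and the string encoding of IHC scores are assumptions, not part of the study's pipeline.

```python
from typing import Optional


def classify_her2(ihc: str, fish_positive: Optional[bool] = None) -> str:
    """Map an IHC score (plus a FISH result for IHC 2+) to a HER2 category.

    Hypothetical helper: IHC scores are encoded as "0", "1+", "2+", "3+".
    """
    if ihc == "0":
        return "HER2-zero"
    if ihc == "1+":
        return "HER2-low"
    if ihc == "2+":
        if fish_positive is None:
            # The study excluded IHC 2+ cases without further FISH testing.
            raise ValueError("IHC 2+ requires a FISH result")
        return "HER2-positive" if fish_positive else "HER2-low"
    if ihc == "3+":
        return "HER2-positive"
    raise ValueError(f"unknown IHC score: {ihc!r}")


def is_her2_negative(category: str) -> bool:
    """HER2-zero and HER2-low together form the HER2-negative group."""
    return category in {"HER2-zero", "HER2-low"}
```

The two-level structure mirrors the stratified design: the first task separates HER2-positive from HER2-negative, and the second separates HER2-low from HER2-zero within the negative group.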
MRI procedure and image evaluation
All MRI examinations were performed on a 3.0T scanner using an 8-channel dedicated breast coil. The imaging protocol included non-contrast and dynamic contrast-enhanced sequences, with the scanning coverage extending from the axilla to the inferior breast margin. The key sequence parameters were as follows:
T1-weighted imaging (T1WI):
TR 3.8–4.62 ms, TE 1.57–1.66 ms, slice thickness 1–1.5 mm, slice gap 1–1.5 mm, FOV 340 × 340–379 × 379 mm².
T2-weighted imaging with fat suppression (T2WI-FS):
TR 2500–7129 ms, TE 61–82.3 ms, slice thickness 4–5.5 mm, slice gap 4–7 mm, FOV 340 × 340–400 × 400 mm².
DCE-MRI:
A 3D dynamic sequence was used with the following parameters: TR 3.8–4.62 ms, TE 1.57–1.66 ms, slice thickness 1–1.5 mm, slice gap 1–1.5 mm, FOV 340 × 340–379 × 379 mm².
Radiomic feature extraction
Axial images from the second phase of the DCE breast MRI series were loaded into the open-source software ITK-SNAP (version 3.8, www.itk-snap.org). A radiologist with five years of experience in breast imaging manually delineated the region of interest (ROI) slice by slice along the enhanced tumor border, encompassing the entire tumor volume. In cases of multiple unilateral tumors, the largest lesion was selected for segmentation. For non-mass enhancement (NME) lesions, the enhancement margins are inherently ambiguous, which may impede standardized delineation of peritumoral regions; in this study, ROIs for such lesions were delineated manually in strict accordance with the visual margins of the enhancing area.
A three-dimensional peritumoral region was automatically expanded outward from the tumor contour at distances of 1 mm, 3 mm, and 5 mm using in-house Python scripts. If the expanded region incorporated extramammary air or pectoralis major muscle, these areas were manually erased in ITK-SNAP slice by slice to exclude non-target tissues and to minimize the confounding effects of morphological heterogeneity on radiomic feature extraction.
To assess inter-observer agreement, a second radiologist with a decade of experience in breast MRI independently segmented the tumors in 40 randomly selected cases. Features with an intraclass correlation coefficient (ICC) ≥ 0.75 between the ROIs delineated by the two radiologists were considered to have good reproducibility and were retained for subsequent analysis.
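The peritumoral expansion step can be sketched as follows. This is not the authors' in-house script, which is not published in the text. It is a minimal, dependency-free illustration that assumes the tumor mask is given as a set of (z, y, x) voxel indices, that distances are measured between voxel centers, and that voxel spacing is known in millimetres. The function name and arguments are hypothetical.

```python
import math
from itertools import product


def peritumoral_shell(tumor, spacing, margin_mm, shape):
    """Return voxels within margin_mm of any tumor voxel, excluding the tumor.

    tumor     : set of (z, y, x) index tuples (assumed mask representation)
    spacing   : voxel size in mm along (z, y, x)
    margin_mm : expansion distance, e.g. 1, 3, or 5
    shape     : image dimensions, used to discard out-of-bounds voxels
    """
    # Largest index offset worth checking along each axis. If the margin is
    # smaller than the voxel spacing on an axis, no growth occurs on that axis.
    reach = [int(math.floor(margin_mm / s)) for s in spacing]
    # Keep only offsets whose physical (Euclidean) length is within the margin.
    offsets = [
        off
        for off in product(*(range(-r, r + 1) for r in reach))
        if math.sqrt(sum((o * s) ** 2 for o, s in zip(off, spacing))) <= margin_mm
    ]
    shell = set()
    for voxel in tumor:
        for off in offsets:
            cand = tuple(v + o for v, o in zip(voxel, off))
            in_bounds = all(0 <= c < n for c, n in zip(cand, shape))
            if in_bounds and cand not in tumor:
                shell.add(cand)
    return shell


# Toy example: a single-voxel "tumor" in a 3 x 3 x 3 volume with 1 mm voxels.
toy_tumor = {(1, 1, 1)}
shell_1mm = peritumoral_shell(toy_tumor, (1.0, 1.0, 1.0), 1, (3, 3, 3))
```

For real volumes an array-based distance transform would be far faster; the set-based version is used here only to keep the logic visible. Removal of air and pectoral muscle was a manual step in the study and is not modelled.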
Feature selection and model construction
Radiomic features were extracted using the open-source Python package PyRadiomics (version 3.0; https://pyradiomics.readthedocs.io/). A total of 1,749 features were derived from each ROI, including the intratumoral region and the 1 mm, 3 mm, and 5 mm peritumoral regions. These features fell into the following categories: first-order statistics, shape-based features, texture features (from gray-level co-occurrence, run-length, size-zone, and distance-size matrices), and fractal features. Feature extraction was performed on original images, Laplacian of Gaussian (LoG) filtered images, and wavelet-transformed images [20, 21]. The overall radiomics workflow is illustrated in Fig. 3.
The least absolute shrinkage and selection operator (LASSO) regression was applied to select the most predictive features. Subsequently, logistic regression classifiers were built to develop prediction models using four different feature sets: intratumoral features alone, and intratumoral features combined with peritumoral features from the 1 mm, 3 mm, or 5 mm regions, for distinguishing between HER2-negative and HER2-positive groups.
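To make the selection step concrete, the sketch below implements LASSO by cyclic coordinate descent in plain Python. It is an illustration of the technique, not the authors' implementation: the text does not state which software or penalty value was used, and the function names, the penalty `lam`, and the toy data are all assumptions. Features are assumed to be standardized and the target centered.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values inside [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/2n) * ||y - Xw||^2 + lam * ||w||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw for the initial w = 0
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            norm = sum(c * c for c in col) / n
            if norm == 0.0:
                continue  # constant-zero feature carries no information
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, residual)) / n
            new_wj = soft_threshold(rho, lam) / norm
            delta = new_wj - w[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                w[j] = new_wj
    return w


# Toy data (hypothetical): feature 0 drives the target, feature 1 barely does.
X_toy = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y_toy = [2.1, -1.9, 1.9, -2.1]
weights = lasso_coordinate_descent(X_toy, y_toy, lam=0.5)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
```

Features with a non-zero weight are the ones passed on to the logistic regression classifier. In practice the penalty strength is usually chosen by cross-validation rather than fixed by hand.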
Statistical analysis
The predictive performance of each model was evaluated in both the training and test sets using the mean values of the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, and specificity. The DeLong test was employed to compare AUC differences between models within the same subgroup. Calibration curves were plotted to assess the agreement between predicted probabilities and observed outcomes, and decision curve analysis (DCA) was performed to evaluate the clinical utility of the models. A two-sided P-value < 0.05 was considered statistically significant.
To quantitatively assess model generalizability, we introduced the GDI, defined as:
$$\mathrm{GDI} = \frac{\mathrm{AUC}_{\mathrm{train}} - \mathrm{AUC}_{\mathrm{test}}}{\mathrm{AUC}_{\mathrm{train}}} \times 100\%$$
This metric reflects the relative decline in model performance from the training set to the test set. Model generalization stability was interpreted as follows: GDI ≤ 15% indicated a “safe zone,” 15% < GDI < 20% a “warning zone,” and GDI ≥ 20% a “high-risk zone.” The GDI is designed to complement conventional performance metrics (e.g., AUC, accuracy) by quantifying performance degradation from the training set to the test set, which makes it particularly useful for identifying potential overfitting. It is not intended to replace metrics such as AUC; rather, it provides an additional dimension for evaluating model stability.
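A minimal sketch of the index and its zones follows. It assumes the GDI is the relative AUC drop from training to test, expressed in percent. That reading is consistent with the IntraPeri3mm values reported in the Results (training AUC 0.889, test AUC 0.796, GDI 10%). The function names are hypothetical.

```python
def gdi(auc_train: float, auc_test: float) -> float:
    """Relative AUC decline from training to test set, in percent (assumed form)."""
    if auc_train <= 0:
        raise ValueError("training AUC must be positive")
    return (auc_train - auc_test) / auc_train * 100.0


def gdi_zone(value: float) -> str:
    """Map a GDI percentage to the zones defined in the text."""
    if value <= 15.0:
        return "safe"
    if value < 20.0:
        return "warning"
    return "high-risk"


# Worked example using the AUCs reported for the IntraPeri3mm model.
example = gdi(0.889, 0.796)  # about 10.5
```

Because the index is a ratio of two AUCs from finite samples, it inherits their sampling uncertainty, which is worth keeping in mind when a value sits near a zone boundary.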
Results
Characteristics of the study sample
This study adopted two cohort partitioning strategies. For classifying HER2-positive versus HER2-negative cases, a center-stratified split was used, where data from Center A were split into training and internal test sets, and data from Center B were utilized as the external test set—thereby enabling assessment of the model’s cross-center generalization ability. For differentiating HER2-low from HER2-zero subtypes (both subgroups of the HER2-negative category with relatively limited sample sizes), a random 7:3 split was employed to maximize the use of available data for model training. In the first analysis, which focused on distinguishing HER2-positive from HER2-negative status, 101 cases from Center A were assigned to the training set, 46 cases from Center A served as the internal test set, and 63 cases from Center B constituted the external test set. In the second analysis, which classified HER2-zero and HER2-low expression among HER2-negative patients, 139 eligible cases from both centers were randomly split into a training set (n = 97) and a test set (n = 42) at a ratio of 7:3. All enrolled patients were female, with an average age of 51.73 ± 11.38 years (range: 28–86). The baseline characteristics of the cohorts are summarized in Tables 1 and 2.
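The random 7:3 partition for the HER2-low vs. HER2-zero task can be illustrated as below. The seed, the use of integer case identifiers, and the absence of class stratification are assumptions; the text specifies only the ratio and the resulting set sizes.

```python
import random


def random_split(case_ids, train_fraction=0.7, seed=0):
    """Shuffle case identifiers and split them into training and test lists."""
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)  # seeded for reproducibility (assumed)
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


# 139 HER2-negative cases, as in the second analysis.
train_ids, test_ids = random_split(range(139))
```

With 139 cases, rounding 0.7 × 139 = 97.3 gives 97 training and 42 test cases, matching the reported set sizes.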
Analysis of radiomic features
HER2-Negative vs. HER2-Positive classification
In the analysis differentiating HER2-negative from HER2-positive tumors, the number of stable features initially identified was 105 for the intratumoral region, 196 for the intratumoral + 1 mm peritumoral region, 190 for the intratumoral + 3 mm peritumoral region, and 212 for the intratumoral + 5 mm peritumoral region. Following further selection via the LASSO regression algorithm, the number of features retained in the four respective models was 4, 13, 9, and 19 (Fig. 4).
HER2-Zero vs. HER2-Low expression classification
For the task of discriminating HER2-zero from HER2-low expression among HER2-negative cases, the initial numbers of stable features were 20 (intratumoral), 52 (intratumoral + 1 mm), 53 (intratumoral + 3 mm), and 56 (intratumoral + 5 mm). After LASSO regression, the final numbers of selected features in these four models were 9, 13, 16, and 20, respectively (Fig. 5).
Efficacy evaluation of stratified models
HER2-Negative vs. HER2-Positive classification
In the training set, the models based on the intratumoral region, intratumoral + 1 mm peritumoral region, intratumoral + 3 mm peritumoral region, and intratumoral + 5 mm peritumoral region achieved AUCs of 0.818 (95% CI: 0.736–0.899), 0.918 (95% CI: 0.865–0.971), 0.887 (95% CI: 0.824–0.950), and 0.932 (95% CI: 0.884–0.978), respectively. The DeLong test indicated no statistically significant differences in AUC among the models in either the internal or external test sets (P = 0.111–0.978 for all comparisons; Figs. 6 and 7).
The corresponding GDI values for the four models were 15%, 29%, 23%, and 25%, respectively (Fig. 8). According to the predefined GDI zones, the intratumoral model (GDI = 15%) lay at the upper boundary of the safe zone (GDI ≤ 15%), indicating better generalization stability than the other three models, all of which fell into the high-risk zone (GDI ≥ 20%).
HER2-Low vs. HER2-Zero expression classification
In the training set, the models incorporating both intratumoral and peritumoral features (i.e., IntraPeri1mm, IntraPeri3mm, and IntraPeri5mm) achieved higher AUC values—0.864 (95% CI: 0.794–0.934), 0.889 (95% CI: 0.825–0.954), and 0.932 (95% CI: 0.887–0.978), respectively—than the intratumoral-only model. In the test set, the IntraPeri3mm model yielded the highest AUC of 0.796 (95% CI: 0.656–0.937). The DeLong test confirmed that the IntraPeri3mm model performed significantly better than the intratumoral model (P = 0.042), while no significant differences were observed among the other models (P = 0.057–0.559) (Figs. 9 and 10). The IntraPeri3mm model also exhibited an accuracy of 0.762, a sensitivity of 0.750, and a specificity of 0.778 in the test set.
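The reported test-set accuracy, sensitivity, and specificity are tied together by the confusion matrix. The sketch below shows the arithmetic with one hypothetical set of counts for the 42 test cases that reproduces the reported figures. The counts themselves are not given in the text and are an assumption for illustration, as is the function name.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }


# Hypothetical counts (not reported in the text): 24 cases of the positive
# class and 18 of the negative class, 42 in total.
metrics = classification_metrics(tp=18, fn=6, tn=14, fp=4)
rounded = {name: round(value, 3) for name, value in metrics.items()}
```

With these counts, accuracy is 32/42, sensitivity is 18/24, and specificity is 14/18, which round to 0.762, 0.750, and 0.778.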
Decision curve analysis revealed that the IntraPeri3mm model provided superior net benefit within a risk threshold range of 45% to 85% for predicting HER2-low and HER2-zero status. Furthermore, the calibration curve indicated good agreement between the predicted probabilities and actual outcomes for this model (Fig. 11).
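For readers unfamiliar with decision curve analysis, the quantity plotted on the vertical axis is the net benefit at a given risk threshold, conventionally defined as NB = TP/n − (FP/n) × pt/(1 − pt). The sketch below implements this standard definition together with the usual "treat all" reference strategy. The function names and all counts are hypothetical and are not taken from this study.

```python
def net_benefit(tp: int, fp: int, n: int, threshold: float) -> float:
    """Net benefit of a model at a given risk threshold (0 < threshold < 1)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    odds = threshold / (1.0 - threshold)  # weight given to a false positive
    return tp / n - (fp / n) * odds


def treat_all_net_benefit(prevalence: float, threshold: float) -> float:
    """Net benefit of classifying every patient as positive."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    odds = threshold / (1.0 - threshold)
    return prevalence - (1.0 - prevalence) * odds


# Hypothetical example: 100 patients, 40 of them positive; at a 50% risk
# threshold the model flags 30 true positives and 10 false positives.
model_nb = net_benefit(tp=30, fp=10, n=100, threshold=0.5)
all_nb = treat_all_net_benefit(prevalence=0.4, threshold=0.5)
print(f"model: {model_nb:.2f}, treat all: {all_nb:.2f}")
```

A model is considered useful at a threshold when its net benefit exceeds both the "treat all" value and zero (the "treat none" strategy).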
The GDI values for the four models (intratumoral, IntraPeri1mm, IntraPeri3mm, IntraPeri5mm) were 24%, 23%, 10%, and 24%, respectively (Fig. 12). According to the predefined GDI criteria, the IntraPeri3mm model (GDI = 10%) fell within the safety zone (GDI ≤ 15%), demonstrating better generalization stability than the other three models.
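The reported GDI values can be illustrated with a short calculation. The sketch below assumes that GDI is the relative drop in AUC from the training set to the test set, expressed as a percentage. This assumed definition reproduces the reported value for the IntraPeri3mm model (training AUC 0.889, test AUC 0.796, GDI of about 10%), but the formula should be checked against the Methods. The label "intermediate" for values between 15% and 20% is hypothetical.

```python
def gdi(auc_train: float, auc_test: float) -> float:
    """Relative AUC drop from training to test, in percent (assumed definition)."""
    if not 0.0 < auc_train <= 1.0 or not 0.0 <= auc_test <= 1.0:
        raise ValueError("AUC values must lie between 0 and 1")
    return (auc_train - auc_test) / auc_train * 100.0


def gdi_zone(value: float) -> str:
    """Map a GDI value (percent) to the zones named in the text."""
    if value <= 15.0:
        return "safety"
    if value >= 20.0:
        return "high-risk"
    return "intermediate"  # hypothetical label; not named in the text


# IntraPeri3mm model, HER2-low vs. HER2-zero task (AUCs as reported).
intraperi3mm_gdi = gdi(auc_train=0.889, auc_test=0.796)
print(f"GDI = {intraperi3mm_gdi:.1f}% ({gdi_zone(intraperi3mm_gdi)} zone)")
```

Because the index is a ratio, two models with the same absolute AUC drop receive different GDI values if their training AUCs differ.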
Discussion
Based on dual-center retrospective data, this study employed a two-stage stratified modeling strategy integrating intratumoral and peritumoral radiomic features from DCE-MRI to systematically evaluate their diagnostic performance in differentiating HER2 expression status. Zhou et al. [22] retrospectively analyzed multimodal MRI data from 992 breast cancer patients and developed a support vector machine (SVM) model combining intratumoral and peritumoral regions (2 mm, 4 mm, 6 mm, 8 mm) to predict HER2 status, concluding that the intratumoral plus 4 mm peritumoral model based on DCE-MRI offered superior predictive value. Our findings in the first part of the study did not align with theirs.
In our training set, the model incorporating the 5 mm peritumoral region exhibited excellent AUC performance, substantially outperforming the intratumoral-only model. However, this statistically significant advantage was entirely lost in both internal and external validation. Instead, the intratumoral model, which performed modestly in the training set, demonstrated the strongest stability. The DeLong test further indicated that the combined models were not significantly superior to the intratumoral model, and no statistically significant differences were observed among any of the models. These results suggest that the AUC value alone is insufficient for evaluating model generalizability. In the presence of data distribution shifts between training and validation sets, statistically significant findings from the training phase may be reversed, potentially leading to erroneous clinical decisions. Even when all models show comparable performance, differences in stability can critically affect their clinical applicability.
To address this limitation, we introduce the Generalization Decay Index (GDI) as a novel metric for assessing model stability. Traverso et al. [23] highlighted that the reproducibility and stability of radiomic features constitute key bottlenecks hindering the clinical translation of radiomic models. The GDI directly addresses a gap left by conventional metrics, namely the quantitative assessment of model generalization ability, and aligns closely with the principle of “multi-dimensional validation” emphasized by Traverso et al. Kelly et al. [24] noted that a core barrier to the clinical translation of artificial intelligence models lies in their generalization stability across heterogeneous patient populations, as overreliance on high AUC values in the training set can result in erroneous clinical decision-making. GDI not only offers an intuitive characterization of a model’s generalization ability but also circumvents non-comparability issues stemming from absolute differences in AUC values, while exhibiting high sensitivity to model overfitting. Vickers et al. [25] noted that conventional AUC fails to quantify the actual net clinical benefit of predictive models, which necessitates Decision Curve Analysis (DCA) to assess clinical utility. The development of GDI follows a similar rationale of methodological expansion. Incorporating GDI and the DeLong test into a multidimensional evaluation framework thus helps to identify models at risk of overfitting. In the present study, the intratumoral plus 5 mm peritumoral model yielded a GDI of 25%, indicative of a high overfitting risk, whereas the intratumoral model, with a GDI of 15%, was the only model to pass stability validation. Thus, for the differentiation of HER2-positive and HER2-negative status, the intratumoral model is deemed to have superior predictive value.
This phase of the study serves as an important caution: the apparent performance gain obtained from high-dimensional feature sets in the training set may mask a substantial loss of generalizability. Model evaluation should therefore not rely solely on the AUC, but should combine multiple validation methods, such as the DeLong test and decision curve analysis, with novel tools like the GDI, to identify more robust and generalizable prediction models.
In the second part of this study, focusing on the fine classification of HER2-low and HER2-zero expression, the model incorporating both intratumoral and peritumoral 3 mm regions demonstrated superior predictive performance compared to other configurations, along with good diagnostic efficacy. This model was further validated by the GDI as the most stable among the four groups, exhibiting excellent generalization ability.
These findings are consistent with previous research. Yan et al. [26] conducted a multicenter study that integrated ultrasound radiomics, clinicopathological features, and explainable artificial intelligence techniques to predict changes in HER2 status after neoadjuvant therapy. Their findings underscored the importance of integrating multi-regional radiomic features and performing multicenter validation to improve model robustness—an observation that aligns with the rationale of hierarchical modeling coupled with GDI-based evaluation adopted in the present study.
Bian et al. [27] developed models to distinguish HER2-zero from HER2-low expression by combining T1-weighted contrast-enhanced and ADC sequences with a 4 mm peritumoral region, and reported that the best of their combined radiomics models showed good calibration, a result in line with the present study.
The optimal model selected in our study suggests that the 3 mm peritumoral region may correspond to the stromal reaction zone at the tumor invasion front. Biological processes such as abnormal vascular proliferation, immune cell infiltration, and collagen remodeling in this region may be captured quantitatively through enhancement heterogeneity on DCE-MRI. These imaging characteristics may reflect distinct biological traits of HER2-low tumors and could be linked to mechanisms of trastuzumab resistance.
From a clinical perspective, radiomics not only aids in precise HER2 stratification among HER2-negative patients but may also offer decision-making support for patients who do not undergo breast surgery or whose core needle biopsy fails to detect HER2-low expression. This could help identify additional candidates who might benefit from antibody-drug conjugate (ADC) therapies.
We hypothesize that the subtle clinicopathological and molecular heterogeneity between HER2-low and HER2-zero subtypes may be quantified using high-throughput radiomic features. However, this premise requires further validation in future studies. Zwanenburg et al. [28] launched the Imaging Biomarker Standardization Initiative (IBSI), which provides a unified framework for the extraction and standardization of radiomic features. Future studies may optimize radiomic feature extraction pipelines by adhering to IBSI standards, thereby improving the comparability and generalizability of study results. Additionally, Shamout et al. [29] developed an artificial intelligence (AI) system that dynamically tracks changes in patient clinical status and shows considerable potential for clinical translation. This suggests that the stratified model established in the present study could be extended into a longitudinal assessment tool for predicting treatment response and clinical prognosis. Furthermore, Rivera et al. [30] proposed the SPIRIT-AI extension, which offers a standardized framework for clinical trial protocols for interventions involving AI. Future prospective studies adhering to these guidelines may be undertaken to further validate the clinical utility and efficacy of the model developed in the present study.
Notably, in clinical deployment, a sequential classification strategy (first distinguishing HER2-positive from HER2-negative cases, and then differentiating HER2-low from HER2-zero subtypes within the HER2-negative subgroup) means that misclassifications by the first-stage model propagate directly into the input of the second-stage model, compromising overall classification accuracy. To limit such error propagation, three strategies are proposed: generating probabilistic outputs instead of hard classification results from the first-stage model to inform comprehensive clinical decision-making; developing an integrated model that combines features across both stages to enable joint prediction; and retaining pathological testing as a supplementary reference standard for clinically critical cases. Future studies should build on this work with prospective clinical validation to assess the robustness of sequential models in real-world clinical workflows.
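The first of these strategies, passing probabilities rather than hard labels between stages, can be sketched as follows. The sketch assumes that the second-stage output can be read as a probability conditional on HER2-negative status, so that the three class probabilities follow from the chain rule. The function name and the input values are hypothetical, and this is not the procedure used in the present study.

```python
def joint_her2_probabilities(p_positive: float, p_low_given_negative: float) -> dict:
    """Combine two stage-wise probabilities into three class probabilities.

    p_positive: stage-one probability that the tumor is HER2-positive.
    p_low_given_negative: stage-two probability of HER2-low, assumed to be
        conditional on the tumor being HER2-negative.
    """
    for p in (p_positive, p_low_given_negative):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")
    p_negative = 1.0 - p_positive
    return {
        "HER2-positive": p_positive,
        "HER2-low": p_negative * p_low_given_negative,
        "HER2-zero": p_negative * (1.0 - p_low_given_negative),
    }


# Hypothetical patient: stage one gives 0.30 for HER2-positive,
# stage two gives 0.60 for HER2-low among HER2-negative tumors.
probs = joint_her2_probabilities(p_positive=0.30, p_low_given_negative=0.60)
for label, p in probs.items():
    print(f"{label}: {p:.2f}")
```

With soft outputs, an uncertain first-stage prediction lowers the confidence of both downstream classes instead of silently routing the case to the wrong branch.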
Furthermore, deep learning-driven automated segmentation techniques [31] have been widely adopted in biomedical image analysis. Future studies may incorporate such automated segmentation approaches into the workflow of the present study to further reduce inter-observer variability associated with manual segmentation and improve the reproducibility of the study methodology. Additionally, Skrede et al. [32] demonstrated the considerable potential of deep learning models for predicting tumor prognosis, indicating that the stratified modeling framework established in the present study could be further expanded through the incorporation of deep learning features to develop a more precise multimodal predictive model.
Limitations
This study has several limitations. First, the sample size is limited, particularly for the HER2-low and HER2-zero subgroups. This may limit the statistical power of the analyses, elevate the risk of model overfitting, and restrict the generalizability of the models. The findings are thus considered a preliminary validation of the proposed methodological framework. Second, because data on treatment response and survival outcomes were not included, the prognostic relevance of the identified radiomic features could not be evaluated. Third, the proposed Generalization Decay Index (GDI) is inherently dependent on the composition of the validation set, which may lead to underestimation of a model’s actual generalization ability. Finally, while manual segmentation of intratumoral and peritumoral regions was deemed necessary to ensure analytical accuracy at this exploratory stage, this approach introduces inter-operator variability, which restricts the reproducibility and scalability of the proposed methodology.
Conclusion
This study employed a two-stage stratified modeling approach to systematically evaluate the value of DCE-MRI radiomics in the refined classification of HER2 expression status in breast cancer. The results demonstrated that the intratumoral region-based model exhibited good stability in distinguishing HER2-positive from HER2-negative tumors, while the combined intratumoral and 3 mm peritumoral model showed superior performance in differentiating HER2-low from HER2-zero expression. Furthermore, this study highlighted that apparent advantages in model performance during training cannot serve as the sole criterion for model selection, as superiority in the training set may mask generalization risks. A multidimensional evaluation incorporating both predictive performance and stability is essential to minimize potential errors in clinical decision-making. Despite limitations related to sample size and technical challenges, this work provides a non-invasive approach for HER2 status assessment and introduces a novel perspective for radiomics model evaluation. Future research should seek to: 1) conduct large-scale, prospective multicenter validation studies to evaluate the model’s predictive performance in clinical practice and its utility for predicting treatment efficacy; 2) adopt more robust validation strategies (e.g., repeated nested cross-validation) to confirm the model’s generalization ability; and 3) develop and integrate automated, semi-automated, and deep learning-based segmentation tools to improve analytical efficiency and standardization, thereby facilitating clinical translation. These endeavors will ultimately facilitate the optimization of personalized treatment strategies for patients with breast cancer.