
Development and multicenter external validation of an intratumoral and peritumoral ultrasound-based radiomics model for preoperative prediction of HER2 status in IHC 2 + breast cancer.

European Journal of Medical Research, 2025, Vol. 31(1), p. 172


Wang J, Qu N, Liu C, Lin Y, Cui Y, Cao X


Cite this article

APA: Wang J, Qu N, et al. (2025). Development and multicenter external validation of an intratumoral and peritumoral ultrasound-based radiomics model for preoperative prediction of HER2 status in IHC 2 + breast cancer. European Journal of Medical Research, 31(1), 172. https://doi.org/10.1186/s40001-025-03698-7
MLA: Wang J, et al. "Development and multicenter external validation of an intratumoral and peritumoral ultrasound-based radiomics model for preoperative prediction of HER2 status in IHC 2 + breast cancer." European Journal of Medical Research, vol. 31, no. 1, 2025, p. 172.
PMID: 41462365

Abstract

[BACKGROUND] Accurate assessment of human epidermal growth factor receptor 2 (HER2) status can guide eligibility for HER2-targeted therapy in breast cancer. We aimed to develop and externally validate a nomogram that combines ultrasound (US) radiomics features from intratumoral and peritumoral regions with clinical variables to predict HER2 status in patients with IHC 2 + breast cancer.

[METHODS] We retrospectively included 440 IHC 2 + breast cancers with FISH results and randomly split them into a training cohort (n = 308) and an internal testing cohort (n = 132). Two independent cohorts provided external validation (pooled, n = 153; single center, n = 102). Radiomics features were extracted from the intratumoral region (ITR), peritumoral region (PTR) at 1/3/5 mm, and combined intratumoral and peritumoral region (IPTR) on 2D US. The models were trained with mRMR and LASSO-regularized logistic regression. A Rad-score was derived and combined with key clinical variables to build a nomogram. Performance was assessed with the AUC, calibration curves, and DCA.

[RESULTS] The combined model using the IPTR3 Rad-score achieved AUCs of 0.821 (95% CI 0.772-0.869), 0.828 (95% CI 0.756-0.900), 0.774 (95% CI 0.697-0.851), and 0.803 (95% CI 0.699-0.906) in the training, internal testing, external validation 1, and external validation 2 cohorts, respectively. The calibration curves indicated good agreement. DCA showed greater net benefit than the clinical or radiomics model across most thresholds.

[CONCLUSIONS] A nomogram combining US-based intratumoral and peritumoral radiomics features with key clinical variables showed potential utility for noninvasive, preoperative prediction of HER2 status in patients with IHC 2 + breast cancer and may assist in individualized treatment planning.


Background
Breast cancer is the most commonly diagnosed cancer in women worldwide and remains a leading cause of cancer death [1]. Human epidermal growth factor receptor 2 (HER2) is a ligand-independent, transmembrane receptor tyrosine kinase of the ERBB family; its gene amplification or protein overexpression activates downstream oncogenic signaling and promotes tumor progression [2]. Epidemiologic data indicate that approximately 15–20% of breast cancers are HER2-positive, typically characterized by ERBB2 gene amplification and/or HER2 protein overexpression [3]. Randomized trials have shown that adding trastuzumab to chemotherapy reduces recurrence and improves survival in early HER2-positive disease [4]. Because HER2-targeted therapy is reserved for unequivocally HER2-positive patients, accurate assessment of HER2 status is essential for treatment planning. Recent reports highlight the sustained global burden of breast cancer and the growing dependence of treatment selection on biomarker status, reinforcing the need for reliable preoperative HER2 assessment to avoid misclassification that could affect eligibility and timing for targeted therapy [5, 6]. In routine practice, HER2 evaluation relies on immunohistochemistry (IHC) and in situ hybridization (ISH), most often FISH. Under ASCO/CAP guidance, IHC 2 + is equivocal and requires reflex ISH to determine gene amplification; amplified tumors are classified as HER2-positive and non-amplified tumors as HER2-negative [7]. Although IHC is relatively inexpensive and widely available, ISH/FISH is slower, more costly, and requires invasive tissue sampling [8]. Moreover, spatial heterogeneity and biopsy sampling error may yield results that do not fully represent the whole lesion and may influence treatment decisions [9].
Radiomics is an image-based, noninvasive approach that has been investigated to help address these limitations. It extracts quantitative features such as texture and shape from medical images and may reflect imaging correlates of tumor microstructure, functional characteristics, and underlying tumor biology. In breast cancer, radiomics has been increasingly applied to tasks, such as diagnosis, molecular subtyping, and prognostic assessment [10]. Imaging plays a central role in oncologic evaluation by integrating morphological and functional information for diagnosis, staging, and treatment planning. FDG–PET complements anatomic imaging by providing noninvasive metabolic information that supports cancer staging and treatment monitoring [11]. Compared with higher cost modalities, such as MRI or CT, ultrasound (US) is inexpensive, radiation-free, and available in real time, so it is widely used for breast cancer screening and lesion evaluation. Building on routine clinical use, US-based radiomics enables quantitative phenotyping relevant to tumor biology and preoperative risk stratification. However, evidence for US-based radiomics in HER2 prediction remains limited, and further multicenter validation is needed [12]. Most radiomics studies that predict HER2 status have used MRI. Several have suggested that combining features from intratumoral and peritumoral regions is associated with HER2 expression and that adding clinical variables may further improve model performance [13, 14]. Biologically, breast cancer includes not only malignant epithelial cells but also a surrounding microenvironment composed of immune cells, fibroblasts, vessels, and extracellular matrix. Activity within this microenvironment is closely linked to tumor aggressiveness, molecular subtype, and signaling through the HER2 pathway [15]. Prior studies have suggested that peritumoral imaging features may partly reflect the tumor microenvironment. 
In MRI studies, investigators often define an annular peritumoral band around the lesion and extract radiomics features from this region. These peritumoral features have been linked to tumor-infiltrating lymphocytes, HER2-related molecular phenotypes, and response to neoadjuvant therapy [16–18]. US-based studies often contour a 3–5 mm peritumoral region for tasks, such as molecular subtyping or HER2-related prediction, further supporting inclusion of a peritumoral ROI in US radiomics analyses [19]. Reported optimal peritumoral distances vary across studies, and distances that are too small or too large may compromise model performance. This suggests that the choice of peritumoral distance may depend on the study objective and imaging technique. Taken together, combining intratumoral and peritumoral radiomics features may improve the robustness of HER2 prediction. Compared with models that use intratumoral features alone, models that incorporate peritumoral features often show higher area under the curve (AUC), sensitivity, and specificity, with similar findings in both MRI- and US-based studies [13, 19, 20]. We, therefore, hypothesize that the peritumoral microenvironment contains quantifiable imaging information related to HER2 pathway activity and immune contexture, which may support the preoperative prediction of HER2 status in IHC 2 + breast cancer.
HER2 expression also shows marked intratumoral heterogeneity. In HER2-positive breast cancer, intratumoral heterogeneity has been reported in approximately 11–40% of tumor cell populations and is associated with a suboptimal response to neoadjuvant anti-HER2 therapy, a higher risk of distant metastasis, and worse overall survival [21–23]. This heterogeneity can undermine the consistency of IHC/ISH scoring and increase discordance between core-needle biopsy and surgical specimens. As a result, a single biopsy may not reflect the true HER2 status of a tumor [9, 24]. Radiomics can quantitatively characterize the whole lesion and the surrounding tissue and offers a noninvasive way to capture spatial heterogeneity that complements tissue sampling. It has been used for molecular status assessment and for predicting treatment response [25, 26].
In this study, we extracted radiomics features from intratumoral regions and from 1-, 3-, and 5-mm peritumoral regions on 2D US and integrated these features with key clinical variables to develop and validate a noninvasive model for the preoperative prediction of HER2 status in IHC 2 + breast cancer. Our goal was to improve the efficiency and accuracy of HER2 assessment to better support individualized treatment planning.

Methods

Patient selection
This retrospective study was conducted in accordance with the ethical principles of the Declaration of Helsinki and was approved by the Academic Ethics Committee of the participating centers (approval details blinded for review). Because of the retrospective design, the requirement for written informed consent was waived, and all patient data were anonymized before analysis.
We retrospectively screened breast cancer patients treated at Center A between August 2015 and August 2025; 722 patients were initially reviewed. Inclusion criteria were as follows: (a) invasive breast cancer confirmed by preoperative tissue biopsy; (b) preoperative breast US performed within 2 weeks before surgery, with image quality adequate for post-processing; and (c) an HER2 IHC score of 2 + with available FISH results. Exclusion criteria were as follows: (a) prior breast surgery, radiotherapy, chemotherapy, or endocrine therapy before the US examination; (b) incomplete clinical or ultrasound data; (c) poor ultrasound image quality or artifacts that precluded radiomics analysis (e.g., extensive calcification shadowing or dominant cystic change), or incomplete lesion visualization; and (d) multifocal lesions or non-mass-like breast cancer. In total, 440 patients were included and randomly assigned to the training and internal testing cohorts in a 7:3 ratio (training cohort, n = 308; internal testing cohort, n = 132).
To assess external applicability, we assembled additional patients from two independent centers for external validation. External validation cohort 1 comprised 153 patients pooled from the two centers: Center B (156 initially screened; 102 included) and Center C (81 initially screened; 51 included). Because the proportion of HER2-positive cases was markedly higher in Center C, its data were combined with those from Center B to reduce systematic bias. External validation cohort 2 consisted of the 102 patients from Center B alone and was used to evaluate the model’s center-specific generalizability. All external validation patients met the same inclusion and exclusion criteria as the internal cohorts and had complete IHC and FISH results. The detailed screening process is shown in Fig. 1.

Clinical characteristics
We retrieved clinical and histopathologic data from the electronic medical records. Histopathologic variables included HER2 status, Ki-67 (high expression ≥ 14%, low expression < 14%), estrogen receptor (ER) status, and progesterone receptor (PR) status. ER/PR status was recorded as positive if IHC staining was present in at least 1% of tumor nuclei; otherwise, it was recorded as negative. Clinical variables included patient age and lesion diameter on US, which was measured as the maximal transverse diameter on the largest cross-sectional image. We also recorded margin characteristics, presence of calcifications, posterior acoustic features, and lymph node (LN) metastasis.

Ultrasound image acquisition
All patients underwent preoperative breast US within 2 weeks before surgery. Examinations were performed by sonographers with more than 7 years of experience in breast imaging and followed a standardized protocol. A linear-array transducer (L3–12A) was used to scan both breasts in transverse and longitudinal planes. All images were stored in Digital Imaging and Communications in Medicine (DICOM) format to preserve complete information for subsequent analysis. To ensure image quality and comparability, we used uniform settings: an overall gain of approximately 50% and an imaging depth of 3.0–5.0 cm. The transverse image was acquired at the lesion’s maximal diameter.

Image segmentation
We selected the US image showing the lesion’s maximal cross section for manual segmentation. Regions of interest (ROIs) were the intratumoral region (ITR), the peritumoral region (PTR), and the combined intratumoral and peritumoral region (IPTR). First, a sonographer with more than 7 years of experience in breast imaging manually delineated the tumor boundary using the open-source platform 3D Slicer (v5.0.1) to obtain the ITR. A second sonographer with more than 10 years of experience reviewed and confirmed the annotation to ensure consistency. Both readers were blinded to all clinical and pathologic information.
After obtaining the ITR, we isotropically expanded its contour by 1, 3, and 5 mm to create three distinct combined intratumoral and peritumoral regions. For each image, the in-plane pixel spacing (mm per pixel) was obtained from the DICOM metadata and used to convert 1, 3, and 5 mm into image-specific pixel distances for isotropic dilation, ensuring consistent physical margins across centers and devices. The dilation was performed in Python. Any ROI portions extending beyond the breast parenchyma were removed manually to avoid the inclusion of non-breast tissues. We then subtracted the original ITR from each IPTR to generate three distinct peritumoral regions.
In total, seven distinct ROIs were identified for each patient: ITR, PTR1/3/5 (peritumoral regions at 1, 3, and 5 mm); and IPTR1/3/5 (intratumoral and peritumoral regions at 1, 3, and 5 mm). The segmentation scheme for each ROI type is illustrated in Fig. 2.
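The expansion and subtraction steps described above can be illustrated with a minimal sketch. It assumes a 2D binary ITR mask and a (row, column) pixel spacing in millimetres taken from the DICOM metadata. The function names `dilate_mask_mm` and `peritumoral_ring`, the disk-shaped footprint, and the NumPy-only implementation are illustrative assumptions and are not the authors' code.

```python
import numpy as np


def dilate_mask_mm(mask, margin_mm, pixel_spacing_mm):
    """Expand a 2D binary mask by a physical margin (mm).

    Assumption: a disk-shaped footprint whose radius is converted to pixels
    separately for rows and columns, so the margin stays constant in mm even
    when the pixel spacing differs between images.
    """
    row_mm, col_mm = pixel_spacing_mm
    mask = np.asarray(mask, dtype=bool)
    # Largest whole-pixel offsets that still fit inside the margin.
    r_rows = int(np.floor(margin_mm / row_mm + 1e-9))
    r_cols = int(np.floor(margin_mm / col_mm + 1e-9))
    h, w = mask.shape
    padded = np.pad(mask, ((r_rows, r_rows), (r_cols, r_cols)), mode="constant")
    out = np.zeros_like(mask)
    for dr in range(-r_rows, r_rows + 1):
        for dc in range(-r_cols, r_cols + 1):
            # Keep only offsets whose physical distance is within the margin.
            if (dr * row_mm) ** 2 + (dc * col_mm) ** 2 <= margin_mm ** 2 + 1e-9:
                out |= padded[r_rows + dr:r_rows + dr + h,
                              r_cols + dc:r_cols + dc + w]
    return out


def peritumoral_ring(itr_mask, margin_mm, pixel_spacing_mm):
    """Return (IPTR, PTR): the expanded region and the ring without the ITR."""
    itr_mask = np.asarray(itr_mask, dtype=bool)
    iptr = dilate_mask_mm(itr_mask, margin_mm, pixel_spacing_mm)
    ptr = iptr & ~itr_mask
    return iptr, ptr
```

Calling `peritumoral_ring(itr, 3.0, spacing)` would give the IPTR3 and PTR3 masks for one image under these assumptions. Manual removal of regions outside the breast parenchyma, as described above, is not covered by the sketch.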

Radiomics feature extraction and selection
After segmentation of the ITR and PTR ROIs, we extracted radiomics features for each ROI using the PyRadiomics extension in 3D Slicer. Using the IPTR3 as an example, 847 features were obtained, comprising 10 shape-based attributes, 162 first-order statistics (e.g., mean, variance, skewness, and kurtosis), and 675 texture features. The texture features were grouped into five texture matrices, comprising 216 gray-level co-occurrence matrix (GLCM) features, 126 gray-level dependence matrix (GLDM) features, 144 gray-level run length matrix (GLRLM) features, 144 gray-level size zone matrix (GLSZM) features, and 45 neighbouring gray tone difference matrix (NGTDM) features. In addition to features from the original images, we generated filtered features by applying PyRadiomics image filters (including LoG, wavelet, exponential, square, gradient, and logarithm) to each ROI, which increased the richness of the feature set. All extraction procedures followed the Image Biomarker Standardisation Initiative (IBSI) recommendations to improve comparability and reproducibility.
Before feature selection, we standardized all data: missing values were imputed using the median, and all the features were z score normalized (mean 0, standard deviation 1) to reduce systematic biases from different ultrasound devices and acquisition parameters and to provide a uniform basis for subsequent feature selection and modeling. The data set was then stratified and randomly sampled in a 7:3 ratio to form the training and testing cohorts, ensuring balanced and representative distributions. To improve model stability and interpretability, we adopted a four-step feature selection workflow. First, interobserver reproducibility of each radiomics feature was evaluated using the intraclass correlation coefficient (ICC) analysis. Features with ICC > 0.85 were retained to ensure acceptable agreement and to reduce variability attributable to ROI delineation. Second, in the training cohort, minimum redundancy maximum relevance (mRMR) was applied to preliminarily select the top 20 candidate features with high relevance to HER2 status and low redundancy. Third, least absolute shrinkage and selection operator (LASSO) logistic regression with tenfold cross-validation was used to determine the optimal regularization parameter (λ) and further reduce dimensionality, retaining 10 features. Finally, these 10 features were entered into stepwise multivariable logistic regression guided by the minimum Akaike information criterion (AIC), yielding 7 independent predictors most strongly associated with HER2 status and constituting the radiomics signature.
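The z score normalization and the mRMR prefiltering step can be sketched as follows. This is a simplified illustration, not the implementation used in the study: it assumes the difference form of mRMR and substitutes the absolute Pearson correlation for mutual information, and the function names `zscore_fit` and `mrmr_select` are hypothetical. The ICC, LASSO, and stepwise AIC steps are not shown.

```python
import numpy as np


def zscore_fit(train):
    """Column-wise mean and standard deviation estimated on training data."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return mean, std


def mrmr_select(X, y, k):
    """Greedy mRMR, difference form.

    Assumption: |Pearson r| stands in for mutual information, both for
    relevance (feature vs. label) and redundancy (feature vs. feature).
    Returns the indices of the selected columns in selection order.
    """
    n_features = X.shape[1]
    relevance = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)]
    )
    redundancy = np.abs(np.corrcoef(X, rowvar=False))
    selected = [int(np.argmax(relevance))]
    while len(selected) < min(k, n_features):
        remaining = [j for j in range(n_features) if j not in selected]
        # Relevance minus mean redundancy with the already selected features.
        scores = [relevance[j] - redundancy[j, selected].mean() for j in remaining]
        selected.append(remaining[int(np.argmax(scores))])
    return selected
```

In this sketch a feature that is highly relevant but nearly duplicates an already selected feature is penalized, which is the behavior the mRMR step relies on to keep the candidate set compact.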

Construction and internal testing of the radiomics model
We built separate radiomics models for the ITR, PTR, and IPTR configurations using mRMR prefiltering followed by LASSO-regularized logistic regression; the Rad-score was defined as the linear predictor of the final model. During training, we tuned each model with three times repeated fivefold stratified cross-validation, ensuring that every sample was strictly assigned to its prespecified training and testing folds. After training, model performance in the internal testing cohort was assessed for discrimination, calibration, and clinical applicability. Receiver operating characteristic (ROC) curves were plotted for the training and internal testing cohorts, and we calculated the AUC, sensitivity, specificity, accuracy, PPV, and NPV to describe model discrimination and stability. We compared performance across models and ROI types and selected the model with the highest AUC and optimal calibration in the internal testing cohort for subsequent analyses. In addition, to further visualize the model’s individual predictive ability, we computed the Rad-score for all patients in the training and internal testing cohorts and generated waterfall plots to visually display the distribution of the Rad-score across patients with different HER2 status.
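The threshold-based metrics listed above follow directly from the 2 × 2 confusion table. The sketch below shows one way to compute them; the function name `classification_metrics` and the convention that a probability equal to the threshold counts as positive are assumptions made for illustration.

```python
def classification_metrics(y_true, y_prob, threshold):
    """Sensitivity, specificity, accuracy, PPV, and NPV at a fixed threshold.

    Assumption: a case is called HER2-positive when its predicted
    probability is greater than or equal to the threshold.
    """
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted_positive = p >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }
```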

Construction and evaluation of the clinical model
In the training cohort, we screened clinical and pathologic variables. We first used univariable logistic regression to identify variables potentially associated with HER2 status and retained those with p < 0.200, yielding six candidates. These candidates were then entered into stepwise multivariable logistic regression for further selection, with model simplification guided by the minimum AIC. Ultimately, age, LN metastasis, ER status, and Ki-67 were retained in the final clinical model to predict HER2 status. After model construction, we evaluated predictive performance and stability in the training and internal testing cohorts by plotting ROC curves and calculating the AUC, along with sensitivity, specificity, and accuracy.

Development of the combined radiomics–clinical model and nomogram
We integrated the optimal Rad-score with the independent clinical predictors to build a combined radiomics–clinical prediction model using multivariable logistic regression. On the basis of this model, we constructed a nomogram that displays each variable’s contribution to HER2 prediction and allows clinicians to estimate the preoperative probability of HER2 positivity in IHC 2 + breast cancer.
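In general form, a multivariable logistic model of this kind can be written as below. The coefficients α are placeholders (their fitted values are not given in this section), and the predictors are those named in the clinical model together with the Rad-score.

```latex
\[
\eta = \alpha_0 + \alpha_1\,\mathrm{Age} + \alpha_2\,\mathrm{LN} + \alpha_3\,\mathrm{ER}
     + \alpha_4\,\mathrm{Ki\text{-}67} + \alpha_5\,\mathrm{Rad\text{-}score},
\qquad
P(\text{HER2-positive}) = \frac{1}{1 + e^{-\eta}}
\]
```

A nomogram is a graphical rendering of this linear predictor: each predictor's contribution is mapped to a point scale, and the point total is mapped to the predicted probability.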
We evaluated the combined model with ROC curves and AUC and compared it with the clinical model alone and the radiomics model alone in terms of discrimination, calibration, and clinical utility. Calibration curves and the Hosmer–Lemeshow goodness-of-fit test were used to assess agreement between predicted probabilities and pathology-confirmed HER2 status. Clinical utility was assessed with decision curve analysis (DCA) by quantifying the model’s net benefit across a range of threshold probabilities.
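For reference, the net benefit used in DCA at a threshold probability p_t is the standard quantity TP/n − (FP/n) × p_t/(1 − p_t). The sketch below computes it for a model and for the treat-all strategy; the function names are hypothetical, and the code does not reproduce the R packages used in the study.

```python
def net_benefit(y_true, y_prob, pt):
    """Net benefit of acting on predictions at threshold probability pt."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * pt / (1 - pt)


def net_benefit_treat_all(y_true, pt):
    """Net benefit when every patient is treated as positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * pt / (1 - pt)
```

The treat-none strategy has a net benefit of zero at every threshold. Dividing the net benefit by the prevalence gives the standardized net benefit.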

External validation of the optimal model
To examine cross-center generalizability and clinical stability, we applied the optimal combined model from the training phase to data from two independent external centers. All external data were anonymized, and inclusion and exclusion criteria matched those of the internal cohorts. Imaging and clinical variables were preprocessed with parameters derived from the training cohort to avoid information leakage.
Because the proportion of HER2-positive patients was much higher in Center C, we did not evaluate its data separately, in order to reduce systematic bias. Instead, data from Center C were pooled with those from Center B to form the overall external validation cohort (n = 153). In parallel, the larger and more balanced data set from Center B alone was used to build a single-center external validation cohort (n = 102) for evaluating the model’s generalizability and calibration performance in one external center.
In both external validation cohorts, we evaluated performance using the prespecified threshold from the training phase and compared results with those in the training and internal testing cohorts. Discrimination was assessed with ROC curves and AUC with 95% confidence intervals, and AUCs were compared using the DeLong test. Calibration was evaluated with calibration curves and the Hosmer–Lemeshow goodness-of-fit test. Clinical utility was assessed with DCA by quantifying net benefit across a range of threshold probabilities.

Statistical analysis
All statistical analyses were performed using IBM SPSS Statistics (v26.0), R (v4.0.2), or Python (v3.6.8). Continuous variables were compared between groups with the independent-samples t test or the Mann–Whitney U test according to normality, and categorical variables with the chi-square test. Univariable and multivariable logistic regression were used to identify clinical variables independently associated with HER2 status. LASSO modeling was implemented with the R package “glmnet” and tuned by tenfold cross-validation for radiomics feature selection. Model performance metrics included AUC, accuracy, sensitivity, and specificity. The optimal decision threshold was determined by the Youden index, and ROC curves were plotted in Python. Differences in AUC between models integrating features from different intratumoral and peritumoral regions were assessed using the paired DeLong test. Calibration curves were plotted using the R package “gbm”, and the Hosmer–Lemeshow goodness-of-fit test was performed with “generalhoslem” to assess the calibration performance of the nomogram model. DCA was performed using the R packages “rmda” and “ggDCA” to evaluate clinical utility. A two-sided p value < 0.05 was considered statistically significant.
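As an illustration of two quantities named above, the sketch below computes the AUC through its rank (Mann–Whitney) interpretation and selects a threshold by the Youden index, J = sensitivity + specificity − 1. The function names are hypothetical; confidence intervals and the DeLong test are not shown, and the inputs are assumed to contain both classes.

```python
import numpy as np


def auc_rank(y_true, y_score):
    """AUC as the fraction of positive-negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a pair.
    """
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


def youden_threshold(y_true, y_score):
    """Return (threshold, J) maximizing J over the observed score values.

    Assumption: scores greater than or equal to the threshold are positive;
    the lowest threshold is kept when several share the maximum J.
    """
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    n_pos = (y_true == 1).sum()
    n_neg = (y_true == 0).sum()
    best_t, best_j = None, -1.0
    for t in np.unique(y_score):
        pred = y_score >= t
        sensitivity = (pred & (y_true == 1)).sum() / n_pos
        specificity = (~pred & (y_true == 0)).sum() / n_neg
        j = sensitivity + specificity - 1
        if j > best_j:
            best_t, best_j = float(t), float(j)
    return best_t, best_j
```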

Results

Patient characteristics
A total of 440 breast cancer patients meeting the inclusion and exclusion criteria were included for model training and internal testing, comprising 163 HER2-positive patients and 277 HER2-negative patients. All patients were randomly assigned in a 7:3 ratio to a training cohort (n = 308) and an internal testing cohort (n = 132), with 117 and 46 HER2-positive patients, respectively. Most clinical variables did not differ significantly between HER2-positive and HER2-negative groups (all p > 0.05). In the training cohort, the HER2-positive group differed significantly from the HER2-negative group in age, LN metastasis, ER status, PR status, and Ki-67 (all p < 0.05), with age and Ki-67 showing highly significant differences (p < 0.001). In the internal testing cohort, the HER2-positive group was older and had a higher proportion of high Ki-67 expression than the HER2-negative group (p < 0.05), suggesting that these variables may be potential predictors, whereas the remaining variables were not significant. Details are presented in Table 1.

Table 2 summarizes the associations between clinical characteristics and HER2 status from univariable and multivariable logistic regression analyses. In the univariable analysis, age, boundary, LN metastasis, ER status, PR status, and Ki-67 were significant predictors of HER2 status. In multivariable analysis, age (OR = 1.04; 95% CI 1.02–1.07; p < 0.001), LN metastasis (OR = 0.53; 95% CI 0.32–0.89; p = 0.016), and Ki-67 (OR = 5.37; 95% CI 2.18–13.25; p < 0.001) remained independent predictors (all p < 0.05; Table 2).

Radiomics model development and performance evaluation
To ensure robustness and reproducibility, we first extracted gray-scale US features from seven distinct ROIs in the training cohort (ITR, PTR1, PTR3, PTR5, IPTR1, IPTR3, IPTR5) and removed low-consistency features using an ICC threshold of ≥ 0.85. We then applied mRMR to select 20 candidate features associated with HER2 status, followed by LASSO regression to retain 10 features. These features were entered into stepwise multivariable logistic regression with AIC, yielding 7 features independently associated with HER2 status (Fig. 3A). In the training cohort, pairwise Spearman correlation among the 7 retained features was computed and visualized (Fig. 3B). Aside from the unit diagonal, all absolute pairwise coefficients were < 0.70, indicating a low risk of multicollinearity. Rad-scores were then calculated for ITR, PTR1/3/5, and IPTR1/3/5, and each Rad-score was used as a continuous predictor to build the corresponding radiomics model.
Across the training and internal testing cohorts, the AUCs of these ROI-specific models ranged from 0.693 (95% CI 0.599–0.787) to 0.752 (95% CI 0.665–0.840) (Fig. 4). The IPTR3 model achieved the highest AUC in the internal testing cohort at 0.752 (95% CI 0.665–0.840), with AUCs of 0.744 (95% CI 0.686–0.803) in the training cohort, 0.716 (95% CI 0.632–0.799) in external validation cohort 1, and 0.738 (95% CI 0.626–0.849) in external validation cohort 2. Given its higher internal performance and stable external performance, subsequent radiomics analyses used the IPTR3-based Rad-score as the radiomics predictor. Unless otherwise specified, “radiomics model” refers to the IPTR3 radiomics model.

Radiomics score calculation
For each patient, the Rad-score was calculated as a linear combination of the selected features, expressed as z scores and weighted by their respective coefficients from the final multivariable logistic regression model. In both the training and internal testing cohorts, Rad-scores were significantly higher in the HER2-positive group than in the HER2-negative group (training cohort: −0.09 vs. −1.08; internal testing cohort: 0.08 vs. −0.85; Mann–Whitney U test, both p < 0.001; Fig. 5E, F). Similar results were obtained in both external validation cohorts (external validation cohort 1: −0.06 vs. −0.96; external validation cohort 2: 0.20 vs. −0.88; Mann–Whitney U test, both p < 0.001; Fig. 5G, H). To show individual distributions, Fig. 5A, B and Fig. 5C, D present patient-level waterfall plots ordered by Rad-score and color coded by HER2 status.
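Written out, the calculation described above takes the following form for the 7 retained features. The symbols are placeholders: the fitted coefficients β are not reported in this section, the inclusion of the intercept β₀ is an assumption based on the Rad-score being defined as the linear predictor, and μ and σ are assumed to be the feature means and standard deviations used for z score normalization.

```latex
\[
z_i = \frac{x_i - \mu_i}{\sigma_i},
\qquad
\text{Rad-score} = \beta_0 + \sum_{i=1}^{7} \beta_i\, z_i
\]
```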

Construction and performance evaluation of the clinical model and nomogram
To compare the incremental value of clinical variables and radiomics, we first built a clinical model in the training cohort using four independent clinical predictors (age, LN metastasis, ER status, and Ki-67). The AUCs were 0.728 (95% CI 0.672–0.784) in the training cohort, 0.703 (95% CI 0.614–0.791) in the internal testing cohort, 0.681 (95% CI 0.594–0.768) in external validation cohort 1, and 0.659 (95% CI 0.548–0.769) in external validation cohort 2. We then integrated the IPTR3 Rad-score with the selected clinical predictors to construct a combined model and developed a nomogram based on this model to enable individualized prediction of HER2 status (Fig. 6). The combined model achieved AUCs of 0.821 (95% CI 0.772–0.869) in the training cohort, 0.828 (95% CI 0.756–0.900) in the internal testing cohort, 0.774 (95% CI 0.697–0.851) in external validation cohort 1, and 0.803 (95% CI 0.699–0.906) in external validation cohort 2. The combined model achieved significantly higher AUCs than the clinical model in all four cohorts (DeLong test, p < 0.001, p = 0.003, 0.020, and 0.010, respectively). Compared with the radiomics model, the combined model likewise showed higher AUCs, but the difference was statistically significant only in the training and internal testing cohorts (DeLong test, p < 0.001 and p = 0.020) and not in the two external validation cohorts (p = 0.100 and 0.200). In the training cohort, at the prespecified threshold of 0.293, the combined model yielded a sensitivity of 82.4% and a specificity of 68.5% (Table 3). ROC curves for the radiomics model, the clinical model, and the combined model in each cohort are shown in Fig. 7A–D.
The calibration curves indicated good calibration of the combined model across the training, internal testing, and external validation cohorts, with close agreement between predicted and observed HER2 status (Fig. 8A). The Hosmer–Lemeshow test showed no significant lack of fit in any cohort (all p > 0.05). DCA compared the standardized net benefit of the three models across threshold probabilities. In the internal testing cohort and both external validation cohorts, the combined model consistently yielded higher net benefit than the clinical model and the IPTR3 radiomics model across most clinically relevant thresholds; all three models clearly exceeded the treat-all and treat-none baselines (Fig. 8B–D). These results indicate that, across cohorts and centers, nomogram-based clinical decision-making can provide higher clinical net benefit, supporting the model’s utility in clinical practice.

Discussion
HER2-positive breast cancer is typically more aggressive and has a poorer overall prognosis. However, with the advent of HER2-targeted therapies, progression-free survival and overall survival have improved significantly [3, 27]. For patients with IHC 2 + breast cancer, the current ASCO/CAP guidelines recommend reflex ISH, most commonly FISH, to confirm HER2 gene amplification. The result directly determines HER2 positivity as well as the eligibility and timing of targeted therapy [7]. In this context, we extracted intratumoral and peritumoral radiomics features from 2D US and combined them with key clinical variables to develop a noninvasive preoperative model for predicting HER2 status in IHC 2 + breast cancer, and we constructed a corresponding nomogram. The combined model achieved AUCs of 0.821 and 0.828 in the training and internal testing cohorts and maintained favorable discrimination and calibration in two external validation cohorts. DCA showed that the combined model provided higher net benefit than either the clinical model alone or the radiomics model alone. Taken together, the nomogram may help with preoperative risk stratification and FISH triage in the IHC 2 + population and may optimize testing pathways and resource allocation while not replacing histologic or molecular pathology testing.
In US radiomics research on breast cancer, ROI selection is critical. Prior studies have largely focused on intratumoral texture and morphologic features to characterize intratumoral imaging phenotypes and to distinguish molecular subtypes, laying the groundwork for radiomics-based assessment of HER2 status, whereas peritumoral regions have received less attention [28]. However, several studies have indicated that the tumor margin and the surrounding peritumoral tissue likewise contain information related to breast cancer biology, and models that combine intratumoral and peritumoral features often outperform models that use intratumoral features alone [29, 30]. Peritumoral features likely capture local tumor–microenvironment alterations and treatment response. For example, in an MRI-based radiomics study, Braman and colleagues reported that IPTR features were associated with HER2-related molecular phenotypes and with pathologic response to preoperative HER2-targeted therapy in HER2-positive disease [13, 29]. An ABVS-based study likewise suggested that models incorporating peritumoral 3- and 5-mm regions significantly improved discrimination of HER2 status [19]. Guided by these findings, we incorporated features from both intratumoral and peritumoral ROIs during model development and systematically compared models based on ITR, PTR1/3/5, and IPTR1/3/5. IPTR3 performed best, with AUCs of 0.744 and 0.752 in the training and internal testing cohorts, and it outperformed the intratumoral-only model, which had AUCs of 0.719 and 0.715. These results are consistent with prior reports that combine intratumoral and peritumoral information. Using IPTR3 as the optimal radiomics signature, we developed a nomogram by integrating intratumoral and peritumoral radiomics features with key clinical variables. 
In the internal testing cohort, the combined model achieved an AUC of 0.828, higher than both the clinical model alone and the radiomics model alone, indicating that integrating intratumoral and peritumoral 2D US radiomics with key clinical information can further improve the model’s predictive performance for HER2 status in IHC 2 + breast cancer. This finding is consistent with prior ultrasound-based radiomics studies on HER2 assessment [20, 31].
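One way to derive a peritumoral band from a manually drawn tumor mask is morphological dilation by a physical distance. The sketch below is an assumption for illustration only: this section does not describe how the peritumoral regions were generated, the function name `peritumoral_ring` is hypothetical, isotropic pixel spacing is assumed, and IPTR is taken here to mean the union of the tumor mask and the ring.

```python
import numpy as np


def peritumoral_ring(mask, width_mm, pixel_mm):
    """Return the band of pixels lying within width_mm outside a 2D tumor mask.

    Assumes isotropic pixels of size pixel_mm (an assumption of this sketch).
    """
    mask = np.asarray(mask, dtype=bool)
    radius_px = width_mm / pixel_mm
    r = int(np.floor(radius_px))
    # Disk-shaped structuring element with Euclidean radius radius_px.
    yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
    disk = (yy ** 2 + xx ** 2) <= radius_px ** 2
    # Binary dilation: OR together copies of the mask shifted by each disk offset.
    padded = np.pad(mask, r, mode="constant", constant_values=False)
    h, w = mask.shape
    dilated = np.zeros_like(mask)
    for dy, dx in zip(*np.nonzero(disk)):
        dilated |= padded[dy:dy + h, dx:dx + w]
    # Keep only the newly added pixels, excluding the tumor itself.
    return dilated & ~mask


# Hypothetical example: a one-pixel "tumor" on an 11 x 11 grid with 1 mm pixels.
tumor = np.zeros((11, 11), dtype=bool)
tumor[5, 5] = True
ring = peritumoral_ring(tumor, width_mm=3.0, pixel_mm=1.0)  # PTR3-like band
combined = tumor | ring                                      # IPTR3-like region
print(int(ring.sum()), int(combined.sum()))
```

In practice the pixel spacing would come from the image metadata, and a library routine such as a binary dilation from an image-processing package would usually replace the explicit loop.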
Wu et al. used US radiomics to preoperatively predict the expression of multiple molecular biomarkers in mass-type DCIS, enrolling 116 patients and selecting 41 ultrasound radiomics features (more than 10% of the training sample) to build the model [32]. The AUC of their model was 0.940 in the training cohort but dropped to 0.740 in the validation cohort, suggesting that small samples with many candidate predictors are prone to overfitting and optimism bias, which can lead to poorer performance at validation [33]. By contrast, this study included a larger sample (n = 440) and covered a broader range of histologic subtypes (for example, invasive ductal carcinoma and invasive lobular carcinoma), which may improve generalizability. During model development, we trained separate radiomics models for the ITR, PTR, and IPTR configurations using mRMR prefiltering followed by LASSO-regularized logistic regression, and we chose the optimal ROI configuration based on discrimination, calibration, and DCA for subsequent validation. We then integrated the optimal radiomics signature with key clinical variables, including age, LN metastasis, ER status, and Ki-67, to construct a nomogram. In the internal testing cohort, the combined model achieved a higher AUC than both the radiomics model and the clinical model and showed good calibration.
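The two-stage selection described above (mRMR prefiltering, then LASSO-regularized logistic regression) can be sketched as follows. This is a simplified stand-in under stated assumptions, not the authors' pipeline: relevance and redundancy are approximated with absolute Pearson correlation rather than mutual information, the L1-penalized fit uses a plain proximal-gradient loop instead of a library solver, and the names `mrmr_select` and `fit_l1_logistic`, the penalty values, and the synthetic data are all hypothetical.

```python
import numpy as np


def mrmr_select(X, y, k):
    """Greedy mRMR: pick features with high |corr| to y and low mean |corr| to chosen ones."""
    n_features = X.shape[1]
    relevance = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)])
    feature_corr = np.abs(np.corrcoef(X, rowvar=False))
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        remaining = [j for j in range(n_features) if j not in selected]
        scores = [relevance[j] - feature_corr[j, selected].mean() for j in remaining]
        selected.append(remaining[int(np.argmax(scores))])
    return selected


def fit_l1_logistic(X, y, lam, n_iter=2000):
    """L1-penalized logistic regression by proximal gradient descent; intercept unpenalized."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    # Step size 1/L, with L an upper bound on the Lipschitz constant of the mean log-loss.
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2 + n)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad_w = X.T @ (prob - y) / n
        grad_b = np.mean(prob - y)
        w = w - step * grad_w
        # Soft-thresholding: this is what shrinks weak coefficients exactly to zero.
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)
        b = b - step * grad_b
    return w, b


# Synthetic data: features 0 and 3 drive the label; feature 1 nearly duplicates feature 0.
rng = np.random.default_rng(0)
n = 300
X = rng.normal(size=(n, 6))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=n)
y = (X[:, 0] + X[:, 3] + 0.3 * rng.normal(size=n) > 0).astype(float)

selected = mrmr_select(X, y, k=2)
Xs = X[:, selected]
Xs = (Xs - Xs.mean(axis=0)) / Xs.std(axis=0)   # standardize before penalizing
w, b = fit_l1_logistic(Xs, y, lam=0.01)
w_zero, _ = fit_l1_logistic(Xs, y, lam=1.0, n_iter=200)  # heavy penalty removes all features
print(selected, w, w_zero)
```

The redundancy term is what keeps the near-duplicate feature out of the selected set, and the penalty strength controls how many coefficients survive; in a real analysis the penalty would be chosen by cross-validation on the training cohort only.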
We adopted two external validation strategies. First, we pooled data from the two external centers to estimate overall discrimination and calibration. Second, we performed a single-center external validation at the larger and better balanced site to assess center-level performance. Across the four data sets, the combined model achieved significantly higher AUCs than the clinical model. Compared with the radiomics model, the AUCs of the combined model were numerically higher in both external validation cohorts, but the differences did not reach statistical significance on the DeLong test (p = 0.10 and 0.20). These results suggest that when the external validation center differs from the development center in patient characteristics (for example, the proportion of HER2-positive cases) or in image acquisition conditions (protocols and equipment), model performance and calibration across centers may be affected. At new centers, the model's predicted probabilities may show systematic shifts; therefore, we recommend simple recalibration before application. For example, updating the intercept and the overall calibration slope can correct global probability bias and improve the accuracy and reliability of risk estimates [34–36]. In addition, cross-center heterogeneity in predictor measurement (such as differences in ultrasound acquisition and reconstruction) can degrade the model's external performance. Measurement workflows should be standardized wherever possible, with intercept and slope recalibration applied as needed [37].
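Intercept and slope recalibration of the kind recommended above amounts to refitting a two-parameter logistic model, logit(p_new) = a + b × logit(p_old), on data from the new center. The following is a minimal sketch under stated assumptions, not the authors' procedure: the function names `fit_recalibration` and `recalibrate`, the Newton-Raphson fit, and the simulated center (true intercept −0.5, true slope 0.7) are all hypothetical.

```python
import numpy as np


def logit(p):
    p = np.clip(p, 1e-6, 1 - 1e-6)  # guard against probabilities of exactly 0 or 1
    return np.log(p / (1 - p))


def fit_recalibration(p_old, y, n_iter=25):
    """Estimate intercept a and slope b in logit(p_new) = a + b * logit(p_old)."""
    z = logit(np.asarray(p_old, dtype=float))
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(z), z])
    beta = np.array([0.0, 1.0])  # start from "no recalibration needed"
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(design @ beta)))
        gradient = design.T @ (y - mu)
        hessian = design.T @ (design * (mu * (1 - mu))[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)  # Newton-Raphson update
    return beta[0], beta[1]


def recalibrate(p_old, intercept, slope):
    """Map original predicted probabilities to recalibrated ones."""
    return 1.0 / (1.0 + np.exp(-(intercept + slope * logit(p_old))))


# Simulated new center where the original model is overconfident and biased upward.
rng = np.random.default_rng(1)
n = 5000
z_old = rng.normal(0.0, 1.5, n)
p_old = 1.0 / (1.0 + np.exp(-z_old))
true_p = 1.0 / (1.0 + np.exp(-(-0.5 + 0.7 * z_old)))
y = (rng.random(n) < true_p).astype(float)

a, b = fit_recalibration(p_old, y)
p_new = recalibrate(p_old, a, b)
print(a, b, p_old.mean(), p_new.mean(), y.mean())
```

A fitted slope below 1 indicates predictions that are too extreme, and a nonzero intercept indicates a global over- or underestimate of risk. Because only two parameters are estimated, this update needs far fewer local cases than refitting the full model.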
HER2 is a clinically actionable therapeutic target in breast cancer, and accurate assessment of HER2 status directly guides whether and when to start HER2-targeted therapy. In contrast, other biomarkers such as Ki-67 are used mainly for risk stratification and prognosis and are not usually used as the sole basis for targeted therapy decisions [38]. Importantly, HER2 status is not static. Multiple studies have shown discordance in ER, PR, and HER2 status between primary tumors and recurrent or metastatic lesions, showing that tumor phenotypes can change over the disease course or after treatment. These shifts, including conversion from HER2 positive to negative and vice versa, have important implications for subsequent therapy and prognosis [39, 40]. For example, some patients convert from HER2 positive to negative after neoadjuvant chemotherapy or HER2-targeted therapy, which may indicate reduced responsiveness to HER2-targeted agents. Others convert from HER2 negative to positive during the disease course, making them candidates for first-time or renewed HER2-targeted therapy. When feasible, retesting of receptors and molecular markers on postoperative specimens or accessible metastatic lesions is recommended to update treatment plans [38, 41]. Future work should move beyond a single preoperative timepoint and collect longitudinal paired histopathology and contemporaneous ultrasound images at key milestones, such as before and after neoadjuvant therapy and at recurrence or metastasis. This approach would allow evaluation of concordance between HER2 conversion and imaging phenotypes and assessment of model performance over time [42]. In appropriate populations, noninvasive liquid-biopsy approaches such as circulating tumor DNA (ctDNA) may be explored as complementary tools. For example, ctDNA could be used to track ERBB2/HER2 amplification and disease evolution, supporting treatment decisions and response assessment [43, 44]. 
However, the incremental value of such liquid-biopsy approaches should be tested prospectively in combination with ultrasound radiomics.
Although our ultrasound radiomics-based model showed potential value for predicting HER2 status in the IHC 2 + population, several limitations remain. First, this was a retrospective study. Despite the inclusion of an internal testing cohort and two external validation cohorts, the overall sample size was still limited. Moreover, differences in patient characteristics across centers, such as the proportion of HER2-positive cases, may affect the model’s discrimination and calibration at external centers and may thereby reduce the statistical significance of comparisons between models. Prospective validation in larger, geographically diverse, multi-device external cohorts is warranted, and model recalibration or updating should be undertaken as needed to increase generalizability and stability [37, 45]. Second, we used 2D gray-scale US only and did not incorporate multiparametric information, such as shear-wave elastography, color Doppler, or contrast-enhanced ultrasound. Prior studies have shown that these ultrasound parameters provide complementary biological information (for example, tissue stiffness and vascularity), and combining them can further improve performance for diagnosis, subtyping, or response assessment [46, 47]. Third, ROIs were still manually delineated. Inter-operator variability and heterogeneity in ultrasound acquisition, reconstruction, and preprocessing can affect the reproducibility and cross-center consistency of radiomics features. Standardization of gray-level discretization, resampling, and texture calculations is recommended, and it is worth exploring deep-learning algorithms (for example, convolutional neural networks), (semi)automatic segmentation, or the use of automated breast ultrasound (ABUS) to reduce observer bias and operator dependence and to improve stability [48, 49].
Fourth, neoadjuvant chemotherapy can alter tumor biology, and HER2 status may convert between positive and negative during treatment; changes in HER2-low status across disease stages have also been reported, underscoring the importance of reassessing HER2 on residual tumor after neoadjuvant chemotherapy [50]. To minimize treatment-related bias, we enrolled IHC 2 + patients based on pretreatment biopsy results. In future work, longitudinal, paired pathology with contemporaneous ultrasound should be collected at key timepoints, including before and after neoadjuvant therapy and at progression, to evaluate how HER2 conversion affects model outputs and calibration. Fifth, in certain ROI configurations, the model’s AUC approached the 0.70 threshold, indicating room to enhance discrimination by refining feature processing and improving the signal-to-noise ratio [51]. Between-center differences in imaging and preprocessing may also weaken the model’s performance and calibration. Before applying the model at a new center, simple recalibration in the target population is advisable to correct systematic probability bias. If calibration remains unsatisfactory, established model-updating frameworks can be used to revise the model by re-estimating some or all coefficients or to extend the model by adding new predictors and re-evaluating performance; during external validation, the impact of between-center differences on performance and calibration should be systematically assessed and reported [34, 36, 52].

Conclusions

The combined model integrating IPTR3 radiomics features extracted from 2D US with key clinical variables demonstrated the best predictive performance in both internal and external cohorts, underscoring its potential as a non-invasive preoperative tool for predicting HER2 status in patients with IHC 2 + breast cancer. It is not intended to replace histology or ISH/FISH testing, but rather to estimate the probability of HER2 positivity, thereby guiding the prioritization of preoperative ISH/FISH testing and supporting clinical decision-making in patients with breast cancer.

Source: PubMed Central (JATS). Licensing follows the original publisher's policy; please cite the original article when quoting.
