
MRI-Based Radiomics Model for Classifying Axillary Lymph Node Burden and Disease-Free Survival in Patients With Early-Stage Breast Cancer.

Journal of Magnetic Resonance Imaging (JMRI), 2026, Vol. 63(2), pp. 378-393


Tong Y, Zhu Y, Wen S, Du M, Miao H, Zhou J, Wang M, Su MY



Cite this article

APA: Tong Y, Zhu Y, et al. (2026). MRI-Based Radiomics Model for Classifying Axillary Lymph Node Burden and Disease-Free Survival in Patients With Early-Stage Breast Cancer. Journal of magnetic resonance imaging : JMRI, 63(2), 378-393. https://doi.org/10.1002/jmri.70182
MLA: Tong Y, et al. "MRI-Based Radiomics Model for Classifying Axillary Lymph Node Burden and Disease-Free Survival in Patients With Early-Stage Breast Cancer." Journal of magnetic resonance imaging : JMRI, vol. 63, no. 2, 2026, pp. 378-393.
PMID: 41289056
DOI: 10.1002/jmri.70182

Abstract

[BACKGROUND] Axillary lymph node (ALN) burden is a key prognostic determinant in breast cancer and plays an important role in diagnosis and treatment planning. The noninvasive assessment of ALN burden might improve patient stratification and guide individualized treatment.

[PURPOSE] To explore the potential of MRI-based radiomics in preoperative classification of ALN burden in early-stage breast cancer and to assess survival differences between patients with high- and low-ALN burden.

[STUDY TYPE] Retrospective.

[POPULATION] Pathologically confirmed breast cancer patients (n = 343): training (n = 170), testing (n = 73) and internal validation (n = 50) from center 1; center 2 (n = 50) for external validation.

[FIELD STRENGTH/SEQUENCE] 3T, dynamic contrast-enhanced (DCE) sequence.

[ASSESSMENT] Four different machine learning classifiers were used to develop clinical, radiomics, and combined models for preoperative ALN burden assessment (66 high-burden cases). DCE-MRI radiomics features were extracted, and the optimal model was used to determine the Radscore. A clinical model was derived from clinicopathological variables, and integrated with the Radscore to form a combined model. Kaplan-Meier and Cox regression analyses were performed to compare disease-free survival (DFS) between high- and low-burden groups.

[STATISTICAL TESTS] Intraclass Correlation Coefficient (ICC), LASSO, logistic regression, Mann-Whitney U tests, Chi-squared tests, DeLong's test, Area Under the Curve (AUC), Decision Curve Analysis (DCA), calibration curves and Kaplan-Meier analysis, with p < 0.05 as significant.

[RESULTS] The Random Forest-based combined model yielded AUCs of 0.881 (95% CI, 0.811-0.941) in the training set, 0.826 (0.716-0.917) in the testing set, 0.912 (0.811-0.985) in the internal validation set, and 0.881 (0.737-0.985) in the external validation set. When using the cut-off value determined from the training set, the overall accuracy was 0.759, 0.795, 0.840, and 0.860, respectively. Kaplan-Meier analysis revealed significant DFS differences between the model-classified high- and low-burden groups (p = 0.022, HR = 2.9).

[DATA CONCLUSION] MRI-based radiomics models show promise for noninvasive evaluation of ALN burden and prognostic stratification of survival outcomes in breast cancer patients.

[TECHNICAL EFFICACY] Stage 2.


1 | Introduction
Breast cancer (BC) represents the most prevalent malignancy among women globally [1]. Axillary lymph node (ALN) burden, defined as the number of metastatic ALNs, constitutes a significant prognostic determinant in BC patients [2]. Thus, the assessment of ALN status is crucial for staging and prognostic prediction to select optimal treatment strategies [3, 4]. The ACOSOG Z0011 trial demonstrated that in early-stage (T1 or T2) breast cancer with one or two positive sentinel lymph nodes, aggressive axillary lymph node dissection (ALND) may be omitted without compromising survival outcomes [5]. For higher lymph node burden (≥ 3 positive nodes), however, a more aggressive treatment is recommended, such as ALND and chemotherapy [6, 7]. Accurate preoperative classification of lymph node burden is important for informed treatment decision-making to help choose optimal therapies that are sufficient to achieve a good outcome while avoiding unnecessary toxicities.
The current standard for evaluating ALN burden in early-stage BC patients is sentinel lymph node biopsy (SLNB) followed by ALND for patients with positive sentinel lymph nodes [8, 9]. However, the false-negative rate of SLNB may lead to understaging and delayed treatment as the procedure samples only a portion of axillary nodes and may not reflect the true extent of disease [10, 11]. On the other hand, the false positive diagnosis of enlarged reactive benign nodes may lead to overtreatment. Noninvasive complementary methods that may provide more accurate assessment of lymph node burden could provide valuable guidance for optimizing surgical management [12].
Breast MRI is a noninvasive modality commonly used for breast cancer diagnosis and preoperative staging, which might provide comprehensive information about tumor biological behavior [13]. Radiomics, by extracting high-dimensional quantitative features from standard MRI sequences, captures intratumoral heterogeneity through texture, intensity, and spatial descriptors that extend beyond human visual assessment. When integrated with machine learning, radiomics enables the development of predictive models capable of estimating axillary nodal burden preoperatively with promising accuracy [14]. Several studies have demonstrated that radiomics-based classifiers outperform traditional radiological evaluation in predicting sentinel and non-sentinel nodal involvement [15-17]. Beyond accuracy, radiomics offers practical clinical advantages. Its noninvasive and reproducible nature supports objective decision-making, helping stratify patients who may benefit from axillary surgery versus those who might safely avoid it [18-21].
Since early-stage breast cancer patients with one or two positive sentinel lymph nodes may consider omitting ALND, this study aimed to develop a machine learning model based on dynamic contrast-enhanced MRI (DCE-MRI) radiomic features to classify ALN burden by distinguishing patients with ≤ two positive nodes (i.e., 0, 1, or 2, low-ALN burden) from those with ≥ three nodes (high-ALN burden). Furthermore, disease-free survival (DFS) was analyzed to evaluate the prognostic value of the developed classifier.

2 | Materials and Methods

2.1 | Patient Population
The study was approved by the Institutional Review Boards of the two centers, and informed consent was waived due to the retrospective study design. Patients with stage T1-T2 invasive breast cancer diagnosed between January 2017 and December 2019 at Center 1 were used as the training and testing sets. An internal validation set was obtained from patients imaged between December 2022 and June 2023 on a different scanner at the same center. The external validation set included patients with early-stage breast cancer diagnosed between January 2020 and December 2024 at Center 2. The selection process is provided in the flow chart shown in Figure 1. All patients underwent surgery with SLNB or ALND within 2 weeks after preoperative MRI exams. The exclusion criteria were as follows: (1) patients with incomplete clinical and pathological data (n1 = 14, n2 = 11); (2) patients who had anti-tumor treatment before the surgery or MRI scan (n1 = 29, n2 = 14); (3) patients with incomplete MRI or poor image quality (n1 = 11, n2 = 4); (4) patients with a previous history of breast cancer (n1 = 7, n2 = 3). In total, 343 patients were included in the study. Of these, 243 patients from Center 1 were randomly allocated to the training and testing sets in a 7:3 ratio. The internal validation set comprised 9 high-ALN and 41 low-ALN burden patients from Center 1, while the external validation set included 13 high-ALN and 37 low-ALN burden patients from Center 2. All patients received upfront surgery, and depending on the final pathological staging, adjuvant treatments were given according to the standard of care guidelines. All included patients were followed at the same institutions where they received treatment.
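The 7:3 random allocation of the 243 Center 1 patients can be illustrated with a minimal sketch. It assumes a simple unstratified shuffle with a fixed seed; the text does not state whether the split was stratified by ALN burden, and the function name and seed are hypothetical.

```python
import random

def split_train_test(patient_ids, train_fraction=0.7, seed=0):
    """Shuffle IDs reproducibly and split them into training and testing lists."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)          # fixed seed: same split on every run
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]

# 243 placeholder patient IDs (0..242) standing in for the Center 1 cohort.
train_ids, test_ids = split_train_test(range(243))
print(len(train_ids), len(test_ids))  # 170 73
```

With 243 patients, a 0.7 fraction rounds to 170 training and 73 testing cases, which matches the set sizes reported in the abstract.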
Patients in the training and testing sets were followed until January 2025, with a minimum surveillance of 5 years for survival analysis. The duration of follow-up was calculated as the elapsed time between the date of surgery and the last follow-up date, the occurrence of any event, or death. Disease-free survival (DFS) was defined as the time interval from the surgical intervention to the first occurrence of disease recurrence, distant metastasis, or death from any cause. Patients without DFS events were censored at the last follow-up [22].

2.2 | Clinical and Pathologic Evaluation
The clinical characteristics included age and menopause status. The pathological characteristics assessed included clinical T stage, estrogen receptor (ER) status, progesterone receptor (PR) status, human epidermal growth factor receptor-2 (HER-2) status, Ki-67, histological type, molecular subtype, ALN burden, and ALN positive or negative status. Imaging parameters were also considered, which included the tumor size measured as the largest diameter of the tumor shown on DCE-MRI, clinical ALN status evaluated on MR images, and MRI BI-RADS scores. The information was obtained from a prospective reading by two radiologists.
Cases with three or more positive axillary lymph nodes were categorized as ‘high-ALN burden’, and those with zero, one, or two nodes as ‘low-ALN burden’ [5]. ER or PR was defined as positive with nuclear staining of ≥ 1% of the tumor cells [23]. Ki-67 was defined as high expression with nuclear staining of ≥ 20% of the tumor cells [24]. According to the immunohistochemistry (IHC) score and fluorescence in situ hybridization (FISH), HER2 was defined as positive when IHC 3+ or IHC 2+ with FISH gene amplification; HER2 was defined as negative when IHC 0, 1+, or IHC 2+ with FISH negativity. Molecular subtypes included hormone receptor positive/HER2 negative (HR+/HER2−), HER2-positive (HER2+), and Triple-negative (TN) [25].
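The labelling rules above can be written as a few small predicates. This is only an illustrative sketch: the function names and the integer encoding of the IHC score (0 to 3) are assumptions, not part of the study's software.

```python
def aln_burden(n_positive_nodes):
    """'high' for three or more positive axillary nodes, otherwise 'low'."""
    return "high" if n_positive_nodes >= 3 else "low"

def hormone_receptor_positive(percent_stained):
    """ER or PR is positive when at least 1% of tumor cells show nuclear staining."""
    return percent_stained >= 1.0

def ki67_high(percent_stained):
    """Ki-67 is high when at least 20% of tumor cells show nuclear staining."""
    return percent_stained >= 20.0

def her2_positive(ihc_score, fish_amplified=None):
    """IHC 3+ is positive; IHC 2+ depends on FISH; IHC 0 or 1+ is negative."""
    if ihc_score == 3:
        return True
    if ihc_score == 2:
        if fish_amplified is None:
            raise ValueError("IHC 2+ requires a FISH result")
        return bool(fish_amplified)
    return False

print(aln_burden(2), aln_burden(3))  # low high
```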

2.3 | MRI Acquisition
The imaging protocol included axial T2-weighted imaging (T2WI), T1-weighted imaging (T1WI), diffusion-weighted imaging (DWI), and dynamic contrast-enhanced MRI (DCE-MRI). The training and testing datasets were acquired on a 3.0T scanner (GE SIGNA HDx, GE Healthcare, Milwaukee), the internal validation dataset on a 3.0T scanner (MAGNETOM Prisma, Siemens Healthcare, Erlangen, Germany), and the external validation dataset on a 3.0T scanner (Discovery 750; GE Healthcare, Milwaukee). Detailed sequence parameters are listed in Table S1. Radiomic features were harmonized across scanners using the ComBat method to correct for batch effects arising from different MRI systems. This approach has been shown to effectively reduce scanner-related variability and improve the reproducibility of radiomic features.

2.4 | MRI Interpretation
MRI features were reviewed according to the MRI BI-RADS lexicon by two breast radiologists (Y.T. and S.W., with 3 and 2 years of experience interpreting breast MRI, respectively), who were blinded to patients’ clinical information. The clinical ALN status was evaluated from MR images that covered the axillary region. The ALNs with a round or irregular shape, short diameter ≥ 1 cm, cortical thickening (≥ 3 mm), and the absence of fatty hilum were considered suspicious for metastasis. The presence of ≥ 1 suspicious lymph node was classified as positive MRI-ALN status, while the absence of suspicious lymph nodes was classified as negative [26]. The determination of the ALN status by the interpreting radiologist at the time of MRI, as shown in the original MRI report, was used as the third independent reading. Any cases with discrepancies among the independent readers were reviewed by a senior radiologist (J.Z.) with 20 years of experience, and the result was used in the final analysis.

2.5 | Tumor Segmentation
The analysis flowchart for model development is shown in Figure 2. The original MR images were downloaded from the PACS system. ITK-SNAP software (http://www.itksnap.org/pmwiki/pmwiki.php) was used to perform lesion segmentation. A radiologist (Y.Z.) with 5 years of experience interpreting breast MRI delineated the region of interest (ROI) using the second post-contrast DCE images. The ROI included the slice demonstrating the maximal tumor diameter along with its immediately adjacent anterior and posterior slices. When multiple lesions were shown, only the largest one was chosen for analysis.

2.6 | Radiomics Feature Extraction and ICC Evaluation
Radiomics features were extracted from the ROI using the PyRadiomics package (v3.0.1; Python v3.7.11) in accordance with the Image Biomarker Standardization Initiative (IBSI) guidelines. The features included Shape-based, First Order statistics, and Texture features (GLCM (Gray Level Co-occurrence Matrix), GLRLM (Gray Level Run-Length Matrix), GLSZM (Gray Level Size Zone Matrix), GLDM (Gray Level Dependence Matrix)), as well as higher-order features derived from Laplacian of Gaussian (LoG) and wavelet filters to capture image heterogeneity and multi-scale spatial details.
Radiomics feature reproducibility was assessed using intraclass correlation coefficients (ICCs) to evaluate interobserver agreement. Forty-six cases randomly selected from the training set were segmented independently by two additional breast radiologists (M.D. and H.M.) with 8 and 10 years of experience, respectively. An ICC > 0.75 was considered as good agreement.
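For one feature measured by several readers, an ICC can be computed from a two-way ANOVA decomposition. The sketch below assumes the ICC(2,1) form (two-way random effects, absolute agreement, single measurement); the text does not state which ICC variant was used, so this choice is an assumption. The example values are the classic Shrout and Fleiss ratings, not study data.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table ratings[subject][rater] of numeric measurements."""
    n = len(ratings)            # subjects (cases)
    k = len(ratings[0])         # raters (readers)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)    # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)    # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                   # residual

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Textbook example: 6 subjects rated by 4 raters.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(example), 2))  # 0.29
```

In the study's setting, each row would hold one feature's value for one of the 46 re-segmented cases and each column one radiologist, and features scoring below the 0.75 cutoff would be dropped.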

2.7 | Radiomics Feature Selection and the Construction of the Radiomics Model
ComBat harmonization was applied to reduce scanner/protocol effects on the extracted features [27]. SMOTE was tested during model optimization but ultimately not applied, as it yielded minimal improvement and could distort the real-world class distribution. The obtained radiomics features were individually standardized using MinMax normalization to ensure distributional consistency across the datasets. Features with ICCs < 0.75 were excluded to ensure reproducibility. Next, features with p values > 0.05 in the Mann–Whitney U-test were removed, retaining only statistically relevant variables.
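The normalization and reproducibility filtering steps might look like the following sketch. It assumes that the min-max ranges are learned on the training set and then reused for the other sets, which the text does not specify; the feature names and ICC values are hypothetical placeholders.

```python
def fit_min_max(train_rows):
    """Learn per-feature (min, max) from the training rows only."""
    return [(min(col), max(col)) for col in zip(*train_rows)]

def apply_min_max(rows, ranges):
    """Scale each feature with the training ranges; constant features map to 0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, ranges)]
        for row in rows
    ]

def keep_reproducible(feature_names, icc_by_feature, cutoff=0.75):
    """Drop features whose ICC falls below the cutoff."""
    return [name for name in feature_names if icc_by_feature[name] >= cutoff]

train = [[1.0, 10.0], [3.0, 30.0], [2.0, 20.0]]
ranges = fit_min_max(train)
print(apply_min_max([[2.0, 40.0]], ranges))  # [[0.5, 1.5]]

# Hypothetical feature names and ICC values, for illustration only.
kept = keep_reproducible(["feature_a", "feature_b"], {"feature_a": 0.91, "feature_b": 0.62})
print(kept)  # ['feature_a']
```

Values outside the training range, such as the 1.5 above, are left unclipped here; whether to clip them is a design choice the text does not address.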
For radiomics analysis, the selected feature dimension was reduced through least absolute shrinkage and selection operator (LASSO) and forward feature selection. Four machine learning algorithms: support vector machine (SVM), random forest (RF), logistic regression (LR) and extreme gradient boosting (XGB), were applied in the training set by using a nested 5-fold cross-validation and grid search strategy. This two-step strategy leverages the interpretability of LASSO with the flexible decision boundaries of ensemble learning, aiming to balance model sparsity and generalization. The classifier with the highest average AUC was selected to construct the radiomics model and calculate the radiomics score (Radscore).

2.8 | Construction of the Combined Model
Based on all included clinical-pathological variables, univariate and multivariate statistical tests using analysis of variance (ANOVA) were employed to identify independent predictors to differentiate low- from high-ALN burden. These predictors were subsequently used to construct a clinical classification model.
To enhance the classification accuracy, the Radscore generated by the optimal classifier, along with clinical factors significantly associated with the outcome in univariate analysis, were integrated into a nomogram to construct the combined model.

2.9 | Visualization of the Classification by Shapley Additive Explanations
To enhance interpretability in the classification model, Shapley Additive Explanations (SHAP) dependence plots were utilized [28]. These visualizations leverage SHAP values to quantify the contribution of each feature to individual classifications. By aggregating SHAP values, the plots highlight feature importance, nonlinear relationships, and interactions. This approach demystifies complex models by illustrating how specific variables impact classifications, allowing for the understanding of the key drivers and conditions that influence outcomes.
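To make the idea of a per-feature contribution concrete, the sketch below computes exact Shapley values for a toy model by enumerating all feature subsets, replacing "absent" features with a single baseline value. This is a brute-force illustration under those assumptions, not the algorithm the SHAP library uses for tree ensembles, and the toy model is hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)

    def value(present):
        # Features in `present` take their real value, the others the baseline.
        mixed = [x[i] if i in present else baseline[i] for i in range(n)]
        return model(mixed)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                total += weight * (value(present | {i}) - value(present))
        contributions.append(total)
    return contributions

# Toy model with one additive term and one interaction term.
def toy_model(v):
    return 2 * v[0] + v[1] * v[2]

phi = shapley_values(toy_model, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
print([round(p, 6) for p in phi])  # [2.0, 3.0, 3.0]
```

The contributions sum to the difference between the model output at `x` and at the baseline (8 here), and the interaction term is shared equally between the two features involved.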

2.10 | Survival Analysis
Kaplan–Meier curves were generated using the disease-free survival information of patients in training and testing sets, and differences between high- and low-ALN burden groups were assessed by the log-rank test. For the internal and external validation cohorts, short-term DFS was assessed at 24 months. To further assess the independent prognostic value of the Radscore, a multivariate Cox proportional hazards regression analysis was performed.
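The Kaplan-Meier product-limit estimate behind such curves can be sketched in a few lines. The follow-up times and event flags below are invented for illustration and are not study data; patients censored at an event time are treated as still at risk at that time, which is the usual convention.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one event.

    times  : follow-up in months (or any unit)
    events : 1 if a DFS event occurred at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        while i < len(data) and data[i][0] == t:   # group tied times
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve

curve = kaplan_meier([6, 12, 12, 20, 30, 45], [1, 1, 0, 1, 0, 0])
for t, s in curve:
    print(t, round(s, 4))
# 6 0.8333
# 12 0.6667
# 20 0.4444
```

Computing one such curve per burden group and comparing them with a log-rank test would reproduce the structure of the analysis described here.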

2.11 | Statistical Analysis
Statistical analyses were performed using R software (version 4.1.3; www.r-project.org). Baseline clinical characteristics between high- and low-ALN burden groups were compared using appropriate statistical tests. Categorical variables were analyzed using the Chi-squared or Fisher’s exact test, and continuous variables with either the Mann–Whitney U test or Student’s t-test. The agreement among the three readers on positive and negative MRI-ALN status was assessed using Fleiss’ kappa. Univariable associations were assessed using logistic regression to mitigate small-sample bias and separation. For multivariable analysis, we additionally performed a prespecified enter model including all clinically relevant covariates to ensure full confounder adjustment, alongside a parsimonious model from stepwise selection as a sensitivity analysis.
ROC curves were used to evaluate the performance of the developed models. In addition, a confusion matrix was generated using the Minimum Difference Method, and the threshold determined from the training set was applied to the other three testing and validation sets to calculate accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). Calibration curves were further used to evaluate the agreement between classified probabilities and actual outcomes.
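The threshold-based metrics can be sketched as follows. The sketch assumes that the "Minimum Difference Method" selects the cut-off that minimizes the absolute difference between sensitivity and specificity, which is one common reading of the term; the text does not define it. Scores and labels are toy values.

```python
def confusion_counts(scores, labels, threshold):
    """Count TP, FP, TN, FN when score >= threshold is called high burden."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_high = score >= threshold
        if predicted_high and label == 1:
            tp += 1
        elif predicted_high and label == 0:
            fp += 1
        elif label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, PPV and NPV from the four counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
    }

def min_difference_threshold(scores, labels):
    """Cut-off minimizing |sensitivity - specificity| over observed scores."""
    best_gap, best_threshold = None, None
    for threshold in sorted(set(scores)):
        tp, fp, tn, fn = confusion_counts(scores, labels, threshold)
        gap = abs(tp / (tp + fn) - tn / (tn + fp))
        if best_gap is None or gap < best_gap:
            best_gap, best_threshold = gap, threshold
    return best_threshold

train_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
train_labels = [0, 0, 0, 1, 0, 1, 1, 1]
cutoff = min_difference_threshold(train_scores, train_labels)
print(cutoff)  # 0.6
print(metrics(*confusion_counts(train_scores, train_labels, cutoff))["accuracy"])  # 0.75
```

The cut-off is chosen once on the training data and then passed unchanged to `confusion_counts` for the other sets, mirroring the procedure described in this section.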
All tests were two-tailed, with p < 0.05 considered statistically significant.

3 | Results

3.1 | Clinical Characteristics and Feature Selection
Table 1 summarizes the clinical characteristics of the 343 patients with early-stage breast cancer. Of 343 patients, 277 had a low ALN burden and 66 had a high ALN burden. Significant differences in tumor size, MRI BI-RADS, and clinical MRI-ALN status were observed between high- and low-ALN burden groups in all sets (p < 0.05). The MRI-ALN status was prospectively determined by two independent readers, and a third reading was obtained from the original report made by the interpreting radiologist at that time. The three readings showed a Fleiss’ kappa of 0.540 (CI, 0.329–0.719), indicating only moderate agreement.
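For reference, Fleiss' kappa for several readers assigning each case to a category can be computed as in the sketch below. The four-case table is a toy example with three readers and two categories (positive, negative), not the study data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = number of readers placing case i in category j.

    Every case must be rated by the same number of readers.
    """
    n_cases = len(counts)
    n_readers = sum(counts[0])
    n_categories = len(counts[0])

    # Overall proportion of assignments falling in each category.
    p_category = [
        sum(row[j] for row in counts) / (n_cases * n_readers)
        for j in range(n_categories)
    ]
    # Per-case agreement: fraction of reader pairs that agree.
    p_case = [
        (sum(c * c for c in row) - n_readers) / (n_readers * (n_readers - 1))
        for row in counts
    ]
    observed = sum(p_case) / n_cases
    expected = sum(p * p for p in p_category)
    return (observed - expected) / (1 - expected)

# Columns: [readers calling MRI-ALN positive, readers calling it negative].
toy_counts = [[3, 0], [2, 1], [1, 2], [0, 3]]
print(round(fleiss_kappa(toy_counts), 4))  # 0.3333
```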
As shown in Table 2, univariate analysis in the training set identified MRI BI-RADS, clinical MRI ALN status, tumor size, clinical tumor stage and Ki-67 as significant clinical predictors. In multivariate analysis, only MRI BI-RADS and MRI ALN status remained independently associated with ALN burden. Multivariate logistic regression models were fitted using both stepwise and enter methods (Table S2). Results were consistent across approaches, confirming the robustness of variable selection and confounder adjustment.

3.2 | Radiomics Feature Extraction and Selection
A total of 1098 radiomic features were extracted from each ROI on DCE-MRI, after the ComBat harmonization (Figure S1). ICC analysis identified 1077 features with ICC > 0.75, which were retained for further selection. These included 61 original, 341 log-sigma, and 675 wavelet-transformed features. During the training phase, 1061 features were excluded through a combination of univariate Mann–Whitney U tests, LASSO regularization, and forward feature selection. The remaining 11 features, which demonstrated significant differences between high- and low-ALN tumor burden groups, were selected for model construction. The contributions of these selected features are illustrated in Figure 3.

3.3 | Model Construction and Assessment
Using the 11 selected features, four machine learning models were constructed by employing SVM, RF, LR and XGB classifiers. The performance of the four classifiers in the different datasets is presented in Table 3. The RF model achieved the highest AUC of 0.894 (95% CI, 0.837–0.936) in the training set, and it was selected to calculate the Radscore. The AUC was 0.763 (95% CI, 0.649–0.854) in the testing set, 0.817 (95% CI, 0.682–0.912) in the internal validation set and 0.713 (95% CI, 0.568–0.832) in the external validation set.
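The AUC values reported here have a simple rank interpretation: the probability that a randomly chosen high-burden case receives a higher score than a randomly chosen low-burden case, with ties counted as one half. A minimal sketch with toy scores (not study data) follows; it is equivalent to the Mann-Whitney U statistic divided by the number of case pairs.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

toy_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]
print(auc(toy_scores, toy_labels))  # 0.9375
```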
The classification performance of the clinical, radiomics, and combined models is shown in Figure 4 with ROC curves and confusion matrices.
In the univariate analysis, several clinical variables were significantly associated with ALN burden (p < 0.05), with MRI ALN status exhibiting the strongest correlation during model construction. However, when evaluated separately, ROC analysis indicated that MRI ALN status had only moderate discriminatory ability and limited standalone classification value.
Table 4 summarizes the model performance across the four datasets, while Table S3 presents ROC curve comparisons using DeLong’s test between the combined model and other models. Although the combined model yielded marginally lower AUCs than the clinical or radiomics model in certain datasets, DeLong’s test revealed no statistically significant differences. However, the combined model outperformed both the clinical and radiomics models in terms of accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) in almost all comparisons.
Table 5 shows the gains of True Positive (TP) and True Negative (TN) between models. In the training set, the TP and TN were increased in both radiomics and combined models compared to the clinical model, showing a net gain of 15 (+2 TP, +13 TN) and 13 (+3 TP, +10 TN) cases, respectively. In the testing and the internal and external validation sets, the combined model had increased classification of TN cases compared to the clinical model (+9, +10, +2) with comparable TP cases (0, −1, 0). The results showed that the added benefit of the combined model was in the classification of more low-ALN-burden TN cases.
Three representative cases in Figure 5 illustrate the practical utility of the radiomics model, in which ALN burden was accurately stratified by the Radscore, providing MRI-based decision support.

3.4 | Assessment of Variable Importance
Shapley Additive Explanations analysis identified the predominant factors influencing the RF classification model. Figure 6 presents the SHAP summary plot, which provides a visual representation of the results and illustrates the ranking of input feature importance. The positive SHAP value for a feature indicates its association with a higher risk of high-ALN tumor burden.

3.5 | Calibration Curves and Decision Curve
Calibration curves of the RF model demonstrated strong concordance between classified probabilities and observed outcomes, with bias curves closely aligning with the ideal reference line. Decision curve analysis (DCA) indicated that the RF model provided greater net clinical benefit than either the treat-all or treat-none strategies in distinguishing high from low ALN burden (Figure S2).
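Decision curve analysis compares strategies by their net benefit at a threshold probability pt, using the standard definition TP/n − (FP/n) × pt/(1 − pt). The sketch below applies that definition; the model scores are toy values, while the 66 high-burden and 277 low-burden counts are taken from the cohort description.

```python
def net_benefit(scores, labels, pt):
    """Net benefit of treating cases whose predicted probability is >= pt."""
    n = len(labels)
    tp = sum(1 for s, y in zip(scores, labels) if s >= pt and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= pt and y == 0)
    return tp / n - (fp / n) * pt / (1 - pt)

def net_benefit_treat_all(labels, pt):
    """Net benefit of treating everyone; treat-none is 0 by definition."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * pt / (1 - pt)

# Toy model output for four cases.
print(round(net_benefit([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0], pt=0.25), 4))  # 0.4167

# Cohort composition from the text: 66 high-burden and 277 low-burden patients.
cohort_labels = [1] * 66 + [0] * 277
print(net_benefit_treat_all(cohort_labels, pt=0.10) > 0)  # True
```

The treat-all strategy has positive net benefit only while pt is below the prevalence (66/343, about 0.19), so a model adds value mainly by staying above both reference strategies across the clinically relevant thresholds.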

3.6 | Survival Analysis
Three Kaplan-Meier analyses were performed to evaluate the prognostic classification: one based on the ground truth histological nodal burden (high with 3 or more nodes vs. low with 0, 1 or 2 nodes), one based on the high vs. low burden classified by the combined model, and one based on the positive vs. negative MRI-ALN status determined by the radiologist’s assessment. Patients in the training and testing sets had more than 5 years of follow-up and were used for this analysis. Kaplan–Meier survival analysis demonstrated that patients in the histological low-ALN burden group exhibited longer DFS compared to the high-ALN group (log-rank test, p = 0.0053, HR: 3.39) (Figure 7A). When patients were stratified into high- and low-ALN burden groups based on the classification from the combined model, survival was similarly significantly improved in the classified low-burden group compared to the high-burden group (p = 0.022, HR: 2.90) (Figure 7B). When patients were stratified by the radiologist’s MRI ALN status, there was no significant difference (p = 0.24, HR: 1.85) (Figure 7C).
In the internal and external validation sets, patients had much shorter follow-up, and the progression events were rare. There were only 2 events among 50 patients in the internal validation set and 3 events among 50 patients in the external validation set.
In the multivariate Cox regression analysis, Radscore remained a significant independent predictor of DFS after adjustment for clinical covariates (HR = 1.739, 95% CI = 1.129–2.678, p = 0.012) (Table S4).
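The reported hazard ratio, confidence interval, and p-value can be related to one another with a short consistency check. The sketch assumes the interval is a 95% Wald interval on the log-hazard scale, which is the usual Cox model output but is not stated in the text; the function name is hypothetical.

```python
import math

def wald_summary_from_hr(hr, ci_low, ci_high, z_crit=1.959964):
    """Recover the log-HR, its standard error, z statistic and two-sided p-value."""
    beta = math.log(hr)                                        # Cox coefficient
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))                       # two-sided normal p
    return beta, se, z, p

beta, se, z, p = wald_summary_from_hr(1.739, 1.129, 2.678)
print(round(z, 2), round(p, 3))  # 2.51 0.012
```

Under that assumption, the recovered p-value agrees with the reported p = 0.012, and the geometric mean of the interval limits (about 1.739) matches the reported HR.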

4 | Discussion
In this study, the clinical, radiological, and pathological parameters, and MRI-based radiomics features of the primary tumor, were used to develop a classification model for ALN burden. Survival analysis revealed significant differences in DFS between patients with high- and low-ALN burden, underscoring the clinical relevance of the model. It demonstrated that the noninvasive, radiomics-based tool could stratify ALN burden and assist surgical decision-making.
High-ALN burden, typically defined as the presence of three or more positive nodes with pathological macrometastases, is a critical prognostic factor in breast cancer [5]. Clinically, the prediction of ALN status provides critical information for surgical decision-making, from ALND to de-escalation strategies in selected patients. A high-ALN burden is consistently associated with inferior overall survival and increased risk of recurrence [3]. While conventional prognostic markers remain valuable, ALN burden demonstrates strong, independent prognostic value, and these factors are complementary. Their integrated assessment offers a more nuanced and patient-specific stratification, as their prognostic relevance may vary across different clinical and biological contexts [29-31]. Compared with clinical approaches for ALN evaluation, such as SLNB and ALND, which are constrained by notable limitations and associated with procedural invasiveness, accurate preoperative evaluation based on imaging may help in individualized treatment planning and minimize overtreatment.
Radiomics analysis has been applied in some studies to predict the ALN metastasis in breast cancer patients [16, 17, 32, 33], but they did not include survival analysis. Chen et al. [17] employed LASSO for highly regularized feature selection to construct a radiomics-based predictive model of ALN metastasis. Chen et al. [16] developed a radiomics nomogram incorporating deep learning features to predict ALN metastasis, achieving AUCs of 0.80 and 0.71 in the training and validation sets, respectively.
The radiomics features highly correlated with ALN burden may capture microenvironmental processes that promote extensive nodal involvement [34]. The DCE MRI perfusion parameters and the semi-quantitative DCE-MRI features are also potential imaging biomarkers related to tumor angiogenesis and aggressiveness [35, 36]. Identifying parameters associated with the inferior DFS in the high-burden group and favorable DFS in the low-burden group may aid in choosing the optimal axillary surgery [5].
In the current study, 11 radiomics features associated with ALN burden were selected to construct the model. Compared with other machine learning approaches, the RF-based radiomics model exhibited superior diagnostic performance. The combined model integrating clinical and radiomics features exhibited slightly lower AUC values than the radiomics or clinical models in some sets, but the DeLong’s test showed that these differences were not statistically significant. Importantly, the combined model consistently demonstrated superior performance in terms of accuracy, specificity, and sensitivity. The varying degrees of improvement in both PPV and NPV also demonstrated the gains achieved by the combined model. While PPV varied due to differences in prevalence, NPV remained consistently high across all cohorts, enhancing the balance between correctly identifying positive cases and minimizing false positives, thereby improving the model’s clinical applicability.
The radiologist’s assessment of MRI-ALN status is a significant predictor of the ALN burden, which is an expected finding. Nevertheless, the determination of the positive node by three independent readers reached only moderate agreement, and the Kaplan-Meier curve did not show a significant difference between the MRI-ALN positive and negative groups. It is known that an enlarged node may be reactive and may not harbor metastatic cancer. Usually, an ultrasound-guided biopsy is recommended. Therefore, in our study, the radiomics features of the primary tumor provided additional information. As shown in the multivariate analysis, the Radscore was an independent predictor of the nodal burden. This reinforces the robustness of the proposed model and highlights that radiomics features provide complementary information beyond conventional imaging assessment.
Beyond developing a radiomics model to classify ALN burden, prognostic differences between early breast cancer patients with high- versus low-ALN burden were investigated. Survival analysis using the combined model revealed a significant difference in DFS between the two classified groups. A subsequent analysis based on actual postoperative pathological ALN status also revealed a notable difference between patients with high- versus low-ALN burden. In addition, Cox regression analysis indicated that Radscore was an independent predictor of DFS. These findings further demonstrate the model’s potential prognostic value.

5 | Limitations
First, ROIs were delineated only on the primary tumor, not on the axillary nodes. The radiomics analysis requires high-quality images; the determination of the suspicious node is already ambiguous in this study, and a precise ROI drawing is difficult to perform. Future studies using high-quality images in the breast and axillary region may be performed to analyze the combined tumor- and node-based radiomics features. Second, only DCE-MRI sequences were included; incorporating additional MR imaging sequences may further enhance the classification performance for ALN burden. Next, this is a retrospective analysis, which warrants validation in larger and prospective studies. Lastly, the biological interpretability of radiomic features remains limited and requires further investigation.

6 | Conclusion
This study developed and validated an MRI-based radiomics model for classifying axillary lymph node burden in early-stage breast cancer patients. The model demonstrated robust performance across independent sets and showed prognostic value in stratifying patient survival outcomes.

Supplementary Material

Additional supporting information can be found online in the Supporting Information section. Data S1: Supporting Information.

Source: PubMed Central (JATS). Licensing follows the original publisher's policy; please cite the original article.
