
Machine learning-based prognostic model for metastatic breast cancer and its interpretability: a multicenter retrospective study.

Gland Surgery, 2025, Vol. 14(12), pp. 2481-2496

Wang J, Jin SL, Wu JH, Zhu K

Cite this article

APA Wang J, Jin SL, et al. (2025). Machine learning-based prognostic model for metastatic breast cancer and its interpretability: a multicenter retrospective study. Gland Surgery, 14(12), 2481-2496. https://doi.org/10.21037/gs-2025-362
MLA Wang J, et al. "Machine learning-based prognostic model for metastatic breast cancer and its interpretability: a multicenter retrospective study." Gland Surgery, vol. 14, no. 12, 2025, pp. 2481-2496.
PMID 41502591 ↗

Abstract

[BACKGROUND] Prognostic evaluation of metastatic breast cancer (MBC) currently confronts a two-fold challenge: suboptimal accuracy of conventional scoring systems and insufficient clinical interpretability of machine learning models. This study aimed to construct and validate an accurate prognostic model for predicting the overall survival (OS) of patients with MBC in the Surveillance, Epidemiology, and End Results (SEER) database using machine learning (ML) techniques.

[METHODS] A total of 1,385 MBC patients were enrolled from the SEER database and randomly assigned into the training cohort (1,035 cases) and the internal validation cohort (350 cases). An external validation cohort comprising 73 patients from Jiaxing Women and Children's Hospital was also set up. The key characteristics influencing the OS were identified through multivariate Cox regression analysis, and prognostic models were constructed using four ML algorithms.

[RESULTS] The random survival forest (RSF) model achieved the best performance both in the training and internal validation cohorts, with a concordance index (C-index) of 0.723 [95% confidence interval (CI): 0.704-0.740] and 0.727 (95% CI: 0.693-0.761), respectively. Notably, the area under the curve and Brier scores of the RSF model exceeded those of other models, confirming its superior survival prediction performance. The decision curve analysis (DCA) further indicated that the RSF model could effectively predict the 1-, 3-, and 5-year OS, making it ideal for clinical application. In the external validation cohort, the C-index of the RSF model was 0.685 (95% CI: 0.606-0.758), which, although slightly lower compared with that recorded in the training cohort, was more stable. The area under the curve and Brier scores further confirmed high accuracy and calibration power of the model. The SHapley Additive exPlanations (SHAP) analysis revealed that triple-negative breast cancer (TNBC) and brain metastasis were core variables that increased mortality risk.

[CONCLUSIONS] The constructed RSF prognostic model demonstrated excellent predictive performance in MBC survival prediction and achieved good interpretability as confirmed by the SHAP analysis. These findings indicate that the developed model can facilitate prognostic assessment and promote the design of individualized treatments for MBC patients.

Introduction
Breast cancer (BC) is the most prevalent malignant tumor among women globally (1,2). The emergence of multimodal treatment approaches and improvements in socioeconomic status have significantly improved the patient survival rates. Indeed, BC is now considered to be a potentially preventable and treatable disease. Although prior studies have reported that the 5-year survival rate following a surgical procedure for BC exceeds 90%, approximately 20–30% of patients undergoing this procedure develop distant metastases. Research has confirmed that metastatic breast cancer (MBC) is the primary cause of mortality in patients with BC (3,4). Given that MBC is an incurable systemic disease, it imposes a significant psychological and economic burden on patients, their families, and society.
In BC, tumor progression and invasion of distant organs are highly specific processes, with bone, brain, liver, and lung being the most common sites of distant metastasis (4). Although significant advances have been achieved in the treatment of MBC, its 5-year survival rate remains suboptimal at less than 30% (5,6). Currently, individualized treatment strategies for MBC patients are selected primarily on the basis of molecular subtype and metastatic characteristics. Overall survival (OS) is generally predicted using factors such as patient age, molecular subtype, and anatomical sites of metastases. However, this approach yields inconsistent results because the evaluation depends on the clinical experience of the physician (7-9). With the rapid advancement of personalized medicine and the increasing availability of targeted therapies, there is an unprecedented need for precise, individualized prognostic tools that can identify the patients most likely to benefit from specific interventions. Therefore, accurate assessment of survival time and disease severity in MBC patients is crucial for optimizing treatment decisions, improving quality of life, and maximizing survival benefits.
Artificial intelligence (AI) is increasingly applied in the medical field, particularly in medical research, disease prevention, and clinical diagnosis and treatment. Machine learning (ML), a subfield of AI, has significantly improved the diagnostic and therapeutic capabilities of clinicians, making it a robust tool for medical research (10). While recent studies have demonstrated the potential of ML algorithms in predicting BC survival and recurrence, most have focused on early-stage disease, with comparatively few investigations specifically addressing MBC (11). Effective diagnosis and treatment of MBC depend on accurate prognostic assessment, yet research on ML models for MBC prognosis prediction remains relatively limited. In this study, we used data from the Surveillance, Epidemiology, and End Results (SEER) database to construct a precise and robust prognostic model using ML technology. The model’s interpretability was examined using the SHapley Additive exPlanations (SHAP) method to evaluate its potential to identify high-risk patients. We present this article in accordance with the TRIPOD reporting checklist (available at https://gs.amegroups.com/article/view/10.21037/gs-2025-362/rc).

Methods

Sources of data and patient selection
The SEER database (https://seer.cancer.gov) is a comprehensive multi-center tumor registry database that compiles cancer epidemiological data across the United States (12). Medical records of 1,385 MBC patients were obtained from the SEER database between 2010 and 2015 using the SEER*Stat 8.4.1 software. The data samples were randomly assigned in a 4:1 ratio into a training cohort (1,035 cases) and an internal validation cohort (350 cases) using the “rsample” package. These cohorts were used to construct prognostic models. Additionally, an external validation cohort comprising 73 MBC patients treated at Jiaxing Women and Children’s Hospital between 2010 and 2023 was employed, with inclusion and exclusion criteria consistent with those applied to the SEER cohort. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Ethics Review Committee of Jiaxing Women and Children’s Hospital (approval No. 2025-Y-029). Informed consent was waived in this retrospective study. The detailed research process is presented in Figure S1.
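
The random allocation described above can be illustrated with a minimal sketch. The study used the R “rsample” package; the Python version below is only an illustration, and the integer patient IDs, the seed value, and the helper name `split_cohort` are assumptions rather than details from the paper.

```python
import random


def split_cohort(patient_ids, n_train, seed=2025):
    """Shuffle IDs reproducibly, then cut them into training and validation lists."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # the seed is an illustrative assumption
    rng.shuffle(ids)
    return ids[:n_train], ids[n_train:]


# Hypothetical integer IDs standing in for the 1,385 SEER records
train_ids, valid_ids = split_cohort(range(1385), n_train=1035)
print(len(train_ids), len(valid_ids))
```

Fixing the seed makes the allocation reproducible, which matters when the same split must be reused across all four algorithms.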

Inclusion and exclusion criteria
Inclusion criteria: (I) diagnosis age ≥20 years; (II) diagnosed with BC as the first primary tumor; (III) pathological diagnosis of invasive ductal carcinoma [International Classification of Diseases for Oncology third edition (ICD-O-3) 8500/3]; (IV) a history of chemotherapy and surgical treatment; (V) confirmed occurrence of distant metastasis (including bone, brain, liver, lung); (VI) complete clinicopathological profiles and longitudinal follow-up information.
Exclusion criteria: (I) male patient diagnosed with BC; (II) patients with multiple primary tumors; (III) absence of invasive ductal carcinoma; (IV) patients with non-target metastasis sites; (V) patients with incomplete clinicopathological profiles and follow-up data.

Variable collection
The following patient data were retrieved from the database: age, pathological type, tumor location, histological grade, tumor (T) stage, node (N) stage, radiotherapy, molecular subtypes, metastasis sites, survival time, and survival outcomes. Tumor staging was based on the guidelines established by the 7th edition of the American Joint Committee on Cancer tumor-node-metastasis (AJCC TNM) staging system. Notably, BC molecular subtypes were classified according to the hormone receptor (HR) and human epidermal growth factor receptor 2 (HER2) status, and categorized as follows: Luminal (HR+/HER2−), HER2-positive (HR+ or HR−/HER2+), and triple-negative breast cancer (TNBC) (HR−/HER2−). Based on the distant metastasis sites, patients were classified into four single-organ metastasis groups: bone, brain, liver, and lung.

Outcome measurement
The primary outcome was OS, which was defined as the time from diagnosis to death from any cause, or the time of the last follow-up. Censored data involved patients who were lost to follow-up or were alive at the end of the follow-up period.
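
As a hedged illustration of this outcome definition, the sketch below encodes each patient as a (time, event) pair, with event = 1 for death from any cause and event = 0 for a censored observation. The dates, the month conversion factor, and the function name `overall_survival` are hypothetical and are not taken from the study data.

```python
from datetime import date


def overall_survival(diagnosis, last_contact, death=None):
    """Return (months, event): event is 1 for death from any cause, 0 if censored."""
    end = death if death is not None else last_contact
    months = (end - diagnosis).days / 30.4375  # average calendar month length
    return round(months, 1), int(death is not None)


# Hypothetical patients: one who died, one alive at last follow-up (censored)
dead = overall_survival(date(2012, 3, 1), date(2014, 9, 1), death=date(2014, 9, 1))
alive = overall_survival(date(2013, 1, 15), date(2015, 12, 31))
print(dead, alive)
```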

Development and validation of prognostic models
This study employed univariate and multivariate Cox regression analyses to identify characteristics significantly associated with survival outcomes. Subsequently, collinearity analysis was performed using the “car” package to exclude features with a variance inflation factor (VIF) exceeding 5. Four supervised ML algorithms, namely random survival forest (RSF), Cox proportional hazards (CoxPH), eXtreme gradient boosting (XGBoost), and decision tree (DT), were used to establish and validate the models. Each algorithm has distinct characteristics: RSF is suitable for high-dimensional and nonlinear data; CoxPH is highly interpretable and is therefore suitable for low-dimensional data; DT can capture nonlinear relationships without the need for data standardization; and XGBoost shares these characteristics with DT while offering high computational efficiency and a strong ability to model complex nonlinear relationships (13-16). Model hyperparameters were optimized on the training cohort using grid search via the “mlr3tuning” package, resulting in the selection of the optimal hyperparameter configuration. The hyperparameter search space and tuning results are shown in Table S1 (available online: https://cdn.amegroups.cn/static/public/gs-2025-362-1.xlsx). The predictive performance of the models was evaluated using the concordance index (C-index), while time-dependent receiver operating characteristic (ROC) curves, area under the curve (AUC), calibration curves, Brier scores, and decision curve analysis (DCA) were used to assess discriminative ability, calibration, and clinical utility. The most effective and reliable model was further assessed through external validation. Patients were divided into low- and high-risk groups based on the median risk score of the model, and OS was compared between the risk groups using Kaplan-Meier (KM) survival curves.
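
To make the primary metric concrete, the following is a minimal sketch of Harrell's C-index for right-censored data. It is not the mlr3 implementation used in the study; the function name and the four-patient toy data are assumptions chosen so the result can be checked by hand.

```python
def harrell_c_index(times, events, risks):
    """Fraction of comparable patient pairs in which the higher risk died earlier."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is anchored on a patient with an observed death
        for j in range(n):
            if times[i] < times[j]:  # patient j outlived patient i
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    return concordant / comparable


# Toy data: survival in months, event flag (1 = death), predicted risk score
times = [5, 10, 12, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]
c_index = harrell_c_index(times, events, risks)
print(round(c_index, 3))  # 4 of 5 comparable pairs are concordant -> 0.8
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives a scale for reading the reported values of roughly 0.68 to 0.73.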

Interpretability analysis
The model’s interpretability was explored through SHAP analysis to reveal its internal decision-making mechanisms. SHAP is a unified framework that uses Shapley values to quantify the contribution of each input variable to the model’s predictions. This approach provided a visual analysis of the role of each feature in the model’s decisions (17,18).
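
The idea behind Shapley values can be sketched by exact enumeration on a toy model. This is not the SHAP procedure applied to the RSF model in the study: the scoring function `toy_risk`, its coefficients, and the three binary features are hypothetical, and "absent" features are simply set to a baseline value, which is a simplifying assumption.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Average each feature's marginal contribution over all feature orderings."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = instance[name]  # switch this feature to the patient's value
            updated = predict(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_risk(x):
    """Hypothetical risk score with one interaction term (not the study's model)."""
    return (0.2 + 0.3 * x["tnbc"] + 0.25 * x["brain_met"]
            + 0.1 * x["tnbc"] * x["brain_met"] + 0.05 * x["age_over_60"])


patient = {"tnbc": 1, "brain_met": 1, "age_over_60": 1}
reference = {"tnbc": 0, "brain_met": 0, "age_over_60": 0}
phi = shapley_values(toy_risk, patient, reference)
print({name: round(value, 3) for name, value in phi.items()})
```

The contributions sum to the difference between the patient's score and the reference score, which is the additivity property that waterfall plots rely on.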

Statistical analysis
All statistical analyses were conducted using R version 4.4.0 (https://www.r-project.org/). All clinicopathological characteristics were treated as categorical variables and expressed as frequencies and percentages, with intergroup comparisons conducted using Pearson’s chi-square (χ2) test. The KM method was used to estimate survival curves, and the log-rank test was used to compare survival between groups. The ML prognostic models were constructed using the “mlr3” package (19).
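
For readers unfamiliar with the KM method mentioned above, a minimal sketch of the product-limit estimator follows. The study's analyses were run in R; this Python function and its five-patient toy data are illustrative assumptions only.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed death."""
    curve = []
    survival = 1.0
    death_times = sorted({t for t, e in zip(times, events) if e})
    for t in death_times:
        at_risk = sum(1 for u in times if u >= t)  # still under observation at t
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Toy data: months of follow-up and event flag (1 = death, 0 = censored)
km_curve = kaplan_meier([3, 5, 5, 8, 12], [1, 1, 0, 1, 0])
for month, prob in km_curve:
    print(month, round(prob, 2))
```

Censored patients leave the risk set without lowering the curve, which is why the estimate drops only at months 3, 5, and 8 in this example.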

Results

Baseline characteristics of patients
There were no significant differences in clinicopathological characteristics between the training and internal validation cohorts (P>0.05), indicating a balanced random allocation. However, the training and external validation cohorts differed significantly in all clinicopathological characteristics except OS events (P<0.05), likely reflecting differences in geography, ethnicity, and treatment strategies (Table S2). Bone metastasis was the most common and brain metastasis the least common metastatic site across the cohorts. Notably, the incidence of brain metastasis was 13.7% in the external validation cohort, exceeding that in the training (4.64%) and internal validation (6.00%) cohorts. These observations indicate that patients in the external validation cohort had a distinct metastatic pattern, consistent with regional differences in metastasis types and prevalence. Additionally, analysis of the metastatic patterns of the molecular subtypes across the cohorts (Table S3) revealed that the Luminal subtype had a significantly higher incidence of bone metastasis than the HER2-positive and TNBC subtypes, a trend that was consistent across cohorts. Conversely, TNBC showed a pronounced tendency to metastasize to the brain, which was statistically significant in the training cohort (P=0.02). The incidence of lung metastasis was higher in TNBC than in the Luminal and HER2-positive subtypes, and was markedly increased in the external validation cohort. Collectively, these results revealed significant differences in metastatic patterns among molecular subtypes, providing a framework for developing targeted, personalized treatment strategies.

Selection of prognostic model characteristics
Univariate and multivariate Cox regression analyses revealed that age, histological grade, T stage, N stage, bone metastasis, liver metastasis, lung metastasis, brain metastasis, and molecular subtypes were independent prognostic factors for MBC patients in the training cohort (Table S4). Additionally, collinearity analysis (Figure 1A,1B) demonstrated that the VIFs of all nine characteristics were below 5, indicating no substantial collinearity among them.
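
The VIF threshold can be made concrete with a small sketch based on the standard definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors. The study used the R “car” package on categorical predictors; the pure-Python least-squares helper, the function names, and the numeric toy columns below are assumptions for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def vif(columns, j):
    """Variance inflation factor of column j given the other columns."""
    y = columns[j]
    xs = [[1.0] * len(y)] + [c for k, c in enumerate(columns) if k != j]
    xtx = [[sum(p * q for p, q in zip(u, v)) for v in xs] for u in xs]
    xty = [sum(p * q for p, q in zip(u, y)) for u in xs]
    beta = solve(xtx, xty)  # ordinary least squares via the normal equations
    fitted = [sum(b * u[i] for b, u in zip(beta, xs)) for i in range(len(y))]
    mean_y = sum(y) / len(y)
    ss_res = sum((obs - fit) ** 2 for obs, fit in zip(y, fitted))
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)
    return ss_tot / ss_res  # equals 1 / (1 - R^2)


# Toy numeric predictors (hypothetical values)
col_a = [1.0, 2.0, 3.0, 4.0, 5.0]
col_b = [2.0, 1.0, 4.0, 3.0, 6.0]
vif_a = vif([col_a, col_b], 0)
print(round(vif_a, 3))
```

A VIF of 1 means a predictor is uncorrelated with the others; the value of about 3.08 here would still fall under the cutoff of 5 used in the study.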

Comparison of the performance across prognostic models
The performance of the ML prognostic models was evaluated in the training and internal validation cohorts. The RSF model performed well across all evaluation metrics (Table S5). In the training cohort, the C-index of the RSF model was 0.723 [95% confidence interval (CI): 0.704–0.740], the highest among all models. Its AUCs at 1, 3, and 5 years were 0.826 (95% CI: 0.791–0.860), 0.805 (95% CI: 0.777–0.832), and 0.775 (95% CI: 0.745–0.805), respectively, confirming its utility in time-dependent prediction (Figure 2A-2C). The RSF model also had the best Brier scores, with 1-, 3-, and 5-year values of 0.093 (95% CI: 0.080–0.105), 0.174 (95% CI: 0.163–0.185), and 0.193 (95% CI: 0.184–0.203), indicating superior calibration (Figure 3A-3D). In comparison, the CoxPH and XGBoost models had slightly lower C-indices in the training cohort, at 0.707 (95% CI: 0.687–0.726) and 0.707 (95% CI: 0.686–0.726), respectively. These two models also had slightly lower AUCs and slightly higher Brier scores than the RSF model, indicating somewhat poorer discrimination and calibration. The DT model performed poorly, with a C-index of 0.680 (95% CI: 0.657–0.700) and relatively low AUCs, indicating limited ability to predict OS; its Brier scores likewise indicated weaker prediction accuracy and calibration. The DCA results showed that the RSF model yielded higher net benefits for 1-, 3-, and 5-year predictions (Figure 4A-4C). In the internal validation cohort, the RSF model again outperformed the other models, with a C-index of 0.727 (95% CI: 0.693–0.761).
The AUCs of the RSF model at 1, 3, and 5 years were 0.737 (95% CI: 0.654–0.820), 0.798 (95% CI: 0.751–0.846), and 0.783 (95% CI: 0.731–0.835), respectively, which were significantly higher than those of the other models (Figure 5A-5C). The Brier scores of the RSF model at 1, 3, and 5 years were 0.099 (95% CI: 0.077–0.122), 0.181 (95% CI: 0.162–0.199), and 0.196 (95% CI: 0.180–0.212), indicating better calibration at each time point (Figure 6A-6D). The DCA further confirmed that the RSF model consistently yielded greater net benefits at the various time points (Figure 7A-7C). These indicators suggest that the RSF model had superior predictive performance in both the training and internal validation cohorts and can potentially provide accurate survival predictions in the short term (1 year) and the medium to long term (3 and 5 years), indicating robustness. Consequently, the RSF model was selected as the final prognostic model. External validation is commonly used to evaluate the generalization ability of a model. The C-index of the RSF model in the external validation cohort was 0.685 (95% CI: 0.606–0.758), which was lower than in the SEER cohorts but remained at a relatively high level, supporting its generalization capability. The AUCs at 1, 3, and 5 years were 0.857 (95% CI: 0.775–0.940), 0.824 (95% CI: 0.730–0.919), and 0.798 (95% CI: 0.662–0.934), respectively, showing accurate prediction at different time points. The Brier scores further confirmed the model’s performance in predicting OS. Finally, patients were divided into high- and low-risk groups based on the median risk score of the RSF model, and survival analysis was performed on these groups using KM curves and log-rank tests. The results demonstrated significant differences in OS between the high- and low-risk groups in each cohort (P<0.05), further validating the accuracy and practicality of the RSF model (Figure 8A-8C).
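
The net benefit reported by DCA can be sketched with the standard formula NB = TP/n - (FP/n) × pt/(1 - pt), where pt is the threshold probability. The version below treats the outcome at a fixed horizon as binary and ignores censoring, which is a simplification relative to the survival-time DCA in the study; the function name and the five-patient toy data are hypothetical.

```python
def net_benefit(predicted_risk, died, threshold):
    """Net benefit of acting on patients whose predicted risk reaches the threshold."""
    n = len(died)
    flagged = [(p >= threshold, bool(d)) for p, d in zip(predicted_risk, died)]
    true_pos = sum(1 for hit, d in flagged if hit and d)
    false_pos = sum(1 for hit, d in flagged if hit and not d)
    return true_pos / n - false_pos / n * threshold / (1 - threshold)


# Toy data: predicted probability of death by the horizon and observed status
toy_risks = [0.9, 0.7, 0.4, 0.2, 0.1]
toy_died = [1, 1, 0, 1, 0]
nb_model = net_benefit(toy_risks, toy_died, threshold=0.5)
nb_treat_all = net_benefit([1.0] * 5, toy_died, threshold=0.5)
print(round(nb_model, 3), round(nb_treat_all, 3))  # 0.4 versus 0.2
```

A decision curve repeats this calculation over a range of thresholds and compares the model with the "treat all" and "treat none" (net benefit 0) strategies.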

Interpretability analysis of prognostic models based on SHAP
The interpretability of the model was examined using the SHAP method to clarify the internal decision-making mechanism of the RSF model. The Shapley values of the features and the feature-importance plot indicated that molecular subtype was by far the most important feature, whereas bone metastasis had the least effect on outcomes (Figure 9A). However, a ranking of feature importance alone cannot fully reveal the decision-making mechanism of the model, because it does not show the direction of each feature’s influence on the predictions. We therefore examined the contribution of each feature category using boxplots of the categorical characteristics. The results showed that TNBC and brain metastasis were the most critical features affecting outcomes (Figure 9B). Partial dependence plots showed how changes in feature values influenced the predictions over time, further clarifying the complex relationship between the predicted outcomes and the characteristics (Figure S2). Finally, SHAP waterfall plots identified the main features affecting individual prognosis. For instance, in patient 702, molecular subtype and brain metastasis were negatively associated with prognosis, while the other indicators had a positive effect (Figure 9C).

Discussion
MBC arises when BC cells spread to other organs through the blood, the lymphatic system, or direct infiltration. These cells continue to proliferate and progress at the metastatic sites, manifesting as advanced BC (4). With the aging of the population and the prolonged survival of BC patients, the incidence of MBC has gradually increased, creating a significant clinical challenge. Currently, there is no cure for MBC, and treatment primarily aims to prolong OS and alleviate suffering (20). Accurate prediction of survival time has therefore become a key issue for patients. If the survival time and disease severity of MBC patients could be assessed precisely from routine clinicopathological characteristics, this would provide an important reference for patients and their families and support individualized treatment decisions. Currently, the TNM staging system proposed by the AJCC is the most widely used prognostic assessment tool (21). However, it considers only the anatomical characteristics of tumors and fails to capture the full condition of MBC patients, even though the biological properties of tumors and individual patient factors also have important effects on prognosis. A more precise prediction tool is therefore urgently needed to inform individualized treatment for MBC patients. Recently, several studies have explored the prognosis of MBC patients using the SEER database. Ning et al. constructed a nomogram for predicting OS in MBC patients based on the SEER database and used an external validation cohort from China to confirm its prognostic value. The C-index of the nomogram was 0.688 in the training cohort and 0.875 in the validation cohort, indicating high predictive accuracy. Additionally, Nie et al. constructed a nomogram for predicting OS in patients with brain metastases, while Xie et al. developed a nomogram for evaluating OS in patients with lung metastases (22-24). These studies provide an important reference for clinical practice, but they commonly suffer from insufficient transparency and a lack of interpretability, which limits their widespread application in clinical settings. Therefore, this study integrated various prognostic factors with ML technology and employed the SHAP method for interpretability analysis, aiming to construct a prognostic model with both high predictive accuracy and good interpretability.
Age, histological grade, T stage, N stage, bone metastasis, liver metastasis, lung metastasis, brain metastasis, and molecular subtypes were each independent prognostic factors for OS in MBC patients. Tumor biology differs substantially between young and older patients with BC. Older patients often have more comorbidities, reduced tolerance to systemic therapy, and less favorable treatment responses. Their treatment options are therefore significantly limited, and they are more prone to adverse drug reactions. This study shows that patients over 60 years of age have poorer OS, which is consistent with the findings of Xiao et al. (25). Histological grade also has a significant impact on the prognosis of MBC patients. The risk of death is significantly increased in patients with grade III/IV disease, which is consistent with the study by Ning et al. (23,26). As measures of tumor burden, higher T and N stages, reflecting larger primary tumors and greater nodal involvement, predicted poorer survival. This observation is consistent with the findings of Yang et al. (27). Additionally, this study confirmed that the site of metastasis has a significant impact on prognosis. Bone metastasis has a relatively small impact on the survival of MBC patients, whereas liver and lung metastases portend worse outcomes. Brain metastasis is considered the final stage of MBC; chemotherapy drugs have limited effects on brain metastases and the prognosis is extremely poor, in line with recent findings (22,23). Molecular subtypes capture key aspects of tumor biology and determine the responsiveness of MBC patients to chemotherapy, endocrine therapy, and targeted therapy (28). The results of this study indicate that HER2-positive patients have a better prognosis, potentially because of their sensitivity to HER2-targeted agents, while TNBC patients have the poorest outcomes, which is related to the lack of endocrine and HER2-targeted therapy options for this subtype (29,30). A previous study has shown that different molecular subtypes have different metastatic characteristics and organ preferences (6). In this study, Luminal patients had the highest incidence of bone metastasis and the lowest incidence of brain metastasis. Conversely, TNBC patients were more likely to develop brain metastasis, potentially owing to their higher epidermal growth factor receptor expression (31). These results indicate that molecular subtypes play a significant role in predicting distant metastasis sites and identifying high-risk patients. Accordingly, this study developed four ML prognostic models to predict OS in MBC patients based on these characteristics. Across the cohorts, the RSF model demonstrated high accuracy and stability in 1-, 3-, and 5-year survival predictions, and the DCA showed that it had better clinical applicability. Interpretability analysis using the SHAP method further demonstrated the impact of the clinicopathological characteristics on the predictions of the RSF model and showed that TNBC and brain metastasis played key roles in survival prediction.
This study has several limitations that need to be acknowledged. First, it is a retrospective study with limited data, and the SEER database lacks systematic treatment information (including chemotherapy, targeted therapy, and endocrine therapy), which may affect survival outcomes and thus limit the model’s application in personalized treatment plans. Second, the external validation cohort was drawn from a single center, had a small sample size, and differed significantly from the training cohort, which may affect the model’s generalizability and applicability. Finally, while four ML prognostic models were employed in this study, the results of each model may be influenced by algorithm settings and parameter choices, potentially introducing bias. Therefore, although the model exhibited high predictive accuracy, additional validation and refinement in larger, more diverse patient cohorts are necessary to enhance its generalizability and clinical applicability.

Conclusions
In conclusion, this study combined the SEER database with ML technology to develop an RSF-based prognostic tool for OS in patients with MBC. The model demonstrated high predictive accuracy and stability across different cohorts, particularly in time-dependent predictions. Using the SHAP method, we showed the decisive impact of key characteristics such as TNBC and brain metastasis on patient prognosis, providing a tool with significant clinical potential for the development of targeted and personalized therapeutic strategies. Looking ahead, the integration of multidimensional data (such as genomic profiles, tumor microenvironment features, and therapeutic interventions), together with ongoing advances in ML, is expected to further improve the predictive accuracy and generalizability of prognostic models. With the continued incorporation of AI into clinical practice, ML-based survival prognostic models are likely to play an increasingly important role in precision medicine, offering more reliable support for individualized treatment planning and thereby contributing to improved patient outcomes.

Supplementary
The article’s supplementary files as

Source: PubMed Central (JATS). Licensing follows the original publisher's policy; please cite the original article when quoting.
