
Triple-negative breast cancer survival outcomes: prognostic model validated with SEER database.

Discover oncology, 2026, Vol. 17(1), p. 258

Gao H, Yang J, Li Y

Cite this article

APA Gao H, Yang J, Li Y (2026). Triple-negative breast cancer survival outcomes: prognostic model validated with SEER database. Discover oncology, 17(1), 258. https://doi.org/10.1007/s12672-025-04251-y
MLA Gao H, et al. "Triple-negative breast cancer survival outcomes: prognostic model validated with SEER database." Discover oncology, vol. 17, no. 1, 2026, p. 258.
PMID 41521352

Abstract

[BACKGROUND] Triple-negative breast cancer (TNBC) lacks targeted therapies and precise prognostic tools. This study developed a prognostic nomogram integrating clinicopathological factors and treatment response dynamics to improve survival prediction.

[METHOD] Data from 2,978 TNBC patients (SEER database, 2000–2020) were analyzed. Independent prognostic factors were identified via Cox regression. A nomogram incorporating race, AJCC N/M stage, tumor size, surgery type, and pathological response (pCR/pPR/pNR) was constructed. Performance was evaluated using C-index, ROC-AUC, calibration, decision curve analysis (DCA), and compared to AJCC-TNM staging.

[RESULT] Multivariate analysis identified N3 stage (HR = 4.13), M1 stage (HR = 1.77), tumor size ≥ 90 mm (HR = 1.84), mastectomy (HR = 1.28), and pathological non-response (pNR, HR = 6.87) as independent risk factors (all P < 0.05). The nomogram achieved superior discrimination (C-index: 0.780 [training], 0.773 [validation] vs. TNM's 0.715–0.720). AUCs for 1-/3-/5-year survival were 0.858/0.823/0.820 (training) and 0.864/0.802/0.799 (validation). Calibration errors were < 5% for 1–3-year predictions. DCA demonstrated a 7–10% net benefit increase over TNM staging, with 3.9 additional correct decisions per 100 patients at the 40% risk threshold.

[CONCLUSION] This nomogram dynamically integrates pathological treatment response, significantly outperforming TNM staging (ΔC-index = +0.066). It enables personalized risk stratification and clinical decision-making, particularly for guiding therapy intensification in high-risk subgroups (e.g., N3/pNR). Future models should incorporate molecular biomarkers (e.g., PD-L1, BRCA) and socioeconomic variables to enhance precision.

[SUPPLEMENTARY INFORMATION] The online version contains supplementary material available at 10.1007/s12672-025-04251-y.

Introduction
In 2020, the World Health Organization cancer monitoring center released global cancer data reporting more than 2.2 million breast cancer cases, accounting for 11.7% of all cancer cases, and more than 680,000 breast cancer deaths, accounting for 6.9% of all cancer deaths [1]. Triple-negative breast cancer (TNBC) is a subtype that accounts for 15–20% of all breast cancers. It is characterized by the absence of estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) expression [2]. Consequently, TNBC does not benefit from current hormone therapy or HER2-targeted therapy [3], and the 5-year survival rate after diagnosis is only 77% [4]. TNBC is highly aggressive: about 46% of patients develop distant metastasis [5], median survival after metastasis is only 13.3 months, and the postoperative recurrence rate is as high as 25% [6, 7]. Metastases usually involve the brain and visceral organs, and distant metastasis typically occurs around the third year after diagnosis [8]. By comparison, the 5-year survival rate for other breast cancer subtypes is higher, at 91% [4]. Because treatment options for TNBC remain very limited, many studies have explored the prognostic factors affecting patients with TNBC and built predictive models to guide clinical treatment and prognosis.
Recent advances in systemic therapy have expanded treatment options for TNBC. Immune checkpoint inhibitors (e.g., pembrolizumab) combined with chemotherapy have improved pathological complete response (pCR) rates and survival outcomes in PD-L1-positive TNBC [9]. In BRCA-mutated patients, poly(ADP-ribose) polymerase (PARP) inhibitors such as olaparib exploit synthetic lethality by targeting DNA repair pathways; in the adjuvant setting this resulted in a 70% lower risk of disease progression or death compared with placebo (median follow-up, 41 months) [10]. Carboplatin added to paclitaxel and followed sequentially by an anthracycline-based regimen increased the pCR rate from 41% to 54% [11]. Antibody-drug conjugates (ADCs) such as sacituzumab govitecan have shown significant survival benefit in metastatic TNBC [12, 13]. In addition, ctDNA-based dynamic surveillance offers new directions for the early identification of minimal residual disease, and individualized immunotherapy strategies based on tumor mutation burden (TMB) are being explored [14]. Despite these advances, the limited precision of prognostic tools remains a critical barrier: current models often fail to integrate emerging biomarkers or treatment-specific responses [15]. This disconnect limits the ability to stratify patients for emerging therapies and highlights the urgent need for dynamic prognostic tools aligned with modern therapeutic paradigms.
The use of public databases for prognostic modeling has become a major focus of clinical research. Among these, the Surveillance, Epidemiology, and End Results (SEER) database is an important data source for oncology research. It is a large-scale cancer registry system that has collected data since 1973, covering cancer cases in several U.S. states and regions and representing approximately 35% of the population. The database contains demographic information (age, sex, race, and region of residence), tumor characteristics (primary site, stage, size, grade, and histologic type), treatment information (surgery, radiotherapy, chemotherapy, and other treatments), and survival data (time of diagnosis, survival time, and survival status) [16]. However, despite the advances in risk stratification achieved by SEER-based models, they frequently overlook pivotal variables, including molecular subtypes, detailed chemotherapy protocols, and socioeconomic factors that influence access to healthcare. For example, racial disparities in TNBC outcomes, which are associated with genomic heterogeneity and systemic inequities, are inadequately addressed in current models, underscoring the need for integrated methodologies.
This study aims to address these gaps by constructing and validating a prognostic nomogram for TNBC using SEER data. By incorporating variables such as pathological response to neoadjuvant therapy and surgical approach, the model seeks to bridge the divide between traditional staging systems and the evolving therapeutic landscape. The work aims both to improve individualized prognosis and to provide a framework for future integration of molecular and socioeconomic data, in line with the goals of precision oncology. Some limitations should nevertheless be acknowledged: while leveraging SEER's population-level data, the study is constrained by limited treatment granularity and the absence of molecular profiling, which future prospective studies should address.

Materials and methods

Data sources and patient populations
Data on TNBC patients were obtained from the SEER database (https://seer.cancer.gov/) using the Incidence-SEER Research Data, 17 Registries, Nov 2022 Sub (2000–2020) and SEER*Stat software (version 8.4.3). Because the 17-registry data contain no personally identifiable information, institutional review board approval and informed consent were not required. Figure 1 shows the workflow of this study.

Selection of key clinical features and data processing
This study included patients who met the following criteria: aged 18–80 years; female; diagnosed with breast cancer according to the International Classification of Diseases for Oncology, third edition (ICD-O-3); and negative for ER, PR, and HER2 expression. After screening, clinical data from 3,775 patients were collected. These data included patient ID, age, race, year of diagnosis, primary tumor site, grade, laterality, AJCC stage (American Joint Committee on Cancer, 7th edition), T stage, N stage, M stage, tumor size, primary surgery, sequence of chemotherapy and surgery, chemotherapy record, sequence of radiotherapy and surgery, radiation therapy record, response to neoadjuvant therapy, survival time, and survival status.
During data processing, patients were excluded according to the following criteria: unknown race, grade, AJCC stage, laterality, or tumor size; no surgery or incomplete surgical information; and no record of chemotherapy. In total, 2,978 patients were included in the study. Their baseline characteristics, grouped by three-year overall survival, are shown in Table 1.

Statistical analysis
Statistical analyses were performed with IBM SPSS Statistics 25 and R software (version 4.3.3) in RStudio. In R, patients were randomly divided into training and validation sets at a ratio of 7:3 using the "sample" and "sort" functions, so that outcome events were randomly distributed between the two datasets (Table S 1). The training set was used to select variables and build models, and the validation set was used to validate the results from the training set. Univariate and multivariate Cox analyses were performed on 12 variables in the training set using the autoReg package in R, and variables with P < 0.05 were identified as independent risk factors. Nomograms were constructed from the independent risk factors identified in the univariate and multivariate Cox analyses. Harrell's concordance index (C-index) was used to evaluate the predictive power of the nomogram model and the TNM staging system model. To rigorously evaluate model performance and mitigate concerns about overfitting, comprehensive internal validation was conducted using bootstrap resampling.
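The random 7:3 split described above can be sketched in Python as follows. This is a minimal illustration, not the authors' R code: the function name `split_train_validation`, the seed, and the use of the integers 1 to 2,978 as patient IDs are all hypothetical. Flooring 70% of 2,978 gives 2,084 training patients, which matches the training-set size reported for the bootstrap validation.

```python
import random

def split_train_validation(patient_ids, train_fraction=0.7, seed=2024):
    """Randomly assign patients to training and validation sets.

    Hypothetical helper; the seed is an arbitrary illustrative choice.
    """
    rng = random.Random(seed)
    ids = list(patient_ids)
    n_train = int(len(ids) * train_fraction)  # floor: 2978 * 0.7 -> 2084
    train = sorted(rng.sample(ids, n_train))  # draw without replacement, then sort
    chosen = set(train)
    validation = [i for i in ids if i not in chosen]  # everyone not drawn
    return train, validation

train_ids, valid_ids = split_train_validation(range(1, 2979))
print(len(train_ids), len(valid_ids))
```

Sorting the sampled IDs mirrors the "sample then sort" pattern named in the text; it does not change which patients are assigned to each set.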
Internal validation was conducted to quantify and correct for overoptimism. The discriminative performance of the final nomogram, as measured by the C-index, was validated using 1,000 bootstrap iterations on the training set (n = 2,084). In each iteration, the model was refitted and the C-index recalculated to obtain an optimism-corrected estimate. Bootstrap samples were generated by random sampling with replacement from the original training dataset (n = 2,084), preserving the 7:3 training-validation split ratio. The proportional hazards assumption was evaluated using Schoenfeld residuals with a Kaplan–Meier time transformation, and significant violations (P < 0.05) were addressed by stratifying on the affected categorical predictors. Final models were checked with residual diagnostics and graphical assessment. The predictive performance of the two models was further evaluated using receiver operating characteristic (ROC) curves and the area under the curve (AUC). AUC values range from 0 to 1, with values closer to 1 indicating better discrimination. Calibration plots were used to compare the agreement between predicted and observed probabilities. Calibration performance was quantified with E_avg (mean absolute error) and E_max (maximum absolute error). E_avg was calculated as the mean absolute difference between predicted survival probabilities and Kaplan–Meier estimates across predefined bins generated by bootstrap resampling (1,000 iterations, 400 bins), and E_max as the maximum absolute difference across all bins. Both metrics are expressed in probability units (e.g., 0.015 = 1.5%). Finally, decision curve analysis (DCA) was used to evaluate the clinical usefulness of the nomogram and to assess whether it provides greater net benefit than the AJCC TNM staging system.
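The optimism correction described above follows the general bootstrap scheme associated with Harrell. Refit the model on each bootstrap sample, and measure how much better it scores on that sample than on the original data. Then subtract the average of this optimism from the apparent performance. The sketch below is generic and hypothetical. `fit` and `score` stand in for the Cox model fit and the C-index calculation, and the seed is an assumption. Only the 1,000-iteration default comes from the text.

```python
import random
from statistics import mean

def optimism_corrected(data, fit, score, n_boot=1000, seed=1):
    """Bootstrap optimism-corrected performance (Harrell-style sketch).

    data: a sequence of rows (e.g. patients).
    fit(rows) -> model; score(model, rows) -> performance such as a C-index.
    Both callables are caller-supplied placeholders.
    """
    rng = random.Random(seed)
    apparent = score(fit(data), data)  # model evaluated on its own training data
    optimism = []
    for _ in range(n_boot):
        boot = [rng.choice(data) for _ in data]  # resample with replacement, same size
        model = fit(boot)  # refit on the bootstrap sample
        # Optimism: performance on the bootstrap sample minus performance on the original data
        optimism.append(score(model, boot) - score(model, data))
    return apparent - mean(optimism)

# Toy check: a score that ignores the data has zero optimism.
corrected = optimism_corrected(list(range(50)), fit=lambda rows: None,
                               score=lambda model, rows: 0.78, n_boot=200)
print(corrected)
```

In a real analysis, `fit` would fit the Cox model and `score` would compute Harrell's C-index. A positive average optimism lowers the corrected estimate below the apparent one.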

Results

Clinicopathologic characteristics
From 2000 to 2020, SEER recorded 3,775 TNBC patients, of whom 2,978 met the inclusion criteria. Table 1 shows the baseline characteristics of the study population, with patients divided into two groups according to three-year survival: 77.5% survived longer than 3 years and 22.5% survived less than 3 years. Patients aged < 40 years, 40–60 years, and > 60 years accounted for 19.6%, 56.9%, and 23.5%, respectively. Most patients were White (about 70.8%), and 84.7% had grade III tumors. According to the AJCC staging system, most patients were classified as T2 (55.9%), N0 (45%) or N1 (38.3%), and M0 (95.9%). After neoadjuvant therapy, pathological response was distributed as follows: pathological complete response (pCR), 48%; pathological partial response (pPR), 41.7%; and pathological no response (pNR), 10.3%. Overall, 69.3% of patients were alive and 30.7% had died.

Cox univariate and multivariate analysis
Univariate and multivariate Cox regression analyses of overall survival (OS) were performed, and detailed results are presented in Table 2. In the training set, variables that were significant (P < 0.05) in univariate analysis were entered into the multivariate analysis. Multivariate analysis identified six independent prognostic factors for OS: race (Caucasian, HR = 0.69), AJCC N stage (N3, HR = 4.13), AJCC M stage (M1, HR = 1.77), tumor size (90–100 mm, HR = 1.84), surgical approach (non-breast-conserving, HR = 1.28), and pathological response after neoadjuvant therapy (pNR, HR = 6.87). Diagnostics of the initial model revealed violations of the proportional hazards assumption for N stage (χ2 = 5.96, P = 0.0146) and surgical approach (χ2 = 7.44, P = 0.0064). The final model was therefore stratified by these two variables. After stratification, the global test of the proportional hazards assumption was no longer significant (χ2 = 11.8, P = 0.54), indicating that the assumption was met (Figure S 1, Figure S 2).
To visualize the relationship between these categorical variables and survival time, Kaplan–Meier survival curves were plotted for the six variables: race, AJCC N stage, AJCC M stage, tumor size, surgical approach, and pathological response after neoadjuvant therapy (Fig. 2). Log-rank tests were used to compare survival distributions among groups. Only race showed borderline significance (P = 0.051), whereas all other variables showed statistically significant differences in overall survival (P < 0.05), highlighting their differential impact on patient survival. For instance, Fig. 2B shows survival curves by N stage (N0, N1, N2, N3), with survival probability plotted against time in months. Median survival was not reached in the N0 and N1 groups and was 58 months for N2 and 32 months for N3. The accompanying table gives survival probabilities at 12, 36, and 60 months for each group, and the log-rank P < 0.0001 indicates significant survival differences among groups. Overall, the N0 and N1 groups had relatively high survival probabilities, whereas the N3 group had markedly lower survival.
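The Kaplan–Meier curves and median survival times above are based on the product-limit estimator. Below is a minimal pure-Python sketch. It assumes events are coded 1 for death and 0 for censoring, and its follow-up times are made up rather than taken from SEER. The median is taken as the earliest time at which estimated survival falls to 0.5 or below. "Not reached", as for the N0 and N1 groups, is returned as `None`.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, S(t)) at each distinct death time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group ties at time t
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk  # conditional survival past t
            curve.append((t, surv))
        n_at_risk -= removed  # deaths and censored both leave the risk set
    return curve

def median_survival(curve):
    """Earliest time with S(t) <= 0.5; None means the median was not reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

times = [5, 8, 8, 12, 20, 32, 40, 58, 60, 60]  # months, illustrative only
events = [1, 1, 0, 1, 0, 1, 1, 1, 0, 0]
curve = kaplan_meier(times, events)
print(median_survival(curve))  # 40
```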

Construction and validation of prognostic nomogram
A nomogram is a visual tool for predicting a patient's survival probability from a combination of variables. This study constructed a nomogram from six variables: race, lymph node involvement, distant metastasis, tumor size, surgical type, and pathological response after neoadjuvant therapy. Each level of each variable is assigned a score. Summing the scores for an individual patient gives a total score, from which the nomogram reads off the predicted 1-year, 3-year, and 5-year survival probabilities (Fig. 3). The nomogram thus provides clinicians and patients with a personalized prediction tool that can aid understanding of prognosis and inform treatment decisions.
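The arithmetic behind a Cox-based nomogram can be sketched as follows. Each coefficient is β = ln(HR). Points rescale β so that the largest effect scores 100. Predicted survival is S(t | x) = S0(t)^exp(Σβ). The sketch uses the hazard ratios quoted in the Results, but several of its elements are hypothetical: the 100-point scale, the baseline survival values S0(t), and the choice to omit race (its reference level is not stated here). The study's final model was stratified by N stage and surgical approach. In a stratified model, those factors act through stratum-specific baseline hazards rather than coefficients. This sketch uses the simpler unstratified form and is not the published nomogram.

```python
import math

# Hazard ratios quoted in the Results (reference: N0, M0, 10-30 mm, BCS, pCR).
HAZARD_RATIOS = {"N3": 4.13, "M1": 1.77, "size_90_100mm": 1.84,
                 "mastectomy": 1.28, "pNR": 6.87}
BETAS = {name: math.log(hr) for name, hr in HAZARD_RATIOS.items()}  # Cox coefficients
MAX_BETA = max(BETAS.values())  # pNR has the largest effect

# HYPOTHETICAL baseline survival S0(t) for a reference-level patient (invented values).
BASELINE_SURVIVAL = {1: 0.98, 3: 0.92, 5: 0.88}

def points(factor):
    """Nomogram points: coefficient rescaled so the largest effect scores 100."""
    return 100 * BETAS[factor] / MAX_BETA

def predicted_survival(factors, baseline=BASELINE_SURVIVAL):
    """S(t | x) = S0(t) ** exp(linear predictor) under proportional hazards."""
    lp = sum(BETAS[f] for f in factors)  # linear predictor relative to the reference patient
    return {t: s0 ** math.exp(lp) for t, s0 in baseline.items()}

patient = ["N3", "pNR"]
print(round(sum(points(f) for f in patient), 1))
print({t: round(s, 3) for t, s in predicted_survival(patient).items()})
```

Because points are proportional to β, adding points is equivalent to adding log hazard ratios. This is why a single total-points axis can be mapped onto survival probabilities.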
To compare the predictive performance of the nomogram and the TNM staging model, the C-index was used. It ranges from 0.5 (no better than chance) to 1 (perfect concordance), with higher values indicating stronger discrimination. The C-indices of both models were assessed in the training and validation sets (Table 3). The nomogram achieved a C-index of 0.780 in the training set and 0.773 in the validation set, whereas the TNM model achieved 0.715 and 0.719, respectively. The marginal C-index increase observed in the validation set (Δ = 0.004) falls within the standard error (SE = 0.016), indicating random variation rather than overfitting. Bootstrap validation (1,000 iterations) showed robust discrimination with minimal evidence of overfitting. The original C-index of 0.779 (95% CI 0.774–0.780) in the training cohort remained stable after correction.
The optimistic C-index was 0.779 (95% CI 0.761–0.795), with negligible bias (+0.00006). The corrected C-index remained virtually identical at 0.779 (95% CI 0.774–0.780), showing only marginal degradation (bias = -0.0014). The narrow confidence interval for the corrected C-index (width 0.58%) indicates precise performance estimation. The larger standard error of the optimistic estimate (0.0087 vs. 0.0015 for the corrected estimate) reflects greater variability in resampled training performance. Meanwhile, the tight distribution of corrected values (Fig. 4) supports consistent generalizability. These results suggest that the model maintains a C-index of 0.774–0.780 in unseen populations, above the 0.70 level commonly regarded as the threshold for clinically useful prognostic tools. The nomogram therefore outperformed the TNM model in discriminating survival outcomes. Its performance was comparable between the training and validation sets, demonstrating good stability and generalizability. In contrast, the TNM model had lower C-indices, indicating inferior predictive performance.
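To make the C-index concrete, a minimal pure-Python version of Harrell's estimator for right-censored data is sketched below. It assumes that a higher score means higher predicted risk. Tied risk scores count as one-half. For simplicity, pairs with tied survival times are skipped, and established packages may handle ties differently. The toy data are invented.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is usable when patient i has an observed event and a
    strictly shorter time than patient j. It is concordant when patient i
    also has the higher risk score; tied scores count as 0.5.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a usable pair
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / usable

times = [2, 4, 6, 8]
events = [1, 1, 1, 0]
print(harrell_c_index(times, events, [4, 3, 2, 1]))  # 1.0: higher risk always dies first
```

A model that assigns identical scores to everyone scores 0.5, which is why 0.5 is the no-information baseline mentioned above.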
ROC curves for the training and validation sets were generated at different time points using the 'timeROC' package, with mortality risk treated as a continuous variable (Fig. 5). The ROC curves for the training and validation sets (Figs. 5A and B) evaluate the model's 1-year, 3-year, and 5-year survival predictions. In the training set, the AUC values were 0.858 (95% CI 0.815–0.901), 0.823 (95% CI 0.801–0.846), and 0.820 (95% CI 0.800–0.841), indicating strong discrimination. In the validation set, the AUC values were 0.864 (95% CI 0.792–0.937), 0.802 (95% CI 0.765–0.839), and 0.799 (95% CI 0.767–0.832), indicating good predictive performance, although the 3- and 5-year AUCs were slightly lower than in the training set. Overall, the model effectively predicted survival probabilities at different time points, suggesting potential clinical utility.
The calibration curves of the nomogram (Fig. 6) compare predicted and observed probabilities. Quantitative assessment revealed small calibration errors at all time horizons (Table S 3). In the training set, 1-year predictions were nearly perfectly calibrated (E_avg = 0.006, E_max = 0.009). Calibration errors remained low for 3-year predictions (E_avg = 0.015, E_max = 0.025), and 5-year predictions showed similar precision (E_avg = 0.015, E_max = 0.033). In the validation set, the model generalized well, with E_avg ≤ 0.015 and E_max ≤ 0.036 at all time points. In both sets, observed 1-year, 3-year, and 5-year survival closely matched predicted survival, with data points generally distributed around the ideal line. At the clinically relevant decision threshold of approximately 60% predicted survival, observed Kaplan–Meier estimates differed from model predictions by only 0.7–3.1 percentage points. The calibration slope was close to 1.0, supporting the model's reliability for clinical risk stratification.
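As defined in the Methods, E_avg and E_max reduce to simple arithmetic once each bin has a predicted survival probability and an observed (Kaplan–Meier) estimate. The sketch below assumes those paired values have already been computed. The four bins and the function name `calibration_errors` are invented for illustration.

```python
def calibration_errors(predicted, observed):
    """Return (E_avg, E_max) in probability units for paired bins.

    predicted: model-predicted survival probability per bin.
    observed: observed (e.g. Kaplan-Meier) survival per bin.
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need equal-length, non-empty inputs")
    diffs = [abs(p - o) for p, o in zip(predicted, observed)]
    return sum(diffs) / len(diffs), max(diffs)  # mean and worst-case absolute error

predicted = [0.95, 0.85, 0.70, 0.60]  # illustrative 3-year predictions per bin
observed = [0.94, 0.87, 0.69, 0.63]   # illustrative Kaplan-Meier estimates per bin
e_avg, e_max = calibration_errors(predicted, observed)
print(f"E_avg = {e_avg:.4f}, E_max = {e_max:.3f}")  # E_avg = 0.0175, E_max = 0.030
```

Reading the result in the document's units, an E_max of 0.030 means the worst bin is off by 3 percentage points.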
Finally, decision curve analysis (DCA) was performed for 1-year, 3-year, and 5-year overall survival (OS) predictions (Fig. 7). In the training set, the nomogram yielded a higher net benefit than the TNM staging system across a wide range of risk thresholds for 3-year and 5-year OS. In the validation set, the nomogram's net benefit was comparable to that of the TNM model for 1-year OS but remained favorable for 3-year and 5-year OS. For 5-year OS prediction, the maximum net benefit improvement at the 40% risk threshold was 3.9 additional correct decisions per 100 patients in the training set and 1.6 in the validation set. For 3-year OS prediction, the maximum net benefit improvement was 2.1 (training) and 1.9 (validation) additional correct decisions per 100 patients.
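The "additional correct decisions per 100 patients" figures come from net benefit. At a threshold probability p_t, net benefit is commonly defined as NB = TP/n - (FP/n) × p_t/(1 - p_t). One hundred times the difference in NB between two models gives the net additional true positives per 100 patients. The sketch below simplifies by assuming a binary outcome (death by the time horizon) that is known for every patient. Survival DCA of the kind used in the study must also account for censoring, which this sketch ignores. The function names and toy risks are hypothetical.

```python
def net_benefit(risks, outcomes, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    outcomes: 1 if the event (death by the horizon) occurred, else 0.
    """
    n = len(risks)
    tp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 1)
    fp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)

def extra_per_100(model_risks, reference_risks, outcomes, threshold):
    """Net additional correct decisions per 100 patients, model vs. reference."""
    return 100 * (net_benefit(model_risks, outcomes, threshold)
                  - net_benefit(reference_risks, outcomes, threshold))

outcomes = [1, 1, 0, 0, 0]              # illustrative outcomes only
nomogram = [0.8, 0.6, 0.3, 0.2, 0.5]    # hypothetical predicted risks
tnm = [0.5, 0.3, 0.5, 0.2, 0.6]         # hypothetical reference risks
print(round(extra_per_100(nomogram, tnm, outcomes, 0.40), 1))  # 33.3
```

The weighting term p_t/(1 - p_t) encodes the clinician's stated trade-off. At a 40% threshold, one false positive costs two-thirds of a true positive.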

Model performance for 3-year vs. 5-year overall survival
To directly compare prognostic performance across time horizons, we evaluated the nomogram for both 3-year and 5-year overall survival (OS). The model demonstrated non-inferior discrimination for the 3-year endpoint, with a time-dependent AUC of 0.823 (training) and 0.802 (validation), compared to 0.820 (training) and 0.799 (validation) for the 5-year endpoint. Calibration was marginally better for the 3-year prediction (Validation set E_avg: 0.012) than for the 5-year prediction (E_avg: 0.015). Decision Curve Analysis confirmed clinically useful net benefit for both endpoints against the TNM staging system (Fig. 7).

Discussion
A prognostic nomogram model for triple-negative breast cancer (TNBC) was developed using multivariate Cox regression analysis based on data from the SEER database. The model demonstrated robust predictive performance for overall survival (OS) across both training (C-index: 0.780) and validation (C-index: 0.773) cohorts, outperforming the traditional AJCC TNM staging system (C-index: 0.715–0.720). This model not only enhances individualized risk stratification but also possesses the potential to direct therapeutic escalation or de-escalation in clinical practice. For instance, patients categorized as high-risk (such as those with N3 stage disease or pNR status) may derive benefit from more aggressive treatment regimens, including platinum-based immunotherapy combinations. Conversely, low-risk patient cohorts may avoid unnecessary overtreatment, thus reducing the burden of toxicity.
Univariate and multivariate Cox regression analyses identified race, AJCC N stage, AJCC M stage, tumor size, surgical method, and pathological response after neoadjuvant therapy as independent predictors of OS. Among these, N stage (HR for N3 vs. N0: 4.13), M stage (HR for M1 vs. M0: 1.77), tumor size (HR for 90–100 mm vs. 10–30 mm: 1.84), surgical method (mastectomy vs. breast-conserving surgery [BCS]: HR = 1.28), and pathological response to neoadjuvant therapy (pNR: HR = 6.87) emerged as the most influential prognostic indicators. Kaplan–Meier analyses corroborated these findings, with all variables except race showing statistically significant associations with survival (P < 0.05). Although the racial disparity in survival approached but did not reach statistical significance (P = 0.051), we retained this variable because of its clinical relevance. Non-Hispanic Black women have a higher TNBC incidence and poorer survival outcomes. These outcomes are likely attributable to genomic variation (e.g., differences in BRCA mutation prevalence), epigenetic factors, and systemic healthcare inequities [17–19]. For instance, delayed diagnosis and limited access to guideline-concordant therapies among African American women may partially explain these disparities [20–22]. Prospective studies incorporating socioeconomic variables (e.g., insurance status, geographic accessibility) are needed to disentangle these complex interactions.
Lymph node metastasis (LNM) and distant metastasis (DM) were strongly associated with adverse outcomes, consistent with prior studies [23, 24]. The prognosis of patients with lymph node metastasis is poor, and more than four metastatic axillary lymph nodes indicate a higher risk of distant metastasis [25, 26]. Distant metastasis most commonly involves the liver, lung, bone, and brain, seriously affecting patients' quality of life and prognosis [27–29]. TNBC has the highest probability of distant metastasis among all breast cancer subtypes, and effective treatment of metastatic TNBC remains the greatest challenge in breast cancer treatment [30]. Our study found that lymph node involvement and distant metastasis are independent risk factors for poor prognosis in patients with TNBC. Compared with node-negative disease, the hazard ratios (HRs) for N1, N2, and N3 were 2.16, 2.74, and 4.17, respectively, indicating a worse prognosis in patients with lymph node metastasis. Patients with distant metastasis (M1) also had a worse prognosis.
Tumor size remains a clinically relevant parameter for assessing initial disease burden in TNBC, reflecting the proliferative capacity at diagnosis [31]. However, its prognostic utility appears secondary to molecular biomarkers such as Programmed Death-Ligand 1 (PD-L1) expression and tumor-infiltrating lymphocyte (TIL) levels. While larger tumors correlate with elevated PD-L1 expression, tumor size alone showed no significant association with pathological complete response (pCR) rates in neoadjuvant settings (p > 0.05) [32]. Instead, age > 40 years and high TILs emerged as stronger predictors of pCR achievement, underscoring the growing importance of immunophenotypic profiling over traditional morphometric parameters [33]. These findings advocate for biomarker-integrated prognostic models to replace size-dependent paradigms in TNBC management.
Our prognostic model demonstrates distinct advantages when directly compared to other contemporary SEER-based TNBC nomograms, such as those by Zhou et al. [34] and Qiu et al. [35]. First, regarding the modeling approach, while their models are proficient, they rely exclusively on static baseline and tumor characteristics (e.g., demographic factors, TNM stage at diagnosis). In contrast, our nomogram is uniquely architected to incorporate the dynamic response to neoadjuvant therapy (pCR/pPR/pNR), a pivotal variable reflecting real-world tumor biology and chemosensitivity. Second, in terms of comparative performance, our model achieves a highly competitive C-index of 0.780, which is comparable to the robust performance reported by Zhou et al. (C-index 0.762–0.793 for OS) and Qiu et al. (C-index 0.794), while providing this discrimination within a more dynamic and clinically relevant framework. Third, concerning methodological rigor, we strengthened the robustness of our Cox model by rigorously testing and validating the proportional hazards assumption using Schoenfeld residuals. Upon detecting violations for key variables (N stage and surgery), we addressed these through stratification, ensuring the model's statistical validity—a critical step not commonly emphasized in similar SEER-based studies. Finally, and most importantly, the novelty and added clinical value of our tool lie in its ability to enable real-time risk re-stratification after neoadjuvant therapy. This directly addresses a critical gap in prior static models by informing subsequent clinical decisions at a crucial juncture, such as escalating therapy with adjuvant capecitabine for non-responders (pNR) or considering de-escalation for those with a pCR.
The statistical justification for utilizing 3-year OS as the primary endpoint constitutes a key finding of this study, directly addressing the clinical reality of TNBC's early hazard function. The head-to-head model comparison revealed non-inferior discriminative power and marginally superior calibration for the 3-year endpoint compared to the conventional 5-year benchmark. Critically, the DCA demonstrated substantial and clinically meaningful net benefit for the 3-year prediction, confirming its utility for near-term risk stratification. The numerically higher net benefit for the 5-year prediction is an expected consequence of event accumulation over time and does not diminish the validity of the earlier endpoint. Rather, the robust performance at 3 years ensures that reliable prognostic information is available when the risk of recurrence is highest, thereby enabling timely clinical interventions—such as adjuvant therapy intensification or follow-up planning—that are most relevant to the early-stage TNBC population. This evidence supports 3-year OS as a statistically sound and clinically responsive endpoint for prognostic modeling in this aggressive disease.
Regarding surgery, patients who did not undergo breast-conserving surgery had a poorer prognosis (HR = 1.28, P = 0.007), which supports the effectiveness of breast-conserving surgery in TNBC. However, this benefit is largely attributable to radiotherapy's role in reducing residual tumor burden. Patients receiving breast-conserving therapy (BCT) with radiation achieve 5-year local recurrence rates comparable to those after mastectomy (5% vs. 4.8%) [36, 37]. Moreover, the observed survival disadvantage of mastectomy likely reflects confounding by indication (larger tumor size, aggressive biology, and differential use of radiotherapy) rather than a direct causal effect of the surgical approach itself. In particular, two limitations warrant caution. (1) Residual confounding: unmeasured variables (e.g., comorbidities, socioeconomic barriers to radiotherapy adherence) may influence surgical outcomes. For instance, patients with limited access to postoperative radiotherapy may be disproportionately directed toward mastectomy, potentially biasing the observed survival advantage of BCT [38, 39]. (2) Heterogeneity in radiotherapy protocols: variations in dose, fractionation, and compliance rates across institutions complicate cross-population comparisons [40]. Prospective studies integrating detailed treatment records and social determinants of health are needed to disentangle these interactions and optimize locoregional control strategies.
Because TNBC lacks the targets of endocrine and HER2-directed therapies, chemotherapy and radiotherapy remain critical treatment options [41]. Neoadjuvant therapy has become the preferred systemic approach for locally advanced TNBC, particularly because it can downstage tumors and provides prognostic information through pathological response assessment [42, 43]. However, the SEER database lacks granular details on chemotherapy regimens (e.g., platinum vs. taxane use) and molecular biomarkers (e.g., BRCA mutations, PD-L1 expression), limiting the model's ability to account for molecular heterogeneity or regimen-specific effects. This underscores the need for future research to incorporate multi-omics data and prospective treatment records into prognostic frameworks.
The nomogram also has potential clinical value. We analyzed a large sample drawn from the 17 population-based registries of the SEER database, representing populations in different regions of the United States, and compared the predictive performance of the model with that of the AJCC TNM staging system. The C-index for OS was 0.780 in the training set and 0.773 in the validation set, indicating strong risk stratification ability, compared with 0.715 and 0.719, respectively, for TNM staging. For 1-, 3-, and 5-year OS, the AUCs of the nomogram all exceeded 0.8, whereas those of TNM staging only exceeded 0.7. These results indicate that the nomogram provides more clinical information and better predictive performance than the TNM staging system.
Our DCA (Fig. 7) reveals several clinically important patterns. First, the peak net benefit improvement at the 40% risk threshold for 5-year prediction (3.9 additional correct decisions per 100 patients, Fig. 7) aligns with established thresholds for chemotherapy intensification in TNBC. This suggests that the nomogram could help guide decisions about platinum-based regimens, potentially improving outcomes for high-risk patients [44]. Second, the sustained advantage across thresholds of 20–80% for 3- and 5-year predictions indicates robust utility across the risk spectrum, from de-escalation decisions in low-risk patients to palliative care planning in advanced cases. Third, the attenuated 1-year performance reflects biological reality: early TNBC mortality often involves rapidly progressive disease that is less amenable to risk-stratified intervention [45]. Importantly, the validation cohort preserved more than 2.0 additional correct decisions per 100 patients at multiple thresholds, demonstrating real-world applicability and exceeding the minimum clinically important difference for prognostic tools.
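The two performance metrics quoted above can be sketched in a few lines. Harrell's C-index is the proportion of usable patient pairs in which the patient who died earlier was assigned the higher predicted risk. Decision-curve net benefit at threshold probability pt is TP/n - FP/n × pt/(1 - pt), so a net benefit difference of 0.039 between two models corresponds to 3.9 additional net correct decisions per 100 patients. This is a minimal sketch under stated assumptions: `harrell_c_index`, `net_benefit`, and the toy risk vectors are hypothetical; the C-index skips tied event times; and net benefit is computed on a fully observed binary outcome at the horizon, whereas survival DCA as commonly implemented estimates event risk with Kaplan-Meier methods to account for censoring.

```python
def harrell_c_index(times, events, risk_scores):
    """Simplified Harrell's C. A pair (i, j) is usable when subject i had an
    observed event and strictly shorter follow-up than j; tied times are
    skipped, and tied risk scores count as half-concordant."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def net_benefit(outcomes, predicted_risk, threshold):
    """Net benefit of treating everyone whose predicted risk >= threshold:
    NB = TP/n - FP/n * pt / (1 - pt)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be in (0, 1)")
    n = len(outcomes)
    treated = [y for y, p in zip(outcomes, predicted_risk) if p >= threshold]
    tp = sum(1 for y in treated if y == 1)
    fp = len(treated) - tp
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Hypothetical toy data (NOT study data): 1 = death by the prediction horizon.
outcomes = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
nomogram_risk = [0.8, 0.6, 0.3, 0.5, 0.2, 0.1, 0.1, 0.2, 0.3, 0.1]
stage_risk = [0.5, 0.3, 0.5, 0.5, 0.5, 0.1, 0.1, 0.1, 0.1, 0.1]

pt = 0.40
delta = net_benefit(outcomes, nomogram_risk, pt) - net_benefit(outcomes, stage_risk, pt)
print(f"additional net correct decisions per 100 patients: {100 * delta:.1f}")
```

With these toy inputs the model-based risk yields 6.7 additional net correct decisions per 100 patients over the stage-based risk at pt = 0.40; the study's reported 3.9 per 100 at the 40% threshold is the same kind of difference read off its decision curves.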
However, this study has three key limitations. (1) Limited molecular data: the SEER database lacks detailed molecular biomarkers (e.g., BRCA mutations, PD-L1 expression, tumor-infiltrating lymphocytes) and information on chemotherapy regimens (e.g., platinum drug doses, immunotherapy use), limiting the model's alignment with precision oncology frameworks. (2) Unmeasured socioeconomic confounding: the observed racial/ethnic differences may reflect unequal access to guideline-concordant treatment rather than inherent biological differences, a bias that SEER's clinical variables cannot quantify. (3) Limited generalizability: internal validation alone cannot confirm the model's applicability to other health care systems (e.g., non-American populations) [15]. These shortcomings underscore the need for next-generation prognostic tools that integrate molecular, therapeutic, and socioeconomic dimensions.
To address these gaps, future TNBC prognostic models should pursue improvements in both predictive accuracy and equity through three complementary strategies. (1) Molecular integration: future models should combine genomic (e.g., BRCA1/2 mutations, TP53 status), immunophenotypic (e.g., dynamic PD-L1 expression), and proteomic data with liquid biopsy techniques (e.g., ctDNA monitoring) to track tumor heterogeneity and minimal residual disease in real time, forming dynamic, adaptive systems for optimizing treatment strategies [12]. (2) Socioeconomic equity: by incorporating variables such as insurance status, geographic accessibility [13], and race/ethnicity, and by linking cancer registries (e.g., SEER) to poverty indices, models can disentangle biological drivers from systemic inequities and guide targeted interventions to reduce outcome disparities. (3) Technological innovation: artificial intelligence algorithms could integrate multi-omics data with real-world treatment information to build personalized prediction models that dynamically update recurrence risk and treatment response probability. Ultimately, realizing this vision requires interdisciplinary collaboration (bridging bioinformatics, clinical oncology, and health policy) and ethical governance (ensuring data privacy and algorithmic fairness). Such efforts will move TNBC prognostication toward a "molecular-societal-technological" paradigm that achieves both precision and inclusivity in cancer care.

Conclusions

This study established a prognostic model for TNBC based on the SEER database. The nomogram, constructed from independent risk factors, significantly outperformed the traditional TNM staging system, provides strong support for individualized prognostic assessment and treatment decision-making, and is expected to find wide use in clinical practice.

Supplementary Information

Source: PubMed Central (JATS). Licensing follows the original publisher's policy; please cite the original article when quoting.
