Development and validation of a machine learning model for predicting early death in metastatic pancreatic ductal adenocarcinoma: a study based on the SEER database.
APA
Zhang L, He J (2026). Development and validation of a machine learning model for predicting early death in metastatic pancreatic ductal adenocarcinoma: a study based on the SEER database. Translational cancer research, 15(1), 53. https://doi.org/10.21037/tcr-2025-1276
MLA
Zhang L, et al. "Development and validation of a machine learning model for predicting early death in metastatic pancreatic ductal adenocarcinoma: a study based on the SEER database." Translational cancer research, vol. 15, no. 1, 2026, pp. 53.
PMID
41674965
Abstract
[BACKGROUND] Metastatic pancreatic ductal adenocarcinoma (mPDAC) has a poor prognosis, with a significant number of patients experiencing early death. Identifying these high-risk patients at diagnosis is critical for personalizing treatment intensity, facilitating timely palliative care discussions, and improving clinical trial stratification. Therefore, this study aimed to develop and validate a machine learning (ML)-based algorithm to estimate the probability of early death in patients with mPDAC.
[METHODS] We identified 14,820 patients diagnosed with mPDAC in the Surveillance, Epidemiology, and End Results (SEER) database. Patients with missing survival time or missing essential variables were excluded. The cohort was randomly split into a training set (70%) and an internal test set (30%). For external validation, we retrospectively enrolled patients with mPDAC from a Chinese medical center (2017-2019), representing a distinct geographic and healthcare population. The primary outcome was early death, defined as all-cause mortality within three months of diagnosis. Candidate predictors were baseline demographic, tumor, and treatment characteristics. Four ML models were built from these clinical and pathological features. Model performance was assessed by the area under the receiver operating characteristic curve (AUC), calibration plots, and decision curve analysis (DCA). The optimal model was selected by 10-fold cross-validation, and its generalizability was then validated internally and externally. Feature contributions were quantified as Shapley values using the SHapley Additive exPlanations (SHAP) method.
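The data partitioning described in the methods can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the function names, the seed, and the rounding of the 30% test fraction are assumptions, and the abstract does not say whether the split was stratified by outcome (this sketch is not).

```python
import random
from typing import List, Tuple


def train_test_split_indices(
    n: int, test_fraction: float = 0.3, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Randomly partition row indices 0..n-1 into (train, test)."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)  # assumption: round to nearest patient
    return indices[n_test:], indices[:n_test]


def k_fold_indices(
    indices: List[int], k: int = 10, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, validation) pairs; each index is validated exactly once."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k near-equal folds
    splits = []
    for i in range(k):
        validation = folds[i]
        training = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((training, validation))
    return splits


# 14,820 is the SEER cohort size reported in the abstract; the seed is arbitrary.
train_idx, test_idx = train_test_split_indices(14820, test_fraction=0.3, seed=42)
cv_splits = k_fold_indices(train_idx, k=10, seed=42)
```

In this arrangement the cross-validation folds are drawn only from the training indices, so the internal test set plays no part in model selection.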
[RESULTS] The extreme gradient boosting (XGBoost) classifier performed best (AUC = 0.757). It retained this discrimination in the independent external Chinese cohort (AUC = 0.780), supporting its cross-population applicability. In the feature importance ranking, chemotherapy was the most important feature, followed by age and marital status.
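For readers less familiar with the metrics, the sketch below shows how an AUC and the net benefit used in decision curve analysis are computed from predicted probabilities. It assumes the standard definitions (AUC as the probability that a random positive case outranks a random negative one, and net benefit as TP/n - FP/n * pt/(1 - pt) at threshold probability pt). The labels and probabilities are made-up toy values, not data or results from the study.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC via pairwise comparison of positives and negatives (ties count 0.5)."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def net_benefit(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float
) -> float:
    """Net benefit of acting on patients whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Reference strategy in a decision curve: treat every patient as high risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data: 1 = early death, 0 = survived past the cut-off.
toy_labels = [1, 1, 0, 1, 0, 0, 0, 1]
toy_probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

toy_auc = roc_auc(toy_labels, toy_probs)
toy_nb = net_benefit(toy_labels, toy_probs, threshold=0.5)
toy_nb_all = net_benefit_treat_all(toy_labels, threshold=0.5)
```

A decision curve repeats the net benefit calculation over a range of thresholds and compares the model with the treat-all and treat-none (net benefit 0) strategies.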
[CONCLUSIONS] We developed and validated an interpretable ML model that accurately predicts the risk of early death in mPDAC patients. The model's robust performance across US and Chinese populations underscores its broad clinical utility. This tool can assist clinicians in identifying high-risk individuals at diagnosis, thereby informing personalized treatment strategies, prioritizing palliative care, and optimizing resource allocation in diverse healthcare settings.
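The interpretability claim rests on Shapley values, so a small worked example may help. The sketch below computes exact Shapley values by enumerating feature subsets against a single baseline, then ranks features by mean absolute value. This is a simplification: the SHAP library uses faster model-specific algorithms for tree ensembles, and this is not the authors' code. The function `toy_risk`, its coefficients, the all-zero baseline, and the three toy patients are hypothetical and are not estimates from the study.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    predict: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley attribution of predict(x) - predict(baseline)."""
    n = len(x)

    def value(subset: frozenset) -> float:
        # Features in `subset` take the patient's value, the rest the baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def rank_by_mean_abs(names: List[str], rows: List[List[float]]) -> List[str]:
    """Order features by mean |Shapley value| across patients, largest first."""
    mean_abs = {
        name: sum(abs(row[k]) for row in rows) / len(rows)
        for k, name in enumerate(names)
    }
    return sorted(names, key=lambda name: mean_abs[name], reverse=True)


def toy_risk(features: Sequence[float]) -> float:
    """Hypothetical linear risk score with arbitrary illustrative coefficients."""
    chemo, older, unmarried = features
    return 0.5 - 0.3 * chemo + 0.2 * older + 0.1 * unmarried


FEATURES = ["chemotherapy", "age", "marital_status"]
BASELINE = [0, 0, 0]
toy_patients = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]  # made-up binary encodings

all_phi = [shapley_values(toy_risk, p, BASELINE) for p in toy_patients]
ranking = rank_by_mean_abs(FEATURES, all_phi)
```

For each patient the attributions sum to the difference between that patient's score and the baseline score, which is the property that makes a per-feature ranking of this kind meaningful.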