Explainable machine learning model for predicting early recurrence and distant metastasis after surgery in early-onset colorectal cancer.
APA
Ni W, Zhang B, et al. (2025). Explainable machine learning model for predicting early recurrence and distant metastasis after surgery in early-onset colorectal cancer. Surgery, 109973. https://doi.org/10.1016/j.surg.2025.109973
MLA
Ni W, et al. "Explainable machine learning model for predicting early recurrence and distant metastasis after surgery in early-onset colorectal cancer." Surgery, 2025, pp. 109973.
PMID
41402185
Abstract
[OBJECTIVE] To develop explainable machine learning models for predicting the risk of early postoperative recurrence and distant metastasis in patients with early-onset colorectal cancer.
[METHODS] Patients with early-onset colorectal cancer who underwent radical resection at the 900th Hospital of PLA Joint Logistic Support Force (2014-2020) were included. Clinical data were retrieved from electronic medical records with 3-year postoperative follow-up. Patients were stratified into recurrence/metastasis and no recurrence/metastasis groups based on clinical outcomes. Feature selection was performed using univariate analysis and least absolute shrinkage and selection operator regression. Subsequently, 5 machine learning algorithms (k-nearest neighbors, logistic regression, random forest, support vector machine, and extreme gradient boosting) were employed to develop predictive models. Model performance and clinical utility were validated through receiver operating characteristic curves and their corresponding area under the curve values, calibration curves, and decision curve analysis. Model explainability was assessed using Shapley additive explanations.
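Two of the evaluation tools named in the methods, the area under the receiver operating characteristic curve and decision curve analysis, can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `roc_auc` and `net_benefit` are hypothetical, the toy labels and scores are invented for illustration, and the net benefit formula (true positives per patient minus false positives per patient weighted by the threshold odds) is the standard decision-curve definition, assumed here rather than taken from the paper.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    A tie between a positive and a negative counts as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Decision-curve net benefit at a single risk threshold (0 < threshold < 1)."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # Patients with predicted risk at or above the threshold are treated as high risk.
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Toy data, invented for illustration only (not from the study).
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

toy_auc = roc_auc(toy_labels, toy_scores)
toy_nb = net_benefit(toy_labels, toy_scores, threshold=0.5)
print(f"AUC = {toy_auc:.4f}, net benefit at 0.5 = {toy_nb:.4f}")
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing the result with the treat-all and treat-none strategies.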
[RESULTS] Among 256 enrolled patients with early-onset colorectal cancer, 121 (47.3%) experienced recurrence/metastasis. Ten predictive features were identified: T stage, N stage, histologic subtype, vascular/neural invasion, carcinoembryonic antigen, neutrophil-to-lymphocyte ratio, platelet-to-lymphocyte ratio, hemoglobin-to-red blood cell distribution width ratio, triglyceride-glucose index, and Prognostic Nutritional Index. The random forest model demonstrated optimal performance in the test set (area under the curve 0.827, sensitivity 0.760, specificity 0.852, accuracy 0.808, precision 0.826, F1 score 0.792). Shapley additive explanations analysis revealed T stage as the most influential predictor.
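The reported figures can be checked for internal consistency with a short Python sketch. It assumes the usual definition of the F1 score as the harmonic mean of precision and sensitivity (recall); the helper name `f1_from` is hypothetical, and only numbers stated in the results are used.

```python
def f1_from(precision: float, recall: float) -> float:
    """F1 score as the harmonic mean of precision and recall (sensitivity)."""
    return 2 * precision * recall / (precision + recall)


# Values as reported for the random forest model on the test set.
reported_precision = 0.826
reported_sensitivity = 0.760
reported_f1 = 0.792

recomputed_f1 = f1_from(reported_precision, reported_sensitivity)

# Event rate in the full cohort: 121 of 256 patients.
event_rate_percent = 100 * 121 / 256

print(f"recomputed F1 = {recomputed_f1:.3f} (reported {reported_f1})")
print(f"event rate = {event_rate_percent:.1f}%")
```

Both recomputed values agree with the abstract after rounding, so the precision, sensitivity, and F1 score are mutually consistent.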
[CONCLUSION] Among the 5 machine learning models developed, the random forest algorithm demonstrated superior predictive performance for early postoperative recurrence and distant metastasis in patients with early-onset colorectal cancer. Explainable random forest models can support personalized clinical decision making in the diagnosis and treatment of these patients.