
Interpretable machine learning model for predicting 5-Year postoperative recurrence risk in patients with stage III colon cancer using preoperative laboratory tests: a two-centre study.

BMC Gastroenterology, 2026, Vol. 26(1), p. 72 (open access)


Wei H, Fu X, Cheng Y, Xu L, Wu X, Wang Z


Cite this article

APA: Wei H, Fu X, et al. (2026). Interpretable machine learning model for predicting 5-Year postoperative recurrence risk in patients with stage III colon cancer using preoperative laboratory tests: a two-centre study. BMC Gastroenterology, 26(1), 72. https://doi.org/10.1186/s12876-025-04511-9
MLA: Wei H, et al. "Interpretable machine learning model for predicting 5-Year postoperative recurrence risk in patients with stage III colon cancer using preoperative laboratory tests: a two-centre study." BMC Gastroenterology, vol. 26, no. 1, 2026, p. 72.
PMID: 41612209

Abstract

[BACKGROUND] Colorectal cancer (CRC) is one of the most prevalent malignant diseases worldwide and displays significant heterogeneity. The aim of this study was to investigate the application of machine learning algorithms to incorporate preoperative laboratory tests for predicting the 5-year recurrence risk in patients with stage III colon cancer (CC) postsurgery.

[METHODS] This study included two patient cohorts: the Zhejiang Cancer Hospital CC cohort (ZCC set, n = 290), which served as the training cohort, and the Dongyang CC cohort (DYC set, n = 125), which was used as an external testing cohort. Univariate analysis was first performed on the 48 preoperative laboratory tests and 15 clinical and pathological features within the training cohort to identify potential predictors. Features with a p value less than 0.05 were incorporated, and six machine learning models (logistic regression, random forest, XGBoost, support vector machine (SVM), back propagation neural network (BP NET), and K-nearest neighbour (KNN)) were employed to develop a model for predicting the 5-year recurrence risk in patients with stage III colon cancer. Predictive performance was assessed by calculating the area under the curve (AUC) of each machine learning model on the external test dataset, and comparisons were performed via the DeLong test. Finally, the Shapley additive explanations (SHAP) algorithm was applied to rank feature importance and compute the SHAP values for each feature, which were then visualized.

[RESULTS] Univariate analysis identified 10 laboratory tests and 6 clinical and pathological features that were incorporated into six machine learning models. The random forest model exhibited the highest predictive performance in the test cohort, with an AUC of 0.845. Logistic regression closely trailed, achieving an AUC of 0.823. The DeLong test revealed that the predictive performance of the random forest model was comparable to that of logistic regression and outperformed the other models. SHAP analysis indicated that the most important feature for predicting the 5-year recurrence risk of stage III colon cancer was perineural invasion, followed by FIB and then PT.

[CONCLUSIONS] A machine learning model constructed using preoperative laboratory tests and clinical and pathological features can assist in predicting the 5-year recurrence risk of patients with stage III colon cancer. This model provides potential reference values for the clinical development of individualized treatment strategies.


Introduction
Colorectal cancer (CRC) represents a formidable global health challenge; it is the third and second most prevalent cancer in men and women, respectively [1–3]. It accounts for approximately 10% of all newly diagnosed malignancies and contributes to 9.4% of all cancer-related deaths worldwide, posing a significant public health burden [1, 4]. In recent years, propelled by rapid advancements in basic and clinical research, widespread implementation of early screening techniques, continuous refinement of comprehensive treatment concepts, and remarkable improvements in surgical techniques, significant progress has been made in the diagnosis and treatment of colon cancer (CC), leading to a substantial increase in patient survival rates [5, 6]. Nevertheless, postoperative recurrence remains the primary challenge in the current treatment of CC.
Among its subtypes, stage III CC has a particularly poor prognosis, with a 5-year recurrence rate of approximately 30% [7–9]. Surgical resection remains the cornerstone of curative treatment for stage III CC, with adjuvant chemotherapy being the standard of care due to the increased risk of disease recurrence [7, 10]. Unfortunately, even after undergoing standard treatment, some patients with stage III CC face unfavourable outcomes, and recurrence may occur even during the adjuvant treatment process [11].
The clinical assessment of postoperative recurrence risk in patients with CC currently relies predominantly on the TNM staging system. Although this system provides crucial prognostic insights, it has notable limitations in accurately predicting recurrence. Moreover, the stage III CC patient population exhibits significant heterogeneity, even within the same TNM stage [12, 13]. This gap in risk assessment highlights the urgent need for more sophisticated, multifaceted approaches that go beyond traditional staging to better inform adjuvant management strategies and improve patient outcomes [14]. For patients identified as high risk, a more aggressive chemotherapy regimen can be adopted, or postoperative monitoring can be intensified.
Against this backdrop, clinical prediction models have become a research hotspot. These models are mathematical constructs (parametric, semiparametric, or nonparametric) that forecast unknown outcomes from known features. They function as quantitative tools to evaluate risks and benefits in medical decision-making, and their use is becoming increasingly widespread. In contrast to traditional statistical modelling, machine learning can identify complex nonlinear relationships within a broad spectrum of medical datasets and can adapt continuously to new data, thereby increasing the accuracy of prediction models [15–18]. Previous research has indicated that certain preoperative laboratory indicators and clinical and pathological features are associated with the recurrence of CC [19, 20]. However, previous studies have analysed only a few specific indicators rather than the full set of routine examinations, and have primarily used traditional regression analysis. The interpretability of the resulting models has also not been analysed further.
The present study aimed to predict the 5-year recurrence risk following CC surgery by combining 48 preoperative routine laboratory test results with 15 clinical and pathological features. Furthermore, it compared and evaluated the predictive performance of six machine learning models and employed Shapley additive explanations (SHAP) for model interpretation.

Materials and methods

Patients
This study included cohorts from two centres. The first cohort consisted of patients with pathological stage III CC who underwent curative surgical resection at Zhejiang Cancer Hospital between January 2015 and December 2019 and served as the training cohort (ZCC set, n = 290). The second cohort consisted of patients with pathological stage III CC who underwent surgical resection at Dongyang Hospital affiliated with Wenzhou Medical University between January 2013 and December 2019 and was used as the external testing cohort (DYC set, n = 125). All enrolled patients signed an informed consent form prior to surgery and underwent standard surgical resection. Complete pathological reports confirmed stage III disease (i.e., positive lymph nodes). The other inclusion criteria were an age of 18 to 90 years, no coexisting malignant tumours elsewhere in the body, no more than 30% missing data, no antitumour treatment before surgery, and complete follow-up data.

Data collection
Clinical and pathological features were extracted from electronic medical records and standardized pathology reports. These factors included age, gender, body mass index (BMI), tumour site, pT stage, pN stage, overall stage, maximum diameter, histological grade, histological type, number of lymph nodes dissected, vascular invasion, perineural invasion, adjuvant chemotherapy regimen, and number of chemotherapy cycles. The adjuvant treatment regimens were classified into three categories: no adjuvant treatment, monotherapy, and combination therapy. The number of treatment cycles was grouped into three categories: 0 cycles, fewer than 4 cycles, and 4 or more cycles.
D-dimer was excluded from the analysis because 22.1% of its values were missing, leaving 48 laboratory tests. These tests encompassed six categories: routine blood examination, liver and kidney function tests, lipid metabolism, tumour markers, coagulation function, and other derived parameters. The derived parameters comprised four metrics: the prognostic nutritional index (PNI), systemic immune-inflammation index (SII), platelet-to-lymphocyte ratio (PLR), and neutrophil-to-lymphocyte ratio (NLR). They were calculated as follows: PNI = Alb (g/L) + 5 × absolute lymphocyte count (10^9/L), SII = absolute platelet count (10^9/L) × absolute neutrophil count (10^9/L)/absolute lymphocyte count (10^9/L), NLR = absolute neutrophil count (10^9/L)/absolute lymphocyte count (10^9/L), and PLR = absolute platelet count (10^9/L)/absolute lymphocyte count (10^9/L). All indicators represent the last results obtained within one week before surgery.
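The four derived parameters can be illustrated with a short Python sketch. It follows the formulas stated above; the function name `derived_indices` and the example patient values are hypothetical and are not taken from the study.

```python
def derived_indices(albumin_g_l, lymphocytes, neutrophils, platelets):
    """Compute PNI, SII, NLR and PLR.

    Albumin is in g/L; all cell counts are absolute counts in 10^9/L.
    """
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return {
        "PNI": albumin_g_l + 5 * lymphocytes,
        "SII": platelets * neutrophils / lymphocytes,
        "NLR": neutrophils / lymphocytes,
        "PLR": platelets / lymphocytes,
    }


# Hypothetical patient values, for illustration only.
example = derived_indices(
    albumin_g_l=40.0, lymphocytes=2.0, neutrophils=4.0, platelets=250.0
)
print(example)
```

Because every ratio divides by the lymphocyte count, the sketch rejects non-positive counts instead of returning an undefined value.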

Follow-up
According to the NCCN guidelines, patients with stage III colon cancer are recommended to undergo six months of oxaliplatin-based adjuvant chemotherapy, such as mFOLFOX6 (5-fluorouracil, leucovorin, and oxaliplatin) or XELOX (capecitabine and oxaliplatin) [7]. However, the decision to undergo chemotherapy, along with the choice of regimen and number of cycles, was made jointly by the patient and the physician, considering the patient’s baseline condition, tolerance, and preferences. Outpatient follow-up was provided every 3 to 6 months after surgery. The last follow-up date for all patients was January 31, 2025, giving a minimum follow-up duration of more than 5 years. A recurrence event was defined as definite signs of recurrence after radical resection, detected by clinical assessment methods such as imaging examinations and tumour marker tests and confirmed by comprehensive clinical judgment or histopathological examination.

Univariate and machine learning analysis
Features with missing data exceeding 10% of the dataset were excluded (D-dimer was the only feature removed), and missing values in the remaining features were handled by multiple imputation. Multiple imputation was performed using the mice package in R, which supports different imputation methods for different variable types. Specifically, the following methods were applied: (a) predictive mean matching (pmm) for numerical data; (b) logistic regression (logreg) for binary factor variables; (c) multinomial logistic regression (polyreg) for unordered multi-level factor variables; and (d) the proportional odds model (polr) for ordered multi-level factor variables. Univariate analysis was performed on all the features in the training cohort to identify potential predictors. Features with a p value less than 0.05 were incorporated, and six machine learning models (logistic regression, random forest, XGBoost, support vector machine (SVM), back propagation neural network (BP NET), and K-nearest neighbour (KNN)) were employed to predict 5-year recurrence. For machine learning processing, 5-fold cross-validation was applied. The caret R package (R version 4.2.0) was used to build all machine learning models, incorporating the “stats” R package for logistic regression, the “randomForest” R package for random forest, the “xgboost” R package for XGBoost, the “e1071” R package for SVM, the “nnet” R package for BP NET and the “kknn” R package for KNN. Model performance was evaluated using the area under the curve (AUC), accuracy, sensitivity, and specificity. The DeLong test was conducted to compare the AUCs of the models.
To enhance interpretability, the best-performing of the six machine learning models was then selected, and the SHAP method was applied to it. This included ranking feature importance, calculating SHAP values for each feature, and visualizing these values to clarify the associations between features and the 5-year recurrence risk of patients. The SHAP package (Python version 3.12.8) was used. A flowchart outlining the cohorts used in this study is shown in Fig. 1.
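The evaluation metrics named above can be sketched in plain Python. This is a minimal illustration, not the caret-based code used in the study: the function names `auc_score` and `threshold_metrics`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for the example.

```python
def auc_score(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def threshold_metrics(labels, scores, threshold=0.5):
    """Accuracy, sensitivity and specificity when score >= threshold predicts recurrence."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy data (1 = recurrence within 5 years); not study data.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]

auc = auc_score(labels, scores)
metrics = threshold_metrics(labels, scores)
print(auc, metrics)
```

The AUC is threshold-free, whereas accuracy, sensitivity, and specificity all depend on the chosen cut-off, which is why the two kinds of metric are usually reported together.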

Statistical analysis
All analyses were conducted using R (version 4.2.0) and Python (version 3.12.8) for Windows. The Shapiro-Wilk test was applied to assess the normality of the distribution of clinical features within the cohorts. Continuous data are expressed as means ± standard deviations or medians (interquartile ranges), with differences analysed using t tests or Mann-Whitney U tests, as appropriate. Categorical data are presented as frequencies and percentages, and differences were analysed using the chi-square test or Fisher’s exact test. A p value of < 0.05 was considered to indicate statistical significance.
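As an illustration of how a categorical feature might be screened against the 0.05 threshold, the following sketch implements a Pearson chi-square test for a 2 × 2 table using only the Python standard library. It assumes no continuity correction (R's `chisq.test` applies Yates' correction to 2 × 2 tables by default, so its p values can differ), and the cell counts are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: rows = feature present / absent,
# columns = recurrence / no recurrence.
chi2, p_value = chi_square_2x2(30, 20, 20, 30)
selected = p_value < 0.05
print(round(chi2, 3), round(p_value, 4), selected)
```

When expected cell counts are small, Fisher's exact test, as named in the text, is the more appropriate choice.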

Results

Baseline clinical and pathological conditions
Patients lost to follow-up, those with a follow-up period of less than 5 years without recurrence, and patients with missing data exceeding 30% were excluded. A total of 415 patients from two cohorts were included in this study. Of these, 290 patients from Zhejiang Cancer Hospital formed the training set (ZCC set), and 125 patients from Dongyang Hospital affiliated with Wenzhou Medical University constituted the external testing set (DYC set). In the training set, 104 patients (35.9%) had recurrence and 186 (64.1%) did not. In the external testing set, 63 patients (50.4%) had recurrence and 62 (49.6%) did not. The clinical and pathological features of the two cohorts were largely consistent, as detailed in Table 1. Compared with the training cohort, however, the test cohort had a greater proportion of elderly patients, larger tumour diameters, later tumour stages, fewer lymph nodes dissected, more patients without adjuvant chemotherapy, and fewer chemotherapy cycles. These factors may have contributed to the higher recurrence rate in the test cohort. The results of multiple imputation for missing data in the two cohorts are presented in the supplementary materials (Table S1).

Univariate analysis in the training set (ZCC set)
D-dimer values were missing for 22.1% of patients in the training set (64 out of 290); because this exceeded the 10% threshold, D-dimer was excluded from the analysis. As a result, 48 laboratory tests and 15 clinical and pathological features, including postoperative adjuvant treatment status, were retrieved and subjected to univariate analysis. The P values for 16 features (10 laboratory tests and 6 clinical and pathological features) were less than 0.05, as detailed in Table 2. The 10 laboratory tests were NEU (neutrophil count), WBC (white blood cell count), RDW (red blood cell distribution width), ALP (alkaline phosphatase), TT (thrombin time), FIB (fibrinogen), PT (prothrombin time), CEA (carcinoembryonic antigen), NLR (neutrophil-to-lymphocyte ratio), and SII (systemic immune-inflammation index). The 6 clinical and pathological features were age, pN stage, overall stage, perineural invasion, adjuvant chemotherapy regimen, and number of chemotherapy cycles. The P values of the remaining features were ≥ 0.05 and are presented in the supplementary materials (Table S2).
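The missing-data rule can be checked with a few lines of Python. In this sketch, `None` marks a missing value; the helper names and the placeholder measurements are hypothetical, and only the counts 64 and 290 come from the text.

```python
def missing_fraction(values):
    """Share of entries recorded as None (treated here as missing)."""
    return sum(v is None for v in values) / len(values)


def drop_sparse_features(table, max_missing=0.10):
    """Keep only columns whose missing fraction does not exceed max_missing."""
    return {
        name: column
        for name, column in table.items()
        if missing_fraction(column) <= max_missing
    }


# D-dimer in the training cohort: 64 of 290 values missing.
# The observed values are placeholders, not study data.
d_dimer = [None] * 64 + [0.5] * 226
# Hypothetical fully observed column for contrast.
fib = [3.2] * 290

kept = drop_sparse_features({"D-dimer": d_dimer, "FIB": fib})
print(f"D-dimer missing: {missing_fraction(d_dimer):.1%}")
print(sorted(kept))
```

Columns that pass this filter but still contain gaps would then go to the imputation step described in the Methods.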

Construction and evaluation of machine learning models
The 16 features that were statistically significant in the univariate analysis were used as inputs for six machine learning models (logistic regression, random forest, XGBoost, SVM, BP NET, and KNN), which were trained on the training cohort. The specific parameters of the six models are provided in supplementary materials A. The ability of the models to predict the 5-year recurrence risk of stage III colon cancer was then assessed in the external testing cohort (DYC set). The random forest model attained the highest AUC (0.845), followed by the logistic regression model (AUC 0.823); both models showed good discrimination. The results of the random forest with 5-fold cross-validation are presented in the supplementary materials (Table S3): the error rate ranged from 0.25 to 0.30 and the accuracy from 0.70 to 0.75, indicating good model stability. By comparison, the predictive performance of the BP NET, SVM, and KNN models was moderate, with AUC values ranging from 0.713 to 0.756, whereas XGBoost performed worst, with an AUC of 0.651 (Fig. 2). The metrics for the training and testing cohorts for all six models are presented in Table 3. The DeLong test was used to compare the AUCs of the six models. There was no statistically significant difference between the random forest and logistic regression models (P = 0.43), suggesting comparable predictive ability, whereas the random forest model was superior to the remaining four models (all P < 0.05). The DeLong test results are visualized as a heatmap in Fig. 3.
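For readers unfamiliar with the DeLong test, the sketch below shows one way to compare two correlated AUCs computed on the same patients, using only the Python standard library. It is a simplified illustration of the standard two-model DeLong comparison, not the code used in the study; the function names and toy scores are hypothetical.

```python
import math


def _psi(x, y):
    # Pairwise kernel: 1 if the positive case outranks the negative, 0.5 for ties.
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _components(pos, neg):
    # Structural components: per-positive and per-negative contributions to the AUC.
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _sample_var(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def delong_test(labels, scores_a, scores_b):
    """Return (auc_a, auc_b, z, two-sided p) for two models scored on the same cases."""
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    m, n = len(pos_idx), len(neg_idx)
    if m < 2 or n < 2:
        raise ValueError("need at least two positive and two negative cases")
    v10_a, v01_a = _components([scores_a[i] for i in pos_idx], [scores_a[i] for i in neg_idx])
    v10_b, v01_b = _components([scores_b[i] for i in pos_idx], [scores_b[i] for i in neg_idx])
    auc_a, auc_b = sum(v10_a) / m, sum(v10_b) / m
    # Var(A - B) = Var(A) + Var(B) - 2 Cov(A, B), computed on paired differences.
    var = (_sample_var([a - b for a, b in zip(v10_a, v10_b)]) / m
           + _sample_var([a - b for a, b in zip(v01_a, v01_b)]) / n)
    if var == 0:
        raise ValueError("zero variance: the two score sets rank the cases identically")
    z = (auc_a - auc_b) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    return auc_a, auc_b, z, p_value


# Toy example (not study data): model A separates the classes better than model B.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores_a = [0.9, 0.8, 0.4, 0.7, 0.3, 0.6, 0.2, 0.1]
scores_b = [0.6, 0.3, 0.5, 0.2, 0.4, 0.7, 0.1, 0.8]
auc_a, auc_b, z, p_value = delong_test(labels, scores_a, scores_b)
print(auc_a, auc_b, round(z, 3), round(p_value, 4))
```

Working on paired differences accounts for the correlation between the two models' scores, which matters because every model is evaluated on the same patients.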

Visualization of the optimal machine learning model using SHAP
The random forest model, which performed best, was selected for interpretation. A feature importance plot was constructed to analyse the contribution of each feature to the model. The details and code for calculating SHAP values are provided in supplementary materials B. The feature importance plot indicated that the top five features were, in descending order, perineural invasion, FIB, PT, age, and TT (Fig. 4). In the SHAP summary plot (Fig. 5), each point corresponds to a patient, and colour indicates the feature value, with the gradient from red to blue representing high to low values. The SHAP value on the horizontal axis represents the contribution of that feature to the patient’s predicted probability of 5-year recurrence: a larger SHAP value corresponds to a higher predicted probability. Perineural invasion, FIB, PT, age, CEA, pN stage, and overall stage were positively associated with predicted recurrence risk, whereas TT, RDW, adjuvant chemotherapy regimen, and the number of chemotherapy cycles were negatively associated.
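The idea behind a SHAP value can be shown with an exact Shapley computation on a toy model. This sketch is not the study's code (which is in supplementary materials B) and does not use the SHAP package: it enumerates all feature subsets by brute force and replaces "absent" features with a single baseline value, which is a simplification. The toy risk function, its coefficients, and the patient values are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance; absent features take their baseline value."""
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_risk(features):
    """Hypothetical risk score; coefficients are invented for illustration."""
    perineural_invasion, fib, chemo_cycles = features
    return (0.2 + 0.3 * perineural_invasion + 0.05 * fib
            - 0.02 * chemo_cycles + 0.02 * perineural_invasion * fib)


patient = [1, 4.0, 6]     # invasion present, FIB 4.0, six chemotherapy cycles
reference = [0, 3.0, 0]   # hypothetical baseline patient
phi = shapley_values(toy_risk, patient, reference)
print([round(v, 4) for v in phi])
```

The contributions sum to the difference between the patient's prediction and the baseline prediction, which is the additivity property that makes SHAP plots readable: positive values push the predicted risk up and negative values push it down.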

Discussion
CC, a prevalent malignant tumour, has emerged as a major challenge in the realm of global public health, given its high incidence and mortality rates [21]. This underscores the need for strengthened preventive measures, strategies for early detection, and advancements in treatment methods to mitigate its significant impact on public health. Stage III CC typically involves deep infiltration of the colonic wall and adjacent lymph nodes but without distant metastasis [22]. At this stage, treatment usually requires a comprehensive approach that combines radical surgery and adjuvant chemotherapy [7, 23]. Nevertheless, the prognosis for stage III CC remains unsatisfactory, with a high recurrence rate [24]. Accurately predicting patients’ treatment outcomes has become a major challenge. Therefore, designing precise prediction tools to determine patients’ posttreatment recurrence risk is crucial for improving patient outcomes.
Several studies in China and elsewhere have addressed the prediction of postoperative recurrence in patients with stage III CC. As early as 2015, the CRC Subtyping Consortium reported a novel classification system for CC, which aids in understanding tumour biological behaviour more accurately, guiding treatment selection, and predicting prognosis [25]. Unfortunately, this classification relies on molecular characteristics (such as gene expression, mutation profiles, and epigenetic features), which restricts its universal clinical application. Ushigome et al. enrolled 233 stage III CC patients and, through multivariate analysis, identified CRP, CEA, CA19-9, and T4 stage as independent prognostic factors for relapse-free survival (RFS) [26]. Matsuoka et al. conducted a retrospective analysis of 120 stage III CRC patients who underwent curative colectomy and identified preoperative bowel obstruction, N2 disease, and fewer than 17 examined lymph nodes as high-risk factors for recurrence [27]. Moreover, indicators calculated from laboratory tests, such as the PNI and NLR, serve as independent predictors of recurrence risk in patients with CRC [28, 29]. However, the above studies predominantly considered clinical and pathological features, either not incorporating laboratory tests or including only a small number of them.
This study integrated 48 laboratory test indicators and 15 clinical and pathological features, including postoperative adjuvant therapy information, to evaluate patients’ multidimensional parameters comprehensively and predict the recurrence risk of stage III CC. To exclude features with low correlation, univariate analysis was performed first, ultimately selecting 10 laboratory tests and 6 clinical and pathological features (see Results). Six different machine learning methods were employed to construct predictive models, and their performance was evaluated on the test dataset. Additionally, the DeLong test was used to analyse differences between models. Because machine learning algorithms differ in logic and complexity, their clinical applications may also differ. In this study, the random forest model achieved the highest AUC among the six models (0.845), with a prediction accuracy of 77.6%, although its advantage over logistic regression was not statistically significant. Random forest is an ensemble learning method that constructs multiple classification or regression trees and makes predictions by aggregating the predictions of all the individual trees. Aggregating results across trees improves overall predictive performance and reduces overfitting, making it a robust model for classification and regression tasks. Previous studies have also shown that the random forest model performs well in clinical applications [30, 31]. The suboptimal performance of the XGBoost model may be attributed to its poor adaptability to the current dataset; its performance may improve with larger sample sizes. Logistic regression, by contrast, is more suitable for primary hospitals because it is easy to operate, fast to compute, and yields interpretable results. The nomogram based on logistic regression is provided in supplementary materials C. The random forest model requires programming tools and professional deployment, making it more appropriate for secondary and tertiary hospitals.
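The aggregation idea behind random forests, as described above, can be sketched with a deliberately tiny ensemble in plain Python. This is not the randomForest implementation used in the study: each "tree" here is a single split (a stump) fitted on a bootstrap sample with one randomly chosen feature, and all names and data are hypothetical.

```python
import random


def fit_stump(rows, labels, feature):
    """Best single-threshold split on one feature: (feature, threshold, left_rate, right_rate)."""
    best = None
    for threshold in sorted({r[feature] for r in rows}):
        left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
        right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
        if not left or not right:
            continue
        left_rate, right_rate = sum(left) / len(left), sum(right) / len(right)
        error = (sum((y - left_rate) ** 2 for y in left)
                 + sum((y - right_rate) ** 2 for y in right))
        if best is None or error < best[0]:
            best = (error, feature, threshold, left_rate, right_rate)
    if best is None:  # feature is constant in this bootstrap sample
        rate = sum(labels) / len(labels)
        return (feature, float("inf"), rate, rate)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in range(len(rows))]  # bootstrap sample
        feature = rng.randrange(len(rows[0]))                       # random feature choice
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest


def predict_proba(forest, row):
    """Average the recurrence rates predicted by the individual stumps."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return sum(votes) / len(votes)


# Toy data: [perineural invasion (0/1), FIB]; label 1 = recurrence. Not study data.
rows = [[0, 1.0], [0, 1.2], [0, 0.9], [0, 1.1], [1, 2.0], [1, 2.2], [1, 1.9], [1, 2.1]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(rows, labels)
high_risk = predict_proba(forest, [1, 2.2])
low_risk = predict_proba(forest, [0, 0.9])
print(high_risk, low_risk)
```

Each stump alone is a weak and noisy predictor; averaging many of them, each trained on a different resample and feature, is what gives the ensemble its stability.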
The SHAP algorithm, a game theory-based method, explains a model by calculating the contribution of each feature to its prediction outcomes. In past studies, the SHAP method has been used to gain a clear understanding of the decision-making process of machine learning models. For example, Wang et al. applied the SHAP method to interpret a multiparameter magnetic resonance imaging radiomic model predicting the efficacy of neoadjuvant chemotherapy in patients with advanced rectal cancer [32]. In this study, the SHAP algorithm was used to provide a visual explanation of the random forest prediction model. Feature importance analysis showed that perineural invasion contributed most to the predicted risk of recurrence in patients with stage III colon cancer, which aligns with the findings of previous studies [33]. The adjuvant chemotherapy regimen and the number of chemotherapy cycles were negatively correlated with 5-year recurrence. In other words, combination chemotherapy and a full course of chemotherapy were associated with a lower predicted risk of recurrence.
In recent years, deep learning techniques have also been applied to colorectal cancer recurrence prediction. Researchers in China have identified high-risk patients with locally advanced colorectal cancer based on CT images [34], achieving an AUC of approximately 0.85, which is comparable to the performance observed in our study. Other researchers have combined pathological images with ctDNA testing to predict postoperative recurrence risk in colorectal cancer, yielding an AUC of 0.9 [35]. However, the widespread application of ctDNA is limited by its high detection costs. Moving forward, integrating imaging and pathological images into our current research framework could enable more accurate prediction of recurrence risk. Additionally, one aspect warrants further attention throughout the process. Several research reports have been published regarding other tumour types [36, 37]. The AUC of the random forest model on the training set was nearly 1, which raises concerns about potential overfitting. However, the 5-fold cross-validation we conducted showed that the error rate fluctuated between 0.25 and 0.30 and the accuracy between 0.70 and 0.75, and the AUC on the independent test set reached 0.845. We therefore assessed the model’s performance as good, with strong generalization ability. A similar situation was also reported by Barrenada et al. [38]. This study included two cohorts with an acceptable sample size; however, it lacked validation data from additional centres with larger sample sizes. Consequently, further optimization using multicentre datasets with larger sample sizes is essential to enhance accuracy and generalizability.

Conclusion
Among the six machine learning models constructed using preoperative laboratory tests and clinical and pathological features, the random forest model demonstrated optimal performance and the best model stability. This model can assist in predicting the 5-year recurrence risk of patients with stage III CC, providing potential reference value for the clinical development of individualized treatment strategies. Moreover, the machine learning model can be interpreted through the SHAP method, which helps in understanding the decision-making process of the model.

Supplementary Information

Source: PubMed Central (JATS). Licensing follows the original publisher's policy; please cite the original article when quoting.
