Development of a machine learning-based model for predicting postoperative survival in gastric cancer.

Lü YN; Liu D; Tao S; Wu J; Yu SJ; Yuan HL

doi:10.4240/wjgs.v18.i2.114951


World Journal of Gastrointestinal Surgery. 2026; Vol. 18(2): 114951. Open access.


PMCID: PMC12968659 (open access, free full text)


PMID: 41809349

Abstract

[BACKGROUND] Accurate prediction of postoperative survival is crucial for the personalized management of gastric cancer. However, the development of robust predictive models is often constrained by incomplete clinical data, while their clinical utility is limited by poor interpretability and the absence of practical applications.

[AIM] To develop an interpretable machine learning model for predicting 3-year survival following gastric cancer surgery. A novel data imputation method was proposed to handle missing values, and a user-friendly online tool was developed to facilitate clinical decision-making.

[METHODS] A retrospective analysis was conducted on a group of 304 patients with gastric adenocarcinoma. A hybrid imputation method (HDI-MF-Gower) was developed and compared against conventional techniques. Key prognostic factors were identified by integrating least absolute shrinkage and selection operator regression with the Boruta algorithm. Subsequently, ten machine learning models were trained and validated.

[RESULTS] The proposed HDI-MF-Gower method demonstrated superior imputation accuracy. Seven features were selected for the final model. The extra trees classifier achieved the best performance on the independent validation set, with an area under the curve of 0.853 and an accuracy of 0.772. The optimal model was interpreted using SHapley Additive exPlanations analysis and deployed as an online prediction tool.

[CONCLUSION] A robust and interpretable predictive model integrating advanced data imputation was successfully developed. The deployed tool facilitates individualized prognostic assessment and shows potential for enhancing personalized treatment planning in gastric cancer.


INTRODUCTION

Gastric cancer represents a major global health challenge and remains a leading cause of cancer-related mortality worldwide[1,2]. For patients undergoing curative-intent radical gastrectomy, postoperative survival outcomes often exhibit significant heterogeneity[3]. Accurate prediction of individual survival is therefore crucial for tailoring personalized adjuvant therapy and follow-up strategies[4,5]. The American Joint Committee on Cancer tumor-node-metastasis (TNM) staging system serves as the cornerstone of current prognostic assessment. However, it primarily relies on the anatomical extent of the tumor and fails to adequately incorporate other critical clinical, pathological, and treatment-related variables. This limitation restricts its accuracy for individualized risk stratification[6-8]. Machine learning (ML) has emerged as a powerful tool for prognostic modeling due to its capacity to identify complex, nonlinear patterns within high-dimensional data[9,10]. Previous studies have applied various ML algorithms to predict survival in gastric cancer patients, demonstrating considerable promise[11,12]. Despite this potential, the clinical translation and practical application of these models face several key challenges. First, the handling of missing data in retrospective clinical groups presents a fundamental challenge. Traditional imputation methods are often simplistic, while advanced techniques like K-nearest neighbors (KNN) and multiple imputation by chained equations (MICE) can be limited by the curse of dimensionality or reliance on linear assumptions, thereby failing to capture complex nonlinear relationships in clinical data. Although missForest can handle mixed data types and nonlinearity, improper handling of its initial imputation may distort the underlying data distribution, compromising the efficiency and quality of subsequent iterative optimization[13-15]. 
Second, constructing robust and generalizable models depends on effective feature selection to identify a concise yet powerful set of predictors from numerous candidate variables. The absence of this step increases the risk of model overfitting and diminishes generalizability. Finally, the “black-box” nature of many high-performance ensemble models, coupled with a lack of interpretability and practical tools, hinders their acceptance by clinicians and integration into routine workflows. This ultimately prevents these models from providing effective, real-time decision support[16-18]. Although previous research has explored ML applications in gastric cancer prognosis, studies that specifically develop interpretable models while systematically addressing data incompleteness and clinical deployment issues remain significantly lacking. This study aims to address this gap by introducing a predictive model that utilizes a novel ML workflow to identify postoperative gastric cancer patients at high risk of mortality. Our objectives are threefold: (1) To employ a novel imputation technique for improving data quality; (2) To identify key prognostic factors through rigorous feature selection; and (3) To build and interpret an ML model for accurate prediction of 3-year survival. The ultimate goal is to provide a new method for the early identification of high-risk patients. The deployment of this model as a clinical online tool is intended to foster the practical application of ML in oncology. Subsequent external validation will be conducted to enhance the model’s reliability and facilitate its clinical adoption, thereby offering more scientific and precise decision support for gastric cancer patient management.

MATERIALS AND METHODS


Data collection
This retrospective study consecutively enrolled 526 patients with primary gastric adenocarcinoma who underwent laparoscopic radical gastrectomy at the Zhongshan Hospital Affiliated to Dalian University from December 2011 to December 2018. Stringent inclusion and exclusion criteria were applied to ensure data homogeneity and analytical reliability. The inclusion criteria were: (1) A pathological diagnosis of primary gastric adenocarcinoma; and (2) Treatment with laparoscopic radical gastrectomy. Exclusion criteria were as follows: (1) Preoperative or intraoperative evidence of peritoneal dissemination or distant metastasis (M1 stage) - note that this specifically excludes patients with M1 disease, while those classified as stage IV due to locally advanced features (e.g., N3 status) without distant metastasis (M0) according to the American Joint Committee on Cancer 8th edition staging system were retained; (2) Diagnosis of gastric stump cancer; (3) Lack of a standardized preoperative contrast-enhanced abdominal computed tomography scan or a time interval exceeding one month between computed tomography and surgery; or (4) Incomplete clinical-pathological records, laboratory data, or follow-up information critical for the analysis. After this screening process, 304 eligible patients were included in the final analysis. The study protocol was approved by the Institutional Review Board of the Zhongshan Hospital Affiliated to Dalian University (Approval No. KY2023-002-2).

HDI-MF-Gower imputation method
Missing data are often unavoidable in retrospective clinical studies; the pattern of missing data in our group is illustrated in Figure 1. To assess the mechanism of the missing data, Little’s missing completely at random (MCAR) test was performed. Among the 85 statistical tests conducted, only 2 (2.35%) showed significant differences, a proportion far below the commonly accepted threshold of 10%-20%. Given the low overall missing rate (0.69%) and the strict significance level after Bonferroni correction, the data were considered to meet the MCAR hypothesis. To address the missing values, a novel hybrid imputation algorithm, HDI-MF-Gower, was developed. This algorithm employs a two-stage strategy. First, the Gower distance metric is utilized to identify the most similar sample for each instance with a missing value, providing an intelligent initial imputation. Second, the missForest algorithm is applied for iterative optimization. This step uses the initial estimates as a starting point to capture complex nonlinear relationships among variables. This hybrid design integrates the advantages of both local similarity-based and global pattern-learning imputation methods, thereby enhancing the accuracy of imputation for mixed-type clinical datasets.
The specific procedural steps are as follows:
(1) Input: training dataset D = {x1, x2, …, xn}, numerical feature set N, categorical feature set F, convergence threshold ε, and maximum number of iterations P.
(2) Identify the complete samples C and the incomplete samples I.
(3) Compute a weight for each feature. For a numerical feature k∈N, the weight is computed from Var(Xk), the variance of the feature, and MRk, its missing rate; for a categorical feature k∈F, the weight is computed from H(Xk), the information entropy of the feature. To prevent extreme weight values from dominating the distance calculation, nonlinear compression and normalization are applied to the weights.
(4) Using the adaptive Gower distance as the similarity metric, identify, for each sample in I containing one or more missing values, the most similar complete sample in the dataset. The corresponding observed values from this matched sample are used to impute the missing entries, producing the initially imputed matrix.
(5) Sort all variables in ascending order of their missing value rate, and denote the ordered list of variable names as the vector K.
(6) Check whether the convergence criterion ε or the maximum iteration count P has been reached. If either condition is satisfied, terminate the procedure and output the latest imputed matrix; otherwise, repeat steps 7 to 9.
(7) Store the imputed data matrix obtained from the previous iteration as the old matrix.
(8) For each variable in K, in order, apply the random forest algorithm to impute its missing values, and update the matrix with the imputed values to obtain a new matrix.
(9) Calculate the difference metric between the old and new matrices, and return to step 6.
(10) Output: the final matrix obtained after the iteration terminates.
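The first stage of the procedure (the Gower-based initial imputation in step 4) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `gower_distance` and `initial_impute` are hypothetical, the feature weights default to uniform because the weight formulas are not reproduced here, and numeric differences are scaled by the observed range, as in the standard Gower distance. Missing values are represented by `None`.

```python
from typing import Any, List, Optional, Sequence


def gower_distance(a: Sequence[Any], b: Sequence[Any],
                   is_numeric: Sequence[bool],
                   ranges: Sequence[float],
                   weights: Sequence[float]) -> float:
    """Weighted Gower distance over the features observed in both rows."""
    num, den = 0.0, 0.0
    for k, (va, vb) in enumerate(zip(a, b)):
        if va is None or vb is None:
            continue  # a feature missing in either row contributes nothing
        if is_numeric[k]:
            # Range-normalized absolute difference for numeric features
            d = abs(va - vb) / ranges[k] if ranges[k] > 0 else 0.0
        else:
            # Simple mismatch indicator for categorical features
            d = 0.0 if va == vb else 1.0
        num += weights[k] * d
        den += weights[k]
    return num / den if den > 0 else float("inf")


def initial_impute(rows: Sequence[Sequence[Any]],
                   is_numeric: Sequence[bool],
                   weights: Optional[Sequence[float]] = None) -> List[List[Any]]:
    """Fill each incomplete row from its nearest complete row (the donor)."""
    n_feat = len(is_numeric)
    weights = weights or [1.0] * n_feat  # assumption: uniform weights
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("at least one complete sample is required")
    # Observed range of every numeric feature (0.0 placeholder otherwise)
    ranges = []
    for k in range(n_feat):
        if is_numeric[k]:
            obs = [r[k] for r in rows if r[k] is not None]
            ranges.append(max(obs) - min(obs))
        else:
            ranges.append(0.0)
    imputed = []
    for r in rows:
        if None not in r:
            imputed.append(list(r))
            continue
        donor = min(complete,
                    key=lambda c: gower_distance(r, c, is_numeric, ranges, weights))
        imputed.append([donor[k] if v is None else v for k, v in enumerate(r)])
    return imputed


# Toy data: age, albumin, lymphovascular invasion (illustrative values only)
rows = [
    [60, 40.0, "yes"],
    [70, 30.0, "no"],
    [61, None, "yes"],
    [69, 31.0, None],
]
filled = initial_impute(rows, is_numeric=[True, True, False])
```

In the full method, the matrix returned by this step would then serve as the starting point for the iterative random forest refinement in steps 5 to 10.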
The iteration termination condition ε is evaluated on the difference between the newly imputed matrix and the matrix from the previous iteration. The algorithm terminates when this difference falls below a pre-defined threshold. For the set of continuous variables N, the difference ΔN is defined over the imputed entries, where N′ denotes the set of continuous variables with missing values and Mj represents the set of missing-value positions for the j-th variable. For the set of categorical variables F, the difference ΔF is defined analogously using the indicator function I(·), which takes the value 1 if the condition is satisfied and 0 otherwise. Supplementary Table 1 shows the pseudocode for the HDI-MF-Gower imputation method.
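The two difference measures can be written out as follows. This is a sketch rather than the authors' exact definition: it follows the usual missForest stopping criterion, restricted to the missing-value positions Mj, and the symbols x^new and x^old (entries of the new and previous imputed matrices) and F′ (the categorical variables with missing values) are assumed notation.

```latex
\Delta_N =
\frac{\sum_{j \in N'} \sum_{i \in M_j} \left( x^{\mathrm{new}}_{ij} - x^{\mathrm{old}}_{ij} \right)^2}
     {\sum_{j \in N'} \sum_{i \in M_j} \left( x^{\mathrm{new}}_{ij} \right)^2},
\qquad
\Delta_F =
\frac{\sum_{j \in F'} \sum_{i \in M_j} I\left( x^{\mathrm{new}}_{ij} \neq x^{\mathrm{old}}_{ij} \right)}
     {\sum_{j \in F'} \lvert M_j \rvert}
```

Under this form, ΔN is a relative squared change in the imputed numeric entries and ΔF is the fraction of imputed categorical entries that changed between iterations.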

Model construction and validation
The dataset of 304 patients was imputed using the HDI-MF-Gower method and randomly split into training and validation sets at a 7:3 ratio. The training set was used for model development with hyperparameter tuning via 10-fold cross-validation, while the validation set was reserved for independently assessing the model’s generalization ability. To identify the most predictive features for three-year postoperative mortality and mitigate overfitting, a dual feature selection strategy was employed. Least absolute shrinkage and selection operator (LASSO) regression was applied to select key variables, while the Boruta algorithm, a random forest-based wrapper method, was used to identify all-relevant predictors[19-21]. The final feature set was defined as the intersection of the features retained by both methods, which helped enhance model accuracy, reduce overfitting, and eliminate irrelevant variables[22,23]. Using this optimal feature subset, ten ML algorithms were developed and compared: Logistic regression, random forest, extreme gradient boosting (XGBoost), light gradient boosting machine, support vector machine, multi-layer perceptron, extra trees, KNN, decision tree, and gradient boosting. The primary outcome was individual three-year mortality risk, with the area under the curve (AUC) and accuracy as the main performance metrics. Secondary metrics included specificity, recall (sensitivity), precision, and the F1-score. Model calibration was assessed using calibration curves and the Brier score, which quantifies the agreement between predicted probabilities and observed outcomes. Decision curve analysis was used to evaluate clinical net benefit across various probability thresholds. To improve interpretability, SHapley Additive exPlanations (SHAP) analysis was applied to illustrate the contribution of each feature to individual predictions. Finally, a user-friendly online prediction tool was developed to facilitate the clinical application of the optimal model.
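The calibration and clinical-utility metrics named above can be made concrete with a short sketch. It uses the standard definitions (Brier score as the mean squared difference between predicted probability and outcome; net benefit at threshold pt as TP/n minus FP/n weighted by pt/(1 − pt)) and is not taken from the authors' code. The function names and the toy inputs are hypothetical.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome (0/1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float],
                threshold: float) -> float:
    """Net benefit of treating patients whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are penalized by the odds of the threshold probability
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Toy example: 1 = died within three years, 0 = survived (illustrative only)
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.7]
bs = brier_score(y_true, y_prob)
nb = net_benefit(y_true, y_prob, threshold=0.5)
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing it with the treat-all and treat-none strategies.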

Statistical analysis
Statistical analyses were performed using R (version 4.4.1, run in RStudio) and SPSS software (version 25.0). Continuous variables that followed a normal distribution were presented as mean ± SD, and inter-group comparisons were conducted using t-tests. For continuous variables that did not follow a normal distribution, values were expressed as median (interquartile range), and inter-group comparisons were performed using the Mann-Whitney U test. Categorical variables were presented as n (%), and inter-group comparisons were made using the χ² test.

RESULTS


Baseline characteristics of patients
Following the screening process, 304 patients were included in the final analysis. The group comprised 211 males (69.41%) and 93 females (30.59%), with a mean age of 67 years. Based on the 3-year survival outcome, 165 patients (54.28%) were categorized into the survival group and 139 (45.72%) into the non-survival group. Significant intergroup differences (P < 0.05) were observed in the following parameters: Max tumor diameter, age, red blood cell count, hemoglobin, albumin, creatinine, carcinoembryonic antigen (CEA), intraoperative blood loss, sex, alcohol consumption (drinking), resection range, reconstruction method, complications, lymphovascular invasion, nerve infiltration, and TNM stage. The detailed baseline characteristics of the patients are summarized in Table 1.

Evaluation and comparison of imputation methods
To evaluate the proposed HDI-MF-Gower imputation method, we conducted an experimental study assessing its performance from two perspectives: Imputation accuracy and its downstream impact on the predictive models’ classification AUC. A complete dataset of 240 samples with 18 categorical and 15 continuous variables was first obtained by removing any samples with missing values from the original group of 304 patients. Subsequently, MCAR mechanisms were simulated by introducing missing values at rates of 5%, 10%, 15%, and 20%. The performance of HDI-MF-Gower was compared against several benchmark methods: KNN imputation, mean/mode imputation, missForest, and MICE. For continuous variables, the normalized root mean square error (NRMSE) was used to quantify the discrepancy between the imputed and true values. The NRMSE is defined as follows:

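$$\mathrm{NRMSE} = \frac{\sqrt{\operatorname{mean}\left( \left( X_{\mathrm{true}} - X_{\mathrm{imp}} \right)^2 \right)}}{\operatorname{std}\left( X_{\mathrm{true}} \right)}$$

Where Xtrue is a vector containing the original true values of all numerical data points that were artificially set to missing, and Ximp is a vector containing the corresponding imputed values generated by the algorithm. The symbol std(·) represents the standard deviation of the vector used for calculation.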
For categorical variables, the proportion of falsely classified (PFC) entries was used to measure the imputation accuracy. It is calculated directly as the proportion of incorrectly imputed categories to the total number of imputed entries. The formula is defined as:

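$$\mathrm{PFC} = \frac{1}{n} \sum_{i=1}^{n} I\left( C_{\mathrm{true},i} \neq C_{\mathrm{imp},i} \right)$$

Where Ctrue,i is the original true category of the i-th categorical data point that was artificially set to missing, Cimp,i is the corresponding imputed category generated by the algorithm for Ctrue,i, n represents the total number of imputed categorical data points, and I(·) is the indicator function.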
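Both accuracy measures are straightforward to compute. The sketch below assumes that the NRMSE is the root mean square error divided by the standard deviation of the true values, and it uses the population standard deviation (`statistics.pstdev`); whether the authors used the population or the sample form is not stated. The function names and toy vectors are hypothetical.

```python
import math
from statistics import pstdev
from typing import Hashable, Sequence


def nrmse(x_true: Sequence[float], x_imp: Sequence[float]) -> float:
    """RMSE between true and imputed values, normalized by std of the true values."""
    mse = sum((t - i) ** 2 for t, i in zip(x_true, x_imp)) / len(x_true)
    return math.sqrt(mse) / pstdev(x_true)  # assumption: population std


def pfc(c_true: Sequence[Hashable], c_imp: Sequence[Hashable]) -> float:
    """Proportion of imputed categorical entries that differ from the truth."""
    wrong = sum(1 for t, i in zip(c_true, c_imp) if t != i)
    return wrong / len(c_true)


# Only the entries that were artificially masked are passed in
num_error = nrmse([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
cat_error = pfc(["a", "b", "a", "c"], ["a", "b", "c", "c"])
```

For both measures, lower values indicate more accurate imputation, and a value of 0 means every masked entry was recovered exactly.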
As shown in Table 2, the proposed HDI-MF-Gower method demonstrated superior imputation accuracy across all missingness rates compared with the four benchmark methods (mean/mode, KNN, MICE, and missForest), yielding lower NRMSE for numerical variables and lower PFC for categorical variables. To further evaluate the practical impact of imputation quality on downstream predictive tasks, a decision tree classifier was trained on the datasets processed by each method, and its performance was assessed by the average AUC under 5-fold cross-validation. The model trained on HDI-MF-Gower-imputed data achieved the highest predictive AUC among all methods. These results indicate that the HDI-MF-Gower method not only imputes missing values more accurately but also better preserves intrinsic data relationships, thereby mitigating the negative effect of missing data on subsequent ML model performance.
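The simulation design used in this comparison (masking entries of a complete dataset completely at random at a fixed rate, then scoring each imputer on the masked positions) can be sketched as follows. The helper `mask_mcar` is hypothetical, and masking a fixed number of cells chosen uniformly across the whole matrix is an assumption; the authors may have masked per variable.

```python
import random
from typing import Any, List, Sequence, Tuple


def mask_mcar(rows: Sequence[Sequence[Any]], rate: float,
              seed: int = 0) -> Tuple[List[List[Any]], List[Tuple[int, int]]]:
    """Return a copy of `rows` with a fraction `rate` of cells set to None.

    Cells are drawn uniformly at random, independent of any value, which is
    what "missing completely at random" means. The masked positions are
    returned so that imputed values can later be compared with the truth.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    cells = [(i, j) for i in range(len(rows)) for j in range(len(rows[0]))]
    n_mask = round(rate * len(cells))
    masked = set(rng.sample(cells, n_mask))
    out = [[None if (i, j) in masked else v for j, v in enumerate(r)]
           for i, r in enumerate(rows)]
    return out, sorted(masked)


# Toy complete dataset: 10 samples x 4 variables (illustrative values only)
complete = [[float(i + j) for j in range(4)] for i in range(10)]
incomplete, positions = mask_mcar(complete, rate=0.20, seed=1)
```

Repeating this at rates of 0.05, 0.10, 0.15, and 0.20 and imputing each masked copy reproduces the structure of the experiment, with the original values at `positions` serving as the ground truth.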

Feature selection
Following data imputation with the HDI-MF-Gower method, key predictive variables were identified through a dual feature selection strategy. First, LASSO regression with 10-fold cross-validation was performed, yielding an optimal regularization parameter (λ_minimum) of 0.028 under the minimum criterion. This approach selected ten variables: Sex, complications, lymphovascular invasion, max tumor diameter, TNM stage, age, platelets, albumin, CEA, and intraoperative blood loss (Figure 2A and B).
Concurrently, the Boruta algorithm was run for 500 iterations to ensure stable feature importance evaluation, identifying nine features. Their importance ranking is visualized in Figure 2C. Seven features (age, CEA, albumin, TNM stage, max tumor diameter, lymphovascular invasion, and intraoperative blood loss) were confirmed as important, whereas hemoglobin and red blood cell count remained tentative. The optimal feature subset was defined as the intersection of the features selected by both methods to ensure a concise set of variables with robust, consensus-based predictive power, thereby enhancing the model’s generalizability and clinical interpretability. Consequently, the final model incorporated seven features: CEA, albumin, TNM stage, age, intraoperative blood loss, lymphovascular invasion, and max tumor diameter, as summarized in Figure 2D.
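The intersection step can be checked directly from the feature lists reported above. The snippet below only restates those lists as sets; the variable names are illustrative, and treating the Boruta "tentative" features as not selected is an assumption that matches the reported final set.

```python
# Features retained by LASSO at the minimum-criterion lambda (as reported)
lasso_selected = {
    "sex", "complications", "lymphovascular invasion", "max tumor diameter",
    "TNM stage", "age", "platelets", "albumin", "CEA",
    "intraoperative blood loss",
}

# Features confirmed as important by Boruta (as reported)
boruta_confirmed = {
    "age", "CEA", "albumin", "TNM stage", "max tumor diameter",
    "lymphovascular invasion", "intraoperative blood loss",
}

# Features Boruta left tentative; assumed to be excluded from the intersection
boruta_tentative = {"hemoglobin", "red blood cell count"}

# Consensus feature set: selected by LASSO and confirmed by Boruta
final_features = sorted(lasso_selected & boruta_confirmed)

# Features chosen by LASSO alone and therefore dropped
lasso_only = sorted(lasso_selected - boruta_confirmed)
```

Here every Boruta-confirmed feature also appears in the LASSO list, so the intersection equals the confirmed set, and sex, complications, and platelets are dropped.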

Performance comparison of ML algorithms
The performance of ten predictive models, constructed using the selected key clinical features, was evaluated. The receiver operating characteristic curves (Figure 3A and B) showed that the extra trees (extremely randomized trees, ET) model achieved the highest discriminative ability, with an AUC of 0.936 [95% confidence interval (CI): 0.904-0.963] on the training set and 0.853 (95%CI: 0.764-0.925) on the independent validation set. DeLong’s test revealed a statistically significant superiority in AUC for the ET model over the KNN, support vector machine, and multi-layer perceptron models. In contrast, no significant difference in AUC was observed between ET and XGBoost, logistic regression, light gradient boosting machine, random forest, decision tree, or gradient boosting (P > 0.05). Considering both its highest AUC and the statistical test results, the ET model was selected as the final prediction model. Detailed pairwise comparison results are provided in Table 3. As summarized in Table 4, the ET model also exhibited the best overall performance, attaining the highest accuracy (0.772) and sensitivity (0.857).
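For reference, the discrimination metrics reported here can be computed as in the sketch below. The AUC is evaluated through its rank interpretation (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half), which is also the quantity DeLong's test compares. The function names, the 0.5 classification threshold, and the toy data are assumptions for illustration.

```python
from typing import Sequence


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the fraction of positive-negative pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def accuracy(y_true: Sequence[int], y_score: Sequence[float],
             threshold: float = 0.5) -> float:
    """Share of cases classified correctly at the given threshold."""
    hits = sum(1 for y, s in zip(y_true, y_score) if (s >= threshold) == (y == 1))
    return hits / len(y_true)


def sensitivity(y_true: Sequence[int], y_score: Sequence[float],
                threshold: float = 0.5) -> float:
    """Share of positive cases flagged as positive (recall)."""
    pos_scores = [s for y, s in zip(y_true, y_score) if y == 1]
    return sum(1 for s in pos_scores if s >= threshold) / len(pos_scores)


# Toy example: 1 = died within three years, 0 = survived (illustrative only)
y_true = [1, 0, 1, 0]
y_score = [0.9, 0.2, 0.6, 0.7]
toy_auc = auc(y_true, y_score)
toy_acc = accuracy(y_true, y_score)
toy_sens = sensitivity(y_true, y_score)
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a validation set of this size; a rank-based formula would be preferable for large datasets.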

Clinical significance
The calibration curves for the training and validation sets (Figure 3C and D) indicated strong predictive reliability for the extra trees model, which achieved Brier scores of 0.115 (95%CI: 0.096-0.135) and 0.162 (95%CI: 0.130-0.197), respectively, outperforming the other nine models. Decision curve analysis on the training set (Figure 3E) demonstrated that the ET model provided a substantially higher net benefit than the baseline strategy across threshold probabilities ranging from 0.1 to 0.9, and outperformed most other models over a wide threshold range. On the independent validation set (Figure 3F), the model maintained favorable clinical utility, exhibiting a high net benefit, particularly within the threshold probability range of 0.5 to 0.6. Furthermore, feature importance was analyzed using SHAP. The summary plot (Figure 4A) illustrated how each feature influenced the model’s output, revealing that decreased albumin levels, advanced TNM stage, the presence of lymphovascular invasion, and increased maximum tumor diameter, age, CEA, and intraoperative blood loss were all associated with an elevated risk of mortality. The ranking of feature importance based on mean absolute SHAP values (Figure 4B) confirmed TNM stage as the most influential predictor of postoperative death risk.
To elucidate the model’s decision-making for individual cases, a SHAP force plot was generated for a representative patient (Figure 4C), illustrating how each feature value contributed to the final prediction. This enhances model interpretability by quantifying and visualizing the driving factors behind each risk assessment. For practical deployment, we implemented a user-friendly web application using Streamlit. This online tool allows clinicians to input patient data directly into web form fields to obtain instant postoperative survival risk assessments.
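The additive attribution idea behind the SHAP plots can be illustrated with exact Shapley values on a tiny model. This sketch is not the SHAP library and not the authors' model: `toy_risk` is a hypothetical two-feature risk function, absent features are replaced by a single baseline value, and all feature subsets are enumerated, which is feasible only for a handful of features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(predict: Callable[[List[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of each feature for one prediction."""
    n = len(x)

    def value(subset: set) -> float:
        # Features in `subset` take the patient's value, the rest the baseline
        z = [x[k] if k in subset else baseline[k] for k in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [k for k in range(n) if k != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                s_set = set(s)
                phi[i] += weight * (value(s_set | {i}) - value(s_set))
    return phi


def toy_risk(z: List[float]) -> float:
    """Hypothetical risk model with an interaction between two features."""
    return 0.1 + 0.2 * z[0] + 0.3 * z[1] + 0.1 * z[0] * z[1]


patient = [1.0, 1.0]    # e.g. advanced stage present, invasion present
reference = [0.0, 0.0]  # baseline patient
phi = shapley_values(toy_risk, patient, reference)
```

The contributions sum to the difference between the patient's predicted risk and the baseline prediction, which is the property a force plot visualizes: each feature pushes the prediction up or down from the baseline by its attributed amount.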

DISCUSSION

ML applications in surgical research, while still evolving, show considerable potential for risk prediction[24]. A study by Lee et al[25] exemplifies this in the context of gastric cancer, demonstrating that ML models such as random forest and XGBoost surpass traditional logistic regression in predicting postoperative complications. Their study not only confirms the predictive advantages of these algorithms but also provides a new perspective on preoperative assessment by identifying non-traditional risk factors, especially emphasizing the importance of hematological parameters. A key contribution of this work is the novel HDI-MF-Gower imputation method, which performed well in handling missing data in a clinical gastric cancer dataset. HDI-MF-Gower consistently achieved lower NRMSE and PFC values than conventional methods across the simulated missingness rates. Moreover, the model trained on data imputed by this method achieved a higher classification AUC, suggesting that the method better preserves the intrinsic relationships among variables, thereby reducing information loss and statistical bias. One notable limitation of HDI-MF-Gower is its relatively high computational cost, which can reduce its efficiency on very large datasets. By combining LASSO regression and the Boruta algorithm in a dual feature selection strategy, we identified a robust set of seven key prognostic factors: TNM stage, lymphovascular invasion, age, intraoperative blood loss, albumin level, CEA, and maximum tumor diameter. It is noteworthy that SHAP analysis not only quantified feature importance but also showed that the direction of each feature’s influence aligns closely with established pathophysiological mechanisms of gastric cancer.
The primacy of TNM staging reflects its well-established role in anatomical disease assessment, where advanced stages directly indicate greater tumor burden, local invasion, and metastatic potential. The confirmed importance of lymphovascular invasion corresponds to its recognized association with metastatic dissemination, as this pathological feature provides direct evidence of tumor cells invading vascular structures and entering the circulation. The biological plausibility of other selected features further supports the model’s clinical relevance: Advanced age typically correlates with diminished physiological reserve and increased comorbidities, affecting tolerance to surgery and adjuvant therapies; substantial intraoperative blood loss may indicate technically challenging procedures or significant tumor adhesion, while also potentially triggering inflammatory responses and immunosuppression that impair recovery; hypoalbuminemia serves as a marker of both malnutrition and systemic inflammation, compromising immune function and tissue repair capacity; elevated CEA levels reflect increased tumor burden and biological aggressiveness; and larger tumor diameter directly indicates more advanced local disease progression. Thus, the high-weight predictors identified by our model provide a multidimensional explanation for their adverse prognostic impact, encompassing tumor biological behavior, host physiological status, and treatment-related factors. Among the ML algorithms evaluated, the extra trees model demonstrated the best overall performance, achieving an AUC value of 0.853 on the validation set with high accuracy, precision, recall, and F1-scores. The model’s ability to capture complex nonlinear relationships makes it superior to traditional statistical models. The ET model also showed satisfactory calibration, with a Brier score of less than 0.25 and a calibration curve indicating that the prediction probability was in good agreement with the observations. 
To bridge the gap between model complexity and clinical utility, we employed SHAP analysis to visualize feature contributions and developed an easy-to-use online prediction tool. This tool aims to facilitate rapid, individualized prognostic assessment, thereby helping clinicians plan early interventions and develop personalized treatment strategies.
This study has several limitations. First, the single-center retrospective design may introduce selection bias and limit the generalizability of the findings. Consequently, the model requires external validation in a multicenter, prospective group to confirm its robustness and clinical applicability. Second, although the selected feature set is clinically comprehensive, it lacks molecular biomarkers that could further enhance prognostic accuracy. Future research should therefore focus on multicenter validation and incorporate more biologically relevant variables.

CONCLUSION

In summary, this study developed and validated an ML-based model for predicting 3-year mortality risk after gastric cancer surgery. The model demonstrated high predictive accuracy and clinical interpretability, underpinned by several key innovations. The novel HDI-MF-Gower imputation method effectively handled missing clinical data, thereby enhancing model performance. A robust set of seven clinically significant prognostic factors was identified through a dual feature selection strategy. The extra trees classifier emerged as the optimal algorithm, and its integration with SHAP analysis and a web-based tool ensured both transparency and practical utility. Future multi-center, large-scale prospective studies are warranted to validate and extend the model’s applicability across diverse populations and timeframes, ultimately facilitating its integration into clinical practice for personalized postoperative management.

Source: PubMed Central (JATS). Licensing follows the original publisher's policy; please cite the original article when quoting.
