Machine learning-driven risk stratification for distant metastasis in gastric cancer: A comparative study of clinical features and composite indices integrated models.
APA
Yang S, Lei H (2025). Machine learning-driven risk stratification for distant metastasis in gastric cancer: A comparative study of clinical features and composite indices integrated models. PloS one, 20(10), e0335258. https://doi.org/10.1371/journal.pone.0335258
MLA
Yang S, et al. "Machine learning-driven risk stratification for distant metastasis in gastric cancer: A comparative study of clinical features and composite indices integrated models." PloS one, vol. 20, no. 10, 2025, pp. e0335258.
PMID
41166244
Abstract
[OBJECTIVE] Distant metastasis (DM) of gastric cancer (GC) is a significant health challenge because of its high mortality, and it calls for better early detection and management strategies. This study aimed to develop an interpretable machine learning (ML) model for preoperative prediction of DM in GC.
[METHODS] We retrospectively analyzed 1,009 GC patients, of whom 769 from Zhejiang Cancer Hospital formed the development cohort and 240 from Zhejiang Provincial Hospital of Chinese Medicine formed the external test cohort. Nine clinical features and four composite indices derived from ten laboratory indicators were selected as candidate features. The dataset was balanced using the borderline Synthetic Minority Over-sampling Technique (SMOTE) and the Edited Nearest Neighbors (ENN) under-sampling method. Univariate and multivariate analyses were used to identify key metastasis-related features. Based on the identified features, we developed predictive models using five ML algorithms, with performance evaluated via receiver operating characteristic (ROC) curves, recall, and precision-recall (PR) curves. Finally, Shapley additive explanations (SHAP) analysis was applied to rank feature importance and explain the final model.
[RESULTS] Univariate and multivariate analyses identified five metastasis-related features: cT stage, cN stage, differentiation grade, PLR and TMI. Logistic Regression emerged as the optimal predictive model, with the highest area under the ROC curve (AUC) of 0.942 (95% CI: 0.922-0.962), recall of 0.895 (95% CI: 0.843-0.947), and area under the PR curve (AUPRC) of 0.889 (95% CI: 0.867-0.911) among the five models. In the internal and external test cohorts, the AUC values were 0.935 (95% CI: 0.897-0.972) and 0.879 (95% CI: 0.833-0.926), respectively. SHAP analysis identified the features that contributed most to the model's predictions.
[CONCLUSION] This ML model integrates clinical features and composite indices to predict GC metastasis risk, supported by an online tool to guide preoperative decision-making.
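The abstract reports each AUC with a 95% confidence interval but does not state how the intervals were obtained. As a minimal sketch only, the following Python computes an AUC from labels and predicted scores and attaches a percentile bootstrap interval. The bootstrap is an assumed method, not necessarily the authors' one, and the function names (`roc_auc`, `bootstrap_auc_ci`) and the toy data are hypothetical illustrations, not study data.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Uses the Mann-Whitney pairwise definition; tied scores count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method, see lead-in)."""
    if len(set(labels)) < 2:
        raise ValueError("need both classes to bootstrap an AUC")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        # Resample patients with replacement, keeping label/score pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUC is undefined when a resample holds a single class
        stats.append(roc_auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy example (hypothetical values): 1 = distant metastasis, 0 = none.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))  # 0.75
print(bootstrap_auc_ci(toy_labels, toy_scores, n_boot=200))
```

The pairwise AUC loop is quadratic in cohort size, which is acceptable for a cohort of about a thousand patients; a rank-based formulation would scale better for larger datasets.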
MeSH Terms
Humans; Stomach Neoplasms; Machine Learning; Male; Female; Middle Aged; Retrospective Studies; Aged; Risk Assessment; Neoplasm Metastasis; ROC Curve; Adult