Development and validation of an interpretable machine learning model for non-invasive screening of precancerous gastric lesions using symptom and lifestyle data: a multicentre cohort study.

Wang L; Tang K; Zhang P; Liu J; Wu B; Chen J; Li Y; Du S; Wang Y; Li S

doi:10.1016/j.eclinm.2026.103756

EClinicalMedicine (open-access journal), 2026, Vol. 92, p. 103756

Free full text available on PMC: PMC12856190

Cite this article

APA Wang L, Tang K, et al. (2026). Development and validation of an interpretable machine learning model for non-invasive screening of precancerous gastric lesions using symptom and lifestyle data: a multicentre cohort study. EClinicalMedicine, 92, 103756. https://doi.org/10.1016/j.eclinm.2026.103756

MLA Wang L, et al. "Development and validation of an interpretable machine learning model for non-invasive screening of precancerous gastric lesions using symptom and lifestyle data: a multicentre cohort study." EClinicalMedicine, vol. 92, 2026, p. 103756.

PMID 41623856

DOI 10.1016/j.eclinm.2026.103756

Abstract

[BACKGROUND] Precancerous gastric lesions (PLGC) are a critical stage in gastric cancer progression, where timely intervention can substantially reduce mortality. However, current screening relies predominantly on endoscopy, which is invasive, costly, and often inaccessible in resource-limited settings. We aimed to develop and validate an interpretable machine learning model for non-invasive PLGC screening using symptom and lifestyle data.

[METHODS] In this multicentre study, we enrolled eligible adult participants who were undergoing or scheduled to undergo upper gastrointestinal endoscopy and had no prior diagnosis of malignancy. The development cohort comprised 1034 participants recruited at two hospitals between Nov 16, 2022, and Apr 7, 2023. Symptom and lifestyle data from this cohort formed the development dataset, which was randomly split into a training set (n = 620), an internal validation set (n = 207), and a hold-out test set (n = 207). External performance was assessed in a retrospective hospital-based cohort from four additional hospitals (n = 630; May 21, 2018, to Jul 30, 2023) and a prospective community-based cohort from 32 screening sites (n = 847; Jun 21, 2023, to Nov 7, 2023). We developed a stacking ensemble model to predict the primary outcome (presence of PLGC) by integrating seven base learners (Gaussian Naïve Bayes, Logistic Regression, K-Nearest Neighbours, Gradient Boosting Classifier, eXtreme Gradient Boosting, Random Forest, and Adaptive Boosting) and applied Shapley Additive Explanations (SHAP) for clinical interpretability. Model performance was compared with two guideline-based screening strategies using the area under the receiver operating characteristic curve (AUC; 95% CI), sensitivity, specificity, positive predictive value, and negative predictive value.
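As a rough illustration of the evaluation metrics named in the Methods, the sketch below computes AUC, sensitivity, specificity, positive predictive value, and negative predictive value in plain Python. It is not the authors' code: the function names `auc_rank` and `screening_metrics`, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for the example. AUC is computed with the standard rank (Mann-Whitney) formulation.

```python
from typing import Dict, Sequence


def auc_rank(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def _safe_div(num: float, den: float) -> float:
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def screening_metrics(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> Dict[str, float]:
    """Threshold the scores and derive the four confusion-matrix metrics."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 0)
    fp = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 0)
    return {
        "sensitivity": _safe_div(tp, tp + fn),
        "specificity": _safe_div(tn, tn + fp),
        "ppv": _safe_div(tp, tp + fp),
        "npv": _safe_div(tn, tn + fn),
    }


if __name__ == "__main__":
    # Toy data (hypothetical): 1 = PLGC present, 0 = PLGC absent.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
    print(auc_rank(y_true, scores))                # 0.8125
    print(screening_metrics(y_true, scores, 0.5))  # all four metrics are 0.75
```

Note that sensitivity and specificity depend on the chosen threshold, whereas AUC summarises ranking quality across all thresholds; the abstract does not state which threshold the study used.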

[FINDINGS] In total, 2511 participants (male: n = 871, 34.7%; female: n = 1640, 65.3%) were included. The primary outcome, PLGC, was present in 509 of 1034 participants (49.2%) in the development cohort, in 331 of 630 participants (52.5%) in the retrospective validation cohort, and in 312 of 847 participants (36.8%) in the prospective validation cohort. The model showed robust performance for non-invasive PLGC screening, with AUCs of 0.82 (95% CI: 0.77-0.87) in the internal hold-out test set, 0.80 (95% CI: 0.78-0.82) in the external retrospective validation set, and 0.79 (95% CI: 0.77-0.81) in the prospective validation set. With AUC improvements of 0.18-0.35, our model exceeded both guideline-based strategies across all datasets (internal hold-out test: 0.82 (95% CI: 0.77-0.87) vs. 0.47 (95% CI: 0.42-0.53)/0.48 (95% CI: 0.42-0.53); external retrospective validation: 0.80 (95% CI: 0.78-0.82) vs. 0.62 (95% CI: 0.60-0.64)/0.58 (95% CI: 0.55-0.60); prospective validation: 0.79 (95% CI: 0.77-0.81) vs. 0.57 (95% CI: 0.54-0.59)/0.52 (95% CI: 0.50-0.55); all p < 0.001). In a cost-effectiveness analysis, this translated into a 37.1% reduction in the average cost per detected PLGC case versus guideline-based tools. SHAP analysis further identified 15 key predictors, including infection, age, and melaena.
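The figures reported in the Findings can be cross-checked with a few lines of arithmetic. The sketch below recomputes the cohort total, the PLGC prevalence per cohort, and the AUC gains over the two guideline-based strategies from the numbers quoted above. The labels `guideline_a` and `guideline_b` are placeholders, since the strategy names are not given in this text; their order follows the order of the slash-separated values in the abstract.

```python
# PLGC cases and cohort sizes as reported in the Findings.
cohorts = {
    "development": (509, 1034),
    "retrospective validation": (331, 630),
    "prospective validation": (312, 847),
}

total_participants = sum(n for _, n in cohorts.values())
prevalence_pct = {
    name: round(100 * cases / n, 1) for name, (cases, n) in cohorts.items()
}

# Reported AUC point estimates: (model, guideline_a, guideline_b).
# "guideline_a" / "guideline_b" are placeholder labels, not names from the paper.
aucs = {
    "internal hold-out test": (0.82, 0.47, 0.48),
    "external retrospective validation": (0.80, 0.62, 0.58),
    "prospective validation": (0.79, 0.57, 0.52),
}

auc_gains = [
    round(model - guideline, 2)
    for model, guideline_a, guideline_b in aucs.values()
    for guideline in (guideline_a, guideline_b)
]

if __name__ == "__main__":
    print(total_participants)              # 2511
    print(prevalence_pct)                  # 49.2, 52.5, 36.8 (percent)
    print(min(auc_gains), max(auc_gains))  # 0.18 0.35
```

The recomputed values agree with the abstract: the three cohorts sum to 2511 participants, and the point-estimate AUC gains span 0.18 to 0.35.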

[INTERPRETATION] An interpretable machine learning model integrating symptom and lifestyle information, some of which was implicated by traditional medicine, outperformed guideline-based screening strategies for non-invasive PLGC screening in both hospital-based and community-based populations. However, generalisability may be limited by the cohorts' age and regional distribution; further studies should incorporate more non-invasive metrics to optimise the screening model and pursue broader external validation and real-world implementation.

[FUNDING] National Natural Science Foundation of China. Innovation Team and Talents Cultivation Program of the National Administration of Traditional Chinese Medicine. Fundamental and Interdisciplinary Disciplines Breakthrough Plan of the Ministry of Education of China.
