Predicting clinical outcomes in Helicobacter pylori-positive patients using supervised learning through the integration of demographic and genomic features.
APA
Narasimhan V, Pulakkat Warrier S, et al. (2026). Predicting clinical outcomes in Helicobacter pylori-positive patients using supervised learning through the integration of demographic and genomic features. BMC Gastroenterology, 26(1), 143. https://doi.org/10.1186/s12876-025-04595-3
MLA
Narasimhan V, et al. "Predicting clinical outcomes in Helicobacter pylori-positive patients using supervised learning through the integration of demographic and genomic features." BMC Gastroenterology, vol. 26, no. 1, 2026, pp. 143.
PMID
41606475
Abstract
[BACKGROUND] Helicobacter pylori infection is widespread globally and is linked to outcomes ranging from chronic gastritis to gastric cancer. However, only a minority of infected individuals progress to malignancy, and progression is influenced by a mix of bacterial, host, and environmental factors. Current predictive approaches are limited because they rely mainly on clinical and lifestyle data. Genomic approaches have been sparsely used, and their incorporation into machine learning models could enable early and personalized detection. This study aimed to evaluate the impact of integrating host metadata with genomic features from H. pylori to predict gastric cancer outcomes and identify associated variables.
[METHODS] A total of 1,363 publicly available H. pylori genomes with associated host information, dated between 1991 and 2024, were collected from NCBI and EnteroBase. Demographic features, virulence genes, and sequence-derived and variant-based features were extracted. Machine learning models were then developed to classify infection outcomes as gastric cancer or non-gastric cancer. Models were trained using internal cross-validation folds within the training set, which comprised 80% of the dataset. Logistic regression, an interpretable baseline model, was compared against higher-performance ensemble models (XGBoost, Random Forest). Final model performance was assessed on the held-out test set using recall, precision, AUROC, and AUPRC.
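The evaluation design described in the methods (an 80/20 hold-out split, then recall, precision, and AUROC on the held-out set) can be illustrated with a minimal standard-library sketch. This is not the authors' pipeline: the function names, the stratification by outcome label, the fixed seed, the 0.5 decision threshold, and the toy data are all assumptions made for illustration only.

```python
import random
from collections import defaultdict


def stratified_holdout(labels, test_fraction=0.2, seed=0):
    """Split indices into train/test while keeping class ratios similar (assumed design)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


def recall_precision(y_true, y_pred, positive=1):
    """Recall and precision for the positive class (here: gastric cancer = 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


def auroc(y_true, scores, positive=1):
    """AUROC as the chance a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels (hypothetical): 20 gastric cancer, 80 non-gastric cancer.
toy_labels = [1] * 20 + [0] * 80
train_idx, test_idx = stratified_holdout(toy_labels)

# Toy held-out scores (hypothetical), thresholded at 0.5 to get class predictions.
toy_true = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.5, 0.2, 0.1]
toy_pred = [1 if s >= 0.5 else 0 for s in toy_scores]
toy_recall, toy_precision = recall_precision(toy_true, toy_pred)
toy_auroc = auroc(toy_true, toy_scores)
```

Recall is the metric most relevant to missed gastric cancer cases, which is why the sketch reports it for the positive class specifically; AUROC, by contrast, does not depend on the chosen threshold.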
[RESULTS] The logistic regression model achieved a recall of 0.737 (95% CI: 0.637–0.830) for gastric cancer and an AUROC of 0.830 (95% CI: 0.779–0.880). Both XGBoost and Random Forest models outperformed the baseline model, with AUROC values ranging from 0.950 to 0.954 (95% CI: 0.904–0.976). Relative to the baseline, recall for gastric cancer detection with the black-box models improved by 8.14% for XGBoost (0.797, 95% CI: 0.711–0.877) and by 11.3% for Random Forest (0.820, 95% CI: 0.734–0.896). Across models, patient age consistently emerged as the strongest predictor of gastric cancer, and several sequence-derived genomic features beyond pre-established virulence genes contributed to the differences in infection outcome.
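The reported recall gains appear to be relative rather than absolute differences, and the arithmetic can be checked directly from the recall values in the abstract. The sketch below uses a hypothetical helper name, `relative_improvement`, and only the three recall figures quoted above.

```python
def relative_improvement(baseline, value):
    """Percentage change of `value` with respect to `baseline`."""
    return (value - baseline) / baseline * 100.0


baseline_recall = 0.737  # logistic regression recall from the abstract
model_recalls = {"XGBoost": 0.797, "Random Forest": 0.820}

for name, recall in model_recalls.items():
    rel = relative_improvement(baseline_recall, recall)
    absolute = recall - baseline_recall
    print(f"{name}: {rel:.2f}% relative, {absolute:.3f} absolute")
```

The relative figures match the 8.14% and roughly 11.3% quoted in the results, while the absolute gains in recall are 0.060 and 0.083.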
[CONCLUSION] This study demonstrates that combining pathogen genomics with host demographics uncovers novel risk factors and supports early detection with high predictive power. The use of explainability methods like SHAP allows for greater interpretability by clinical professionals and improves informed decision-making processes. While internal validation showed strong performance, external validation on independent data and translation into clinical practice are still necessary, using broader, more diverse datasets and including additional host and lifestyle variables.
[SUPPLEMENTARY INFORMATION] The online version contains supplementary material available at 10.1186/s12876-025-04595-3.