The Efficacy of Electronic Health Record-Based Artificial Intelligence Models for Early Detection of Pancreatic Cancer: A Systematic Review and Meta-Analysis.
Meta-analysis
- Pooled AUC 95% CI: 0.759-0.810
- Study design: meta-analysis
APA
Makiev GG, Samoylenko IV, et al. (2026). The Efficacy of Electronic Health Record-Based Artificial Intelligence Models for Early Detection of Pancreatic Cancer: A Systematic Review and Meta-Analysis. Cancers, 18(2). https://doi.org/10.3390/cancers18020315
MLA
Makiev GG, et al. "The Efficacy of Electronic Health Record-Based Artificial Intelligence Models for Early Detection of Pancreatic Cancer: A Systematic Review and Meta-Analysis." Cancers, vol. 18, no. 2, 2026.
PMID
41595234
Abstract
[BACKGROUND] The persistently low 5-year survival rate for pancreatic cancer (PC) underscores the critical need for early detection. However, population-wide screening remains impractical. Artificial Intelligence (AI) models using electronic health record (EHR) data offer a promising avenue for pre-symptomatic risk stratification.
[OBJECTIVE] To systematically review and meta-analyze the performance of AI models for PC prediction based exclusively on structured EHR data.
[METHODS] We systematically searched PubMed, medRxiv, bioRxiv, and Google Scholar (2010-2025). Inclusion criteria encompassed studies using EHR-derived data (excluding imaging/genomics), applying AI for PC prediction, reporting AUC, and including a non-cancer cohort. Two reviewers independently extracted data. Random-effects meta-analysis was performed for AUC, sensitivity (Se), and specificity (Sp) using R software version 4.5.1. Heterogeneity was assessed using I² statistics and publication bias was evaluated.
[RESULTS] Of 946 screened records, 19 studies met the inclusion criteria. The pooled AUC across all models was 0.785 (95% CI: 0.759-0.810), indicating good overall discriminatory ability. Neural Network (NN) models demonstrated a statistically significantly higher pooled AUC (0.826) compared to Logistic Regression (LogReg, 0.799), Random Forests (RF, 0.762), and XGBoost (XGB, 0.779) (all p < 0.001). In analyses with sufficient data, models like Light Gradient Boosting (LGB) showed superior Se and Sp (99% and 98.7%, respectively) compared to NNs and LogReg, though based on limited studies. Meta-analysis of Se and Sp revealed extreme heterogeneity (I² ≥ 99.9%), and the positive predictive values (PPVs) reported across studies were consistently low (often < 1%), reflecting the challenge of screening a low-prevalence disease.
[CONCLUSIONS] AI models using EHR data show significant promise for early PC detection, with NNs achieving the highest pooled AUC. However, high heterogeneity and typically low PPV highlight the need for standardized methodologies and a targeted risk-stratification approach rather than general population screening. Future prospective validation and integration into clinical decision-support systems are essential.
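The random-effects pooling and I² heterogeneity statistic mentioned in the methods can be illustrated with a minimal sketch. The abstract does not state which between-study variance estimator was used, so the DerSimonian-Laird estimator below is an assumption, as is pooling on the raw AUC scale (a logit transform is a common alternative). The function name and all input values are hypothetical and are not data from the review.

```python
import math


def dersimonian_laird(estimates, std_errors):
    """Random-effects pooling with the DerSimonian-Laird estimator (assumed here).

    estimates  -- per-study effect estimates (e.g. AUCs on the raw scale)
    std_errors -- matching per-study standard errors
    """
    k = len(estimates)
    if k < 2 or k != len(std_errors):
        raise ValueError("need at least two studies with matching standard errors")

    variances = [se ** 2 for se in std_errors]

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q and its degrees of freedom.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1

    # Between-study variance tau^2, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # I^2: percentage of total variability attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum_w_star
    se_pooled = math.sqrt(1.0 / sum_w_star)

    return {
        "pooled": pooled,
        "ci95": (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled),
        "tau2": tau2,
        "q": q,
        "i2": i2,
    }


if __name__ == "__main__":
    # Hypothetical AUCs and standard errors, for illustration only.
    result = dersimonian_laird([0.70, 0.80, 0.90], [0.01, 0.01, 0.01])
    print(result)
```

When study estimates differ by far more than their standard errors, Q greatly exceeds its degrees of freedom and I² approaches 100%, which is the pattern the abstract reports for Se and Sp.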
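The low PPVs noted in the results follow from Bayes' theorem: when prevalence is very low, false positives outnumber true positives even for a highly specific test. The sketch below is a hedged illustration, not a calculation from the review. It takes the Se and Sp reported for LGB (99% and 98.7%) and combines them with two hypothetical prevalence values that are assumptions chosen only to contrast a general population with an enriched high-risk group.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV = TP / (TP + FP), expressed per unit of screened population."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")

    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0.0:
        raise ValueError("no positive results are possible with these inputs")
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    se, sp = 0.99, 0.987  # Se and Sp reported for LGB in the abstract

    # Hypothetical prevalences (assumptions, not figures from the review).
    for label, prev in (("general population", 0.0001),
                        ("enriched high-risk group", 0.01)):
        ppv = positive_predictive_value(se, sp, prev)
        print(f"{label}: prevalence={prev:.4f}, PPV={ppv:.2%}")
```

Under these assumed prevalences the PPV stays below 1% in the general-population case and rises substantially in the enriched group, which is consistent with the conclusion favoring targeted risk stratification over general population screening.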