Machine learning predicts hepatocellular carcinoma risk from routine clinical data: a large population-based multicentric study.
APA
Clusmann J, Koop PH, et al. (2026). Machine learning predicts hepatocellular carcinoma risk from routine clinical data: a large population-based multicentric study. Cancer Discovery. https://doi.org/10.1158/2159-8290.CD-25-1323
MLA
Clusmann J, et al. "Machine learning predicts hepatocellular carcinoma risk from routine clinical data: a large population-based multicentric study." Cancer Discovery, 2026.
PMID
41881847
Abstract
Hepatocellular carcinoma (HCC) is a highly fatal tumor for which risk stratification is crucial yet remains challenging. Here, we develop an interpretable machine-learning framework for HCC risk stratification based on routinely collected clinical data. We use prospectively collected multimodal data from over 900,000 individuals and 983 cases of HCC across two population-scale cohorts: the "UK Biobank study" (development) and the "All of Us Research Program" (external testing). We assess the individual and cumulative contributions of data modalities, including demographics, lifestyle, health records, blood, genomics, and metabolomics. Our final random-forest-based models significantly outperform all publicly available state-of-the-art risk scores on both internal and external test sets. We demonstrate robustness across ethnic subgroups, provide comprehensive interpretability, and release all code, model weights, and a web calculator for external validation and agentic integration. Our study presents PRE-Screen-HCC, a robust and interpretable machine-learning framework for HCC risk stratification and early detection.