Abstract
Hepatocellular carcinoma (HCC) is a highly fatal tumor for which risk stratification is crucial yet remains challenging. Here, we develop an interpretable machine-learning framework for HCC risk stratification based on routinely collected clinical data. We use prospectively collected multimodal data from over 900,000 individuals and 983 HCC cases across two population-scale cohorts: the UK Biobank study (development) and the All of Us Research Program (external testing). We assess the individual and cumulative contributions of data modalities, including demographics, lifestyle, health records, blood, genomics, and metabolomics. Our final random-forest-based models significantly outperform all publicly available state-of-the-art risk scores on both the internal and external test sets. We demonstrate robustness across ethnic subgroups, provide comprehensive interpretability analyses, and release all code, model weights, and a web calculator for external validation and agentic integration. Our study presents PRE-Screen-HCC, a robust and interpretable machine-learning framework for HCC risk stratification and early detection.