Artificial intelligence predicts outcome-related molecular profiles and vascular invasion in hepatocellular carcinoma.
APA
Seraphin TP, Mesropian A, et al. (2025). Artificial intelligence predicts outcome-related molecular profiles and vascular invasion in hepatocellular carcinoma. JHEP Reports: Innovation in Hepatology, 7(12), 101592. https://doi.org/10.1016/j.jhepr.2025.101592
MLA
Seraphin TP, et al. "Artificial intelligence predicts outcome-related molecular profiles and vascular invasion in hepatocellular carcinoma." JHEP Reports: Innovation in Hepatology, vol. 7, no. 12, 2025, pp. 101592.
PMID
41321933
Abstract
[BACKGROUND & AIMS] Advances in digital pathology and artificial intelligence (AI) are driving progress toward personalized clinical management. In hepatocellular carcinoma (HCC), AI-based models using digitized H&E slides can be a robust tool to predict outcome-related molecular profiles and presence of microvascular invasion (mVI), with potential clinical utility.
[METHODS] A transformer-based deep-learning (DL) model was deployed using digitized H&E slides from 431 resected HCC cases (training cohort). Five-fold cross-validation was applied, and the model was tested on two external cohorts: TCGA-LIHC (n = 363) and advanced-stage HCC cohort (n = 64).
[RESULTS] The DL model effectively predicted outcome-related molecular profiles, distinguishing poor-prognosis (S1/S2 proliferation) from good-prognosis (S3 non-proliferation) subclasses. In internal cross-validation, mean areas under the curves (AUCs) were 0.75 for proliferation and 0.79 for non-proliferation subclasses. This performance was reproduced in the TCGA test set, with AUCs ranging from 0.72 to 0.80, and in the advanced-stage HCC cohort, with AUCs ranging from 0.76 to 0.81. In these test sets, the AI-predicted non-proliferation subclass was associated with a longer median OS compared with the proliferation subclass (5.8 vs. 3.5 years in TCGA; p = 0.02). For mVI prediction, the DL model achieved a mean AUC of 0.70 in the internal cross-validation and 0.62 in the TCGA test set. AI-predicted mVI was associated with shorter OS (4.9 vs. 7.6 years for non-mVI; p = 0.003) and an immunosuppressive microenvironment (p = 0.002).
[CONCLUSIONS] Our H&E-based AI model enables accurate prediction of outcome-related molecular subtypes of poor prognosis and presence of mVI, offering a scalable and accessible tool to extract clinically relevant features from routine histology.
[IMPACT AND IMPLICATIONS] Outcome-related molecular profiles and the presence of microvascular invasion (mVI) are critical determinants of prognosis and treatment decisions in hepatocellular carcinoma (HCC). This study presents an artificial intelligence (AI)-based method that analyzes routine H&E-stained slides and accurately predicts: (a) biologically relevant HCC molecular subtypes associated with patient outcomes, and (b) the presence of mVI, a well-established predictor of poor outcomes and risk of recurrence, that currently requires meticulous pathological assessment of multiple H&E slides. These AI tools can offer a scalable method to support personalized treatment decisions, such as transplant eligibility, trial enrollment, or neo/adjuvant therapy planning, and may improve clinical management of HCC. Our findings lay the groundwork for incorporating AI-assisted pathology into future prospective studies aimed at improving HCC clinical management.
Introduction
Hepatocellular carcinoma (HCC) is the most prevalent primary liver cancer, with a global five-year overall survival (OS) rate of approximately 30%, making it a highly lethal disease.1 Early detection significantly enhances survival rates by enabling potentially curative interventions, including surgical resection, liver transplantation and radiofrequency ablation. However, only about 20% of tumors are resectable at diagnosis, and among these, nearly 50% experience recurrence within 3 years.2
Understanding the molecular landscape of HCC is key to stratifying patients based on tumor biology. Although tumors are highly heterogeneous, molecular studies have identified conserved trunk mutations3 and molecular and inflamed classes, based on transcriptomic and histological analysis.[4], [5], [6], [7], [8], [9], [10] These molecular classifications integrate multiple dimensions of tumor aggressiveness, including genomic alterations, histological features, immune context, and vascular invasion (VI), providing a biologically grounded framework for risk stratification.[11], [12], [13] Specifically, the proliferative molecular classes, namely the S1-stromal and S2-stemness subclasses, represent ∼50% of the cases, and are associated with high alpha-fetoprotein (AFP) levels, VI, and poor prognosis, whereas the S3-differentiated subclass is associated with better outcome.11,12 Despite their prognostic potential, the integration of transcriptomic-based molecular classes into routine practice remains limited due to their reliance on high-quality tissue and sequencing infrastructure. Scalable tools capable of inferring these molecular traits from routinely available data could help translate these insights into clinical decision-making.
In the clinical context, several variables predict recurrence and survival after resection (tumor size, number of nodules, AFP levels, poor differentiation degree), the most prominent being the presence of microvascular invasion (mVI).1,14,15 The presence of mVI promotes tumor cell migration, metastasis, satellitosis, and an immunosuppressive tumor microenvironment enriched in exhausted CD8+ T cells and immunosuppressive TREM2+ macrophages.15,16 While gene expression signatures have demonstrated potential for predicting mVI, fully leveraging their prognostic value currently relies on exhaustive pathological evaluation of non-tumoral adjacent tissues.5,14,17
Artificial intelligence (AI) methods, particularly deep learning (DL) algorithms, are powerful tools in digital histopathology.18 Whole slide images (WSIs) of pathology glass slides contain extensive data beyond what is used in clinical routine, enabling, for example, the prediction of outcome directly from images. This could not only reduce the need for costly genetic tests or specialized tissue stainings but also guide clinical decisions and patient stratification. Recent advances in self-supervised learning methods, coupled with increased computational capacities, have led to the development of methods that effectively predict molecular changes from histological slides.[18], [19], [20] Specifically, models that have been trained on thousands of WSIs and millions of image patches are capable of exploiting features detected in histological slides and relating them to clinically relevant biomarkers.[21], [22], [23], [24]
Our aim was to develop DL models that predict poor outcome-associated features by capturing molecular subtypes, and that directly predict mVI from a single routine hematoxylin and eosin (H&E)-stained pathology slide. By targeting clinically relevant biological traits associated with overall survival, we aimed to develop a scalable, interpretable, and widely applicable tool for histology-based patient risk stratification.
Patients and methods
Study design and patient cohorts
Patient cohorts
As a training dataset, we used 431 HCC samples collected from patients who had undergone curative liver resection and whose tumors had been profiled at the transcriptomic level.9,25,26 The Cancer Genome Atlas Liver Hepatocellular Carcinoma (TCGA-LIHC) dataset (n = 363) was used as a test set for patients with resected HCC. H&E slides without available microns per pixel values were discarded. As a second test set, we used H&E scans from 64 tumors from patients with advanced HCC (mostly BCLC stage C), including archived resection specimens and biopsies obtained prior to the initiation of systemic therapies, for which the transcriptome was also available.27
Molecular subclass identification
Gene expression profiles were classified into the proliferation subclasses S1 and S2, and the S3 non-proliferation subclass12 using the nearest template prediction method. Cases not reaching an adjusted false discovery rate q-value <0.05 were considered unclassified and excluded from model training. A breakdown of the number of patients included in the study is explained in the supplementary methods and Tables S1 and S2.
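The exclusion rule above (cases with an adjusted q-value of 0.05 or higher are treated as unclassified) can be illustrated with a minimal sketch. It assumes that per-sample p-values from the nearest template prediction step are already available and that the adjustment is a Benjamini-Hochberg correction; the source does not state which correction was used, and the function names `benjamini_hochberg` and `keep_classified` are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


def keep_classified(sample_ids, p_values, q_threshold=0.05):
    """Keep samples whose adjusted q-value is below the threshold."""
    q_values = benjamini_hochberg(p_values)
    return [s for s, q in zip(sample_ids, q_values) if q < q_threshold]
```

Samples that do not pass the filter would simply be left out of the training labels, which mirrors the way unclassified cases are handled in the text.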
Histological assessment of mVI
mVI was defined as the presence of tumor cells within a vascular space lined by endothelium,14 and the pathological evaluation for each patient was collected from clinical databases. For in-house samples, expert liver pathologists evaluated ∼10 adjacent non-tumoral parenchyma sections from the tumor-liver interface. Sampling included areas at least 1 cm from the tumor margin to improve detection sensitivity, in line with established protocols.14 In the TCGA-LIHC test set, mVI labels were obtained from public clinicopathological annotations.
Study design
The workflow of the study and the samples utilized for training, validation and testing across each cohort and each DL model are detailed in Fig. 1 and Tables S1 and S2. The inclusion criteria for all sample analyses were: i) histological diagnosis of HCC confirmed by a liver pathologist, ii) available clinicopathological data, including mVI evaluation, and iii) available H&E slides for high-resolution digitization. Cases with missing outcome, transcriptomic or mVI data were excluded from the analysis. For variables with missing values, the proportion of missing data is reported. Further details on sample processing, mVI histological assessment and transcriptomic analysis are described in the supplementary methods.
DL pipeline and experimental setup
We used the “solid tumor associative modeling in pathology” (STAMP) pipeline (https://github.com/KatherLab/STAMP.v1.0.3), a protocol that enables prediction of biomarkers directly from WSIs.23 Following this protocol, WSIs were tiled and processed for subsequent feature extraction using the pretrained transformer-based foundation model UNI.21 For DL model training, we started with a 5-fold cross-validation of the training cohort, where data were split on a patient level into 5 different folds, stratified by the target variable.22,23 The resulting HCC molecular subclass-based outcome prediction model trained on the full training set was deployed on the TCGA-LIHC test set. While training was based on resected HCC specimens (n = 431), we included an external cohort of advanced-stage tumors (n = 64), including biopsies, to evaluate the potential generalizability of the model to preoperative or non-curative settings. The mVI-predicting model was deployed on the TCGA-LIHC test set only. Model performance was evaluated using areas under the curves (AUCs) with 95% CIs. For technical details on the DL pipeline employed for model development and experimental setup, please see the supplementary methods.
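The patient-level, label-stratified split described above can be sketched as follows. This is an illustrative stand-in, not the STAMP implementation: the function name `stratified_patient_folds`, the round-robin assignment, and the fixed seed are all assumptions. The key property is that each patient (and therefore every slide from that patient) lands in exactly one fold, while each fold receives a similar share of every target class.

```python
import random
from collections import defaultdict


def stratified_patient_folds(patient_labels, n_folds=5, seed=0):
    """Assign each patient to one fold, balancing labels across folds.

    patient_labels: dict mapping patient ID -> target label.
    Returns a dict mapping patient ID -> fold index in [0, n_folds).
    """
    by_label = defaultdict(list)
    for patient, label in sorted(patient_labels.items()):
        by_label[label].append(patient)

    rng = random.Random(seed)
    fold_of = {}
    offset = 0
    for label in sorted(by_label):
        patients = by_label[label]
        rng.shuffle(patients)
        # Deal patients round-robin so each fold gets a similar share.
        for i, patient in enumerate(patients):
            fold_of[patient] = (i + offset) % n_folds
        # Shift the starting fold so remainders do not pile up in fold 0.
        offset += len(patients)
    return fold_of
```

In each cross-validation round, one fold index would serve as the held-out set and the remaining four as training data.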
Explainability of trained DL models
To gain deeper insights into our model’s decision-making, we generated WSI-level heatmaps using STAMP. Briefly, tile-level prediction scores were mapped onto the WSI, and regions were visually emphasized according to their GradCAM importance scores.28 This approach highlights the spatial distribution of areas most relevant to the model’s prediction. For each prediction label for the molecular subclasses and mVI, we selected six patients whose WSIs were correctly classified with the highest confidence scores and extracted the most predictive (high-attention) tiles. These top tiles were then reviewed by an expert pathologist (CM), who was blinded to the final prediction and ground truth labels. Histological patterns such as tumor viability, density, differentiation, immune infiltrate and vascular features were assessed to identify consistent or differing patterns across prediction groups, providing biological explainability to the model’s outputs. Additionally, the pathologist reviewed tiles from areas with minimal GradCAM activation (low-attention areas) to confirm their negligible contribution to the model’s decision-making. These regions were histologically non-informative (e.g. empty slide areas, necrotic tissue, non-specific stroma) and contributed minimally to the model’s predictions. Confidence percentages reflect the model’s overall score in assigning a class, based on features extracted primarily from high-attention areas. For details on statistical analyses, visualization, transparency measures and the ethical statement, please see the supplementary methods.
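Selecting the tiles shown to the pathologist amounts to ranking tiles by their importance score. The sketch below assumes the scores have already been computed and are supplied as hypothetical `((x, y), score)` pairs; it does not reproduce how STAMP derives GradCAM scores, and the default of eight tiles is an arbitrary illustrative choice.

```python
import heapq


def top_attention_tiles(tile_scores, k=8):
    """Return the k tiles with the highest importance scores.

    tile_scores: list of ((x, y), score) pairs, one per tile.
    """
    return heapq.nlargest(k, tile_scores, key=lambda item: item[1])


def lowest_attention_tiles(tile_scores, k=8):
    """Return the k tiles with the lowest importance scores."""
    return heapq.nsmallest(k, tile_scores, key=lambda item: item[1])
```

The first helper corresponds to the high-attention tiles reviewed for histological patterns, and the second to the low-attention tiles checked for negligible contribution.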
Results
Direct prediction of prognosis-related molecular profiles from H&E slides using AI-based models
To predict HCC outcome-related molecular subclasses from histological data, H&E slides containing tumor resections were scanned at high resolution to generate WSIs, which were used to train a transformer-based DL model. To this end, we used a multicenter training cohort of 431 patients with HCC and available gene expression profiling and H&E scans (Table S1), with the main clinical characteristics described in Table 1. Briefly, patients were predominantly male (n = 328; 78%), with a median age of 66 years (IQR 61–72). The most common etiology was HCV infection (n = 181; 42%), followed by non-viral etiologies, including alcohol-associated liver disease (ALD; n = 41; 10%) and metabolic dysfunction-associated steatotic liver disease (MASLD; n = 43; 10%). Most patients had compensated liver function (Child-Pugh A: n = 373; 87%) and early-stage disease (BCLC stage 0–A: n = 358; 83%). Median OS was 6.4 years (95% CI 5.6–7.4 years) (Fig. S1A). Of these, 308 cases were assigned to a molecular subclass based on transcriptomic analysis. Among tumors from the whole cohort, 36% (n = 156) were classified as poor-prognosis proliferation class (20% were S1 [n = 85] and 16% S2 [n = 71]) and 35% (n = 152) as good-prognosis non-proliferation (S3) subclass, with a median OS of 6.3 years (5-year OS: 61% [95% CI 54-70%]) (Table S1, Fig. S1B).
To develop a model capable of predicting these biologically grounded and prognosis-related molecular profiles, the transformer-based DL model was trained using matched WSIs and molecular subclass labels. The model was trained using five-fold cross-validation, stratified by subclass (Table S1). It achieved mean AUC values of 0.75 ± 0.11 and 0.75 ± 0.07 for the proliferation subclasses, respectively, and 0.79 ± 0.10 for the non-proliferation subclass (Fig. 2A). Calibration curves, generated by pooling predictions from the test sets of all five cross-validation folds for the non-proliferation subclass, showed close agreement between predicted and observed probabilities (Fig. S2A), supporting model reliability. These results indicate that HCC molecular profiles can be captured by histological features in H&E slides and can be inferred using weakly supervised DL.
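For readers less familiar with how a per-fold AUC and its cross-fold summary are obtained, a minimal sketch follows. It computes the AUC as the probability that a positive case receives a higher score than a negative case, which is one standard definition; for a three-class problem this would be applied one subclass against the rest. The source does not state whether the reported ± values are standard deviations or confidence-interval half-widths, so the use of the sample standard deviation here is an assumption, as are the function names.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative.

    labels: iterable of 0/1; scores: predicted probabilities.
    Ties between a positive and a negative count as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def summarize_folds(fold_aucs):
    """Return (mean, sample standard deviation) across folds."""
    return mean(fold_aucs), stdev(fold_aucs)
```

The pairwise loop is quadratic in the number of cases, which is acceptable for cohorts of a few hundred patients; a rank-based formulation would be preferable for much larger sets.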
External validation of the AI-generated prognosis-related molecular subclassification model
To evaluate the generalizability of our model, we deployed it on two test sets of HCC samples: the TCGA-LIHC cohort of resected HCCs (n = 363) and another independent multicenter cohort of 64 advanced-stage HCC cases (Table 1). In the TCGA cohort, patients were predominantly male (n = 244; 67%), with a median age of 61 years. Non-viral etiologies (ALD, MASLD, and other) of HCC were most prevalent (n = 143; 39%), followed by HBV (n = 83; 23%) and HCV (n = 30; 8%). Most patients had preserved liver function (Child-Pugh A; n = 215; 59%), and the overall median OS was 4.6 years (95% CI 3.8–6.9 years) (Fig. S1C). Transcriptomically, the prevalence of poor-prognosis proliferation subclasses was 24% and 19%, respectively, and of good-prognosis non-proliferation subclass S3 was 41% (Table S1). Consistent with findings in the training set, patients classified in the non-proliferation subclass had a median OS of 5.8 years (5-year OS: 54% [95% CI 43-68%]) (Fig. S1D). When applied to TCGA WSIs, the DL model achieved AUCs of 0.78 ± 0.05 and 0.72 ± 0.07 for the proliferation subclasses, respectively, and 0.80 ± 0.05 for the non-proliferation subclass (Fig. 2B). The model showed an average precision of 0.77 ± 0.07 for the prediction of the non-proliferation subclass (Fig. S2B), with a sensitivity (recall) of 82% and specificity of 60% (Fig. S2C).
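Sensitivity and specificity, as reported for the non-proliferation subclass, depend on the score threshold used to call a case positive. The sketch below shows the calculation under an assumed threshold of 0.5; the source does not state the operating point used, and the function name is hypothetical.

```python
def sensitivity_specificity(labels, scores, threshold=0.5):
    """Return (sensitivity, specificity) at a given score threshold.

    labels: iterable of 0/1 ground-truth values.
    scores: predicted probabilities for the positive class.
    """
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative")
    return tp / (tp + fn), tn / (tn + fp)
```

Lowering the threshold raises sensitivity at the cost of specificity, which is the trade-off implied by a high-recall, moderate-specificity operating point.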
In the second test set of advanced HCC, the prevalence of the proliferation subclasses was 32% for S1 and 16% for S2, while the non-proliferation subclass S3 accounted for 53% of cases (Table S1). The DL model classified the advanced HCC WSIs into the molecular subclasses and maintained its discriminative performance, achieving AUCs of 0.76 ± 0.12 and 0.81 ± 0.16 for the proliferation subclasses, respectively, and 0.78 ± 0.12 for the non-proliferation subclass (Fig. 2C). A sub-analysis comparing biopsies and resections confirmed that the model performed consistently across both sample types, with AUCs of 0.77 and 0.80, respectively, for non-proliferation subclass prediction (Fig. S2D). Notably, even though 44% of the samples in this cohort were biopsies, the model achieved an average precision of 0.82 ± 0.12 for the prediction of the non-proliferation subclass (Fig. S2E). These results suggest that transcriptome-based prognostic molecular traits can be detected in H&E slides across independent, real-world datasets, including biopsy material, and support the model's potential utility in identifying patients with favorable biology and good prognosis.
AI-predicted non-proliferative tumors from H&E are associated with better survival and immune features
Association with survival
To explore the prognostic utility of the DL model in a clinical context, we evaluated the association between DL-predicted molecular subclasses and patient survival. In the TCGA cohort, patients predicted to belong to the non-proliferation subclass showed a significantly longer median OS of 5.8 years (5-year OS: 52%) compared to 3.5 years (5-year OS: 44%) in those predicted to belong to the proliferation subclasses (p = 0.02) (Fig. 2D). Similarly, patients with advanced HCC classified into the non-proliferation subclass by the DL model had a significantly improved 3-year OS of 89% (95% CI 71-100%) compared to 42% (95% CI 25-71%) in the proliferation group (p = 0.02) (Fig. 2E). Altogether, these results indicate that the AI model derives relevant phenotypic information from histopathology linked to favorable tumor biology and associated with a better clinical outcome of the patient.
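Median OS figures such as those above are conventionally read off a Kaplan-Meier curve. The following is a minimal sketch of that estimator, assuming follow-up times and event indicators (1 = death, 0 = censored) are available for each predicted group; the function names are hypothetical and the statistical software actually used is described only in the supplementary methods.

```python
def kaplan_meier(times, events):
    """Return the Kaplan-Meier curve as a list of (time, survival) steps.

    times: follow-up time per patient; events: 1 = death, 0 = censored.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            # Multiply by the fraction of at-risk patients surviving time t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        at_risk -= leaving
    return curve


def median_survival(curve):
    """First time at which survival drops to 0.5 or below, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None
```

A return value of `None` corresponds to a median that is not reached during follow-up, in which case fixed-time survival rates (for example 3-year OS) are typically reported instead.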
Association with immune features
To confirm the biological relevance of the DL-based subclass predictions, we explored the distinct immune profiles by transcriptomic analyses. In the TCGA cohort, tumors predicted to belong to the good-prognosis non-proliferation subclass showed lower enrichment in the 'inflamed' gene signature (p <0.01 vs. the proliferation group) (Fig. S3A), which is associated with increased immune infiltration and inflammation.9 Deconvolution analysis using single-cell RNA sequencing-derived cell type signatures further revealed that predicted non-proliferative tumors had increased enrichment of the cell fraction of hepatocytes/tumor cells (p <0.001 vs. the proliferation group) (Fig. S3B), consistent with their higher degree of differentiation. These tumors showed reduced immune cell infiltration,12 particularly of immunosuppressive TREM2+ macrophages (p <0.001 vs. the proliferation group) (Fig. S3C), previously linked to relapse after resection and poor response to immunotherapy.29,30 Pathologist review of high-attention regions from the DL model further supported these findings. In this regard, tiles classified as non-proliferation subclass revealed moderate tumor density, hepatocyte-like differentiation and mild immune cell infiltrate (Fig. 3A), aligning with the well-differentiated and less inflammatory biology of the non-proliferation class.1,8,9,12 In contrast, tiles from the proliferation/poor prognosis class displayed mild/marked tumor density and greater immune cell infiltrate (Fig. 3B), consistent with their more poorly differentiated and inflamed tumor histology.1,8,9,12
AI-predicted microvascular invasion from H&E is associated with poor survival and immunosuppressive features
Association with survival
We next explored the capacity of AI to predict mVI from a single H&E tumor slide per patient. For this purpose, we used a second DL model trained to predict mVI from WSIs of training-set patients with an available pathological evaluation of mVI, 38% of whom had pathologically confirmed mVI. In this cohort, median OS was significantly worse in patients with mVI (4.3 years vs. 7.6 years in patients without mVI; p <0.0001) (Fig. S4A). The presence of pathologically detected mVI was an independent predictor of OS (hazard ratio 1.65; 95% CI 1.22-2.23; p = 0.001) in the multivariate analysis when adjusted for common predictors of survival in HCC such as tumor size >5 cm, AFP >400 ng/ml, multinodularity, and satellitosis (Fig. S4C). Pathologically detected mVI was also more frequent in the molecular subclasses associated with proliferation and poor prognosis (40-58%) than in the non-proliferation/good-prognosis subclass (24%) (Fig. S4D). In the TCGA test set, the prevalence of mVI was 29%; mVI was not significantly enriched in any of the molecular prognosis subtypes and was not associated with survival (Fig. S4B and E).
The DL model was trained using features extracted from the WSIs and the corresponding mVI pathological labels, available for 410 patients from the training set (Table S2). In cross-validation, the best-performing model reached an AUC of 0.80 (mean AUC 0.70 ± 0.08) (Fig. 4A) and showed generalizability to the TCGA test set with moderate discriminative performance, achieving an AUC of 0.62 ± 0.07 (Fig. 4B), consistent with previously reported challenges in mVI detection across multi-center cohorts.17,31 Despite this, the DL model demonstrated significant prognostic value: patients predicted to have mVI had a median OS of 4.9 years (5-year OS: 48% [95% CI 39-60%]), compared to 7.6 years (5-year OS: 66% [95% CI 53-83%]) in those predicted without mVI (p = 0.003) (Fig. 4C), highlighting the relevance of AI-predicted mVI status in survival stratification. In addition, in multivariable analysis including other available prognostic variables such as AFP >400 ng/ml, degree of differentiation, and age, DL-predicted mVI remained independently associated with OS, along with age (Fig. S5).
Association with immune features
Transcriptomic analyses of TCGA tumors with predicted mVI showed enrichment of a cycling T-cell gene signature (p <0.001, Fig. S6A), previously associated with exhausted CD8+ T cells and an immunosuppressive microenvironment.16 Accordingly, single cell-informed deconvolution analysis further revealed increased proliferative hepatocytes/tumor cells and immunosuppressive and pro-tumorigenic immune cells, including SPP1+ macrophages and CD14+ monocytes (p <0.001) (Fig. S6B–D).29,30 In contrast, inflammatory macrophages (CXCL10+) were not enriched (Fig. S6E), while CD8+ Temra cells were selectively enriched in DL-predicted mVI+ tumors (Fig. S6F). These findings suggest that the AI model captures mVI-associated histological correlates of an inflamed yet immunosuppressed microenvironment.16,29,32 Accordingly, expert pathologist review of high-attention tiles revealed increased tumor density in cases predicted to have mVI, compared to the absence of mVI, consistent with previously described invasive and proliferative traits linked to mVI tumors (Fig. 5A,B).15,16
Direct prediction of prognosis-related molecular profiles from H&E slides using AI based models
To predict HCC outcome-related molecular subclasses from histological data, H&E slides containing tumor resections were scanned at high resolution to generate WSIs, which were used to train a transformer-based DL model. To this end, we used a multicenter training cohort of 431 patients with HCC and available gene expression profiling and H&E scans (Table S1), with the main clinical characteristics described in Table 1. Briefly, patients were predominantly male (n = 328; 78%), with a median age of 66 years (IQR 61–72). The most common etiology was HCV infection (n = 181; 42%), followed by non-viral etiologies, including alcohol-associated liver disease (ALD; n = 41; 10%) and metabolic dysfunction-associated steatotic liver disease (MASLD; n = 43; 10%). Most patients had compensated liver function (Child-Pugh A: n = 373; 87%) and early-stage disease (BCLC stage 0–A: n = 358; 83%). Median OS was 6.4 years (95% CI 5.6–7.4 years) (Fig. S1A). Of these, 308 cases were assigned to a molecular subclass based on transcriptomic analysis. Among tumors from the whole cohort, 36% (n = 156) were classified as poor-prognosis proliferation class (20% were S1 [n = 85] and 16% S2 [n = 71]) and 35% (n = 152) as good-prognosis non-proliferation (S3) subclass, with a median OS of 6.3 years (5-year OS: 61% [95% CI 54-70%]) (Table S1, Fig. S1B).
To develop a model capable of predicting these biologically grounded and prognosis-related molecular profiles, the transformer-based DL model was trained using matched WSIs and molecular subclass labels. The model was trained using five-fold cross-validation, stratified by subclass (Table S1). It achieved mean AUC values of 0.75 ± 0.11 and 0.75 ± 0.07 for the S1 and S2 proliferation subclasses, respectively, and 0.79 ± 0.10 for the non-proliferation (S3) subclass (Fig. 2A). Calibration curves, generated by pooling predictions from the test sets of all five cross-validation folds for the non-proliferation subclass, showed close alignment between predicted and observed probabilities (Fig. S2A), supporting model reliability. These results indicate that HCC molecular profiles can be captured by histological features in H&E slides and can be inferred using weakly supervised DL.
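The evaluation scheme described above (subclass-stratified five-fold cross-validation, with a per-fold AUC summarized as mean ± spread) can be sketched as follows. This is an illustrative sketch only, not the study's pipeline: the helper names are hypothetical, and it assumes that the reported "±" is the standard deviation across folds, which the text does not state.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple


def stratified_folds(labels: Sequence[str], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class: Dict[str, List[int]] = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def roc_auc(scores: Sequence[float], positives: Sequence[bool]) -> float:
    """AUC as the probability that a positive case outscores a negative one (ties count 0.5)."""
    pos = [s for s, p in zip(scores, positives) if p]
    neg = [s for s, p in zip(scores, positives) if not p]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def summarize(fold_aucs: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of per-fold AUCs (assumed meaning of the '±')."""
    return statistics.mean(fold_aucs), statistics.stdev(fold_aucs)


# Hypothetical label vector; the counts are arbitrary and not taken from the study.
labels = ["S1"] * 20 + ["S2"] * 15 + ["S3"] * 30
folds = stratified_folds(labels, k=5)
print([len(fold) for fold in folds])
```

For a multi-class problem such as S1/S2/S3, one AUC per subclass is typically obtained one-vs-rest, by passing the model's score for that subclass as `scores` and membership of that subclass as `positives`.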
External validation of the AI-generated prognosis-related molecular subclassification model
To evaluate the generalizability of our model, we deployed it on two test sets of HCC samples: the TCGA-LIHC cohort of resected HCCs (n = 363) and another independent multicenter cohort of 64 advanced-stage HCC cases (Table 1). In the TCGA cohort, patients were predominantly male (n = 244; 67%), with a median age of 61 years. Non-viral etiologies (ALD, MASLD, and other) of HCC were most prevalent (n = 143; 39%), followed by HBV (n = 83; 23%) and HCV (n = 30; 8%). Most patients had preserved liver function (Child-Pugh A; n = 215; 59%), and the overall median OS was 4.6 years (95% CI 3.8–6.9 years) (Fig. S1C). Transcriptomically, the prevalence of the poor-prognosis proliferation subclasses S1 and S2 was 24% and 19%, respectively, and that of the good-prognosis non-proliferation subclass S3 was 41% (Table S1). Consistent with findings in the training set, patients classified in the non-proliferation subclass had a median OS of 5.8 years (5-year OS: 54% [95% CI 43-68%]) (Fig. S1D). When applied to TCGA WSIs, the DL model achieved AUCs of 0.78 ± 0.05 and 0.72 ± 0.07 for the S1 and S2 proliferation subclasses, respectively, and 0.80 ± 0.05 for the non-proliferation subclass (Fig. 2B). The model showed an average precision of 0.77 ± 0.07 for the prediction of the non-proliferation subclass (Fig. S2B), with a sensitivity (recall) of 82% and specificity of 60% (Fig. S2C).
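Sensitivity, specificity, and average precision, as reported above, follow from the model's scores and the reference labels. The sketch below shows one common way to compute them; the 0.5 decision threshold, the function names, and the toy scores are assumptions for illustration, since the text does not state how the operating point was chosen.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[float], positives: Sequence[bool], threshold: float = 0.5
) -> Tuple[float, float]:
    """Sensitivity (recall) and specificity at a given score threshold."""
    tp = fn = tn = fp = 0
    for score, is_positive in zip(scores, positives):
        predicted_positive = score >= threshold
        if is_positive and predicted_positive:
            tp += 1
        elif is_positive:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def average_precision(scores: Sequence[float], positives: Sequence[bool]) -> float:
    """Mean of the precision values at the rank of each true positive.

    Assumes no tied scores; with ties the result depends on the input order.
    """
    ranked = sorted(zip(scores, positives), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("average precision needs at least one positive case")
    return total / hits


# Toy scores (made up): True marks a tumor whose reference label is non-proliferation.
toy_scores = [0.9, 0.8, 0.6, 0.4, 0.2]
toy_labels = [True, False, True, True, False]
print(sensitivity_specificity(toy_scores, toy_labels))
print(average_precision(toy_scores, toy_labels))
```

Unlike the AUC, sensitivity and specificity depend on the chosen threshold, so a different cut-off would trade one against the other.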
In the second test set of advanced HCC, the prevalence of the proliferation subclasses was 32% for S1 and 16% for S2, while the non-proliferation subclass S3 accounted for 53% of cases (Table S1). The DL model classified the advanced HCC WSIs into the molecular subclasses and maintained its discriminative performance, achieving AUCs of 0.76 ± 0.12 and 0.81 ± 0.16 for the S1 and S2 proliferation subclasses, respectively, and 0.78 ± 0.12 for the non-proliferation subclass (Fig. 2C). A sub-analysis comparing biopsies and resections confirmed that the model performed consistently across both sample types, with AUCs of 0.77 and 0.80, respectively, for non-proliferation subclass prediction (Fig. S2D). Notably, even though 44% of the samples in this cohort were biopsies, the model maintained robust discriminative performance, achieving an average precision of 0.82 ± 0.12 for the prediction of the non-proliferation subclass (Fig. S2E). These results suggest that transcriptome-based prognostic molecular traits can be detected in H&E slides across independent, real-world datasets, including biopsy material, supporting the model's potential utility in identifying patients with favorable biology and good prognosis.
Discussion
Identification of poor-prognosis tumors across distinct HCC stages remains a clinical challenge. Similarly, recognition of critical prognostic features, such as mVI, from a tumor tissue sample remains a need in the context of surgical procedures. Although transcriptomic analyses have revealed complex genomic traits linked to poor outcome and vascular invasion, these are difficult to translate into clinical practice. The advent of AI-driven recognition of complex features from a single H&E-stained slide provides a path to develop tools that can be readily used. This study marks an advancement in applying AI within digital pathology for HCC, since we tackled both challenges.
In the current study, by using a weakly supervised transformer-based deep learning model, we accurately predicted outcome-related molecular profiles, which integrate multi-dimensional biological features such as genetic, histological, immunological, and clinical variables, directly from high-resolution H&E-stained slide images. In fact, based on these slides, the DL model demonstrated a high discriminative capacity in classifying patients into HCC molecular-based prognostic subtypes, with AUCs of 0.75-0.79 in the internal cross-validation. Importantly, this performance remained consistent across three independent cohorts of distinct HCC stages, regardless of etiology, specimen type (resection or biopsy), staining protocol, slide scanner, encoding format or the gene expression profiling technique used in the training cohort, reinforcing its robustness and potential for broad clinical applicability. Moreover, patients classified by DL as belonging to the non-proliferation subclass demonstrated favorable tumor biology, with increased OS compared to those in the proliferation class (5.8 vs. 3.5 years). Transcriptomic deconvolution of these tumors confirmed an immunologically beneficial landscape, with depletion of immunosuppressive cell types, such as TREM2+ macrophages,29,32 in line with other studies.11–13 These findings support the biological validity of the DL predictions and their potential utility for stratifying patients in addition to standard risk assessment.
Over the years, several efforts have been made to provide an HCC classification associated with outcome based upon complex molecular characteristics,3,4,6–10,12 and, more recently, AI tools have shown promise in predicting patient survival directly from H&E slides.24 Our study advances this field by: (1) using a single routine pathological slide per patient to accurately capture molecularly complex prognostic features, (2) demonstrating applicability in biopsy samples from advanced HCC, suggesting feasibility for preoperative use, and (3) simultaneously predicting key biological features, including mVI, molecular traits and survival. Focusing on molecular subclasses and mVI enhances biological interpretability and offers more actionable insights for future personalized treatment strategies.
Similarly, although mVI is a well-established predictor of poor outcome and risk of recurrence, its detection can only be performed on adjacent tumor tissue obtained through meticulous pathological assessment of several H&E slides from liver resection.14 This limits its utility for preoperative decision-making, such as selecting patients with more aggressive tumors (at risk of recurrence), who may be candidates for neoadjuvant therapy in the setting of resection or liver transplantation. Thus, there is an urgent need for practical tools to stratify patients by risk and prognosis. Although attempts have been made to predict mVI based on gene expression signatures,5 these efforts have been hard to scale for clinical practice. We therefore developed a second AI-based model to detect mVI from the same routine histological input, enabling the assessment of complex pathological features in a more accessible and scalable manner. For mVI prediction, the DL model achieved a mean AUC of 0.70 in the internal cross-validation and retained predictive performance in the TCGA test set (AUC: 0.62), where performance was moderate, likely reflecting known variability in mVI assessment protocols and sampling limitations.17,31 DL-predicted mVI was associated with poorer survival, with an estimated median OS of 4.9 years for mVI HCCs vs. 7.6 years for non-mVI HCCs. In addition, we consistently showed that mVI cases were significantly associated with an immunosuppressive microenvironment. In fact, our AI-based prediction of mVI was linked to a higher enrichment of immunosuppressive and pro-tumorigenic cells, such as SPP1+ macrophages and CD14+ monocytes, and mVI-linked exhausted CD8+ T cells, consistent with previous findings.16,29,30 While the model does not aim to replace pathologist evaluation, it may serve as a reproducible adjunct, particularly for identifying high-risk features beyond visible mVI.
In the field of automated mVI detection in HCC, recent research has advanced with DL models; however, our model circumvents some of the previous limitations, such as: (1) the requirements for highly supervised models, (2) predictions based on less heterogeneous populations, and (3) the opacity of DL models. First, our weakly supervised approach does not require detailed cell-level pathological annotations or highly curated, image-guided annotations, in contrast to highly supervised models that have previously been published.33,34 Weakly supervised models, such as ours, can benefit from larger datasets, which help reduce label noise without requiring complex annotations for precise localization of mVI.35 Second, our mVI-predicting model was trained and validated on more heterogeneous cohorts in terms of etiology and clinical background, compared to previous models developed in more uniform study populations, with a clear enrichment in hepatitis B-related HCC.34,36 Our model’s prediction performance was comparable with predictions of mVI using our reported mVI gene expression-based signatures, which had an accuracy of 69%.5 Third, we aimed to make our DL model explainable by examining the features influencing the model’s predictions through the generation of attention heatmaps to visualize focal areas within WSIs. These maps mitigate the opacity of DL layers by highlighting histologically relevant regions, which were reviewed by an expert pathologist. Finally, our model leverages the STAMP pipeline and the foundation model UNI to extract robust histological features from diverse HCC specimens. In this context, although not yet sufficient for full clinical deployment, our AI model offers powerful means to predict high- or low-risk molecular and histological profiles from routine histopathology, supporting patient stratification in clinical trials evaluating perioperative systemic therapies.
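The idea behind an attention heatmap can be illustrated with a generic attention-pooling step: each tile of a WSI receives a weight, the slide-level representation is the weighted sum of tile features, and the highest-weighted tiles are the ones shown to the pathologist. The sketch below is a simplified illustration of that principle, not the STAMP or UNI implementation used in the study; all names and numbers are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Convert raw attention scores into weights that sum to 1."""
    highest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    tile_features: Sequence[Sequence[float]], attention_scores: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Weighted sum of tile feature vectors; also returns the per-tile weights."""
    weights = softmax(attention_scores)
    dim = len(tile_features[0])
    pooled = [
        sum(w * feats[d] for w, feats in zip(weights, tile_features))
        for d in range(dim)
    ]
    return pooled, weights


def top_tiles(weights: Sequence[float], k: int) -> List[int]:
    """Indices of the k tiles with the highest attention weight."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return order[:k]


# Made-up example: four tiles with 2-dimensional features and raw attention scores.
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
raw_scores = [0.1, 2.0, -1.0, 1.5]
slide_vector, tile_weights = attention_pool(features, raw_scores)
print(top_tiles(tile_weights, k=2))  # [1, 3]
```

Mapping each tile's weight back to its position on the slide gives the heatmap; the weights indicate where the model looked, not why a region was informative, which is why expert review of the highlighted tiles remains necessary.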
Nonetheless, some limitations should be acknowledged. First, the DL models’ prognostic utility has not yet been validated prospectively; validation on large biopsy-based cohorts could meaningfully inform clinical decisions, such as intensifying surveillance in high-risk patients or tailoring treatment strategies accordingly. Importantly, our model was trained using only a single H&E-stained slide per case, reflecting the scenario where only biopsy tissue is available in preoperative cases, demonstrating the model’s ability to learn relevant prognostic features regardless of tumor content. Second, the heterogeneous detection of mVI in published cohorts (15% to over 60%)14,17,31 was also observed in our training vs. TCGA cohort (38% vs. 29%, respectively). This variability reflects differences in the assessment protocol used to detect mVI, which is based upon examining between 1 and 10 H&E slides, depending on the study.14,17,31 By contrast, our method provides a standardized prediction of surrogate histological features significantly associated with mVI that is independent of the number of slides evaluated, offering a scalable alternative to mitigate sampling-related variability in pathological assessment. Variability in the pathological reference labels might therefore explain why the performance of our AI-based identification of mVI (mean AUC of 0.70 in the internal cross-validation) decreased in the TCGA test set (AUC of 0.62). Moreover, key clinical variables such as tumor size, number of nodules, and satellitosis were unavailable for the TCGA cohort; these are necessary to compare the model’s prognostic capability with conventional clinical prognostic models. Finally, this study introduces a novel application by building upon previous work in the field of AI,33,34,36 using weakly supervised models trained on a single H&E slide to predict mVI, and serves as a proof-of-concept for predicting molecular subtypes with prognostic relevance. These findings lay the groundwork for future studies to benchmark these models against established clinical scores in treatment decision-making.
Overall, our AI-based models accurately predict prognosis based on recognition of distinct molecular subclasses and predict the presence of mVI, thereby capturing complex molecular processes with clinical implications. While not designed to replace existing staging systems, our models may offer a practical and translational tool and provide complementary prognostic insight to guide personalized treatment directly from H&E slides.
Identification of poor-prognosis tumors across distinct HCC stages remains a clinical challenge. Similarly, recognition of critical prognostic features, such as mVI, from a tumor tissue sample remains a need in the context of surgical procedures. Although transcriptomic analyses have revealed complex genomic traits linked to poor outcome and vascular invasion, these are difficult to translate into clinical practice. The advent of AI-driven recognition of complex features from a single H&E-stained slide provides a path to develop tools that can be readily used. This study marks an advancement in applying AI within digital pathology for HCC, since we tackled both challenges.
In the current study, by using a weakly-supervised transformer-based deep learning model, we accurately predicted outcome-related molecular profiles, which integrate multi-dimensional biological features such as genetic, histological, immunological, and clinical variables, directly from high-resolution H&E-stained slide images. In fact, based on these slides, the DL model demonstrated a high discriminative capacity in classifying patients into HCC molecular-based prognostic subtypes with an AUC of 0.75-0.79, in the internal cross-validation. Importantly, this performance remained consistent across three independent cohorts of distinct HCC stages, regardless of etiology, specimen type (resection or biopsy), staining protocol, slide scanner, encoding format or the gene expression profiling technique used in the training cohort, reinforcing its robustness and potential for broad clinical applicability. Moreover, patients classified by DL as belonging to the non-proliferation subclass demonstrated favorable tumor biology, with increased OS compared to those in the proliferation class (5.8 vs. 3.5 years). Transcriptomic deconvolution of these tumors confirmed an immunologically beneficial landscape, with depletion of immunosuppressive cell types, such as TREM2+ macrophages,29,32 in line with other studies.[11], [12], [13] These findings support the biological validity of the DL predictions and their potential utility for stratifying patients in addition to standard risk assessment.
Over the years, several efforts have been made to provide an HCC classification associated with outcome based upon complex molecular characteristics,3,4,[6], [7], [8], [9], [10],12 and, more recently, AI tools have shown promise in predicting patient survival directly from H&E slides.24 Our study advances this field by: (1) using a single routine pathological slide per patient to accurately capture molecularly complex prognostic features, (2) demonstrating applicability in biopsy samples from advanced HCC, suggesting feasibility for preoperative use, and (3) simultaneously predicting key biological features, including mVI, molecular traits and survival. Focusing on molecular subclasses and mVI enhances biological interpretability and offers more actionable insights for future personalized treatment strategies.
Similarly, although mVI is a well-established predictor of poor outcome and risk of recurrence, its detection can only be performed on adjacent tumor tissue obtained through meticulous pathological assessment of several H&E slides from liver resection.14 This limits its utility for preoperative decision-making, such as selecting patients with more aggressive tumors (at risk of recurrence), who may be candidates for neoadjuvant therapy in the setting of resection or liver transplantation. Thus, there is an urgent need for practical tools to stratify patients by risk and prognosis. Although attempts have been made to predict mVI based on gene expression signatures,5 these efforts have been hard to scale for clinical practice. Thus, we developed a second AI-based model to detect mVI from the same routine histological input, enabling the assessment of complex pathological features in a more accessible and scalable manner. For mVI prediction, the DL model achieved a mean AUC of 0.7 in the internal cross-validation and retained predictive performance in the TCGA test set (AUC: 0.62), which was moderate due to known variability in mVI assessment protocols and sampling limitations.17,31 DL-predicted mVI was associated with poorer survival, with an estimated median OS of 4.9 years for mVI HCCs vs. 7.6 years for non-mVI HCCs. In addition, we consistently showed that mVI cases were significantly associated with an immunosuppressive microenvironment. In fact, our AI-based prediction of mVI was linked to a higher enrichment of immunosuppressive and pro-tumorigenic cells, such as SPP1+ macrophages and CD14+ monocytes, and mVI-linked exhausted CD8+ T cells, consistent with previous findings.16,29,30 While the model does not aim to replace pathologist evaluation, it may serve as a reproducible adjunct, particularly for identifying high-risk features beyond visible mVI.
In the field of automated mVI detection in HCC, recent research has advanced with DL models; however, our model circumvents some of the previous limitations, such as: (1) the requirements for highly supervised models, (2) predictions based on less heterogeneous populations, and (3) the opacity of DL models. First, our weakly supervised approach does not require detailed cell-level pathological annotations or highly curated, image-guided annotations, in contrast to highly supervised models that have previously been published.33,34 Weakly supervised models, such as ours, can benefit from larger datasets, which help reduce label noise without requiring complex annotations for precise localization of mVI.35 Second, our mVI-predicting model was trained and validated on more heterogeneous cohorts in terms of etiology and clinical background, compared to previous models developed in more uniform study populations, with a clear enrichment in hepatitis B-related HCC.34,36 Our model’s prediction performance was comparable with predictions of mVI using our reported mVI gene expression-based signatures, which had an accuracy of 69%.5 Third, we aimed to make our DL model explainable by examining the features influencing the model’s predictions through the generation of attention heatmaps to visualize focal areas within WSIs. These maps mitigate the opacity of DL layers by highlighting histologically relevant regions, which were reviewed by an expert pathologist. Finally, our model leverages the STAMP pipeline and the foundation model UNI to extract robust histological features from diverse HCC specimens. In this context, although not yet sufficient for full clinical deployment, our AI model offers powerful means to predict high- or low-risk molecular and histological profiles from routine histopathology, supporting patient stratification in clinical trials evaluating perioperative systemic therapies.
Nonetheless, some limitations should be acknowledged. First, prospective validation of the DL models’ prognostic utility on large biopsy-based cohorts is still needed; such validation could meaningfully inform clinical decisions, such as intensifying surveillance in high-risk patients or tailoring treatment strategies accordingly. Importantly, our model was trained using only a single H&E-stained slide per case, which reflects the preoperative scenario where only biopsy tissue is available and demonstrates the model’s ability to learn relevant prognostic features regardless of tumor content. Second, mVI detection rates are heterogeneous across published cohorts (15% to over 60%),14,17,31 and this heterogeneity was also observed between our training cohort and the TCGA cohort (38% vs. 29%, respectively). This variability reflects differences in the assessment protocols used to detect mVI, which examine between 1 and 10 H&E slides depending on the study.14,17,31 By contrast, our method provides a standardized prediction of surrogate histological features significantly associated with mVI that is independent of the number of slides evaluated, offering a scalable alternative to mitigate sampling-related variability in pathological assessment. The variability in reference mVI assessment might explain why the performance of our AI-based identification of mVI decreased from a mean AUC of 0.70 in the internal cross-validation to an AUC of 0.62 in the TCGA test set. Moreover, key clinical variables such as tumor size, number of nodules, and satellitosis, which are necessary to compare the model’s prognostic capability with conventional clinical prognostic models, were unavailable for the TCGA cohort. Finally, this study introduces a novel application by building upon previous work in the field of AI,33,34,36 using weakly supervised models trained on a single H&E slide to predict mVI, and serves as a proof-of-concept for predicting molecular subtypes with prognostic relevance. These findings lay the groundwork for future studies to benchmark these models against established clinical scores in treatment decision-making.
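As a reading aid for the AUC values quoted for mVI prediction (0.70 internally, 0.62 in TCGA), the sketch below computes the area under the ROC curve as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, counting ties as one half. The function name `roc_auc` and the six labels and scores are invented for illustration and are not data from this study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    labels: iterable of 0/1 ground-truth values (1 = mVI present).
    scores: iterable of model scores, higher meaning more likely positive.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example (hypothetical values): three mVI-positive and three negative slides.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
print(round(roc_auc(toy_labels, toy_scores), 3))  # 7 of 9 pairs ranked correctly
```

Under this reading, an AUC of 0.62 means that roughly 62% of such positive/negative pairs are ordered correctly, against 50% for a random score.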
Overall, our AI-based models accurately predict prognosis based on recognition of distinct molecular subclasses and predict the presence of mVI, thereby capturing complex molecular processes with clinical implications. While not designed to replace existing staging systems, our models may offer a practical, translational tool that provides complementary prognostic insight to guide personalized treatment directly from H&E slides.
Abbreviations
AI, artificial intelligence; AUC, area under the curve; DL, deep learning; HCC, hepatocellular carcinoma; HR, hazard ratio; LIHC, Liver Hepatocellular Carcinoma; MIL, multiple instance learning; MVI, macrovascular invasion; mVI, microvascular invasion; OS, overall survival; STAMP, solid tumor associative modeling in pathology; TCGA, The Cancer Genome Atlas; VI, vascular invasion; WSI, whole slide image.
Financial support
TS was supported by the German Federal Ministry of Education and Research (TRANSFORM LIVER, 031L0312A). AM was supported by Generalitat de Catalunya with a FI-SDUR fellowship (2021 FISDU 00338) from AGAUR and by mobility grants from the University of Barcelona, Montcelimar Foundation and Acadèmia de Ciències Mèdiques i de la Salut de Catalunya i de Balears Foundation. JB was supported by the MODS project funded from the programme “Profilbildung 2020” [grant no. PROFILNRW-2020-107-A], an initiative of the Ministry of Culture and Science of the State of North Rhine-Westphalia. EM was supported by Andrew K. Burroughs Short-Term Training Fellowship 2021 from EILF-EASL. RP was supported by the Fundació de Recerca Clínic Barcelona - IDIBAPS and by a grant from the Spanish National Health Institute (MICINN, PID2022-139365OB-I00). AS received funding from the Ministry of Culture and Science of the State of North Rhine-Westphalia (CANTAR, NW21-062E).
JNK is supported by the German Cancer Aid (DECADE, 70115166), the German Federal Ministry of Education and Research (PEARL, 01KD2104C; CAMINO, 01EO2101; SWAG, 01KD2215A; TRANSFORM LIVER, 031L0312A; TANGERINE, 01KT2302 through ERA-NET Transcan; Come2Data, 16DKZ2044A; DEEP-HCC, 031L0315A; DECIPHER-M, 01KD2420A; NextBIG, 01ZU2402A), the German Academic Exchange Service (SECAI, 57616814), the German Federal Joint Committee (TransplantKI, 01VSF21048), the European Union’s Horizon Europe and innovation programme (ODELIA, 101057091; GENIAL, 101096312), the European Research Council (ERC; NADIR, 101114631), the National Institutes of Health (EPICO, R01 CA263318) and the National Institute for Health and Care Research (NIHR, NIHR203331) Leeds Biomedical Research Centre. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care. This work was funded by the European Union. Views and opinions expressed are, however, those of the author(s) only and do not necessarily reflect those of the European Union. Neither the European Union nor the granting authority can be held responsible for them. TL is funded by the European Research Council (ERC): grant agreement 771083; the German Research Foundation (DFG): 279874820, 461704932, 440603844; the German Cancer Aid (Deutsche Krebshilfe): 70114893, 70115166; and in part by the Ministry of Culture and Science of the State of North Rhine-Westphalia: NW21-062E (CANTAR), PROFILNRW-2020-107-A (MODS), and the German Federal Ministry of Education and Research: 031L0312B (Transform Liver).
JML is supported by grants from the European Commission (Horizon Europe-Mission Cancer, THRIVE, Ref. 101136622), the NIH (R01-CA273932-01, R01DK56621 and R01DK128289); Samuel Waxman Cancer Research Foundation; the Spanish National Health Institute (MICINN, PID2022-139365OB-I00, funded by MICIU/AEI/10.13039/501100011033 and FEDER); Cancer Research UK (CRUK), Fondazione AIRC per la Ricerca sul Cancro and Fundación Científica de la Asociación Española Contra el Cáncer (FAECC) (Accelerator Award, HUNTER, Ref. C9380/A26813); “la Caixa” Foundation (Agreement LCF/PR/SP23/52950009); Fundación Científica de la Asociación Española Contra el Cáncer (FAECC; Proyectos Generales, Ref. PRYGN223117LLOV; Reto AECC 70% Supervivencia: Ref. RETOS245779LLOV; AECC-IDIBAPS Excellence Program Ref. EPAEC246711CLIN) and the Generalitat de Catalunya/AGAUR (2021 SGR 01347). Other authors have no relevant financial support statements to disclose.
Authors’ contributions
TPS, JML, TL and JNK conceptualized, designed and supervised the study. AM, EM, RP and AGO designed the study and contributed to critical data interpretation. AM, LZ, JB, AGO, EM, and MvT contributed to the data analysis and/or provided technical support. CM contributed to the pathological revision of attention heatmaps and tiles from WSIs. AM, LZ, JB, TPS, RP, AGO, EM, UB, MPG, JHP, MK, ATS, CR, and JML provided scientific input. The manuscript was written by AM, LZ, and TPS, under the supervision of EM, AGO, RP, and JML. TPS, AM, LZ, EM, AGO, RP, and JML have been involved in critical revision of the manuscript.
Data availability
Transcriptomic data are available in public, open-access repositories. RNAseq data from samples collected in the in-house training cohort were deposited at the European Genome-Phenome Archive (EGA; accession code EGAS00001005364) and at GEO under accession numbers GSE63898 and GSE20140. RNAseq data from samples collected in the in-house advanced HCC cohort were deposited at EGA (EGAS00001005477). RNAseq data from the TCGA-LIHC cohort are publicly available (https://tcga-data.nci.nih.gov/).
Conflict of interest
EM has received speaker fees from Roche and Sirtex, and travel funding from MSD and Roche. MK declares honorary talks and travel support from Bracco Imaging and Canon Medical; furthermore, he received a research grant from Bracco Imaging. JNK declares consulting services for Bioptimus, France; Panakeia, UK; AstraZeneca, UK; and MultiplexDx, Slovakia. Furthermore, he holds shares in StratifAI GmbH, Germany, Synagen GmbH, Germany, and Ignition Lab, Germany; has received an institutional research grant from GSK; and has received honoraria from AstraZeneca, Bayer, Daiichi Sankyo, Eisai, Janssen, Merck, MSD, BMS, Roche, Pfizer and Fresenius. TL declares consulting fees from AstraZeneca, BMS, Eisai, Incyte, MSD, Roche, and HepaRegeniX, and honorary talks and travel support from Abbvie and Gilead. JML is receiving research support and consulting fees from Eisai Inc., Merck, Roche, Genentech, AstraZeneca, Bayer Pharmaceuticals, Abbvie, Sanofi, Moderna, Glycotest, Exelixis, and Boehringer Ingelheim, and serves on a Data Safety Monitoring Board for Bristol Myers Squibb. Other authors have no conflicts of interest to declare.
Please refer to the accompanying ICMJE disclosure forms for further details.
Source: PubMed Central (JATS). Licensing follows the original publisher's policy; please cite the original article when quoting.