Development and validation of a SHAP-explainable preoperative predictive model for microvascular invasion in hepatocellular carcinoma.
APA
Guo J, Tang H, et al. (2026). Development and validation of a SHAP-explainable preoperative predictive model for microvascular invasion in hepatocellular carcinoma. BMC Cancer, 26(1). https://doi.org/10.1186/s12885-026-15839-0
MLA
Guo J, et al. "Development and validation of a SHAP-explainable preoperative predictive model for microvascular invasion in hepatocellular carcinoma." BMC Cancer, vol. 26, no. 1, 2026.
PMID
41803754
Abstract
[BACKGROUND] Hepatocellular carcinoma (HCC) exhibits high recurrence rates despite curative treatments. Microvascular invasion (MVI) is a key predictor of recurrence; however, current diagnostic methods are invasive or available only after surgery. We aimed to develop explainable machine learning models for non-invasive, preoperative MVI prediction.
[METHODS] We retrospectively analysed 308 patients with HCC (132 MVI-positive and 176 MVI-negative) who underwent curative hepatectomy from January 2020 to December 2023, randomly divided into training (n = 216) and validation (n = 92) cohorts (7:3 ratio). Independent risk factors were identified using univariate and multivariate logistic regression analyses. The least absolute shrinkage and selection operator (LASSO) regression was used to select predictive features. Ten machine learning models were constructed and evaluated using receiver operating characteristic (ROC), calibration, and decision curves. Model explainability was assessed using SHapley Additive exPlanations (SHAP).
[RESULTS] Hepatitis viral load, alpha-fetoprotein, gamma-glutamyl transferase level, tumour size, and radiogenomic venous invasion (RVI) were significant independent risk factors for MVI. LASSO regression identified 12 key features. The extreme gradient boosting (XGBoost) model performed best, with a training set area under the ROC curve (AUC) of 0.852. The accuracy, sensitivity, specificity, and F1 score were 0.792, 0.812, 0.775, and 0.776, respectively. The validation set AUC was 0.815. The accuracy, sensitivity, specificity, and F1 score were 0.750, 0.677, 0.805, and 0.700, respectively. SHAP revealed hepatitis viral load, RVI, alpha-fetoprotein, tumour size, and pseudocapsule integrity as the most influential predictors.
[CONCLUSION] The XGBoost model accurately predicted MVI preoperatively in HCC, with SHAP-based interpretability supporting personalised surgical decision-making.
[SUPPLEMENTARY INFORMATION] The online version contains supplementary material available at 10.1186/s12885-026-15839-0.
Introduction
Hepatocellular carcinoma (HCC) is the fifth most common malignancy in China and the second leading cause of cancer-related death [1]. For patients with early- to mid-stage HCC who meet the Milan criteria, hepatectomy and liver transplantation are the main potentially curative treatment options [2, 3]. However, the 5-year tumour recurrence rate after surgical resection is as high as 70%, and that after transplantation reaches 35% [4, 5]. These high recurrence rates shorten survival and lead to poor prognoses.
MVI is a key pathological feature for guiding the prevention and treatment of HCC recurrence. It is a marker of aggressive tumour behaviour and is closely associated with recurrence, metastasis, and postoperative survival [6]. MVI is commonly found in heterogeneous areas at the tumour periphery; in HCC, these regions harbour concentrated populations of highly invasive cells and are hotspots for satellite nodule formation. During metastasis, tumour cells first invade adjacent microvessels and then use them as routes for dissemination. The reported incidence of MVI in patients with HCC ranges from 34.6% to 70.4% [7, 8]. However, MVI can be definitively diagnosed only by postoperative histopathological examination or liver biopsy. Because the pathological diagnosis becomes available only after surgery, it cannot guide pre- or intraoperative treatment decisions in real time. Furthermore, sampling variability and the invasive nature of liver biopsy limit its clinical utility. A reliable, non-invasive method that can accurately predict MVI before surgery is therefore urgently needed.
Although MVI risk prediction models based on clinical features have been developed, most rely on traditional linear modelling approaches, such as logistic regression, which have a limited ability to capture complex, nonlinear relationships [9, 10]. Machine learning (ML) algorithms offer clear advantages in handling high-dimensional feature interactions for medical prediction; however, current clinical prediction models for MVI have yet to incorporate ML techniques to improve predictive performance. Moreover, existing models often lack explainability. As a result, they neither fully capture the complex relationships between variables nor provide clinically actionable explanations.
To address these challenges, we constructed a multimodal prediction framework that integrates preoperative clinical and computed tomography (CT) imaging features. Using a heterogeneous set of ML algorithms, we developed an MVI prediction model with clinical explainability. Ensemble learning algorithms were used to model nonlinear interactions among high-dimensional features, overcoming the linear assumptions of traditional models. The SHapley Additive exPlanations (SHAP) framework was used to quantify the contribution of each feature and to visualise how the model arrives at its predictions.
Material and methods
This retrospective study involved patients receiving standard care at a single medical institution. The study was approved by the local institutional ethics review board (LL-KY-2025126–01), which waived the requirement for written informed consent. All procedures involving human participants were performed in accordance with the 1975 Declaration of Helsinki and its later amendments.
Patients
This study consecutively enrolled patients who underwent curative hepatectomy at our hospital between January 2020 and December 2023 and whose diagnosis of HCC was confirmed by postoperative histopathological examination. The inclusion criteria were as follows: postoperative pathological diagnosis of HCC with MVI status clearly recorded as positive or negative; dynamic contrast-enhanced abdominal CT, including the arterial, portal venous, and delayed phases, performed within 2 weeks before surgery, with imaging parameters compliant with the Liver Imaging Reporting and Data System, version 2018 [11]; whole-body positron emission tomography–CT or bone scan to exclude extrahepatic metastasis; and no history of other malignant tumours.
The exclusion criteria were as follows: non-surgical treatment before surgery, such as transarterial chemoembolisation, radiofrequency ablation, molecular targeted therapy, or immunotherapy; multiple intrahepatic lesions in different lobes, or distant metastasis; macroscopic vascular invasion; and incomplete clinicopathological data. A total of 426 patients with HCC underwent surgical resection, of whom 118 were excluded because they had received other treatments before surgery (n = 45), had incomplete data (n = 36), had other concomitant malignant tumours (n = 21), or for other reasons (n = 16). Of the remaining 308 patients who met the inclusion and exclusion criteria, 132 (42.9%) were MVI-positive and 176 (57.1%) were MVI-negative. The patients were randomly assigned to the training (n = 216) and validation (n = 92) sets in a ratio of approximately 7:3 using a fixed random seed. Fig. 1 presents a flow chart of participant selection.
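The flow-chart arithmetic and the seeded 7:3 split can be sketched as follows. This is a minimal Python illustration, not the authors' code (the study used R): the seed value 42 and the function name `seeded_split` are hypothetical, and the split is a simple random split that is not stratified by MVI status, consistent with the text's description of random assignment.

```python
import random

# Exclusion counts as reported in the text (Fig. 1).
SCREENED = 426
EXCLUDED = {
    "prior non-surgical treatment": 45,
    "incomplete data": 36,
    "other malignant tumours": 21,
    "other reasons": 16,
}

def seeded_split(ids, train_fraction=0.7, seed=42):
    """Shuffle a copy of ids with a fixed (hypothetical) seed and cut it into train/validation."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

remaining = SCREENED - sum(EXCLUDED.values())  # 426 - 118
train_ids, val_ids = seeded_split(range(remaining))
print(remaining, len(train_ids), len(val_ids))  # 308 216 92
```

Because the split is not stratified, the MVI-positive proportion can differ slightly between cohorts, which is why baseline balance is checked afterwards.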
Clinical features and data processing
Structured data were extracted from the hospital information system and the picture archiving and communication system. These data included basic patient information, laboratory markers, imaging features, and postoperative pathological data. All patients underwent surgical treatment within 2 weeks following the completion of laboratory testing and contrast-enhanced CT scans.
Basic information
Basic information included sex, age, cirrhosis status, and immunological markers of hepatitis B virus (HBV) and/or hepatitis C virus (HCV) infection.
Laboratory markers
Laboratory markers comprised hepatitis viral load, alpha-fetoprotein (AFP), prothrombin time, alanine aminotransferase, aspartate aminotransferase, total bilirubin, gamma-glutamyl transferase (GGT), Child–Pugh class, platelet count, neutrophil count, and lymphocyte count. Because hepatitis viral load, AFP, and GGT were markedly right-skewed (Shapiro–Wilk test, p < 0.001), they were transformed as log10(x + 1) to reduce skewness before statistical analysis and model construction. The following inflammatory indices were also calculated: the neutrophil-to-lymphocyte ratio (NLR), the platelet-to-lymphocyte ratio (PLR), and the systemic immune–inflammation index (SII), defined as SII = (platelet count × neutrophil count) / lymphocyte count.
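The log transform and the derived inflammatory indices are simple to compute. The sketch below assumes all three cell counts are expressed in the same units (the text does not state the units; 10^9/L is a common convention), and the example counts are invented.

```python
import math

def log10p1(x):
    """log10(x + 1), as applied to viral load, AFP and GGT; intended for non-negative values."""
    return math.log10(x + 1)

def inflammatory_indices(platelets, neutrophils, lymphocytes):
    """Return (NLR, PLR, SII) from absolute counts given in the same units."""
    nlr = neutrophils / lymphocytes
    plr = platelets / lymphocytes
    sii = platelets * neutrophils / lymphocytes
    return nlr, plr, sii

print(log10p1(99))                          # 2.0
print(inflammatory_indices(200, 4.0, 2.0))  # (2.0, 100.0, 400.0)
```

The "+ 1" keeps zero values (for example, an undetectable viral load) defined after transformation.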
CT imaging characteristics
These included tumour size, number of tumours, tumour margin status, pseudocapsule integrity, arterial-phase peritumoural hypodensity (APPH), intratumoural necrosis, and radiogenomic venous invasion (RVI) (Fig. 2). Tumour size was defined as the maximum diameter measured on arterial-phase contrast-enhanced CT images. Tumour margin status was assessed on portal venous-phase images and classified as smooth or non-smooth. Intratumoural necrosis was defined as an irregular, non-enhancing area within the tumour in all contrast-enhanced phases (arterial, portal venous, and delayed). RVI is a non-invasive imaging marker based on contrast-enhanced CT and is identified by any one of three features: internal arteries, i.e. discrete arterial enhancement persisting within the tumour during the venous phase; a hypodense halo, i.e. a low-density rim partially or completely surrounding the tumour; and tumour–liver difference, i.e. a sharp focal or circumferential transition in attenuation between the tumour and the adjacent liver parenchyma in the absence of a hypodense halo [12]. All abdominal CT images were analysed in a strict double-blind, independent review process: a junior radiologist with 5 years of experience in abdominal radiology performed the preliminary analysis, and a senior radiologist with more than 10 years of experience then reviewed the images. Disagreements were resolved by discussion until consensus was reached. When multiple tumours were present, only the imaging features of the largest tumour were analysed. Inter-observer agreement was quantified using the intraclass correlation coefficient (ICC) for the continuous variable (tumour size) and the kappa statistic for categorical variables (pseudocapsule integrity, tumour margin status, intratumoural necrosis, APPH, and RVI). All ICC and kappa values exceeded 0.80, indicating excellent agreement.
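Cohen's kappa, used above for the categorical CT features, compares the observed agreement between two readers with the agreement expected by chance from each reader's marginal frequencies. A minimal sketch, assuming unweighted kappa for two raters (the text does not state which kappa variant was used); the ratings below are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of cases both raters scored identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal frequencies, summed over categories.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / n ** 2
    if p_e == 1:
        return 1.0  # both raters used the same single category for every case
    return (p_o - p_e) / (1 - p_e)

# Hypothetical pseudocapsule ratings (1 = intact, 0 = not intact) for eight lesions.
junior = [1, 1, 0, 1, 0, 0, 1, 1]
senior = [1, 1, 0, 1, 0, 1, 1, 1]
print(round(cohens_kappa(junior, senior), 3))  # 0.714
```

Here the readers agree on 7 of 8 lesions (p_o = 0.875), but because both rate most lesions as intact, chance agreement is already 0.5625, so kappa is only about 0.71.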
Pathological histological parameters
These included tumour size, number of tumours, cirrhosis status, and MVI status. MVI was defined pathologically as the microscopic presence of cancer cell nests within endothelium-lined vascular channels, mainly in peritumoural portal vein branches (including capsular vessels) [13, 14]. All postoperative specimens were reviewed through a double-blind, independent assessment: a junior pathologist performed the primary examination, and a senior pathologist then re-examined the specimens, with particular attention to MVI status.
Selection of variables
Least absolute shrinkage and selection operator (LASSO) regression was used to reduce the dimensionality of the high-dimensional feature set and select the most informative features, improving the predictive performance and stability of the model. As a widely used ML feature selection method, LASSO shrinks feature coefficients through L1 regularisation and drives the coefficients of uninformative features exactly to zero; this eliminates redundant variables, handles mild correlation between features, and reduces the impact of multicollinearity on the model. The features that retain non-zero coefficients at the selected penalty form the optimal feature subset. Ten-fold cross-validation repeated 10 times was used to assess model performance during LASSO fitting, supporting the robustness of variable selection and the generalisability of the model. All feature selection steps were strictly restricted to the training set to avoid data leakage.
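To illustrate how the L1 penalty removes features, the sketch below fits LASSO by cyclic coordinate descent with soft-thresholding. It is a simplification under stated assumptions, not the study's implementation (which was in R and is not specified further): it uses a squared-error loss rather than the logistic (binomial) loss normally used for a binary endpoint such as MVI, uses a fixed penalty `lam` instead of choosing it by repeated ten-fold cross-validation, and does not standardise features (R's glmnet does so by default).

```python
def soft_threshold(z, lam):
    """S(z, lam) = sign(z) * max(|z| - lam, 0): the step that sets small coefficients to exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_sweeps=200):
    """Minimise (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of rows (patients), y a list of outcomes; both are centred
    internally, so no intercept is fitted. Returns the coefficient list b.
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residual y - Xb with b = 0
    scale = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            if scale[j] == 0:
                continue  # constant feature: leave its coefficient at 0
            # Correlation of feature j with the partial residual (feature j's fit added back).
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / scale[j]
            delta = new - beta[j]
            if delta:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new
    return beta

# Toy data: the outcome depends only on the first feature.
X = [[1, 1], [2, 0], [3, 0], [4, 1]]
y = [2, 4, 6, 8]
print(lasso_coordinate_descent(X, y, lam=0.25))  # first coefficient ~1.8, second 0.0
print(lasso_coordinate_descent(X, y, lam=10.0))  # [0.0, 0.0]
```

The true slope of 2 is shrunk to about 1.8, the irrelevant feature is dropped entirely, and a large enough penalty removes every feature; cross-validation chooses the penalty between these extremes.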
Model development and validation
Models were constructed using 10 representative ML algorithms: logistic regression, naive Bayes, support vector machine, k-nearest neighbour, random forest, extremely randomised trees, extreme gradient boosting (XGBoost), gradient boosting, adaptive boosting, and multilayer perceptron. These algorithms cover a broad range of classical and modern ML paradigms (linear, Bayesian, instance-based, tree-based ensemble, and neural network), allowing a comprehensive comparison to identify the most suitable algorithm for this dataset. For each model, the final hyperparameters were determined on the optimal feature subset using ten-fold cross-validation repeated 10 times together with the default hyperparameter grid search of the caret package. Each model was then refitted on the training set using the optimal feature subset and its final hyperparameters. All hyperparameter tuning was confined to the training set; the validation set was used only for the final, independent evaluation.
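The repeated cross-validation used for tuning can be sketched in plain Python to show the resampling logic. In R's caret this scheme is typically requested with `trainControl(method = "repeatedcv", number = 10, repeats = 10)`; the Python below is only an illustration. The seed, the function names, and the `evaluate` callback (which would fit a model on one set of training-set rows and score it, for example by AUC, on the held-out rows) are hypothetical. Unlike caret, which by default balances folds by outcome class, this sketch does not stratify.

```python
import random

def repeated_kfold(n, k=10, repeats=10, seed=2020):
    """Yield (repeat, fold, fit_idx, held_out_idx) for k-fold CV repeated `repeats` times."""
    rng = random.Random(seed)
    for r in range(repeats):
        order = list(range(n))
        rng.shuffle(order)                        # a fresh random partition for each repeat
        folds = [order[f::k] for f in range(k)]   # k folds whose sizes differ by at most 1
        for f in range(k):
            held_out = sorted(folds[f])
            fit_idx = sorted(i for g in range(k) if g != f for i in folds[g])
            yield r, f, fit_idx, held_out

def select_hyperparameter(candidates, evaluate, n, **cv_kwargs):
    """Return the candidate with the best mean score across all k x repeats folds.

    `evaluate(candidate, fit_idx, held_out_idx)` is a hypothetical callback that fits
    a model on the rows in fit_idx and returns a score (higher is better) on held_out_idx.
    """
    splits = list(repeated_kfold(n, **cv_kwargs))

    def mean_score(candidate):
        return sum(evaluate(candidate, fit, out) for _, _, fit, out in splits) / len(splits)

    return max(candidates, key=mean_score)

# 216 training patients -> 10 x 10 = 100 fits per candidate hyperparameter value.
splits = list(repeated_kfold(216))
print(len(splits))  # 100
```

Averaging over 100 held-out folds rather than 10 reduces the dependence of the selected hyperparameters on any single random partition of a modest training set.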
Model performance comparison
The predictive performance of the models was assessed using receiver operating characteristic (ROC) curves. The area under the ROC curve (AUC), accuracy, sensitivity (recall), specificity, positive predictive value, negative predictive value, and F1 score were calculated; sensitivity, specificity, and F1 score were calculated at a classification threshold of 0.5. The model is intended as a risk-stratification tool to inform clinical decision-making. The best predictive model was selected based on overall performance across these metrics in both the training and validation sets. Calibration, that is, the agreement between observed and predicted probabilities, was assessed using calibration curves and the Hosmer–Lemeshow goodness-of-fit test, and the calibration slope, calibration intercept, and Brier score were also reported. The clinical net benefit of the model across decision thresholds was evaluated using decision curve analysis over a threshold probability range of 0.10–0.90, which corresponds to scenarios in which clinicians might consider adjuvant therapy or extended surgical margins.
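The threshold-dependent metrics, the AUC, the Brier score, and the decision-curve net benefit can all be computed from predicted probabilities, as in the plain-Python sketch below. It is illustrative only: the data are invented, the AUC uses the pairwise (Mann–Whitney) formulation rather than integrating a plotted ROC curve, and the calibration slope/intercept and Hosmer–Lemeshow test are omitted because they require fitting an additional logistic model or grouping step.

```python
def threshold_metrics(y_true, y_prob, threshold=0.5):
    """Confusion-matrix metrics when probabilities >= threshold are called MVI-positive."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        if p >= threshold:
            if y == 1:
                tp += 1
            else:
                fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else nan,  # identical to recall
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "ppv": tp / (tp + fp) if tp + fp else nan,
        "npv": tn / (tn + fn) if tn + fn else nan,
        "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else nan,
    }

def auc_rank(y_true, y_prob):
    """AUC = P(random positive scored above random negative), ties counted as 1/2."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and observed 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def net_benefit(y_true, y_prob, pt):
    """Decision-curve net benefit at threshold probability pt: TP/n - FP/n * pt / (1 - pt)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 0)
    return tp / n - fp / n * pt / (1 - pt)

# Invented example: six patients, three with MVI.
y = [1, 1, 1, 0, 0, 0]
prob = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
print(threshold_metrics(y, prob))
print(auc_rank(y, prob), brier_score(y, prob), net_benefit(y, prob, 0.5))
```

For a decision curve, `net_benefit` would be evaluated over the 0.10–0.90 grid and compared with the treat-none strategy (net benefit 0) and the treat-all strategy, whose net benefit is prevalence - (1 - prevalence) × pt / (1 - pt).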
Model explanation
SHAP is a framework for explaining the predictions of ML models. For each patient, it assigns every predictor a SHAP value, that is, the contribution of that predictor to the model's output for that patient; these values are additive, so the baseline (expected) model output plus the sum of a patient's SHAP values equals the model's output for that patient. In the SHAP feature importance plot, predictors are ranked in descending order of mean absolute SHAP value, which reflects each predictor's overall contribution to the model; larger values indicate a stronger influence on the model output. In the SHAP summary plot, each point represents one patient's SHAP value for a given feature, and its position indicates the direction and magnitude of that feature's contribution to the predicted risk for that patient. By quantifying each feature's contribution to individual predictions, SHAP provides both local (patient-level) and global (model-level) explanations, thereby improving the transparency and explainability of the model.
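The additivity property and the mean-|SHAP| ranking can be checked on a toy example by computing exact Shapley values through enumeration of feature subsets. This is a brute-force sketch for intuition only, not how the study computed SHAP values: it represents "absent" features by a single hypothetical baseline vector, whereas TreeSHAP (used by the shap library for tree models such as XGBoost) computes the values efficiently from the tree structure. `toy_model` and all inputs are invented, and the enumeration cost grows exponentially with the number of features.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline), by subset enumeration."""
    p = len(x)

    def value(present):
        # Features in `present` take the patient's value; all others take the baseline value.
        return model([x[j] if j in present else baseline[j] for j in range(p)])

    phi = []
    for j in range(p):
        others = [k for k in range(p) if k != j]
        total = 0.0
        for size in range(p):
            # Shapley weight |S|! (p - |S| - 1)! / p! for coalitions S that exclude feature j.
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {j}) - value(s))
        phi.append(total)
    return phi

def mean_abs_importance(model, X, baseline):
    """Global importance: mean |SHAP value| of each feature across patients."""
    all_phi = [shapley_values(model, x, baseline) for x in X]
    return [sum(abs(phi[j]) for phi in all_phi) / len(all_phi) for j in range(len(baseline))]

def toy_model(z):
    # Hypothetical risk score with an interaction between the two features.
    return 0.5 * z[0] + 1.0 * z[1] + 0.3 * z[0] * z[1]

baseline = [0.0, 0.0]
patient = [2.0, 1.0]
phi = shapley_values(toy_model, patient, baseline)
# Additivity: baseline output + sum of SHAP values reproduces the patient's output.
print(phi, toy_model(baseline) + sum(phi), toy_model(patient))
print(mean_abs_importance(toy_model, [[2.0, 1.0], [0.0, 3.0]], baseline))
```

The interaction term's credit (0.6) is split equally between the two features, which is how Shapley values distribute joint effects.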
Statistical analysis
Statistical analyses were performed using SPSS Statistics for Windows, version 25.0 (IBM Corp., Armonk, NY, USA). Quantitative data were described according to their distribution: normally distributed data are expressed as mean ± standard deviation, and non-normally distributed data as median and interquartile range (M [Q1, Q3]). Qualitative data are expressed as frequency (n) and percentage (%). Between-group comparisons of quantitative data used the independent-samples t-test for normally distributed data and the Mann–Whitney U test for non-normally distributed data; qualitative data were compared using the χ2 test. Standardised mean differences (SMD) were calculated to assess baseline balance between cohorts. Univariate and multivariate logistic regression analyses were used to identify independent risk factors for MVI; the multivariate analysis used forward stepwise selection (entry criterion p < 0.05). Bonferroni correction was applied for multiple comparisons. Multicollinearity was assessed using the variance inflation factor (VIF), with VIF > 10 indicating significant collinearity. To ensure the reliability of the assessments, agreement between the evaluations of the two physicians was analysed using the intraclass correlation coefficient; both intra- and inter-observer values were calculated, and an intraclass correlation coefficient greater than 0.75 was taken to indicate good agreement. R software (version 3.6.1; R Foundation for Statistical Computing, Vienna, Austria) was used to implement the ten ML algorithms described above. A p value < 0.05 was considered statistically significant for all analyses.
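The SMD and VIF mentioned above have simple closed forms. The sketch below assumes the common pooled-SD definitions of the SMD for continuous and binary variables (the text does not state which variant was used) and expresses the VIF through the R² obtained by regressing one predictor on the others; all input values are invented. An |SMD| below 0.1 is often read as adequate balance between cohorts.

```python
import math
import statistics

def smd_continuous(group1, group2):
    """(mean1 - mean2) / sqrt((var1 + var2) / 2) with sample variances; assumes var1 + var2 > 0."""
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    return (m1 - m2) / math.sqrt((v1 + v2) / 2)

def smd_binary(p1, p2):
    """(p1 - p2) / sqrt((p1(1 - p1) + p2(1 - p2)) / 2) for a binary variable's proportions."""
    return (p1 - p2) / math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / 2)

def vif(r_squared):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the others."""
    return 1.0 / (1.0 - r_squared)

# Invented values for illustration only.
print(smd_continuous([2, 4, 6], [1, 3, 5]))  # 0.5
print(round(smd_binary(0.45, 0.40), 3))
print(vif(0.9))  # about 10, the collinearity cut-off used above
```

Unlike a p value, the SMD does not depend on sample size, which is why it is preferred for judging balance between a 216-patient and a 92-patient cohort.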
This retrospective study involved standard care performed at a single medical institution. Ethics committee approval was granted by the local institutional ethics review board (LL-KY-2025126–01), and the requirement for written informed consent was waived. All procedures involving human participants were performed in accordance with the 1975 Helsinki declaration and its later amendments.
Patients
This study consecutively enrolled patients who underwent curative hepatectomy at our hospital between January 2020 and December 2023, and were postoperatively confirmed diagnosed with HCC through histopathological examinations. The inclusion criteria were as follows: postoperative pathological diagnosis of HCC with a clearly recorded status of MVI as positive or negative; a dynamic contrast-enhanced CT scan of the abdomen, including the arterial, portal venous, and delayed phases, within 2 weeks prior to surgery, with imaging parameters compliant with the Liver Imaging Reporting and Data System, version 2018 [11]; a whole-body Positron Emission Tomography–CT or bone scan to exclude the presence of extrahepatic metastasis; and no history of other malignant tumours.
The exclusion criteria were as follows: non-surgical treatments prior to surgery, such as transarterial chemoembolisation, radiofrequency ablation, molecular targeted therapy, or immunotherapy; presence of multiple intrahepatic lesions across different lobes, or distant metastasis; presence of macroscopic vascular invasion; and lack of complete clinicopathological data. A total of 426 patients diagnosed with HCC underwent surgical resection. Those who underwent other treatments prior to surgery (n = 45), had incomplete data (n = 36), concomitantly presented with other malignant tumours (n = 21), and had other reasons (n = 16) were excluded. Of the remaining 308 patients with HCC who met the inclusion and exclusion criteria, 132 (42.90%) and 176 (57.10%) were positive and negative for MVI, respectively. The patients were randomly assigned to training (n = 216) and validation (n = 92) sets using a random seed method in a ratio of approximately 7 to 3. Fig. 1 presents a flow chart detailing the procedure for participant selection.
Clinical features and data processing
Structured data were extracted from the hospital information system and the picture archiving and communication system. These data included basic patient information, laboratory markers, imaging features, and postoperative pathological data. All patients underwent surgical treatment within 2 weeks following the completion of laboratory testing and contrast-enhanced CT scans.
Basic Information
Basic information included the patient’s sex, age, cirrhosis status, and immunological markers for hepatitis B virus (HBV) and or hepatitis C virus (HCV).
Laboratory markers
Laboratory markers comprised hepatitis viral load, alpha-fetoprotein (AFP), prothrombin time, alanine aminotransferase, aspartate aminotransferase, total bilirubin, gamma-glutamyl transferase (GGT) level, Child–Pugh classification, platelet count, neutrophil count, and lymphocyte count. Due to significant right-skewness (Shapiro–Wilk test, p < 0.001), hepatitis viral load, AFP, and GGT were log10-transformed (log10(variable + 1)) to normalize their distribution before subsequent statistical analysis and model construction. The following inflammatory indicators were also calculated: neutrophil to lymphocyte ratio, platelet to lymphocyte ratio, and systemic immune–inflammation index (SII), where SII was calculated as: SII = (platelet count × neutrophil count)/lymphocyte count.
CT imaging characteristics
These included tumour size, number of tumours, tumour margin status, pseudocapsule integrity, arterial-phase peritumoural hypodensity (APPH), intratumoral necrosis and radiogenomic venous invasion (RVI) (Fig. 2). Tumor size was defined as the maximum diameter measured on arterial-phase contrast-enhanced CT images. Tumor margin status was assessed on portal venous-phase CT images and classified as either smooth or non-smooth. Intratumoral necrosis was defined as an irregular, non-enhancing area within the tumour on contrast-enhanced CT across all phases (arterial, portal venous, and delayed). RVI is a non-invasive imaging marker based on enhanced CT, identified by any one of three characteristic features: internal arteries, a feature which refers to discrete arterial enhancement persisting within the tumour during the venous phase of imaging; hypodense halo, a low-density rim partially or completely surrounding the tumour; and tumour–liver difference, which is a sharp focal or circumferential attenuation transition between the tumour and adjacent liver parenchyma in the absence of a hypodense halo [12]. All abdominal CT images were analysed according to a strict double-blind independent review process. A junior radiologist with 5 years of experience in abdominal radiodiagnosis conducted the preliminary analysis. Subsequently, a senior radiologist with over 10 years of experience reviewed the images. In cases of disagreement, consensus was reached through discussion. If multiple tumours were present, only the imaging features of the largest tumour were analysed. Inter-observer agreement was quantified using intraclass correlation coefficient (ICC) for continuous variables (tumour size) and kappa value for categorical variables (pseudocapsule integrity, tumour margin status, intratumoral necrosis, APPH, RVI). All ICC and kappa values exceeded 0.80, indicating excellent consistency.
Pathological histological parameters
These included tumour size, number of tumours, cirrhosis status, and MVI status. The pathological diagnostic criteria for MVI include the microscopic observation of cancer cell nests within endothelial cell-lined vascular channels, primarily in the peritumoural portal vein branches (including capsular vessels) [13, 14]. All postoperative pathological specimens were reviewed via double-blind independent assessment. A junior pathologist conducted the primary examination. Subsequently, a senior pathologist re-examined the specimens. Focus was placed on evaluating the MVI status.
Selection of variables
Least absolute shrinkage and selection operator (LASSO) regression analysis was employed to perform dimensionality reduction on the high-dimensional feature set and select the most valuable features. This approach enhances the predictive performance and stability of the model. As a mainstream ML feature selection method, LASSO regression compresses the feature coefficients through L1 regularization, which can effectively handle mild correlation between features, eliminate redundant variables, and reduce the impact of multicollinearity on the model. The regularisation process selects the optimal feature combination to maximise model performance. During the LASSO regression process, a tenfold cross-validation with 10 iterations was used to assess model performance. This strategy ensured the robustness of the process for selecting variables and the generalisability of the model. All feature selection steps were strictly restricted to the training set to avoid data leakage.
Model development and validation
Models were constructed using 10 representative ML algorithms: logistic regression, naive Bayes, support vector machine, k-nearest neighbour, random forest, extremely randomised trees, extreme gradient boosting (XGBoost), gradient boosting, adaptive boosting, and multilayer perceptron. These algorithms were chosen to cover a broad range of classical and state-of-the-art ML paradigms (linear, Bayesian, instance-based, tree-based ensembles, and neural networks). This allowed a comprehensive comparison to identify the most suitable algorithm for our dataset. For each model, the final hyperparameters on the optimal feature subset were determined by tenfold cross-validation repeated 10 times, using the default hyperparameter grid search provided by the caret package. Each model was then refitted on the training set using the optimal feature subset and the final hyperparameters. All hyperparameter tuning was confined to the training set, and the validation set was used only for the final, independent evaluation.
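A minimal sketch of tenfold cross-validation repeated 10 times, restricted to training-cohort indices, is shown below. Stratifying folds by MVI status is an assumption, because the exact resampling settings used in caret are not described. The function name, the seed, and the class counts in the usage line are hypothetical.

```python
import random
from collections import defaultdict


def repeated_stratified_kfold(labels, n_splits=10, n_repeats=10, seed=2024):
    """Yield (repeat, fold, train_idx, test_idx) for repeated stratified k-fold CV.

    `labels` holds the outcome (e.g. MVI status) of the training cohort only,
    so validation-cohort patients never enter tuning or feature selection.
    """
    rng = random.Random(seed)  # fixed seed for reproducible splits
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    for r in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        offset = 0
        for members in by_class.values():
            shuffled = members[:]
            rng.shuffle(shuffled)
            # Deal each class round-robin so every fold keeps the class ratio
            # and fold sizes differ by at most one patient.
            for j, i in enumerate(shuffled):
                folds[(offset + j) % n_splits].append(i)
            offset += len(shuffled)
        for k, test in enumerate(folds):
            held_out = set(test)
            train = [i for i in range(len(labels)) if i not in held_out]
            yield r, k, train, sorted(test)


# Hypothetical training cohort of 216 patients (93 MVI-positive, 123 negative).
splits = list(repeated_stratified_kfold([1] * 93 + [0] * 123))
print(len(splits))  # 100 train/test splits
```

Each candidate hyperparameter setting would be scored by averaging a metric over all 100 held-out folds. The winning setting is then refitted once on the full training set.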
Model performance comparison
The predictive performance of the models was assessed using receiver operating characteristic curves. The area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, positive predictive value, negative predictive value, recall (equivalent to sensitivity), and F1 score were calculated. Sensitivity, specificity, and F1 scores were computed at a classification threshold of 0.5. The model is intended as a risk-stratification tool to inform clinical decision-making, and the best predictive model was selected based on overall performance across these metrics in both the training and validation sets. Calibration, that is, the agreement between predicted and observed values, was assessed using calibration curves and the Hosmer–Lemeshow goodness-of-fit test. The calibration slope, calibration intercept, and Brier score were also reported. The clinical net benefit of the model at different decision thresholds was evaluated using decision curve analysis over a threshold probability range of 0.10–0.90. This range covers scenarios in which clinicians might consider adjuvant therapy or extended surgical margins.
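The threshold-based metrics, the AUC, and the Brier score described above can be sketched in plain Python as follows. This sketch assumes that a predicted risk ≥ 0.5 counts as MVI-positive. It computes the AUC through its Mann–Whitney interpretation rather than by tracing the ROC curve. The outcome and probability lists are hypothetical.

```python
def _safe_div(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def threshold_metrics(y_true, y_prob, threshold=0.5):
    """Confusion-matrix metrics when risk >= threshold is called MVI-positive."""
    pairs = list(zip(y_true, y_prob))
    tp = sum(1 for y, p in pairs if y == 1 and p >= threshold)
    fn = sum(1 for y, p in pairs if y == 1 and p < threshold)
    tn = sum(1 for y, p in pairs if y == 0 and p < threshold)
    fp = sum(1 for y, p in pairs if y == 0 and p >= threshold)
    sens = _safe_div(tp, tp + fn)  # sensitivity == recall
    ppv = _safe_div(tp, tp + fp)   # positive predictive value == precision
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sens,
        "specificity": _safe_div(tn, tn + fp),
        "ppv": ppv,
        "npv": _safe_div(tn, tn + fn),
        "f1": _safe_div(2 * ppv * sens, ppv + sens),
    }


def auc_mann_whitney(y_true, y_prob):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical outcomes (1 = MVI) and predicted probabilities for eight patients.
y = [1, 1, 1, 0, 0, 0, 0, 1]
p = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1, 0.3, 0.8]
print(threshold_metrics(y, p), auc_mann_whitney(y, p), brier_score(y, p))
```

The AUC is threshold-free, whereas every metric in `threshold_metrics` changes if the 0.5 cut-off is moved.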
Model explanation
SHapley Additive exPlanations (SHAP) is a framework for explaining the predictions of ML models. It is based on Shapley values from cooperative game theory and quantifies the contribution of each predictor variable to each prediction. A SHAP feature importance plot ranks the predictor variables in descending order of their mean absolute SHAP values, indicating each variable's contribution to the model overall. Larger absolute SHAP values indicate greater importance and a stronger influence on the model output. In the summary plot, each point represents one patient for a given feature. Its horizontal position gives that patient's SHAP value, that is, how much the feature raised or lowered that patient's predicted risk. By quantifying the contribution of each feature to each prediction, SHAP provides both local (per-patient) and global (whole-model) explanations, enhancing the transparency and explainability of the model.
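To make the Shapley definition concrete, the sketch below computes exact Shapley values by enumerating every coalition of features for a small, hypothetical value function. The function uses log-odds contributions plus one interaction term. This brute-force version is exponential in the number of features and is for illustration only. For tree ensembles such as XGBoost, SHAP values are normally computed with the TreeSHAP algorithm. The feature names and effect sizes are invented.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, features):
    """Exact Shapley value of each feature for a set function value(frozenset)."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))  # marginal gain of f
        phi[f] = total
    return phi


def mean_abs_ranking(shap_rows):
    """Global importance: rank features by mean |SHAP| across patients."""
    feats = shap_rows[0].keys()
    imp = {f: sum(abs(r[f]) for r in shap_rows) / len(shap_rows) for f in feats}
    return sorted(imp.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical log-odds model: base value, main effects, one interaction.
FEATURES = ["viral_load", "RVI", "capsule_intact"]
MAIN = {"viral_load": 0.8, "RVI": 0.6, "capsule_intact": -0.4}


def toy_value(coalition):
    v = -0.5 + sum(MAIN[f] for f in coalition)
    if {"viral_load", "RVI"} <= coalition:
        v += 0.2  # interaction credit is split equally between the two features
    return v


phi = shapley_values(toy_value, FEATURES)
print(phi, mean_abs_ranking([phi]))
```

The values sum to the difference between the full prediction and the base value. This efficiency property is what lets a single patient's prediction be decomposed feature by feature.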
Statistical analysis
Statistical analysis was conducted using SPSS Statistics for Windows, version 25.0 (IBM Corp., Armonk, NY, USA). Quantitative data were described according to their distribution. Normally distributed data are expressed as mean ± standard deviation, and non-normally distributed data as median and interquartile range (M [Q1, Q3]). Qualitative data are described as frequency (n) and percentage (%). For between-group comparisons of quantitative data, the independent-samples t-test was used for normally distributed data and the Mann–Whitney U test for non-normally distributed data. Qualitative data were compared using the χ2 test. Standardised mean differences (SMD) were calculated to assess baseline balance between cohorts. Univariate and multivariate logistic regression analyses were used to identify independent risk factors for MVI, and the multivariate analysis used forward stepwise selection (inclusion criterion p < 0.05). Bonferroni correction was applied for multiple comparisons. Multicollinearity was assessed using the variance inflation factor (VIF), with VIF > 10 indicating significant collinearity. To ensure the reliability of the evaluations conducted by the two physicians, their agreement was analysed using the intraclass correlation coefficient. Both intra- and inter-observer intraclass correlation coefficients were calculated, with values greater than 0.75 indicating good agreement. For the ML analysis, R software (version 3.6.1; R Foundation for Statistical Computing, Vienna, Austria) was used to implement the ML algorithms.
These algorithms included logistic regression, naive Bayes, support vector machine, k-nearest neighbour, random forest, extremely randomised trees, XGBoost, gradient boosting, adaptive boosting, and multilayer perceptron. A p value < 0.05 was considered statistically significant in all analyses.
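The SMD used for baseline balance can be sketched as follows. The text does not give the exact formula, so this assumes the common pooled-SD definitions for continuous and binary variables. The example proportions are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def smd_continuous(group_a, group_b):
    """Absolute SMD for a continuous variable, pooling the two sample variances."""
    pooled_sd = sqrt((stdev(group_a) ** 2 + stdev(group_b) ** 2) / 2)
    return abs(mean(group_a) - mean(group_b)) / pooled_sd


def smd_binary(p_a, p_b):
    """Absolute SMD for a binary variable from the two group proportions."""
    pooled_sd = sqrt((p_a * (1 - p_a) + p_b * (1 - p_b)) / 2)
    return abs(p_a - p_b) / pooled_sd


# Hypothetical share of male patients: 70% in training vs 60% in validation.
print(round(smd_binary(0.70, 0.60), 2))  # 0.21
```

Unlike a p value, the SMD does not depend on sample size. It therefore expresses how large an imbalance is rather than whether it is statistically detectable.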
Results
Baseline clinical information
In total, 308 patients with HCC who underwent curative hepatectomy were included in the model cohort, which was divided into training and validation sets of 216 and 92 patients, respectively. Most baseline characteristics did not differ significantly between the training and validation sets (p > 0.05; Table 1). However, sex (p = 0.010), GGT level (p = 0.030), lymphocyte count (p = 0.040), and tumour number (p = 0.005) differed significantly between the two sets. All of these variables were included in the candidate feature pool and screened by LASSO regression. Because the ML models learn nonlinear relationships among variables, they may partly accommodate such baseline differences. To quantify the degree of baseline imbalance, we calculated SMDs for these variables, which were 0.32, 0.30, 0.29, and 0.27, respectively. All were < 0.5, suggesting that the imbalance was within an acceptable range and had a limited impact on the model validation results.
Independent risk factors
Independent risk factors for MVI were examined systematically in the entire cohort. Univariate logistic regression analysis identified 12 potential risk factors, and multivariate logistic regression analysis identified five independent risk factors for MVI: hepatitis viral load, AFP, GGT level, tumour size, and RVI (p < 0.05; Table 2). All variables in the multivariate model had VIF values < 3, indicating no significant multicollinearity (Table 2).
For example, each 1-log10 increase in hepatitis viral load (i.e., a tenfold increase in the original viral load) was associated with an 89% increase in the odds of MVI (OR = 1.89; 95% CI, 1.32–2.70; p = 0.022).
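Because viral load enters the model on the log10 scale, the reported OR applies per tenfold change. The sketch below shows how it scales to other fold-changes. It assumes the log-odds are linear in log10 viral load, as a standard logistic model implies. The figure is an increase in odds, not in risk. With MVI present in roughly 43% of this cohort, an OR overstates the corresponding relative risk.

```python
from math import exp, log, log10

OR_PER_LOG10 = 1.89       # reported odds ratio per 1-log10 increase
beta = log(OR_PER_LOG10)  # implied logistic-regression coefficient


def odds_ratio_for_fold_change(fold):
    """Odds ratio implied for a given fold-change in raw viral load."""
    return exp(beta * log10(fold))


print(round(odds_ratio_for_fold_change(10), 2))   # 1.89
print(round(odds_ratio_for_fold_change(100), 2))  # 3.57
print(f"{(odds_ratio_for_fold_change(10) - 1) * 100:.0f}% higher odds")  # 89% higher odds
```

A hundredfold increase therefore implies an OR of about 1.89² ≈ 3.57, not 2 × 1.89.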
Selection of predictor variables
Predictor variables were selected using LASSO regression. A LASSO-penalised logistic regression model (α = 1) was fitted to the 25 clinical, imaging, and molecular features initially included. The regularisation parameter λ was optimised by tenfold cross-validation to reduce the dimensionality of the feature space and retain the key predictors. Regularisation path analysis (Fig. 3) showed that at the λ value selected by the 1-standard-error criterion (λ = 0.0256), 12 features had non-zero coefficients: hepatitis viral load, RVI, pseudocapsule integrity, AFP, tumour size, GGT level, APPH, lymphocyte count, SII, number of tumours, intratumoural necrosis, and sex. This feature subset was then used to develop the ML predictive models.
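The 1-standard-error criterion chooses the largest (most regularised) λ whose mean cross-validated error lies within one standard error of the minimum error. This is the same rule that glmnet's `cv.glmnet` reports as `lambda.1se`. The sketch below applies the rule to a hypothetical λ grid and hypothetical cross-validation errors, not to the study's data.

```python
def lambda_one_se(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from cross-validation results.

    lambda_min minimises the mean CV error; lambda_1se is the largest lambda
    whose mean CV error is within one standard error of that minimum.
    """
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    cutoff = cv_mean[i_min] + cv_se[i_min]
    eligible = [lam for lam, m in zip(lambdas, cv_mean) if m <= cutoff]
    return lambdas[i_min], max(eligible)


# Hypothetical grid, mean CV deviance, and its standard error.
grid = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3]
dev = [1.02, 1.00, 0.99, 1.01, 1.06, 1.15]
se = [0.03] * len(grid)
print(lambda_one_se(grid, dev, se))  # (0.01, 0.03)
```

Choosing λ by the 1-SE rule trades a statistically negligible loss in cross-validated error for a sparser, more stable feature set.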
Model development, performance comparison, and sensitivity analysis
Ten ML predictive models were constructed using tenfold cross-validation repeated 10 times, and receiver operating characteristic curves were plotted (Fig. 4). The AUC, accuracy, sensitivity, and specificity of each model were calculated for both the training and validation sets (Table 3). In the training set, the AUCs of the XGBoost, adaptive boosting, gradient boosting, extremely randomised trees, random forest, logistic regression, naive Bayes, multilayer perceptron, k-nearest neighbour, and support vector machine models were 0.852 (95% confidence interval [CI], 0.756–0.947), 0.825 (95% CI, 0.729–0.922), 0.818 (95% CI, 0.717–0.919), 0.810 (95% CI, 0.711–0.909), 0.804 (95% CI, 0.695–0.913), 0.784 (95% CI, 0.674–0.894), 0.772 (95% CI, 0.665–0.879), 0.771 (95% CI, 0.652–0.891), 0.760 (95% CI, 0.651–0.869), and 0.727 (95% CI, 0.600–0.855), respectively. XGBoost achieved the highest AUC, with an accuracy of 0.792, sensitivity of 0.812, specificity of 0.775, and F1 score of 0.776. In the validation set, its AUC was 0.815 (95% CI, 0.719–0.912), and its accuracy, sensitivity, specificity, and F1 score were 0.750, 0.677, 0.805, and 0.700, respectively. At the 0.5 threshold, the moderate validation sensitivity (0.677) combined with the higher specificity (0.805) indicates a conservative classifier that prioritises reducing false positives, which is suitable for a risk-stratification tool. The XGBoost model was therefore selected as the optimal ML predictive model. The calibration curve (Fig. 5) showed general agreement between predicted and observed values, and the model passed the Hosmer–Lemeshow goodness-of-fit test (χ2 = 3.538; p = 0.895). The calibration slope, calibration intercept, and Brier score were 0.92, 0.08, and 0.16, respectively. The waterfall plot of sample prediction scores (Fig. 6) showed that the MVI probabilities predicted by the XGBoost model closely matched the actual MVI status.
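The Hosmer–Lemeshow test used above can be sketched as follows. Patients are sorted by predicted risk, split into groups of roughly equal size, and observed events are compared with expected events in each group. The statistic is referred to a χ2 distribution with (groups − 2) degrees of freedom. Using 10 groups is an assumption, because the text does not state the grouping. The p-value helper uses the exact closed form that exists only for even degrees of freedom, which avoids any non-standard dependency.

```python
from math import exp, factorial


def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow statistic over risk-ordered groups of near-equal size."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        expected = sum(p for p, _ in chunk)  # expected events in the group
        observed = sum(y for _, y in chunk)  # observed events in the group
        p_bar = expected / n_g               # assumes 0 < p_bar < 1
        stat += (observed - expected) ** 2 / (n_g * p_bar * (1 - p_bar))
    return stat, groups - 2


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; exact closed form for even df only."""
    if df % 2:
        raise ValueError("closed form requires even degrees of freedom")
    half = x / 2
    return exp(-half) * sum(half ** k / factorial(k) for k in range(df // 2))


# Perfectly calibrated toy data in two risk groups gives a statistic of zero.
y = [1, 0, 0, 0, 1, 1, 1, 0]
p = [0.25] * 4 + [0.75] * 4
print(hosmer_lemeshow(y, p, groups=2))
```

With 10 groups (8 degrees of freedom), the reported χ2 of 3.538 corresponds to an upper-tail probability close to 0.9. Such a large p value means the test found no evidence of miscalibration. It does not prove that the model is well calibrated.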
The predicted probabilities of the XGBoost model were strongly correlated with the actual MVI status (Spearman ρ = 0.71; p < 0.05), supporting the model's ability to stratify patients by risk. Decision curve analysis showed a robust net benefit across a wide range of threshold probabilities (Fig. 7). Within the threshold probability range of 0.10–0.90, the net benefit of the XGBoost model was consistently higher than that of both the all-intervention and no-intervention strategies, indicating high clinical utility.
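Decision curve analysis compares the model's net benefit with two default strategies: treating every patient and treating no one. The sketch below uses the standard net-benefit formula, NB = TP/n − FP/n × pt/(1 − pt). Here pt is the threshold probability at which a clinician would act, for example by widening the resection margin. The outcomes and predicted risks are hypothetical.

```python
def net_benefit(y_true, y_prob, pt):
    """Net benefit of intervening in patients whose predicted risk is >= pt."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 0)
    # False positives are weighted by the odds at the threshold probability.
    return tp / n - fp / n * pt / (1 - pt)


def net_benefit_treat_all(y_true, pt):
    """Net benefit of intervening in every patient (treat-none is always 0)."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


def decision_curve(y_true, y_prob, thresholds):
    """Rows of (threshold, model, treat-all, treat-none) net benefit."""
    return [(pt, net_benefit(y_true, y_prob, pt),
             net_benefit_treat_all(y_true, pt), 0.0) for pt in thresholds]


# Hypothetical outcomes and predicted MVI risks for eight patients.
y = [1, 1, 1, 0, 0, 0, 0, 1]
p = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1, 0.3, 0.8]
for row in decision_curve(y, p, [i / 100 for i in range(10, 91, 20)]):
    print(row)
```

A model is clinically useful at a given threshold only if its curve lies above both the treat-all line and the zero (treat-none) line.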
Model explanation
The SHAP method was used to visualise the effects of the predictor variables on the outcome. The direction and magnitude of each variable's effect can be read from the position of each point along the horizontal axis (its SHAP value, where higher values indicate a higher predicted probability of MVI), and point colour indicates the value of the variable. The SHAP feature importance plot shows the contributions of the 12 predictor variables to the output of the XGBoost model (Fig. 8a). Hepatitis viral load, RVI, pseudocapsule integrity, AFP, and tumour size were the five most influential variables. Local explanation plots (Fig. 8b, c) showed that hepatitis viral load, AFP, RVI, tumour size, and GGT level contributed positively to MVI risk, whereas pseudocapsule integrity, lymphocyte count, and SII were protective factors against MVI.
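A local explanation of the kind shown in Fig. 8b, c works because SHAP values are additive. The base value plus one patient's SHAP values reproduces that patient's model output. The sketch below assumes the values are on the log-odds (margin) scale, which is typical for XGBoost binary classifiers explained with TreeSHAP. It converts the sum to a probability and orders the contributions as a waterfall plot would. The base value, feature names, and SHAP values are hypothetical.

```python
from math import exp


def logistic(z):
    """Convert log-odds to a probability."""
    return 1 / (1 + exp(-z))


def explain_patient(base_log_odds, shap_values):
    """Predicted probability plus contributions sorted by |SHAP| (waterfall order)."""
    total = base_log_odds + sum(shap_values.values())  # additivity of SHAP values
    ordered = sorted(shap_values.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return logistic(total), ordered


# Hypothetical patient: positive values push towards MVI, negative away from it.
patient = {"viral_load": 0.9, "RVI": 0.6, "capsule_intact": -0.4,
           "lymphocyte_count": -0.2, "AFP": 0.3}
prob, waterfall = explain_patient(-0.3, patient)
print(round(prob, 3), [name for name, _ in waterfall])
```

Reading the ordered list from the top shows which features drive this individual's risk, so two patients with the same predicted probability can receive quite different explanations.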
Discussion
MVI is an independent prognostic determinant of early recurrence and metastasis after surgery for HCC, and its accurate preoperative prediction has become a focus of precision diagnosis in liver cancer treatment. However, current clinical paradigms for preoperative MVI prediction face significant challenges [9, 10, 15]. To address the limitations of traditional models, we developed and validated an ML prediction model for MVI that integrates multidimensional clinical and imaging features within an explainable artificial intelligence framework. The XGBoost model performed consistently in both the training (AUC = 0.852; 95% CI, 0.756–0.947) and validation (AUC = 0.815; 95% CI, 0.719–0.912) sets. SHAP attribution analysis revealed the asymmetric predictive contributions of factors such as hepatitis viral load, RVI, and pseudocapsule integrity. By combining statistical performance with explainability through visual decision mapping (Fig. 8), our model addresses key limitations of traditional predictive tools and provides evidence-based support for personalised neoadjuvant treatment strategies. As a risk-stratification tool, it can help clinicians identify high-risk patients who may benefit from wider surgical margins or adjuvant therapies.
ML is an interdisciplinary field combining statistics and computer science, in which models are trained on data by adjusting their parameters to minimise the error between predicted outputs and actual values. ML has been widely applied to predict the occurrence, progression, and clinical outcomes of various diseases and holds great promise for constructing predictive models [16–18]. Ensemble learning, which combines multiple models, mitigates the overfitting or underfitting that can occur with a single model and thereby enhances generalisability and predictive performance [19]. Current ML-based MVI prediction models for patients with HCC rely predominantly on radiomics. Although these models have achieved high AUC values, the lack of standardisation in image acquisition protocols, segmentation methods, and radiomics tools poses significant challenges and may introduce subjective variability in the measurement of imaging features. In addition, the black-box nature of these models severely hinders clinical translation. Collectively, these issues greatly limit their practical clinical utility [20–22]. ML algorithms may outperform conventional approaches when more clinically relevant factors or more complex data structures are included. The present study identified hepatitis viral load, AFP, GGT level, tumour size, and RVI as robust independent predictors of MVI. Their robustness was confirmed by dual validation using both traditional statistical methods and explainable ML algorithms. Through the SHAP framework, the model also provides both global and local explanations. Compared with previous research, this integrated strategy improves the biological interpretability and clinical applicability of the model.
This study identified serum hepatitis viral load as the most important predictor in the MVI prediction model. This finding is consistent with the central biological role of hepatitis viral load in the occurrence and progression of HCC. Previous studies have shown that a high preoperative serum hepatitis viral load is an independent risk factor for MVI in patients with HCC [10] and that preoperative antiviral treatment can significantly reduce the incidence of MVI [23–25]. HBV infection drives the sequential pathological progression from liver fibrosis to cirrhosis to HCC, and persistent HBV replication exacerbates the invasive phenotype of tumours through multiple mechanisms. Genetic susceptibility in the Chinese population, for example through the HLA-DP rs3077 polymorphism, can weaken antiviral immune responses, reducing interferon-γ secretion by more than 40%. This, in turn, facilitates persistent viral infection and the formation of a chronic inflammatory microenvironment within the liver [26]. In this context, the HBV-encoded X protein, a multifunctional regulatory factor, may promote MVI through the combined action of multiple pathways. HBV infection of hepatocytes can upregulate genes such as MMP-9 and LOXL2, promoting vasculogenic mimicry by tumour cells. High viral load also activates the NF-κB pathway via the HBV-encoded X protein, inducing the release of pro-inflammatory factors such as IL-6 and TNF-α. This leads to the degradation of junction proteins between vascular endothelial cells, such as VE-cadherin, and increases the efficiency of transendothelial migration of tumour cells [27, 28].
MVI development follows a multifactorial pathogenesis model involving multidimensional interactions among viral factors, metabolism, and the tumour microenvironment. In addition to hepatitis viral load, RVI, AFP, tumour size, and GGT level were independent positive predictors of MVI in this study, and together these variables form a risk stratification system for MVI. RVI is a non-invasive predictor based on contrast-enhanced CT imaging features. Previous multi-centre studies have shown a significant correlation between RVI positivity and MVI, with RVI achieving a sensitivity of 76%, a specificity of 84%, and an accuracy of 89% [12]. This is consistent with the sleeve-like pattern of vascular invasion in the tumour observed on intraoperative frozen-section pathology [29]. RVI positivity can therefore serve as a reliable imaging marker for the preoperative assessment of MVI. Using traditional multivariate logistic regression, previous researchers combined common indicators, including tumour size and AFP, into a model that achieved an AUC of 0.74 for predicting MVI [30]. Microvascular density is also significantly higher in AFP-positive than in AFP-negative HCC [31]. A multi-centre clinical study of 1,073 patients showed a positive correlation between tumour diameter and the incidence of MVI [32]. Serum GGT is mainly of hepatic origin and rises markedly when tumours compress or invade the bile ducts and obstruct bile flow. Persistently high GGT levels might therefore indicate MVI; however, research on the mechanisms linking serum GGT to MVI remains limited.
The SHAP attribution analysis of our ML model indicated that pseudocapsule integrity, lymphocyte count, and SII were significant protective factors against MVI, consistent with previous studies. A tumour pseudocapsule consists of inner and outer layers [33]. The dense inner layer acts as a physical barrier that confines tumour cells within the tumour boundary, and the narrow blood vessels within the pseudocapsule prevent tumour cells from passing through [34], thereby inhibiting HCC metastasis [10]. Previous studies have also found that pseudocapsule integrity in HCC is related to the expression of angiogenesis-related factors such as hypoxia-inducible factor-1α and vascular endothelial growth factor [35], which promote tumour angiogenesis [36, 37]. When these factors are expressed at relatively low levels, fewer microvessels develop and the tumour grows expansively. This growth compresses the surrounding tissue and increases the fibrous tissue around the tumour, forming a fibrous pseudocapsule. Elevated lymphocyte count and SII indicate a stronger immune response, possibly through increased local infiltration of immune cells, more effective antigen presentation, and lymphocyte activation. This strengthens the immune response against the tumour and may thereby inhibit the occurrence of MVI [38].
This study presents a novel, explainable ML framework for the preoperative prediction of MVI in HCC. Its primary innovation lies not merely in competitive predictive performance (validation AUC = 0.815) but in the integration of routinely available multimodal features (clinical, laboratory, and CT imaging) with the SHAP explainability framework. This integration directly addresses the "black-box" limitation of many complex ML models by offering both global and local interpretability. By quantifying and visualising feature contributions (e.g., showing that a patient's high risk is driven mainly by high viral load and RVI), our model moves beyond prediction to provide actionable insights. These insights could inform personalised surgical planning (e.g., the width of resection margins) and neoadjuvant strategies (e.g., prioritising antiviral therapy in patients with high viral load).
Compared to previous studies, our approach offers distinct advantages in three key aspects: (1) Feature Dimension: We incorporated a broader set of readily available clinical and semantic imaging features, avoiding dependency on complex radiomics pipelines that may hinder clinical adoption. (2) Model Explainability: The application of SHAP provides a transparent decision-making process, a critical advancement over previous non-interpretable models or those relying on nomograms with implicit linear assumptions. (3) Practical Utility: All predictors are objective and routinely collected in standard preoperative workups, enhancing the model's translational potential.
However, this study had some limitations, including its single-centre retrospective design with a limited sample size. While internal validation showed robust performance, external validation in multi-centre, prospective cohorts is essential to confirm the model's applicability across diverse populations and imaging protocols. Future studies should include larger, multi-centre prospective cohorts to advance translation of non-invasive MVI prediction into clinical practice, while also exploring cross-omics mechanisms and establishing standardised technical frameworks.
In conclusion, the ML model developed in this study integrates multimodal feature fusion with the SHAP explainability framework. This integration enables accurate preoperative prediction of MVI status in patients with HCC. The model’s technical approach and clinical translation strategy provide a novel paradigm for personalised surgical decision-making.
MVI is an independent prognostic determinant for early recurrence and metastasis for HCC following surgery, with its accurate preoperative prediction becoming a focal topic in the field of precision diagnosis in liver cancer treatment. However, current clinical prediction paradigms for preoperative MVI face significant challenges [9, 10, 15]. This is the first study to develop and validate an ML prediction model for MVI to address the challenges posed by traditional models. Our model integrated multidimensional clinical and imaging features within an explainable artificial intelligence framework. The XGBoost algorithm demonstrated cross-set robustness in both the training (AUC = 0.852; 95% CI, 0.756–0.947) and validation (AUC = 0.815; 95% CI, 0.719–0.912) sets. SHAP attribution analysis revealed the asymmetric predictive contributions of factors, such as hepatitis viral load, RVI, and pseudocapsule integrity for the first time. Our model not only surpassed the limitations of traditional predictive tools but also provided evidence-based support for personalised neoadjuvant treatment strategies, ensuring both statistical efficacy and explainability through visual decision mapping (Fig. 8). As a risk-stratification tool, it assists clinicians in identifying high-risk patients who may benefit from more aggressive surgical margins or adjuvant therapies.
ML is an interdisciplinary field combining statistics and computer science. In this field, various mathematical functions are used to train data and adjust parameters to minimize the error between predicted output and actual values. ML has been widely applied in predicting the occurrence, progression, and clinical outcomes of various diseases, and holds great promise for constructing predictive models [16–18]. Employing ensemble learning with multiple ML models mitigates the issues of overfitting or underfitting, which are encountered when a single model is employed. This approach effectively enhances a model’s generalisability and predictive performance [19]. Current research on using ML to construct MVI prediction models for patients with HCC predominantly relies on radiomics. Although these models achieved high AUC values, the lack of standardisation in image acquisition protocols, segmentation methods, and radiomics tools poses significant challenges. This may result in subjective differences while obtaining measurements for imaging features. Additionally, the inherent black-box nature of these models severely hinders clinical translation. Collectively, these issues greatly limit the practical clinical utility of such models [20–22]. When more clinically significant factors or complex dataset structures are included, ML algorithms are superior in predicting outcomes. The present study systematically revealed the robustness of hepatitis viral load, AFP, GGT levels, tumour size, and RVI as independent predictors of MVI. Their robustness was confirmed through dual validation—using both traditional statistical methods and explainable ML algorithms. This model achieved innovations in both global and local explanations via the SHAP framework. Compared to previous research, this integrated strategy markedly improved the biological interpretability and clinical applicability of the model.
This study identified serum hepatitis viral load as the most important predictive feature in the MVI prediction model, a finding that aligns closely with its core biological role in the occurrence and progression of HCC. Previous studies have shown that a high preoperative serum hepatitis viral load is an independent risk factor for MVI in patients with HCC [10], and that preoperative antiviral treatment can significantly reduce the incidence of MVI [23–25]. HBV infection drives the sequential progression from liver fibrosis to cirrhosis and then to HCC, and persistent HBV replication exacerbates the invasive phenotype of tumours through multiple mechanisms. Genetic susceptibility specific to the Chinese population, for example the HLA-DP rs3077 polymorphism, can reduce the efficacy of antiviral immune responses, with a more than 40% reduction in interferon-γ secretion. This, in turn, facilitates persistent viral infection and the formation of a chronic inflammatory microenvironment within the liver [26]. In this context, the HBV-encoded X protein, a multifunctional regulatory factor, may promote MVI through synergy among multiple pathways. HBV infection of hepatocytes can upregulate genes such as MMP-9 and LOXL2, which promotes vasculogenic mimicry by tumour cells. High viral load also activates the NF-κB pathway via the HBV-encoded X protein, inducing the release of pro-inflammatory factors such as IL-6 and TNF-α. This leads to degradation of junction proteins between vascular endothelial cells, such as VE-cadherin, and increases the efficiency of tumour-cell transendothelial migration [27, 28].
MVI develops through a multifactorial process whose pathological mechanisms involve interactions among virological, metabolic, and tumour-microenvironment factors. In addition to hepatitis viral load, this study found that RVI, AFP, tumour size, and GGT level were independent positive predictors of MVI; together, these variables form a risk stratification system for MVI development. RVI is a non-invasive predictive indicator based on contrast-enhanced CT imaging features. Previous multi-centre studies have shown a significant correlation between RVI positivity and MVI, with RVI reported to achieve a sensitivity of 76%, a specificity of 84%, and an accuracy of 89% [12]. This is consistent with the sleeve-like pattern of vascular invasion within the tumour observed on intraoperative frozen-section pathology [29]. Therefore, RVI positivity can serve as a reliable imaging marker for the preoperative assessment of MVI. A previous model built with conventional multivariate logistic regression from common indicators, including tumour size and AFP, achieved an AUC of 0.74 for predicting MVI [30]. Microvascular density is also significantly higher in AFP-positive than in AFP-negative HCC [31], and a multi-centre clinical study of 1,073 patients showed a positive correlation between tumour diameter and the incidence of MVI [32]. Serum GGT is primarily of hepatic origin and rises significantly when tumours compress or invade the bile ducts and obstruct bile flow. Persistently high GGT levels might therefore indicate MVI; however, research on the mechanisms linking serum GGT to MVI remains limited.
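Sensitivity, specificity, and accuracy, as quoted for RVI above, are all derived from one 2×2 table of predicted versus pathologically confirmed MVI status. The sketch below, using hypothetical counts and a hypothetical function name, shows how they are computed. Unlike sensitivity and specificity, accuracy depends on MVI prevalence: it equals the prevalence-weighted average of sensitivity and specificity.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and prevalence from a 2x2 table.

    tp / fn: MVI-positive cases predicted positive / negative.
    tn / fp: MVI-negative cases predicted negative / positive.
    """
    total = tp + fn + tn + fp
    sensitivity = tp / (tp + fn)   # detection rate among MVI-positive cases
    specificity = tn / (tn + fp)   # correct-rejection rate among MVI-negative cases
    accuracy = (tp + tn) / total   # overall proportion classified correctly
    prevalence = (tp + fn) / total
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "prevalence": prevalence,
    }


if __name__ == "__main__":
    m = diagnostic_metrics(tp=40, fn=10, tn=54, fp=6)  # hypothetical counts
    # Accuracy recomputed as the prevalence-weighted average of sens and spec.
    weighted = (m["prevalence"] * m["sensitivity"]
                + (1 - m["prevalence"]) * m["specificity"])
    print(m, weighted)
```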
The SHAP attribution analysis of our ML model indicated that pseudocapsule integrity, lymphocyte count, and SII were significant protective factors against MVI, consistent with previous studies. A tumour pseudocapsule consists of inner and outer layers [33]; the dense inner layer acts as a physical barrier that confines tumour cells within the tumour boundary. The narrow blood vessels within the pseudocapsule prevent tumour cells from passing through [34], thereby inhibiting HCC metastasis [10]. Previous studies have also linked the integrity of the HCC pseudocapsule to the expression of angiogenesis-related factors such as hypoxia-inducible factor-1α and vascular endothelial growth factor [35], which promote tumour angiogenesis [36, 37]. When these factors are expressed at relatively low levels, reduced microvessel development leads to expansive tumour growth, which compresses the surrounding tissue and increases peritumoural fibrous tissue, forming a fibrous pseudocapsule. Elevated lymphocyte count and SII may indicate a stronger immune response. The underlying mechanism may involve increased local immune-cell infiltration, more effective antigen presentation, and lymphocyte activation, which together strengthen the immune response against the tumour and thereby inhibit MVI [38].
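SII (systemic immune-inflammation index) is a composite blood-count marker conventionally calculated as platelet count × neutrophil count / lymphocyte count. This section does not restate how the study computed it, so the sketch below assumes that standard definition with all counts in 10^9 cells/L; the function name and example values are hypothetical.

```python
def systemic_immune_inflammation_index(platelets, neutrophils, lymphocytes):
    """SII = platelets x neutrophils / lymphocytes (assumed standard definition).

    All counts are assumed to be in 10^9 cells/L.
    """
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return platelets * neutrophils / lymphocytes


if __name__ == "__main__":
    # Hypothetical preoperative blood counts (10^9 cells/L).
    print(systemic_immune_inflammation_index(platelets=200, neutrophils=3.0, lymphocytes=1.5))
```

Because the lymphocyte count sits in the denominator, SII and lymphocyte count are arithmetically linked: with platelet and neutrophil counts held fixed, a higher lymphocyte count lowers SII.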
This study presents a novel, explainable machine learning framework for the preoperative prediction of MVI in HCC. Its primary innovation lies not merely in competitive predictive performance (validation AUC = 0.815) but in the integration of routinely available multimodal features (clinical, laboratory, and CT imaging) with the SHAP explainability framework. This integration directly addresses the "black-box" limitation of many complex ML models by offering both global and local interpretability. By quantifying and visualising feature contributions (e.g., attributing a patient's high risk primarily to high viral load and RVI), our model moves beyond prediction to provide actionable insights, potentially informing personalised surgical planning (e.g., the width of resection margins) and neoadjuvant strategy (e.g., prioritising antiviral therapy in patients with high viral load).
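The local explanation described above, attributing one patient's predicted risk to individual features, rests on Shapley values. As a hedged sketch, the code below computes exact Shapley values by enumerating all feature coalitions, replacing features outside a coalition with a single reference (baseline) profile, and derives a global importance as the mean absolute Shapley value across patients. The SHAP library uses faster algorithms (e.g., TreeSHAP for tree ensembles such as XGBoost) and background datasets rather than one baseline, and exact enumeration is only practical for a handful of features. The feature names, coefficients, and patient values are hypothetical and are not the study's model.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline)."""
    n = len(x)

    def value(coalition):
        # Features in the coalition take the patient's values; the rest
        # are replaced by the reference (baseline) profile.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def mean_abs_shap(predict, X, baseline):
    """Global importance: mean |Shapley value| of each feature across patients."""
    rows = [shapley_values(predict, x, baseline) for x in X]
    return [sum(abs(r[j]) for r in rows) / len(rows) for j in range(len(baseline))]


FEATURES = ["log10_viral_load", "RVI_positive", "pseudocapsule_intact"]  # hypothetical


def toy_log_odds(z):
    # Hypothetical coefficients for illustration only; not the study's model.
    return -2.0 + 0.6 * z[0] + 1.1 * z[1] - 0.8 * z[2] + 0.2 * z[0] * z[1]


if __name__ == "__main__":
    baseline = [3.0, 0, 1]  # hypothetical reference patient
    patient = [5.0, 1, 0]   # hypothetical high-risk patient
    phi = shapley_values(toy_log_odds, patient, baseline)
    for name, v in zip(FEATURES, phi):
        print(f"{name}: {v:+.2f}")
    # Efficiency property: contributions sum to f(patient) - f(baseline).
    print(sum(phi), toy_log_odds(patient) - toy_log_odds(baseline))
```

Positive values push this hypothetical patient's log-odds of MVI above the reference, negative values pull it below; the interaction term is shared between the two interacting features rather than assigned to either alone.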
Compared to previous studies, our approach offers distinct advantages in three key aspects: (1) Feature Dimension: We incorporated a broader set of readily available clinical and semantic imaging features, avoiding dependency on complex radiomics pipelines that may hinder clinical adoption. (2) Model Explainability: The application of SHAP provides a transparent decision-making process, a critical advancement over previous non-interpretable models or those relying on nomograms with implicit linear assumptions. (3) Practical Utility: All predictors are objective and routinely collected in standard preoperative workups, enhancing the model's translational potential.
However, this study had some limitations, including its single-centre retrospective design and limited sample size. Although internal validation showed robust performance, external validation in multi-centre prospective cohorts is essential to confirm the model's applicability across diverse populations and imaging protocols. Such studies would advance the translation of non-invasive MVI prediction into clinical practice; future work should also explore cross-omics mechanisms and establish standardised technical frameworks.
In conclusion, the ML model developed in this study integrates multimodal feature fusion with the SHAP explainability framework. This integration enables accurate preoperative prediction of MVI status in patients with HCC. The model’s technical approach and clinical translation strategy provide a novel paradigm for personalised surgical decision-making.
Supplementary Information
Source: PubMed Central (JATS). Licensing follows the original publisher's policy; please cite the original article when quoting.
Related articles with the same keywords (free full text), based on this article's MeSH terms/keywords:
- Raman Spectroscopic Signatures of Hepatic Carcinoma: Progress and Future Prospect.
- Nanotechnology-Assisted Molecular Profiling: Emerging Advances in Circulating Tumor DNA Detection.
- Building Hybrid Pharmacometric-Machine Learning Models in Oncology Drug Development: Current State and Recommendations.
- Heat Shock Protein 47 as a Novel Predictive and Diagnostic Biomarker for Thrombosis in Hepatocellular Carcinoma.
- Crosstalk Between -Regulatory Elements and Metabolism Reprogramming in Hepatocellular Carcinoma.
- TAZ WW Domain-Mediated Regulation of Gluconeogenesis and Tumorigenesis in Hepatocellular Carcinoma through Interaction with the Glucocorticoid Receptor.