Development and Validation of an Interpretable Machine Learning Model for Predicting Tumor Recurrence After Microwave Ablation in Small Hepatocellular Carcinoma.

Ma Y; Zhao L; Zhao Q

doi:10.2147/JHC.S602760


Journal of Hepatocellular Carcinoma, 2026, Vol. 13, p. 602760 (open access)


Free full text: PMC13082254 (open access)


Cite this article

APA Ma Y, Zhao L, Zhao Q (2026). Development and Validation of an Interpretable Machine Learning Model for Predicting Tumor Recurrence After Microwave Ablation in Small Hepatocellular Carcinoma. Journal of hepatocellular carcinoma, 13, 602760. https://doi.org/10.2147/JHC.S602760

MLA Ma Y, et al. "Development and Validation of an Interpretable Machine Learning Model for Predicting Tumor Recurrence After Microwave Ablation in Small Hepatocellular Carcinoma." Journal of hepatocellular carcinoma, vol. 13, 2026, p. 602760.

PMID 41993242

Abstract

[OBJECTIVE] Tumor recurrence remains a major clinical challenge following microwave ablation (MWA) for small hepatocellular carcinoma (sHCC). Accurate and interpretable prediction of post-ablation recurrence risk is essential for individualized surveillance and management strategies. This study aimed to develop and validate an interpretable machine learning (IML) model for predicting tumor recurrence after MWA in patients with sHCC.

[METHODS] This retrospective study included 536 consecutive patients with sHCC treated with ultrasound-guided MWA between July 2017 and July 2023. A two-stage feature selection strategy combining Boruta and LASSO regression was applied to identify the most robust predictors. Six ML algorithms were developed and compared, including support vector machine (SVM), random forest (RF), k-nearest neighbors (KNN), decision tree (DT), extreme gradient boosting (XGBoost), and logistic regression (LR). Model discrimination, calibration, and clinical utility were evaluated using receiver operating characteristic (ROC) analysis, calibration curves, decision curve analysis (DCA), and confusion matrices. Model interpretability was assessed using SHapley Additive exPlanations (SHAP).

[RESULTS] Tumor recurrence occurred in 29.1% of patients within 1 year after MWA. Seven variables, including maximum tumor diameter, Child-Pugh grade, cirrhosis, portal hypertension, platelet count (PLT), alpha-fetoprotein (AFP), and C-reactive protein (CRP), were selected as predictors. Among all models, XGBoost demonstrated the best performance, achieving an AUC of 0.889 in the validation set, with good calibration and favorable net clinical benefit on DCA. SHAP analysis provided transparent global and individualized explanations, identifying AFP and CRP as the most influential predictors and revealing nonlinear risk patterns.

[CONCLUSION] An interpretable XGBoost-based model using routinely available clinical variables accurately predicts recurrence after MWA in patients with sHCC. This model may serve as a clinically accessible approach for individualized risk stratification and post-ablation management; however, further external validation in multicenter cohorts is required before broader clinical application.


Introduction

Hepatocellular carcinoma (HCC) remains one of the leading causes of cancer-related mortality worldwide, particularly in regions with a high prevalence of chronic liver disease and viral hepatitis.1 With the increasing implementation of surveillance programs, a growing proportion of patients are diagnosed at an early stage, commonly referred to as small hepatocellular carcinoma (sHCC), typically defined as tumors ≤3 cm in diameter. For these patients, curative-intent local therapies, including surgical resection, liver transplantation, and image-guided thermal ablation, are widely recommended by international guidelines.2–4 Among these, microwave ablation (MWA) has been widely adopted as an effective minimally invasive treatment, offering favorable local control with reduced procedural morbidity, especially in patients who are not ideal surgical candidates.5,6 Despite technical refinements and standardized ablation protocols, post-treatment tumor recurrence remains a frequent and clinically consequential event following MWA. Recurrence may occur as local tumor progression due to incomplete ablation or as intrahepatic distant recurrence reflecting underlying tumor aggressiveness and field cancerization in cirrhotic liver tissue.7,8 Early identification of patients at high risk of recurrence is essential. It can help optimize follow-up strategies, guide adjuvant treatment decisions, and improve long-term outcomes.
Most existing studies for predicting tumor recurrence after MWA are primarily based on traditional statistical models or their visualized extensions in the form of nomograms.9–12 These models are generally favored for their structural simplicity and ease of implementation in clinical practice. However, they are inherently constrained by assumptions of linearity, additivity, and independence among predictors, which may inadequately represent the complex and heterogeneous mechanisms underlying post-MWA recurrence.13 In real-world clinical settings, recurrence risk is often driven by nonlinear effects and high-order interactions among tumor characteristics, host-related factors, inflammatory status, and ablation-related technical parameters—relationships that are difficult to capture within conventional regression-based frameworks. Moreover, continuous variables are frequently categorized or simplified to facilitate model construction and visualization, potentially leading to information loss and reduced discrimination at the individual patient level.14 Collectively, these limitations restrict the ability of traditional nomogram-based models to fully characterize recurrence heterogeneity after MWA and underscore the need for more flexible modeling approaches capable of learning complex patterns while supporting individualized risk assessment. In addition, recent studies have explored molecular biomarkers and bioinformatics-based approaches to better understand hepatocellular carcinoma progression and prognosis, including hub gene identification,15 microRNA-regulated pathways,16 and protein expression–based prognostic markers.17 These findings highlight the biological heterogeneity of HCC and suggest that integrating multidimensional information may improve risk stratification. However, many of these approaches rely on molecular data that are not routinely available in clinical practice, which may limit their applicability in real-world settings.
Recent advances in machine learning (ML) have offered new opportunities to improve prognostic modeling in oncology by leveraging high-dimensional clinical, laboratory, and imaging-derived data.18 ML algorithms are particularly well suited to model nonlinear relationships and complex feature interactions, and several studies have demonstrated their superior performance over conventional regression-based approaches in predicting outcomes of tumor treatments.19 Nevertheless, most existing ML models function as “black boxes”, providing limited insight into how individual predictors contribute to model outputs. This lack of interpretability represents a significant barrier to clinical adoption, especially in high-stakes decision-making scenarios such as cancer recurrence prediction.20 Interpretable machine learning (IML) aims to bridge this gap by combining predictive accuracy with transparent and explainable decision mechanisms. IML frameworks allow the contribution of individual variables to be quantified and visualized, enabling clinicians to assess whether model behavior aligns with established clinical knowledge and to explore potential novel risk factors.21 Such interpretability is particularly relevant in post-ablation recurrence prediction, where therapeutic decisions may be influenced by nuanced risk stratification rather than binary outcomes.
To date, few studies have focused on developing IML models specifically for predicting tumor recurrence after MWA in patients with sHCC. Thus, the present study aimed to establish and validate an IML model for recurrence prediction following MWA in sHCC. By integrating routinely collected clinical and procedural variables with IML techniques, we sought to construct a robust and clinically meaningful tool capable of supporting individualized post-ablation management and surveillance strategies.

Materials and Methods


Study Population and Patient Selection
This retrospective study used the hospital electronic medical record system to collect clinical data from patients with sHCC who underwent MWA at our institution between July 2017 and July 2023. Inclusion criteria: (1) met the diagnostic criteria for sHCC;22 (2) met the indications for MWA and underwent MWA with complete ablation of the lesion; (3) Child-Pugh liver function class A or B; (4) China Liver Cancer (CNLC) staging system stage Ia;23 (5) no prior treatment had been received at the time of the initial consultation; (6) complete clinical and follow-up data. Exclusion criteria: (1) concurrent other malignant tumors (eg, cervical cancer, colorectal cancer); (2) refractory massive ascites or cachexia; (3) severe decompensated liver function; (4) concurrent active infection; (5) presence of severe coagulation disorders or a tendency to bleed; (6) postoperative complications such as biliary tract injury or intra-abdominal infection. After applying these criteria, a total of 536 eligible patients were included in the final analysis. To develop and internally validate the prediction model, the study cohort was randomly split at a ratio of 7:3 into a training set (n=375) and a validation set (n=161). The training set was used for model development and tuning, whereas the validation set was reserved for independent assessment of model performance. The detailed patient selection process, including reasons for exclusion at each step, is summarized in the patient screening flowchart (Figure 1).
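The 7:3 random split described above can be sketched as follows. This is a minimal illustration in Python rather than the authors' R code: the seed value 4321 is the one reported under Statistical Analysis, but the function name is hypothetical, the text does not say whether the split was stratified by outcome, and Python's generator will not reproduce the partition obtained in R.

```python
import random

N_PATIENTS = 536      # cohort size reported in the text
TRAIN_FRACTION = 0.7  # 7:3 split reported in the text
SEED = 4321           # seed reported in the Statistical Analysis section


def split_cohort(n, train_fraction, seed):
    """Shuffle patient indices and cut them into training and validation sets."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(n * train_fraction)  # 536 * 0.7 = 375.2, rounded to 375
    return sorted(indices[:n_train]), sorted(indices[n_train:])


train_idx, valid_idx = split_cohort(N_PATIENTS, TRAIN_FRACTION, SEED)
print(len(train_idx), len(valid_idx))
```

The resulting set sizes match the reported n=375 and n=161; which patients land in each set depends on the software used.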

Diagnosis of sHCC, Recurrence Definition, and Follow-Up
HCC was diagnosed in patients with an underlying liver disease history (eg, chronic hepatitis B/C or cirrhosis) based on serum biomarkers together with imaging and/or histopathology. Specifically, HCC was considered when serum alpha-fetoprotein (AFP) was ≥400 ng/mL and a hepatic solid lesion was detected on ultrasound and/or cross-sectional imaging; lesions demonstrating typical HCC imaging hallmarks on contrast-enhanced CT or MRI were diagnosed noninvasively, whereas cases without typical imaging features were confirmed by percutaneous biopsy and histopathology. For eligibility in this study, sHCC was defined as either a solitary tumor with a maximum diameter ≤3 cm or 2–3 tumors with a cumulative tumor diameter ≤3 cm.22
All patients received standardized MWA after hospital admission; for those with residual tumor after the initial procedure, repeat ablation was performed. After complete ablation was confirmed, patients were followed for 1 year, with follow-up visits scheduled every 2–3 months. Follow-up assessments included serum tumor biomarkers, conventional ultrasound, and contrast-enhanced CT or multiparametric MRI (as well as contrast-enhanced ultrasound when indicated). Recurrence was defined as the detection, during scheduled surveillance, of a newly developed abnormally enhancing lesion outside the ablation zone or in other areas of the liver, with dynamic contrast-enhanced CT, MRI, or contrast-enhanced ultrasound demonstrating marked arterial-phase hyperenhancement. Patients who developed recurrence within 1 year after ablation were assigned to the recurrence group, whereas all remaining patients were assigned to the non-recurrence group.

Microwave Ablation Procedure
The KY-2000 microwave therapy device was employed, equipped with a 14G water-cooled MWA needle (Nanjing Kangyou Medical Technology Co., Ltd., frequency: 2450 MHz). Patients were positioned in the supine or lateral decubitus position according to tumor location. After skin disinfection at the puncture site and administration of local anesthesia, the microwave antenna was percutaneously advanced into the predefined target within the tumor under real-time ultrasound guidance. Continuous ultrasonographic monitoring was performed during needle insertion to avoid injury to adjacent major bile ducts, blood vessels, or other vital organs. Once accurate antenna placement was confirmed, microwave energy was delivered at a power output of 60–80 W for 15–20 min. Single-antenna ablation was applied for tumors <2 cm in diameter, whereas multipoint overlapping ablation was performed for tumors measuring 2–3 cm. Throughout the procedure, real-time ultrasound monitoring ensured that the ablation zone extended 0.5–1.0 cm beyond the tumor margin. At the end of the procedure, track ablation was routinely performed to prevent needle-track bleeding and tumor seeding. Postoperatively, patients received standard supportive treatment, including hepatoprotective therapy, anti-infective treatment, and maintenance of fluid and electrolyte balance. Contrast-enhanced CT was performed 3–4 weeks after ablation, and repeat MWA was carried out if residual tumor was detected at the original lesion site.

Clinical Data Collection
A standardized case report form for baseline data collection was designed. Clinical data were extracted from the electronic medical records by trained investigators and recorded using a double-check procedure. Collected variables included gender (male/female), age, body mass index (BMI), maximum tumor diameter, number of tumors (single or 2–3 lesions), Child-Pugh liver function class (A or B), hypertension, diabetes mellitus, hyperlipidemia, family history of cancer, smoking history, drinking history, history of hepatitis B virus infection, liver cirrhosis, portal hypertension, and ablation frequency (once or twice). Laboratory parameters at admission included white blood cell count (WBC), platelet count (PLT), aspartate aminotransferase (AST), alanine aminotransferase (ALT), total bilirubin (TBIL), AFP, gamma-glutamyl transpeptidase (GGT), fibrinogen (FIB), Des-γ-carboxy prothrombin (DCP), C-reactive protein (CRP), and serum albumin level.

Feature Selection, Model Development, and Interpretable Machine Learning Analysis
To reduce dimensionality and minimize overfitting, a two-step feature selection strategy was adopted. First, the Boruta algorithm, a wrapper method based on random forests, was applied to identify all relevant features by iteratively comparing original variables with their randomized shadow counterparts. Second, least absolute shrinkage and selection operator (LASSO) regression was performed, which applies L1 regularization to shrink regression coefficients and select variables with non-zero coefficients under optimal penalization. Only variables consistently identified by both Boruta and LASSO were retained, and the intersection of the two feature sets was used as the final input feature set for model development. Based on the selected features, six ML models were developed, including support vector machine (SVM), random forest (RF), k-nearest neighbors (KNN), decision tree (DT), extreme gradient boosting (XGBoost), and logistic regression (LR). All models were trained on the training set, with hyperparameters optimized through cross-validation to achieve optimal predictive performance.
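For a binary outcome such as 1-year recurrence, the LASSO step is typically an L1-penalized logistic regression. The objective below is a sketch in the form documented for the glmnet package with the elastic-net mixing parameter set to 1; the symbols are introduced here for illustration and do not come from the paper: $y_i \in \{0,1\}$ is the recurrence indicator for patient $i$, $x_i$ the vector of $p$ candidate predictors, $\beta_0$ the intercept, $\beta$ the coefficient vector, and $\lambda$ the penalty chosen by cross-validation.

```latex
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta}
\left\{
  -\frac{1}{N}\sum_{i=1}^{N}
  \Big[\, y_i\,(\beta_0 + x_i^{\top}\beta)
        - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big) \Big]
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
\right\}
```

Variables whose estimated coefficient $\hat{\beta}_j$ is exactly zero at the selected $\lambda$ are dropped; the remainder form the LASSO feature set.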
The predictive performance of the six models was comprehensively evaluated and compared. Discrimination ability was primarily assessed using the area under the receiver operating characteristic (ROC) curve (AUC), which served as the main criterion for model comparison. In addition, other performance metrics, including F1 score, accuracy, precision, recall, sensitivity, and specificity, were calculated to provide a more complete assessment of model performance. The model demonstrating the highest AUC with balanced overall performance was selected as the final prediction model. The calibration performance of the selected model was assessed using calibration plots, which examined the consistency between predicted probabilities and observed outcomes. To evaluate the potential clinical value of the model, decision curve analysis (DCA) was conducted to estimate the net benefit across a range of clinically relevant threshold probabilities. Furthermore, a confusion matrix was generated to present detailed classification outcomes and error distributions.
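As an illustration of the main comparison criterion, the AUC can be read as the probability that a randomly chosen patient with recurrence receives a higher predicted risk than a randomly chosen patient without recurrence. The sketch below computes it by direct pairwise comparison; the function name and the six predicted risks are hypothetical, and the study itself used the pROC package in R.

```python
def auc_from_scores(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted risks for six patients (1 = recurrence within 1 year).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.90, 0.70, 0.40, 0.60, 0.30, 0.20]
auc = auc_from_scores(labels, scores)
print(round(auc, 3))
```

In this toy example eight of the nine recurrence/non-recurrence pairs are ordered correctly, so the AUC is 8/9.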
To further enhance model transparency and facilitate clinical interpretability, Shapley Additive Explanations (SHAP) were applied to the final prediction model. SHAP is a game theory-based approach that attributes each model prediction to individual feature contributions by estimating the marginal contribution of each variable across all possible feature combinations. Global SHAP summary plots were generated to rank features according to their overall importance and to visualize the direction and magnitude of their effects on model output. In addition, SHAP dependence plots were used to explore potential nonlinear relationships and interaction effects between key predictors and the predicted outcome. At the individual level, SHAP force plots were employed to illustrate how specific features collectively influenced risk estimation for individual patients.
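The "marginal contribution across all possible feature combinations" can be made concrete with an exact Shapley computation on a toy model. This is a sketch under stated assumptions: the three-feature risk function, its coefficients, and the all-zero baseline are hypothetical and unrelated to the fitted XGBoost model, and practical SHAP tools (such as the shapviz and fastshap packages named in the text) rely on faster tree-specific or sampling-based estimates rather than full enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(features):
    """Hypothetical risk score with an AFP x CRP interaction (not the study model)."""
    afp, crp, plt = features
    return 0.5 * afp + 0.3 * crp - 0.2 * plt + 0.1 * afp * crp


x = [2.0, 1.0, 1.0]         # hypothetical standardized values for one patient
baseline = [0.0, 0.0, 0.0]  # hypothetical reference patient
phi = shapley_values(toy_model, x, baseline)
print(phi, sum(phi), toy_model(x) - toy_model(baseline))
```

The attributions sum to the difference between the patient's prediction and the baseline prediction, which is the additivity property that force plots visualize; the interaction term is shared equally between the two interacting features.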

Quality Control Measures
This study was conducted in strict accordance with principles of good clinical research practice. A multidisciplinary research team with extensive professional expertise and clinical experience was established, and the final study objectives and methodological details were determined after comprehensive literature review and rigorous protocol development. For all patients with small hepatocellular carcinoma who underwent MWA and were included in the analysis, baseline clinical characteristics and serum biochemical parameters were complete and readily available. With respect to data quality control, dedicated personnel were responsible for follow-up management, with precise definition and accurate documentation of follow-up time points and study endpoints to ensure the reliability of the original data. Finally, standardized information workflows were applied for data collection, organization, and statistical analysis, thereby ensuring methodological rigor throughout data processing and enhancing the clinical relevance and interpretability of the study findings.

Statistical Analysis
All analyses were conducted in R software (version 4.3.1). Continuous variables were assessed for distributional characteristics using the Shapiro–Wilk test and summarized accordingly. Variables following a normal distribution were expressed as mean ± standard deviation (SD) and compared using the Student’s t-test. Non-normally distributed variables were summarized as median and interquartile range (IQR) and compared using the Mann–Whitney U-test. Categorical variables were presented as counts and percentages. Group comparisons were performed using the chi-square test, or Fisher’s exact test when the expected cell count was less than five.
To assess multicollinearity among candidate predictors, variance inflation factor (VIF) analysis was conducted prior to model construction. Variables with a VIF exceeding 5 were considered to exhibit significant collinearity and were carefully evaluated or excluded to ensure model stability. For machine learning model development, hyperparameter optimization was performed using a grid search strategy combined with k-fold cross-validation (k = 5) within the training dataset. For each algorithm, a predefined grid of candidate hyperparameter values was evaluated, and the optimal combination was selected based on cross-validated performance (primarily AUC). To improve reproducibility and transparency, the main hyperparameters and their corresponding search ranges are provided in Supplementary Table S1.
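The 5-fold cross-validation used inside the training set can be sketched as a partition of the 375 training patients into five held-out folds. The function name is hypothetical, and whether the folds were stratified by outcome is not stated in the text; this plain random partition is an assumption.

```python
import random


def k_fold_indices(n, k, seed):
    """Return k (train, held_out) index pairs whose held-out parts partition range(n)."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # every k-th shuffled index
    pairs = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        pairs.append((train, held_out))
    return pairs


pairs = k_fold_indices(n=375, k=5, seed=4321)
for train, held_out in pairs:
    print(len(train), len(held_out))
```

In a grid search, each candidate hyperparameter combination would be fitted on the 300 patients of every training part and scored (for example by AUC) on the corresponding 75 held-out patients, and the combination with the best average score retained.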
Data preprocessing and visualization were performed using “tidyverse”. Feature selection was implemented using “glmnet” (LASSO) and “Boruta”. Model training and tuning were performed using “caret” and “tidymodels”. Discrimination was assessed using “pROC”; clinical utility was evaluated using “rmda”; confusion matrices were produced using “caret”. IML analyses were performed using “shapviz” and “fastshap” packages. To ensure reproducibility, a fixed random seed (set to 4321) was used throughout all stochastic procedures, including cohort splitting, cross-validation, and hyperparameter optimization. All statistical tests were two-sided, and statistical significance was defined as a P value < 0.05.

Results


Baseline Profile of the Study Cohort
The study cohort consisted of 536 patients, of whom 375 were assigned to the training set and 161 to the validation set. Tumor recurrence was observed in 156 patients (29.1%) in the overall cohort, with similar recurrence rates in the training and validation sets (28.3% vs. 31.1%, P = 0.515). The mean age of the overall population was 59.21 ± 6.04 years, and 55.6% of patients were male. No statistically significant differences were identified between the training and validation sets with respect to baseline demographic characteristics, liver function status, comorbidities, tumor burden, treatment characteristics, or laboratory parameters (all P > 0.05), supporting the appropriateness of the cohort split for model development and validation (Table 1).
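The comparison of recurrence rates between the two sets can be checked with a Pearson chi-square test on the 2×2 table implied by the text. In this sketch the validation-set count of 50 recurrences is derived (156 overall minus 106 in the training set, as reported later in the Results), and the absence of a continuity correction is an assumption about how the reported P value was obtained.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    observed = [a, b, c, d]
    row_totals = [a + b, a + b, c + d, c + d]
    col_totals = [a + c, b + d, a + c, b + d]
    stat = 0.0
    for o, r, col in zip(observed, row_totals, col_totals):
        expected = r * col / n
        stat += (o - expected) ** 2 / expected
    p_value = erfc(sqrt(stat / 2.0))  # upper tail of chi-square with 1 degree of freedom
    return stat, p_value


# Rows: training set, validation set. Columns: recurrence, no recurrence.
stat, p_value = chi_square_2x2(106, 375 - 106, 50, 161 - 50)
print(round(stat, 3), round(p_value, 3))
```

Under these assumptions the P value comes out at approximately 0.51, in line with the reported P = 0.515.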

Comparison of Baseline Characteristics by Recurrence Status in the Training Set
As shown in Table 2, within the training set, 106 patients developed tumor recurrence, whereas 269 patients remained recurrence-free. Baseline demographic characteristics, including age, sex, and BMI, were comparable between the recurrence and non-recurrence groups (all P > 0.05). Patients in the recurrence group had a larger maximum tumor diameter than those in the non-recurrence group (1.75 ± 0.62 cm vs. 1.48 ± 0.69 cm, P < 0.001). The distribution of Child-Pugh grade differed significantly, with a higher proportion of Child-Pugh grade B observed among patients with recurrence (40.6% vs. 22.3%, P < 0.001). In addition, patients in the recurrence group exhibited a higher prevalence of cirrhosis (38.7% vs. 20.8%, P < 0.001) and portal hypertension (30.2% vs. 14.9%, P = 0.005). Comorbid conditions, lifestyle factors, hepatitis B virus infection status, tumor number, and ablation frequency were comparable between the recurrence and non-recurrence groups (all P > 0.05). Regarding laboratory parameters, patients with recurrence showed lower PLT and higher levels of AST, AFP, and CRP compared with the non-recurrence group (all P < 0.05). No significant differences were observed in WBC, ALT, TBIL, GGT, FIB, DCP, or albumin levels between groups.

Feature Selection Using Boruta and LASSO
Feature selection was performed in the training set using a combined Boruta and LASSO strategy to identify robust predictors associated with tumor recurrence after MWA. As illustrated in Figure 2A and B, the Boruta algorithm was first applied to rank the importance of candidate variables by comparing original features with their corresponding shadow features. Variables with importance values consistently exceeding the maximum shadow importance were confirmed as relevant. Using this approach, Boruta identified maximum tumor diameter, Child-Pugh grade, cirrhosis, portal hypertension, PLT, AFP, CRP, and TBIL as important features.
Subsequently, LASSO regression was used to further constrain the feature space. The optimal regularization parameter was selected based on cross-validation (Figure 2C), and the corresponding coefficient shrinkage paths are illustrated in Figure 2D. At the optimal λ value, LASSO retained maximum tumor diameter, Child-Pugh grade, cirrhosis, portal hypertension, PLT, AFP, and CRP. To ensure robustness and reduce redundancy, only features selected by both Boruta and LASSO were carried forward for model construction. The intersection of the two methods is summarized in Figure 2E. Consequently, seven variables, including maximum tumor diameter, Child-Pugh grade, cirrhosis, portal hypertension, PLT, AFP, and CRP, were ultimately selected as input features for subsequent ML analyses.
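The intersection rule amounts to a set operation on the two selected feature lists. The sketch below uses the variable names reported above; the Python identifiers themselves are illustrative.

```python
# Features confirmed by Boruta, as listed in the text.
boruta_selected = {
    "maximum tumor diameter", "Child-Pugh grade", "cirrhosis",
    "portal hypertension", "PLT", "AFP", "CRP", "TBIL",
}

# Features with non-zero coefficients at the optimal lambda, as listed in the text.
lasso_selected = {
    "maximum tumor diameter", "Child-Pugh grade", "cirrhosis",
    "portal hypertension", "PLT", "AFP", "CRP",
}

final_features = sorted(boruta_selected & lasso_selected)  # kept by both methods
dropped = sorted(boruta_selected ^ lasso_selected)         # kept by only one method

print(final_features)
print(dropped)
```

TBIL is the only variable confirmed by Boruta but not retained by LASSO, so it is the only candidate excluded at this step.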

Variable Encoding and Collinearity Assessment
Before model construction, selected variables were appropriately encoded according to their data types. Continuous variables were retained in their original numerical form, while categorical variables were transformed into binary or ordinal variables based on clinical relevance to ensure compatibility with machine learning algorithms. All variables were processed in a consistent manner across the training and validation datasets (Table 3).
Collinearity among the selected variables was evaluated using VIF analysis. The calculated VIF values were 1.032 for maximum tumor diameter, 1.040 for Child-Pugh grade, 1.047 for cirrhosis, 1.019 for portal hypertension, 1.017 for PLT, 1.045 for AFP, and 1.063 for CRP. All predictors demonstrated VIF values below the predefined threshold of 5, indicating no evidence of significant multicollinearity. Consequently, all variables were retained for subsequent ML model construction.
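Because the VIF of predictor j equals 1 / (1 − R_j²), where R_j² is obtained by regressing that predictor on all the others, each reported VIF can be converted back into the share of variance explained by the remaining predictors. The sketch below applies this identity to the reported values; the function name is hypothetical.

```python
# VIF values as reported in the text.
reported_vif = {
    "maximum tumor diameter": 1.032,
    "Child-Pugh grade": 1.040,
    "cirrhosis": 1.047,
    "portal hypertension": 1.019,
    "PLT": 1.017,
    "AFP": 1.045,
    "CRP": 1.063,
}


def r_squared_from_vif(vif):
    """Invert VIF = 1 / (1 - R^2) to recover R^2."""
    if vif < 1.0:
        raise ValueError("VIF cannot be below 1")
    return 1.0 - 1.0 / vif


implied_r2 = {name: r_squared_from_vif(v) for name, v in reported_vif.items()}
for name, r2 in implied_r2.items():
    print(f"{name}: implied R^2 = {r2:.3f}")

print(f"R^2 at the VIF threshold of 5: {r_squared_from_vif(5.0):.2f}")
```

Every selected predictor has less than 6% of its variance explained by the others, far below the 80% that corresponds to the VIF threshold of 5.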

Model Development and Performance Comparison
Using the selected features, six ML models, including SVM, RF, KNN, DT, XGBoost, and LR, were developed and their predictive performance was systematically evaluated. As shown in Figure 3A, in the training set, all models demonstrated discriminatory ability for predicting tumor recurrence. Among them, XGBoost achieved the highest performance with an AUC of 0.926 (95% CI, 0.897–0.954), followed closely by RF with an AUC of 0.924 (95% CI, 0.899–0.949). The SVM, KNN, and LR models showed moderate discrimination, whereas the DT model exhibited comparatively lower performance. The performance pattern was largely preserved in the validation set (Figure 3B), indicating good model generalizability. XGBoost maintained the highest discriminative ability (AUC = 0.889), while SVM and LR achieved comparable AUC values. Although RF and KNN showed a slight decline in performance, their overall discrimination remained acceptable. DT consistently demonstrated the lowest performance across both datasets.
In addition to AUC, multiple classification metrics were assessed to provide a comprehensive evaluation of model behavior (Figure 3C and D). Across these metrics, XGBoost and RF generally exhibited more stable and balanced performance, whereas DT showed greater variability, particularly in measures related to agreement and predictive consistency. Based on its superior AUC performance, robust generalization across datasets, and balanced classification metrics, the XGBoost model was selected as the final prediction model for subsequent calibration, clinical utility assessment, and interpretability analysis.

Model Calibration and Clinical Performance
The calibration performance of the final XGBoost model was evaluated in both the training and validation sets. As shown in Figure 4A and B, the apparent and bias-corrected calibration curves closely followed the ideal reference line, indicating good agreement between predicted probabilities and observed recurrence rates across a wide range of risk levels. The clinical utility of the model was evaluated using DCA (Figure 4C and D). In both datasets, the XGBoost model provided a consistently higher net benefit than the “treat-all” and “treat-none” strategies across a broad range of threshold probabilities, suggesting meaningful clinical usefulness for guiding risk-based decision-making.
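The net benefit plotted in a decision curve follows the standard definition: true positives per patient minus false positives per patient, weighted by the odds of the threshold probability. The sketch below is illustrative only: the function names are hypothetical, the validation-set prevalence of 50/161 is derived from the reported 31.1% recurrence rate, and the model's counts at a 30% threshold are invented for the example because the text does not report them.

```python
def net_benefit(tp, fp, n, threshold):
    """Net benefit of treating patients whose predicted risk is at or above the threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(prevalence, threshold):
    """Net benefit when every patient is treated as high risk."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


NET_BENEFIT_TREAT_NONE = 0.0  # no one is treated, so no benefit and no harm

threshold = 0.30
prevalence = 50 / 161  # derived from the reported validation-set recurrence rate

# Hypothetical counts at this threshold (not reported in the text).
model_nb = net_benefit(tp=40, fp=20, n=161, threshold=threshold)
treat_all_nb = net_benefit_treat_all(prevalence, threshold)
print(round(model_nb, 3), round(treat_all_nb, 3), NET_BENEFIT_TREAT_NONE)
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both reference strategies, which is the comparison the decision curves display across the full threshold range.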
The classification performance of the XGBoost model was summarized using confusion matrices in Figure 4E (training set) and Figure 4F (validation set). In the training set, the model correctly identified 225 of 269 non-recurrence cases and 95 of 106 recurrence cases, with 44 false positives and 11 false negatives. In the validation set, 89 of 111 non-recurrence cases and 41 of 50 recurrence cases were correctly classified, while 22 false positives and 9 false negatives were observed. These results indicate balanced discrimination with acceptable error rates in both datasets. Overall, the XGBoost model demonstrated good calibration, favorable clinical net benefit, and stable classification performance, supporting its robustness and potential applicability for predicting tumor recurrence after MWA.
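The summary metrics follow directly from the four confusion-matrix cells. In the sketch below the off-diagonal counts are assigned so that row totals agree with the group sizes reported elsewhere in the text (269 non-recurrence and 106 recurrence patients in the training set; 161 validation patients with a 31.1% recurrence rate, that is 111 and 50); this assignment is an assumption of the sketch, and the function name is hypothetical.

```python
def classification_metrics(tn, fp, fn, tp):
    """Standard metrics from confusion-matrix counts (positive class = recurrence)."""
    total = tn + fp + fn + tp
    sensitivity = tp / (tp + fn)  # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Cell assignment chosen so that tn + fp and tp + fn match the reported group sizes.
train = classification_metrics(tn=225, fp=44, fn=11, tp=95)
valid = classification_metrics(tn=89, fp=22, fn=9, tp=41)

for name, metrics in (("training", train), ("validation", valid)):
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

Under this reading the model misses relatively few recurrences (sensitivity of roughly 0.90 in training and 0.82 in validation) at the cost of more false alarms among recurrence-free patients.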

Model Interpretability Analysis Using SHAP
The interpretability of the final XGBoost model was examined using SHAP to quantify the contribution of individual predictors to recurrence risk after MWA. Global feature importance based on the mean absolute SHAP values is presented in Figure 5A. AFP and CRP emerged as the most influential predictors, followed by cirrhosis, maximum tumor diameter, PLT, Child-Pugh grade, and portal hypertension. The SHAP summary plot (Figure 5B) further illustrated both the magnitude and direction of feature effects, demonstrating that higher AFP and CRP values were generally associated with increased recurrence risk, whereas higher PLT values tended to reduce the predicted risk. Local interpretability at the individual level is illustrated in Figure 5C–E using representative SHAP force plots. These examples showed how different combinations of clinical features contributed to individual recurrence risk predictions, highlighting the additive nature of feature effects and the heterogeneity of risk profiles among patients.
The relationships between predictors and model output were further explored using SHAP dependence plots (Figure 6A–G). AFP and CRP displayed pronounced nonlinear effects, with SHAP values increasing sharply beyond certain concentration ranges. Maximum tumor diameter showed a progressive positive association with recurrence risk, whereas PLT exhibited an inverse relationship. For categorical variables, including cirrhosis, Child-Pugh grade, and portal hypertension, clear stepwise differences in SHAP values were observed, indicating distinct risk contributions between categories. Overall, SHAP analysis provided both global and individualized explanations for model predictions, enhancing the transparency and clinical interpretability of the XGBoost model for predicting tumor recurrence after MWA.

Discussion

In this retrospective cohort of 536 patients with sHCC treated with ultrasound-guided MWA, we developed and internally validated an IML model to predict 1-year post-ablation recurrence. Tumor recurrence occurred in 29.1% of the overall cohort, underscoring that early relapse remains a clinically relevant challenge even in small tumors treated with curative-intent ablation. Using a two-step Boruta-LASSO feature selection strategy, we consistently identified seven routinely available variables, including maximum tumor diameter, Child-Pugh grade, cirrhosis, portal hypertension, PLT, AFP, and CRP, which were subsequently used as inputs for model development. Among six candidate algorithms, the XGBoost model provided the best overall discrimination and generalizability (training AUC 0.926; validation AUC 0.889), accompanied by favorable calibration and meaningful net benefit across a broad range of thresholds on DCA. Importantly, SHAP analyses provided transparent explanations at both the global and individual levels, revealing that AFP and CRP contributed most to recurrence risk, followed by cirrhosis and tumor diameter, and demonstrating nonlinear risk patterns for key continuous variables. Collectively, our findings suggest that an interpretable XGBoost model based on routine clinical variables can facilitate individualized recurrence risk stratification after MWA for sHCC.
Recurrence after curative-intent therapy for sHCC is common and represents a major determinant of long-term outcomes; therefore, accurate risk stratification after ablation is an active area of investigation. Prior prediction efforts in this space have largely relied on regression-based models and nomograms. For instance, Wang et al retrospectively developed a logistic regression-based nomogram to predict 1-year recurrence after ultrasound-guided MWA for HCC in a cohort of 119 patients, achieving an AUC of 0.86 in the validation cohort.11 Zhang et al established a multi-parameter model integrating preoperative MRI features to predict early recurrence within one year after MWA in patients with hepatitis B-associated HCC. The model demonstrated good external performance, with an AUC of 0.842 in the validation cohort.12 Despite these encouraging results, several limitations remain in the existing predictive literature. Most available models were derived from small retrospective cohorts, which may restrict statistical stability and generalizability. In addition, regression-based approaches and nomograms inherently rely on assumptions of linearity and additivity among predictors, potentially limiting their ability to capture complex nonlinear relationships and high-order interactions that underlie tumor recurrence after ablation. Imaging-driven models, although promising, often require advanced imaging protocols and feature extraction pipelines that may not be uniformly available across clinical settings, thereby constraining real-world applicability.
Recent ML studies have also explored clinically relevant outcomes after MWA in HCC. For example, Ren et al developed ML-based models to predict local tumor progression after initial MWA in patients with early-stage HCC and reported that the best-performing CatBoost model achieved an AUC of 0.898 in the training cohort.24 Although local tumor progression and overall recurrence represent different clinical endpoints, this study further supports the potential value of ML approaches in post-ablation risk assessment. Against this background, the present study leverages a substantially larger cohort of patients with sHCC and adopts an ML-based framework with rigorous feature selection to model recurrence risk without prespecified linear assumptions. By systematically comparing multiple algorithms and incorporating SHAP-based interpretability, our approach enables both robust discrimination and transparent understanding of individualized risk profiles. Notably, the final XGBoost model demonstrated excellent and consistent discriminative performance, achieving an AUC of 0.926 in the training cohort and 0.889 in the validation cohort, which compares favorably with previously reported models. These findings suggest that an IML model based on routinely available clinical variables may offer a more flexible and effective strategy for recurrence risk stratification after MWA in patients with sHCC.
The seven predictors retained in the final model are clinically coherent and reflect the interplay between tumor aggressiveness, systemic inflammatory status, and the underlying hepatic microenvironment, all of which are central to recurrence risk after curative-intent MWA. Tumor size emerged as a fundamental determinant of recurrence risk after MWA. Even within the spectrum of sHCC, increasing tumor diameter has been consistently associated with more aggressive tumor biology, including a higher likelihood of microscopic satellite nodules and microvascular invasion, which are not readily detectable on preoperative imaging but predispose to early relapse.25,26 From a technical standpoint, larger tumors pose greater challenges in achieving complete and homogeneous ablation with adequate safety margins, particularly in lesions approaching 3 cm or located near major vessels, where heat-sink effects and anatomical constraints may compromise energy deposition.27,28 These biological and procedural factors jointly explain the monotonic increase in recurrence risk with increasing tumor size observed in our SHAP analyses. Child-Pugh grade serves as a composite marker of hepatic functional reserve and chronic liver injury severity. Poorer Child-Pugh status may impair post-ablation liver regeneration and create a permissive microenvironment for residual tumor growth.29,30 Consistently, our findings indicate that higher Child-Pugh grade is associated with an increased risk of recurrence after MWA. Cirrhosis is a pathological state that develops after prolonged liver damage, often leading to chronic inflammation and fibrosis within the liver. Pinter et al31 found that most hepatocellular carcinomas arise against a cirrhotic background. In cirrhotic patients, disrupted liver architecture and abnormal blood circulation create favorable conditions for tumor cell growth and metastasis. 
Following MWA for sHCC, the presence of cirrhosis can impair the liver’s repair and regenerative capacity as well as its ability to eliminate residual tumor cells, allowing them to evade the body’s immune surveillance and killing mechanisms and thereby increasing the risk of tumor recurrence.32 Our study likewise found that patients with cirrhosis were more prone to recurrence after ablation. Portal hypertension is a clinical syndrome caused by elevated pressure within the portal venous system, primarily resulting from increased resistance to portal blood flow or augmented portal inflow.33 Jang et al reported that portal hypertension is an important factor influencing long-term survival and tumor recurrence after local ablative therapies.34 In patients with sHCC, portal hypertension readily induces alterations in portal hemodynamics, characterized by increased portal venous pressure and reduced blood flow velocity. Under such conditions, even after effective MWA, residual tumor cells may disseminate intrahepatically via the portal venous system, thereby increasing the risk of tumor recurrence. In addition, portal hypertension promotes the development of portosystemic collateral circulation, which may allow tumor cells to bypass normal hepatic metabolism and immune surveillance, further elevating the risk of distant dissemination and post-treatment recurrence.35
PLT also contributed meaningfully to recurrence prediction. In a retrospective cohort of 172 patients, Pang et al demonstrated that PLT and multiple platelet-based indices were independently associated with postoperative recurrence in HCC.36 In addition, a large meta-analysis of 5545 patients demonstrated that a low preoperative PLT significantly increased the risk of recurrence and poor recurrence-free survival in HCC.37 In chronic liver disease, thrombocytopenia commonly reflects portal hypertension and advanced hepatic fibrosis and may additionally influence tumor progression through angiogenic and immunomodulatory pathways.38 Consistently, lower PLT in our cohort was associated with an increased risk of post-ablation recurrence, highlighting the interplay between portal hemodynamics and host hematologic status. AFP is an important biomarker for the diagnosis and surveillance of hepatocellular carcinoma, and elevated AFP levels may indicate increased tumor activity or tumor burden.39 In patients with sHCC, high serum AFP levels may promote tumor angiogenesis by modulating the peritumoral microenvironment, thereby providing essential nutrients and oxygen to support tumor cell proliferation and dissemination. 
In addition, elevated AFP has been shown to exert immunosuppressive effects on the host immune system, including inhibition of T-lymphocyte activity and attenuation of natural killer cell cytotoxicity, which weakens immune surveillance and clearance of malignant cells and consequently increases the risk of tumor recurrence.40 CRP, a marker of systemic inflammation, further highlights the role of host inflammatory status in recurrence risk.41 Chronic inflammation is a recognized driver of hepatocarcinogenesis and tumor progression, particularly in the cirrhotic liver.42 Elevated CRP levels may indicate a pro-tumorigenic inflammatory milieu that facilitates tumor cell survival, angiogenesis, and immune evasion.43 The association between higher CRP levels and increased recurrence risk observed in our model supports the concept that systemic inflammation contributes to an unfavorable post-ablation tumor microenvironment. It is worth noting that TBIL was identified as a relevant feature in the initial Boruta screening but was not retained in the final model after LASSO selection. This discrepancy may be attributed to differences in the underlying principles of the two methods. Boruta aims to identify all potentially relevant variables, including those with weak or indirect associations, whereas LASSO favors a more parsimonious model by selecting variables with stronger and more stable predictive contributions. In the presence of other liver function–related variables, the effect of TBIL may have been relatively redundant or less informative, leading to its exclusion in the final model.
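The tendency of LASSO to drop weakly contributing variables can be sketched with the soft-thresholding operator. This is a simplified illustration under a stated assumption: for an orthonormal design matrix, the LASSO estimate of each coefficient equals the soft-thresholded least-squares coefficient. The coefficient values, the penalty `lam`, and the predictor names below are hypothetical and are not taken from the study.

```python
def soft_threshold(z: float, lam: float) -> float:
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly zero."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical least-squares coefficients (orthonormal design assumed)
ols_coefficients = {
    "strong_predictor": 0.9,
    "moderate_predictor": -0.4,
    "weak_predictor": 0.08,  # small effect, e.g. largely redundant with others
}
lam = 0.2  # hypothetical L1 penalty strength

lasso_coefficients = {
    name: soft_threshold(value, lam) for name, value in ols_coefficients.items()
}
selected = [name for name, value in lasso_coefficients.items() if value != 0.0]

print(lasso_coefficients)
print("retained:", selected)
```

A variable whose coefficient falls below the penalty is removed entirely, whereas an all-relevant screen such as Boruta could still flag it. This is consistent with the explanation offered above for TBIL, although the study's actual coefficients and penalty are not reported in this section.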
The current study presents several methodological strengths that enhance the robustness and clinical applicability of the predictive model. The use of a two-step feature selection strategy, incorporating both Boruta and LASSO regression, is a major strength. The Boruta algorithm effectively identified relevant variables by comparing original features to their randomized shadow counterparts, ensuring that only truly important features were retained. LASSO regression further refined the feature selection process, minimizing overfitting and ensuring that only variables with consistent predictive value were included in the final model. This combined approach minimized redundancy and ensured the stability of the selected predictors. Another strength of the study is its use of multiple ML models for comparison (SVM, RF, KNN, DT, XGBoost, and LR), which enabled a comprehensive evaluation of the most appropriate model. Among the six models, XGBoost demonstrated superior performance in terms of discrimination, achieving an AUC of 0.926 in the training set and 0.889 in the validation set. This robust performance was coupled with favorable calibration and clinical utility, as assessed through DCA, further supporting its potential for real-world application. The integration of SHAP for model interpretability is another important strength. SHAP provides a transparent framework for understanding how individual features contribute to predictions, both globally and at the patient level. This is particularly important for clinical adoption, as it allows clinicians to interpret the model’s decisions and gain insights into the underlying risk factors contributing to recurrence, thus fostering trust in the model’s output. The predictive model developed in this study has direct clinical implications for managing patients with sHCC following MWA. 
Tumor recurrence remains a major challenge, and the ability to predict recurrence risk early can guide clinicians in personalizing patient care. By identifying patients at higher risk for recurrence, clinicians can modify follow-up plans, increase the intensity of surveillance, and implement timely interventions, potentially improving long-term outcomes. By utilizing routinely collected clinical data, including tumor size, Child-Pugh grade, and biomarkers such as AFP and CRP, the model offers a practical tool that can be easily integrated into daily clinical workflows without the need for complex or expensive tests. This makes it a cost-effective and efficient solution for risk stratification in real-world settings. The inclusion of SHAP-based interpretability further strengthens the model’s clinical value. SHAP allows clinicians to understand how each feature contributes to the prediction of recurrence, providing transparent, individualized explanations. This transparency fosters greater trust and confidence in the model, making it easier for healthcare providers to use the model as a supportive tool in clinical decision-making. While this study demonstrates promising results, there are several limitations that should be addressed in future research. First, the main limitation of this study is that all data were derived from a single center and no external validation was performed. Although internal validation was conducted, the absence of external validation substantially limits the generalizability of the model across different institutions and patient populations. Therefore, multicenter prospective studies with independent external validation are needed before this model can be more broadly applied in clinical practice. Second, although the study included a wide range of clinical and procedural variables, it did not consider all potential factors that might influence recurrence risk, such as genetic or molecular markers. 
Incorporating genomic or proteomic data into the model could further improve its accuracy and predictive power, particularly for patients with specific molecular subtypes of HCC. Future studies could explore the integration of these data to enhance the model’s precision. Third, the follow-up period in this study was limited to one year, and the model was specifically developed to predict early recurrence after MWA. However, early and late recurrence of HCC may differ in their underlying biological mechanisms, with early recurrence more likely related to residual microscopic disease or aggressive tumor biology, whereas late recurrence may be more closely associated with the carcinogenic background of the diseased liver. Therefore, predicting recurrence within one year may not fully reflect the long-term prognostic trajectory of patients. Future studies with longer follow-up are needed to assess the model’s performance for predicting long-term recurrence patterns and outcomes. Fourth, an imbalance between recurrence and non-recurrence cases was present in the dataset, which may introduce potential bias in model training and prediction. Although multiple performance metrics were used to provide a comprehensive evaluation, class imbalance may still affect model performance. Future studies with more balanced datasets or the use of resampling techniques may further improve the robustness and generalizability of the model. Finally, in the present study, recurrence was defined as the presence of newly enhanced lesions either outside the ablation zone or elsewhere in the liver, without further differentiation between local tumor progression and intrahepatic distant recurrence. These recurrence patterns may be associated with different biological mechanisms and clinical implications. However, due to limitations in the available retrospective data, detailed classification of recurrence patterns was not consistently available. 
Therefore, we were unable to evaluate whether the predictive performance of the model differs between these subtypes. Future studies incorporating standardized definitions of recurrence patterns are needed to further refine and validate the model.

Conclusion

In conclusion, our study presents an IML model developed to predict tumor recurrence after MWA in patients with sHCC. By incorporating seven key predictors, including maximum tumor diameter, Child-Pugh grade, cirrhosis, portal hypertension, PLT, AFP, and CRP, the model demonstrated strong predictive performance with an AUC of 0.926 in the training set and 0.889 in the validation set. The application of SHAP analysis further enhanced the model’s clinical value by providing clear explanations of how individual variables contribute to recurrence risk. These findings suggest that the model may serve as a useful and clinically accessible approach for individualized risk stratification and post-ablation management. However, because the model was developed and internally validated in a single-center cohort, further external validation in larger multicenter populations and longer follow-up are required to confirm its generalizability and long-term clinical utility. Future studies should focus on external validation and on exploring whether the integration of additional molecular or genomic data could further enhance its accuracy and clinical utility.

Source: PubMed Central (JATS). The license follows the original publisher's policy; please cite the original article when quoting.
