Hepatocellular carcinoma (HCC) and focal nodular hyperplasia (FNH) showing iso- or hyperintensity in the hepatobiliary phase: differentiation using Gd-EOB-DTPA enhanced MRI radiomics and deep learning features.
APA
Mao HY, Hu JC, et al. (2025). Hepatocellular carcinoma (HCC) and focal nodular hyperplasia (FNH) showing iso- or hyperintensity in the hepatobiliary phase: differentiation using Gd-EOB-DTPA enhanced MRI radiomics and deep learning features. BMC medical imaging, 25(1), 397. https://doi.org/10.1186/s12880-025-01927-3
MLA
Mao HY, et al. "Hepatocellular carcinoma (HCC) and focal nodular hyperplasia (FNH) showing iso- or hyperintensity in the hepatobiliary phase: differentiation using Gd-EOB-DTPA enhanced MRI radiomics and deep learning features." BMC medical imaging, vol. 25, no. 1, 2025, pp. 397.
PMID
41023894
Abstract
[BACKGROUND] To develop and validate radiomics and deep learning models based on Gd-EOB-DTPA enhanced MRI for differentiation between hepatocellular carcinoma (HCC) and focal nodular hyperplasia (FNH) showing iso- or hyperintensity in the hepatobiliary phase (HBP).
[METHODS] A total of 112 patients from three hospitals were included. The 84 patients from hospitals a and b (54 HCCs and 30 FNHs) were randomly divided into a training cohort (n = 59: 38 HCC; 21 FNH) and an internal validation cohort (n = 25: 16 HCC; 9 FNH). The 28 patients from hospital c (n = 28: 20 HCC; 8 FNH) served as an external test cohort. A total of 1781 radiomics features were extracted from tumor volumes of interest (VOIs) in the pre-contrast phase (Pre), arterial phase (AP), portal venous phase (PP) and HBP images, and 512 deep learning features were extracted from VOIs in the AP, PP and HBP images. Pearson correlation coefficient (PCC) and analysis of variance (ANOVA) were used for feature selection. Conventional radiomics, delta radiomics and deep learning models were established using machine learning algorithms (support vector machine [SVM] and logistic regression [LR]), and their discriminatory efficacy was assessed and compared.
[RESULTS] The combined deep learning models demonstrated the highest diagnostic performance in both the internal validation and external test cohorts, with area under the curve (AUC) values of 0.965 (95% confidence interval [CI]: 0.906–1.000) and 0.851 (95% CI: 0.620–1.000), respectively. The conventional and delta radiomics models achieved AUCs of 0.944 (95% CI: 0.779–0.979) and 0.938 (95% CI: 0.836–1.000), respectively, which were not significantly different from the deep learning models or from each other (P = 0.559, 0.256, and 0.137).
[CONCLUSION] The combined deep learning models based on Gd-EOB-DTPA enhanced MRI may be useful for discriminating HCC from FNH showing iso- or hyperintensity in the HBP.
[SUPPLEMENTARY INFORMATION] The online version contains supplementary material available at 10.1186/s12880-025-01927-3.
Background
Hepatocellular carcinoma (HCC), the fourth leading cause of cancer-related deaths worldwide [1], is the predominant histological subtype of liver cancer, accounting for about 90% of primary liver cancers. Enhanced MRI is routinely used to help diagnose liver masses, especially lesions with typical characteristics. At present, the most commonly used liver-specific contrast agents are Gd-BOPTA and Gd-EOB-DTPA. Gd-BOPTA is appropriate for the dynamic evaluation of vascular structural changes due to its faster excretion rate [2]. Gd-EOB-DTPA, on the other hand, is more sensitive to both the hepatocyte uptake and biliary excretion phases of focal liver lesions, making it particularly suitable for differential diagnosis. Research has demonstrated that the hepatobiliary phase (HBP) of Gd-EOB-DTPA provides higher diagnostic accuracy than that of Gd-BOPTA in distinguishing hepatocellular adenoma (HCA) from focal nodular hyperplasia (FNH) [3]. Likewise, Gd-EOB-DTPA is also valuable in differentiating HCC from other similar hepatic lesions.
In Gd-EOB-DTPA enhanced MRI, typical HCC shows “wash in” in the arterial phase (AP), “wash out” in the portal venous phase (PP) and hypointensity in the HBP [4]. FNH is the second most common benign liver tumor; it is a polyclonal tumor-like lesion that does not undergo haemorrhage or malignant transformation [5]. Typical FNH shows a “spoke-wheel” appearance, which refers to the internal vascular architecture of the lesion, with centrifugal arterial vessels radiating from a central artery towards the periphery of the lesion [6–10]. The distinction between HCC and FNH is crucial since their management approaches differ substantially. However, the typical central scar can be observed in only 20–30% of FNH cases on MRI [11], and a central scar or central scar-like imaging findings can also be observed in 50% of non-cirrhotic HCC cases, especially fibrolamellar HCC [12, 13]. Because well-differentiated HCC overexpresses organic anion-transporting polypeptide (OATP) 1B3 [14, 15], the lesion can take up hepatocyte-specific contrast agents and show iso- or hyperintensity in the HBP [16, 17]. Meanwhile, approximately 10–12% of FNHs may not show iso- or hyperintensity in the HBP [5, 18]. Therefore, HCC cannot be completely differentiated from FNH showing iso- or hyperintensity in the HBP preoperatively by Gd-EOB-DTPA enhanced MRI [19].
As an important image analysis technique in oncology, radiomics [20–23] extracts objective, quantitative features hidden in the images and reflects the heterogeneity inside the lesions, avoiding the error of subjective visual assessment. Recent studies have found that radiomics features are closely related to tumor microstructure and biological behavior [24], so radiomics is widely used for predicting early recurrence, evaluating treatment efficacy, and predicting risk factors. Delta radiomics and deep learning have also been preliminarily applied in the diagnosis of liver cancer [25, 26]. However, fusing radiomics and deep learning features to construct a joint model for distinguishing HCC from FNH has not been clearly reported.
Thus, the purpose of this study was to develop and validate a diagnostic model integrating Gd-EOB-DTPA enhanced MRI radiomics and deep learning features for discriminating HCC from FNH showing iso- or hyperintensity in the HBP before surgery.
Materials and methods
Patients
The institutional Ethics Review Board approved this retrospective study and waived the requirement for written informed consent.
Patients admitted to three hospitals from January 2015 to February 2023 were retrospectively collected. Inclusion criteria: ① the patient underwent Gd-EOB-DTPA enhanced MRI examination before surgery; ② the lesion showed iso- or hyperintensity in the HBP; ③ the lesion was confirmed as HCC or FNH by postoperative pathology or immunohistochemistry; ④ if there were two or more lesions, the largest one was selected. Exclusion criteria: ① image artifacts that affected observation of the lesion; ② incomplete clinical data. A total of 112 patients (72 males and 40 females, mean age 55 years, range 21–86 years) were included, comprising 74 cases of HCC showing iso- or hyperintensity in the HBP and 38 cases of FNH. The 84 patients from hospitals a and b were randomly divided into a training cohort (n = 59: 38 HCC; 21 FNH) and an internal validation cohort (n = 25: 16 HCC; 9 FNH) in a ratio of 7:3. The 28 patients from hospital c were used as an external test cohort (n = 28: 20 HCC; 8 FNH). The flow chart for patient selection is shown in Fig. 1.
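The 7:3 split can be illustrated with a short Python sketch. The paper states only that patients were randomly divided; the per-class (stratified) split, the seed, and the function name `stratified_split` are assumptions made here for illustration. With round-half-up applied per class, a stratified 7:3 split of 54 HCC and 30 FNH cases gives the same cohort sizes as those reported.

```python
import random

def stratified_split(labels, train_parts=7, total_parts=10, seed=0):
    """Return (train_idx, valid_idx), splitting every class train_parts:total_parts."""
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility only
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, valid_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Round-half-up in integer arithmetic to avoid floating-point surprises.
        n_train = (len(indices) * train_parts * 2 + total_parts) // (2 * total_parts)
        train_idx.extend(indices[:n_train])
        valid_idx.extend(indices[n_train:])
    return sorted(train_idx), sorted(valid_idx)

# 84 patients from hospitals a and b: 54 HCC and 30 FNH
labels = ["HCC"] * 54 + ["FNH"] * 30
train_idx, valid_idx = stratified_split(labels)
train_counts = {c: sum(labels[i] == c for i in train_idx) for c in ("HCC", "FNH")}
valid_counts = {c: sum(labels[i] == c for i in valid_idx) for c in ("HCC", "FNH")}
print(train_counts, valid_counts)
```

A purely random, non-stratified split could yield slightly different class counts in each cohort.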
Preoperative clinical data and routine laboratory results were collected, including age, gender, alanine aminotransferase (ALT; ≤ 50 U/L or > 50 U/L), aspartate aminotransferase (AST; ≤ 40 U/L or > 40 U/L), gamma-glutamyl transferase (GGT; ≤ 60 U/L or > 60 U/L), alpha-fetoprotein (AFP; ≤ 25 µg/L or > 25 µg/L), viral hepatitis status and cirrhosis. All MRI images were independently reviewed by two radiologists with 5 and 10 years of experience in liver MRI interpretation, using a picture archiving and communication system (PACS, Neusoft v. 5.5, Shenyang, China).
MRI image acquisition and processing
The detailed acquisition and processing parameters are presented in Supplementary material 1.
Volume of interest identification and segmentation
Gd-EOB-DTPA enhanced MRI images were segmented manually in 3D-Slicer (version 5.2.2; http://www.slicer.org) by a radiologist with 5 years of professional experience. The tumor segmentations were reviewed by another radiologist with 10 years of professional experience. The volumes of interest (VOIs) covered the whole tumor.
Radiomics and deep learning models establishment
The features used in this study were divided into two parts. Radiomics features were extracted with the open-source software package FAE (version 0.5.2; https://github.com/salan668/), including first-order statistics, shape, gray level co-occurrence matrix (GLCM), gray level size zone matrix (GLSZM), gray level run length matrix (GLRLM), gray level dependence matrix (GLDM), and neighboring gray tone difference matrix (NGTDM) features.
Then, we used the ResNet3D-18 model [27], a 3D convolutional neural network (CNN), to extract the deep learning features. The fully connected layer was removed, and the remaining part of the network, composed of multiple convolution and pooling layers, was used as the feature extractor. Its last layer is the global average pooling layer, which yields high-level feature representations of the input images. The 3D deep learning features were denoted deep feature_1 to deep feature_512.
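To make the role of the global average pooling layer concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It reduces each channel of a toy activation volume to a single number. The final stage of a ResNet-18 style network has 512 channels, which is why the pooled vector has 512 entries. The toy tensor and the function name `global_average_pool` are assumptions for illustration.

```python
def global_average_pool(feature_maps):
    """Average each channel's D x H x W activation volume down to one number."""
    pooled = []
    for channel in feature_maps:
        values = [v for plane in channel for row in plane for v in row]
        pooled.append(sum(values) / len(values))
    return pooled

# Toy stand-in for the last convolutional output: 3 channels, each 2 x 2 x 2.
# A real ResNet3D-18 would produce 512 channels at this point.
toy_maps = [
    [[[c, c], [c, c]], [[c + 2, c + 2], [c + 2, c + 2]]]
    for c in range(3)
]
deep_features = global_average_pool(toy_maps)

# Naming mirrors the "deep feature_1 ... deep feature_512" convention in the text.
names = [f"deep feature_{i + 1}" for i in range(len(deep_features))]
print(dict(zip(names, deep_features)))
```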
All radiomics and deep learning features were imported into FAE. First, the radiomics features were normalized using the Mean normalization method. Then, Pearson correlation coefficients (PCCs) were calculated for dimensionality reduction: the PCC was calculated for each pair of features, and if it was larger than 0.90, one of the two features was randomly removed. Lastly, analysis of variance (ANOVA) was used for feature selection, with the number of selected features ranging from 5 to 20. The selected features, from which highly collinear features (PCC > 0.90) had thus been removed, were used to establish models with two machine learning algorithms: support vector machine (SVM) and logistic regression (LR).
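The two-step selection described above can be sketched in plain Python. This illustrates the technique, not FAE's code. It assumes the absolute PCC is compared with the threshold, and it keeps the first feature of each highly correlated pair rather than removing one at random, so that the result is reproducible. The function names and toy data are hypothetical.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / (sxx * syy) ** 0.5

def drop_correlated(features, threshold=0.90):
    """Keep a feature only if |PCC| with every already-kept feature is <= threshold."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

def anova_f(values, labels):
    """One-way ANOVA F statistic of one feature across the class labels."""
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    grand, n, k = mean(values), len(values), len(groups)
    means = {g: mean(vs) for g, vs in groups.items()}
    ss_between = sum(len(vs) * (means[g] - grand) ** 2 for g, vs in groups.items())
    ss_within = sum((v - means[g]) ** 2 for g, vs in groups.items() for v in vs)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def select_features(features, labels, n_keep, threshold=0.90):
    """PCC filter followed by ranking on the ANOVA F statistic."""
    kept = drop_correlated(features, threshold)
    ranked = sorted(kept, key=lambda f: anova_f(features[f], labels), reverse=True)
    return ranked[:n_keep]

# Toy data: f2 is a scaled copy of f1 (PCC = 1), f3 is uninformative noise.
labels = [0, 0, 0, 1, 1, 1]
features = {
    "f1": [1, 2, 3, 7, 8, 9],
    "f2": [2, 4, 6, 14, 16, 18],
    "f3": [5, 1, 4, 2, 6, 3],
}
print(drop_correlated(features), select_features(features, labels, n_keep=1))
```

In the study the number of retained features was searched over a range (5 to 20); here `n_keep` is fixed only to keep the example small.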
We established three types of models: conventional radiomics models (AP, PP, HBP and combined models), delta radiomics models (Pre-AP and AP-PP models) and deep learning models (AP, PP, HBP and combined models). The features used in the Pre-AP delta radiomics model were obtained by subtracting the values of corresponding radiomics features extracted from the VOIs in the Pre and AP images; the features of the AP-PP delta radiomics model were obtained in the same way from the AP and PP images.
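A delta radiomics feature is simply the difference between the same feature measured in two phases. The sketch below assumes the later phase minus the earlier phase; the text does not state the direction of subtraction. The feature names and values are invented for illustration.

```python
def delta_features(earlier, later):
    """Per-feature difference (later - earlier) for features present in both phases."""
    return {name: later[name] - earlier[name] for name in earlier if name in later}

# Hypothetical feature values for one lesion in the Pre and AP phases
pre = {"firstorder_Mean": 120.0, "glcm_Contrast": 3.5}
ap = {"firstorder_Mean": 180.0, "glcm_Contrast": 5.0}

pre_ap = delta_features(pre, ap)
print(pre_ap)
```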
The flow chart of this study is shown in Supplementary Fig. 1.
Statistical analysis
SPSS 26.0 was used for statistical analysis of clinical data, qualitative and quantitative features and the rad-score. The Shapiro-Wilk test was applied to check whether the measurement data were normally distributed, and the Levene test was used to check homogeneity of variance for normally distributed data. Normally distributed measurement data were expressed as mean ± standard deviation. Non-normally distributed measurement data were expressed as median and interquartile range, M (Q1, Q3), where Q1 and Q3 are the first (25th percentile) and third (75th percentile) quartiles. Count data were represented by the number of cases. The independent-sample t test or Mann–Whitney U test was performed to compare quantitative parameters, and the chi-square test was used to compare qualitative features. Multivariable logistic regression was performed with forward stepwise variable selection based on the likelihood ratio test. The discriminatory capability of each model was assessed by the area under the receiver operating characteristic (ROC) curve (AUC) in the training and validation cohorts. The DeLong test was used to compare the AUCs between models. A post hoc power analysis was performed to assess whether the sample size was sufficient. P values of less than 0.05 (two-tailed) were considered statistically significant.
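As an illustration of the evaluation metrics, the AUC can be computed as the proportion of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half. The sketch below is a generic implementation with toy scores, not the study's data or software. The 0.5 threshold for sensitivity and specificity is an assumption, and the DeLong comparison itself is not reproduced here.

```python
def auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(scores, labels, threshold=0.5):
    """Sensitivity and specificity when score >= threshold is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    return tp / (tp + fn), tn / (tn + fp)

# Toy predicted probabilities; 1 = HCC, 0 = FNH (labelling is an assumption)
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]

print(auc(scores, labels))
print(sensitivity_specificity(scores, labels))
```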
Results
Baseline characteristics
In the training cohort, age, gender, ALT, AST, AFP, HBV status, and cirrhosis differed significantly between the HCC and FNH groups (all P < 0.05) (Table 1). In the internal validation cohort, age, gender, HBV (+), and cirrhosis (+) differed significantly between the HCC and FNH groups (all P < 0.05) (Table 1).
There were no significant differences in age, gender, ALT, AST, GGT, AFP, HBV status, or cirrhosis between the training and internal validation cohorts (all P > 0.05) (Table 1).
Radiomics and deep learning features extraction and selection
A total of 1781 radiomics features were extracted from VOIs in the Pre, AP, PP and HBP images, and 512 deep learning features were extracted from VOIs in the AP, PP and HBP images (Figs. 2, 3 and 4). The results of feature selection are shown in Tables 2 and 3. As shown in Table 2, 5, 5, 4 and 11 radiomics features were retained in the AP, PP, HBP and combined conventional radiomics models, respectively, and 7 and 6 features were retained in the Pre-AP and AP-PP delta radiomics models, respectively. In the Tables 2 and 3 radiomics and 5 deep learning features were obtained in the AP model; 8 radiomics and 2 deep learning features were obtained in the PP model; 2 radiomics and 6 deep learning features were obtained in the HBP model; 3 AP and 8 PP radiomics features and 3 AP, 1 PP and 1 HBP deep learning features were obtained in the combined model. Low correlation coefficients were observed among the selected features (Fig. 5). To enhance interpretability, we employed Class Activation Mapping (CAM) to visualize the outputs of the ResNet3D-18 model, which helps show the critical regions emphasized by the models when differentiating HCC from FNH showing iso- or hyperintensity in the HBP. As shown in Fig. 6, the region shading from red toward blue is active, indicating that the models focus particularly on these areas.
Establishment and evaluation of radiomics and deep learning models
In the training cohort, SVM and LR classifiers were used to construct the three types of models based on the selected features in Tables 2 and 3. The performance of the diagnostic models is presented in Table 4. On the whole, the LR classifier performed better than the SVM classifier, so LR was used for the following results. Overall, the combined deep learning model showed the highest diagnostic performance in the training (AUC [95% CI] = 0.995 [0.985, 1.000]; sensitivity: 0.973; specificity: 0.952), internal validation (AUC [95% CI] = 0.965 [0.906, 1.000]; sensitivity: 0.938; specificity: 0.889) and external test cohorts (AUC [95% CI] = 0.851 [0.620, 1.000]; sensitivity: 0.857; specificity: 0.875). The combined conventional radiomics model and the Pre-AP model also performed well, though not as well as the combined deep learning model (Table 4; Fig. 7). We quantified the contribution of each feature in the combined deep learning model to the final prediction using the Shapley Additive Explanation (SHAP) methodology. Figure 8 shows that AP radiomics feature 2 and AP deep features 352 and 371 serve as three key positive features in the model's prediction: higher feature values correspond to an increased likelihood of a positive model output.
The feature abbreviations shown in the figures correspond to the specific features in Tables 2 and 3. AP, arterial phase; PP, portal venous phase; HBP, hepatobiliary phase.
Comparison of the diagnosis models
In the internal validation cohort, the best-performing LR model of each type was compared using the DeLong test. The combined deep learning model showed a higher AUC than the combined conventional radiomics and Pre-AP delta radiomics models, but the differences were not significant (P = 0.585 and 0.137). The AUC of the Pre-AP radiomics model was lower than that of the combined radiomics model, but the difference was not statistically significant (P = 0.256) (Table 5). In the external test cohort, all models using the LR classifier were compared using the DeLong test (Supplementary material 2).
We plotted decision curves for the best-performing LR model of each type (combined conventional radiomics model, Pre-AP model and combined deep learning model); the curves showed that the combined deep learning model had the highest clinical benefit of the three models (Fig. 9).
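For readers unfamiliar with decision curves, each point on such a curve is a net benefit value at a chosen threshold probability p_t, commonly defined as TP/n − (FP/n) × p_t/(1 − p_t). The sketch below uses this standard definition with toy values. It is an assumption that the study used this form, and the data and function names are illustrative only.

```python
def net_benefit(probs, labels, threshold):
    """Net benefit of treating every case whose predicted probability >= threshold."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(labels, threshold):
    """Reference strategy: treat everyone regardless of the model."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Toy predicted probabilities and outcomes (1 = disease present)
probs = [0.9, 0.6, 0.7, 0.2]
labels = [1, 1, 0, 0]

for pt in (0.25, 0.5, 0.75):
    print(pt, net_benefit(probs, labels, pt), net_benefit_treat_all(labels, pt))
```

A model offers clinical benefit at a given threshold when its net benefit exceeds both the treat-all reference and zero (the treat-none reference).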
Post hoc power analysis
The result of the post hoc power analysis is presented in Supplementary material 3.
Baseline characteristics
In the training cohort, the differences of age, gender, ALT, AST, AFP, HBV, and cirrhosis were statistically significant between HCC and FNH groups (all P < 0.05). (Table 1) In the internal validation cohort, the differences of age, gender HBV (+), and cirrhosis (+) were statistically significant between HCC and FNH groups (all P < 0.05). (Table 1)
There were no significant differences in age, gender, ALT, AST, GGT, AFP, HBV, and cirrhosis between training and internal validation cohorts (all P > 0.05) (Table 1).
Radiomics and deep learning features extraction and selection
A total of 1781 radiomics were extracted from VOIs in the Pre, AP, PP and HBP images, meanwhile, 512 deep learning features were extracted from VOIs in the AP, PP and HBP (Figs. 2, 3 and 4), respectively. The result of features selection was shown in the Tables 2 and 3. As is shown in the Table 2, there were 5 AP, 5 PP and 4 HBP and 11 radiomics features retained in the AP, PP, HBP and combined conventional radiomics models. There were 7 and 6 features retained in the Pre-AP and AP-PP delta radiomics models. In the Tables 2 and 3 radiomics and 5 deep learning features were obtained in the AP model; 8 radiomics and 2 deep learning features were obtained in the PP model; 2 radiomics and 6 deep learning features were obtained in the HBP model; 3 AP, 8 PP radiomics features and 3 AP, 1 PP, 1 HBP deep learning features were obtained in the combined model. Low correlation coefficients were observed in the selected features (Fig. 5). To enhance interpretability, we employed Class Activation Mapping (CAM) to visualize the outputs of ResNet3D-18 model, thereby facilitating an understanding of the critical regions emphasized by the models during the differentiation between HCC and FNH showing iso- or hyperintensity in the HBP. As is shown in Fig. 6, the red region converging toward the blue area is active, indicating that the models focus particularly on these areas.
Establishment and evaluation of radiomics and deep learning models
In the training cohort, SVM and LR classifiers were used to construct the three types of models based on the selected features in Tables 2 and 3. The performance of the diagnosis models was presented in Table 4. On the whole, the LR classifier had better performance than the SVM classifiers. Thus, LR was used for the following results. Totally speaking, the combined deep learning model showed the highest diagnostic performance in the training (AUC [95%CI] = 0.995[0.985,1.000]; sensitivity: 0.973; specificity: 0.952), internal validation (AUC [95%CI] = 0.965[0.906,1.000]; sensitivity: 0.938; specificity: 0.889) and external test cohorts (AUC [95%CI] = 0.851[0.620,1.000]; sensitivity: 0.857; specificity: 0.875). The combined conventional radiomics model and Pre-AP model also performed well, though not as well as combined deep learning model. (Table 4; Fig. 7) We quantified the contribution of each feature in combined deep learning model to the final prediction outcome using Shapley Additive Explanation (SHAP) methodology. Figure 8 confirms that AP radiomics feature 2, AP deep feature 352 and 371 serve as three key positive features in the model’s prediction—higher feature values correspond to an increased likelihood of favorable model outputs.
The feature abbreviations shown in the figures correspond to the specific features listed in Tables 2 and 3. AP, arterial phase; PP, portal venous phase; HBP, hepatobiliary phase.
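The three metrics reported for each cohort (AUC with a 95% CI, sensitivity and specificity) can be computed from predicted probabilities as in the sketch below. The source does not state how the confidence intervals were obtained or which probability cutoff was used, so the percentile bootstrap and the 0.5 threshold here are assumptions, as are all function names.

```python
import numpy as np

def auc_score(y_true, y_prob):
    """AUC as the probability that a positive case outranks a negative one."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    pos = y_prob[y_true == 1]
    neg = y_prob[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    # Ties between a positive and a negative score count as one half.
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

def sensitivity_specificity(y_true, y_prob, threshold=0.5):
    """Sensitivity and specificity at an assumed probability threshold."""
    y_true = np.asarray(y_true)
    pred = np.asarray(y_prob, dtype=float) >= threshold
    tp = np.sum(pred & (y_true == 1))
    fn = np.sum(~pred & (y_true == 1))
    tn = np.sum(~pred & (y_true == 0))
    fp = np.sum(pred & (y_true == 0))
    return float(tp / (tp + fn)), float(tn / (tn + fp))

def bootstrap_auc_ci(y_true, y_prob, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (an assumed CI method)."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    n = len(y_true)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        if y_true[idx].min() == y_true[idx].max():
            continue  # a resample with one class has no defined AUC
        aucs.append(auc_score(y_true[idx], y_prob[idx]))
    lo, hi = np.quantile(aucs, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)
```

In small cohorts such as the external test set (28 patients), bootstrap intervals of this kind tend to be wide, which is consistent with the broad interval reported for that cohort.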
Comparison of the diagnosis models
In the internal validation cohort, the best-performing LR model of each type was compared using the DeLong test. The combined deep learning model showed a higher AUC than the combined and Pre-AP radiomics models, but the differences were not significant (P = 0.585 and 0.137, respectively). The AUC of the Pre-AP radiomics model was lower than that of the combined radiomics model, but the difference was not statistically significant (P = 0.256) (Table 5). In the external test cohort, all the models using the LR classifier were compared using the DeLong test (Supplementary material 2).
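For readers unfamiliar with the DeLong test, the sketch below compares the AUCs of two models scored on the same patients. It is a compact illustration of the standard paired procedure, not the software the authors used; the function names are hypothetical, and the two-sided p-value is taken from the normal approximation.

```python
import math
import numpy as np

def _placements(y_true, y_prob):
    """Per-positive and per-negative placement values for one model."""
    pos = y_prob[y_true == 1]
    neg = y_prob[y_true == 0]
    psi = (pos[:, None] > neg[None, :]) + 0.5 * (pos[:, None] == neg[None, :])
    return psi.mean(axis=1), psi.mean(axis=0)

def delong_test(y_true, prob_a, prob_b):
    """Paired DeLong test; returns (auc_a, auc_b, z, two-sided p)."""
    y_true = np.asarray(y_true)
    v10_a, v01_a = _placements(y_true, np.asarray(prob_a, dtype=float))
    v10_b, v01_b = _placements(y_true, np.asarray(prob_b, dtype=float))
    auc_a, auc_b = float(v10_a.mean()), float(v10_b.mean())
    # 2x2 covariance matrices of the placement values (rows are models).
    s10 = np.cov(np.vstack([v10_a, v10_b]))
    s01 = np.cov(np.vstack([v01_a, v01_b]))
    s = s10 / len(v10_a) + s01 / len(v01_a)
    var = s[0, 0] + s[1, 1] - 2.0 * s[0, 1]
    if var <= 0:
        # The statistic is undefined when the difference has no variance.
        return auc_a, auc_b, float("nan"), float("nan")
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return auc_a, auc_b, float(z), float(p)
```

With few cases, the variance term is large relative to the AUC difference, so even a visibly higher AUC can yield a non-significant p-value, as seen in the internal validation comparisons.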
We plotted decision curves for the best-performing LR model of each type (the combined conventional radiomics model, the Pre-AP model and the combined deep learning model). The curves showed that the combined deep learning model had the highest clinical benefit of the three models (Fig. 9).
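A decision curve plots net benefit against the threshold probability. The sketch below uses the usual net benefit definition, TP/n minus FP/n weighted by the odds of the threshold; the function names are hypothetical, and the paper does not say which software produced Fig. 9.

```python
import numpy as np

def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    y_true = np.asarray(y_true)
    pred = np.asarray(y_prob, dtype=float) >= threshold
    n = len(y_true)
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    return float(tp / n - fp / n * threshold / (1.0 - threshold))

def treat_all_net_benefit(y_true, threshold):
    """Reference curve: net benefit when every patient is treated."""
    prevalence = np.mean(np.asarray(y_true) == 1)
    return float(prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold))
```

Evaluating `net_benefit` over a grid of thresholds for each model, alongside the treat-all curve and the treat-none line at zero, gives the curves that are compared; the model whose curve lies highest over the relevant threshold range has the greatest net benefit.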
Post hoc power analysis
The results of the post hoc power analysis are presented in Supplementary material 3.
Discussion
Clinically, the imaging findings of HCC and FNH showing iso- or hyperintensity in the HBP are similar, which makes subjective preoperative diagnosis difficult. Gd-EOB-DTPA enhanced MRI is a common clinical examination, and its HBP findings in particular can improve the accuracy of the differential diagnosis of benign and malignant liver tumors. Radiomics has been widely used in the study of liver disease. For example, Yu et al. [28] established intra-tumoral and peritumoral radiomics models to predict vessels encapsulating tumor clusters (VETC) in HCC, and Wu et al. [29] found that MRI radiomics models may be useful for discriminating dual-phenotype hepatocellular carcinoma (DPHCC) from non-DPHCC before surgery. In addition, deep learning [30, 31] has emerged as a state-of-the-art alternative to conventional machine learning methods in many areas. The advantage of our research lies in combining conventional machine learning and deep learning methods.
Our results showed that the AUC of the combined deep learning model was higher than that of the other models, although the differences were not statistically significant. Moreover, the decision curve indicated that the combined deep learning model had the highest clinical benefit, suggesting that deep learning models had higher diagnostic efficacy than the conventional and delta radiomics models.
Delta radiomics [32], which analyzes how features vary across acquisition time points, has demonstrated potential utility for differential diagnosis in oncology. In this study, we extracted radiomics features from images at different scanning time points and subtracted the values of the same features between phases to obtain the delta radiomics features. We then established Pre-AP and AP-PP delta radiomics models using the features obtained from the Pre, AP and PP images. Well-differentiated HCC may not show the typical enhancement pattern, whereas FNH usually enhances markedly in the AP and approaches the enhancement of normal liver parenchyma in the PP; as a result, texture features differ more between the Pre and AP than between the AP and PP. That might explain why the AUCs of the Pre-AP models were higher than those of the AP-PP models. However, the delta radiomics models did not perform as well as the deep learning models in this study.
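The subtraction step can be illustrated in a few lines. This sketch assumes the features of the two phases are stored as aligned arrays (same patients in rows, same features in columns) and that the earlier phase is subtracted from the later one; the direction of subtraction and the function name are assumptions, since the text does not specify them.

```python
import numpy as np

def delta_features(earlier_phase, later_phase):
    """Delta radiomics: later-phase value minus earlier-phase value per feature."""
    earlier = np.asarray(earlier_phase, dtype=float)
    later = np.asarray(later_phase, dtype=float)
    if earlier.shape != later.shape:
        raise ValueError("both phases must hold the same patients and features")
    return later - earlier
```

Under this convention, a Pre-AP delta feature set would be `delta_features(pre, ap)` and an AP-PP set would be `delta_features(ap, pp)`, where `pre`, `ap` and `pp` are hypothetical per-phase feature matrices.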
Overall, the combined deep learning model had the highest discriminative performance, probably because it included both conventional radiomics and deep learning features from the AP, PP and HBP images and therefore captured more comprehensive information. This may explain why the combined deep learning model had a better AUC in discriminating HCC from FNH showing iso- or hyperintensity in the HBP. Diagnostic models that combine deep learning with radiomics may therefore be a useful tool for distinguishing HCC from FNH showing iso- or hyperintensity in the HBP and may support clinical decision-making.
Cui et al. [33] developed a deep learning-based radiomics nomogram (DLRN) to predict the early response to neoadjuvant chemotherapy in gastric cancer. Using the original enhanced CT images, they delineated the VOIs at the largest cross-section of each lesion, extracted radiomics and deep learning features, and built the model. The DLRN performed well in the training, internal validation and external validation cohorts. That study extracted 2D features from the largest slice of each lesion, possibly because gastric cancer lesions have irregular morphology and boundaries on CT images. By contrast, the VOIs in our study covered the entire 3D lesion and therefore contained more information, making the results more comprehensive and objective.
At present, there are few reports on differentiating HCC from FNH. Ding et al. [17] chose eight radiomics features and four clinical factors (age, sex, HBsAg, and enhancement pattern) to establish radiomics, clinical and combined models. Their results showed that the AUC of the combined model was significantly higher than that of the clinical or imaging model in both the training cohort (0.984 vs. 0.937) and the validation cohort (0.972 vs. 0.903), providing a non-invasive quantitative method with high predictive value for differentiating HCC from FNH in non-cirrhotic liver. However, the FNH cases included in that study were confirmed partly by pathological biopsy and partly by radiological diagnosis based on the European Association for the Study of the Liver (EASL) Clinical Practice Guidelines. The advantage of our study is that all the included FNH cases were confirmed by pathological biopsy. In fact, the difficulty in differentiating HCC from FNH lies not in whether both occur in a non-cirrhotic background but in the fact that, when the HCC is well differentiated, the imaging findings of the two are hard to tell apart. As a multicenter study, we focused on differentiating HCC from FNH showing iso- or hyperintensity in the HBP and established differential diagnosis models. Moreover, we set up an external test cohort to verify the repeatability of our models. However, the discriminative performance of the combined deep learning model in our study (internal validation cohort, AUC = 0.965) was lower than that of the clinical-radiomics model in Ding's study (validation cohort, AUC = 0.972), possibly because of the small number of cases in our study.
Kitao et al. [14] analysed the dynamic CT and gadoxetic acid–enhanced MRI findings of hyperintense HCCs, FNHs and FNH-like nodules. Their results showed that the ADC ratio (P = 0.03) and the arterial phase enhancement and washout pattern at dynamic CT (P = 0.04) were independent factors for differentiating hyperintense HCC from FNH. Although that study addressed the identification of HCC and FNH showing hyperintensity in the HBP, its results were based on qualitative analysis that relied on subjective visual judgment of lesion enhancement patterns. Our study employed radiomics to investigate objective, quantitative, high-dimensional image features, which reduces reliance on visual observation and may improve the accuracy of preoperative identification.
Limitations
First, this was a retrospective study with a small sample size; we will continue to collect relevant cases in the future. Second, clinical features were not the focus of this study; if the number of cases can be expanded, clinical and radiomics features could be combined in further work. Finally, only the Pre, AP, PP and HBP images were segmented in this study; other sequences, such as ADC, DWI or T2, could be added later to extend the multi-parameter analysis.
Conclusion
The combined deep learning model showed high sensitivity and specificity in discriminating HCC from FNH showing iso- or hyperintensity in the HBP in the training, internal validation and external test cohorts, with good generalization and repeatability. It may therefore help diagnose HCC preoperatively and assist clinicians in making decisions.
Supplementary Information
Below is the link to the electronic supplementary material.
Source: PubMed Central (JATS). Licensing follows the original publisher's policy; please cite the original article when quoting.