Deep Learning-aided H-MR Spectroscopy for Differentiating between Patients with and without Hepatocellular Carcinoma.
PICO automatic extraction (heuristic, conf 3/4)
P · Population (target patients/population)
10 patients from each group, big spectral datasets were simulated to develop 2 kinds of convolutional neural networks (CNNs): CNNs quantifying 15 metabolites and 5 lipid resonances (qCNNs) and CNNs classifying patients into HBV-LC and HBV-LC-HCC (cCNNs).
I · Intervention (intervention/procedure)
MRI for HCC surveillance, without HCC (HBV-LC group, n = 20) and with HCC (HBV-LC-HCC group, n = 17)
C · Comparison (control/comparison)
Not extracted
O · Outcome (result/conclusion)
The cCNNs exhibited sensitivity, specificity, and accuracy of 100% (7/7), 90% (9/10), and 94% (16/17), respectively, for identifying the HBV-LC-HCC group. [CONCLUSION] Deep-learning-aided H-MRS with data augmentation by spectral simulation may have potential in differentiating between HBV-LC patients with and without HCC.
[PURPOSE] Among patients with hepatitis B virus-associated liver cirrhosis (HBV-LC), there may be differences in the hepatic parenchyma between those with and without hepatocellular carcinoma (HCC).
- Sample size (n): 20
- p-value: P ≤ 0.004
APA
Bae JS, Lee HH, et al. (2025). Deep Learning-aided H-MR Spectroscopy for Differentiating between Patients with and without Hepatocellular Carcinoma. Magnetic resonance in medical sciences : MRMS : an official journal of Japan Society of Magnetic Resonance in Medicine, 24(4). https://doi.org/10.2463/mrms.mp.2025-0064
MLA
Bae JS, et al. "Deep Learning-aided H-MR Spectroscopy for Differentiating between Patients with and without Hepatocellular Carcinoma." Magnetic resonance in medical sciences : MRMS : an official journal of Japan Society of Magnetic Resonance in Medicine, vol. 24, no. 4, 2025.
PMID
40790529
Abstract
[PURPOSE] Among patients with hepatitis B virus-associated liver cirrhosis (HBV-LC), there may be differences in the hepatic parenchyma between those with and without hepatocellular carcinoma (HCC). Proton MR spectroscopy (H-MRS) is a well-established tool for noninvasive metabolomics, but has been challenging in the liver allowing only a few metabolites to be detected other than lipids. This study aims to explore the potential of H-MRS of the liver in conjunction with deep learning to differentiate between HBV-LC patients with and without HCC.
[METHODS] Between August 2018 and March 2021, H-MRS data were collected from 37 HBV-LC patients who underwent MRI for HCC surveillance, without HCC (HBV-LC group, n = 20) and with HCC (HBV-LC-HCC group, n = 17). Based on a priori knowledge from the first 10 patients from each group, big spectral datasets were simulated to develop 2 kinds of convolutional neural networks (CNNs): CNNs quantifying 15 metabolites and 5 lipid resonances (qCNNs) and CNNs classifying patients into HBV-LC and HBV-LC-HCC (cCNNs). The performance of the cCNNs was assessed using the remaining patients in the 2 groups (10 HBV-LC and 7 HBV-LC-HCC patients).
[RESULTS] Using a simulated dataset, the quantitative errors with the qCNNs were significantly lower than those with a conventional nonlinear-least-squares-fitting method for all metabolites and lipids (P ≤0.004). The cCNNs exhibited sensitivity, specificity, and accuracy of 100% (7/7), 90% (9/10), and 94% (16/17), respectively, for identifying the HBV-LC-HCC group.
[CONCLUSION] Deep-learning-aided H-MRS with data augmentation by spectral simulation may have potential in differentiating between HBV-LC patients with and without HCC.
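The reported sensitivity, specificity, and accuracy follow directly from the counts given in the results. A minimal sketch of that arithmetic, in which the function name and the labelling of HBV-LC-HCC as the positive class are illustrative choices rather than part of the study's code:

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity, and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Counts implied by the abstract: 7/7 HBV-LC-HCC and 9/10 HBV-LC correctly identified.
sens, spec, acc = diagnostic_metrics(tp=7, fn=0, tn=9, fp=1)
print(f"sensitivity={sens:.0%} specificity={spec:.0%} accuracy={acc:.0%}")
```

With only 17 test patients, a single misclassified case shifts accuracy by roughly 6 percentage points, which is worth keeping in mind when reading these percentages.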
Keywords / MeSH
- Humans
- Deep Learning
- Liver Neoplasms
- Carcinoma, Hepatocellular
- Male
- Female
- Middle Aged
- Proton Magnetic Resonance Spectroscopy
- Diagnosis, Differential
- Aged
- Liver
- Liver Cirrhosis
- Adult
- Hepatitis B
- Neural Networks, Computer
- Magnetic Resonance Imaging
- deep learning
- hepatitis B virus
- hepatocellular carcinoma
- liver cirrhosis
- proton magnetic resonance spectroscopy
Introduction
Hepatocellular carcinoma (HCC) is the sixth most commonly diagnosed malignancy and the third leading cause of cancer-related death globally.1 Liver cirrhosis caused by chronic infection with the hepatitis B virus (HBV-LC) is one of the most common causes of HCC.2,3 Therefore, major international guidelines4–7 recommend that patients with HBV-LC undergo surveillance for HCC detection. However, not all patients with HBV-LC develop HCC, which may suggest differences in the hepatic parenchyma between patients with and without HCC.
MR spectroscopy (MRS) is a well-established tool for noninvasive metabolomics. Among the MR-detectable nuclei, the proton (1H) is regarded as the choice of nucleus in terms of MR sensitivity,8 allowing up to approximately 20 metabolites to be quantified in the brain on clinical scanners.9 However, MRS of the liver is challenging owing to motion, which results in poor field homogeneity, and consequently, a more severe spectral overlap between the metabolite signals.8 Furthermore, in proton MRS (1H-MRS), the strong lipid signals that are frequently observed in diseased livers exacerbate the difficulty of quantifying other metabolites in addition to their intrinsically low concentrations.8,10 Thus, 1H-MRS of the liver has mainly been used for lipid quantification11 and only a few metabolites other than lipids have typically been reported using clinical scanners such as glutamate-glutamine complex (Glx), glycogen (Glyc), and total choline (tCho).12,13 The limited number of quantifiable metabolites is a major hindrance to the further exploration of the potential of 1H-MRS in liver diseases.
The application of deep learning in medical imaging is rapidly increasing14 owing to its remarkable achievements. Its potential has also been reported in 1H-MRS of the brain for metabolite quantification15–20 and spectra classification.21,22 A deep learning approach typically requires a large amount of training data, which is particularly important for medical applications. However, in 1H-MRS, a large training dataset can be simulated based on a priori knowledge, as performed routinely in previous deep learning studies,15,17–20,22–24 which can further facilitate deep-learning-aided 1H-MRS (DL-1H-MRS) research.
In this study, we aimed to explore the potential of DL-1H-MRS of the liver in differentiating between HBV-LC patients with and without HCC using a limited amount of true 1H-MRS patient spectra that were collected in the liver parenchyma without lesions.
Materials and Methods
This prospective, single-center study was approved by the institutional review board of the Seoul National University Hospital (H-1804-174-944) and informed consent was obtained from all participants.
Participants and final MRS dataset
Participants who had liver cirrhosis related to chronic HBV infection and were scheduled to undergo liver MRI for HCC surveillance or HCC diagnosis were consecutively enrolled (Fig. 1) between August 2018 and March 2021. MRI examinations were clinically indicated, and patients who agreed to participate underwent additional MRS during the same session. The presence of liver cirrhosis was determined based on the typical morphologic features including surface nodularity on previous imaging studies. The participants were categorized into 2 groups according to the formal radiologic MRI report: without HCC (group 1; HBV-LC) and with HCC (group 2; HBV-LC-HCC). The presence of HCC on the surveillance MRI was determined based on the noninvasive criteria of the Korean Liver Cancer Association and National Cancer Center guideline.7 The exclusion criteria were: (a) a previous history of HCC treatment, (b) erroneous MRS data, and (c) suboptimal MRS data quality. A total of 40 participants were enrolled, with 20 participants in each group. The sample size was determined by considering the available funding for MRS examinations and the duration of the study period. Additionally, as this was a preliminary study designed to explore the novel hypothesis that metabolite profiles of liver parenchyma differ between HBV-LC patients with and without HCC, there were no prior studies or established data available for a formal sample size calculation. Thus, our study primarily aims to test the feasibility and potential value of this novel approach, providing foundational data to guide future larger-scale studies.
Among the MRS datasets from the 40 participants, 3 were excluded because of suboptimal spectral quality (n = 2) or an incorrect raw data format (n = 1). Among the remaining 37 MRS datasets, the first 10 from each of groups 1 and 2 were combined to develop the deep learning models (In Vivo Dataset I [n = 20]). The remaining datasets from groups 1 and 2 were combined to test the models (In Vivo Dataset II [n = 17; 10 from group 1 and 7 from group 2]).
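The split into development and test data can be sketched as follows. The patient identifiers are hypothetical placeholders, and the assumption that "first" refers to enrollment order is an interpretation of the text rather than something stated explicitly.

```python
def split_by_enrollment_order(group_ids, n_development=10):
    """Split one group's IDs (assumed to be in enrollment order) into two parts."""
    return group_ids[:n_development], group_ids[n_development:]


# Hypothetical placeholder IDs; the real identifiers are not given in the text.
hbv_lc = [f"LC-{i:02d}" for i in range(1, 21)]        # 20 usable spectra (group 1)
hbv_lc_hcc = [f"HCC-{i:02d}" for i in range(1, 18)]   # 17 usable spectra (group 2)

lc_dev, lc_test = split_by_enrollment_order(hbv_lc)
hcc_dev, hcc_test = split_by_enrollment_order(hbv_lc_hcc)

in_vivo_dataset_1 = lc_dev + hcc_dev     # used to develop the models
in_vivo_dataset_2 = lc_test + hcc_test   # held out for the final test
print(len(in_vivo_dataset_1), len(in_vivo_dataset_2))
```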
In vivo 1H-MRS data acquisition
The MR data were collected using a 3.0T Siemens Magnetom Skyra MR scanner with a system body coil for RF transmission and an 18-channel body coil for signal reception (Siemens Healthineers, Erlangen, Germany). Scout images were collected during breath-holding (expiration state) along all 3 orthogonal directions using a T2-weighted echo-planar fast spin echo sequence. Based on the scout images, an MRS voxel (2 × 2 × 2 cm³) was positioned in the right lobe of the liver without lesions (primarily segment VI) avoiding major blood vessels and the bile duct (Fig. S1).
1H-MRS data were collected with and without water suppression using a respiratory navigator-gated (expiration state) point resolved spectroscopy sequence (PRESS;25 TR/TE = 2000/33 ms, spectral bandwidth = 1200 Hz, data points = 1024, number of signal averages = 48 [with water suppression] or 4 [without water suppression]). An auto-shimming was performed over the MRS voxel prior to the data acquisition. The total scan time for both MRI and MRS was approximately 20 min.
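A few derived quantities follow from the stated PRESS parameters. The sketch below is plain arithmetic on those values; the minimum scan time is only a lower bound, because respiratory navigator gating (and any preparation scans, which the text does not describe) would lengthen the actual acquisition.

```python
TR_S = 2.0                  # repetition time (2000 ms)
BANDWIDTH_HZ = 1200.0       # spectral bandwidth
N_POINTS = 1024             # sampled data points
AVERAGES_SUPPRESSED = 48    # with water suppression
AVERAGES_UNSUPPRESSED = 4   # without water suppression

readout_s = N_POINTS / BANDWIDTH_HZ        # duration of the sampled signal
resolution_hz = BANDWIDTH_HZ / N_POINTS    # digital resolution per point
# Lower bound only: gating to the expiration state adds waiting time.
min_scan_s = TR_S * (AVERAGES_SUPPRESSED + AVERAGES_UNSUPPRESSED)

print(f"readout: {readout_s:.3f} s")
print(f"digital resolution: {resolution_hz:.3f} Hz/point")
print(f"minimum MRS scan time: {min_scan_s:.0f} s")
```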
Strategies for DL-1H-MRS of the liver
Our ultimate goal was to develop convolutional neural networks (CNNs) that could classify patient liver spectra into the HBV-LC and HBV-LC-HCC groups (cCNNs) with limited true in vivo data. Therefore, the simulation of large training datasets was crucial, which should well represent the distributions of the hepatic metabolite and lipid contents of the 2 patient groups. This would require an accurate initial quantitative analysis of In Vivo Dataset I, which was used as the representative data for the 2 patient groups. However, 1H-MRS of the liver has been challenging to date, as it allows only a few metabolites to be quantified.8,10,12,13 To address this issue, we first developed CNNs for the quantification of up to 15 metabolites and 5 lipid resonances (qCNNs) using a preliminary simulated dataset (Simulated Dataset I [see Simulation and preparation of spectra below]). In the training of the qCNN, the extraction of detailed features simultaneously from both metabolite and lipid signals can be challenging in the presence of strong lipid signals in liver 1H-MRS spectra. Therefore, 2 separate qCNNs were developed: one for the quantification of metabolites only (qCNNmetab) and the other for the quantification of lipids only (qCNNlipid). The performances of the qCNNs were compared with that of the conventional nonlinear-least-squares fitting (NLSF) method on another simulated dataset (Simulated Dataset II [see Simulation and preparation of spectra below]).
Thereafter, the HBV-LC and HBV-LC-HCC patient data from In Vivo Dataset I were analyzed using the qCNNs. Then, to minimize potential overfitting and biased estimation of the cCNNs resulting from the small amount of data in In Vivo Dataset I, it was rearranged into 10 folds. Subsequently, the quantitative results were used to simulate the training dataset for the cCNNs (Simulated Dataset III [see Simulation and preparation of spectra below]). Finally, 1 cCNN was developed for each fold. The performances of the cCNNs were tested on In Vivo Dataset II, using an aggregate prediction approach (majority voting from the 10 cCNNs). The performances of the cCNNs were compared with that of a conventional linear discriminant analysis (LDA)-based classifier. The relationships among the in vivo datasets, simulated datasets, and CNNs are summarized in Fig. 2.
Simulation and preparation of spectra
In Vivo Dataset I was analyzed using AMARES26 in the jMRUI software package (v. 6.0)27 to acquire a priori knowledge for the simulation of patient liver spectra. A SNR range of 16.08–126.63 (57.44 ± 31.55), a linewidth range of 14.67–30.43 (20.67 ± 4.46) Hz, and a mean metabolite-to-lipid ratio of 1.853–2.955 (2.335 ± 0.324) were obtained (Supplementary Information 2). The same analysis was performed on In Vivo Dataset II for comparison.
A spectral basis set was prepared for NLSF and the simulation of the training datasets for the deep learning (Supplementary Information 3) based on the previous studies.13,28,29 A total of 15 metabolites were included: alanine (Ala), aspartate (Asp), citrate (Cit), creatine (Cr), glutamine (Gln), glutamate (Glu), glycerophosphocholine (GPC), glycine (Gly), Glyc, lactate (Lac), phosphocreatine (PCr), phosphocholine (PC), succinate (Suc), taurine (Tau), and threonine (Thr). The lipid signal was modeled using 5 resonance groups (0.9, 1.3, 2.05, 2.25, and 2.80 ppm, which were denoted as Lip 09, 13, 205, 225, and 280, respectively).
Simulated Dataset I was generated using in-house software written in Python (v. 3.7) according to previous studies18,23 (Supplementary Information 4) based on the prior knowledge from In Vivo Dataset I (the ranges of SNR, linewidth, and metabolite-to-lipid ratio) and the literature13,28,29 (metabolites and their relative concentrations) and the spectral basis set. A total of 100000 spectra were simulated and randomly divided into training (n = 90000), validation (n = 5000), and test (n = 5000) sets.
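The random partition of the 100000 simulated spectra can be illustrated with a short index-based sketch. The function name and the fixed seed are assumptions made for reproducibility here; the in-house software is not described at this level of detail.

```python
import random


def split_indices(n_total, n_train, n_val, n_test, seed=0):
    """Randomly partition range(n_total) into train, validation, and test indices."""
    if n_train + n_val + n_test != n_total:
        raise ValueError("split sizes must sum to n_total")
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)  # seeded shuffle; the seed is an assumption
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(100_000, 90_000, 5_000, 5_000)
print(len(train_idx), len(val_idx), len(test_idx))
```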
Simulated Dataset II was generated (n = 100) (Supplementary Information 4) to compare the performances of the qCNNs and NLSF in metabolite and lipid quantification. The concentrations of the individual metabolites and lipids were considered within their respective mean ± standard deviation (SD) values that were obtained from In Vivo Dataset I (without distinction between HBV-LC and HBV-LC-HCC) to enable a comparison in realistic concentration ranges in the patient livers.
In Vivo Dataset I was analyzed using the qCNNs for the generation of Simulated Dataset III (Supplementary Information 4). Thereafter, In Vivo Dataset I was rearranged into 10 folds, each of which consisted of training and test sets. The training set contained 9 HBV-LC and 9 HBV-LC-HCC patient data, whereas the test set consisted of 1 HBV-LC and 1 HBV-LC-HCC patient data. The training sets were used for the generation of Simulated Dataset III and the test sets were used for the evaluation of the accuracy of the cCNNs in the training phase (the final evaluation of the accuracy of the cCNNs is performed in the test phase on In Vivo Dataset II). Subsequently, 100000 spectra were simulated (50000 spectra for each of the HBV-LC and HBV-LC-HCC groups) for each fold. The concentrations of the individual metabolites and lipids were considered within their respective mean ± SD values that were obtained from In Vivo Dataset I for each patient group.
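The 10-fold arrangement described above can be sketched as below. How the held-out HBV-LC and HBV-LC-HCC patients were paired in each fold is not stated, so pairing them by list position is an assumption, and the patient IDs are hypothetical.

```python
def make_folds(lc_ids, hcc_ids):
    """Build one fold per position, holding out one patient from each group."""
    if len(lc_ids) != len(hcc_ids):
        raise ValueError("both groups must contain the same number of patients")
    folds = []
    for k in range(len(lc_ids)):
        test = [lc_ids[k], hcc_ids[k]]  # pairing by position is an assumption
        train = [p for p in lc_ids + hcc_ids if p not in test]
        folds.append({"train": train, "test": test})
    return folds


# Hypothetical IDs for the 10 + 10 patients of In Vivo Dataset I.
lc_patients = [f"LC-{i:02d}" for i in range(1, 11)]
hcc_patients = [f"HCC-{i:02d}" for i in range(1, 11)]
folds = make_folds(lc_patients, hcc_patients)
print(len(folds), len(folds[0]["train"]), len(folds[0]["test"]))
```

Each fold's training patients would then supply the group-wise mean ± SD concentrations used to simulate that fold's 100000 spectra.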
For deep learning, all simulated and in vivo spectra were cropped to the 0.2–4.0 ppm range and normalized. For in vivo data, the residual water signal was removed (Supplementary Information 4).
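A rough sketch of the cropping and normalization step is given below. Several details are assumptions rather than facts from the text: a proton frequency of about 123.2 MHz at 3.0T, a spectrum centered on water at 4.7 ppm, and normalization by the maximum magnitude (the normalization method is described only in the supplementary material). The toy spectrum is synthetic.

```python
def ppm_axis(n_points, bandwidth_hz, larmor_mhz, center_ppm):
    """Chemical shift of each point, assuming the spectrum is centered on center_ppm."""
    step_ppm = bandwidth_hz / n_points / larmor_mhz
    return [center_ppm + (n_points / 2 - i) * step_ppm for i in range(n_points)]


def crop_and_normalize(spectrum, ppm, low=0.2, high=4.0):
    """Keep points within [low, high] ppm and scale so the largest magnitude is 1."""
    cropped = [s for s, p in zip(spectrum, ppm) if low <= p <= high]
    peak = max(abs(s) for s in cropped)
    if peak == 0:
        raise ValueError("cropped spectrum contains no signal")
    return [s / peak for s in cropped]  # max-magnitude scaling is an assumption


# Assumed values: 123.2 MHz proton frequency, water reference at 4.7 ppm.
ppm = ppm_axis(n_points=1024, bandwidth_hz=1200.0, larmor_mhz=123.2, center_ppm=4.7)
toy_spectrum = [complex(1 + i % 7, -(i % 3)) for i in range(1024)]  # synthetic data
normalized = crop_and_normalize(toy_spectrum, ppm)
print(len(normalized))
```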
NLSF
Conventional NLSF was performed using QUEST30 in jMRUI,27 which is widely used for NLSF together with LCModel.9 The same metabolite and lipid basis sets that were used for the simulation of the spectra were used (Supplementary Information 5). In addition to the 15 metabolites, the contents of the Glx (= Gln + Glu), tCho (= PC + GPC), and total creatine (tCr = Cr + PCr) were obtained. For the in vivo data, the resulting metabolite content was expressed in mmol/kg wet weight using water-unsuppressed spectra.13,31 For the lipids, the fitted amplitude from the nonlinear fitting was reported for the individual resonances.
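The combined quantities defined above are simple sums of the individual estimates. A minimal sketch, using made-up numbers that are not measurements from the study:

```python
# Definitions taken from the text: Glx = Gln + Glu, tCho = PC + GPC, tCr = Cr + PCr.
COMBINED = {"Glx": ("Gln", "Glu"), "tCho": ("PC", "GPC"), "tCr": ("Cr", "PCr")}


def add_combined(contents):
    """Return a copy of the metabolite contents with the combined sums added."""
    result = dict(contents)
    for name, parts in COMBINED.items():
        result[name] = sum(contents[part] for part in parts)
    return result


# Hypothetical values for illustration only.
example = {"Gln": 1.0, "Glu": 2.0, "PC": 0.5, "GPC": 0.25, "Cr": 0.75, "PCr": 0.25}
with_sums = add_combined(example)
print(with_sums["Glx"], with_sums["tCho"], with_sums["tCr"])
```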
Deep learning
All CNNs were developed using PyTorch 1.11 implemented in Python 3.7.11, with 4 GPUs (NVIDIA Titan RTX). A detailed description of the design and training of the CNNs is provided in Supplementary Information 6.
qCNNmetab and qCNNlipid were developed on Simulated Dataset I to quantify the metabolites and lipids, respectively (Fig. 2). qCNNs use the real and imaginary spectra in a single channel as the input; thus, they are complex-valued CNNs.32,33 The CNNs yield a noise-free, line-narrowed, metabolite-only (qCNNmetab) or lipid-only (qCNNlipid) real spectrum as the output to facilitate the subsequent quantitative analysis by improving the SNR and reducing the degree of spectral overlap, as previously reported.18,23 The quantification of the individual metabolites and lipids was achieved by separately performing multiple regression on the metabolite-only and lipid-only output spectra using the corresponding basis sets.18,23 For the in vivo data, the resulting metabolite content was converted into mmol/kg. For the lipids, the fitted amplitudes from the multiple regression (linear fitting) were reported for the individual resonances.
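The multiple-regression step can be pictured as an ordinary linear least-squares fit of the network's output spectrum onto the basis spectra. The sketch below solves the normal equations in plain Python on a toy two-component basis; whether the study's regression used constraints or an intercept is not stated here, so an unconstrained fit is an assumption, and all names and numbers are illustrative.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("basis spectra are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_amplitudes(basis, spectrum):
    """Least-squares amplitudes a_k minimizing ||sum_k a_k * basis[k] - spectrum||."""
    gram = [[sum(u * v for u, v in zip(bi, bj)) for bj in basis] for bi in basis]
    rhs = [sum(u * v for u, v in zip(bi, spectrum)) for bi in basis]
    return solve_linear_system(gram, rhs)


# Toy basis of two overlapping "resonances" and a spectrum built as 3.0*b0 + 0.5*b1.
basis = [[1.0, 2.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 2.0, 1.0]]
spectrum = [3.0, 6.0, 3.5, 1.0, 0.5]
amplitudes = fit_amplitudes(basis, spectrum)
print([round(a, 6) for a in amplitudes])
```

For basis sets of realistic size, a library routine such as numpy.linalg.lstsq would be the more practical choice than hand-written elimination.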
The cCNNs were developed for the binary classification of the input spectra into HBV-LC and HBV-LC-HCC (Fig. 2) for each fold of Simulated Dataset III. The cCNNs also use the real and imaginary spectra in a single channel as the input (complex-valued CNNs) but yield a value between 0 and 1 as the output, which can be considered as the probability of belonging to the HBV-LC-HCC class. Accordingly, the labels of the HBV-LC and HBV-LC-HCC groups were set to 0 and 1, respectively, for training. In the test phase, all 10 cCNNs were applied to each sample in In Vivo Dataset II and the majority class was selected as the predicted class for the sample (majority voting). To identify the spectral regions that have a significant influence on the cCNNs’ decision, Grad-CAM analyses34 were performed.
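The majority-voting rule across the 10 cCNNs can be sketched as follows. The 0.5 decision threshold and the handling of a 5-to-5 tie are assumptions, since neither is specified in the text, and the example outputs are hypothetical.

```python
def majority_vote(probabilities, threshold=0.5):
    """Return 1 (HBV-LC-HCC) or 0 (HBV-LC) from per-model output probabilities."""
    votes_hcc = sum(1 for p in probabilities if p >= threshold)
    votes_lc = len(probabilities) - votes_hcc
    if votes_hcc == votes_lc:
        # The text does not say how ties are resolved, so none is guessed here.
        raise ValueError("tie between classes; no tie-breaking rule is defined")
    return 1 if votes_hcc > votes_lc else 0


# Hypothetical outputs of the 10 cCNNs for one test spectrum.
outputs = [0.91, 0.78, 0.66, 0.52, 0.48, 0.83, 0.95, 0.37, 0.71, 0.60]
predicted_class = majority_vote(outputs)
print(predicted_class)
```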
LDA
An LDA model was developed for each fold (Supplementary Information 7) using the 10-fold In Vivo Dataset I that was used to develop the cCNNs. Only the metabolites and lipids that showed statistical differences in the qCNN-estimated content between the HBV-LC and HBV-LC-HCC groups in In Vivo Dataset I were included. The same procedure was repeated with the inclusion of all metabolites and lipids to generate another group of 10 LDA models for comparison. The performances of the LDA models were evaluated using In Vivo Dataset II by employing majority voting, as with the cCNNs.
Statistical analysis
Statistical analyses were performed using R 3.6.2 (R Foundation for Statistical Computing, Vienna, Austria). Two-tailed Student’s t-tests were performed for pair-wise group comparisons. For multiple pair-wise group comparisons, the Bonferroni correction was applied by adjusting P values. The statistical significance level was set to P < 0.05.
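Adjusting P values with the Bonferroni correction amounts to multiplying each raw P value by the number of comparisons and capping the result at 1. The sketch below uses made-up P values; the number of comparisons in the study is not given in this section, and the analysis itself was run in R.

```python
def bonferroni_adjust(p_values):
    """Multiply each P value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw P values from three pair-wise comparisons.
raw_p = [0.001, 0.02, 0.3]
adjusted_p = bonferroni_adjust(raw_p)
significant = [p < 0.05 for p in adjusted_p]
print(significant)
```

In this toy case the second comparison is significant before adjustment (0.02 < 0.05) but not after it, which is the intended conservative effect of the correction.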
The developer of the deep learning and LDA models was completely blinded to the clinical status of the patients in In Vivo Dataset II throughout the study. The sensitivity (correct identification of HBV-LC-HCC), specificity (correct identification of HBV-LC), and overall diagnostic accuracy were examined in the final blinded tests of the cCNN and LDA models.
This prospective, single-center study was approved by the institutional review board of the Seoul National University Hospital (H-1804-174-944) and informed consent was obtained from all participants.
Participants and final MRS dataset
Participants who had liver cirrhosis related to chronic HBV infection and were scheduled to undergo liver MRI for HCC surveillance or HCC diagnosis were consecutively enrolled (Fig. 1) between August 2018 and March 2021. MRI examinations were clinically indicated, and patients who agreed to participate underwent additional MRS during the same session. The presence of liver cirrhosis was determined based on the typical morphologic features including surface nodularity on previous imaging studies. The participants were categorized into 2 groups according to the formal radiologic MRI report: without HCC (group 1; HBV-LC) and with HCC (group 2; HBV-LC-HCC). The presence of HCC on the surveillance MRI was made based on the noninvasive criteria by the Korean Liver Cancer Association and National Cancer Center guideline.7 The exclusion criteria were: (a) a previous history of HCC treatment, (b) erroneous MRS data, and (c) suboptimal MRS data quality. A total of 40 participants were enrolled, with 20 participants in each group. The sample size was determined by considering the available funding for MRS examinations and the duration of the study period. Additionally, as this was a preliminary study designed to explore the novel hypothesis that metabolite profiles of liver parenchyma differ between HBV-LC patients with and without HCC, there were no prior studies or established data available for a formal sample size calculation. Thus, our study primarily aims to test the feasibility and potential value of this novel approach, providing foundational data to guide future larger-scale studies.
Among the MRS data from the 40 participants, 3 data were excluded because of suboptimal spectral quality (n = 2) or an incorrect raw data format (n = 1). Among the remaining 37 MRS data, the first 10 data from groups 1 and 2 were combined to develop the deep learning models (In Vivo Dataset I (n = 20)). The remaining data from groups 1 and 2 were combined to test the models (In Vivo Dataset II [n = 17; 10 from group 1 and 7 from group 2]).
In vivo 1H-MRS data acquisition
The MR data were collected using a 3.0T Siemens Magnetom Skyra MR scanner with a system body coil for RF transmission and an 18-channel body coil for signal reception (Siemens Healthineers, Erlangen, Germany). Scout images were collected during breath-holding (expiration state) along all 3 orthogonal directions using a T2-weighted echo-planar fast spin echo sequence. Based on the scout images, an MRS voxel (2 × 2 × 2 cm3) was positioned in the right lobe of the liver without lesions (primarily segment VI) avoiding major blood vessels and the bile duct (Fig. S1).
1H-MRS data were collected with and without water suppression using a respiratory navigator-gated (expiration state) point resolved spectroscopy sequence (PRESS;25 TR/TE = 2000/33 ms, spectral bandwidth = 1200 Hz, data points = 1024, number of signal averages = 48 [with water suppression] or 4 [without water suppression]). An auto-shimming was performed over the MRS voxel prior to the data acquisition. The total scan time for both MRI and MRS was approximately 20 min.
Strategies for DL-1H-MRS of the liver
Our ultimate goal was to develop convolutional neural networks (CNNs) that could classify patient liver spectra into the HBV-LC and HBV-LC-HCC groups (cCNNs) with limited true in vivo data. Therefore, the simulation of large training datasets was crucial, which should well represent the distributions of the hepatic metabolite and lipid contents of the 2 patient groups. This would require an accurate initial quantitative analysis of In Vivo Dataset I, which was used as the representative data for the 2 patient groups. However, 1H-MRS of the liver has been challenging to date, as it allows only a few metabolites to be quantified.8,10,12,13 To address this issue, we first developed CNNs for the quantification of up to 15 metabolites and 5 lipid resonances (qCNNs) using a preliminary simulated dataset (Simulated Dataset I [see Simulation and preparation of spectra below]). In the training of the qCNN, the extraction of detailed features simultaneously from both metabolite and lipid signals can be challenging in the presence of strong lipid signals in liver 1H-MRS spectra. Therefore, 2 separate qCNNs were developed: one for the quantification of metabolites only (qCNNmetab) and the other for the quantification of lipids only (qCNNlipid). The performances of the qCNNs were compared with that of the conventional nonlinear-least-squares fitting (NLSF) method on another simulated dataset (Simulated Dataset II [see Simulation and preparation of spectra below]).
Thereafter, the HBV-LC and HBV-LC-HCC patient data from In Vivo Dataset I were analyzed using the qCNNs. Then, to minimize potential overfitting and biased estimation of the cCNNs resulting from the small amount of data in In Vivo Dataset I, it was rearranged into 10 folds. Subsequently, the quantitative results were used to simulate the training dataset for the cCNNs (Simulated Dataset III [see Simulation and preparation of spectra below]). Finally, 1 cCNN was developed for each fold. The performances of the cCNNs were tested on In Vivo Dataset II, using an aggregate prediction approach (majority voting from the 10 cCNNs). The performances of the cCNNs were compared with that of a conventional linear discriminant analysis (LDA)-based classifier. The relationships among the in vivo datasets, simulated datasets, and CNNs are summarized in Fig. 2.
Simulation and preparation of spectra
In Vivo Dataset I was analyzed using AMARES26 in the jMRUI software package (v. 6.0)27 to acquire a priori knowledge for the simulation of patient liver spectra. A SNR range of 16.08–126.63 (57.44 ± 31.55), a linewidth range of 14.67–30.43 (20.67 ± 4.46) Hz, and a mean metabolite-to-lipid ratio of 1.853–2.955 (2.335 ± 0.324) were obtained (Supplementary Information 2). The same analysis was performed on In Vivo Dataset II for comparison.
A spectral basis set was prepared for NLSF and the simulation of the training datasets for the deep learning (Supplementary Information 3) based on the previous studies.13,28,29 A total of 15 metabolites were included: alanine (Ala), aspartate (Asp), citrate (Cit), creatine (Cr), glutamine (Gln), glutamate (Glu), glycerophosphocholine (GPC), glycine (Gly), Glyc, lactate (Lac), phosphocreatine (PCr), phosphocholine (PC), succinate (Suc), taurine (Tau), and threonine (Thr). The lipid signal was modeled using 5 resonance groups (0.9, 1.3, 2.05, 2.25, and 2.80 ppm, which were denoted as Lip 09, 13, 205, 225, and 280, respectively).
Simulated Dataset I was generated using in-house software written in Python (v. 3.7) according to previous studies18,23 (Supplementary Information 4) based on the prior knowledge from In Vivo Dataset I (the ranges of SNR, linewidth, and metabolite-to-lipid ratio) and the literature13,28,29 (metabolites and their relative concentrations) and the spectral basis set. A total of 100000 spectra were simulated and randomly divided into training (n = 90000), validation (n = 5000), and test (n = 5000) sets.
Simulated Dataset II was generated (n = 100) (Supplementary Information 4) to compare the performances of the qCNNs and NLSF in metabolite and lipid quantification. The concentrations of the individual metabolites and lipids were considered within their respective mean ± standard deviation (SD) values that were obtained from In Vivo Dataset I (without distinction between HBV-LC and HBV-LC-HCC) to enable a comparison in realistic concentration ranges in the patient livers.
In Vivo Dataset I was analyzed using the qCNNs for the generation of Simulated Dataset III (Supplementary Information 4). Thereafter, In Vivo Dataset I was rearranged into 10 folds, each of which consisted of training and test sets. The training set contained 9 HBV-LC and 9 HBV-LC-HCC patient data, whereas the test set consisted of 1 HBV-LC and 1 HBV-LC-HCC patient data. The training sets were used for the generation of Simulated Dataset III and the test sets were used for the evaluation of the accuracy of the cCNNs in the training phase (the final evaluation of the accuracy of the cCNNs is performed in the test phase on In Vivo Dataset II). Subsequently, 100000 spectra were simulated (50000 spectra for each of the HBV-LC and HBV-LC-HCC groups) for each fold. The concentrations of the individual metabolites and lipids were considered within their respective mean ± SD values that were obtained from In Vivo Dataset I for each patient group.
For deep learning, all simulated and in vivo spectra were cropped to the 0.2–4.0 ppm range and normalized. For in vivo data, the residual water signal was removed (Supplementary Information 4).
NLSF
Conventional NLSF was performed using QUEST30 in jMRUI,27 which, along with LCModel,9 is widely used for NLSF. The same metabolite and lipid basis sets that were used for the simulation of the spectra were used (Supplementary Information 5). In addition to the 15 metabolites, the contents of the Glx (= Gln + Glu), tCho (= PC + GPC), and total creatine (tCr = Cr + PCr) were obtained. For the in vivo data, the resulting metabolite content was expressed in mmol/kg wet weight using water-unsuppressed spectra.13,31 For the lipids, the fitted amplitude from the nonlinear fitting was reported for the individual resonances.
Deep learning
All CNNs were developed using PyTorch 1.11 implemented in Python 3.7.11, with 4 GPUs (NVIDIA Titan RTX). A detailed description of the design and training of the CNNs is provided in Supplementary Information 6.
qCNNmetab and qCNNlipid were developed on Simulated Dataset I to quantify the metabolites and lipids, respectively (Fig. 2). qCNNs use the real and imaginary spectra in a single channel as the input; thus, they are complex-valued CNNs.32,33 The CNNs yield a noise-free, line-narrowed, metabolite-only (qCNNmetab) or lipid-only (qCNNlipid) real spectrum as the output to facilitate the subsequent quantitative analysis by improving the SNR and reducing the degree of spectral overlap, as previously reported.18,23 The quantification of the individual metabolites and lipids was achieved by separately performing multiple regression on the metabolite-only and lipid-only output spectra using the corresponding basis sets.18,23 For the in vivo data, the resulting metabolite content was converted into mmol/kg. For the lipids, the fitted amplitudes from the multiple regression (linear fitting) were reported for the individual resonances.
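The linear fitting of a qCNN output spectrum with a basis set can be illustrated with ordinary least squares. This is a sketch under stated assumptions: the text says only that multiple regression was used, so the unconstrained solver, the function name `fit_basis`, and the toy basis spectra are all hypothetical.

```python
def fit_basis(spectrum, basis):
    """Least squares: amplitudes a minimizing ||spectrum - sum_k a_k * basis_k||."""
    n = len(basis)
    # Normal equations (B^T B) a = B^T y, stored as an augmented matrix.
    aug = []
    for i in range(n):
        row = [sum(p * q for p, q in zip(basis[i], basis[j])) for j in range(n)]
        row.append(sum(p * y for p, y in zip(basis[i], spectrum)))
        aug.append(row)
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("basis spectra are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    amps = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * amps[j] for j in range(i + 1, n))
        amps[i] = (aug[i][n] - tail) / aug[i][i]
    return amps


# Two made-up basis spectra and a spectrum built as 2.0 * b0 + 0.5 * b1.
b0 = [0.0, 1.0, 0.0, 0.0, 0.2]
b1 = [0.0, 0.0, 1.0, 0.5, 0.0]
spectrum = [2.0 * x + 0.5 * y for x, y in zip(b0, b1)]
amplitudes = fit_basis(spectrum, [b0, b1])
```

Because the network output is meant to be noise-free and line-narrowed, a linear fit of this kind can replace the nonlinear fit that raw spectra require.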
The cCNNs were developed for the binary classification of the input spectra into HBV-LC and HBV-LC-HCC (Fig. 2) for each fold of Simulated Dataset III. The cCNNs also use the real and imaginary spectra in a single channel as the input (complex-valued CNNs) but yield a value between 0 and 1 as the output, which can be considered as the probability of belonging to the HBV-LC-HCC class. Accordingly, the labels of the HBV-LC and HBV-LC-HCC groups were set to 0 and 1, respectively, for training. In the test phase, all 10 cCNNs were applied to each sample in In Vivo Dataset II and the majority class was selected as the predicted class for the sample (majority voting). To identify the spectral regions that have a significant influence on the cCNNs’ decision, Grad-CAM analyses34 were performed.
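The majority voting over the 10 cCNN outputs can be sketched as follows. The decision threshold of 0.5 and the handling of ties are not described in the text and are assumptions here, as are the example probabilities.

```python
def majority_vote(probabilities, threshold=0.5):
    """Return 1 (HBV-LC-HCC) or 0 (HBV-LC) from per-model output values."""
    # Assumed rule: an output at or above the threshold counts as a vote for class 1.
    votes = [1 if p >= threshold else 0 for p in probabilities]
    n_hcc = sum(votes)
    n_lc = len(votes) - n_hcc
    if n_hcc == n_lc:
        raise ValueError("tie between classes; a tie-breaking rule is needed")
    return 1 if n_hcc > n_lc else 0


# Hypothetical outputs of the 10 cCNNs for one spectrum.
outputs = [0.91, 0.78, 0.40, 0.66, 0.83, 0.55, 0.30, 0.72, 0.95, 0.61]
predicted = majority_vote(outputs)
```

With an even number of models a 5-to-5 split is possible, so the sketch raises an error rather than silently choosing a class.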
LDA
An LDA model was developed for each fold (Supplementary Information 7) using the 10-fold In Vivo Dataset I that was used to develop the cCNNs. Only the metabolites and lipids that showed statistical differences in the qCNN-estimated content between the HBV-LC and HBV-LC-HCC groups in In Vivo Dataset I were included. The same procedure was repeated with the inclusion of all metabolites and lipids to generate another group of 10 LDA models for comparison. The performances of the LDA models were evaluated using In Vivo Dataset II by employing majority voting, as with the cCNNs.
Statistical analysis
Statistical analyses were performed using R 3.6.2 (R Foundation for Statistical Computing, Vienna, Austria). Two-tailed Student’s t-tests were performed for pair-wise group comparisons. For multiple pair-wise group comparisons, the Bonferroni correction was applied by adjusting P values. The statistical significance level was set to P < 0.05.
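The Bonferroni adjustment of P values can be illustrated in a few lines. The analyses in this study were run in R; the Python function below is only a sketch of the same adjustment, and the raw P values are made up.

```python
def bonferroni_adjust(p_values):
    """Multiply each P value by the number of comparisons, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Made-up raw P values from three pair-wise comparisons.
raw = [0.004, 0.02, 0.30]
adjusted = bonferroni_adjust(raw)
# Significance is then judged against the unchanged level of 0.05.
significant = [p < 0.05 for p in adjusted]
```

Adjusting the P values and keeping the significance level fixed is equivalent to keeping the P values and dividing the level by the number of comparisons.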
The developer of the deep learning and LDA models was completely blinded to the clinical status of the patients in In Vivo Dataset II throughout the study. The sensitivity (correct identification of HBV-LC-HCC), specificity (correct identification of HBV-LC), and overall diagnostic accuracy were examined in the final blinded tests of the cCNN and LDA models.
Results
Participants
Among the initial 312 participants, 272 were excluded owing to a history of HCC treatment. As described previously, 3 participants were excluded because of suboptimal MRS data quality (n = 2) or incorrect raw MRS data format (n = 1). Thus, 37 participants remained (mean age: 47.6 ± 14.3 years; 28 men) (Fig. 1). The characteristics of the participants are presented in Table 1. There was no significant difference in the degree of hepatic fat fraction determined on proton density fat fraction map between group 1 and group 2 participants (P = 0.67). Regarding the duration of HBV infection, due to incomplete external records for some patients referred from other hospitals, the actual duration of HBV infection may have been underestimated.
Quantification with NLSF vs. qCNNs on Simulated Dataset II
Figure 3a shows the 4 representative simulated liver spectra (Simulated) from Simulated Dataset II (first row: high SNR and narrow linewidth, second row: high SNR and wide linewidth, third row: low SNR and narrow linewidth, and fourth row: low SNR and wide linewidth). The fitted spectra (Fitted) and residual spectra (Residual = Simulated – Fitted) resulting from the NLSF analysis are also shown. The fitting residual was negligible.
Figure 3b depicts the results from qCNNmetab and qCNNlipid when using the simulated spectra shown in Fig. 3a as inputs. The first column shows the qCNN-predicted metabolite-only and lipid-only spectra (Predicted) along with the corresponding ground truth (GT) spectra. The difference between the qCNN-predicted and GT spectra was negligible (third column). The second column shows the reconstructed spectra (Reconstructed) that were obtained by fitting the qCNN-predicted spectra with the metabolite and lipid basis sets. The qCNN-predicted spectra were completely accounted for by a linear combination of the basis spectra (fourth column).
Table 2 compares the quantitative errors that were obtained with NLSF (QUEST in jMRUI) and the qCNNs on Simulated Dataset II in terms of the mean absolute percent error (MAPE) of the contents of the individual metabolites and lipids (MAPE = $\frac{100\%}{N}\sum_{i=1}^{N}\left|\frac{\mathrm{GT}_i - \mathrm{Pred}_i}{\mathrm{GT}_i}\right|$; N = 100). The GT metabolite and lipid contents of the dataset are known and in the realistic ranges of the patient livers. The MAPE values that were obtained by the qCNNs (5.7%–29.4%; 12.9% ± 13.3%) were significantly lower than those that were obtained by NLSF (18.1%–140.5%; 70.0% ± 80.4%) for all metabolites and lipids (P ≤ 0.004).
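The MAPE reported in Table 2 is assumed here to follow the standard definition, that is, the mean over samples of the absolute error relative to the ground truth, expressed in percent. The sketch below computes it for one metabolite; the function name and the toy values are hypothetical.

```python
def mape(ground_truth, predicted):
    """Mean absolute percent error over paired values, in percent."""
    if len(ground_truth) != len(predicted) or not ground_truth:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(gt == 0 for gt in ground_truth):
        raise ValueError("ground-truth values must be non-zero")
    total = sum(abs(gt - pr) / abs(gt) for gt, pr in zip(ground_truth, predicted))
    return 100.0 * total / len(ground_truth)


# Made-up ground-truth and estimated contents for three spectra.
gt = [10.0, 20.0, 40.0]
pred = [11.0, 18.0, 40.0]
error = mape(gt, pred)
```

Because each error is divided by its ground-truth value, low-concentration metabolites can show large MAPE values even when the absolute error is small.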
Quantification with NLSF vs. qCNN on In Vivo Dataset I
Figure 4a shows the representative true in vivo spectra (in vivo) of the HBV-LC (first and second columns) and HBV-LC-HCC (third and fourth columns) patients from In Vivo Dataset I. The fitted spectra (Fitted) and residual spectra (Residual = in vivo – Fitted) resulting from the NLSF analysis are also shown. The fitting residual was negligible.
Figure 4b and 4c depict the results from qCNNmetab and qCNNlipid, respectively, when using the in vivo spectra shown in Fig. 4a as inputs. Although qCNNmetab and qCNNlipid were trained solely on the simulated spectra, the qCNN-predicted metabolite-only (Fig. 4b; Predicted) and lipid-only (Fig. 4c; Predicted) spectra were completely accounted for by a linear combination of the basis spectra, as demonstrated by the Difference (Pred. – Recon.) spectra.
Figure 5 compares the results of the metabolite and lipid quantification between the 2 patient groups of In Vivo Dataset I when using (a) NLSF and (b) the qCNNs. When using NLSF, no statistical difference was found in the metabolite and lipid contents between the 2 patient groups, except for Glyc (P = 0.024). When using the qCNNs, Ala (P = 0.045), Gln (P = 0.011), Lac (P = 0.047), PCr (P = 0.048), tCr (P = 0.021), and Lip 13 (P = 0.029) were significantly higher, whereas Lip 225 (P = 0.025) and Lip 280 (P = 0.041) were significantly lower, in HBV-LC-HCC than in HBV-LC.
qCNN-based quantification on In Vivo Dataset II
The SNR, linewidth, and metabolite-to-lipid ratio of In Vivo Dataset II spanned 11.56–116.58 (47.41 ± 31.44), 14.58–33.20 (21.44 ± 4.81) Hz, and 1.407–2.961 (2.290 ± 0.458), respectively, none of which were statistically different from those of In Vivo Dataset I (P = 0.6259, 0.3544, and 0.7352, respectively).
Figure 6a presents a comparison of the metabolite and lipid contents between the 2 patient groups in In Vivo Dataset II, as quantified by the qCNNs. Gln (P = 0.038), Lac (P = 0.038), and Lip 13 (P = 0.022) were significantly higher, whereas Lip 225 (P = 0.020) was significantly lower in HBV-LC-HCC than in HBV-LC, as in In Vivo Dataset I (Fig. 5b). Ala, PCr, tCr, and Lip 280 did not exhibit significant differences between the 2 patient groups, unlike in In Vivo Dataset I. However, there were trends toward higher Ala, PCr, and tCr, and lower Lip 280 in HBV-LC-HCC than in HBV-LC, which is in line with the findings from In Vivo Dataset I.
A comparison of the overall metabolite and lipid contents between In Vivo Datasets I and II is presented in Fig. 6b. No statistical differences were observed in any of the metabolites or lipids (P ≥ 0.232).
Performance of cCNNs
In the training phase, the accuracy of the differentiation between HBV-LC and HBV-LC-HCC was 90%, that is, 18 correct cases out of 20 patients (2 test patients from each of the 10 folds of In Vivo Dataset I). In the test phase, the application of the 10 cCNNs to In Vivo Dataset II resulted in sensitivity (correct identification of HBV-LC-HCC), specificity, and overall diagnostic accuracy values of 100% (7/7), 90% (9/10), and 94% (16/17), respectively.
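The reported percentages follow directly from the counts. The short sketch below recomputes them, treating HBV-LC-HCC as the positive class as defined in the text; the function name is an assumption.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Counts reported for the cCNNs on In Vivo Dataset II:
# 7 of 7 HBV-LC-HCC and 9 of 10 HBV-LC patients correctly identified.
sens, spec, acc = diagnostic_metrics(tp=7, fn=0, tn=9, fp=1)
```

The same function applies to the LDA results by substituting their counts.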
The prediction results from each of the 10 cCNN models on In Vivo Dataset II are shown in Supplementary Information 8. The results of the Grad-CAM analyses using the cCNNs on In Vivo Dataset II are shown in Supplementary Information 9, which demonstrates that the cCNNs utilized almost the entire spectral range for their decisions and did not seem to be fooled by some unexpected information in the spectra such as SNR, linewidth, or spectral artifacts.
Performance of LDA models
In the training phase, the accuracy of the differentiation between the 2 patient groups was 75%, that is, 15 correct cases out of 20 patients (2 test patients from each of the 10 folds of In Vivo Dataset I) for the LDA models that were developed by including only those metabolites and lipids that showed statistical differences in the qCNN-estimated contents between the HBV-LC and HBV-LC-HCC groups in In Vivo Dataset I (Fig. 5b). The accuracy was 50% (10/20) for the LDA models that were developed by including all metabolites and lipids.
Based on these results, the performance of the LDA models that were developed by including only statistically different metabolites and lipids was tested further on In Vivo Dataset II. The resulting sensitivity, specificity, and accuracy were 86% (6/7), 70% (7/10), and 76% (13/17), respectively. The prediction results from each of the 10 LDA models on In Vivo Dataset II are described in Supplementary Information 10.
Discussion
We found that DL-1H-MRS, with data augmentation by spectral simulation, holds promise for distinguishing patients with HBV-LC with and without HCC. The liver parenchymal tissue of patients with HBV-LC-HCC may exhibit a difference in metabolism compared to that of patients with HBV-LC. MRS is the method of choice for noninvasively testing this hypothesis, as the potential of phosphorus MRS (31P-MRS) in liver diseases has been previously demonstrated.35 However, 31P-MRS requires additional hardware for RF transmission and signal reception tuned to 31P nuclei, as well as a pulse sequence with a suitable localization scheme, which are less widely available on clinical scanners. The proton provides the highest MR sensitivity among the MR-detectable nuclei.8 However, typical 1H-MRS liver spectra are dominated by water and lipids, and the remaining spectral regions appear as humps owing to the relatively low concentrations of other metabolites and field inhomogeneity.8,10 Therefore, only a few metabolites have typically been reported even in well-designed and sophisticated previous human liver studies, such as Glx, Glyc, tCho, and tCr in one study13 and Glx, phosphomonoesters (PME), and the Glyc-glucose complex in another.12 LCModel9 is widely used for the NLSF analysis of 1H-MRS spectra of the brain, where up to approximately 20 metabolites can be quantified. However, in the default mode for the analysis of liver spectra, only tCho and Glyc are considered for quantification, along with lipids.36 Although a previous 1H-MRS study investigated the possible metabolic alterations that are associated with the development of HCC in patients with HBV-LC, only tCho and lipids were assessed.37 Therefore, we explored the potential of DL-1H-MRS, which has been reported for the brain.15–22
As In Vivo Dataset I was used for the preparation of the training data and In Vivo Dataset II was used for the performance evaluation of the models, each of the 2 datasets should represent the HBV-LC and HBV-LC-HCC patient groups simultaneously and independently. An unambiguous verification of whether they both represent the 2 patient groups effectively would require far more in vivo data. However, the quantitative (distribution of the metabolite and lipid contents) and spectral (SNR, linewidth, and metabolite-to-lipid ratio) characteristics of the 2 datasets were comparable. The statistical findings from In Vivo Dataset II did not perfectly match those from In Vivo Dataset I (Figs. 5b, 6a, and 6b). Gln, Lac, and Lip 13 were significantly higher, whereas Lip 225 was significantly lower in HBV-LC-HCC than in HBV-LC in both datasets. However, significant quantitative differences in Ala, PCr, tCr, and Lip 280 were observed only in In Vivo Dataset I. Nonetheless, the trends of the high and low contents of each of these 4 resonances in In Vivo Dataset II were identical to those in In Vivo Dataset I, and none of the 15 metabolites or 5 lipid resonances showed statistically different contents between the 2 in vivo datasets (Fig. 6b). The fact that fewer metabolites and lipids differed significantly in content between the HBV-LC and HBV-LC-HCC groups in In Vivo Dataset II than in In Vivo Dataset I may mean that the differentiation between the 2 patient groups was even more challenging for In Vivo Dataset II. Nonetheless, both of our deep learning-aided approaches (not only the cCNNs but also the LDA models, in the sense that the LDA models were also developed from the qCNN-based quantitative results) achieved high diagnostic performance on In Vivo Dataset II.
As discussed above, we found significantly higher Gln, Lac, and Lip 13, and significantly lower Lip 225 in HBV-LC-HCC than in HBV-LC in both In Vivo Dataset I and In Vivo Dataset II. These findings align with existing literature. Firstly, alterations in Gln metabolism play a critical role in HCC38,39 and can lead to increased Gln levels in HCC tissue,38 which was depicted in our quantitative results. Yao et al. highlighted that Lac and its metabolism are essential in liver disease progression, with Lac levels and related metabolic genes serving as potential prognostic markers.40 Regarding lipid metabolism, Ismail et al. reported that lipid remodeling occurs during the transition from chronic liver disease to HCC, leading to changes in lipid saturation and desaturation.41 The 2.25 ppm lipid signal (Lip 225) is primarily associated with unsaturated fatty acids (UFAs), particularly polyunsaturated fatty acids (PUFAs). Our finding that HBV-LC patients exhibited a higher Lip 225 suggests that they retain relatively higher PUFA levels, whereas HBV-LC-HCC patients show a reduction in unsaturated lipids. This aligns with the previous findings that hepatic triglyceride composition shifts toward lower unsaturation as liver fat content increases, supporting the notion that HCC progression is accompanied by a decline in unsaturated lipid content.42 Moreover, Pan et al. found that lipid metabolic alterations are observed in serum metabolomic profiles differentiating liver cirrhosis from HCC, providing further evidence of shifts in lipid homeostasis.43
In the quantitative analysis of In Vivo Dataset I, only Glyc was statistically different between the 2 patient groups when using NLSF, from which the development of a classifier was not realistic. However, when using the qCNNs, 4 metabolites and 3 lipids were statistically different, and the LDA models that were developed based on this finding yielded sensitivity, specificity, and accuracy values of 86%, 70%, and 76%, respectively, on In Vivo Dataset II. The cCNNs were trained solely on the simulated dataset that was generated based on In Vivo Dataset I. However, when they were tested on In Vivo Dataset II, they achieved sensitivity, specificity, and accuracy values of up to 100%, 90%, and 94%, respectively. Although further studies with a larger amount of patient data are required, these findings together support our hypothesis of the altered metabolic profile in HBV-LC-HCC patients with respect to that in HBV-LC patients, the potential of DL-1H-MRS in the quantitative analysis and classification of HBV-LC patient liver spectra, and the utility of our spectral simulation.
Among the 17 patient spectra in In Vivo Dataset II, only 1 spectrum was misclassified by the cCNNs (1 HBV-LC predicted as HBV-LC-HCC; patient #5 in Supplementary Information 8). As discussed in Supplementary Information 8, in general, the cCNNs utilize almost the entire spectral range for their decisions. However, for the misclassified spectrum, the ~2.5–4.0 ppm region was elevated due to residual water signal and was rarely used by the cCNNs for their decision. The resulting relatively narrow spectral region usable by the cCNNs may be responsible for the incorrect classification. For instance, the content of Gln was statistically different between the HBV-LC and HBV-LC-HCC groups in both In Vivo Dataset I and In Vivo Dataset II, but, for the misclassified spectrum, the information about Gln may not have been fully available to the cCNNs due to the proximity of the Gln resonance (centered at ~2.3 ppm) to the ~2.5–4.0 ppm region. Given this potential influence of residual water, its modeling and incorporation into spectral simulation may be necessary for more robust performance of our deep learning models.
1H-MRS liver spectra are typically acquired without water suppression, and in studies involving HCC, the spectra are acquired from the lesion. However, our data needed to be acquired from normal-appearing parenchymal tissue with water suppression. Owing to the highly limited access to such datasets that would be suitable for our study, we could not validate our deep learning models further on an external dataset. Therefore, In Vivo Dataset II was reserved as a completely unseen test dataset throughout the study, rather than being combined with In Vivo Dataset I at the onset of the study to secure a greater amount of training data, despite the limited overall amount of available in vivo data. If external validation is performed, it is expected that the majority voting method we employed will help our model’s generalizability to a certain extent. However, given the small amount of data in In Vivo Dataset I, additional efforts—such as domain adaptation or data harmonization—would most likely be required.
This study has several limitations. First, as a single-center study, the generalizability of our findings may be limited as discussed above. Multicenter studies with larger and more diverse populations are warranted to validate our results. Second, we selected participants with liver cirrhosis using imaging findings, including surface nodularity. Although more invasive or advanced studies such as liver biopsy or elastography would be more robust, those studies are not routinely performed in clinical practice in our institution because of invasiveness and/or cost. Therefore, we relied on noninvasive diagnostic criteria based on routine imaging studies of ultrasound, CT, or MRI. Third, the sample size was small. Given the exploratory and preliminary nature of our study, no prior literature or established metabolite data were available to calculate an optimal sample size formally. Consequently, our findings should be interpreted cautiously, and further validation through larger, prospective studies is warranted to confirm the generalizability and clinical applicability of our approach. Lastly, although this study focused on patients with HBV-related liver cirrhosis due to their higher risk of HCC development, future studies comparing patients with chronic hepatitis B without cirrhosis, both with and without HCC, as well as comparing HBV-related cirrhotic and non-cirrhotic patients who develop HCC, could offer deeper insights into the mechanisms of hepatocarcinogenesis.
We found that DL-1H-MRS, with data augmentation by spectral simulation, holds promise for distinguishing patients with HBV-LC with and without HCC. The liver parenchymal tissue of patients with HBV-LC-HCC may exhibit a difference in metabolism compared to that of patients with HBV-LC. MRS is the method of choice for noninvasively testing this hypothesis, as the potential of phosphorus MRS (31P-MRS) in liver diseases has been previously demonstrated.35 However, it requires additional hardware for RF transmission and signal reception that are tuned to 31P nuclei, as well as a pulse sequence with a suitable localization scheme, which are less widely available for clinical scanners. The proton provides the highest MR sensitivity among the MR-detectable nuclei.8 However, typical 1H-MRS liver spectra are dominated by water and lipids, and the remaining spectral regions appear as humps owing to the relatively low concentrations of other metabolites and field inhomogeneity.8,10 Therefore, only a few metabolites have typically been reported even in the well-designed and sophisticated previous human liver studies, such as Glx, Glyc, tCho, and tCr13, and Glx, phosphomonoesters (PME), and Glyc-glucose complex.12 LCModel9 is widely used for the NLSF analysis of 1H-MRS spectra of the brain, where up to approximately 20 metabolites can be quantified. However, in the default mode for the analysis of liver spectra, only tCho and Glyc are considered for quantification, along with lipids.36 Although a previous 1H-MRS study investigated the possible metabolic alterations that are associated with the development of HCC in patients with HBV-LC, only tCho and lipids were assessed.37 Therefore, we explored the potential of DL-1H-MRS, which has been reported for the brain.15–22
As In Vivo Dataset I was used for the preparation of the training data and In Vivo Dataset II was used for the performance evaluation of the models, each of the 2 datasets should represent the HBV-LC and HBV-LC-HCC patient groups simultaneously and independently. An unambiguous verification of whether they both represent the 2 patient groups effectively would require far more in vivo data. However, the quantitative (distribution of the metabolite and lipid contents) and spectral (SNR, linewidth, and metabolite-to-lipid ratio) characteristics of the 2 datasets were comparable. The statistical findings from In Vivo Dataset II did not perfectly match those from In Vivo Dataset I (Figs. 5b, 6a, and 6b). Gln, Lac, and Lip 13 were significantly higher, whereas Lip 225 was significantly lower in HBV-LC-HCC than in HBV-LC in both datasets. However, significant quantitative differences in Ala, PCr, tCr, and Lip 280 were observed only in In Vivo Dataset I. Nonetheless, the trends of the high and low contents of each of these 4 resonances in In Vivo Dataset II were identical to those in In Vivo Dataset I, and none of the 15 metabolites or 5 lipid resonances showed statistically different contents between the 2 in vivo datasets (Fig. 6b). The fact that the number of metabolites and lipids that differed significantly in content between the HBV-LC and HBV-LC-HCC groups was lower In Vivo Dataset II than in In Vivo Dataset I may mean that the differentiation between the 2 patient groups was even more challenging for In Vivo Dataset II. Nonetheless, both of our deep learning-aided approaches (not only the cCNNs but also the LDA models in the sense that the LDA models were developed also from the qCNN-based quantitative results) achieved high diagnostic performance on In Vivo Dataset II.
As discussed above, we found significantly higher Gln, Lac, and Lip 13, and significantly lower Lip 225 in HBV-LC-HCC than in HBV-LC in both In Vivo Dataset I and In Vivo Dataset II. These findings align with existing literature. Firstly, alterations in Gln metabolism play a critical role in HCC38,39 and can lead to increased Gln levels in HCC tissue,38 which was depicted in our quantitative results. Yao et al. highlighted that Lac and its metabolism are essential in liver disease progression, with Lac levels and related metabolic genes serving as potential prognostic markers.40 Regarding lipid metabolism, Ismail et al. reported that lipid remodeling occurs during the transition from chronic liver disease to HCC, leading to changes in lipid saturation and desaturation.41 The 2.25 ppm lipid signal (Lip 225) is primarily associated with unsaturated fatty acids (UFAs), particularly polyunsaturated fatty acids (PUFAs). Our finding that HBV-LC patients exhibited a higher Lip 225 suggests that they retain relatively higher PUFA levels, whereas HBV-LC-HCC patients show a reduction in unsaturated lipids. This aligns with the previous findings that hepatic triglyceride composition shifts toward lower unsaturation as liver fat content increases, supporting the notion that HCC progression is accompanied by a decline in unsaturated lipid content.42 Moreover, Pan et al. found that lipid metabolic alterations are observed in serum metabolomic profiles differentiating liver cirrhosis from HCC, providing further evidence of shifts in lipid homeostasis.43
In the quantitative analysis of In Vivo Dataset I, only Glyc was statistically different between the 2 patient groups when using NLSF, which made the development of a classifier unrealistic. However, when using the qCNNs, 4 metabolites and 3 lipids were statistically different, and the LDA models that were developed based on this finding yielded sensitivity, specificity, and accuracy values of 86%, 70%, and 76%, respectively, on In Vivo Dataset II. The cCNNs were trained solely on the simulated dataset that was generated based on In Vivo Dataset I. Nevertheless, when they were tested on In Vivo Dataset II, they achieved sensitivity, specificity, and accuracy values of up to 100%, 90%, and 94%, respectively. Although further studies with a larger amount of patient data are required, these findings together support our hypothesis of an altered metabolic profile in HBV-LC-HCC patients relative to HBV-LC patients, the potential of DL-1H-MRS in the quantitative analysis and classification of HBV-LC patient liver spectra, and the utility of our spectral simulation.
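To make the reported percentages concrete, the following minimal sketch recomputes sensitivity, specificity, and accuracy from confusion-matrix counts, treating HBV-LC-HCC as the positive class. The cCNN counts follow from the reported fractions (7/7, 9/10, and 16/17) and from the single misclassified HBV-LC spectrum. The LDA counts are an assumption: they are the only integer counts for 7 HBV-LC-HCC and 10 HBV-LC spectra that round to the reported 86%, 70%, and 76%, and they are not stated explicitly. The class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts; HBV-LC-HCC is the positive class."""

    tp: int  # HBV-LC-HCC predicted as HBV-LC-HCC
    fn: int  # HBV-LC-HCC predicted as HBV-LC
    tn: int  # HBV-LC predicted as HBV-LC
    fp: int  # HBV-LC predicted as HBV-LC-HCC

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        total = self.tp + self.fn + self.tn + self.fp
        return (self.tp + self.tn) / total


def as_percent(value: float) -> int:
    """Round a fraction to the nearest whole percent."""
    return round(100 * value)


# cCNNs on In Vivo Dataset II: 7/7 HBV-LC-HCC and 9/10 HBV-LC correct.
ccnn = ConfusionCounts(tp=7, fn=0, tn=9, fp=1)

# LDA models: counts ASSUMED from the reported percentages (not stated).
lda = ConfusionCounts(tp=6, fn=1, tn=7, fp=3)

for name, counts in (("cCNN", ccnn), ("LDA", lda)):
    print(
        name,
        as_percent(counts.sensitivity()),
        as_percent(counts.specificity()),
        as_percent(counts.accuracy()),
    )
```

With only 17 test spectra, a single reclassified spectrum shifts accuracy by about 6 percentage points, which is one reason the values above call for validation on larger datasets.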
Among the 17 patient spectra in In Vivo Dataset II, only 1 spectrum was misclassified by the cCNNs (1 HBV-LC predicted as HBV-LC-HCC; patient #5 in Supplementary Information 8). As discussed in Supplementary Information 8, the cCNNs generally utilize almost the entire spectral range for their decisions. However, for the misclassified spectrum, the ~2.5–4.0 ppm region was elevated due to residual water signal and was rarely used by the cCNNs for their decision. The resulting relatively narrow spectral region usable by the cCNNs may be responsible for the incorrect classification. For instance, the content of Gln was statistically different between the HBV-LC and HBV-LC-HCC groups in both In Vivo Dataset I and In Vivo Dataset II, but, for the misclassified spectrum, the information about Gln may not have been fully available to the cCNNs due to the proximity of the Gln resonance (centered at ~2.3 ppm) to the ~2.5–4.0 ppm region. Given this potential influence of residual water, modeling it and incorporating it into the spectral simulation may be necessary for more robust performance of our deep learning models.
1H-MRS liver spectra are typically acquired without water suppression, and in studies involving HCC, the spectra are acquired from the lesion. However, our data needed to be acquired from normal-appearing parenchymal tissue with water suppression. Owing to the highly limited access to datasets that would be suitable for our study, we could not validate our deep learning models further on an external dataset. Therefore, despite the limited overall amount of available in vivo data, In Vivo Dataset II was reserved as a completely unseen test dataset throughout the study, rather than being combined with In Vivo Dataset I at the onset of the study to secure a greater amount of training data. If external validation is performed, the majority voting method we employed is expected to help the generalizability of our models to a certain extent. However, given the small amount of data in In Vivo Dataset I, additional efforts, such as domain adaptation or data harmonization, would most likely be required.
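As a rough illustration of majority voting, the sketch below combines per-model class predictions for a single spectrum into one label. It is a generic hard-voting scheme and not necessarily the exact procedure used in this study. The number of voting models, the tie-handling rule, and the function name `majority_vote` are assumptions made for the example.

```python
from collections import Counter
from typing import Sequence


def majority_vote(labels: Sequence[str]) -> str:
    """Return the most frequent label; raise if there is no clear winner."""
    if not labels:
        raise ValueError("at least one prediction is required")
    ranked = Counter(labels).most_common()
    top_label, top_count = ranked[0]
    # ASSUMPTION: ties are treated as an error rather than broken arbitrarily.
    if len(ranked) > 1 and ranked[1][1] == top_count:
        raise ValueError("tie between classes; use an odd number of models")
    return top_label


# Hypothetical predictions from three independently trained classifiers
# for one patient spectrum.
votes = ["HBV-LC-HCC", "HBV-LC", "HBV-LC-HCC"]
decision = majority_vote(votes)
print(decision)
```

Using an odd number of voters avoids ties in a 2-class problem, and an individual model that errs on a given spectrum can be outvoted by the others.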
Our study has a few limitations. First, as this was a single-center study, the generalizability of our findings may be limited, as discussed above. Multicenter studies with larger and more diverse populations are warranted to validate our results. Second, we selected participants with liver cirrhosis based on imaging findings, including surface nodularity. Although more invasive or advanced examinations such as liver biopsy or elastography would be more robust, they are not routinely performed in clinical practice at our institution because of invasiveness and/or cost. Therefore, we relied on noninvasive diagnostic criteria based on routine ultrasound, CT, or MRI examinations. Third, the sample size was small. Given the exploratory and preliminary nature of our study, no prior literature or established metabolite data were available to formally calculate an optimal sample size. Consequently, our findings should be interpreted cautiously, and further validation through larger, prospective studies is warranted to confirm the generalizability and clinical applicability of our approach. Lastly, although this study focused on patients with HBV-related liver cirrhosis because of their higher risk of HCC development, future studies comparing patients with chronic hepatitis B without cirrhosis, both with and without HCC, as well as comparing HBV-related cirrhotic and non-cirrhotic patients who develop HCC, could offer deeper insights into the mechanisms of hepatocarcinogenesis.
Conclusion
DL-1H-MRS, in combination with data augmentation by spectral simulation, has potential in differentiating between HBV-LC patients with and without HCC. Long-term longitudinal follow-up studies should also investigate how early such metabolic alterations can be detected and whether our approach may be of value in predicting HCC in HBV-LC livers. Our approach may extend the applicability of 1H-MRS in the diagnosis of liver diseases.
Source: PubMed Central (JATS). The license follows the original publisher's policy; please cite the original article when quoting.