
CRLM-GAN: a feature-constrained GAN-based deep learning framework for multi-parametric MRI-based segmentation of colorectal liver metastases before and after chemotherapy.

Cancer Imaging: the official publication of the International Cancer Imaging Society, 2025, Vol. 26(1), p. 10


Xia SJ, Zhu HB, Gu XL, Bao J, Sun A, Cui Y


DOI: 10.1186/s40644-025-00964-z · PMID: 41372955

Abstract

[BACKGROUND] The manual segmentation of colorectal liver metastases (CRLMs) is time-consuming and labour-intensive because of their high degree of heterogeneity, and identifying indistinct boundaries in post-treatment cases is more challenging. Automated segmentation techniques based on deep learning (DL) can alleviate these challenges. Generative adversarial networks (GANs) have emerged as an important development in deep learning for medical image segmentation. The aim of this study was to develop a GAN-based model for multi-parametric MRI-based CRLM segmentation and validate its clinical efficacy.

[METHODS] This study included a total of 641 CRLM cases that underwent multi-parametric magnetic resonance imaging (MRI) or contrast-enhanced computed tomography (CT). A retrospective cohort of 111 patients (444 cases under four conditions) with pathologically confirmed CRLMs was enrolled, and 2,546 two-dimensional tumour images were obtained. All patients underwent pre- and post-neoadjuvant chemotherapy (NACT) MRI scanning, including diffusion-weighted imaging (DWI) and T2-weighted imaging (T2WI). The dataset was split at a 6:4 ratio for training and testing. A GAN-based DL framework was proposed for CRLM segmentation, and five single-condition evaluations and four cross-sequence evaluations were systematically performed. The multi-feature constrained GAN-based model incorporated UNet++ as a generator and pre-trained ResNet-50 as a discriminator. The Dice similarity coefficient (DSC) was used as the primary evaluation metric. By extracting and fusing deep convolutional features, this approach utilized a multi-scale constrained strategy in the discriminator and combined binary cross-entropy with Dice loss in the generator. In addition, 197 publicly available contrast-enhanced CT scans containing 3,593 two-dimensional tumour images were collected to evaluate the adaptability of the model as a complementary modality.

[RESULTS] In the single-condition evaluations, CRLM-GAN achieved DSCs of 0.81 (95% CI: 0.76, 0.85), 0.70 (95% CI: 0.63, 0.77), 0.67 (95% CI: 0.55, 0.77), and 0.61 (95% CI: 0.53, 0.68) on pre-NACT DWI, post-NACT DWI, pre-NACT T2WI, and post-NACT T2WI, respectively. Moreover, the model obtained a DSC of 0.70 (95% CI: 0.66, 0.73) on the public CT dataset. The results of cross-sequence experiments revealed that training the model with combined pre-/post-NACT DWI and T2WI data led to improvements in DSCs for both DWI and T2WI sequences.

[CONCLUSIONS] CRLM-GAN demonstrated superior one-stage segmentation performance across the multi-parametric MRI-based dataset before and after chemotherapy, as well as on the CT dataset. Future work will focus on model generalization across multi-centre datasets to enhance clinical applicability.

[SUPPLEMENTARY INFORMATION] The online version contains supplementary material available at 10.1186/s40644-025-00964-z.


Introduction
Colorectal cancer accounts for 10.0% of new cancer cases worldwide, while its mortality rate is 9.4%, making it the second leading cause of cancer-related deaths [1]. Colorectal liver metastases (CRLMs) refer to secondary malignant liver tumours originating from primary colorectal cancer and represent the most common site of distant spread. Notably, approximately 25% of colorectal cancer patients are found to have liver metastases at initial diagnosis, and more than half of individuals diagnosed with colorectal cancer will eventually present with metastatic spread to the liver [2].
Routine imaging techniques for the detection and characterization of CRLMs depend on the optimization of imaging modalities, including magnetic resonance imaging (MRI) and contrast-enhanced computed tomography (CT) [3, 4]. The addition of diffusion-weighted imaging (DWI) sequences to MRI increases the accuracy of detecting CRLM, with T2-weighted imaging (T2WI) serving as a valuable supplementary modality [5, 6]. Compared with primary liver tumours, the imaging diagnosis of CRLMs presents unique challenges because of their heterogeneous appearance and less well-defined boundaries. For characterization, combining MRI images and pre- and post-contrast CT images is crucial for dynamic imaging analysis. CRLMs typically demonstrate restricted diffusion and marked hyperintensity on DWI images, while on T2-weighted images, they commonly have intermediate-to-high signal intensity and may demonstrate a hypointense halo of viable tissue and a hyperintense centre (mucin or necrosis) [3]. There are also some atypical presentations (e.g., complete cystic change or some post-treatment changes). CRLMs are hypovascular lesions; hence, they usually demonstrate a hypoenhancing centre with a hyperenhanced peripheral rim on hepatic arterial phase images, and they are most conspicuous in the portal venous phase as hypoenhancing lesions relative to the background liver parenchyma on contrast-enhanced CT or MRI images [7, 8].
Preoperative neoadjuvant chemotherapy (NACT) plays a crucial role in controlling micrometastatic disease, thereby reducing recurrence and improving patient survival [9, 10]. Radiological assessment of treatment response is frequently an essential step for guiding clinical decisions in therapy for CRLMs. The Response Evaluation Criteria in Solid Tumours 1.1 (RECIST 1.1) provides a standardized framework for assessing changes in tumour size [11–13]. However, manual segmentation, although critical for accurate tumour delineation, is time-consuming and labour-intensive because of the high heterogeneity of CRLMs in terms of lesion size, morphology, and signal intensity, especially in cases involving slightly larger datasets or multiple lesions [14, 15]. Furthermore, NACT-induced changes in tumour composition, such as the amount of residual cancer cells, extent of necrosis, fibrosis, and cystic degeneration, manifest on imaging as decreased attenuation on CT, increased signal intensity on T2-weighted images, decreased enhancement, and more indistinct boundaries, making the manual delineation of tumour boundaries on post-NACT MRI even more challenging than on baseline MRI (pre-NACT MRI) [16]. By performing precise identification of tumour boundaries, automated segmentation techniques can alleviate these challenges. This not only enhances the efficiency of tumour-related measurements but also enables a more intelligent procedure for treatment response assessments.
With the advancement of artificial intelligence (AI) in medical image analysis, deep learning (DL) is an important branch of machine learning (ML) that typically takes advantage of convolutional neural networks [17, 18] and has been extensively utilized in biomedical image segmentation [19]. One particularly groundbreaking development in DL is generative adversarial networks (GANs) [20], which have demonstrated remarkable success across various domains of medical image transformation, reconstruction, and synthesis [21–23]. Some studies have explored the practical value of GANs in medical image segmentation [24–26], but the focus has initially been on primary cancers and solitary tumour lesions. However, the potential applications of GANs have not yet been fully investigated in the context of metastatic cancers, particularly in complex situations with multiple lesions.
GAN-based segmentation of CRLMs can provide a foundation for subsequent quantitative lesion-related analysis, such as radiomics feature extraction, radiotherapy target contouring, dose optimization, surgical planning, and prognosis prediction [27–29]. In clinical practice, CRLM patients usually have more than one lesion at random locations, which makes manual delineation with general image processing software extremely tedious, especially in cases involving multiple tumour lesions. Moreover, most studies have focused on segmenting CT images [14, 15, 30, 31] or have adopted a two-step approach (i.e., first segmenting the liver region) [32, 33]. Nevertheless, one-stage segmentation models for multiple imaging conditions have been less frequently investigated, particularly in multi-parametric MRI and cross-modality (MRI-CT) settings. Therefore, developing a highly effective one-stage segmentation model applicable to multi-parametric MRI images can greatly reduce the burden on both abdominal physicians and radiologists.
This study proposed an advanced generative adversarial framework for multi-parametric MRI-based segmentation of CRLMs. The objective of this study was to develop a DL-based tool for abdominal radiologists and to systematically assess the performance of the GAN-based model. The proposed framework introduces a novel integration of precision feature constraints and a tailored training strategy. This design maximizes the potential of each component and effectively preserves high-dimensional feature representations, leading to a one-stage pipeline that markedly differs from conventional two-stage approaches. The developed model can effectively handle five different imaging conditions of CRLMs, including pre-NACT DWI, pre-NACT T2WI, post-NACT DWI, post-NACT T2WI, and CT. DWI provides superior tumour-to-liver contrast due to the densely packed tumour cells and is considered a commonly used sequence, whereas T2WI offers clearer visualization of anatomical structures (including liver vessels, bile ducts, and normal liver parenchyma), which facilitates precise lesion localization and reduces the risk of missegmenting non-tumorous hyperintense areas as metastases. Consequently, these two sequences were selected for CRLM segmentation. Comprehensive evaluations and analyses across the five clinical imaging conditions are presented. This is also the first attempt to extend the adversarial learning principle to CRLM segmentation, expanding the utility of GANs in metastatic cancer imaging.

Materials and methods

Multi-parametric MRI dataset and contrast-enhanced CT dataset
A total of 444 cases encompassing four MRI imaging conditions were included in the study. A retrospective cohort of 111 patients with pathologically confirmed CRLMs was included, and the number of metastatic lesions in each patient ranged from 1 to 5. The study protocol was approved by the corresponding institutional review board. The requirement for written informed consent was waived for the retrospective cohort. All the patients underwent preoperative chemotherapy followed by liver resection at our institution between January 2013 and November 2016. Patient characteristics, including age, sex, location of the primary tumour, and largest lesion diameter, are summarized in Table 1. Statistical analyses corresponding to Table 1 were performed to compare typical clinical and imaging characteristics between groups. Welch's t-test was used to evaluate age differences, and the Mann-Whitney U test was applied to the largest lesion diameters. The Pearson chi-square test was used for categorical variables, including sex and location of the primary tumour. All statistical tests were two-sided, and a p-value < 0.05 was considered statistically significant (α = 0.05).
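For reference, the group comparisons described above map directly onto SciPy calls. The values below are hypothetical stand-ins, since the per-patient data behind Table 1 are not reported in the text:

```python
from scipy import stats

# Hypothetical values standing in for Table 1 data.
age_group_a = [55, 62, 58, 61, 49, 66, 59]        # e.g., one patient group
age_group_b = [57, 60, 52, 64, 63, 58, 55, 61]    # e.g., the comparison group

# Welch's t-test (unequal variances) for age differences
t_stat, p_age = stats.ttest_ind(age_group_a, age_group_b, equal_var=False)

# Mann-Whitney U test for largest lesion diameters (cm)
diam_a = [2.1, 3.4, 1.8, 4.2, 2.9]
diam_b = [2.5, 3.1, 2.0, 3.8, 4.5, 2.2]
u_stat, p_diam = stats.mannwhitneyu(diam_a, diam_b, alternative="two-sided")

# Pearson chi-square test for a categorical variable (e.g., sex by group)
sex_table = [[30, 36], [25, 20]]                  # male/female counts per group
chi2, p_sex, dof, expected = stats.chi2_contingency(sex_table)

# Two-sided tests with alpha = 0.05, as in the paper
significant = [p < 0.05 for p in (p_age, p_diam, p_sex)]
```

Welch's variant (`equal_var=False`) is the appropriate choice here because it does not assume equal group variances.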

As part of the study protocol, all the candidates were scheduled for two abdominal MRI examinations: one performed 1–2 weeks before treatment and the other prior to surgery. These examinations included both pre-treatment and post-treatment DWI and T2WI sequences. The inclusion criteria were as follows: (1) patients who received at least two cycles of standard preoperative systemic therapy (FOLFOX/FOLFIRI/XELOX), with or without bevacizumab and combined with anti-EGFR antibody if gene testing showed KRAS, NRAS, and BRAF wild-type, followed by liver resection; and (2) patients who underwent two abdominal MRI examinations, with one prior to systemic therapy and the other MRI examination prior to surgery. The exclusion criteria were as follows: (1) patients who had more than five lesions; (2) patients who received fewer than two cycles of standard preoperative systemic therapy or who received previous systemic therapy or local treatments (including transcatheter arterial chemoembolization, radiofrequency ablation, or radiation therapy); (3) patients who underwent surgical resection previously or with R2 resection; (4) patients who missed one of the two MRI examinations or who had poor image quality for measurements; and (5) patients who declined further participation in the study.
For complementary modality evaluation, a dataset comprising 197 CT cases was included. The dataset [34] encompasses preoperative portal venous phase contrast-enhanced CT scans from 197 patients who underwent hepatic resection for CRLMs at a single institution and is publicly accessible via The Cancer Imaging Archive as a collection. It includes high-quality segmentations of CRLMs, which were generated semi-automatically using Scout Liver (Pathfinder Technologies, Inc.) and converted to the DICOM segmentation objects format via the 3D Slicer API. These segmentations provide detailed delineations of CRLM tumours and related anatomical structures, facilitating the development and validation of quantitative imaging biomarkers and ML models focused on tumour characterization and treatment response assessment. In this study, the 197 contrast-enhanced CT scans were split at the same ratio as the multi-parametric MRI dataset, with 119 used for training and 78 for testing. The overall workflow of data enrolment is shown in Fig. 1.

MRI acquisition
All the MRI examinations were performed on a 1.5-T MRI scanner (Signa Excite II; GE Healthcare, Milwaukee, Wisconsin) using an 8-channel phased array body coil. The routine MRI protocol for liver scans included T2-weighted imaging (with fat suppression), T1-weighted gradient-recalled-echo, DWI, and dynamic contrast-enhanced T1-weighted imaging. For dynamic contrast-enhanced MRI, a transverse LAVA-Flex sequence was acquired before and after intravenous administration of 0.1 mmol/kg Gd-DTPA (Magnevist; Bayer, Berlin, Germany) at 2.5 mL/s, followed by a flush with 20 mL of saline solution using a power injector. This study focused mainly on the DWI and T2WI sequences because of the superior tumour-to-liver contrast from DWI and the rich anatomical context from T2WI, aiming to achieve more accurate and robust segmentation. The imaging parameters were as follows: (1) T2-weighted sequence: repetition time (TR) = 12,630 ms; echo time (TE) = 70 ms; matrix size = 288 × 224; slice thickness = 6 mm; slice gap = 1 mm; and field of view (FOV) = 380 × 380 mm². (2) Diffusion-weighted sequence: TR = 4000 ms; TE = 80 ms; matrix size = 128 × 90; slice thickness = 6 mm; slice gap = 1 mm; FOV = 380 × 380 mm²; multiple b values (0, 20, 50, 100, 200, 600, 800, 1000, 1200, and 1500 s/mm²).

Manual delineation of colorectal liver metastases
For the acquired MRI images of the CRLMs, manual delineation was performed by two radiologists with 8 years and 15 years of abdominal MRI interpretation experience using the “Editor” module in 3D Slicer software (version 4.8.1). The region of interest (ROI) contained the entire volume of the tumour on the T2-weighted and DWI images (b = 1000 s/mm²). A senior radiologist with more than 20 years of expertise in liver cancer imaging subsequently performed the inspection. All regions containing metastatic lesions (ranging from 1 to 5) were delineated layer by layer. Throughout the delineation process, clinicians were kept unaware of the pathological information. Finally, all the images were saved in NIFTI format. For the CT dataset, existing annotations were used directly. Examples of MRI and CT images along with delineated CRLM lesions are shown in Fig. 2.

Data preprocessing and augmentation workflow
A multi-parametric MRI dataset was created by splitting the patients at a 6:4 ratio. A total of 66 patients were partitioned into the training dataset, while 45 patients were used for testing. The original 3D MRI images and corresponding ROI delineations were preprocessed to fit the input of the GAN-based model, as depicted in Fig. 3a. At the initial stage, the 3D volumes were split into 2D slices along the Z-axis for individual segmentation. Slices containing tumours were identified and extracted using the corresponding ROI masks, which were based on a non-zero sum of pixel values. In total, 2,546 2D MRI images and 3,593 2D CT images of tumours were collected. All the obtained 2D images were uniformly resampled to 224 × 224 pixels. Three stacking operations were performed to adapt to the third dimension of the neural network. Following similar data augmentation operations in previous U-Net-based biomedical segmentation studies [35, 36], the subsequent steps included random 90° rotations, random horizontal and vertical flip transformations with 50% probability, the addition of random values to hue and saturation channels, brightness and contrast adjustments, and Z-score normalization with a mean of (0.485, 0.456, 0.406) and a standard deviation of (0.229, 0.224, 0.225) based on the ImageNet convention. Collectively, these augmentations help the model capture domain-invariant features, stabilize convergence during training, and ultimately improve performance on unseen test data, as consistently reported in the literature [37, 38].
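The slicing, tumour-slice filtering, resizing, channel-stacking, and normalization steps above can be sketched as follows. This is a minimal NumPy sketch with a nearest-neighbour resizer standing in for a library call, and all function names are my own:

```python
import numpy as np

# ImageNet convention, as stated in the paper
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])


def extract_tumour_slices(volume, mask):
    """Split a 3D volume (Z, H, W) into 2D slices along the Z-axis and keep
    only slices whose ROI mask has a non-zero sum of pixel values."""
    keep = [z for z in range(mask.shape[0]) if mask[z].sum() > 0]
    return [volume[z] for z in keep], [mask[z] for z in keep]


def resize_nn(img, size=224):
    """Nearest-neighbour resize to size x size (a stand-in for a proper
    interpolating resampler)."""
    h, w = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[np.ix_(rows, cols)]


def to_model_input(slice_2d):
    """Resize, stack to 3 channels (the 'three stacking operations'),
    and Z-score normalize with the ImageNet statistics."""
    img = resize_nn(slice_2d).astype(np.float64)
    img = (img - img.min()) / (img.max() - img.min() + 1e-8)  # scale to [0, 1]
    stacked = np.stack([img] * 3, axis=-1)                    # (224, 224, 3)
    return (stacked - IMAGENET_MEAN) / IMAGENET_STD
```

Random rotations, flips, and colour jittering would be applied on top of this during training, typically via a transform library.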

CRLM-GAN: GAN-based DL framework
To enhance the clinical applicability of segmentation, a GAN-based DL framework incorporating single-condition evaluation and cross-sequence evaluation for CRLM segmentation was developed (Fig. 3b–c). Five single imaging conditions, including pre-NACT DWI, pre-NACT T2WI, post-NACT DWI, post-NACT T2WI, and CT, were trained and tested separately in each single-condition setting. Pre-NACT and post-NACT DWI and T2WI sequences were used together for model training, followed by separate performance evaluations on each sequence as the cross-sequence evaluation phase.
In general, a generative adversarial model primarily consists of two subnetworks: a generator (denoted as G) and a discriminator (denoted as D) [20]. The mathematical principle can be formulated by an optimization function, which minimizes the error between the generated samples from random noise and real samples while maximizing the accuracy of the discriminator predictions:

min_G max_D V(D, G) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))]
On this basis, this work proposes the first GAN-based DL model used for multi-parametric MRI segmentation of CRLMs. The code supporting this work is available at the following repository: https://github.com/Xuezai-wq/CRLM-GAN. As illustrated in Fig. 4, the overall architecture effectively integrated the strengths of UNet++ [39] in segmentation and pre-trained ResNet-50 [40] in multi-scale feature extraction. UNet++ was employed as the generator to obtain initial probabilistic maps that were considered preliminary segmentations. Afterwards, the generated probabilistic maps and real labels were multiplied with the original images, resulting in the predicted tumour images and the labelled tumour images. Through intermediate connections, the above two types of images were input into the discriminator, i.e., a pre-trained ResNet-50, to obtain multi-scale deep convolutional features.
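The hand-off between generator and discriminator described above, in which the probability map and the real label are each multiplied with the original image before entering the discriminator, amounts to element-wise multiplication with broadcasting. A toy PyTorch sketch with hypothetical shapes:

```python
import torch

# Hypothetical batch: 2 images, 3 channels, 224 x 224
image = torch.rand(2, 3, 224, 224)
prob_map = torch.rand(2, 1, 224, 224)                # generator output (preliminary segmentation)
label = (torch.rand(2, 1, 224, 224) > 0.5).float()   # ground-truth ROI mask

# The single-channel maps broadcast over the 3 image channels
pred_tumour_img = image * prob_map    # predicted tumour image
real_tumour_img = image * label       # labelled tumour image
# Both are then fed to the pre-trained ResNet-50 discriminator
# to obtain multi-scale deep convolutional features.
```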

Specifically, the adaptive ResNet-50 was split into five stages. Stage 0 was the initial processing step. Stages 1–4 acted as independent feature extraction units and captured feature maps at different scales. There were 3, 4, 6, and 3 ResNet bottleneck blocks in the four stages. Two forms of bottlenecks were employed in each stage, using identity shortcut and projection shortcut, respectively. The depths of each feature map were sequentially 256, 512, 1024, and 2048. The rear output layers used for classification were discarded. The pre-trained weights on ImageNet were adopted for the initialization of the discriminator. The discriminator assessed the similarity of the fused feature maps to drive the generator to produce more accurate segmentation results. The number of constrained feature maps is highly flexible and can be adjusted from stages 1–4.
During practical training, a comprehensive loss function was constructed by combining the binary cross-entropy and Dice losses in the generator (L_G) with the multi-scale constrained loss in the discriminator (L_D).

L_G = (1/N) Σ_{i=1}^{N} [BCE(y_i, ŷ_i) + (1 − Dice(y_i, ŷ_i))]

L_G calculates the binary cross-entropy and Dice coefficient between the i-th ROI y_i and the predicted result ŷ_i, followed by averaging over the total number of images N.

L_D = (1/N) Σ_{i=1}^{N} Σ_{s=1}^{4} ‖F_s(x_i ⊗ y_i) − F_s(x_i ⊗ ŷ_i)‖₁

In the context of L_D, x represents the original image, ⊗ denotes element-wise multiplication between images, s refers to the index of feature layers (1–4), F_s denotes the stage-s feature maps of the discriminator, and ‖·‖₁ represents the L1 regularization.

The overall loss function is the sum of L_G and L_D:

L_total = L_G + L_D
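A minimal PyTorch sketch of this loss design follows, assuming an unweighted sum of the BCE, Dice, and L1 feature-constraint terms (the paper does not state the exact weighting, so these function names and the combination are illustrative):

```python
import torch
import torch.nn.functional as F


def dice_loss(probs, target, eps=1e-6):
    """Soft Dice loss over a batch of (N, 1, H, W) probability/label maps."""
    inter = (probs * target).sum(dim=(1, 2, 3))
    denom = probs.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    return 1.0 - ((2.0 * inter + eps) / (denom + eps)).mean()


def generator_loss(logits, target):
    """Generator loss: binary cross-entropy plus Dice loss."""
    bce = F.binary_cross_entropy_with_logits(logits, target)
    return bce + dice_loss(torch.sigmoid(logits), target)


def multiscale_constraint_loss(real_feats, fake_feats):
    """Discriminator-side constraint: L1 distance between the multi-scale
    feature maps of labelled vs predicted tumour images."""
    return sum(F.l1_loss(r, f) for r, f in zip(real_feats, fake_feats))
```

A near-perfect prediction drives both generator terms towards zero, while mismatched discriminator features keep the L1 constraint active.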

Evaluation metrics
To evaluate segmentation performance, the DL models were assessed using standard quantitative metrics: Dice similarity coefficient (DSC), intersection over union (IoU), precision, recall, and F1-score. Among these, the DSC served as the primary metric, quantifying the overlap between the predicted segmentation and the ground truth. Precision, recall, and F1-score were further used to characterize the balance between false-positive and false-negative predictions. All metrics were calculated on the testing dataset, providing a comprehensive evaluation of segmentation performance. Detailed metric definitions are provided in the supplementary material. Each metric is reported with its 95% confidence interval (95% CI).
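As an illustration, the overlap metrics can be computed from binary masks as follows; the bootstrap CI at the end is an assumption, since the paper does not specify its CI procedure (a NumPy sketch, function names my own):

```python
import numpy as np


def seg_metrics(pred, gt):
    """DSC, IoU, precision, recall, and F1 for a pair of binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    dsc = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 1.0
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"DSC": dsc, "IoU": iou, "Precision": precision, "Recall": recall, "F1": f1}


def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap 95% CI over per-image metric values."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values)
    means = [rng.choice(values, size=len(values), replace=True).mean()
             for _ in range(n_boot)]
    return np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
```

Note that DSC and F1 coincide for binary masks; reporting both, as the paper does, is a common convention.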

Technical details on model training
The GAN-based framework was developed in Python 3.7, PyTorch 1.12.1, and SimpleITK 2.1.1. The network was trained on one NVIDIA RTX 3090 GPU. The framework was trained using stochastic gradient descent (SGD) optimization with a learning rate of 1 × 10⁻³. The SGD optimizer was employed with momentum of 0.9 and weight decay of 1 × 10⁻⁴. L2 regularization was implemented through weight decay to prevent overfitting. A batch size of 16 and an input image size of 224 × 224 were used for all the experiments, with 200 training epochs. The combined binary cross-entropy and Dice loss functions were adopted to jointly optimize pixel-wise classification and overlap consistency. U-Net and U-Net++ models were also trained using the same loss functions and training parameters. In all three evaluation scenarios, including two single-condition evaluations and one cross-sequence evaluation, the same network architecture was trained across different imaging conditions.
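The reported optimizer settings translate directly into PyTorch. In the sketch below, a single convolution stands in for the UNet++ generator and plain BCE for the combined loss, so this is only a configuration sketch, not the training code itself:

```python
import torch

# Toy stand-in for the UNet++ generator; the real model outputs a (N, 1, H, W) map.
generator = torch.nn.Conv2d(3, 1, kernel_size=3, padding=1)

# Reported settings: SGD, lr 1e-3, momentum 0.9, weight decay 1e-4
# (weight decay acts as the L2 regularization mentioned in the text).
optimizer = torch.optim.SGD(generator.parameters(), lr=1e-3,
                            momentum=0.9, weight_decay=1e-4)

batch = torch.rand(16, 3, 224, 224)                    # batch size 16, 224 x 224 inputs
target = (torch.rand(16, 1, 224, 224) > 0.5).float()   # dummy ROI masks

loss_fn = torch.nn.BCEWithLogitsLoss()                 # combined with Dice loss in the paper
for epoch in range(2):                                  # the paper trains for 200 epochs
    optimizer.zero_grad()
    loss = loss_fn(generator(batch), target)
    loss.backward()
    optimizer.step()
```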

Results

Single-condition evaluation 1: performance on multi-parametric MRI dataset
The CRLM-GAN model demonstrated robust segmentation performance across the multi-parametric MRI dataset, with different results for pre- and post-NACT MRI images. The results of the five evaluation metrics are listed in Table 2. For the pre-NACT images, the model achieved DSC, IoU, Recall, Precision, and F1-score values of 0.81 (95% CI: 0.76, 0.85), 0.70 (95% CI: 0.63, 0.75), 0.83 (95% CI: 0.77, 0.87), 0.82 (95% CI: 0.76, 0.87), 0.82 (95% CI: 0.77, 0.86) and 0.70 (95% CI: 0.63, 0.77), 0.57 (95% CI: 0.49, 0.65), 0.74 (95% CI: 0.67, 0.81), 0.70 (95% CI: 0.62, 0.79), 0.71 (95% CI: 0.64, 0.78) on pre-NACT DWI and pre-NACT T2WI, respectively. Moreover, segmentation on post-NACT images, including post-NACT DWI and post-NACT T2WI, showed a slight decrease in these metrics, with DSC, IoU, Recall, Precision, and F1-score values of 0.67 (95% CI: 0.55, 0.77), 0.54 (95% CI: 0.42, 0.65), 0.67 (95% CI: 0.53, 0.80), 0.71 (95% CI: 0.62, 0.79), 0.67 (95% CI: 0.56, 0.78) and 0.61 (95% CI: 0.53, 0.68), 0.47 (95% CI: 0.39, 0.55), 0.60 (95% CI: 0.51, 0.69), 0.67 (95% CI: 0.59, 0.75), 0.62 (95% CI: 0.55, 0.70). Despite this, the model continued to provide reliable segmentation performance (DSC > 0.5), demonstrating its robustness under challenging post-treatment conditions. Segmentation examples from the multi-parametric MRI dataset are shown in Fig. 5 and are visualized alongside radiologist-defined ROIs for comparison. Table S1 presents the total and average delineation times across ten patients, comparing the proposed model with the abdominal radiologists.

Single-condition evaluation 2: complementary results on the CT dataset
To evaluate the generalizability of the model beyond MRI, the model was trained and tested on the publicly available CT dataset using the same training-to-test split at the patient level. Specifically, 2,142 images were used for training and 1,451 for testing. As shown in Table 5, the model achieved DSC, IoU, Recall, Precision, and F1-score values of 0.70 (95% CI: 0.66, 0.73), 0.57 (95% CI: 0.54, 0.61), 0.72 (95% CI: 0.68, 0.76), 0.74 (95% CI: 0.71, 0.78), and 0.71 (95% CI: 0.68, 0.75), respectively, demonstrating its applicability in CT-based CRLM segmentation. Visualization results of segmentation with tumour numbers ranging from 1 to 4 are provided in Fig. 6, and the overlaps illustrate the differences between the CRLM-GAN model and the defined ROIs.

Quantitative results of cross-sequence evaluation
In cross-sequence evaluation, pre-NACT or post-NACT DWI and T2WI sequences were mixed for model training, followed by independent performance evaluation with respect to each sequence. The DSC, IoU, Recall, Precision, and F1-score were likewise calculated, and the results are summarized in Table 3. The DSC values were 0.82 (95% CI: 0.78, 0.85), 0.71 (95% CI: 0.63, 0.78), 0.72 (95% CI: 0.64, 0.79), and 0.62 (95% CI: 0.53, 0.70), respectively. The results of mixed experiments 1–2 reveal that training the model with combined pre-NACT DWI and T2WI data led to a 1% improvement in DSCs for both DWI and T2WI. A similar trend was observed in the post-NACT setting, where incorporating both post-NACT DWI and T2WI during training enhanced the segmentation accuracy. Notably, the DSC for post-NACT DWI increased by 4%.

Comparisons with typical DL models
This model was compared with typical DL-based segmentation models, including UNet [41], UNet++ [39], nnU-Net [42], TransUNet [43], and SegAN [44]. As reported in Table 4, this approach outperformed both the baseline UNet++ model and the other four models under all the MRI imaging conditions. The CRLM-GAN model demonstrated the highest DSC, IoU, and F1-score values on pre-NACT DWI, post-NACT DWI, pre-NACT T2WI, and post-NACT T2WI. Furthermore, the results indicate greater improvements in segmentation performance under post-NACT conditions than under pre-NACT conditions. The comparative analysis demonstrates the advantage of the generative adversarial framework in segmenting more challenging CRLM lesions after chemotherapy intervention. Table 5 demonstrates that the CRLM-GAN model yielded the highest DSC value of 0.70 (95% CI: 0.66, 0.73) for CT segmentation, exceeding the performance of UNet++ (regarded as the baseline model) and the remaining models. Figure 7 shows representative samples of automated segmentation contours (red) and radiologist-defined contours (green) under the five imaging conditions (pre-NACT DWI, pre-NACT T2WI, post-NACT DWI, post-NACT T2WI, and CT) compared with these five typical DL models. Visually, the proposed model produced more refined and accurate contour segmentation results.

Discussion
In this study, an innovative generative adversarial framework for multi-parametric MRI-based segmentation of CRLMs with simultaneous robust performance on CT images was proposed. This framework effectively integrates the segmentation capability of UNet++ and the advantage of ResNet-50 in feature extraction. By further leveraging adversarial training with multi-scale feature constraints, the model achieved DSCs of 0.81 (95% CI: 0.76, 0.85), 0.70 (95% CI: 0.63, 0.77), 0.67 (95% CI: 0.55, 0.77), and 0.61 (95% CI: 0.53, 0.68) on pre-NACT DWI, post-NACT DWI, pre-NACT T2WI, and post-NACT T2WI images, respectively, surpassing UNet, UNet++, nnU-Net, TransUNet, and SegAN models. The model also demonstrated a superior DSC of 0.70 (95% CI: 0.66, 0.73) compared with these typical deep learning models on the publicly available CT dataset. To our knowledge, this is the first GAN-based semantic segmentation method applied to both multi-parametric MRI images and CT images of CRLMs.
Although the ML-based segmentation of primary liver tumours has been widely explored [45–47], the DL-based segmentation of CRLMs remains particularly challenging because of the complexity and heterogeneity of multiple tumours, that is, their diverse structural variations. Early studies in the field focused mainly on automated segmentation of contrast-enhanced CT images [15, 29, 48]. In liver metastasis diagnosis, MRI enables clearer differentiation of tumour layers and surrounding fat gaps, thereby facilitating more precise assessment of tumour infiltration depth [49]. Therefore, this study places greater emphasis on segmentation of multi-parametric MRI images. Moreover, as T2WI is a crucial supplementary modality [12, 50], its automatic segmentation performance should be fully explored, given its ability to improve the adaptability of automated segmentation models; this aspect is often ignored in CRLM imaging analysis.
Previous studies have demonstrated that the size-based RECIST 1.1 criteria offer a reliable standard for evaluating responses to targeted CRLM therapy: the criteria have been widely adopted in clinical trials [51, 52]. Following chemotherapy, tumours typically decrease in size, and the affected areas often exhibit varying degrees of fibrosis [53]. Consequently, lesion segmentation on post-NACT MRI is more challenging than that on baseline imaging because of these treatment-induced changes in shape characteristics. Beyond providing segmentation results for baseline MRI, establishing a quantitative and systematic assessment of DL-based segmentation for post-NACT MRI is essential for enhancing the utility of RECIST 1.1 in clinical practice [54, 55].
Furthermore, many studies have concentrated on the segmentation of one imaging modality with a two-step approach: that is, first segmenting the liver region before segmenting all tumours [32, 33]. One-stage segmentation models have been less commonly investigated, especially in multi-parametric MRI and cross-modality (MRI-CT) scenarios. Vorontsov et al. [29] used an FCN for liver lesion segmentation based on 156 contrast material-enhanced CT examinations and achieved DSC values ranging from 0.14 to 0.68. Kamkova et al. [33] developed a cascaded neural network incorporating four models: UNet, VNet, SegResNet, and HighResNet. The model was trained on an internal dataset comprising 84 MRI images and achieved a Dice score of 0.71 for liver metastasis segmentation. Liu et al. [32] applied UNet to liver segmentation and then further segmented liver metastatic lesions on a DWI sequence, obtaining a DSC value of 0.85. These results are broadly consistent with our findings, and our model achieved comparable performance via a one-stage deep learning method. The model is highly convenient: it requires no prior liver identification and adapts across various imaging conditions. The novel integration of precision feature constraints with a training strategy (a generator trained from scratch and a pre-trained discriminator) realizes the full potential of each component of the model and preserves high-dimensional feature details, which makes the one-stage pipeline significantly different from the predominantly two-stage designs of previous studies.
The mixed-sequence experiments suggest an additional advantage of incorporating two sequences: leveraging complementary information from different training modalities strengthens the model's ability to delineate tumours, particularly on post-NACT images. In clinical diagnosis, T2WI serves as a crucial auxiliary sequence, and its synergistic role with DWI in GAN-based segmentation has not been fully explored. Compared with the primary DWI sequence, T2WI tends to carry more structural and spatial information; hence, this study systematically analysed the utility of combining DWI and T2WI images. These findings highlight the adaptability of the model to diverse CRLM cases and clarify the potential benefits of integrating different sequences into the GAN-based model. Moreover, the stable results on the public contrast-enhanced CT scans indicate, to some extent, that the model adapts to datasets with larger domain variations in the more challenging complementary-modality evaluation. Future work will further validate the model on external multi-centre and multi-vendor datasets.
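A common way to feed two complementary sequences to a segmentation network is to stack co-registered slices as input channels after per-sequence intensity normalization, since DWI and T2WI intensities live on different scales. The paper does not specify its fusion mechanism, so the sketch below is a generic, hypothetical channel-stacking preprocessing step (function names and shapes are illustrative):

```python
import numpy as np

def z_normalize(img: np.ndarray) -> np.ndarray:
    """Z-score normalize one slice; each sequence has its own intensity scale."""
    return (img - img.mean()) / (img.std() + 1e-7)

def stack_sequences(dwi: np.ndarray, t2wi: np.ndarray) -> np.ndarray:
    """Stack co-registered DWI and T2WI slices into a 2-channel (C, H, W) input."""
    assert dwi.shape == t2wi.shape, "sequences must be co-registered and resampled"
    return np.stack([z_normalize(dwi), z_normalize(t2wi)], axis=0)

# Hypothetical 256x256 slices standing in for real co-registered MRI data.
dwi = np.random.rand(256, 256)
t2wi = np.random.rand(256, 256)
x = stack_sequences(dwi, t2wi)
print(x.shape)  # (2, 256, 256)
```

The key assumption is spatial correspondence between the sequences; without co-registration, channel stacking mixes misaligned anatomy and can hurt rather than help.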
This study has several limitations. First, the research was restricted to a relatively small patient cohort. To mitigate the risk of overfitting, several strategies were employed: extensive data augmentation (e.g., rotation, flipping, and colour jittering), a ResNet-50 pre-trained on ImageNet as the discriminator to leverage transfer learning, and an independent public CT dataset for complementary validation, which provided a partial test of the model's generalization ability. In clinical practice, the inclusion and delineation of CRLM patients are extremely time-consuming. Although the GAN-based model demonstrated adequate segmentation performance in independent testing, multi-centre datasets and prospective cohorts are needed to thoroughly assess the robustness and reproducibility of the proposed DL algorithm. Further studies could also incorporate complementary boundary-based measures, such as the Hausdorff distance or surface distance, for a more comprehensive assessment of segmentation quality. Second, the current CRLM-GAN model uses a 2D architecture, whereas a 3D segmentation model may be more convenient in practical applications. Future work will extend the developed architecture to the 3D domain. Extending the GAN framework from 2D to 3D can improve segmentation accuracy by capturing volumetric contextual information that 2D slices cannot represent; this richer spatial information can help the model better delineate complex anatomical structures and inter-slice continuity. However, the increased computational complexity, training instability, and poorer generative quality associated with 3D architectures may also introduce noise or artefacts [56–58], which could partially offset the gains in accuracy.
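The boundary-based measures mentioned above complement overlap scores such as the DSC: the Hausdorff distance reports the worst-case gap between two mask contours, so it penalizes outlier boundary errors that a high Dice score can hide. A brute-force sketch for small 2D masks (not efficient enough for full-resolution volumes, where KD-tree or distance-transform implementations are used instead):

```python
import numpy as np

def hausdorff_distance(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Symmetric Hausdorff distance between the foreground pixels of two masks.

    Computes all pairwise distances explicitly -- fine for small toy masks only.
    """
    a = np.argwhere(mask_a.astype(bool))
    b = np.argwhere(mask_b.astype(bool))
    if len(a) == 0 or len(b) == 0:
        return float("inf")
    # Pairwise Euclidean distances, shape (len(a), len(b)).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Max over each mask of the nearest distance to the other mask.
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))

# Two 3x3 squares shifted diagonally by one pixel.
a = np.zeros((8, 8), dtype=int); a[2:5, 2:5] = 1
b = np.zeros((8, 8), dtype=int); b[3:6, 3:6] = 1
print(hausdorff_distance(a, b))  # one-pixel diagonal shift -> sqrt(2)
```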
Given that DL-based segmentation models often act as black boxes, subsequent investigations could explore explainable-AI methods such as pixel-attribution maps or region-based explanations [59, 60] to enhance interpretability and clinical trust by visualizing the model's reasoning and identifying the image features that drive its predictions. Finally, to simplify the modelling process, this work focused solely on T2WI and DWI images; the potential utility of other MRI sequences, such as DCE and T1WI, should be further investigated. Moreover, we advocate incorporating scanners from additional vendors to reduce the model's sensitivity to equipment.
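One simple pixel-attribution technique that treats the model as a black box is occlusion sensitivity: slide a blank patch over the image and record how much the model's score drops at each position. The sketch below uses a toy scoring function in place of a real network (everything here is illustrative, not the method of [59, 60]):

```python
import numpy as np

def occlusion_map(image: np.ndarray, score_fn, patch: int = 4) -> np.ndarray:
    """Occlusion-sensitivity attribution: score drop when each patch is zeroed."""
    base = score_fn(image)
    heat = np.zeros_like(image, dtype=float)
    h, w = image.shape
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = 0
            # Large score drop => this patch matters to the prediction.
            heat[i:i + patch, j:j + patch] = base - score_fn(occluded)
    return heat

# Hypothetical "model": the score is the mean intensity of a central region,
# standing in for a network's confidence in a central lesion.
def toy_score(img: np.ndarray) -> float:
    return float(img[8:16, 8:16].mean())

img = np.ones((24, 24))
heat = occlusion_map(img, toy_score)
# heat peaks on the patches covering the central region and is 0 elsewhere.
```

Occlusion maps are model-agnostic but expensive (one forward pass per patch); gradient-based attribution methods trade that cost for access to the model's internals.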

Conclusion

In conclusion, the present study constructed an innovative GAN-based model that facilitates the automatic segmentation of CRLMs across diverse imaging conditions, which helps expedite treatment procedures and supports the clinical application of RECIST 1.1. Given how inefficient and time-consuming manual annotation is for multi-lesion CRLM samples, CRLM-GAN can substantially increase segmentation efficiency across MRI scans, regardless of pre- or post-treatment status. The approach also demonstrates strong potential for application to CT imaging. Furthermore, this work expands the use of T2WI images in AI-based segmentation, providing a systematic multi-sequence segmentation analysis rather than relying on a single imaging modality. Future work will focus on validating generalization performance on multi-centre data, with the ultimate goal of translating the model into clinical practice.

Supplementary Information

Below is the link to the electronic supplementary material.
