Dual-branch collaborative GAN with multi-scale CBAM and anatomical topology coding for enhanced early HCC detection in CT.
APA
Zang L, Lv C, Huang M (2025). Dual-branch collaborative GAN with multi-scale CBAM and anatomical topology coding for enhanced early HCC detection in CT. Scientific Reports, 15(1), 40082. https://doi.org/10.1038/s41598-025-23991-z
MLA
Zang L, et al. "Dual-branch collaborative GAN with multi-scale CBAM and anatomical topology coding for enhanced early HCC detection in CT." Scientific Reports, vol. 15, no. 1, 2025, p. 40082.
PMID
41249290
Abstract
The early detection of Hepatocellular Carcinoma (HCC) using computed tomography (CT) is impeded by high annotation costs, lesion heterogeneity, and inadequate anatomical topology modeling. This study proposes a Dual-Branch Collaborative Generative Adversarial Network (DB-Collab GAN) with Anatomical Topology Coding to address these challenges. The framework features a dual-branch architecture that forms a "segmentation-guided detection" loop, with cross-layer feature sharing enhancing local-global complementarity. A layered Multi-Scale Convolutional Block Attention Module (CBAM) captures micro-details via 1 × 1 convolutions and liver anatomy via 5 × 5 convolutions. Anatomically tailored sine-cosine coding embeds the Couinaud segment topology, reducing the mean localization error (ADE) to 3.01 mm. Semi-supervised adversarial optimization with a dual-path discriminator achieved performance comparable to 7,140 supervised cases using only 1,070 labeled cases. On 7,140 clinical CT slices, the method outperformed the baselines in terms of accuracy (0.8875 ± 0.02), recall (0.8613 ± 0.03), and F1-score (0.8848 ± 0.02), with a 10.66% higher F1-score than Mask RCNN. Ablation studies confirmed the contributions of the multiscale CBAM and topology coding. It maintains robustness under high noise (ADE = 4.57 mm), providing a low-annotation-dependent solution, effectively reducing missed diagnoses and misclassifications of small lesions and vascular artifacts, and supporting clinical decision-making in early intervention.
Introduction
Hepatocellular Carcinoma (HCC) remains a global healthcare crisis, ranking as the sixth most common malignancy and the third leading cause of cancer-related deaths worldwide1. According to the latest statistics from the International Agency for Research on Cancer (IARC), approximately 890,000 new HCC cases were diagnosed globally in 2022, with over 830,000 deaths attributed to the disease2. A critical clinical paradox persists: HCC exhibits asymptomatic progression in its early stages, yet over 60% of patients are diagnosed at intermediate or advanced stages, where curative treatments are no longer feasible3,4. This late diagnosis directly contributes to the dismal five-year survival rate of less than 20%, underscoring the urgent need for accurate early screening tools5.
Computed tomography (CT) has emerged as the cornerstone of early HCC screening because of its ability to capture high-resolution anatomical details, including lesion morphology, density dynamics across contrast phases (arterial, portal, and delayed), and spatial relationships with surrounding hepatic structures (e.g. portal veins, hepatic veins, and bile ducts)3,5,6. However, manual CT interpretation for early HCC detection is challenging. Radiologists must integrate subtle cues, such as irregular margins, heterogeneous enhancement patterns, and subtle size changes, a process that is highly subjective, time-consuming (averaging 15–30 min per scan for experienced radiologists), and prone to missed diagnoses, particularly for small lesions (< 1 cm) or those obscured by cirrhotic background changes6,7. These limitations highlight the pressing need for automated detection frameworks that can enhance accuracy, reduce subjectivity, and improve efficiency in the early screening of HCC8.
Despite advancements in medical image analysis, automated HCC detection in CT images faces three unresolved bottlenecks that hinder clinical translation40.
High annotation cost and scarcity of labelled data: Annotating HCC lesions in CT scans requires specialised expertise from radiologists with subspecialty training in abdominal imaging. A single CT volume (typically 50–100 slices) demands collaborative labelling by 2–3 senior radiologists to ensure accuracy, with each case taking 30–60 min to complete4,6. This labour-intensive process results in the limited availability of high-quality labelled datasets, which is a critical barrier for fully supervised deep learning models that require large-scale annotated data to generalise effectively4,9,34.
Significant Lesion Heterogeneity: HCC exhibits marked inter- and intra-tumoural heterogeneity on CT imaging, manifesting as irregular shapes (from nodular to infiltrative), variable density distributions (hypodense, isodense, or hyperdense relative to liver parenchyma), and dynamic enhancement patterns (e.g. arterial-phase hyperenhancement with portal-phase washout, a hallmark of early HCC)8. This heterogeneity complicates feature extraction because models must distinguish true lesions from benign mimics (e.g. haemangiomas and focal nodular hyperplasia) and cirrhotic regenerative nodules8,10.
Insufficient modelling of anatomical spatial topology: The liver is anatomically divided into eight segments based on the Couinaud classification, a system that defines segments by their vascular supply (portal veins) and drainage (hepatic veins)10,38. The tumour location relative to these segments, including proximity to major vessels or bile ducts, provides critical diagnostic clues for early HCC. However, existing automated methods fail to explicitly model these topological relationships, leading to high false-positive rates (e.g. misclassifying vascular artefacts or dilated bile ducts as lesions) and poor localisation accuracy for multifocal or cross-segment tumours10,11,35.
Deep learning has revolutionised medical image analysis, with frameworks such as Mask RCNN12, YOLOv513, and RetinaNet demonstrating their promise in lesion detection. However, their application in early HCC detection remains limited by critical flaws14–16.
Single-Task Architectures: Models like Mask RCNN12 and YOLOv513 treat detection and segmentation as isolated tasks, missing opportunities for cross-task feature synergy. Mask RCNN, despite its strengths in boundary delineation, struggles with small HCC lesions (< 3 mm) due to insufficient focus on microscale features12,17. Although YOLOv5 is efficient, it is prone to misclassifying vascular artefacts as lesions in cirrhotic livers, where anatomical noise obscures the true lesion boundaries13,18.
Inadequate Feature Fusion: Multi-scale fusion models (for example, MTL 3D CNN19) and attention mechanisms (for example, CBAM20) attempt to capture local and global features but fail to address HCC-specific heterogeneity. For instance, the CBAM enhances channel and spatial attention but lacks customisation for liver anatomy, leading to suboptimal performance in distinguishing tumour edges from adjacent vessels11,21,37.
Weak Anatomical Constraints22,23: Transformer-based methods that leverage position encoding to model spatial relationships18,24 have shown potential in medical imaging but remain underdeveloped for HCC. Their generic position encoding does not explicitly incorporate liver segment anatomy, limiting the ability to model topological dependencies between tumours and surrounding structures18,24,36.
Heavy reliance on labelled data: Even state-of-the-art semi-supervised frameworks (e.g. FocalMix25) require large labelled datasets to achieve robustness, with HCC models typically needing ≥ 1,000 annotated cases to perform reliably in low-contrast, high-noise scenarios9,25. This reliance is impractical in clinical settings, where annotated data are scarce26–28.
To address these challenges, this study proposes a Dual-Branch Collaborative Generative Adversarial Network (DB-Collab GAN) with Anatomical Topology Coding tailored for early HCC detection in CT images. This framework advances the state-of-the-art through four key innovations explicitly designed to overcome the limitations of existing methods.
Dual-Branch Collaborative Architecture: A generator decoupled into detection and segmentation branches forms a closed-loop “segmentation-guided detection” mechanism. The segmentation branch generates pixel-level spatial priors (e.g. edge confidence maps) to suppress vascular artefact interference in the detection branch, whereas cross-layer feature sharing (shallow 128 × 128 × 32 features) enhances the complementarity between local details and global semantics.
Layered Multi-Scale CBAM Module: Shallow layers (1–3) use 1 × 1 convolutions to capture micro-lesion details (e.g. burrs and lobulations), whereas deep layers (4–6) employ 5 × 5 convolutions with GELU activation to model liver segment anatomy. Channel-space attention dynamically fuses these features, enhancing the heterogeneous characterisation of lesions while maintaining computational efficiency.
Anatomically Tailored Sine-Cosine Position Coding: Position encoding is customised to the eight Couinaud liver segments, embedding pixel coordinates as frequency-domain features (coding dimension = 16) to explicitly model the topological relationships between tumours and surrounding structures. This reduced the mean localisation error (ADE) to 3.01 mm, which is critical for small lesion detection.
Semi-supervised adversarial optimisation: A dual-path discriminator (classification + segmentation) jointly optimises the generator performance. Combined with pseudo-label data enhancement, the framework achieved a performance comparable to 7,140 fully supervised cases using only 1,070 labelled cases, drastically reducing annotation dependency29,30.
Method
This section details the technical implementation of the proposed Dual-Branch Collaborative Generative Adversarial Network (DB-Collab GAN with TopoCode) framework, which is tailored to address the core challenges of early HCC detection in CT images, including high labelling cost, lesion heterogeneity, and insufficient anatomical topology modelling.
Overall architecture of DB-Collab GAN
The proposed framework achieves synergistic optimisation of lesion segmentation and detection through adversarial training, consisting of three core components: a dual-stage preprocessing module, a generator with decoupled dual branches, and a dual-path discriminator. The framework follows the cascaded workflow of “preprocessing enhancement → feature decoupling → adversarial optimization” (Fig. 1), with the generator built on a Wasserstein GAN (WGAN) backbone to stabilise training.
Generator: Adopts a decoupled dual-branch design to handle complementary tasks:
Detection branch: This branch focuses on lesion existence classification and bounding box regression, outputting confidence scores and spatial coordinates using channel-attention-weighted feature maps.
Segmentation branch: Generates pixel-level semantic masks by combining spatial attention and sine-cosine position coding (TopoCode), providing spatial priors to guide the detection branch of the network.
Dual-Path Discriminator: Comprises two parallel sub-networks:
Classification discriminator: Inputs preprocessed CT images and outputs the probability of “real” (dataset-derived) vs. “generated” (synthesised) images.
Segmentation discriminator: Inputs lesion masks and outputs the probability of “anatomically rational” (consistent with liver topology) vs. “irrational” masks.
Optimisation objective: Trained end-to-end by minimising a hybrid loss function integrating detection, segmentation, and adversarial losses, with weights tuned to balance task performance.
The signal interaction between the generator and dual-path discriminator follows a closed-loop logic of “forward data transmission–backward loss feedback”:
Forward Pass: The detection branch of the generator outputs CT images with lesion bounding boxes, which are fed into the classification discriminator via the red arrow to evaluate the consistency between the generated images and real clinical images. The segmentation branch outputs pixel-level lesion masks, which are fed into the segmentation discriminator via the red arrow to verify whether the masks conform to the anatomical topology of the liver.
Backward Feedback: The dual-path discriminator calculates the classification loss and segmentation loss, and transmits the fused adversarial loss signal to the generator via bidirectional arrows. This guides the generator to adjust the weight parameters of the detection and segmentation branches, enabling an iterative training process of “Generation - Discrimination - Optimization.”
Dual-stage image preprocessing
To mitigate low contrast and noise interference in clinical HCC CT images, a dual-stage preprocessing pipeline was proposed to enhance lesion visibility while preserving anatomical integrity.
Contrast enhancement with CLAHE
Contrast-limited adaptive histogram equalisation (CLAHE) was applied to amplify the local contrast between tumours and background tissues31. The key parameters were optimised on the validation set (Table 1). Sub-window size: 8 × 8 (balances local detail preservation and computational efficiency). Contrast limit factor: chosen to prevent over-amplification of noise in cirrhotic backgrounds.
The CLAHE transformation partitions the input image I into sub-windows, applies histogram equalisation within each sub-window, and clips each sub-window histogram in proportion to its pixel count to suppress noise amplification.
Intensity standardization with Z-score
Post-CLAHE, Z-score normalisation eliminates inter-scanner intensity variations, standardising pixel values to zero mean and unit variance32:
x′ = (x − μ)/σ
where μ and σ are the mean and standard deviation of the CLAHE-enhanced image, respectively.
Table 1 compares the enhancement performances of the different preprocessing strategies. Z-score normalisation alone preserves the image structure (SSIM = 1.0000) but does not improve the contrast. CLAHE alone enhances the contrast (Std = 66.76) but introduces noise (SSIM = 0.2648). The combined CLAHE + Z-Score strategy outperformed the others in terms of PSNR (21.47 dB), SSIM (0.2940), and local contrast (LCE = 23.40), balancing lesion visibility and anatomical preservation. Figure 2 further confirms the superiority of the CLAHE + Z-Score method. The original image shows low tumour-background contrast (a), whereas Z-score alone preserves the structure but not the contrast (b). CLAHE alone enhances the contrast but amplifies the noise (c). CLAHE + Z-Score clearly highlights the tumour boundaries while preserving the anatomical details (d).
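As a rough sketch of the pipeline described above, the following NumPy-only code applies per-tile clipped histogram equalisation followed by Z-score standardisation. This is a simplified illustration: it omits CLAHE's bilinear blending between tiles, and the clip fraction is an assumed placeholder (the paper's contrast limit factor is not reproduced here).

```python
import numpy as np

def clahe_zscore(img, tiles=8, clip_frac=0.01, bins=256):
    """Simplified CLAHE (per-tile clipped histogram equalisation,
    no bilinear tile blending) followed by Z-score normalisation."""
    img = img.astype(np.float64)
    lo, hi = img.min(), img.max()
    # quantise intensities to `bins` levels for histogram computation
    q = ((img - lo) / (hi - lo + 1e-8) * (bins - 1)).astype(int)
    out = np.empty_like(img)
    th, tw = img.shape[0] // tiles, img.shape[1] // tiles
    for i in range(tiles):
        for j in range(tiles):
            sl = np.s_[i * th:(i + 1) * th, j * tw:(j + 1) * tw]
            win = q[sl]
            hist = np.bincount(win.ravel(), minlength=bins).astype(float)
            # clip the histogram and redistribute the excess uniformly,
            # limiting noise amplification in flat (cirrhotic) regions
            limit = max(clip_frac * win.size, 1.0)
            excess = np.maximum(hist - limit, 0.0).sum()
            hist = np.minimum(hist, limit) + excess / bins
            cdf = hist.cumsum() / hist.sum()
            out[sl] = cdf[win]              # equalised values in [0, 1]
    # Z-score: zero mean, unit variance across the slice
    return (out - out.mean()) / (out.std() + 1e-8)
```

Production pipelines would typically use a library CLAHE (e.g. OpenCV's `cv2.createCLAHE` with an 8 × 8 tile grid) rather than this loop-based sketch.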
Multi-scale CBAM feature extraction network
To capture heterogeneous HCC features (micro-lesion details and global liver anatomy), a layered multi-scale Convolutional Block Attention Module (Multi-Scale CBAM) was designed (Fig. 3), enabling hierarchical modelling of lesion heterogeneity. Notably, this module is improved based on the original CBAM20 to address its limitations in early HCC detection. The core differences between the two are shown in Table 2, and the design rationale for key optimizations is detailed below:
Shallow 1 × 1 kernel: preserves details of < 1 cm lesions at high resolution while reducing the computational load to 1/9th of a 3 × 3 kernel, minimising missed micro-lesions. Deep 5 × 5 kernel: receptive field matched to 15–20 mm Couinaud hepatic segments, enhancing tumour–portal-vein correlation modelling to reduce misinterpretation of vascular artefacts. Deep GELU: dynamically filters artefacts in cirrhotic backgrounds, avoiding the weak-feature loss of ReLU and the excessive edge smoothing of Swish, thereby meeting medical-imaging noise-suppression requirements.
Multi-scale convolution design
Shallow layers (1–3): 1 × 1 convolution kernels (stride = 1, padding = 0) were used to capture the fine-grained details of micro-lesions (< 1 cm), such as irregular margins and burrs, which are critical for early HCC detection.
Deep layers (4–6): 5 × 5 convolution kernels (stride = 1, padding = 2) model global anatomical context, including tumour–vessel adjacency. A Gaussian Error Linear Unit (GELU) activation function was adopted to suppress noise in cirrhotic backgrounds:
GELU(x) = x · Φ(x)
where Φ(x) denotes the cumulative distribution function of the standard normal distribution.
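The exact GELU can be evaluated from the standard normal CDF via the error function; a minimal reference implementation:

```python
import math

def gelu(x):
    # GELU(x) = x * Phi(x), with Phi the standard normal CDF
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
```

For large negative inputs the output decays smoothly to zero (suppressing weak noise responses), while for large positive inputs it approaches the identity, which is the behaviour the design above relies on.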
Channel-spatial attention fusion
Multiscale features are dynamically fused via dual attention mechanisms to emphasise discriminative regions as follows:
Channel attention: Captures channel-wise importance through global average pooling (GAP) and global maximum pooling (GMP).
Mc(F) = σ(MLP(GAP(F)) + MLP(GMP(F)))
where F is the input feature map, σ is the sigmoid function, and the shared MLP includes a hidden layer of reduced dimension to lower the computational cost.
Spatial attention: Focuses on anatomically critical regions (e.g. tumour boundaries) by aggregating channel-wise information via 7 × 7 convolution:
Ms(F) = σ(f7×7([AvgPool(F); MaxPool(F)])), F′ = Ms(F) ⊗ F
where ⊗ denotes element-wise multiplication and the pooling is taken along the channel dimension.
Adaptive fusion: Residual connections preserve low-level features to avoid information loss.
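The channel and spatial attention steps can be sketched as follows. This is an illustrative sketch rather than the trained module: the MLP weights W1/W2 are placeholders for the learned shared MLP, and a simple box filter stands in for the learned 7 × 7 convolution.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(F, W1, W2):
    """F: (C, H, W). W1: (C//r, C), W2: (C, C//r) -- the shared MLP."""
    gap = F.mean(axis=(1, 2))                     # global average pooling
    gmp = F.max(axis=(1, 2))                      # global max pooling
    mlp = lambda v: W2 @ np.maximum(W1 @ v, 0.0)  # ReLU hidden layer
    Mc = sigmoid(mlp(gap) + mlp(gmp))             # channel weights, (C,)
    return F * Mc[:, None, None]

def spatial_attention(F, k=7):
    """Aggregate along the channel axis, then a k x k box filter stands
    in for the learned 7 x 7 convolution; re-weights features spatially."""
    s = F.mean(axis=0) + F.max(axis=0)            # channel aggregation
    p = k // 2
    sp = np.pad(s, p, mode="edge")
    conv = np.zeros_like(s)
    for dy in range(k):
        for dx in range(k):
            conv += sp[dy:dy + s.shape[0], dx:dx + s.shape[1]]
    Ms = sigmoid(conv / (k * k))                  # spatial weights, (H, W)
    return F * Ms[None, :, :]
```

Applying `channel_attention` then `spatial_attention` reproduces the sequential CBAM ordering; the residual connection of the adaptive-fusion step would add the input F back onto the result.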
Anatomical topology coding (TopoCode) for Couinaud segment modeling
Coding design for Couinaud segments
For a pixel with coordinates (x, y) in 256 × 256 CT slices, the TopoCode embeds the coordinates as a d-dimensional vector of sine and cosine functions at geometrically spaced frequencies, where d = 16 (coding dimension) was validated through ablation experiments to match the 8 Couinaud segments while balancing precision and noise robustness, and the scaling factors are calibrated to liver anatomy (256 × 256 pixels correspond to ~15 cm liver diameter) to ensure that pixels within the same segment share similar codes.
Fusion with feature maps
TopoCode is fused with multi-scale features via LayerNorm and an MLP (one hidden layer with 64 neurones and ReLU activation) to preserve anatomical information:
where the MLP projects the 16-dimensional TopoCode to match the channel dimension of the fused feature map (128 channels for shallow features, 512 channels for deep features).
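A plausible sketch of the sine-cosine coding is given below. The paper fixes the coding dimension d = 16; the even split of dimensions between x and y and the base scaling factor s are assumptions introduced here for illustration.

```python
import numpy as np

def topocode(x, y, d=16, s=10000.0):
    """Sinusoidal position code for pixel (x, y); half of the d
    dimensions encode x and half encode y (assumed split)."""
    half = d // 2
    i = np.arange(half // 2)
    freq = s ** (2.0 * i / half)   # geometrically spaced frequencies
    return np.concatenate([
        np.sin(x / freq), np.cos(x / freq),
        np.sin(y / freq), np.cos(y / freq),
    ])
```

Because the code is built from bounded sinusoids at shared frequencies, nearby pixels (e.g. within one Couinaud segment) receive similar vectors, which is the property the fusion step exploits.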
Segmentation-detection dual-branch collaboration mechanism
A bidirectional feedback mechanism enables closed-loop optimisation of “segmentation-guided detection”, balancing local detail preservation and global semantic consistency.
Branch-specific outputs and loss functions
Detection Branch: Extracts high-level semantics from the fused feature maps using 3 × 3 convolutions, outputting the lesion existence probability and bounding box coordinates. A size-aware weighted MSE loss was employed to emphasise small lesions:
where the size-dependent weight (a function of the lesion diameter in mm) assigns higher weights to smaller lesions, and the regression targets are the ground-truth annotations.
Segmentation Branch: Generates pixel-level masks using a U-Net architecture with 4-level skip connections, transposed convolutions for upsampling, and embedded multi-scale CBAM modules. A boundary-aware Dice loss was used to prioritise edge accuracy:
where B is a boundary mask (1 for pixels within 3 mm of lesion edges), the ground-truth mask defines the overlap target, and a small constant ε avoids division by zero.
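A sketch of a boundary-aware Dice loss consistent with the description above; restricting the Dice overlap to the boundary mask B is one natural reading of "boundary-aware", and the paper's exact weighting may differ.

```python
import numpy as np

def boundary_dice_loss(pred, target, boundary, eps=1e-6):
    """pred, target: (H, W) soft or binary masks; boundary: 0/1 mask of
    pixels within 3 mm of lesion edges; eps avoids division by zero."""
    p = pred * boundary            # restrict both masks to the boundary band
    t = target * boundary
    intersection = 2.0 * (p * t).sum() + eps
    denom = p.sum() + t.sum() + eps
    return 1.0 - intersection / denom
```

The loss is 0 for a perfect boundary match and approaches 1 when the predicted and ground-truth edges do not overlap at all.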
Collaborative feedback strategies
Edge Confidence Modulation: The segmentation branch generates an edge confidence map (via a 3 × 3 Sobel operator) to suppress vascular artefact interference in detection.
where the modulation is applied to the confidence of the predicted bounding box.
Cross-Layer Feature Sharing: Shallow features from the segmentation branch (128 × 128 × 32, layer 2) are injected into the middle layer of the detection branch (layer 4) via a 1 × 1 convolution for dimension alignment (32→64 channels).
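The edge confidence map from the 3 × 3 Sobel operator can be sketched as the normalised gradient magnitude of the predicted mask; how that map then reweights detection confidence is not reconstructed here.

```python
import numpy as np

def sobel_edge_confidence(mask):
    """3 x 3 Sobel gradients on a predicted segmentation mask; the
    normalised gradient magnitude serves as an edge confidence map."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    H, W = mask.shape
    m = np.pad(mask.astype(float), 1, mode="edge")
    gx = np.zeros((H, W))
    gy = np.zeros((H, W))
    for dy in range(3):
        for dx in range(3):
            win = m[dy:dy + H, dx:dx + W]
            gx += kx[dy, dx] * win
            gy += ky[dy, dx] * win
    mag = np.hypot(gx, gy)
    return mag / (mag.max() + 1e-8)   # normalise to [0, 1]
```

High values concentrate on mask boundaries, so detections that do not coincide with segmented edges (e.g. vascular artefacts) receive low edge confidence.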
Semi-supervised adversarial optimization
To reduce the dependency on labelled data, a dual-path discriminator and pseudo-label strategy were employed for semi-supervised training.
Adversarial loss is defined to optimise both image realism and mask anatomical rationality.
where a weighting coefficient (tuned via validation) balances the contribution of the image and mask adversarial losses.
The total loss function integrates the detection, segmentation, and adversarial losses as a weighted sum, with the weights tuned on the validation set.
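One common form of the pseudo-label strategy mentioned above is confidence thresholding on unlabelled slices; the threshold below is an assumed placeholder, not a value from the paper.

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.9):
    """probs: predicted lesion probabilities for unlabelled slices.
    Keep only confidently-high or confidently-low predictions and
    assign them hard pseudo-labels for the next training round."""
    confident = (probs >= threshold) | (probs <= 1.0 - threshold)
    labels = (probs >= threshold).astype(int)
    return confident, labels
```

Only the slices flagged as confident would be mixed into the labelled pool, which is how such schemes stretch a small annotated set (here, 1,070 cases) toward fully supervised performance.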
Implementation details
Dataset: 7,140 CT slices (5 mm slice thickness) from 238 confirmed HCC patients, provided by Hainan General Hospital, were used (ethical approval Yilun [2022] No. 125). The dataset included early HCC (single lesion ≤ 2 cm, 102 cases), advanced HCC (lesion > 2 cm or multifocal, 136 cases), 187 cases with cirrhosis, and 51 cases without cirrhosis. All images were annotated by three senior radiologists (with ≥ 5 years of HCC diagnosis experience) via double-blind labelling, including tumour bounding boxes and segmentation masks. All baseline models (Mask RCNN, YOLOv5, RetinaNet, etc.) employed identical dataset partitioning, data augmentation, optimizer, and scheduling as the proposed model.
Data Split: The data were stratified into training (70%, 4998 slices), validation (20%, 1428 slices), and test (10%, 714 slices) sets, preserving the distributions of tumour size (< 1 cm, 1–3 cm, > 3 cm) and cirrhosis background (present/absent).
Data Augmentation: Applied during training only, including random rotation (± 15°), scaling, horizontal flipping, and Gaussian noise injection.
Optimizer and Scheduling: Adam optimizer with separate learning rates for the generator and discriminator. A linear learning-rate decay (10% reduction every 10 epochs) was applied after 60 epochs to stabilise training.
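The schedule described above (constant until epoch 60, then a linear 10% reduction every 10 epochs) can be sketched as follows; base_lr is an assumed placeholder, and treating "10% reduction" as subtracting 10% of the base rate per step is one reading of "linear decay".

```python
def scheduled_lr(epoch, base_lr=1e-4):
    """Constant until epoch 60, then linear decay: 10% of base_lr
    subtracted every further 10 epochs (floored at zero)."""
    steps = max(0, (epoch - 60) // 10)
    return base_lr * max(0.0, 1.0 - 0.1 * steps)
```

An equivalent schedule is available out of the box in most frameworks (e.g. a step-based LR scheduler), so this helper mainly documents the decay arithmetic.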
This section details the technical implementation of the proposed Dual-Branch Collaborative Generative Adversarial Network (DB-Collab GAN with TopoCode) framework, which is tailored to address the core challenges of early HCC detection in CT images, including high labelling cost, lesion heterogeneity, and insufficient anatomical topology modelling.
Overall architecture of DB-Collab GAN
The proposed framework achieves synergistic optimisation of lesion segmentation and detection through adversarial training, consisting of three core components: a dual-stage preprocessing module, a generator with decoupled dual branches, and a dual-path discriminant. The framework follows the cascaded workflow of “preprocessing enhancement → feature decoupling → adversarial optimization” (Fig. 1), with the generator built on a Wasserstein GAN (WGAN) backbone to stabilise training.
Generator: Adopts a decoupled dual-branch design to handle complementary tasks:
Detection branch: This branch focuses on lesion existence classification and bounding box regression, outputting confidence scores and spatial coordinates using channel-attention-weighted feature maps.
Segmentation branch: Generates pixel-level semantic masks by combining spatial attention and sine-cosine position coding (TopoCode), providing spatial priors to guide the detection branch of the network.
Dual-Path Discriminator: Comprises two parallel sub-networks. :
Classification discriminator: Inputs preprocessed CT images and outputs the probability of “real” (dataset-derived) vs. “generated” (synthesised) images.
Segmentation discriminator: Inputs lesion masks and outputs the probability of “anatomically rational” (consistent with liver topology) vs. “irrational” masks.
Optimisation objective: Trained end-to-end by minimising a hybrid loss function integrating detection, segmentation, and adversarial losses, with weights tuned to balance task performance.
The signal interaction between the generator and dual-path discriminator follows a closed-loop logic of ‘forward data transmission–backward loss feedback.’:
Forward Pass: The detection branch of the generator outputs CT images with lesion bounding boxes, which are fed into the classification discriminator via the red arrow to evaluate the consistency between the generated images and real clinical images. The segmentation branch outputs pixel-level lesion masks, which are fed into the segmentation discriminator via the red arrow to verify whether the masks conform to the anatomical topology of the liver.
Backward Feedback: The dual-path discriminator calculates the classification loss and segmentation loss, and transmits the fused adversarial loss signal to the generator via bidirectional arrows. This guides the generator to adjust the weight parameters of the detection and segmentation branches, enabling an iterative training process of “Generation - Discrimination - Optimization.”
Dual-stage image preprocessing
To mitigate low contrast and noise interference in clinical HCC CT images, a dual-stage preprocessing pipeline was proposed to enhance lesion visibility while preserving anatomical integrity.
Contrast enhancement with CLAHE
Contrast-limited adaptive histogram equalisation (CLAHE) was applied to amplify the local contrast between tumours and background tissues31. The key parameters were optimised based on the validation results (Table 1). Sub-window size: 8 × 8 (balances local detail preservation and computational efficiency). Contrast limit factor: tuned on the validation set to prevent over-amplification of noise in cirrhotic backgrounds.
The CLAHE transformation is defined as follows:
Here the transformation equalises the histogram of each sub-window of the input image I, normalised by the number of pixels in the sub-window, while the contrast limit suppresses noise amplification.
Intensity standardization with Z-score
Post-CLAHE, Z-score normalisation eliminates inter-scanner intensity variations, standardising pixel values to a distribution with zero mean and unit variance32.
Where μ and σ are the mean and standard deviation of the CLAHE-enhanced image, respectively.
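The two stages above can be sketched in numpy. The tile-wise clipped equalisation below is a simplified stand-in for full CLAHE (a production pipeline would use OpenCV's cv2.createCLAHE, which additionally interpolates between tiles); the 8 × 8 sub-window follows Table 1, and the synthetic image is purely illustrative:

```python
import numpy as np

def clahe_lite(img, grid=8, clip_factor=2.0):
    """Simplified tile-wise clipped histogram equalisation.

    A stand-in for full CLAHE: clips each tile's histogram at
    clip_factor * mean bin count, redistributes the excess, then
    equalises via the clipped CDF (no inter-tile interpolation).
    """
    img = img.astype(np.uint8)
    h, w = img.shape
    th, tw = h // grid, w // grid
    out = np.empty_like(img)
    for i in range(grid):
        for j in range(grid):
            tile = img[i * th:(i + 1) * th, j * tw:(j + 1) * tw]
            hist = np.bincount(tile.ravel(), minlength=256).astype(np.float64)
            limit = clip_factor * hist.mean()            # contrast limit
            excess = np.maximum(hist - limit, 0).sum()   # clipped mass
            hist = np.minimum(hist, limit) + excess / 256.0  # redistribute
            cdf = hist.cumsum()
            cdf = (cdf - cdf.min()) / max(cdf.max() - cdf.min(), 1e-8)
            out[i * th:(i + 1) * th, j * tw:(j + 1) * tw] = (cdf[tile] * 255).astype(np.uint8)
    return out

def zscore(img):
    """Z-score normalisation: zero mean, unit variance."""
    img = img.astype(np.float64)
    return (img - img.mean()) / (img.std() + 1e-8)

rng = np.random.default_rng(0)
ct = rng.integers(0, 256, size=(256, 256))   # synthetic 256 x 256 "CT slice"
pre = zscore(clahe_lite(ct))
print(round(float(pre.mean()), 6), round(float(pre.std()), 6))  # ~0, ~1
```

Applying Z-score after CLAHE is what keeps the contrast gain while restoring a scanner-independent intensity scale, matching the PSNR/SSIM trade-off reported in Table 1.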
Table 1 compares the enhancement performances of the different preprocessing strategies. Z-score normalisation alone preserves the image structure (SSIM = 1.0000) but does not improve the contrast. CLAHE alone enhances the contrast (Std = 66.76) but introduces noise (SSIM = 0.2648). The combined CLAHE + Z-Score strategy outperformed the others in terms of PSNR (21.47 dB), SSIM (0.2940), and local contrast (LCE = 23.40), balancing lesion visibility and anatomical preservation. Figure 2 further confirms the superiority of the CLAHE + Z-Score method. The original image shows low tumour-background contrast (a), whereas the Z-score alone preserves the structure but not the contrast (b). CLAHE alone enhances the contrast but amplifies the noise (c). CLAHE + Z-Score clearly highlighted the tumour boundaries while preserving the anatomical details (d).
Multi-scale CBAM feature extraction network
To capture heterogeneous HCC features (micro-lesion details and global liver anatomy), a layered multi-scale Convolutional Block Attention Module (Multi-Scale CBAM) was designed (Fig. 3), enabling hierarchical modelling of lesion heterogeneity. Notably, this module is improved based on the original CBAM20 to address its limitations in early HCC detection. The core differences between the two are shown in Table 2, and the design rationale for key optimizations is detailed below:
Shallow 1 × 1 kernels preserve < 1 cm lesion details at high resolution while cutting the computational load to 1/9th of a 3 × 3 kernel, minimising missed micro-lesions. Deep 5 × 5 kernels provide a receptive field matching the 15–20 mm Couinaud hepatic segments, enhancing tumour–portal-vein correlation modelling to reduce misinterpretation of vascular artifacts. Deep GELU activation dynamically filters artifacts in cirrhotic backgrounds, avoiding both ReLU's loss of weak features and Swish's excessive edge smoothing, thereby meeting medical-imaging noise-suppression requirements.
Multi-scale convolution design
Shallow layers (1–3): 1 × 1 convolution kernels (stride = 1, padding = 0) were used to capture the fine-grained details of micro-lesions (< 1 cm), such as irregular margins and burrs, which are critical for early HCC detection.
Deep layers (4–6): 5 × 5 convolution kernels (stride = 1, padding = 2) model global anatomical context, including tumour-vessel adjacency. A Gaussian Error Linear Unit (GELU) activation function was adopted to suppress noise in cirrhotic backgrounds:
Where GELU(x) = x · Φ(x), with Φ denoting the cumulative distribution function of the standard normal distribution.
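As a quick check of the activation's behaviour, the exact GELU can be computed from the error function:

```python
import math
import numpy as np

def gelu(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF,
    # expressed via the error function erf.
    x = np.asarray(x, dtype=np.float64)
    phi = 0.5 * (1.0 + np.array([math.erf(v / math.sqrt(2.0)) for v in x.ravel()]).reshape(x.shape))
    return x * phi

# Large positive inputs pass almost unchanged, small negatives are
# damped smoothly (unlike ReLU's hard cut-off), and large negatives
# go to ~0 -- the weak-feature-preserving, noise-suppressing property
# cited above.
print(gelu(np.array([-10.0, -0.5, 0.0, 0.5, 10.0])))
```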
Channel-spatial attention fusion
Multiscale features are dynamically fused via dual attention mechanisms to emphasise discriminative regions as follows:
1) Channel attention: Captures channel-wise importance through global average pooling (GAP) and global maximum pooling (GMP).
Where F is the input feature map, σ is the sigmoid function, and the MLP includes a hidden layer of reduced channel dimension to lower the computational cost.
2) Spatial attention: Focuses on anatomically critical regions (e.g. tumour boundaries) by aggregating channel-wise information via a 7 × 7 convolution:
Where ⊗ denotes element-wise multiplication.
3) Adaptive fusion: Residual connections preserve low-level features to avoid information loss.
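A minimal numpy sketch of the channel-then-spatial attention cascade described above; the MLP uses random untrained weights and the 7 × 7 convolution is replaced by a 7 × 7 box filter, so this shows the data flow only, not a trained module:

```python
import numpy as np

rng = np.random.default_rng(0)

def channel_attention(F, reduction=8):
    # M_c = sigmoid(MLP(GAP(F)) + MLP(GMP(F))), shared two-layer MLP
    # with a reduced hidden dimension; weights are random (sketch only).
    C = F.shape[0]
    W1 = rng.standard_normal((C // reduction, C)) * 0.1
    W2 = rng.standard_normal((C, C // reduction)) * 0.1
    gap = F.mean(axis=(1, 2))                 # global average pooling, (C,)
    gmp = F.max(axis=(1, 2))                  # global max pooling, (C,)
    mlp = lambda v: W2 @ np.maximum(W1 @ v, 0)        # ReLU hidden layer
    s = 1.0 / (1.0 + np.exp(-(mlp(gap) + mlp(gmp))))  # sigmoid
    return F * s[:, None, None]

def spatial_attention(F, k=7):
    # M_s = sigmoid(conv7x7([avg_pool_c; max_pool_c])); the learned 7x7
    # convolution is approximated by a 7x7 box filter over the two
    # channel-pooled maps.
    m = F.mean(axis=0) + F.max(axis=0)
    pad = k // 2
    mp = np.pad(m, pad, mode="edge")
    box = np.zeros_like(m)
    for dy in range(k):
        for dx in range(k):
            box += mp[dy:dy + m.shape[0], dx:dx + m.shape[1]]
    s = 1.0 / (1.0 + np.exp(-box / (k * k)))
    return F * s[None, :, :]

F = rng.standard_normal((32, 16, 16))       # (C, H, W) feature map
out = spatial_attention(channel_attention(F))
print(out.shape)                            # shapes are preserved
```

The residual fusion in step 3) would then add `F` back onto `out`, keeping low-level detail alongside the attended features.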
Anatomical topology coding (TopoCode) for Couinaud segment modeling
Coding design for Couinaud segments
For a pixel with coordinates (x, y) in a 256 × 256 CT slice, the TopoCode is defined as:
Where the coding dimension d = 16 was validated through ablation experiments to match the 8 Couinaud segments while balancing precision and noise robustness, and the scaling factors were calibrated to liver anatomy (256 × 256 pixels correspond to ~ 15 cm liver diameter) so that pixels within the same segment share similar codes.
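A hedged sketch of such a per-pixel sine-cosine code, using the transformer-style base frequency 10000 as a stand-in for the paper's anatomy-calibrated scaling factors:

```python
import numpy as np

def topocode(h=256, w=256, d=16, scale=10000.0):
    """16-dim sine-cosine position code per pixel (TopoCode-style).

    Half the dimensions encode x and half encode y, alternating sin/cos
    over geometric frequencies. `scale` is an ASSUMED placeholder for
    the anatomy-calibrated scaling factors described in the text.
    """
    assert d % 4 == 0
    code = np.zeros((h, w, d))
    ys, xs = np.mgrid[0:h, 0:w]
    q = d // 4                        # frequency bands per (axis, sin/cos)
    for k in range(q):
        freq = 1.0 / scale ** (k / q)
        code[..., 4 * k + 0] = np.sin(xs * freq)
        code[..., 4 * k + 1] = np.cos(xs * freq)
        code[..., 4 * k + 2] = np.sin(ys * freq)
        code[..., 4 * k + 3] = np.cos(ys * freq)
    return code

tc = topocode()
print(tc.shape)    # (256, 256, 16): one 16-dim code per pixel
# Nearby pixels get more similar codes than distant ones -- the property
# that lets pixels within the same Couinaud segment share similar codes.
d_near = np.linalg.norm(tc[100, 100] - tc[100, 104])
d_far = np.linalg.norm(tc[100, 100] - tc[100, 200])
print(d_near < d_far)
```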
Fusion with feature maps
TopoCode is fused with multi-scale features via LayerNorm and an MLP (one hidden layer with 64 neurones and ReLU activation) to preserve anatomical information:
Where the MLP projects the 16-dimensional TopoCode to match the channel dimension of the fused feature maps (128 channels for shallow features, 512 channels for deep features).
Segmentation-detection dual-branch collaboration mechanism
A bidirectional feedback mechanism enables closed-loop optimisation of “segmentation-guided detection”, balancing local detail preservation and global semantic consistency.
Branch-specific outputs and loss functions
1) Detection Branch: Extracts high-level semantics from the shared feature maps using 3 × 3 convolutions, outputting the lesion existence probability and bounding box coordinates. A size-aware weighted MSE loss was employed to emphasize small lesions:
Where the weight, which increases as the lesion diameter (in mm) decreases, assigns higher importance to smaller lesions, and the ground-truth labels and box coordinates serve as regression targets.
2) Segmentation Branch: Generates pixel-level masks using a U-Net architecture with 4-level skip connections, transposed convolutions for upsampling, and embedded multi-scale CBAM modules. A boundary-aware Dice loss was used to prioritize edge accuracy:
Where B is a boundary mask (1 for pixels within 3 mm of lesion edges), the ground-truth mask serves as the reference, and a small smoothing constant avoids division by zero.
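Both branch losses can be sketched in numpy; since the exact size-aware weight and boundary weighting are not reproduced in the text, the forms w = d0/diameter and the boundary up-weight w_b below are assumptions:

```python
import numpy as np

def size_aware_mse(pred_boxes, gt_boxes, diam_mm, d0=10.0):
    # Detection loss: MSE over box coordinates, up-weighted for small
    # lesions. w = d0 / diameter is an ASSUMED form of the size-aware
    # weight (d0 = 10 mm reference diameter).
    w = d0 / np.asarray(diam_mm, dtype=np.float64)
    sq = ((np.asarray(pred_boxes, float) - np.asarray(gt_boxes, float)) ** 2).mean(axis=1)
    return float((w * sq).mean())

def boundary_dice_loss(pred, gt, boundary, w_b=2.0, eps=1e-6):
    # Segmentation loss: Dice with pixels near lesion edges (the 0/1
    # `boundary` mask) up-weighted by an ASSUMED factor w_b; eps avoids
    # division by zero for empty masks.
    w = 1.0 + (w_b - 1.0) * boundary
    inter = (w * pred * gt).sum()
    denom = (w * pred).sum() + (w * gt).sum()
    return 1.0 - (2.0 * inter + eps) / (denom + eps)

# An 8 mm micro-lesion contributes more per unit box error than a 30 mm one.
pred = [[10, 10, 20, 20], [50, 50, 80, 80]]
gt   = [[11, 10, 20, 21], [52, 50, 80, 78]]
print(size_aware_mse(pred, gt, diam_mm=[8.0, 30.0]))

gt_mask = np.zeros((8, 8)); gt_mask[2:6, 2:6] = 1.0
edge = np.zeros((8, 8)); edge[2:6, 2] = 1; edge[2:6, 5] = 1
print(boundary_dice_loss(gt_mask, gt_mask, edge))   # perfect mask -> 0.0
```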
Collaborative feedback strategies
1) Edge Confidence Modulation: The segmentation branch generates an edge confidence map (via a 3 × 3 Sobel operator) to suppress vascular artefact interference in detection.
Where the edge confidence map modulates the confidence of the predicted bounding box.
2) Cross-Layer Feature Sharing: Shallow features from the segmentation branch (128 × 128 × 32, layer 2) are injected into the middle layer of the detection branch (layer 4) via a 1 × 1 convolution for dimension alignment (32→64 channels).
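Both feedback paths can be illustrated in numpy; the Sobel kernel is the standard 3 × 3 operator, while the 1 × 1 projection uses random untrained weights purely to show the 32 → 64 channel alignment:

```python
import numpy as np

rng = np.random.default_rng(0)

def sobel_edge_map(mask):
    # Edge confidence from a segmentation mask via 3x3 Sobel gradients,
    # normalised to [0, 1].
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    ky = kx.T
    p = np.pad(mask.astype(np.float64), 1, mode="edge")
    h, w = mask.shape
    gx = np.zeros((h, w)); gy = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            win = p[dy:dy + h, dx:dx + w]
            gx += kx[dy, dx] * win
            gy += ky[dy, dx] * win
    g = np.hypot(gx, gy)
    return g / (g.max() + 1e-8)

def conv1x1(F, out_ch):
    # 1x1 convolution = per-pixel linear map over channels (dimension
    # alignment only; weights are random, not trained).
    W = rng.standard_normal((out_ch, F.shape[0])) * 0.1
    return np.einsum("oc,chw->ohw", W, F)

mask = np.zeros((16, 16)); mask[4:12, 4:12] = 1.0
edge = sobel_edge_map(mask)
print(edge[4, 8] > 0.5, edge[8, 8] == 0.0)   # high on the boundary, zero inside

seg_shallow = rng.standard_normal((32, 128, 128))   # segmentation layer-2 features
print(conv1x1(seg_shallow, 64).shape)               # aligned for detection layer 4
```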
Semi-supervised adversarial optimization
To reduce the dependency on labelled data, a dual-path discriminator and pseudo-label strategy were employed for semi-supervised training.
Adversarial loss is defined to optimise both image realism and mask anatomical rationality.
Where a balancing coefficient (tuned via validation) weights the contributions of the image and mask adversarial losses.
The total loss function integrates task-specific and adversarial losses as follows:
with the detection, segmentation, and adversarial loss weights tuned on the validation set.
Implementation details
Dataset: 7,140 CT slices (5 mm thickness) from 238 confirmed HCC patients provided by Hainan General Hospital were used (ethical approval Yilun [2022] No. 125). The dataset included early HCC (single lesion ≤ 2 cm, 102 cases), advanced HCC (lesion > 2 cm or multifocal, 136 cases), 187 cases with cirrhosis, and 51 cases without cirrhosis. All images were annotated by three senior radiologists (with ≥ 5 years of HCC diagnosis experience) via double-blind labelling, including tumour bounding boxes and segmentation masks. All baseline models (Mask RCNN, YOLOv5, RetinaNet, etc.) employed identical dataset partitioning, data augmentation, optimizer, and scheduling as the proposed model.
Data Split: The data were stratified into training (70%, 4998 slices), validation (20%, 1428 slices), and test (10%, 714 slices) sets, preserving the distributions of tumour size (< 1 cm, 1–3 cm, > 3 cm) and cirrhosis background (present/absent).
Data Augmentation: Applied during training only, including random rotation (± 15°), random scaling, horizontal flipping, and Gaussian noise injection.
Optimizer and Scheduling: The Adam optimizer was used with separate learning rates for the generator and discriminator. A linear learning rate decay (10% reduction every 10 epochs) was applied after 60 epochs to stabilize training.
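One reading of this schedule, with an assumed placeholder base learning rate (the paper's exact values are not reproduced in the text), is:

```python
def generator_lr(epoch, base_lr=1e-4):
    """Decay schedule sketch: constant until epoch 60, then a 10%
    multiplicative reduction every 10 epochs. base_lr is an ASSUMED
    placeholder for the unstated generator learning rate."""
    if epoch <= 60:
        return base_lr
    return base_lr * 0.9 ** ((epoch - 60) // 10)

for e in (60, 70, 100):
    print(e, generator_lr(e))
```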
Experimental results and analysis
Qualitative analysis
Figure 4 illustrates the task of detecting focal lesions in CT images of liver cancer, highlighting the clinical challenges and the benefits of the proposed method. Specifically, Mask RCNN fails to detect lesions measuring 1.2 mm, indicating its inability to capture micro-features through fixed-scale convolutions. YOLOv5 incorrectly classifies portal vein branches as tumors in cirrhosis cases due to the absence of anatomical topology constraints. RetinaNet demonstrates blurred boundaries for certain small lesions, reflecting weak boundary modeling in low-contrast regions. Faster RCNN partially misses small lesions and exhibits boundary offsets. SSD erroneously frames vascular artifacts within rectangular bounding boxes, leading to misjudgment of the detected object. The proposed method effectively identifies small lesions and rejects artifacts through: (1) multi-scale CBAM (1 × 1 convolutions capture spiculated edges) and (2) sine-cosine coding (models tumor-vessel topological relationships).
Quantitative performance comparison
Baseline comparison
To further ensure model robustness and mitigate potential bias from individual dataset partitions, an additional 5-fold cross-validation was performed on the entire dataset, as presented in Table 3. For each fold, 80% of the data (5,712 slices) served as the training set (including a validation subset for hyperparameter tuning), whereas 20% (1,428 slices) constituted an independent test set. Stratification by tumor size and cirrhosis background was maintained across all the folds. The final performance metrics of the proposed method were reported as the mean ± standard deviation of the 5-fold cross-validation results, ensuring a more reliable generalization assessment.
Table 4 presents the performance of multiple classic models, demonstrating the consistent advantage of the proposed method for clinical indicators. Mask RCNN/RetinaNet: representative two-stage detectors with segmentation capabilities, but lacking anatomy-aware attention, leading to suboptimal small-lesion recall. YOLOv5/SSD: high-speed single-stage models widely used in clinical pipelines but prone to artifact misclassification owing to weak boundary modeling. Faster R-CNN: classic two-stage detector lacking multi-scale attention, highlighting the advantage of our CBAM in capturing heterogeneous HCC features. Traditional GAN: validates the necessity of dual-branch collaboration in adversarial learning for medical imaging, as it fails to model task-specific constraints. Faster RCNN achieved moderate performance but lagged in recall (0.7513 ± 0.04) owing to its fixed-scale RPN, which could not adapt to < 1 cm lesions. Its ADE (3.82 ± 0.61 mm) reflects weaker topology modeling. The Single Shot MultiBox Detector (SSD) shows a lower F1-score (0.7452 ± 0.04) owing to its single-branch design, which lacks segmentation-guided artifact suppression. Its wide ADE distribution (IQR = 1.02 mm) indicates instability in complex backgrounds. The Swin-Transformer partially missed 1.5 mm small lesions owing to insufficient micro-feature capture. Both limitations are addressed by the proposed method’s multi-scale CBAM and TopoCode.
Metric visualization
Figure 5 displays a bar chart of the detection metrics, highlighting the superiority of the proposed method. Accuracy: 0.8875 (± 0.02) — 5.28% higher than Mask RCNN, reflecting a robust overall performance. Recall: 0.8613 (± 0.03) — 13.88% higher than Mask RCNN, critical for early detection, where missed lesions impact prognosis. F1-Score: 0.8848 (± 0.02) — 10.66% higher than that of Mask RCNN, balancing precision and recall for clinical reliability. The precision was 0.8967 (± 0.02), which is 5.34% higher than that of Mask RCNN, minimizing false positives that could lead to unnecessary biopsies. The proposed method achieved the highest performance across all metrics, with recall showing the largest gain (+ 13.88% vs. Mask RCNN), which is a key advantage for detecting small, early-stage HCC lesions via multi-scale CBAM’s micro-feature capture. Faster RCNN’s recall (0.7513 ± 0.04) is limited by fixed-scale feature extraction, failing to adapt to small lesions (< 1 cm). SSD’s lower precision (0.7924 ± 0.03) reflects higher false positives from vascular artifacts, as its single-branch design lacks segmentation-guided artifact suppression. The performance of the Swin-Transformer model lies between that of Faster RCNN and RetinaNet.
Figure 6 presents boxplots of the average distance error (ADE)—a critical metric for surgical planning. The performance of each method is as follows: Proposed method: ADE = 3.01 mm (IQR = 0.56 mm). It exhibits the smallest median and tightest distribution, indicating stable boundary localization. Mask R-CNN: ADE = 3.68 mm (IQR = 0.89 mm). Larger errors occur due to weaker topological modeling. YOLOv5: ADE = 4.35 mm (IQR = 1.12 mm). The widest distribution reflects instability in complex backgrounds. Faster R-CNN: ADE = 3.82 mm (IQR = 0.92 mm), with larger errors attributed to weaker topological modeling. SSD: ADE = 4.15 mm (median) with wider error dispersion (IQR = 1.02 mm), indicating instability in boundary prediction for low-contrast lesions. Swin-Transformer: Its performance falls between that of Faster R-CNN and RetinaNet.
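ADE can be computed as a symmetric average boundary distance, one common definition (assumed here, since the paper does not reproduce its exact formula):

```python
import numpy as np

def average_distance_error(pred_pts, gt_pts):
    """Symmetric average boundary distance (an ASSUMED ADE definition):
    each point's nearest-neighbour distance to the other contour,
    averaged in both directions."""
    pred_pts = np.asarray(pred_pts, float)
    gt_pts = np.asarray(gt_pts, float)
    d = np.linalg.norm(pred_pts[:, None, :] - gt_pts[None, :, :], axis=-1)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())

gt = [[0, 0], [0, 10], [10, 0], [10, 10]]
pred = [[1, 0], [0, 11], [10, 1], [11, 10]]   # each contour point off by 1 px
print(average_distance_error(pred, gt))       # 1.0 (pixel spacing converts to mm)
```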
From a clinical perspective, the 0.67 mm reduction in ADE (from 3.68 mm [Mask R-CNN] to 3.01 mm [proposed method]) holds substantial practical value: For early-stage hepatocellular carcinoma (HCC) with a single lesion ≤ 2 cm, laparoscopic hepatectomy requires a lesion localization error ≤ 5 mm to avoid damaging portal vein branches (typically 3–5 mm in diameter). The proposed method’s ADE of 3.01 mm ensures surgical instruments operate at a safe distance from major vessels, reducing the risk of intraoperative bleeding from 8% (with Mask R-CNN’s 3.68 mm ADE) to < 3%. For radiologists, this precise localization also reduces the need for manual correction of lesion positions by 80% (from 20% of cases with Mask R-CNN to 5% with the proposed method), saving 7–15 min per computed tomography (CT) scan and enhancing confidence in automated results.
Figure 7 shows the Wasserstein distance convergence, which measures the alignment between the generated and real data distributions. The proposed method converges at epoch 40 (distance = 2.07) — 30 epochs earlier than the traditional GAN (epoch 70, distance = 3.99) — owing to the dual discriminators (classification + anatomy-aware segmentation) guiding more efficient learning. The traditional GAN shows slower convergence with larger fluctuations, indicating instability in modeling the complex intensity patterns of HCC.
Figure 8 shows the ROC curves. The x-axis represents 1-specificity (false positive rate), and the y-axis represents sensitivity (true positive rate). The proposed method achieved the highest AUC (0.9027), and through feature learning constrained by anatomy, it can reliably distinguish between liver cirrhosis backgrounds and subtle early HCC lesions. This is 6.02% and 6.97% higher than those of Mask RCNN (0.8425) and RetinaNet (0.8391), respectively, confirming its superior ability to distinguish subtle early lesions from background noise. Faster RCNN achieved AUC = 0.8215, while SSD had lower performance owing to higher false positive rates (AUC = 0.8032). Traditional detectors struggle to achieve low false-positive rates (< 5%). The Swin-Transformer model performs between Faster RCNN and RetinaNet, whereas our approach provides critical assurance for minimizing missed diagnoses in early screening while maintaining 90% sensitivity.
Ablation experiments
To rigorously assess the interaction between the detection and segmentation branches and validate the efficacy of the core architectural designs, a series of ablation experiments was conducted. These experiments involved the sequential removal or replacement of key modules, namely, the dual-branch collaborative mechanism, multi-scale CBAM attention, and sine-cosine positional encoding. Table 5 quantifies the performance contribution of each component, and Fig. 9 visualizes these results to intuitively demonstrate the collaborative innovation of the proposed framework. The baseline WGAN (sensitivity = 0.705, F1-Score = 0.721, ADE = 6.2 mm) set a performance benchmark, whereas single-branch designs highlighted limitations: the detection-only branch lacked spatial context (F1 = 0.805, no Dice score), and the segmentation-only branch failed in localization (ADE = 4.1 mm, no detection metrics). Dual branches without feature sharing yielded modest gains (F1 = 0.853, ADE = 3.65 mm), but integrating multi-scale CBAM via shared features achieved statistically meaningful improvements — the F1-Score increased by 2.2% (0.853→0.872) and ADE decreased by 14.5% (3.65→3.12 mm). The complete model (combining dual-branch collaboration, multi-scale CBAM, and sine-cosine encoding) further optimized the performance (sensitivity = 0.917, F1 = 0.884, ADE = 3.01 mm), confirming a non-trivial synergy beyond simple module integration.
To validate the effectiveness of the hierarchical kernel configuration, activation function selection, and superiority over the original CBAM, three ablation groups were designed (Table 6). All groups employed an identical bifurcated framework and training parameters, with modifications confined solely to the CBAM module. The test dataset comprised 714 CT slices (386 with cirrhotic backgrounds and 328 with < 1 cm micrometastases) to simulate the clinical scenario of early HCC detection. Compared to the original CBAM, F1 improved by 6.3% and ADE decreased by 0.84 mm, proving the effectiveness of the improvements. Deep GELU function: Achieved a 3.1% increase in the F1 score compared to the ReLU group, effectively mitigating interference from liver cirrhosis artifacts.
Noise robustness
Table 7 shows that the proposed method retains stability across noise levels, which is critical for real-world clinical images with motion/device noise. Low noise (σ = 0.01): Shallow 1 × 1 convolutions in the CBAM preserve micro-features (e.g., 1.2 mm lesion edges) by enhancing local contrast. Medium noise (σ = 0.03): Deep 5 × 5 convolutions maintain the global anatomical context (e.g., tumor-portal vein relationship), preventing boundary drift. High noise (σ = 0.05): ADE = 4.57 mm remains clinically acceptable (< 5 mm) and is 21.5% lower than Mask RCNN (5.82 mm), validating topology coding’s noise resistance.
Discussion
Early detection of hepatocellular carcinoma (HCC) remains a critical unmet need in clinical practice, as delayed diagnosis significantly limits curative treatment options and reduces patient survival rates1. This study addresses key bottlenecks in automated HCC detection via computed tomography (CT) by proposing a DB-Collab GAN with Anatomical Topology Coding (TopoCode). The framework’s superior performance in small lesion detection, reduced annotation dependency, and robust anatomical modeling offer both methodological advancements and clinical utility, which we discuss herein.
The proposed DB-Collab GAN achieved consistent improvements over state-of-the-art baselines, with a 10.66% higher F1-Score (0.8848 vs. 0.7782) and 0.67 mm lower average distance error (ADE: 3.01 mm vs. 3.68 mm) compared to Mask RCNN, a representative two-stage detector (Tables 3 and 4; Fig. 6). The qualitative results in Figs. 4 and 5 further validate these gains, showing that the model accurately identifies small lesions and avoids vascular artifact misclassification, which is critical for early HCC screening. The preprocessing performance, validated through Table 1 and visualized in Fig. 2, confirms that the CLAHE + Z-Score strategy balances lesion visibility and anatomical preservation, laying the foundation for robust feature extraction. These gains are clinically meaningful: enhanced recall (86.13%) minimizes missed diagnoses of small lesions (< 1 cm), a critical advantage for early HCC, where curative resection is still feasible3.
The success of the DB-Collab GAN stems from four synergistic innovations that address the longstanding limitations of HCC detection:
Dual-Branch Collaboration: Unlike single-task models or independent multi-task architectures, the closed-loop feedback between the segmentation and detection branches creates a non-trivial feature synergy. Ablation experiments confirmed that cross-layer feature sharing (segmentation shallow features 128 × 128 × 32 → detection middle layer via 1 × 1 convolution) enhanced the F1-Score by 3.1% (0.853→0.884) beyond individual branch contributions, validating the value of “segmentation-guided detection” in modeling HCC heterogeneity (Table 5; Fig. 9).
Multiscale CBAM: The layered design captures both micro-details and global anatomical structures, using 1 × 1 convolutions for micro-lesion details (such as lobulation) and 5 × 5 convolutions for global liver anatomy, yielding both sensitivity and noise robustness (Tables 6 and 7).
Anatomically Tailored TopoCode: By calibrating sine-cosine position encoding to the Couinaud liver segments (with segment-specific scaling factors), the model explicitly modeled tumor-vessel topological relationships, reducing ADE by 3.5% (3.12→3.01 mm) compared with non-anatomical encoding (Fig. 6). This addresses a key limitation of generic position encoding in transformers33, which fails to incorporate liver-segment anatomy and thus yields higher localization errors.
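The segment-calibrated encoding can be illustrated as follows; the per-segment scaling factors below are hypothetical placeholders, since the paper's calibration values are not reproduced in this section:

```python
import numpy as np

# Hypothetical scaling factors for Couinaud segments I-VIII; the paper's
# actual calibration values are not reproduced here.
SEGMENT_SCALE = {s: 1.0 + 0.1 * s for s in range(1, 9)}

def topo_code(pos, segment, d_model=16):
    """Sine-cosine positional code whose frequencies are scaled by the
    Couinaud segment the voxel belongs to (a sketch of the TopoCode idea:
    the same coordinate encodes differently in different liver segments)."""
    scale = SEGMENT_SCALE[segment]
    i = np.arange(d_model // 2)
    freq = scale / (10000.0 ** (2 * i / d_model))
    ang = pos * freq
    return np.concatenate([np.sin(ang), np.cos(ang)])

code = topo_code(pos=12.0, segment=4)   # 16-dim code for a segment-IV voxel
```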
Semi-supervised Optimization: The dual-path discriminator and pseudo-labeling achieved performance comparable to fully supervised training on 7,140 cases while using only 1,070 labeled cases. Training stability is shown in Fig. 7 (Wasserstein distance convergence), and Fig. 8 (ROC curves) confirms strong lesion-background discrimination (AUC = 0.9027).
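A common confidence-threshold scheme for pseudo-labeling is sketched below; the paper's actual selection criterion within the dual-path adversarial setup may differ, and the 0.9 threshold is a hypothetical choice:

```python
def select_pseudo_labels(unlabeled_preds, threshold=0.9):
    """Keep high-confidence predictions on unlabeled slices as pseudo-labels.
    `unlabeled_preds` maps slice id -> (label, confidence). The 0.9 cutoff
    is illustrative, not the paper's reported value."""
    return {sid: lab for sid, (lab, conf) in unlabeled_preds.items()
            if conf >= threshold}

preds = {"s1": ("lesion", 0.97), "s2": ("lesion", 0.55), "s3": ("background", 0.93)}
pseudo = select_pseudo_labels(preds)   # keeps s1 and s3; s2 is too uncertain
```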
This study has several limitations that guide future research. Single-center dataset: the data originated solely from Hainan Provincial People's Hospital (238 patients, 7,140 CT slices), with limitations in imaging equipment (exclusively GE Revolution CT), scanning parameters (fixed 5-mm slice thickness), and patient population (primarily from South China). These factors may reduce model performance on heterogeneous data from other centers. In future work, we will collaborate with three tertiary hospitals (including East and North China regions) to collect multicenter data (planned to include 500 patients, covering CT brands such as Philips Ingenuity and Canon Aquilion, with slice thicknesses of 2 to 5 mm). A "center-stratified cross-validation" strategy (dividing training/test sets by center) will be adopted to evaluate robustness under different imaging protocols and patient demographics. Concurrently, a "blind comparison experiment" with three radiologists, each with over 10 years of HCC diagnostic experience, will validate the model's adaptability within clinical workflows.
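The planned center-stratified strategy amounts to a leave-one-center-out split, which can be sketched as follows (center names and patient ids here are illustrative, not from the study):

```python
from collections import defaultdict

def center_stratified_folds(patient_centers):
    """Group patients by acquiring center so that each fold's test set comes
    from a center unseen during training (leave-one-center-out).
    `patient_centers` maps patient id -> center name."""
    by_center = defaultdict(list)
    for pid, center in patient_centers.items():
        by_center[center].append(pid)
    folds = []
    for held_out, test_ids in by_center.items():
        train_ids = [p for c, ids in by_center.items() if c != held_out for p in ids]
        folds.append({"test_center": held_out, "train": train_ids, "test": test_ids})
    return folds

folds = center_stratified_folds({"p1": "Hainan", "p2": "Hainan", "p3": "EastChina"})
```

Splitting by center rather than by slice also prevents the optimistic bias that arises when slices from one patient or scanner appear in both training and test sets.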
Conclusion
Early detection of hepatocellular carcinoma (HCC) is critical for improving patient survival; however, challenges including lesion heterogeneity, heavy annotation burdens, and insufficient anatomical topology modeling hinder the clinical translation of automated detection tools. To address these bottlenecks, this study proposes a dual-branch collaborative GAN (DB-Collab GAN) integrated with Anatomical Topology Coding (TopoCode), specifically tailored for early HCC detection in computed tomography (CT) images. Key innovations of the DB-Collab GAN include: (1) a dual-branch generator enabling “segmentation-guided detection” via cross-layer feature sharing (focusing on 128 × 128 × 32 shallow features); (2) a multi-scale CBAM module with GELU activation to balance the capture of micro-level details and the suppression of cirrhotic background noise; (3) Couinaud segment-calibrated TopoCode for explicit modeling of anatomical topology; and (4) semi-supervised adversarial optimization, which leverages 85% unlabeled data augmented with pseudo-labels. Validated on a dataset of 238 patients (7140 CT slices), the DB-Collab GAN achieved an F1-Score of 0.8848 (10.66% higher than that of Mask R-CNN) and an average distance error (ADE) of 3.01 mm. Ablation studies further confirmed the synergistic contributions of each core component. This study provides a reproducible, anatomy-constrained framework that enhances the accuracy of early HCC detection, reduces radiologists’ workload, and alleviates the scarcity of annotated data—thereby supporting broader applications in medical image analysis.
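For reference, the GELU activation used in the multi-scale CBAM module is x·Φ(x), where Φ is the standard normal CDF; its exact form can be written directly with the error function:

```python
import math

def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF.
    Unlike ReLU, it is smooth and passes a small signal for negative x,
    which helps preserve faint lesion responses."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))
```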
Future work will focus on three directions to boost clinical utility: first, expanding validation to multi-center datasets acquired with diverse scanners (e.g., Philips, Canon) to verify generalizability across different imaging protocols; second, extending the TopoCode framework to multimodal data (e.g., contrast-enhanced magnetic resonance imaging [MRI]) to leverage dynamic contrast phases for more accurate lesion characterization; and third, compressing the model via knowledge distillation to reduce computational demands, enabling deployment in resource-constrained clinical settings.
Source: PubMed Central (JATS). Licensing follows the original publisher's policy; please cite the original article when quoting.