Interpretable hybrid ensemble with attention-based fusion and EAOO-GA optimization for lung cancer detection.
APA
Al Duhayyim M, Aldawsari MA, et al. (2026). Interpretable hybrid ensemble with attention-based fusion and EAOO-GA optimization for lung cancer detection. Scientific Reports, 16(1). https://doi.org/10.1038/s41598-026-37187-6
MLA
Al Duhayyim M, et al. "Interpretable hybrid ensemble with attention-based fusion and EAOO-GA optimization for lung cancer detection." Scientific Reports, vol. 16, no. 1, 2026.
PMID
41776195
Abstract
Lung cancer’s high mortality rate underscores the critical need for early and accurate diagnosis, as late-stage diagnoses often lead to 5-year survival rates as low as 5% compared to 56% for early detection, imposing significant economic burdens on healthcare systems and diminishing patient quality of life. While deep learning models offer promising tools for analyzing Computed Tomography (CT) scans, they often suffer from limitations in generalizability, interpretability, and sensitivity to imbalanced data. This paper introduces SE-FusionEAOO Ensemble, a new robust framework for lung cancer classification. Our approach leverages the strengths of multiple deep learning architectures through a sophisticated two-stage process. First, we construct three powerful feature fusion models by strategically pairing diverse pre-trained networks (DenseNet201/EfficientNetB6, Inception v3/MobileNetV2, DenseNet121/ResNet50), each integrated with Squeeze-and-Excitation (SE) blocks for adaptive feature recalibration. Second, we amalgamate the predictions of these expert models using an intelligently weighted aggregation scheme. The key innovation of our framework is the deployment of a new metaheuristic, the Enhanced Animated Oat Optimization algorithm with Genetic Operators (EAOO-GA), to precisely optimize these ensemble weights, ensuring optimal contribution from each model. To address class imbalance in the IQ-OTH/NCCD lung cancer dataset, we employ the Synthetic Minority Over-sampling Technique (SMOTE), significantly improving the model’s sensitivity to minority classes. Extensive experimental results demonstrate that our framework achieves a state-of-the-art accuracy of 99.40%, with 99.2% precision, 99.5% recall, and 99.3% F1-score, outperforming individual models, conventional ensemble methods, and other metaheuristic optimizers. 
Additionally, the model was externally validated on the LIDC-IDRI dataset, achieving 97.9% accuracy and 97.8% F1-score, confirming its strong generalization capability across independent clinical domains. The proposed framework provides a highly accurate, reliable, and interpretable tool for automated lung cancer detection.
Introduction
Lung cancer remains one of the most formidable and pervasive oncological challenges worldwide, constituting a leading cause of cancer-related mortality and presenting a critical public health burden. Its significance is underscored not only by its high incidence and mortality rates but also by its profound impact on patient quality of life and healthcare infrastructure. A primary obstacle in mitigating this disease is the prevalence of late-stage diagnosis1. Early-stage lung cancer is often asymptomatic or presents with non-specific symptoms, leading to a substantial proportion of cases being detected only at advanced stages. This diagnostic delay severely constrains treatment efficacy and adversely affects survival outcomes. For instance, while the 5-year survival rate for early-stage detection can be as high as 56%, it plummets to approximately 5% for advanced-stage diagnoses2.
The ramifications of lung cancer extend beyond survival statistics, imposing a considerable economic burden on healthcare systems and exacting a heavy physical and emotional toll on patients and their families. Treatment modalities, including surgery, chemotherapy, and radiation therapy, are not only costly but also associated with significant morbidity, further diminishing quality of life3. Consequently, there is an urgent and pressing need for innovative diagnostic strategies that enable earlier, more accurate, and cost-effective detection of lung cancer.
From a clinical perspective, lung cancer is broadly categorized into two main histological types: small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC)4. This distinction is crucial for patient management, as each type requires a unique treatment strategy and prognostic assessment. Traditionally, the classification of lung cancer has relied on conventional wet-lab methods and the visual interpretation of medical imaging data, such as computed tomography (CT) scans. Radiologists assess features like tumor size, shape, texture, and location to determine the type and stage of cancer5. However, this process is inherently subjective, leading to significant inter-observer variability and potential diagnostic inconsistencies.
The integration of artificial intelligence (AI) and deep learning (DL) into radiology has emerged as a transformative approach to overcome these limitations. AI-driven computational methods offer a powerful solution for automating the analysis of medical images, enabling more objective, precise, and efficient lung cancer classification6. These automated systems are particularly vital for the early detection and characterization of pulmonary nodules, distinguishing between benign and malignant manifestations, which is a cornerstone for effective treatment planning and significantly improved patient outcomes7. Among these, Convolutional Neural Networks (CNNs) have demonstrated remarkable success, particularly in the domain of image processing and medical image analysis, establishing themselves as the cornerstone of modern computational radiology8. This paradigm shift is fundamentally transforming the field of medical image analysis. By leveraging large-scale datasets and sophisticated architectures, DL techniques can extract intricate features from medical images that often elude human perception. This capability has catalyzed substantial improvements in diagnostic accuracy, significantly accelerated analysis times, and reduced the potential for human error. Consequently, AI-based approaches now hold significant potential for the early diagnosis and classification of complex diseases, with lung cancer standing as a primary beneficiary9. The automated, precise analysis of medical imaging modalities, such as CT, Magnetic Resonance Imaging (MRI), and histopathology, is pivotal for expediting the diagnostic process and improving patient outcomes10.
Consequently, DL is positioned not merely as an assistive tool but as a foundational technology reshaping diagnostic paradigms. The success of CNNs extends beyond lung cancer, with significant achievements across diverse medical imaging modalities, including MRI, histopathology, and mammography. Their integration into Computer-Aided Diagnosis (CAD) systems has proven effective for various conditions, leading to more robust tools that assist clinicians in interpreting complex medical images. A pivotal technique enabling this progress, especially given the challenges of limited annotated medical data, is Transfer Learning (TL)11. TL mitigates data scarcity and intricate labeling requirements by adapting CNNs pre-trained on large-scale datasets, such as ImageNet, to specialized medical contexts. By transferring generalized feature extraction capabilities, learned from millions of natural images, to the highly specialized domain of thoracic CT analysis, TL bridges the domain gap, enhances the performance of models designed for lung cancer detection and classification, and allows them to achieve robust performance even with limited annotated lung CT scans.
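The transfer-learning recipe described above (a frozen, pre-trained feature extractor with a new task-specific head trained on the target data) can be illustrated with a toy numpy sketch. The random projection below is a hypothetical stand-in for an ImageNet backbone, and the synthetic data is not CT imagery; this is a conceptual sketch, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "frozen backbone": a fixed random projection followed by
# symmetric ReLU features (relu(z) and relu(-z) together can represent any
# linear function of the input, so a linear head can succeed on top).
d, h = 8, 16
W_backbone = rng.normal(size=(d, h))

def extract_features(x):
    z = x @ W_backbone                      # backbone weights are never updated
    return np.concatenate([np.maximum(z, 0), np.maximum(-z, 0)], axis=1)

# Tiny synthetic stand-in dataset: linearly separable binary labels.
n = 200
X = rng.normal(size=(n, d))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Transfer-learning step: train ONLY the new classification head.
F = extract_features(X)
w, b = np.zeros(F.shape[1]), 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(F @ w + b)))  # sigmoid head
    w -= 0.5 * F.T @ (p - y) / n            # logistic-loss gradient steps
    b -= 0.5 * (p - y).mean()

acc = ((F @ w + b > 0) == (y > 0.5)).mean()
print(f"head-only training accuracy: {acc:.2f}")
```

Only `w` and `b` are updated, mirroring how a pre-trained CNN's convolutional layers stay frozen while a new dense head adapts to the medical task.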
Metaheuristic algorithms offer robust and flexible optimization frameworks that can navigate complex search spaces, where traditional methods often falter. Their efficacy has been demonstrated across a vast array of disciplines, including medical image analysis12–14 and traffic signal control15. To ensure the scientific rigor and performance of our proposed framework, we employ a new proposed metaheuristic, the Enhanced Animated Oat Optimization algorithm with Genetic Operators (EAOO-GA), for critical optimization tasks. This advanced algorithm is utilized to fine-tune model parameters and, most importantly, to derive the optimal weighting scheme for our ensemble fusion, thereby maximizing diagnostic accuracy. Our framework’s performance is rigorously evaluated on a benchmark lung cancer dataset and compared against state-of-the-art methods, demonstrating its significant potential to advance the field of automated medical diagnostics.
In this paper, we introduce an innovative hybrid ensemble framework, the EAOO-Optimized Ensemble, specifically implemented for the accurate and robust detection of lung cancer from CT scans. Moving beyond conventional ensembles that aggregate single models, we leverage the superior representational power of feature fusion architectures. We meticulously designed three distinct fusion pairs: (1) DenseNet201 + EfficientNetB6, (2) Inception v3 + MobileNetV2, and (3) DenseNet121 + ResNet50, selected from a rigorous evaluation of eight top-performing pre-trained models to ensure maximum architectural diversity and complementary feature extraction. Each fusion is further enhanced with Squeeze-and-Excitation (SE) blocks to adaptively recalibrate features and emphasize the most discriminative patterns indicative of malignancy. The key innovation is the Enhanced Animated Oat Optimization algorithm with Genetic Operators (EAOO-GA), a new metaheuristic that intelligently aggregates predictions from these fusion models. This algorithm performs a global search to fine-tune and derive the optimal weighting scheme for the ensemble, ensuring that the most accurate and informative models contribute most significantly to the final diagnosis. This two-stage approach first constructs powerful SE-enhanced fusion base learners and then optimizes their aggregation with EAOO-GA, ensuring superior performance, enhanced robustness, and reliable diagnostic capabilities, directly addressing the critical need for accuracy in medical applications where errors can have severe consequences.
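As a rough illustration of the weighted aggregation this paragraph describes, the sketch below fuses the softmax outputs of three simulated "experts" and tunes the ensemble weights with a simple mutation-based random search. This is a hedged stand-in for EAOO-GA (whose actual operators are defined later in the paper), using fabricated predictions rather than real model outputs:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated softmax outputs of three base learners over 300 samples and
# 3 classes (illustrative of benign / malignant / normal in IQ-OTH/NCCD).
n, k = 300, 3
y = rng.integers(0, k, size=n)

def fake_expert(noise):
    """Stand-in 'fusion model': one-hot truth corrupted by Gaussian noise."""
    logits = np.eye(k)[y] + noise * rng.normal(size=(n, k))
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

preds = [fake_expert(s) for s in (0.8, 1.2, 2.0)]   # experts of varying quality

def ensemble_acc(w):
    w = np.abs(w) / np.abs(w).sum()                 # project onto the simplex
    fused = sum(wi * p for wi, p in zip(w, preds))  # weighted soft voting
    return (fused.argmax(axis=1) == y).mean()

# Mutation-based random search as a stand-in for EAOO-GA: perturb the best
# weight vector and keep improvements (the real algorithm is richer).
best_w = np.ones(len(preds)) / len(preds)
best_acc = ensemble_acc(best_w)
for _ in range(200):
    cand = np.abs(best_w + 0.2 * rng.normal(size=len(preds)))
    acc = ensemble_acc(cand)
    if acc > best_acc:
        best_w, best_acc = cand, acc

print("equal-weight accuracy:", ensemble_acc(np.ones(3) / 3))
print("optimized accuracy   :", best_acc)
```

The search typically shifts weight toward the lower-noise experts, which is the same intuition behind letting EAOO-GA decide how much each fusion model contributes to the final diagnosis.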
To ensure a rigorous and comprehensive evaluation, the proposed hybrid model is validated using two public benchmark datasets: the Chest CT-Scan dataset and the IQ-OTH/NCCD lung cancer dataset. These datasets provide comprehensive annotations of both cancerous and non-cancerous nodules, enabling a robust assessment of the model’s diagnostic capabilities. A crucial pre-processing pipeline was employed to enhance image quality, standardize inputs, and augment the data, thereby ensuring robust detection accuracy and improving model generalization. Furthermore, to directly combat the pervasive issue of class imbalance common in medical data, the Synthetic Minority Over-sampling Technique (SMOTE) was applied. This critical step ensures the model is not biased toward the majority class and enhances its sensitivity towards detecting less frequent but critical malignant conditions. Finally, to visually interpret the decision-making process of our ensemble framework and understand how it focuses on identifying malignant regions within lung CT scans, we employ Gradient-weighted Class Activation Mapping (Grad-CAM) to generate insightful heatmaps. These heatmaps highlight the critical image regions that most significantly influenced the model’s prediction, providing a layer of transparency.
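A minimal sketch of the SMOTE step mentioned above, assuming plain Euclidean nearest neighbors in feature space (library implementations such as imbalanced-learn add refinements not shown here; the toy data is illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def smote(X_min, n_new, k=5):
    """Minimal SMOTE: synthesize points by interpolating each sampled
    minority example with one of its k nearest minority neighbors."""
    n = len(X_min)
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                    # exclude self-matches
    nn = np.argsort(d, axis=1)[:, :k]              # k nearest neighbors
    idx = rng.integers(0, n, size=n_new)           # seed points
    nbr = nn[idx, rng.integers(0, k, size=n_new)]  # one random neighbor each
    gap = rng.random((n_new, 1))                   # interpolation factor in [0, 1)
    return X_min[idx] + gap * (X_min[nbr] - X_min[idx])

# Toy imbalanced setting: 12 minority samples to be grown to 100.
X_minority = rng.normal(loc=3.0, size=(12, 4))
synthetic = smote(X_minority, n_new=88)
print(synthetic.shape)  # (88, 4)
```

Because each synthetic point lies on a segment between two real minority samples, the oversampled class occupies plausible regions of feature space rather than duplicating existing points.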
Problem statement
The application of deep learning to lung cancer detection, while promising, is fundamentally constrained by three persistent challenges: (1) limited model generalizability due to overfitting on training data specifics, (2) high sensitivity to data variability across imaging modalities and acquisition protocols, and (3) a lack of model interpretability, which hinders clinical trust and adoption. While ensemble methods have been employed to improve robustness, they often introduce complexity that exacerbates overfitting and remain a black box. Consequently, there is a critical need for a sophisticated framework that not only enhances accuracy and early detection performance but also explicitly addresses these limitations of generalization, variability, and interpretability through a principled and optimized approach. The proposed SE-FusionEAOO Ensemble directly tackles these issues by: (i) fusing diverse pre-trained models with Squeeze-and-Excitation (SE) blocks for adaptive feature recalibration and resilience to variability, (ii) employing the Enhanced Animated Oat Optimization algorithm with Genetic Operators (EAOO-GA) for precise ensemble weighting to mitigate overfitting and enhance generalization, and (iii) incorporating SMOTE for improved sensitivity to imbalanced classes alongside Grad-CAM for visual interpretability.
Research motivation
While deep learning has achieved state-of-the-art performance in lung cancer classification, a nuanced yet critical challenge often remains unaddressed: memory overfitting. This phenomenon occurs when powerful models, particularly complex ensembles, merely memorize the statistical noise and specific details of the training data, rather than learning to generalize the underlying diagnostic features. Common regularization techniques, such as dropout and data augmentation, offer partial solutions but are often insufficient to mitigate this subtle form of overfitting, particularly in high-dimensional medical imaging data, as seen in prior CNN and ensemble approaches (e.g., high complexity with limited cross-dataset reliability in16,17). Additionally, many methods exhibit sensitivity to data variability (e.g., differences in CT acquisition protocols) and class imbalance prevalent in datasets like IQ-OTH/NCCD, leading to biased predictions and reduced sensitivity for malignant cases6. Interpretability remains a significant gap, with most ensembles functioning as “black boxes” despite their strong performance18. This gap highlights the urgent need for innovative architectural designs and optimization strategies specifically tailored to counteract memory overfitting, enhance feature robustness, and ultimately ensure that models perform reliably on unseen clinical data, where diagnostic accuracy is crucial. Our framework addresses these challenges by integrating attention-based fusion (SE blocks) for focused, recalibrated features; metaheuristic-driven weighting (EAOO-GA) for optimal aggregation beyond uniform ensembles; SMOTE for balanced training; and Grad-CAM for visual explanations, thereby achieving superior generalization, robustness, and transparency compared to baselines.
Contribution
The following is a summary of this work’s main contributions, which advance beyond existing methods by incorporating attention-based fusion and metaheuristic optimization for enhanced interpretability and generalization:

Development of the SE-FusionEAOO Ensemble framework: A novel hybrid architecture that integrates SE blocks into strategic feature fusion pairs of diverse pre-trained networks (DenseNet201/EfficientNetB6, Inception v3/MobileNetV2, DenseNet121/ResNet50). This design enhances feature representation, improves resilience to data variability across imaging protocols, and emphasizes discriminative patterns critical for malignancy detection, outperforming single-model and conventional fusion approaches through adaptive channel-wise recalibration.
Introduction of an advanced metaheuristic-driven aggregation strategy: Utilization of the novel Enhanced Animated Oat Optimization algorithm with Genetic Operators (EAOO-GA) to optimally determine ensemble weights. This global optimization extends beyond standard methods, precisely balancing contributions from each fusion model to significantly reduce overfitting and boost generalization on unseen data.
An inherently interpretable diagnostic framework that provides transparency through two mechanisms: SE-based channel-wise feature recalibration and EAOO-optimized weight assignment.
Comprehensive empirical validation on the IQ-OTH/NCCD lung cancer dataset demonstrating state-of-the-art performance. The proposed framework achieves superior accuracy (99.40%), robustness to data variability, and reduced overfitting compared to individual models, conventional ensemble methods, and other metaheuristic optimization approaches.
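The SE-based channel-wise recalibration referenced in these contributions can be sketched in a minimal numpy form. The reduction ratio r = 8 and the layer sizes below are illustrative assumptions, not the paper's configuration, and the random weights stand in for learned parameters:

```python
import numpy as np

rng = np.random.default_rng(3)

def se_block(fmap, W1, W2):
    """Squeeze-and-Excitation on an (H, W, C) feature map.
    Squeeze: global average pool per channel. Excite: a two-layer bottleneck
    (ReLU then sigmoid) produces per-channel gates in (0, 1). Scale: the
    gates reweight channels, emphasizing informative ones."""
    z = fmap.mean(axis=(0, 1))               # squeeze -> (C,)
    s = np.maximum(z @ W1, 0.0)              # bottleneck, ReLU
    g = 1.0 / (1.0 + np.exp(-(s @ W2)))      # sigmoid gates, one per channel
    return fmap * g                          # channel-wise recalibration

C, r = 32, 8                                 # r: assumed reduction ratio
W1 = rng.normal(size=(C, C // r)) * 0.1      # random stand-ins for learned weights
W2 = rng.normal(size=(C // r, C)) * 0.1
fmap = rng.normal(size=(7, 7, C))            # e.g. a late-stage feature map
out = se_block(fmap, W1, W2)
print(out.shape)  # (7, 7, 32)
```

Since every gate lies in (0, 1), the block can only attenuate channels relative to one another; in the fusion models this lets the network down-weight channels that carry little malignancy-relevant signal.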
Paper organization
The organization of this paper is as follows: Section 2 offers a brief review of recent studies related to lung cancer detection and classification. Section 3 provides an overview of the dataset description, architectural components, fusion model architecture, the proposed enhanced optimization algorithm, and the proposed EAOO-GA-Optimized Ensemble framework for lung cancer classification. Experimental findings and corresponding analyses are presented in Section 4. Section 5 presents the advantages, limitations, and future research directions. Finally, Section 6 outlines the concluding remarks.
Lung cancer remains one of the most formidable and pervasive oncological challenges worldwide, constituting a leading cause of cancer-related mortality and presenting a critical public health burden. Its significance is underscored not only by its high incidence and mortality rates but also by its profound impact on patient quality of life and healthcare infrastructure. A primary obstacle in mitigating this disease is the prevalence of late-stage diagnosis1. Early-stage lung cancer is often asymptomatic or presents with non-specific symptoms, leading to a substantial proportion of cases being detected only at advanced stages. This diagnostic delay severely constrains treatment efficacy and adversely affects survival outcomes. For instance, while the 5-year survival rate for early-stage detection can be as high as 56%, it plummets to approximately 5% for advanced-stage diagnoses2.
The ramifications of lung cancer extend beyond survival statistics, imposing a considerable economic burden on healthcare systems and exacting a heavy physical and emotional toll on patients and their families. Treatment modalities, including surgery, chemotherapy, and radiation therapy, are not only costly but also associated with significant morbidity, further diminishing quality of life3. Consequently, there is an urgent and pressing need for innovative diagnostic strategies that enable earlier, more accurate, and cost-effective detection of lung cancer.
From a clinical perspective, lung cancer is broadly categorized into two main histological types: small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC)4. This distinction is crucial for patient management, as each type requires a unique treatment strategy and prognostic assessment. Traditionally, the classification of lung cancer has relied on conventional wet-lab methods and the visual interpretation of medical imaging data, such as computed tomography (CT) scans. Radiologists assess features like tumor size, shape, texture, and location to determine the type and stage of cancer5. However, this process is inherently subjective, leading to significant inter-observer variability and potential diagnostic inconsistencies.
The integration of artificial intelligence (AI) and deep learning (DL) into radiology has emerged as a transformative approach to overcome these limitations. AI-driven computational methods offer a powerful solution for automating the analysis of medical images, enabling more objective, precise, and efficient lung cancer classification6. These automated systems are particularly vital for the early detection and characterization of pulmonary nodules, distinguishing between benign and malignant manifestations, which is a cornerstone for effective treatment planning and significantly improved patient outcomes7. Among these, Convolutional Neural Networks (CNNs) have demonstrated remarkable success, particularly in the domain of image processing and medical image analysis, establishing themselves as the cornerstone of modern computational radiology8. This paradigm shift is fundamentally transforming the field of medical image analysis. By leveraging large-scale datasets and sophisticated architectures, DL techniques can extract intricate features from medical images that often elude human perception. This capability has catalyzed substantial improvements in diagnostic accuracy, significantly accelerated analysis times, and reduced the potential for human error. Consequently, AI-based approaches now hold significant potential for the early diagnosis and classification of complex diseases, with lung cancer standing as a primary beneficiary9. The automated, precise analysis of medical imaging modalities, such as CT, Magnetic Resonance Imaging (MRI), and histopathology, is pivotal for expediting the diagnostic process and improving patient outcomes10.
Consequently, DL is positioned not merely as an assistive tool but as a foundational technology reshaping diagnostic paradigms. The success of CNNs extends beyond lung cancer, with significant achievements across diverse medical imaging modalities, including MRI, histopathology, and mammography. Their integration into Computer-Aided Diagnosis (CAD) systems has proven effective for various conditions, leading to more robust tools that assist clinicians in interpreting complex medical images. A pivotal technique enabling this progress, especially given the challenges of limited annotated medical data, is Transfer Learning (TL)11. TL mitigates data scarcity and intricate labeling requirements by adapting CNNs pre-trained on large-scale datasets, such as ImageNet, to specialized medical contexts. This process of knowledge transfer from general to specific domains has been instrumental in enhancing model performance and facilitating the development of accurate, efficient, and scalable diagnostic tools for various types of cancer. By leveraging CNNs pre-trained on large-scale natural image datasets such as ImageNet, transfer learning effectively bridges the domain gap and significantly enhances the performance of models designed for lung cancer detection and classification. This technique facilitates the transfer of generalized feature extraction capabilities, learned from millions of natural images, to the highly specialized domain of thoracic CT analysis. This knowledge transfer is crucial for developing accurate and efficient diagnostic tools, as it mitigates the data scarcity problem inherent in medical imaging and allows models to achieve robust performance even with limited annotated lung CT scans.
Metaheuristic algorithms offer robust and flexible optimization frameworks that can navigate complex search spaces, where traditional methods often falter. Their efficacy has been demonstrated across a vast array of disciplines, including medical image analysis12–14 and traffic signal control15. To ensure the scientific rigor and performance of our proposed framework, we employ a new proposed metaheuristic, the Enhanced Animated Oat Optimization algorithm with Genetic Operators (EAOO-GA), for critical optimization tasks. This advanced algorithm is utilized to fine-tune model parameters and, most importantly, to derive the optimal weighting scheme for our ensemble fusion, thereby maximizing diagnostic accuracy. Our framework’s performance is rigorously evaluated on a benchmark lung cancer dataset and compared against state-of-the-art methods, demonstrating its significant potential to advance the field of automated medical diagnostics.
In this paper, we introduce an innovative hybrid ensemble framework, the EAOO-Optimized Ensemble, specifically implemented for the accurate and robust detection of lung cancer from CT scans. Moving beyond conventional ensembles that aggregate single models, we leverage the superior representational power of feature fusion architectures. We meticulously designed three distinct fusion pairs: (1) DenseNet201 + EfficientNetB6, (2) Inception v3 + MobileNetV2, and (3) DenseNet121 + ResNet50, selected from a rigorous evaluation of eight top-performing pre-trained models to ensure maximum architectural diversity and complementary feature extraction. Each fusion is further enhanced with Squeeze-and-Excitation (SE) blocks to adaptively recalibrate features and emphasize the most discriminative patterns indicative of malignancy. The key innovation is a cost-effective method that utilizes heat from the breast surface to detect the Enhanced Animated Oat Optimization algorithm with Genetic Operators (EAOO-GA), which intelligently aggregates predictions from these fusion models. This algorithm performs a global search to fine-tune and derive the optimal weighting scheme for the ensemble, ensuring that the most accurate and informative models contribute most significantly to the final diagnosis. This two-stage approach constructs powerful SE, enhanced fusion base learners, and then optimizes their aggregation with EAOO-GA, ensuring superior performance, enhanced robustness, and reliable diagnostic capabilities, directly addressing the critical need for accuracy in medical applications where errors can have severe consequences.
To ensure a rigorous and comprehensive evaluation, the proposed hybrid model is validated using two public benchmark datasets: the Chest CT-Scan dataset and the IQ-OTH/NCCD lung cancer dataset. These datasets provide comprehensive annotations of both cancerous and non-cancerous nodules, enabling a robust assessment of the model’s diagnostic capabilities. A crucial pre-processing pipeline was employed to enhance image quality, standardize inputs, and augment the data, thereby ensuring robust detection accuracy and improving model generalization. Furthermore, to directly combat the pervasive issue of class imbalance common in medical data, the Synthetic Minority Over-sampling Technique (SMOTE) was applied. This critical step ensures the model is not biased toward the majority class and enhances its sensitivity towards detecting less frequent but critical malignant conditions. Furthermore, to visually interpret the decision-making process of our ensemble framework and understand how it focuses on identifying malignant regions within lung CT scans, we employ Gradient-weighted Class Activation Mapping (Grad-CAM) to generate insightful heatmaps. These heatmaps highlight the critical image regions that most significantly influenced the model’s prediction, providing a layer of transparency.
Problem statement
The application of deep learning to lung cancer detection, while promising, is fundamentally constrained by three persistent challenges: (1) limited model generalizability due to overfitting on training data specifics, (2) high sensitivity to data variability across imaging modalities and acquisition protocols, and (3) a lack of model interpretability, which hinders clinical trust and adoption. While ensemble methods have been employed to improve robustness, they often introduce complexity that exacerbates overfitting and remain a black box. Consequently, there is a critical need for a sophisticated framework that not only enhances accuracy and early detection performance but also explicitly addresses these limitations of generalization, variability, and interpretability through a principled and optimized approach. The proposed SE-FusionEAOO Ensemble directly tackles these issues by: (i) fusing diverse pre-trained models with Squeeze-and-Excitation (SE) blocks for adaptive feature recalibration and resilience to variability, (ii) employing the Enhanced Animated Oat Optimization algorithm with Genetic Operators (EAOO-GA) for precise ensemble weighting to mitigate overfitting and enhance generalization, and (iii) incorporating SMOTE for improved sensitivity to imbalanced classes alongside Grad-CAM for visual interpretability.
Research motivation
While deep learning has achieved state-of-the-art performance in lung cancer classification, a nuanced yet critical challenge often remains unaddressed: memory overfitting. This phenomenon occurs when powerful models, particularly complex ensembles, merely memorize the statistical noise and specific details of the training data, rather than learning to generalize the underlying diagnostic features. Common regularization techniques, such as dropout and data augmentation, offer partial solutions but are often insufficient to mitigate this subtle form of overfitting, particularly in high-dimensional medical imaging data, as seen in prior CNN and ensemble approaches (e.g., high complexity with limited cross-dataset reliability in16,17). Additionally, many methods exhibit sensitivity to data variability (e.g., differences in CT acquisition protocols) and class imbalance prevalent in datasets like IQ-OTH/NCCD, leading to biased predictions and reduced sensitivity for malignant cases6. Interpretability remains a significant gap, with most ensembles functioning as “black boxes” despite their strong performance18. This gap highlights the urgent need for innovative architectural designs and optimization strategies specifically tailored to counteract memory overfitting, enhance feature robustness, and ultimately ensure that models perform reliably on unseen clinical data, where diagnostic accuracy is crucial. Our framework addresses these challenges by integrating attention-based fusion (SE blocks) for focused, recalibrated features; metaheuristic-driven weighting (EAOO-GA) for optimal aggregation beyond uniform ensembles; SMOTE for balanced training; and Grad-CAM for visual explanations, thereby achieving superior generalization, robustness, and transparency compared to baselines.
Contribution
The following is a summary of this work’s main contributions, which advance beyond existing methods by incorporating attention-based fusion and metaheuristic optimization for enhanced interpretability and generalization:Development of the SE-FusionEAOO Ensemble framework: A novel hybrid architecture that integrates SE blocks into strategic feature fusion pairs of diverse pre-trained networks (DenseNet201/EfficientNetB6, Inception v3/MobileNetV2, DenseNet121/ResNet50). This design enhances feature representation, improves resilience to data variability across imaging protocols, and emphasizes discriminative patterns critical for malignancy detection, outperforming single-model and conventional fusion approaches by adaptive channel-wise recalibration.
Introduction of an advanced metaheuristic-driven aggregation strategy: Utilization of the novel Enhanced Animated Oat Optimization algorithm with Genetic Operators (EAOO-GA) to optimally determine ensemble weights. This global optimization extends beyond standard methods, precisely balancing contributions from each fusion model to significantly reduce overfitting and boost generalization on unseen data.
An inherently interpretable diagnostic framework that provides transparency through two mechanisms: SE-based channel-wise feature recalibration and EAOO-optimized weight assignment.
Comprehensive empirical validation on the IQ-OTH/NCCD lung cancer dataset demonstrating state-of-the-art performance. The proposed framework achieves superior accuracy (99.40%), robustness to data variability, and reduced overfitting compared to individual models, conventional ensemble methods, and other metaheuristic optimization approaches.
Paper organization
The organization of this paper is as follows: Section 2 offers a brief review of recent studies related to lung cancer detection and classification. Section 3 provides an overview of the dataset description, architectural components, fusion model architecture, the proposed enhanced optimization algorithm, and the proposed EAOO-GA-optimized ensemble framework for lung cancer classification. Experimental findings and corresponding analyses are presented in Section 4. Section 5 presents the advantages, limitations, and future research directions. Finally, Section 6 outlines the concluding remarks and potential directions for future research.
Literature review
Medical image analysis for pulmonary oncology has emerged as a critical research domain, driving innovation in computational diagnostics. This section surveys contemporary methodologies developed to augment the precision and robustness of automated lung cancer detection systems, critically examining their technical foundations, strengths, and inherent constraints. We organize the related studies into subsections focusing on deep learning models, ensemble and fusion methods, and metaheuristic optimization techniques, before discussing persistent limitations and gaps.
Deep learning models for lung cancer detection
Mohamed et al.19 proposed a hybrid model that combines CNNs with the Ebola Optimization Search Algorithm (EOSA) to enhance lung cancer classification in CT images. The approach aimed to optimize CNN weights and biases using EOSA, addressing challenges in CNN configuration. Evaluated on the IQ-OTH/NCCD dataset, the EOSA-CNN model achieved an accuracy of 93.21% and demonstrated superior performance compared to other metaheuristic-CNN methods, particularly in classifying normal and malignant cases. In addition, Eram et al.20 introduced an improved DenseNet201 model that combines transfer learning with explainable artificial intelligence to classify various lung diseases from X-ray images. Its performance was evaluated against other transfer learning models, including EfficientNetB0, InceptionV3, and LeNet, using standard evaluation metrics. Imran et al.21 proposed a transformer-based hierarchical model for non-small cell lung cancer (NSCLC) detection and classification from histopathological images. By integrating CNNs for local feature extraction and vision transformers (ViTs) for capturing long-range dependencies, the model classifies NSCLC into normal, adenocarcinoma, and squamous cell carcinoma categories. Evaluated on the LC25000 dataset, it achieved an accuracy of 98.8%, outperforming state-of-the-art methods in precision and recall. Sewatkar22 introduced the MeVs-deep CNN, an optimized deep learning model for efficient lung cancer classification using PET/CT images. The approach employs Memory-Enabled Vulture Search Optimization to segment and classify data after preprocessing with Non-Local Means filtering, demonstrating improved autonomy and accuracy in categorization.
Furthermore, Elkenawy et al.23 developed a hybrid framework combining the Greylag Goose Optimization (bGGO) algorithm with a multilayer perceptron for lung cancer classification. By enhancing feature selection and applying comprehensive preprocessing, their method outperformed several binary optimization algorithms and achieved 98.4% accuracy, with results validated through statistical tests and performance analysis. Zhang et al.1 introduced a DenseNet-based CNN framework enhanced with data fusion and mobile edge computing for lung cancer classification. The approach leverages preprocessing and multi-source data integration to improve accuracy, while edge computing enables real-time CT scan analysis. The model classifies lung tissue into Normal, Benign, or Malignant, with malignant cases further categorized into adenocarcinoma, squamous cell carcinoma, and large cell carcinoma. Shafi and Chinnappan24 presented a hybrid transformer-CNN and LSTM model for lung disease segmentation and classification. The approach includes preprocessing with median filtering, segmentation using an improved Transformer-based CNN (ITCNN), feature extraction (e.g., texture via modified LGIP), and classification with a hybrid LinkNet-Modified LSTM (L-MLSTM), outperforming existing models in accuracy. Lakshminarasimha et al.25 developed an optimization-driven deep learning approach for enhancing lung cancer diagnosis from CT images. The model integrates CBAM with EfficientNet for feature extraction and applies optimization algorithms like Gray Wolf Optimization (GWO) for hyperparameter tuning, achieving accuracies of 99.81% and 99.25% on the Lung-PET-CT-Dx and LIDC-IDRI datasets, respectively.
Ensemble and fusion methods
Moreover, Shariff et al.26 introduced a CNN framework enhanced with Differential Augmentation (DA) to mitigate memory overfitting and improve model generalization on unseen lung cancer data. The DA strategy involved applying targeted augmentations, such as hue, brightness, saturation, and contrast adjustments, to increase data diversity and robustness. The model was evaluated on multiple datasets, including IQ-OTH/NCCD, and optimized using Random Search for hyperparameter tuning. The CNN and DA model achieved a classification accuracy of 98.78%, surpassing several state-of-the-art architectures, including DenseNet, ResNet, EfficientNetB0, and ensemble-based approaches. Statistical validation, as confirmed by Tukey’s HSD post-hoc test, demonstrated the significance of the model’s superior performance, highlighting its effectiveness in overcoming overfitting and enhancing cross-dataset reliability.
Pamungkas et al.27 proposed a hybrid framework that combines VGG-based CNNs with GAN-driven data augmentation to enhance lung cancer classification. By generating realistic synthetic images for underrepresented classes, their method mitigates class imbalance and enhances model generalization. Validated on the IQ-OTH/NCCD and Lung Cancer CT Scan datasets, the VGG-GAN approach achieved improved performance across both binary and multi-class classifications. In addition, Kumaran et al.7 presented an explainable deep learning framework that integrates three pre-trained models: VGG16, ResNet50, and InceptionV3 within a unified ensemble to enhance lung cancer diagnosis from medical images. The approach standardizes input images through resizing and format conversion to maintain consistency across datasets and maximize model performance. Durgam et al.28 proposed the Cancer Nexus Synergy (CanNS) framework for enhancing lung cancer detection through integrated deep learning and transformer models. It combines Swin-Transformer UNet (SwiNet) for segmentation, Xception-LSTM GAN (XLG) CancerNet for classification, and Devilish Levy Optimization (DevLO) for parameter fine-tuning, achieving superior accuracy, sensitivity, and specificity compared to prior methods. Gandhi et al.29 introduced a Hybrid Attention Vision Transformer (HViT) for enhanced lung cancer detection. The model leverages attention mechanisms to capture complex features in multi-modal clinical images, improving accuracy in early-stage detection and generalizing across diverse datasets via transfer learning, though with potential high computational costs.
Metaheuristic optimization methods
Malik et al.30 developed a model for optimizing chemotherapeutic targets in non-small cell lung cancer using transfer learning for precision medicine. It employs a hybrid UNet transformer for feature extraction, modified Rime optimization (MRO) for dimensionality reduction, and a deep transfer learning (DTransL) model, achieving accuracies up to 98.398% on benchmark datasets like Davis, KIBA, and Binding-DB.
Moreover, some studies have explored multi-objective metaheuristic optimization and ensemble strategies across diverse classification domains. For instance, Dhal et al. proposed multi-stage and zone-oriented multi-objective frameworks for multi-label feature selection and classification31,32. They also introduced multi-objective deep learning systems for clinical disease prediction and histopathological cancer diagnosis33,34. While these works successfully demonstrate the utility of multi-objective optimization in handling complex feature interactions, our approach differs fundamentally by employing the Enhanced Animated Oat Optimization with Genetic Operators (EAOO-GA) to adaptively optimize ensemble weights in a single-label medical imaging context.
Recently, several advanced hybrid metaheuristic algorithms have been introduced to improve optimization performance in both engineering and medical domains. Mahapatra et al. proposed the Fast-Flying PSO (FF-PSO)35 and Quantized Orthogonal Experimentation SSA (QOX-SSA)36, which enhance swarm intelligence and salp swarm optimization by incorporating quantization and orthogonal experimentation strategies to balance global and local search. Their subsequent Adaptive Dimensional Search SSA (ADOX-SSA)37 further improved convergence through adaptive search dimension control. Similarly, Agrawal et al. introduced Local Search SSA-driven Deep CNN for brain tumor analysis38, demonstrating the utility of hybrid swarm optimizers for deep network fine-tuning, and later proposed the Quantum-Inspired Adaptive Mutation PSO (QAMO-PSO)39 for robust global search and parameter adaptation. In contrast, our proposed EAOO-GA framework employs a novel bio-inspired Animated Oat Optimization enhanced with genetic operators to adaptively optimize ensemble weights for lung cancer CT image classification. While conceptually related to these hybrid metaheuristics, EAOO-GA uniquely integrates evolutionary and biologically inspired dynamics within an ensemble deep learning context, ensuring both diagnostic interpretability and computational efficiency.
Limitations and research gaps
Moreover, a comprehensive review of the current literature reveals significant advancements in deep learning for lung cancer diagnosis, primarily through the use of sophisticated CNNs, ensemble methods, and transfer learning. However, as critically summarized in Table 1, these approaches consistently encounter persistent limitations that hinder their clinical deployment. Key among these are a prevalent trade-off between model complexity and interpretability, a high computational burden, especially for ensemble techniques, and challenges in generalizing to diverse, real-world data. Many models also remain vulnerable to data-specific biases and often lack transparency in their decision-making processes. It is these identified gaps, particularly the need for a robust yet interpretable model that generalizes effectively without prohibitive computational cost, that motivate the proposed framework in this study.
Methodology
This study presents the SE-FusionEAOO Ensemble, a multi-stage hybrid framework developed to achieve state-of-the-art accuracy, robustness, and interpretability in lung cancer detection from CT scans. The overall system architecture is depicted in Fig. 1. The proposed methodology proceeds through the following structured stages: (1) Dataset preprocessing, augmentation, and class imbalance mitigation using SMOTE (Section 3.1); (2) Incorporation of Squeeze-and-Excitation (SE) blocks for adaptive channel recalibration (Section 3.2); (3) Rigorous evaluation and selection of six top-performing pre-trained models from eight candidates via transfer learning (Section 3.3); (4) Development and deployment of the proposed Enhanced Animated Oat Optimization with Genetic Operators (EAOO-GA) for optimal ensemble weight determination (Section 3.4, Algorithm 1); (5) Construction of three SE-enhanced fusion architectures by strategically pairing the selected models: DenseNet201 + EfficientNetB6, InceptionV3 + MobileNetV2, and DenseNet121 + ResNet50, each enhanced with SE modules (Section 3.5, Figs. 5 and 6); (6) Weighted aggregation of predictions from the three fusion models using EAOO-GA-optimized weights to produce the final classification output (Section 3.6, Algorithm 2); and (7) Integration of Grad-CAM for model interpretability and visualization of decision-relevant regions (Section 3.6.2).
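Stage (6), the weighted aggregation, can be illustrated with a small generic sketch. The EAOO-GA optimizer itself is the paper's contribution and is not reproduced here; the weight vector and the three probability matrices below are placeholder values standing in for an optimizer's output and the fusion models' predictions.

```python
import numpy as np

def weighted_ensemble(probs, weights) -> np.ndarray:
    """Fuse per-model class probabilities (each of shape (N, C)) using
    optimizer-supplied weights, normalized to sum to 1; return the
    final predicted class for each sample."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                  # enforce convex combination
    fused = np.einsum("m,mnc->nc", w, np.stack(probs))
    return fused.argmax(axis=1)

# Three hypothetical fusion-model outputs for 4 samples and 3 classes
p1 = np.array([[0.7, 0.2, 0.1]] * 4)
p2 = np.array([[0.1, 0.8, 0.1]] * 4)
p3 = np.array([[0.2, 0.2, 0.6]] * 4)
# Placeholder weights, as a metaheuristic such as EAOO-GA might return them
labels = weighted_ensemble([p1, p2, p3], [0.6, 0.25, 0.15])
```

With these placeholder weights the first model dominates, so every sample is assigned its top class; in the full framework the weights are instead tuned by EAOO-GA against a validation objective.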
Dataset description and preprocessing
Dataset description
This paper employs the publicly available IQ-OTH/NCCD lung cancer dataset54, a well-established benchmark for the development of CAD systems. The dataset comprises CT scan images categorized into three critical classes: Benign, Malignant, and Normal, as shown in Fig. 2. A significant challenge posed by this dataset is the inherent heterogeneity in image dimensions. A detailed breakdown of the image size distribution per class, provided in Table 2, reveals this complexity. While the majority of images share a single common resolution and form a homogeneous subset, there are notable exceptions that necessitate meticulous preprocessing. Specifically, the Malignant class contains images at several non-standard resolutions, and the Normal class also contains a single outlier of a different size. This precise understanding of the data structure is crucial for designing tailored preprocessing steps to ensure spatial uniformity for model input. Resizing and standardization strategies must account for these variations to prevent the loss of critical diagnostic information or the introduction of distortions.
The overall class distribution is also a key consideration. As shown in Table 2, the Malignant class is the most populous, followed by Normal and then Benign cases. This imbalance is a common characteristic of medical imaging datasets and must be addressed during model training and evaluation to prevent algorithmic bias towards the majority class.
The IQ-OTH/NCCD dataset, while a valuable resource, is subject to potential biases common in medical imaging datasets. These biases must be acknowledged as they impact the generalizability of trained models. A primary concern is the limited documentation regarding patient demographics (e.g., age, gender, ethnicity) and the variety of imaging equipment used. A homogeneity in these factors could limit model performance when applied to broader, more diverse populations or data from different scanners. Furthermore, as detailed in Table 2, the dataset exhibits a pronounced class imbalance, characterized by a substantial overrepresentation of Malignant instances relative to Benign cases. This disparity introduces a significant risk of algorithmic bias, predisposing the model to overfit the prevalent majority class and consequently impairing its predictive accuracy for the critically important minority class, a scenario that is untenable in medical diagnostics, where equitable performance across all pathologies is mandatory. To counteract this bias and fortify model generalizability, we implemented a dual strategy: comprehensive data augmentation to increase phenotypic diversity, and the Synthetic Minority Over-sampling Technique (SMOTE) to strategically oversample the Benign class. Furthermore, model validation was conducted on external cohorts to ensure robustness and clinical applicability across diverse patient demographics.
Furthermore, the proposed SE-FusionEAOO framework was additionally evaluated on the publicly available Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) dataset55. This dataset is widely adopted for developing and benchmarking computer-aided diagnosis (CAD) systems for pulmonary nodule detection and classification. It consists of thoracic CT scans from 1,018 subjects, each annotated by four experienced radiologists who provided nodule-level assessments for lesions with diameters ≥ 3 mm. Each annotation includes clinically relevant features such as malignancy likelihood, texture, sphericity, and subtlety, which are strongly correlated with cancer risk. For this study, we excluded nine CT cases exhibiting inconsistent slice spacing and 121 cases with slice thicknesses greater than 3 mm, resulting in a final cohort of 888 valid scans. Nodules were categorized as benign or malignant based on consensus malignancy ratings among the radiologists. The retained scans had slice thicknesses ranging from 0.6 mm to 2.5 mm and in-plane resolutions between 0.48 mm and 0.72 mm.
Importantly, the LIDC-IDRI dataset was used for external validation to ensure true out-of-distribution generalization, as model training and optimization were performed exclusively on the IQ-OTH/NCCD dataset.
Data preprocessing
The preprocessing pipeline used in the proposed framework is shown in Fig. 3.
Normalization serves as a vital preprocessing step, primarily for scaling images and standardizing pixel intensities. This standardization promotes stable and efficient model convergence during training. Initially, each input image undergoes intensity normalization, rescaled to the [0, 1] range using min-max normalization, as defined by the following equation56:

I_norm = (I − I_min) / (I_max − I_min)

where I denotes the input image with pixel values bounded by I_min and I_max, and I_norm represents the normalized image with values scaled to the range [0, 1]. Subsequently, each image in the dataset was resized to a fixed spatial resolution to ensure dimensional consistency across all samples.
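The min-max normalization step above can be sketched in a few lines of NumPy; the 2×2 8-bit "slice" is a toy illustration only.

```python
import numpy as np

def minmax_normalize(image: np.ndarray) -> np.ndarray:
    """Rescale pixel intensities to [0, 1]:
    I_norm = (I - I_min) / (I_max - I_min)."""
    i_min, i_max = float(image.min()), float(image.max())
    if i_max == i_min:                      # constant image: avoid division by zero
        return np.zeros_like(image, dtype=float)
    return (image.astype(float) - i_min) / (i_max - i_min)

# Toy 2x2 "CT slice" with 8-bit intensities
norm = minmax_normalize(np.array([[0, 128], [64, 255]], dtype=np.uint8))
```

After this step, a pixel at the image minimum maps to 0.0 and one at the maximum maps to 1.0, with all others scaled linearly in between.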
Data augmentation
To enhance model robustness, mitigate overfitting, and ensure generalization to unseen data, a comprehensive suite of data augmentation techniques was employed. These transformations were carefully selected to mimic real-world variations inherent in medical imaging. Furthermore, the pronounced class imbalance within the dataset was addressed using synthetic oversampling. The augmentation pipeline included the following transformations, applied randomly during training:
Rotation: Images were randomly rotated by angles between −10 and +10 degrees to simulate different anatomical orientations encountered during scanning.
Translation: Images were shifted horizontally and vertically by up to a fixed fraction of their dimensions to account for variations in lung positioning within the scanner’s field of view.
Horizontal Flipping: Mirror images were generated to introduce natural variability in image presentation, a common occurrence in clinical settings.
Gaussian Blurring: A Gaussian kernel with an automatically calculated standard deviation was applied to simulate slight focus variations and reduce image noise, thereby conditioning the model to handle practical diagnostic challenges without distorting critical lung tissue details.
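The flip, translation, and blur transforms above can be sketched in pure NumPy (rotation typically relies on a library routine such as scipy.ndimage.rotate and is omitted here). The 10% translation bound and the blur sigma of 0.8 are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def _gaussian_kernel(sigma: float, radius: int = 2) -> np.ndarray:
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()                      # normalized 1-D Gaussian

def _blur(img: np.ndarray, sigma: float = 0.8) -> np.ndarray:
    # Separable 2-D Gaussian blur: convolve rows, then columns
    k = _gaussian_kernel(sigma)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)

def augment(img: np.ndarray) -> np.ndarray:
    out = img
    if rng.random() < 0.5:                  # random horizontal flip
        out = out[:, ::-1]
    # Random translation, here bounded by 10% of each dimension (an assumption)
    dy = int(rng.uniform(-0.1, 0.1) * img.shape[0])
    dx = int(rng.uniform(-0.1, 0.1) * img.shape[1])
    out = np.roll(out, (dy, dx), axis=(0, 1))
    return _blur(out)                       # mild Gaussian blur

img = rng.random((64, 64))
aug = augment(img)
```

In a training pipeline these draws would be resampled every epoch, so the model never sees the exact same view of an image twice.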
Handling class imbalance
Initial data analysis revealed a significant class imbalance, with ‘Malignant’ cases substantially outnumbering ‘Benign’ and ‘Normal’ cases (Table 3). To prevent model bias towards the majority class, SMOTE was employed. SMOTE synthesizes new examples in the feature space of the minority classes by interpolating between existing instances, effectively balancing the class distribution and enriching the dataset with diverse examples of underrepresented classes.
This balanced distribution, achieved through SMOTE and visualized in Fig. 4, ensures all classes have equal representation during training. By preventing bias towards the majority class (‘Malignant’), it forces the model to learn discriminative features across all three categories (Benign, Malignant, and Normal). Consequently, this strategy significantly enhances the model’s overall diagnostic reliability and generalizability across all case types.
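SMOTE's interpolation idea can be sketched without the imbalanced-learn library: pick a minority sample, one of its k nearest minority-class neighbours, and a random point on the segment between them. The minimal version below is a sketch under those assumptions, with toy feature vectors standing in for the real extracted features.

```python
import numpy as np

def smote_oversample(X_min: np.ndarray, n_new: int, k: int = 5,
                     seed: int = 0) -> np.ndarray:
    """Synthesize n_new minority-class samples by interpolating each chosen
    sample toward one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        dist = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(dist)[1:k + 1]      # skip the sample itself
        j = rng.choice(neighbours)
        lam = rng.random()                          # random point on the segment
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synthetic)

X_benign = np.random.default_rng(1).normal(size=(20, 8))  # toy minority features
X_syn = smote_oversample(X_benign, n_new=30)
```

Because every synthetic point lies between two real minority samples, the new examples stay inside the minority class's feature region rather than drifting into the majority class.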
Squeeze-and-excitation blocks
To enhance the feature representation and discriminative power of our CNN, we integrated SE blocks, a foundational architectural unit introduced by57. These blocks act as a self-attention mechanism that dynamically recalibrates channel-wise feature responses, enabling the network to autonomously emphasize informative features and suppress less relevant or redundant ones. The SE block operates through a sequential two-phase process: squeeze and excitation. The squeeze phase employs global average pooling to aggregate spatial information from each feature map. This operation transforms a feature map tensor U ∈ R^(H×W×C) into a channel-wise descriptor vector z ∈ R^C, effectively capturing the global distribution of responses for each channel:

z_c = (1 / (H × W)) Σ_{i=1}^{H} Σ_{j=1}^{W} u_c(i, j)

Subsequently, the excitation phase processes this descriptor using a simple multi-layer perceptron (MLP), typically comprising two fully connected layers, to model nonlinear interactions and dependencies between channels. This MLP acts as a gating mechanism, generating a vector of adaptive scaling weights s, where each value is between 0 and 1:

s = σ(W_2 δ(W_1 z))

where W_1 and W_2 are the learned weights of the MLP, δ denotes the ReLU activation function, and σ is the sigmoid activation function that normalizes the output. The final output of the SE block is produced by rescaling the original feature map channel-wise with the learned weights: x̃_c = s_c · u_c.
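The squeeze, excitation, and rescaling steps can be traced in a framework-free NumPy sketch; random matrices stand in for the learned MLP parameters W_1 and W_2.

```python
import numpy as np

def se_block(U: np.ndarray, W1: np.ndarray, W2: np.ndarray) -> np.ndarray:
    """Squeeze-and-Excitation applied to a feature map U of shape (H, W, C)."""
    z = U.mean(axis=(0, 1))                    # squeeze: global average pooling, z in R^C
    hidden = np.maximum(0.0, z @ W1)           # first FC layer + ReLU (delta)
    s = 1.0 / (1.0 + np.exp(-(hidden @ W2)))   # second FC layer + sigmoid gates in (0, 1)
    return U * s                               # rescale channel-wise: x~_c = s_c * u_c

rng = np.random.default_rng(0)
H, W, C, r = 4, 4, 8, 2                        # r is the bottleneck reduction ratio
U = rng.normal(size=(H, W, C))
W1 = rng.normal(size=(C, C // r))              # stand-ins for learned weights
W2 = rng.normal(size=(C // r, C))
out = se_block(U, W1, W2)
```

Note that every spatial location within a channel is scaled by the same gate s_c, which is what makes the recalibration channel-wise rather than spatial.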
To clarify the technical integration: SE blocks are inserted post-feature extraction in each fusion pair, dynamically modeling channel interdependencies to prioritize malignancy-indicative patterns (e.g., nodule texture) while suppressing noise, as formalized in Eqs. (6)-(7). This enhances both accuracy and interpretability by revealing the importance of features.
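As an illustration of the squeeze-excitation-rescale sequence formalized in Eqs. (6)-(7), the mechanism can be sketched in plain Python; the toy weight matrices stand in for the learned MLP parameters W₁ and W₂:

```python
import math

def se_block(feature_maps, w1, w2):
    """Squeeze-and-excitation over a list of C feature maps (each an H x W grid).

    w1: (C/r) x C weight matrix, w2: C x (C/r) weight matrix. In a trained
    network these are learned; here they are supplied as toy values.
    """
    # Squeeze: global average pooling -> one scalar descriptor per channel
    z = [sum(sum(row) for row in fm) / (len(fm) * len(fm[0]))
         for fm in feature_maps]
    # Excitation: FC -> ReLU -> FC -> sigmoid yields per-channel gates in (0, 1)
    h = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(w * hc for w, hc in zip(row, h))))
         for row in w2]
    # Rescale: multiply every value of channel c by its gate s[c]
    recalibrated = [[[s[c] * v for v in row] for row in fm]
                    for c, fm in enumerate(feature_maps)]
    return recalibrated, s
```

Inspecting the gate vector `s` after a forward pass is exactly the interpretability hook described in the next subsection: channels with gates near 1 are the ones the network deems diagnostically salient.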
Rationale and integration of SE for lung cancer detection in feature fusion
In the specific context of lung cancer detection from CT scans, this adaptive feature recalibration mechanism is critically advantageous. The network learns to assign higher weights to feature channels that encode semantically salient patterns indicative of malignant nodules, such as spiculation, lobulation, or ground-glass opacity, while attenuating channels associated with normal parenchyma or benign structures. This selective emphasis not only bolsters classification accuracy by focusing on the most discriminative evidence but also inherently improves the model’s interpretability. By examining the excitation weights, one can gain insights into which visual features the model deems most significant for its diagnosis, thereby fostering greater clinical trust and verification. This capability is further amplified by the strategic integration of SE blocks within our feature fusion architecture. Rather than merely concatenating features from different backbone networks, the SE mechanism actively recalibrates the combined feature maps post-fusion. This induces a powerful complementary effect, prioritizing and amplifying the most salient diagnostic features from each network while suppressing redundant or contradictory information. Consequently, the final fused representation is not only comprehensive but also optimally attuned to the specific task of lung nodule classification.
Deep feature extraction via transfer learning
In the domain of medical image analysis, particularly for lung cancer detection, the scarcity of large, annotated datasets presents a significant challenge. Training CNNs from scratch under such constraints is computationally prohibitive and highly susceptible to overfitting, as it demands substantial data volumes to achieve optimal performance58. To overcome this fundamental limitation, transfer learning emerges as a pivotal strategy. This approach enables us to leverage the rich, hierarchical feature representations learned by state-of-the-art CNN models pre-trained on large-scale datasets (e.g., ImageNet)59. Rather than learning from random initialization, we adapt these powerful feature extractors to our specific task of lung nodule classification. This paradigm leverages generalized visual knowledge (e.g., edge detection, texture patterns, shape recognition) acquired from natural images, which proves highly transferable to medical imaging domains.
In this study, we meticulously evaluated a diverse suite of modern architectures renowned for their efficacy in image classification. Through rigorous empirical analysis on our target dataset, we identified the six top-performing models to serve as the foundational feature extractors for our ensemble framework. The selected models are: EfficientNetB6, Inception v3, ResNet50, DenseNet201, DenseNet121, and MobileNetV2. This selection ensures a blend of architectural diversity covering residual connections, dense connectivity, compound scaling, and inverted residuals, which is crucial for capturing a comprehensive spectrum of discriminative features from lung CT scans. By initializing our framework with these pre-trained weights and subsequently optimizing them on the lung cancer dataset, we effectively harness the power of transfer learning.
EfficientNetB6-based model implementation
EfficientNet-B6 is recognized for delivering strong classification accuracy while maintaining computational efficiency, owing to its innovative compound scaling technique. This strategy uniformly scales the network’s depth, width, and input resolution, resulting in balanced performance improvements without unnecessary computational overhead. At the heart of the architecture is the Mobile Inverted Bottleneck Convolution (MBConv) block, which integrates SE mechanisms to adaptively recalibrate channel-wise feature responses. This feature refinement enables the model to better emphasize relevant patterns and suppress less critical information, enhancing its effectiveness in detecting nuanced features within medical images60.
EfficientNet’s robustness persists even when operating on lower-resolution images, making it a strong candidate for applications involving variable image quality. The architecture’s scalability is guided by a set of compound coefficients: a global scaling factor φ, and constants α, β, and γ, which define how depth, width, and resolution are scaled from the baseline configuration as d = α^φ, w = β^φ, and r = γ^φ, respectively, subject to α · β² · γ² ≈ 2 with α ≥ 1, β ≥ 1, γ ≥ 1. These relationships are mathematically described in Eqs. (8), (9), and (10), ensuring a systematic approach to model scaling.
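A minimal sketch of the compound-scaling rule of Eqs. (8)-(10), using the baseline constants reported for the original EfficientNet search (α = 1.2, β = 1.1, γ = 1.15):

```python
def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """EfficientNet-style compound scaling.

    depth d = alpha**phi, width w = beta**phi, resolution r = gamma**phi,
    under the constraint alpha * beta**2 * gamma**2 ≈ 2, so incrementing the
    global coefficient phi roughly doubles the FLOPs budget.
    """
    depth = alpha ** phi
    width = beta ** phi
    resolution = gamma ** phi
    flops_ratio = (alpha * beta ** 2 * gamma ** 2) ** phi  # ≈ 2**phi
    return depth, width, resolution, flops_ratio
```

For example, `compound_scale(2)` scales depth by 1.2² = 1.44 while the FLOPs budget grows by roughly a factor of four, showing how the three dimensions are expanded in tandem rather than independently.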
Inception v3 based model implementation
The Inception V3 model, developed by Google, represents an advanced version of the original Inception network. It was designed to increase classification accuracy while maintaining computational efficiency. The core idea behind the Inception architecture is to approximate an optimal sparse structure by transforming it into a dense one, where similar non-uniform sparse elements are clustered together. This approach enables the formation of deeper and wider networks without significantly increasing computational costs, as described in61. The architectural design of Inception V3 is based on several guiding principles: Locality of information: many signals in the input space tend to be spatially correlated, which allows smaller convolution filters to capture meaningful patterns. By applying dimension reduction (e.g., using 1×1 convolutions) before expensive operations, computational efficiency can be improved without significant information loss.
Balanced network expansion: To effectively utilize available computational resources, both the depth (number of layers) and width (number of filters) of the network should be increased in tandem. This enables the model to learn richer and more diverse feature representations.
Gradual dimensional reduction: Aggressive reduction in feature map dimensions at early stages of the network is discouraged, as it may lead to the loss of critical information. Instead, gradual downsampling is preferred to preserve spatial details during the early learning phase.
Wide layer efficiency: Wider layers tend to learn faster and are especially beneficial at deeper levels of the network, where abstract feature representations are formed.
ResNet50 based model implementation
ResNet5062, a popular and deeper variant of ResNet, comprises 50 layers and was initially trained on the ImageNet dataset, which includes over a million labeled images. Its architecture combines batch normalization layers, non-linearities, and both identity and convolutional residual blocks. ResNet is conceptually related to Highway Networks, which learn their skip connections through additional weight matrices, whereas ResNet’s identity shortcuts are parameter-free.
The network begins with a conv1 layer consisting of 64 filters using a 7×7 kernel, followed by a max-pooling operation with a stride of 2. The subsequent stages conv2 through conv5 are composed of residual blocks. Each block includes three convolutional layers: two 1×1 convolutions (the first for dimensionality reduction, the second for restoration) and a 3×3 convolution sandwiched between them, as described in Eq. 11. This structure is iteratively applied within each residual block, with the final convolutional stage feeding into an average pooling layer and a fully connected layer. Classification is performed using a Softmax activation function.
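The parameter savings of the 1×1-3×3-1×1 bottleneck described above can be checked with a quick calculation (illustrative channel counts; biases and batch-norm parameters are omitted):

```python
def conv_params(c_in, c_out, k):
    """Number of weights in a k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def bottleneck_params(c_in, c_mid):
    """ResNet50 bottleneck: 1x1 reduce -> 3x3 -> 1x1 restore."""
    return (conv_params(c_in, c_mid, 1)     # 1x1 dimensionality reduction
            + conv_params(c_mid, c_mid, 3)  # 3x3 spatial convolution
            + conv_params(c_mid, c_in, 1))  # 1x1 restoration

# A 256-channel bottleneck with 64 mid channels, versus two plain 3x3 convs
plain = 2 * conv_params(256, 256, 3)   # 1,179,648 weights
bneck = bottleneck_params(256, 64)     #    69,632 weights
```

The reduce-then-restore design is what makes a 50-layer network tractable: the expensive 3×3 convolution operates on only 64 channels instead of 256.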
DenseNet-121 based model implementation
The DenseNet-121 architecture63 is a key advancement in deep learning for computer vision, designed to enhance feature reuse and mitigate the vanishing gradient problem in very deep networks. Its hallmark is the dense connectivity pattern, where each layer receives feature maps from all preceding layers and passes its outputs to all subsequent ones. This design provides implicit deep supervision, strengthening feature propagation. The network is structured into multiple dense blocks, where each layer applies batch normalization, a Rectified Linear Unit (ReLU), and a 3×3 convolution, with outputs concatenated to preserve low-level details while building high-level representations. Between dense blocks, transition layers consisting of a 1×1 convolution for feature compression followed by average pooling regulate spatial dimensions and computational cost. Overall, DenseNet-121 achieves strong performance while maintaining parameter efficiency through its compact yet expressive design.
DenseNet-201 based model implementation
Building upon the foundational DenseNet architecture63, DenseNet-201 represents a deeper and more powerful variant designed to further enhance feature representation and gradient flow in very deep convolutional networks. While it shares the core principles of dense connectivity and feature reuse with DenseNet-121, its increased depth makes it particularly adept at capturing complex, hierarchical patterns in medical images. The architecture is structured around four dense blocks interconnected by transition layers. The key differentiator from DenseNet-121 is its greater depth, achieved by incorporating more layers within these dense blocks. This enables the network to learn a more comprehensive hierarchy of features, ranging from simple edges and textures in the early layers to complex morphological structures in the deeper layers.
The hallmark dense connectivity remains central to its design: each layer within a block receives concatenated feature maps from all preceding layers and passes its own outputs to all subsequent layers. This design ensures: Maximized feature reuse: low-level features, such as tissue textures or nodule edges, are preserved and utilized throughout the network, improving representational efficiency.
Alleviated Vanishing Gradient: The direct connections provide shortened paths for gradient flow during backpropagation, ensuring stable training even at this significant depth.
Implicit Deep Supervision: The feature reuse acts as a form of regularization, reducing the risk of overfitting on limited medical datasets.
MobileNetV2 based model implementation
To incorporate a highly efficient architecture into our ensemble, we selected MobileNetV264. This model was specifically designed for mobile and embedded vision applications, offering an excellent trade-off between computational efficiency and accuracy. Its core innovation is the inverted residual with a linear bottleneck structure. Unlike traditional residual blocks that first compress and then expand feature maps, MobileNetV2’s inverted residual block first expands the channel dimension using a lightweight 1×1 convolution. This expansion is followed by a 3×3 depthwise convolution that filters the expanded features, and a final 1×1 convolution that projects the features back to a lower-dimensional representation. This design significantly reduces computational cost and model size while preserving critical information. A key feature of this architecture is the use of linear bottlenecks. Instead of applying a non-linear activation function (such as ReLU) after the final projection layer, a linear activation function is used. This prevents non-linearities from destroying important information in the low-dimensional space, leading to more stable and representative features.
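The cost advantage of the expand-depthwise-project structure can be sketched numerically. The layer sizes below are illustrative, and the comparison is against a standard 3×3 convolution operating at the expanded width, where the depthwise factorization does its work:

```python
def mults_standard(c_in, c_out, k, h, w):
    """Multiplications in a standard k x k convolution on an h x w map."""
    return h * w * k * k * c_in * c_out

def mults_inverted_residual(c, t, k, h, w):
    """MobileNetV2 inverted residual: 1x1 expand (x t) -> k x k depthwise -> 1x1 project."""
    c_exp = c * t
    expand = h * w * c * c_exp         # 1x1 expansion to t*c channels
    depthwise = h * w * k * k * c_exp  # depthwise: one k x k filter per channel
    project = h * w * c_exp * c        # 1x1 linear projection back to c channels
    return expand + depthwise + project
```

With c = 32, expansion factor t = 6, and a 28×28 map, the inverted residual needs about 11M multiplications, whereas a standard 3×3 convolution over the same expanded 192-channel representation would need roughly 260M, because the depthwise step costs k² per channel instead of k² times the full channel count.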
Proposed enhanced animated oat optimization algorithm with genetic operators (EAOO-GA)
While the standard Animated Oat Optimization (AOO) algorithm demonstrates innovative inspiration from natural phenomena, our empirical analysis reveals certain limitations that can hinder its performance on complex optimization problems, such as those encountered in medical image analysis. Primarily, the algorithm can occasionally exhibit a tendency towards premature convergence, where the population loses diversity too rapidly and becomes trapped in local optima. This is often attributed to its strong exploitation bias in the later stages, without a sufficient mechanism to reintroduce lost diversity or to escape suboptimal regions once discovered. To mitigate these limitations and enhance the algorithm’s robustness and global search capability, we propose an Enhanced Animated Oat Optimization (EAOO-GA) algorithm. The core innovation lies in the strategic integration of genetic operators, including crossover and mutation, into the standard AOO workflow. This hybrid approach leverages the strength of AOO’s bio-inspired exploration and exploitation while bolstering its population diversity through well-established evolutionary mechanisms.
Standard AOO algorithm
The AOO is a new metaheuristic algorithm that emulates three specific behaviors observed in the natural conduct of animated oats within their environment65: Initial seed displacement through environmental forces, including wind, water currents, and animal-mediated transport.
Moisture-induced morphological changes in the seed’s awn structure generate mechanical forces that produce rolling motion across terrestrial surfaces.
Kinetic energy accumulation during locomotion phases, with obstacle interactions triggering mechanical energy release mechanisms for enhanced dispersal range.
Initialization. AOO initializes its population by generating a set of random solutions, as formalized in Eq. 12. The position of each individual in the subpopulation is denoted by x_i = (x_{i,1}, …, x_{i,Dim}). The subpopulation size and the problem’s dimension are given by N and Dim, respectively. Each element x_{i,j} of the position matrix is computed using Eq. 13: x_{i,j} = lb_j + r · (ub_j − lb_j), where r is a random number between 0 and 1 and lb_j and ub_j are the lower and upper bounds of dimension j.
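A minimal sketch of this initialization step; the bound vectors `lb` and `ub` are problem-specific assumptions:

```python
import random

def init_population(n, dim, lb, ub, seed=42):
    """Random population initialisation: x_{i,j} = lb_j + r * (ub_j - lb_j).

    n: population size, dim: problem dimensionality,
    lb/ub: per-dimension lower and upper bounds.
    """
    rng = random.Random(seed)
    return [[lb[j] + rng.random() * (ub[j] - lb[j]) for j in range(dim)]
            for _ in range(n)]
```

Every candidate solution starts uniformly distributed inside the search box, which is what gives the subsequent exploration phase unbiased coverage of the feasible region.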
Parameter Calculation. The dynamic behavior of the animated oat during dispersal is characterized by three key biomechanical parameters: seed mass, eccentric rolling coefficient, and primary awn length. These parameters are computed algorithmically at each iteration as functions of m, the seed mass; N, the population size; L, the primary awn length; e, the eccentric rotation coefficient; t, the current iteration; and c, a dynamic adjustment factor.
Exploration Phase. Following abscission, the dispersal of animated oat seeds is primarily driven by environmental vectors such as wind, water, or fauna. The stochastic nature of this process facilitates extensive exploration of the search space. The corresponding position update moves each individual relative to the current best solution, where x_i(t) and x_best(t) denote the positions of the i-th individual and the best individual in the population at iteration t, respectively.
Exploitation Phase. The remaining seeds are partitioned into two subsets based on the presence of obstacles during dispersal, assuming both outcomes are equiprobable. In the absence of obstructions, seeds undergo hygroscopic rolling driven by moisture-induced stress gradients. This motion is modeled using curvature-induced snap buckling, inspired by the work of Lindtner et al.66, which demonstrated that anisotropic swelling is governed by cellulose microfiber orientation. The rolling dynamics are mathematically represented through torque equations and eccentric rotation, where R is a random matrix of size 1 × Dim with entries uniformly distributed in [0, 1]. The mean value μ in the Lévy flight is a random number between 0 and 1, typically used to adjust the step size; this stochasticity helps regulate the direction and distance of movement during flight. The scale parameter σ controls the width of the step-length distribution, defining the range of possible step sizes. The current velocity vector v represents the motion state of the particle or individual. The stability parameter β governs the shape of the distribution, influencing the diversity and randomness of step lengths, and the gamma function Γ(·) generalizes the factorial function to continuous values. For Lévy flights, β is typically set to 1.5. The subsequent position update depends on θ, the launch angle relative to the ground; c_d, the air drag coefficient governing aerial motion; a uniformly sampled random value r; k, which quantifies the elasticity of the primary awn; and Δx, the change in awn length due to elastic energy storage prior to release.
EAOO-GA algorithm
To address the limitations of the standard AOO, we introduce two genetic operators applied after the core AOO update procedures. These operators serve as a diversity-preserving mechanism, enabling the algorithm to avoid premature convergence and explore the search space more effectively.
Crossover Operator. We employ a differential crossover mechanism to facilitate the exchange of information between individuals. For each target individual x_i in the population, a mutant vector v_i = x_{r1} + F · (x_{r2} − x_{r3}) is created from three distinct, randomly selected individuals, and a trial vector u_i is generated by combining components from the target and mutant vectors: u_{i,j} = v_{i,j} if rand_j ≤ CR or j = j_rand, and u_{i,j} = x_{i,j} otherwise. Here x_{r1}, x_{r2}, and x_{r3} are distinct random individuals, F is a scaling factor, CR is the crossover rate, and the index j_rand ensures at least one parameter is inherited from the mutant vector.
Mutation Operator. A non-uniform mutation strategy is applied to introduce random perturbations, with decreasing magnitude over iterations to transition from exploration to exploitation. The step size Δ(t, y) = y · (1 − r^{(1 − t/T)^b}) decreases over time. Here, r is a random number in [0, 1], T is the maximum number of iterations, and b determines the degree of non-uniformity.
Selection and Integration. The genetic operators are applied after AOO’s exploitation phase. A greedy selection mechanism determines whether newly generated solutions replace existing ones: a trial solution replaces its parent only if it achieves a better fitness value. This hybrid approach maintains AOO’s bio-inspired search capabilities while enhancing its global optimization performance through evolutionary mechanisms. The pseudocode of EAOO-GA is provided in Algorithm 1.
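The three mechanisms above (differential crossover, non-uniform mutation, greedy selection) can be sketched as a single refinement pass. This is a generic illustration of the operators under the stated formulas, not the authors' exact Algorithm 1:

```python
import random

def evolve_step(pop, fitness, lb, ub, t, T, F=0.5, CR=0.9, b=2.0, rng=None):
    """One GA-style refinement pass over a population; `fitness` is minimised."""
    rng = rng or random.Random(0)
    dim = len(pop[0])
    new_pop = []
    for i, x in enumerate(pop):
        r1, r2, r3 = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
        j_rand = rng.randrange(dim)
        trial = []
        for j in range(dim):
            # differential crossover: take the mutant gene with probability CR
            if rng.random() <= CR or j == j_rand:
                v = r1[j] + F * (r2[j] - r3[j])
            else:
                v = x[j]
            # non-uniform mutation: perturbation magnitude shrinks as t -> T
            delta = 1 - rng.random() ** ((1 - t / T) ** b)
            if rng.random() < 0.5:
                v += delta * (ub[j] - v)
            else:
                v -= delta * (v - lb[j])
            trial.append(min(max(v, lb[j]), ub[j]))
        # greedy selection: keep whichever of parent/trial is fitter
        new_pop.append(trial if fitness(trial) < fitness(x) else x)
    return new_pop
```

Because of the greedy selection step, the population's fitness is monotonically non-worsening from one pass to the next, while the crossover and mutation terms keep injecting the diversity that the standard AOO loses in its late iterations.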
The feature fusion framework
In this study, we propose an optimized ensemble framework that extends beyond traditional model aggregation. Rather than merely combining outputs from individual models, we employ feature fusion architectures that integrate complementary representations from multiple deep learning backbones to enhance accuracy. Three tailored fusion architectures (Fusion 1, Fusion 2, and Fusion 3) were developed, each designed to exploit the distinctive strengths of different pre-trained models.
The selection of models for these architectures followed a systematic process designed to maximize both diversity and efficiency. To prevent redundancy and capture a broader range of feature representations, models from the same architectural family were not combined. Instead, our fusion strategy paired networks of differing capacities and inductive biases, specifically, coupling a high-capacity model with a more lightweight counterpart to balance diagnostic accuracy with computational cost. The resulting three fusion pairs are as follows: Fusion 1: DenseNet201 + EfficientNetB6
Fusion 2: Inception v3 + MobileNetV2
Fusion 3: DenseNet121 + ResNet50
The feature extraction process begins by propagating input images through the pre-trained models within each fusion pair. The resulting feature maps are then concatenated, integrating the distinct hierarchical patterns learned by each network. This systematic selection process yielded an initial pool of eight pre-trained models. Through rigorous evaluation, the six top-performing models were identified and strategically paired to form the three distinct fusion architectures, as illustrated in the model selection pipeline (Fig. 5). To strengthen the discriminative capacity of the fused representations, we incorporate an SE block. This mechanism adaptively recalibrates channel-wise responses, enabling the network to highlight informative features, such as texture and morphological patterns associated with malignancy, while suppressing irrelevant ones. Such dynamic feature emphasis is especially vital in lung cancer detection, where subtle yet critical visual cues must be accurately identified.
The recalibrated features from the SE block are subsequently subjected to Global Average Pooling, which reduces dimensionality while preserving salient feature information and enhancing robustness to spatial translations. A Dropout layer is strategically inserted following this operation to serve as a regularization mechanism, effectively mitigating overfitting by preventing complex co-adaptations of features during training. The network culminates in a softmax activation layer, which generates the final probability distribution across the target diagnostic classes for each fusion pathway. The block diagram of the feature fusion architectures is shown in Fig. 6. This feature fusion methodology is fundamentally motivated by the need to compensate for the performance variability and inherent limitations of individual models. By harnessing complementary representations from multiple architectural families, our ensemble framework achieves synergistic improvements in predictive accuracy, generalization capability, and operational robustness for pulmonary nodule classification. The integrated approach provides three distinct advantages: (1) superior classification performance through multi-perspective feature integration, (2) expanded representational capacity for capturing heterogeneous pathological patterns, and (3) inherent interpretability facilitated by the SE block’s channel-wise attention mechanism, which collectively contribute to more reliable and clinically actionable diagnostic support.
Proposed SE-FusionEAOO ensemble framework for lung cancer detection
In this section, we present a novel hybrid optimized ensemble framework tailored for accurate and reliable lung cancer detection from CT scans. Ensemble learning is particularly advantageous in medical diagnostics, as it enhances predictive accuracy and robustness by combining the strengths of multiple diverse models67. Such reliability is critical in lung cancer detection, where diagnostic errors may have serious consequences. The proposed framework is designed to achieve superior classification performance through two stages: (i) constructing three SE-enhanced fusion models as strong base learners, and (ii) aggregating their outputs using an optimized weighting strategy. The distinguishing feature of our approach lies in employing the Enhanced Animated Oat Optimization algorithm (EAOO-GA), described in Section 3.4, to fine-tune the ensemble weights. This metaheuristic efficiently explores the global search space to identify the optimal weight configuration, ensuring that the most informative models contribute more prominently to the final decision. As a result, the system achieves higher reliability and diagnostic precision. The overall architecture of the proposed ensemble is depicted in Fig. 1.
Ensemble construction and optimization process
The process for building and optimizing our ensemble is methodically outlined below and summarized in Algorithm 2.
Step 1: Base Model Training and Feature Fusion. Three separate fusion models (Fusion 1, Fusion 2, Fusion 3) are constructed by pairing different pre-trained architectures (DenseNet201+EfficientNetB6, Inception v3+MobileNetV2, DenseNet121+ResNet50). Each pair is integrated using concatenation and enhanced with SE blocks for adaptive feature recalibration, as described in Section 3.2. These models are trained on the lung cancer dataset to serve as expert feature extractors and classifiers.
Step 2: Prediction Generation. Each of the three trained fusion models is used to generate prediction vectors (e.g., class probabilities for ’Benign’, ’Malignant’, ’Normal’) on the validation or test set. Let P₁, P₂, and P₃ represent the prediction matrices from Fusion 1, Fusion 2, and Fusion 3, respectively.
Step 3: Fitness Function Definition. The core objective of the EAOO-GA algorithm is to find the optimal weight vector w = (w₁, w₂, w₃) that maximizes the accuracy of the weighted ensemble prediction. The fitness function is defined as f(w) = Accuracy(y, argmax(w₁P₁ + w₂P₂ + w₃P₃)), where y are the true labels, Σᵢ wᵢ = 1, and wᵢ ≥ 0.
Step 4: Weight Optimization via EAOO-GA. The proposed EAOO-GA algorithm is deployed to solve this optimization problem. The EAOO-GA population consists of candidate weight vectors. The algorithm evolves these candidates over generations through its operations (exploration, exploitation, crossover, mutation) to maximize the fitness function, ultimately converging on the optimal set of weights w*.
Step 5: Final Ensemble Prediction. Once optimized, the final prediction for a new input image is made by a weighted average of the base model predictions using the optimal weights w*: P_final = w₁*P₁ + w₂*P₂ + w₃*P₃. The class with the highest probability in P_final is selected as the final diagnosis.
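Steps 2-5 reduce to a weighted soft vote, which can be sketched as follows; the probability matrices and weights here are toy values standing in for the fusion-model outputs and the EAOO-GA-optimised weights:

```python
def ensemble_predict(prob_matrices, weights):
    """Weighted soft-voting: P_final = sum_i w_i * P_i, then per-sample argmax.

    prob_matrices: one [n_samples x n_classes] probability matrix per model.
    weights: ensemble weights, assumed non-negative and summing to 1.
    """
    n, c = len(prob_matrices[0]), len(prob_matrices[0][0])
    fused = [[sum(w * P[i][k] for w, P in zip(weights, prob_matrices))
              for k in range(c)] for i in range(n)]
    # the index of the max-probability class is the final diagnosis per sample
    labels = [max(range(c), key=lambda k: row[k]) for row in fused]
    return labels, fused
```

The fitness function of Step 3 is simply the accuracy of `labels` against the ground truth, evaluated for each candidate weight vector in the EAOO-GA population.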
Our EAOO-optimized ensemble framework provides a dynamic and highly accurate solution for lung nodule classification. By synergistically combining the representational power of multiple deep fusion models with the global optimization capability of EAOO-GA, we achieve a system that is not only more accurate than its individual components but also inherently robust and reliable. This approach underscores the significant potential of integrating advanced meta-heuristic optimization with deep ensemble learning to tackle critical challenges in medical image analysis.
Integration of explainable AI (XAI)
The term “explainable artificial intelligence” (XAI) refers to a range of methods designed to elucidate and illustrate how complex AI models make decisions. In this paper, we employed the Gradient-weighted Class Activation Mapping (Grad-CAM) technique to enhance the interpretability of our proposed lung cancer classification framework. Grad-CAM generates class-specific heatmaps that visually highlight the most influential regions in input images, enabling clinicians to verify the model’s focus and improve transparency in the diagnostic process68.
Grad-CAM: This technique generates a local visual explanation by utilizing the target-class gradients flowing into the last convolutional layer, creating an approximate localization map at the end of the prediction stage, as described in Eq. (33): L^c = ReLU(Σ_k α_k^c A^k). In this expression, L^c denotes the class activation map associated with category c. The ReLU operator ensures that only positively contributing features are retained by zeroing out negative values. A^k refers to the activation output from the k-th channel of the final convolutional layer, while α_k^c signifies the importance score of this channel for the target class c. This score is determined by computing the spatial average of the gradient of the class score with respect to the feature map: α_k^c = (1/Z) Σ_i Σ_j ∂y^c/∂A^k_{i,j}. Here, y^c is the network’s raw output (logit) for class c, and A^k_{i,j} represents the activation value at position (i, j) in the k-th feature map. The denominator Z corresponds to the total number of spatial locations in the feature map and acts as a normalization factor. The resulting heatmap visually emphasizes image regions most influential in the network’s decision-making process for class c.
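The two equations above can be sketched directly; the toy activations and gradients below stand in for tensors that a real implementation would pull from the network's last convolutional layer via backpropagation:

```python
def grad_cam(activations, gradients):
    """Grad-CAM heatmap from the last convolutional layer.

    activations: K feature maps A^k, each an H x W list of lists.
    gradients:   dy^c/dA^k for the target class c, same shape.
    alpha_k = spatial mean of the gradients; map = ReLU(sum_k alpha_k * A^k).
    """
    h, w = len(activations[0]), len(activations[0][0])
    z = h * w  # number of spatial locations (the normalisation factor Z)
    alphas = [sum(sum(row) for row in g) / z for g in gradients]
    # weighted sum of feature maps, clipped at zero (ReLU)
    heat = [[max(0.0, sum(a * fm[i][j] for a, fm in zip(alphas, activations)))
             for j in range(w)] for i in range(h)]
    return heat
```

Channels whose gradients are positive on average amplify the heatmap, while negatively contributing channels are down-weighted and any residual negative evidence is clipped by the ReLU, which is why the resulting map highlights only regions that support the target class.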
Computational complexity
The proposed SE-FusionEAOO Ensemble framework involves several components, each contributing to the overall time complexity. Feature extraction from the pre-trained CNN pairs dominates during training and inference, with a complexity of roughly O(N · k² · C² · H · W) per convolutional layer, where N is the batch size, C the number of channels, k the kernel size, and H/W the spatial dimensions. The SE blocks add a lightweight overhead of O(C²/r) per block, with r the reduction ratio (default 16), ensuring minimal impact. SMOTE for handling class imbalance has a preprocessing time complexity of O(M² · D) for its nearest-neighbour search, where D is the feature dimensionality and M the number of minority samples. The EAOO-GA metaheuristic for weight optimization has a time complexity of O(P · G · (E + F)), where P is the population size, G the number of generations, E the time for ensemble prediction on the dataset (proportional to the sum of base model complexities), and F the fitness computation. Overall, the framework’s time complexity is dominated by the CNN forward passes and EAOO-GA iterations; however, it remains practical for medical imaging tasks due to transfer learning and parallel computation capabilities.
Mathematical formulation of the proposed framework
Let x denote an input CT image and f_k(x) represent the deep feature vector extracted from the k-th pretrained backbone (k = 1, 2). For each fusion branch, feature maps from two complementary networks are concatenated and refined through an SE block to obtain channel-wise recalibrated features F = SE([f₁(x); f₂(x)]), where [·; ·] denotes concatenation and SE(·) represents the excitation function defined by s = σ(W₂ δ(W₁ GAP(F))), with GAP(·) being global average pooling, δ the ReLU activation, σ the sigmoid gating, and W₁, W₂ learnable parameters. The ensemble output combines the softmax probabilities of M fusion experts, each producing a class-score vector p_m(x): P(x) = Σ_{m=1}^{M} w_m p_m(x). To determine the optimal contribution of each fusion model, the EAOO-GA algorithm minimizes the cross-entropy-based objective function J(w) = −Σ_c y_c log P_c(x), where y_c is the ground-truth label and P_c(x) is the predicted probability for class c. Within EAOO-GA, the population of candidate weight vectors evolves iteratively via the animated oat dynamic operators (Eqs. 16–25) and the incorporated genetic crossover and mutation (Eqs. 27–29).
The best solution at convergence satisfies w* = argmin_w J(w), subject to Σ_m w_m = 1 and w_m ≥ 0, yielding the final optimized ensemble decision ŷ = argmax_c Σ_{m=1}^{M} w_m* p_{m,c}(x). This mathematical formulation clarifies how the proposed framework unifies multi-model feature fusion, attention-based recalibration, and metaheuristic weight optimization into a coherent and analytically defined learning pipeline.
This study presents the SE-FusionEAOO Ensemble, a multi-stage hybrid framework developed to achieve state-of-the-art accuracy, robustness, and interpretability in lung cancer detection from CT scans. The overall system architecture is depicted in Fig. 1. The proposed methodology proceeds through the following structured stages: (1) Dataset preprocessing, augmentation, and class imbalance mitigation using SMOTE (Section 3.1); (2) Incorporation of Squeeze-and-Excitation (SE) blocks for adaptive channel recalibration (Section 3.2); (3) Rigorous evaluation and selection of six top-performing pre-trained models from eight candidates via transfer learning (Section 3.3); (4) Development and deployment of the proposed Enhanced Animated Oat Optimization with Genetic Operators (EAOO-GA) for optimal ensemble weight determination (Section 3.4, Algorithm 1); (5) Construction of three SE-enhanced fusion architectures by strategically pairing the selected models: DenseNet201 + EfficientNetB6, InceptionV3 + MobileNetV2, and DenseNet121 + ResNet50, each enhanced with SE modules (Section 3.5, Fig. 5, 6); (6) Weighted aggregation of predictions from the three fusion models using EAOO-GA-optimized weights to produce the final classification output (Section 3.6, Algorithm 2); and (7) Integration of Grad-CAM for model interpretability and visualization of decision-relevant regions (Section 3.6.2).
Dataset description and preprocessing
Dataset description
This paper employs the publicly available IQ-OTH/NCCD lung cancer dataset54, a well-established benchmark for the development of CAD systems. The dataset comprises CT scan images categorized into three critical classes: Benign, Malignant, and Normal, as shown in Fig. 2. A significant challenge posed by this dataset is the inherent heterogeneity in image dimensions. A detailed breakdown of the image size distribution per class, provided in Table 2, reveals this complexity. While the majority of images share a single common resolution and form a homogeneous subset, there are notable exceptions that necessitate meticulous preprocessing. Specifically, the Malignant class contains images of several non-standard sizes, and the Normal class also contains a single outlier. This precise understanding of the data structure is crucial for designing tailored preprocessing steps to ensure spatial uniformity for model input. Resizing and standardization strategies must account for these variations to prevent the loss of critical diagnostic information or the introduction of distortions.
The overall class distribution is also a key consideration. As shown in Table 2, the Malignant class is the most populous, followed by Normal and then Benign cases. This imbalance is a common characteristic of medical imaging datasets and must be addressed during model training and evaluation to prevent algorithmic bias towards the majority class.
The IQ-OTH/NCCD dataset, while a valuable resource, is subject to potential biases common in medical imaging datasets. These biases must be acknowledged as they impact the generalizability of trained models. A primary concern is the limited documentation regarding patient demographics (e.g., age, gender, ethnicity) and the variety of imaging equipment used. A homogeneity in these factors could limit model performance when applied to broader, more diverse populations or data from different scanners. Furthermore, as detailed in Table 2, the dataset exhibits a pronounced class imbalance, characterized by a substantial overrepresentation of Malignant instances relative to Benign cases. This disparity introduces a significant risk of algorithmic bias, predisposing the model to overfit the prevalent majority class and consequently impairing its predictive accuracy for the critically important minority class, a scenario that is untenable in medical diagnostics, where equitable performance across all pathologies is mandatory. To counteract this bias and fortify model generalizability, we implemented a dual strategy: comprehensive data augmentation to increase phenotypic diversity, and the Synthetic Minority Over-sampling Technique (SMOTE) to strategically oversample the Benign class. Furthermore, model validation was conducted on external cohorts to ensure robustness and clinical applicability across diverse patient demographics.
Furthermore, the proposed SE-FusionEAOO framework was additionally evaluated on the publicly available Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) dataset55. This dataset is widely adopted for developing and benchmarking computer-aided diagnosis (CAD) systems for pulmonary nodule detection and classification. It consists of thoracic CT scans from 1,018 subjects, each annotated by four experienced radiologists who provided nodule-level assessments for lesions with diameters of at least 3 mm. Each annotation includes clinically relevant features such as malignancy likelihood, texture, sphericity, and subtlety, which are strongly correlated with cancer risk. For this study, we excluded nine CT cases exhibiting inconsistent slice spacing and 121 cases with slice thicknesses exceeding 2.5 mm, resulting in a final cohort of 888 valid scans. Nodules were categorized as benign or malignant based on consensus malignancy ratings among the radiologists. The retained scans had slice thicknesses ranging from 0.6 mm to 2.5 mm and in-plane resolutions between 0.48 mm and 0.72 mm.
Importantly, the LIDC-IDRI dataset was used for external validation to ensure true out-of-distribution generalization, as model training and optimization were performed exclusively on the IQ-OTH/NCCD dataset.
Data preprocessing
The preprocessing pipeline used in the proposed framework is shown in Fig. 3.
Normalization serves as a vital preprocessing step, primarily for scaling images and standardizing pixel intensities. This standardization promotes stable and efficient model convergence during training. Initially, each input image undergoes intensity normalization, rescaled to a [0, 1] range using min-max normalization, as defined by the following equation56:

$I_{\text{norm}} = \frac{I - I_{\min}}{I_{\max} - I_{\min}}$

where $I$ denotes the input image with pixel values bounded by $I_{\min}$ and $I_{\max}$, and $I_{\text{norm}}$ represents the normalized image with values scaled to the range $[0, 1]$. Subsequently, each image in the dataset was resized to a common fixed spatial resolution to ensure dimensional consistency across all samples.
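As a concrete illustration, the min-max rescaling above can be sketched in a few lines of plain Python. This is a toy sketch operating on a flat list of pixel intensities, not the authors' actual pipeline:

```python
def min_max_normalize(pixels):
    """Rescale intensities to [0, 1]: I_norm = (I - I_min) / (I_max - I_min)."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:                       # constant image: map everything to 0
        return [0.0 for _ in pixels]
    return [(p - lo) / (hi - lo) for p in pixels]

row = min_max_normalize([0, 64, 128, 255])   # first value 0.0, last value 1.0
```

In a real pipeline the same formula is applied per image (often simply dividing 8-bit intensities by 255), followed by resizing to the network input size.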
Data augmentation
To enhance model robustness, mitigate overfitting, and ensure generalization to unseen data, a comprehensive suite of data augmentation techniques was employed. These transformations were carefully selected to mimic real-world variations inherent in medical imaging. Furthermore, the pronounced class imbalance within the dataset was addressed using synthetic oversampling. The augmentation pipeline included the following transformations, applied randomly during training:
Rotation: Images were randomly rotated by angles between −10 and +10 degrees to simulate different anatomical orientations encountered during scanning.
Translation: Images were shifted horizontally and vertically by up to a small fraction of their dimensions to account for variations in lung positioning within the scanner’s field of view.
Horizontal Flipping: Mirror images were generated to introduce natural variability in image presentation, a common occurrence in clinical settings.
Gaussian Blurring: A Gaussian kernel was applied with an automatically calculated standard deviation to simulate slight focus variations and reduce image noise, thereby conditioning the model to handle practical diagnostic challenges without distorting critical lung tissue details.
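Two of the transformations above, the horizontal flip and the translation, can be illustrated with a minimal dependency-free sketch. The ~10% shift bound and zero padding are assumed illustrative values, not parameters reported by the authors; rotation and blurring are omitted for brevity:

```python
import random

def augment(img, rng):
    """Randomly flip a 2D image (list of rows) and apply a small
    zero-padded shift, mimicking flip/translation augmentation."""
    h, w = len(img), len(img[0])
    if rng.random() < 0.5:
        img = [row[::-1] for row in img]      # horizontal flip
    dy = rng.randint(-h // 10, h // 10)       # vertical shift (assumed ~10%)
    dx = rng.randint(-w // 10, w // 10)       # horizontal shift (assumed ~10%)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:   # pixels shifted out are zeroed
                out[y][x] = img[sy][sx]
    return out
```

In practice such transformations are usually delegated to a framework's augmentation utilities and applied on the fly during training.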
Handling class imbalance
Initial data analysis revealed a significant class imbalance, with ’Malignant’ cases substantially outnumbering ’Benign’ and ’Normal’ cases (Table 3). To prevent model bias towards the majority class, SMOTE was employed. SMOTE synthesizes new examples in the feature space of the minority classes by interpolating between existing instances, effectively balancing the class distribution and enriching the dataset with diverse examples of underrepresented classes.
This balanced distribution, achieved through SMOTE and visualized in Fig. 4, ensures all classes have equal representation during training. Preventing bias towards the majority class (‘Malignant’) forces the model to learn discriminative features across all categories (Benign, Malignant, and Normal) with improved accuracy. Consequently, this strategy significantly enhances the model’s overall diagnostic reliability and generalizability across all case types.
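The core interpolation step of SMOTE can be sketched as follows. This shows only the synthesis of one sample, assuming a minority instance and one of its nearest neighbours have already been found; production code would typically use a library such as imbalanced-learn, which also handles the neighbour search:

```python
import random

def smote_sample(x, neighbor, rng):
    """Core SMOTE step: place a synthetic point on the line segment
    between a minority instance x and a nearest neighbour."""
    r = rng.random()                          # interpolation factor in [0, 1)
    return [a + r * (b - a) for a, b in zip(x, neighbor)]
```

Because every coordinate of the synthetic sample lies between the two parents, the new example stays inside the local minority-class region of feature space.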
Squeeze-and-excitation blocks
To enhance the feature representation and discriminative power of our CNN, we integrated SE blocks, a foundational architectural unit introduced by57. These blocks act as a self-attention mechanism that dynamically recalibrates channel-wise feature responses, enabling the network to autonomously emphasize informative features and suppress less relevant or redundant ones. The SE block operates through a sequential two-phase process: squeeze and excitation. The squeeze phase employs global average pooling to aggregate spatial information from each feature map. This operation transforms a feature map tensor $U \in \mathbb{R}^{H \times W \times C}$ into a channel-wise descriptor vector $z \in \mathbb{R}^{C}$, effectively capturing the global distribution of responses for each channel:

$z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)$

Subsequently, the excitation phase processes this descriptor using a simple multi-layer perceptron (MLP), typically comprising two fully connected layers, to model nonlinear interactions and dependencies between channels. This MLP acts as a gating mechanism, generating a vector of adaptive scaling weights $s$ where each value is between 0 and 1:

$s = \sigma\left(W_2 \, \delta(W_1 z)\right)$

where $W_1$ and $W_2$ are the learned weights of the MLP, $\delta$ denotes the ReLU activation function, and $\sigma$ is the sigmoid activation function that normalizes the output. The final output of the SE block is produced by rescaling the original feature map channel-wise with the learned weights: $\tilde{x}_c = s_c \cdot u_c$.
To clarify the technical integration: SE blocks are inserted post-feature extraction in each fusion pair, dynamically modeling channel interdependencies to prioritize malignancy-indicative patterns (e.g., nodule texture) while suppressing noise, as formalized in Eqs. (6)-(7). This enhances both accuracy and interpretability by revealing the importance of features.
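The squeeze-excitation-rescale sequence can be made concrete with a small forward-pass sketch in plain Python (toy weights and list-based tensors; a real implementation would use a framework's pooling and dense layers):

```python
import math

def se_block(feature_maps, W1, W2):
    """SE forward pass on C feature maps (each an H x W grid).
    W1 (C x C/r) and W2 (C/r x C) are the two FC layers of the gating MLP."""
    C = len(feature_maps)
    # Squeeze: global average pooling -> one descriptor value per channel
    z = [sum(sum(row) for row in fm) / (len(fm) * len(fm[0]))
         for fm in feature_maps]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives one gate per channel
    hidden = [max(0.0, sum(z[c] * W1[c][k] for c in range(C)))
              for k in range(len(W1[0]))]
    gates = [1.0 / (1.0 + math.exp(-sum(hidden[k] * W2[k][c]
                                        for k in range(len(hidden)))))
             for c in range(C)]
    # Rescale: multiply every pixel of channel c by its learned gate s_c
    scaled = [[[v * gates[c] for v in row] for row in feature_maps[c]]
              for c in range(C)]
    return scaled, gates
```

Inspecting the returned gates is exactly what the interpretability argument above relies on: channels with gates near 1 are the ones the block deems diagnostically informative.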
Rationale and integration of SE for lung cancer detection in feature fusion
In the specific context of lung cancer detection from CT scans, this adaptive feature recalibration mechanism is critically advantageous. The network learns to assign higher weights to feature channels that encode semantically salient patterns indicative of malignant nodules, such as spiculation, lobulation, or ground-glass opacity, while attenuating channels associated with normal parenchyma or benign structures. This selective emphasis not only bolsters classification accuracy by focusing on the most discriminative evidence but also inherently improves the model’s interpretability. By examining the excitation weights, one can gain insights into which visual features the model deems most significant for its diagnosis, thereby fostering greater clinical trust and verification. This capability is further amplified by the strategic integration of SE blocks within our feature fusion architecture. Rather than merely concatenating features from different backbone networks, the SE mechanism actively recalibrates the combined feature maps post-fusion. This induces a powerful complementary effect, prioritizing and amplifying the most salient diagnostic features from each network while suppressing redundant or contradictory information. Consequently, the final fused representation is not only comprehensive but also optimally attuned to the specific task of lung nodule classification.
Deep feature extraction via transfer learning
In the domain of medical image analysis, particularly for lung cancer detection, the scarcity of large, annotated datasets presents a significant challenge. Training CNNs from scratch under such constraints is computationally prohibitive and highly susceptible to overfitting, as it demands substantial data volumes to achieve optimal performance58. To overcome this fundamental limitation, transfer learning emerges as a pivotal strategy. This approach enables us to leverage the rich, hierarchical feature representations learned by state-of-the-art CNN models pre-trained on large-scale datasets (e.g., ImageNet)59. Rather than learning from random initialization, we adapt these powerful feature extractors to our specific task of lung nodule classification. This paradigm leverages generalized visual knowledge (e.g., edge detection, texture patterns, shape recognition) acquired from natural images, which proves highly transferable to medical imaging domains.
In this study, we meticulously evaluated a diverse suite of modern architectures renowned for their efficacy in image classification. Through rigorous empirical analysis on our target dataset, we identified the six top-performing models to serve as the foundational feature extractors for our ensemble framework. The selected models are: EfficientNetB6, Inception v3, ResNet50, DenseNet201, DenseNet121, and MobileNetV2. This selection ensures a blend of architectural diversity covering residual connections, dense connectivity, compound scaling, and inverted residuals, which is crucial for capturing a comprehensive spectrum of discriminative features from lung CT scans. By initializing our framework with these pre-trained weights and subsequently optimizing them on the lung cancer dataset, we effectively harness the power of transfer learning.
EfficientNetB6-based model implementation
EfficientNet-B6 is recognized for delivering strong classification accuracy while maintaining computational efficiency, owing to its innovative compound scaling technique. This strategy uniformly scales the network’s depth, width, and input resolution, resulting in balanced performance improvements without unnecessary computational overhead. At the heart of the architecture is the Mobile Inverted Bottleneck Convolution (MBConv) block, which integrates SE mechanisms to adaptively recalibrate channel-wise feature responses. This feature refinement enables the model to better emphasize relevant patterns and suppress less critical information, enhancing its effectiveness in detecting nuanced features within medical images such as lung CT scans60.
EfficientNet’s robustness persists even when operating on lower-resolution images, making it a strong candidate for applications involving variable image quality. The architecture’s scalability is guided by a set of compound coefficients: a global scaling factor $\phi$, and constants $\alpha$, $\beta$, and $\gamma$, which define how depth, width, and resolution are scaled from the baseline configuration as $d = \alpha^{\phi}$, $w = \beta^{\phi}$, and $r = \gamma^{\phi}$, respectively. These relationships are mathematically described in Eqs. (8), (9), and (10), ensuring a systematic approach to model scaling.
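The compound scaling rule is simple enough to compute directly. The sketch below uses the constants published for the EfficientNet family ($\alpha = 1.2$, $\beta = 1.1$, $\gamma = 1.15$, chosen so that $\alpha \cdot \beta^2 \cdot \gamma^2 \approx 2$) purely for illustration:

```python
def compound_scale(alpha, beta, gamma, phi, d0=1.0, w0=1.0, r0=1.0):
    """Compound scaling: depth, width and resolution multipliers all grow
    together as powers of a single coefficient phi."""
    depth = alpha ** phi * d0
    width = beta ** phi * w0
    resolution = gamma ** phi * r0
    return depth, width, resolution

# One step of compound scaling with the published EfficientNet constants
d, w, r = compound_scale(1.2, 1.1, 1.15, phi=1)
```

Raising $\phi$ therefore roughly doubles the FLOPs per unit increase, since depth scales cost linearly while width and resolution scale it quadratically.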
Inception v3 based model implementation
The Inception V3 model, developed by Google, represents an advanced version of the original Inception network. It was designed to increase classification accuracy while maintaining computational efficiency. The core idea behind the Inception architecture is to approximate an optimal sparse structure by transforming it into a dense one, where similar non-uniform sparse elements are clustered together. This approach enables the formation of deeper and wider networks without significantly increasing computational costs, as described in61. The architectural design of Inception V3 is based on several guiding principles:
Locality of information: Many signals in the input space tend to be spatially correlated. This allows for the use of smaller convolution filters to capture meaningful patterns. By applying dimension reduction (e.g., using 1×1 convolutions) before expensive operations, computational efficiency can be improved without significant information loss.
Balanced network expansion: To effectively utilize available computational resources, both the depth (number of layers) and width (number of filters) of the network should be increased in tandem. This enables the model to learn richer and more diverse feature representations.
Gradual dimensional reduction: Aggressive reduction in feature map dimensions at early stages of the network is discouraged, as it may lead to the loss of critical information. Instead, gradual downsampling is preferred to preserve spatial details during the early learning phase.
Wide layer efficiency: Wider layers tend to learn faster and are especially beneficial at deeper levels of the network, where abstract feature representations are formed.
ResNet50 based model implementation
ResNet5062, a popular and deeper variant of ResNet, comprises 50 layers and was initially trained on the ImageNet dataset, which includes over a million labeled images. Its architecture involves batch normalization layers, non-linearities, and a combination of identity and convolutional residual blocks. ResNet50 is frequently paired with other architectures, such as Highway Networks, which learn skip connections through additional weight matrices.
The network begins with a conv1 layer consisting of 64 filters using a 7×7 kernel, followed by a max-pooling operation with a stride of 2. The subsequent layers conv2 through conv5 are composed of residual blocks. Each block includes three convolutional layers: two 1×1 convolutions (the first for dimensionality reduction, the second for restoration) and a 3×3 convolution sandwiched between them, as described in Eq. 11. This structure is iteratively applied within each residual block, with the final convolutional stage feeding into an average pooling layer and a fully connected layer. Classification is performed using a Softmax activation function.
DenseNet-121 based model implementation
The DenseNet-121 architecture63 is a key advancement in deep learning for computer vision, designed to enhance feature reuse and mitigate the vanishing gradient problem in very deep networks. Its hallmark is the dense connectivity pattern, where each layer receives feature maps from all preceding layers and passes its outputs to all subsequent ones. This design provides implicit deep supervision, strengthening feature propagation. The network is structured into multiple dense blocks, where each layer applies batch normalization, a Rectified Linear Unit (ReLU), and a 3×3 convolution, with outputs concatenated to preserve low-level details while building high-level representations. Between dense blocks, transition layers consisting of a 1×1 convolution for feature compression followed by average pooling regulate spatial dimensions and computational cost. Overall, DenseNet-121 achieves strong performance while maintaining parameter efficiency through its compact yet expressive design.
DenseNet-201 based model implementation
Building upon the foundational DenseNet architecture63, DenseNet-201 represents a deeper and more powerful variant designed to further enhance feature representation and gradient flow in very deep convolutional networks. While it shares the core principles of dense connectivity and feature reuse with DenseNet-121, its increased depth makes it particularly adept at capturing complex, hierarchical patterns in medical images. The architecture is structured around four dense blocks interconnected by transition layers. The key differentiator from DenseNet-121 is its greater depth, achieved by incorporating more layers within these dense blocks. This enables the network to learn a more comprehensive hierarchy of features, ranging from simple edges and textures in the early layers to complex morphological structures in the deeper layers.
The hallmark dense connectivity remains central to its design: each layer within a block receives concatenated feature maps from all preceding layers and passes its own outputs to all subsequent layers. This design ensures:Maximized Feature Reuse: Low-level features, such as tissue textures or nodule edges, are preserved and utilized throughout the network, improving representational efficiency.
Alleviated Vanishing Gradient: The direct connections provide shortened paths for gradient flow during backpropagation, ensuring stable training even at this significant depth.
Implicit Deep Supervision: The feature reuse acts as a form of regularization, reducing the risk of overfitting on limited medical datasets.
MobileNetV2 based model implementation
To incorporate a highly efficient architecture into our ensemble, we selected MobileNetV264. This model was specifically designed for mobile and embedded vision applications, offering an excellent trade-off between computational efficiency and accuracy. Its core innovation is the inverted residual with a linear bottleneck structure. Unlike traditional residual blocks that first compress and then expand feature maps, MobileNetV2’s inverted residual block first expands the channel dimension using a lightweight 1×1 (pointwise) convolution. This expansion is followed by a 3×3 depthwise convolution that filters the expanded features, and a final 1×1 convolution that projects the features back to a lower-dimensional representation. This design significantly reduces computational cost and model size while preserving critical information. A key feature of this architecture is the use of linear bottlenecks. Instead of applying a non-linear activation function (such as ReLU) after the final projection layer, a linear activation function is used. This prevents non-linearities from destroying important information in the low-dimensional space, leading to more stable and representative features.
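The efficiency argument can be made concrete by counting weights. The sketch below tallies the three stages of an inverted residual (expansion factor $t = 6$ as in the MobileNetV2 paper; biases and batch-norm parameters omitted) and contrasts the depthwise filtering step with what a full convolution at the expanded width would cost:

```python
def inverted_residual_params(c_in, c_out, k=3, t=6):
    """Weight count for a MobileNetV2 inverted residual block:
    1x1 expansion -> k x k depthwise filtering -> 1x1 linear projection."""
    c_mid = t * c_in
    expand = c_in * c_mid        # 1x1 pointwise expansion
    depthwise = k * k * c_mid    # one k x k filter per expanded channel
    project = c_mid * c_out      # 1x1 linear bottleneck projection
    return expand + depthwise + project

# A full k x k conv at the expanded width would need k*k*c_mid*c_mid weights;
# the depthwise step needs only k*k*c_mid, i.e. c_mid times fewer.
total = inverted_residual_params(64, 64)
```

The saving comes precisely from filtering each expanded channel independently rather than mixing all channels in the spatial convolution.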
Proposed enhanced animated oat optimization algorithm with genetic operators (EAOO-GA)
While the standard Animated Oat Optimization (AOO) algorithm demonstrates innovative inspiration from natural phenomena, our empirical analysis reveals certain limitations that can hinder its performance on complex optimization problems, such as those encountered in medical image analysis. Primarily, the algorithm can occasionally exhibit a tendency towards premature convergence, where the population loses diversity too rapidly and becomes trapped in local optima. This is often attributed to its strong exploitation bias in the later stages, without a sufficient mechanism to reintroduce lost diversity or to escape suboptimal regions once discovered. To mitigate these limitations and enhance the algorithm’s robustness and global search capability, we propose an Enhanced Animated Oat Optimization (EAOO-GA) algorithm. The core innovation lies in the strategic integration of genetic operators, including crossover and mutation, into the standard AOO workflow. This hybrid approach leverages the strength of AOO’s bio-inspired exploration and exploitation while bolstering its population diversity through well-established evolutionary mechanisms.
Standard AOO algorithm
The AOO is a new metaheuristic algorithm that emulates three specific behaviors observed in the natural conduct of animated oats within their environment65:
Initial seed displacement through environmental forces, including wind, water currents, and animal-mediated transport.
Moisture-induced morphological changes in the seed’s awn structure generate mechanical forces that produce rolling motion across terrestrial surfaces.
Kinetic energy accumulation during locomotion phases, with obstacle interactions triggering mechanical energy release mechanisms for enhanced dispersal range.
Initialization AOO initializes its population by generating a set of random solutions, as formalized in Eq. 12. The position of each individual in the subgroup is denoted by $x_i$. The subpopulation size and the problem’s dimension are given by N and Dim, respectively. Each element $x_{i,j}$ of the position matrix is computed using Eq. 13:

$x_{i,j} = lb_j + r \times (ub_j - lb_j)$

where r is a random number between 0 and 1, and $lb_j$ and $ub_j$ denote the lower and upper bounds of the j-th dimension.
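The initialization step can be sketched directly from the formula above (a generic bound-constrained random initialization; the notation `lb`/`ub` for the search bounds is an assumption, as the original symbols were lost in extraction):

```python
import random

def init_population(n, dim, lb, ub, rng):
    """Random initialization: x_{i,j} = lb_j + r * (ub_j - lb_j), r ~ U(0,1)."""
    return [[lb[j] + rng.random() * (ub[j] - lb[j]) for j in range(dim)]
            for _ in range(n)]
```

Every individual starts uniformly distributed inside the feasible box, which is the standard starting point for population-based metaheuristics.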
Parameter Calculation The dynamic behavior of the animated oat during dispersal is characterized by three key biomechanical parameters: seed mass, eccentric rolling coefficient, and primary awn length. These parameters are computed algorithmically as functions of the population state and the iteration count, where m is the seed mass, N is the population size, L is the primary awn length, e is the eccentric rotation coefficient, t is the current iteration, and c is a dynamic adjustment factor.
Exploration Phase Following abscission, the dispersal of animated oat seeds is primarily driven by environmental vectors such as wind, water, or fauna. The stochastic nature of this process facilitates extensive exploration of the search space. The corresponding position update moves each individual stochastically with respect to the population’s best solution, where $x_i^t$ and $x_{best}^t$ denote the positions of the i-th individual and the best individual in the population at iteration t, respectively.
Exploitation Phase The remaining seeds are partitioned into two subsets based on the presence of obstacles during dispersal, assuming both outcomes are equiprobable. In the absence of obstructions, seeds undergo hygroscopic rolling driven by moisture-induced stress gradients. This motion is modeled using curvature-induced snap buckling, inspired by the work of Lindtner et al.66, which demonstrated that anisotropic swelling is governed by cellulose microfiber orientation. The rolling dynamics are mathematically represented through torque equations and eccentric rotation, where $R$ is a random matrix of size $1 \times Dim$ with entries uniformly distributed in $[0, 1]$. The mean value $\mu$ in the Lévy flight is a random number between 0 and 1, typically used to adjust the step size. This stochasticity helps regulate the direction and distance of movement during flight. The scale parameter $\sigma$ controls the width of the step length distribution, defining the range of possible step sizes. The current velocity vector $v$ represents the motion state of the particle or individual. The stability parameter $\beta$ governs the shape of the distribution, influencing the diversity and randomness of step lengths. The gamma function $\Gamma$ generalizes the factorial function to continuous values. For Lévy flights, $\beta$ is typically set to 1.5. For seeds encountering obstacles, the position update is governed by launch mechanics, where $\theta$ denotes the launch angle relative to the ground, $c_d$ is the air drag coefficient governing aerial motion, $r$ is a uniformly sampled random value, k quantifies the elasticity of the primary awn, and $\Delta x$ represents the change in awn length due to elastic energy storage prior to release.
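The Lévy-flight step referenced above is commonly generated with Mantegna's algorithm, sketched below with $\beta = 1.5$. This is a textbook construction and not necessarily the authors' exact implementation:

```python
import math
import random

def levy_step(beta, rng):
    """One Lévy-distributed step via Mantegna's algorithm: draw u ~ N(0, sigma^2)
    and v ~ N(0, 1), then return u / |v|^(1/beta). The gamma function sets the
    scale sigma so the heavy-tailed step distribution has stability beta."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma = (num / den) ** (1 / beta)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)
```

The heavy tail means most steps are small (local refinement) while occasional large jumps help the search escape local optima.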
EAOO-GA algorithm
To address the limitations of the standard AOO, we introduce two genetic operators applied after the core AOO update procedures. These operators serve as a diversity-preserving mechanism, enabling the algorithm to avoid premature convergence and explore the search space more effectively.
Crossover Operator We employ a differential crossover mechanism to facilitate the exchange of information between individuals. For each target individual $x_i$ in the population, a trial vector $u_i$ is generated by combining components from the target vector and a mutant vector $v_i$ created from three distinct randomly selected individuals:

$v_i = x_{r_1} + F \cdot (x_{r_2} - x_{r_3})$

$u_{i,j} = \begin{cases} v_{i,j}, & \text{if } rand_j \le CR \text{ or } j = j_{rand} \\ x_{i,j}, & \text{otherwise} \end{cases}$

where $x_{r_1}$, $x_{r_2}$, and $x_{r_3}$ are distinct random individuals, F is a scaling factor, CR is the crossover rate, and $j_{rand}$ ensures at least one parameter is inherited from the mutant vector.
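This is the classic DE/rand/1/bin scheme, and it can be sketched compactly (a generic implementation of the described operator, not the authors' code):

```python
import random

def de_trial(x, x1, x2, x3, F, CR, rng):
    """DE/rand/1/bin: mutant v = x1 + F*(x2 - x3), then binomial crossover
    with rate CR; index j_rand guarantees at least one mutant gene survives."""
    dim = len(x)
    j_rand = rng.randrange(dim)
    v = [x1[j] + F * (x2[j] - x3[j]) for j in range(dim)]
    return [v[j] if (rng.random() < CR or j == j_rand) else x[j]
            for j in range(dim)]
```

With `CR` close to 1 the trial vector inherits almost everything from the mutant, maximizing information exchange; lower values keep more of the target individual.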
Mutation Operator A non-uniform mutation strategy is applied to introduce random perturbations, with decreasing magnitude over iterations to transition from exploration to exploitation:

$x'_{i,j} = \begin{cases} x_{i,j} + \Delta(t,\, ub_j - x_{i,j}), & \text{if } rand \le 0.5 \\ x_{i,j} - \Delta(t,\, x_{i,j} - lb_j), & \text{otherwise} \end{cases}$

where $\Delta(t, y)$ calculates the mutation step size that decreases over time:

$\Delta(t, y) = y \left(1 - r^{(1 - t/T)^b}\right)$

Here, r is a random number in [0, 1], T is the maximum number of iterations, and b determines the non-uniformity degree.
Selection and Integration The genetic operators are applied after AOO’s exploitation phase. A greedy selection mechanism determines whether newly generated solutions replace existing ones: a trial solution is carried into the next generation only if it attains a better fitness value than its parent; otherwise, the parent is retained. This hybrid approach maintains AOO’s bio-inspired search capabilities while enhancing its global optimization performance through evolutionary mechanisms. The pseudocode of EAOO-GA is provided in Algorithm 1.
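The overall hybrid loop can be illustrated end to end on a toy objective. This is a highly simplified sketch: the paper's biomechanical AOO update rules are replaced by a generic best-guided move, and all parameter values are assumed for illustration:

```python
import random

def sphere(x):
    """Toy objective: minimize the sum of squares."""
    return sum(v * v for v in x)

def eaoo_ga_sketch(n=20, dim=5, iters=100, seed=0):
    """Simplified EAOO-GA-style loop: best-guided move (stand-in for the AOO
    phases), DE crossover, non-uniform mutation, and greedy replacement."""
    rng = random.Random(seed)
    lb, ub, F, CR, b = -5.0, 5.0, 0.5, 0.9, 3.0
    pop = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n)]
    fit = [sphere(p) for p in pop]
    for t in range(iters):
        best = list(pop[min(range(n), key=fit.__getitem__)])
        for i in range(n):
            # best-guided move (placeholder for AOO exploration/exploitation)
            cand = [pop[i][j] + rng.random() * (best[j] - pop[i][j])
                    for j in range(dim)]
            # DE/rand/1/bin crossover (Crossover Operator)
            r1, r2, r3 = rng.sample(range(n), 3)
            j_rand = rng.randrange(dim)
            for j in range(dim):
                if rng.random() < CR or j == j_rand:
                    cand[j] = pop[r1][j] + F * (pop[r2][j] - pop[r3][j])
            # non-uniform mutation: step size shrinks as t -> iters
            j = rng.randrange(dim)
            shrink = 1 - rng.random() ** ((1 - t / iters) ** b)
            if rng.random() < 0.5:
                cand[j] += (ub - cand[j]) * shrink
            else:
                cand[j] -= (cand[j] - lb) * shrink
            cand = [min(ub, max(lb, v)) for v in cand]
            # greedy selection: keep the better of parent and trial
            f = sphere(cand)
            if f < fit[i]:
                pop[i], fit[i] = cand, f
    return min(fit)
```

Because replacement is greedy, the best fitness is monotonically non-increasing, while the crossover and mutation operators keep injecting diversity so the population does not collapse prematurely.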
The feature fusion framework
In this study, we propose an optimized ensemble framework that extends beyond traditional model aggregation. Rather than merely combining outputs from individual models, we employ feature fusion architectures that integrate complementary representations from multiple deep learning backbones to enhance accuracy. Three tailored fusion architectures (Fusion 1, Fusion 2, and Fusion 3) were developed, each designed to exploit the distinctive strengths of different pre-trained models.
The selection of models for these architectures followed a systematic process designed to maximize both diversity and efficiency. To prevent redundancy and capture a broader range of feature representations, models from the same architectural family were not combined. Instead, our fusion strategy paired networks of differing capacities and inductive biases, specifically, coupling a high-capacity model with a more lightweight counterpart to balance diagnostic accuracy with computational cost. The resulting three fusion pairs are as follows:
Fusion 1: DenseNet201 + EfficientNetB6
Fusion 2: Inception v3 + MobileNetV2
Fusion 3: DenseNet121 + ResNet50
The feature extraction process begins by propagating input images through the pre-trained models within each fusion pair. The resulting feature maps are then concatenated, integrating the distinct hierarchical patterns learned by each network. This systematic selection process yielded an initial pool of eight pre-trained models. Through rigorous evaluation, the six top-performing models were identified and strategically paired to form the three distinct fusion architectures, as illustrated in the model selection pipeline (Fig. 5). To strengthen the discriminative capacity of the fused representations, we incorporate an SE block. This mechanism adaptively recalibrates channel-wise responses, enabling the network to highlight informative features, such as texture and morphological patterns associated with malignancy, while suppressing irrelevant ones. Such dynamic feature emphasis is especially vital in lung cancer detection, where subtle yet critical visual cues must be accurately identified.
The recalibrated features from the SE block are subsequently subjected to Global Average Pooling, which reduces dimensionality while preserving salient feature information and enhancing robustness to spatial translations. A Dropout layer is strategically inserted following this operation to serve as a regularization mechanism, effectively mitigating overfitting by preventing complex co-adaptations of features during training. The network culminates in a softmax activation layer, which generates the final probability distribution across the target diagnostic classes for each fusion pathway. The block diagram of the feature fusion architectures is shown in Fig. 6. This feature fusion methodology is fundamentally motivated by the need to compensate for the performance variability and inherent limitations of individual models. By harnessing complementary representations from multiple architectural families, our ensemble framework achieves synergistic improvements in predictive accuracy, generalization capability, and operational robustness for pulmonary nodule classification. The integrated approach provides three distinct advantages: (1) superior classification performance through multi-perspective feature integration, (2) expanded representational capacity for capturing heterogeneous pathological patterns, and (3) inherent interpretability facilitated by the SE block’s channel-wise attention mechanism, which collectively contribute to more reliable and clinically actionable diagnostic support.
Proposed SE-FusionEAOO ensemble framework for lung cancer detection
In this section, we present a novel hybrid optimized ensemble framework tailored for accurate and reliable lung cancer detection from CT scans. Ensemble learning is particularly advantageous in medical diagnostics, as it enhances predictive accuracy and robustness by combining the strengths of multiple diverse models67. Such reliability is critical in lung cancer detection, where diagnostic errors may have serious consequences. The proposed framework is designed to achieve superior classification performance through two stages: (i) constructing three SE-enhanced fusion models as strong base learners, and (ii) aggregating their outputs using an optimized weighting strategy. The distinguishing feature of our approach lies in employing the Enhanced Animated Oat Optimization algorithm (EAOO-GA), described in Section 3.4, to fine-tune the ensemble weights. This metaheuristic efficiently explores the global search space to identify the optimal weight configuration, ensuring that the most informative models contribute more prominently to the final decision. As a result, the system achieves higher reliability and diagnostic precision. The overall architecture of the proposed ensemble is depicted in Fig. 1.
Ensemble construction and optimization process
The process for building and optimizing our ensemble is methodically outlined below and summarized in Algorithm 2.
Step 1: Base Model Training and Feature Fusion Three separate fusion models (Fusion 1, Fusion 2, Fusion 3) are constructed by pairing different pre-trained architectures (DenseNet201+EfficientNetB6, Inception v3+MobileNetV2, DenseNet121+ResNet50). Each pair is integrated using concatenation and enhanced with SE blocks for adaptive feature recalibration, as described in Section 3.2. These models are trained on the lung cancer dataset to serve as expert feature extractors and classifiers.
Step 2: Prediction Generation Each of the three trained fusion models is used to generate prediction vectors (e.g., class probabilities for ’Benign’, ’Malignant’, ’Normal’) on the validation or test set. Let $P_1$, $P_2$, and $P_3$ represent the prediction matrices from Fusion 1, Fusion 2, and Fusion 3, respectively.
Step 3: Fitness Function Definition. The core objective of the EAOO-GA algorithm is to find the optimal weight vector $\mathbf{w} = (w_1, w_2, w_3)$ that maximizes the accuracy of the weighted ensemble prediction. The fitness function is defined as:

$$F(\mathbf{w}) = \mathrm{Accuracy}\!\left(\arg\max_{c}\,(w_1 P_1 + w_2 P_2 + w_3 P_3),\; \mathbf{y}\right)$$

where $\mathbf{y}$ are the true labels, $w_i \ge 0$, and $\sum_{i=1}^{3} w_i = 1$.
Step 4: Weight Optimization via EAOO-GA. The proposed EAOO-GA algorithm is deployed to solve this optimization problem. Its population consists of candidate weight vectors, which evolve over generations through the algorithm's operations (exploration, exploitation, crossover, mutation) to maximize the fitness function, ultimately converging on the optimal set of weights $\mathbf{w}^{*}$.
Step 5: Final Ensemble Prediction. Once optimized, the final prediction for a new input image is obtained as a weighted average of the base model predictions using the optimal weights $\mathbf{w}^{*}$:

$$P_{\mathrm{final}} = w_1^{*} P_1 + w_2^{*} P_2 + w_3^{*} P_3$$

The class with the highest probability in $P_{\mathrm{final}}$ is selected as the final diagnosis.
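The steps above can be sketched in NumPy. This is an illustrative toy example, not the authors' implementation; the probability matrices and weights are invented, and in the real pipeline the weights would come from the EAOO-GA search:

```python
import numpy as np

def fitness(weights, preds, y_true):
    """Fitness of a candidate weight vector (Step 3): accuracy of the
    weighted ensemble prediction against the true labels."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                   # enforce sum-to-one constraint
    combined = sum(wi * p for wi, p in zip(w, preds))
    return float(np.mean(np.argmax(combined, axis=1) == y_true))

# Toy probability matrices from three fusion models (4 samples, 3 classes)
P1 = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.3, 0.5], [0.6, 0.3, 0.1]])
P2 = np.array([[0.6, 0.3, 0.1], [0.2, 0.6, 0.2], [0.1, 0.2, 0.7], [0.3, 0.5, 0.2]])
P3 = np.array([[0.8, 0.1, 0.1], [0.3, 0.5, 0.2], [0.2, 0.2, 0.6], [0.4, 0.4, 0.2]])
y  = np.array([0, 1, 2, 0])

acc = fitness([0.4, 0.3, 0.3], [P1, P2, P3], y)

# Step 5: final diagnosis with (here hand-picked) weights standing in for w*
P_final   = 0.4 * P1 + 0.3 * P2 + 0.3 * P3
diagnosis = np.argmax(P_final, axis=1)
```

In the full framework, `fitness` is the objective evaluated for every candidate weight vector in the EAOO-GA population.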
Our EAOO-optimized ensemble framework provides a dynamic and highly accurate solution for lung nodule classification. By synergistically combining the representational power of multiple deep fusion models with the global optimization capability of EAOO-GA, we achieve a system that is not only more accurate than its individual components but also inherently robust and reliable. This approach underscores the significant potential of integrating advanced meta-heuristic optimization with deep ensemble learning to tackle critical challenges in medical image analysis.
Integration of explainable AI (XAI)
The term “explainable artificial intelligence” (XAI) refers to a range of methods designed to elucidate and illustrate how complex AI models make decisions. In this paper, we employ the Gradient-weighted Class Activation Mapping (Grad-CAM) technique to enhance the interpretability of our proposed lung cancer classification framework. Grad-CAM generates class-specific heatmaps that visually highlight the most influential regions in input images, enabling clinicians to verify the model’s focus and improving transparency in the diagnostic process68.
Grad-CAM: This technique generates a local visual explanation by utilizing the target-class gradients flowing into the last convolutional layer, creating an approximate localization map at the end of the prediction stage, as described in Eq. (33):

$$L_{\mathrm{Grad\text{-}CAM}}^{c} = \mathrm{ReLU}\!\left(\sum_{k} \alpha_{k}^{c}\, A^{k}\right)$$

In this expression, $L_{\mathrm{Grad\text{-}CAM}}^{c}$ denotes the class activation map associated with category $c$. The $\mathrm{ReLU}$ operator ensures that only positively contributing features are retained by zeroing out negative values. $A^{k}$ refers to the activation output from the $k$-th channel of the final convolutional layer, while $\alpha_{k}^{c}$ signifies the importance score of this channel for the target class $c$. This score is determined by computing the spatial average of the gradient of the class score with respect to the feature map:

$$\alpha_{k}^{c} = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^{c}}{\partial A_{ij}^{k}}$$

Here, $y^{c}$ is the network’s raw output (logit) for class $c$, and $A_{ij}^{k}$ represents the activation value at position $(i, j)$ in the $k$-th feature map. The denominator $Z$ corresponds to the total number of spatial locations in the feature map and acts as a normalization factor. The resulting heatmap visually emphasizes the image regions most influential in the network’s decision-making process for class $c$.
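Given the last-layer activations and the class-score gradients (assumed precomputed here, e.g. by a framework's automatic differentiation), the map reduces to a few array operations. A minimal NumPy sketch with synthetic tensors:

```python
import numpy as np

def grad_cam(activations, gradients):
    """activations, gradients: (H, W, K) arrays for one image and one class.
    Returns the (H, W) Grad-CAM localization map of Eq. (33)."""
    # alpha_k^c: spatial average of the gradients (global average pooling over i, j)
    alpha = gradients.mean(axis=(0, 1))                       # shape (K,)
    # Weighted sum of feature maps; ReLU keeps only positive evidence
    cam = np.tensordot(activations, alpha, axes=([2], [0]))   # shape (H, W)
    return np.maximum(cam, 0.0)

rng = np.random.default_rng(0)
A = rng.random((7, 7, 16))        # synthetic last-layer activations A^k
G = rng.normal(size=(7, 7, 16))   # synthetic gradients d y^c / d A^k
heatmap = grad_cam(A, G)
```

In practice the heatmap is upsampled to the input resolution and overlaid on the CT slice, as in Fig. 9.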
Computational complexity
The proposed SE-FusionEAOO Ensemble framework involves several components, each contributing to the overall time complexity. Feature extraction from the pre-trained CNN pairs dominates during training and inference, with a complexity of $O(N \cdot K^2 \cdot C^2 \cdot H \cdot W)$ for the convolutional operations, where $N$ is the batch size, $C$ the number of channels, $H$/$W$ the spatial dimensions, and $K$ the kernel size. The SE blocks add a lightweight overhead of $O(C^2 / r)$ per block, with $r$ the reduction ratio (default 16), ensuring minimal impact. SMOTE for handling class imbalance has a preprocessing time complexity of $O(M^2 \cdot D)$, dominated by the nearest-neighbor search, where $D$ is the feature dimensionality and $M$ the number of minority samples. The EAOO-GA metaheuristic for weight optimization has a time complexity of $O(P \cdot G \cdot (E + F))$, where $P$ is the population size, $G$ the number of generations, $E$ the time for ensemble prediction on the dataset (proportional to the sum of the base model complexities), and $F$ the fitness computation. Overall, the framework’s time complexity is dominated by the CNN forward passes and the EAOO-GA iterations; nevertheless, it remains practical for medical imaging tasks thanks to transfer learning and parallel computation.
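As a concrete illustration of the $O(C^2/r)$ term, the weight count of one SE block can be computed directly. This is a sketch under the standard SE design (two fully connected layers, $C \to C/r \to C$); biases and implementation details are omitted:

```python
def se_block_params(C, r=16):
    """Approximate parameter count of one SE block: two fully connected
    layers (C -> C/r -> C), i.e. roughly 2*C^2/r weights, matching the
    O(C^2 / r) overhead stated above (biases ignored for simplicity)."""
    return (C * (C // r)) + ((C // r) * C)

# Overhead for a 1024-channel feature map with the default reduction ratio:
overhead = se_block_params(1024)   # 2 * 1024^2 / 16 weights
```

For high-channel fusion outputs this is on the order of 10^5 weights, negligible next to the tens of millions of parameters in the paired backbones.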
Mathematical formulation of the proposed framework
Let $x$ denote an input CT image and $f_i = \phi_i(x)$ represent the deep feature vector extracted from the $i$-th pretrained backbone ($i \in \{1, 2\}$). For each fusion branch, feature maps from two complementary networks are concatenated and refined through an SE block to obtain channel-wise recalibrated features:

$$F = s \odot U, \qquad U = \mathrm{Concat}(f_1, f_2)$$

where $\mathrm{Concat}(\cdot)$ denotes concatenation and $s$ represents the excitation function defined by:

$$s = \sigma\!\big(W_2\, \delta(W_1\, z)\big), \qquad z = \mathrm{GAP}(U)$$

with $\mathrm{GAP}(\cdot)$ being global average pooling, $\delta$ the ReLU activation, $\sigma$ the sigmoid gating, and $W_1$, $W_2$ learnable parameters. The ensemble output combines the softmax probabilities of $M$ fusion experts, each producing a class-score vector $p_m$:

$$\hat{y} = \sum_{m=1}^{M} w_m\, p_m$$

To determine the optimal contribution of each fusion model, the EAOO-GA algorithm minimizes the cross-entropy-based objective function:

$$\mathcal{L}(\mathbf{w}) = -\sum_{c} y_c \log \hat{y}_c, \qquad w_m \ge 0, \quad \sum_{m=1}^{M} w_m = 1$$

where $y_c$ is the ground-truth label indicator and $\hat{y}_c$ is the predicted probability for class $c$. Within EAOO-GA, the population of candidate weight vectors evolves iteratively via the animated oat dynamic operators (Eqs. 16–25) and the incorporated genetic crossover and mutation (Eqs. 27–29).
The best solution at convergence satisfies:

$$\mathbf{w}^{*} = \arg\min_{\mathbf{w}} \mathcal{L}(\mathbf{w})$$

yielding the final optimized ensemble decision:

$$\hat{c} = \arg\max_{c} \sum_{m=1}^{M} w_m^{*}\, p_{m,c}$$

This mathematical formulation clarifies how the proposed framework unifies multi-model feature fusion, attention-based recalibration, and metaheuristic weight optimization into a coherent, analytically defined learning pipeline.
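As a numerical sanity check of the recalibration step, the SE gating admits a direct NumPy rendering. Random arrays stand in for the learned $W_1$, $W_2$ and the fused feature map; all shapes are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_recalibrate(U, W1, W2):
    """U: (H, W, C) fused feature map. Returns s * U with
    s = sigmoid(W2 @ relu(W1 @ z)) and z = GAP(U), as in the formulation."""
    z = U.mean(axis=(0, 1))                   # global average pooling -> (C,)
    s = sigmoid(W2 @ np.maximum(W1 @ z, 0))   # per-channel gates in (0, 1)
    return U * s                              # broadcast gates over H, W

rng = np.random.default_rng(1)
C, r = 32, 16
U  = rng.random((7, 7, C))          # stand-in for Concat(f1, f2)
W1 = rng.normal(size=(C // r, C))   # squeeze: C -> C/r
W2 = rng.normal(size=(C, C // r))   # excite:  C/r -> C
F_hat = se_recalibrate(U, W1, W2)
```

Because every gate lies strictly in (0, 1), recalibration can only scale each channel down, never amplify it, which is exactly the suppress-or-keep behavior the SE mechanism is meant to provide.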
Results and discussion
This section presents a comprehensive evaluation of the proposed SE-FusionEAOO Ensemble framework. We begin by detailing the experimental setup and hyperparameter configurations. The model’s performance is then rigorously assessed on the IQ-OTH/NCCD lung cancer dataset, demonstrating its diagnostic capabilities. To enhance interpretability, we employ Grad-CAM visualizations to elucidate the model’s decision-making process by highlighting critical regions in CT scans. A comparative analysis against both traditional optimization algorithms and state-of-the-art methods further establishes the superiority of the proposed approach. Collectively, these analyses provide compelling evidence for the efficacy and advancement of the SE-FusionEAOO Ensemble model in lung cancer detection.
Implementation environment
All experiments were performed using Google Colab, a free cloud-based development platform that provides a suitable environment for executing ML workflows. The model was implemented using the Keras high-level API, with TensorFlow serving as the backend framework. The hardware configuration included a 10th Generation Intel® Core™ i9 processor, 32 GB of RAM, and a 64-bit Windows 10 operating system for initial code development and testing. Python 3.6.9 was used as the primary programming language. In addition to Keras and TensorFlow, essential libraries such as NumPy, OpenCV, scikit-learn, and Matplotlib were employed for data preprocessing. To ensure reproducibility and consistent training across all models, the following hyperparameters were uniformly applied to the individual pre-trained models, the SE-enhanced fusion architectures, and the final ensemble fine-tuning:
Optimizer: Adam (with default $\beta_1 = 0.9$, $\beta_2 = 0.999$)
Learning Rate: 1e-4 (chosen to balance convergence speed and stability in transfer learning on medical images, following common practice)
Batch Size: 32 (suitable for GPU memory constraints while providing stable gradients)
Epochs: Maximum 100, with Early Stopping (patience=10, monitor='val_loss') to prevent overfitting
Loss Function: Categorical Cross-Entropy
Data Split: 70% training, 15% validation, 15% testing (stratified to preserve class distribution)
Random Seed: Fixed at 42 for TensorFlow, NumPy, and Python random operations to ensure reproducible results
Cross-Validation: 5-fold stratified cross-validation used during model selection to robustly rank the eight candidate models
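The settings above can be expressed as a short Keras configuration sketch. The tiny stand-in model and random arrays below are placeholders (the actual framework trains the SE-enhanced fusion networks on the CT data); only the seed, optimizer, loss, batch size, and early-stopping settings mirror the listed values, and the epoch count is reduced here so the snippet runs quickly:

```python
import random
import numpy as np
import tensorflow as tf

# Fixed seeds, as specified (seed = 42 for Python, NumPy, TensorFlow)
random.seed(42); np.random.seed(42); tf.random.set_seed(42)

# Stand-in model; the real framework uses the SE-enhanced fusion networks
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # Adam defaults: beta_1=0.9, beta_2=0.999
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10)

# Dummy data standing in for the stratified 70/15/15 split of CT features
X = np.random.rand(64, 8).astype("float32")
y = tf.keras.utils.to_categorical(np.random.randint(0, 3, 64), 3)
model.fit(X, y, validation_split=0.2, epochs=2, batch_size=32,
          callbacks=[early_stop], verbose=0)  # epochs=2 here; up to 100 in the paper
```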
Evaluation metrics
The performance of the proposed framework is evaluated using standard classification metrics, including accuracy, precision, recall, and F1-score. Together, these metrics provide a comprehensive assessment of the model’s classification effectiveness. Table 4 summarizes the metrics used in this study.
Here, $TP$ indicates the number of true positives, $TN$ refers to true negatives, $FP$ denotes false positives, and $FN$ corresponds to false negatives. Additionally, $y_i$ represents the actual (ground-truth) label for the $i$-th instance, $\hat{y}_i$ is the corresponding predicted label generated by the model, and $N$ signifies the total number of data samples used in the evaluation process.
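These definitions can be made concrete with a small NumPy sketch that builds the macro-averaged metrics from per-class $TP$/$FP$/$FN$ counts (toy labels, for illustration only):

```python
import numpy as np

def macro_metrics(y_true, y_pred, n_classes=3):
    """Accuracy plus macro-averaged precision, recall, and F1-score,
    computed from per-class TP / FP / FN counts as in Table 4."""
    precisions, recalls = [], []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    p, r = float(np.mean(precisions)), float(np.mean(recalls))
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    acc = float(np.mean(y_true == y_pred))
    return acc, p, r, f1

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 1, 2, 1, 2])
acc, prec, rec, f1 = macro_metrics(y_true, y_pred)
```

Macro averaging weights every class equally, which is the appropriate choice for the imbalanced IQ-OTH/NCCD classes.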
Performance evaluation of basic models and feature fusion impact
To ensure the statistical robustness and reliability of our evaluation, all experiments were conducted using 5-fold stratified cross-validation. Reported values represent the mean ($\mu$) and standard deviation ($\sigma$) of each metric across the five independent folds. Tables 5 and 6 summarize the results before and after applying data augmentation and SMOTE.
Tables 5 and 6 present a comprehensive comparison of the classification performance for individual baseline networks, proposed feature fusion architectures, and the final EAOO-optimized ensemble model. All reported values represent the mean ± standard deviation over five independent cross-validation folds, ensuring a statistically reliable evaluation.
Before addressing class imbalance (Table 5), the single CNN backbones exhibited noticeable variation in performance. DenseNet201 achieved the highest mean accuracy and F1-score among the individual networks, demonstrating strong representational capacity for lung nodule discrimination. Conversely, lighter or less expressive architectures such as VGG19 and Xception achieved considerably lower accuracies, confirming the dependence of diagnostic performance on network depth and architectural suitability. All three proposed fusion models consistently outperformed the individual baselines, indicating the advantage of complementary feature integration. Among them, Fusion 1 (DenseNet201 + EfficientNetB6) yielded the best pre-SMOTE accuracy.
After applying data augmentation and SMOTE (Table 6), overall performance improved markedly across all methods, especially for the minority classes (“Benign” and “Normal”). The weakest baseline models (VGG19 and Xception) increased their accuracies substantially, yet still lagged behind the hybrid models. The proposed fusion architectures achieved further gains, with Fusion 1 again reaching the highest accuracy and F1-score among them, confirming the effectiveness of combining diverse feature encoders.
Most notably, the proposed SE-FusionEAOO Ensemble attained the highest mean accuracy and F1-score of all evaluated models (Table 6), outperforming the competing approaches by a statistically significant margin. The extremely low standard deviations reflect consistent and stable generalization across folds. These findings demonstrate that the EAOO-GA optimization effectively assigns adaptive weights to the constituent models, maximizing their complementary strengths while mitigating individual weaknesses. Consequently, the framework establishes a new state-of-the-art benchmark on the IQ-OTH/NCCD dataset for lung cancer classification.
Ablation study
To systematically evaluate the contribution of each key component in the proposed SE-FusionEAOO Ensemble, we conducted an ablation study on the IQ-OTH/NCCD dataset. Table 7 reports the incremental performance gains obtained when progressively adding the main innovations.
The reported results clearly validate the individual and collective contributions of each component within the proposed framework. Basic feature fusion through simple concatenation, without incorporating SE blocks, achieves a modest yet consistent improvement of approximately 0.9% in accuracy by effectively exploiting the complementary representations learned by different backbone architectures. Introducing SE blocks further enhances performance, yielding an additional gain of about 0.1–0.2% over plain fusion, which confirms the importance of adaptive channel-wise recalibration in emphasizing discriminative features while suppressing less informative responses. A substantial performance boost is observed when uniform ensemble averaging is replaced with EAOO-GA optimized weighting, resulting in an accuracy improvement ranging from 1.6% to 2.6%, thereby demonstrating the effectiveness of the proposed meta-heuristic optimization in accurately learning the relative contributions of individual models within the ensemble. Furthermore, applying SMOTE to address class imbalance contributes an additional improvement of approximately 3.2% in accuracy compared to the unbalanced full model, with a notable impact on enhancing sensitivity to minority classes. Collectively, these incremental enhancements elevate the framework from an already strong single-model baseline (95.1%) to state-of-the-art performance (99.4%), empirically justifying the proposed design choices and highlighting the synergistic integration of feature fusion, attention mechanisms, optimized ensemble weighting, and data balancing strategies.
Learning curves analysis
To visualize the training dynamics and convergence of the proposed SE-FusionEAOO Ensemble, we present learning curves for accuracy and ROC-AUC on the IQ-OTH/NCCD dataset (averaged over the 5-fold cross-validation runs). Training was conducted for up to 100 epochs with early stopping (patience = 10 on validation loss). Figure 7 shows the training and validation accuracy over epochs. The curves demonstrate rapid initial improvement, with training accuracy reaching 98% by epoch 30 and the model converging to 99.4% accuracy on the test set, while validation accuracy closely tracks the training curve, indicating minimal overfitting and effective generalization. The gap between the training and validation curves narrows to less than 1% after epoch 20, underscoring the role of transfer learning and SE blocks in promoting stable convergence and robustness to data variability.
Figure 8 depicts the multi-class ROC curves for the final model, with an AUC of 0.99 for each of the Normal, Benign, and Malignant classes. These high AUC values confirm the model’s strong discriminative power, particularly for the minority classes after SMOTE.
Computational efficiency analysis
To evaluate the computational efficiency and practical feasibility of the proposed framework for clinical deployment, both training and inference costs were systematically analyzed. Training and inference times were averaged over five independent runs to ensure consistency and reproducibility. Table 8 reports the recorded execution times for different model configurations.
The results in Table 8 show that the inclusion of SE modules and feature fusion approximately doubles the training time relative to the best-performing single model (DenseNet201), primarily due to the increased parameter count and dual-backbone processing. The EAOO-GA optimization stage introduces a modest additional cost of about 12 minutes (for a population size of 50 and 100 iterations), which is a one-time offline procedure executed after model training. Despite these additions, the per-batch inference time remains consistently low (Table 8), demonstrating real-time feasibility in clinical scenarios. The substantial improvement in diagnostic accuracy, from 95.1% (single model) to 99.4% (optimized ensemble), is therefore achieved with an acceptable computational overhead, confirming the efficiency and deployability of the proposed framework.
The results in Table 9 complement the empirical runtime findings by quantifying the relative computational demands of each model. Although the fusion networks increase the number of parameters and FLOPs, the observed rise in inference time remains moderate and proportional to the added complexity. The modest computational increase is justified by the substantial improvement in generalization capability and detection accuracy.
Statistical validation of model performance
To ensure the reliability and reproducibility of the reported results, a rigorous statistical validation was performed across the 5-fold stratified cross-validation experiments. We applied two complementary statistical tests: the paired t-test to assess the significance of mean performance differences under the assumption of normality, and the non-parametric Wilcoxon signed-rank test to confirm robustness without relying on distributional assumptions. Both tests compared each baseline model against the proposed SE-FusionEAOO Ensemble using accuracy and macro-F1 as evaluation metrics. As shown in Table 10, the proposed SE-FusionEAOO Ensemble consistently achieved statistically significant improvements over all single models and intermediate fusion configurations, with statistically significant p-values in the paired t-tests for both accuracy and macro-F1. Although the Wilcoxon test yielded slightly higher p-values, they remained below the 0.1 threshold in all cases, confirming consistent superiority across folds. These findings confirm that the observed improvements are not due to random variation but reflect genuine performance gains attributable to the ensemble optimization and attention-based fusion strategy.
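The two tests can be reproduced with SciPy on per-fold scores. The fold accuracies below are invented stand-ins; note that with only five folds the smallest attainable exact two-sided Wilcoxon p-value is $2/2^5 = 0.0625$, which explains why that test clears the 0.1 threshold rather than 0.05:

```python
import numpy as np
from scipy import stats

# Hypothetical per-fold accuracies (5-fold CV) for a baseline vs. the ensemble
baseline = np.array([0.948, 0.951, 0.946, 0.953, 0.950])
ensemble = np.array([0.989, 0.993, 0.989, 0.997, 0.995])

t_stat, p_t = stats.ttest_rel(ensemble, baseline)   # paired t-test on fold scores
w_stat, p_w = stats.wilcoxon(ensemble, baseline)    # Wilcoxon signed-rank test
```

Paired tests are used because both models are evaluated on the same fold splits, so the fold-to-fold variation is shared and cancels out of the comparison.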
Hyperparameter settings for optimization algorithms
To ensure a fair, consistent, and reproducible comparative analysis, the hyperparameters for all meta-heuristic optimization algorithms were carefully selected based on established literature recommendations and preliminary tuning experiments on a validation subset. A unified population size of 50 and 100 iterations was adopted across all algorithms to enable equitable comparison of their search capabilities. The search space for ensemble weights was constrained to $[0, 1]$ with the normalization constraint $\sum_i w_i = 1$. Table 11 provides details of the settings and their brief justifications.
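One common way to keep every candidate inside this search space is a clip-and-renormalize repair applied before each fitness evaluation. The sketch below illustrates the idea and is not necessarily the authors' exact repair rule:

```python
import numpy as np

def repair(w, eps=1e-12):
    """Clip a candidate weight vector to [0, 1] and renormalize so the
    components sum to one (uniform fallback if everything clips to zero)."""
    w = np.clip(np.asarray(w, dtype=float), 0.0, 1.0)
    total = w.sum()
    return w / total if total > eps else np.full_like(w, 1.0 / w.size)

# A candidate that violates both constraints gets mapped back onto the simplex
w = repair([-0.2, 0.5, 0.9])
```

Applying the same repair in every optimizer keeps the comparison in Table 12 fair: no algorithm gains an advantage from exploring infeasible weight vectors.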
Performance comparison with state-of-the-art optimization algorithms
A critical aspect of our study was to evaluate the efficacy of the proposed EAOO-GA algorithm against a suite of state-of-the-art meta-heuristic optimizers for the task of ensemble weight optimization. The performance of each algorithm, measured by the final classification accuracy it enabled the ensemble to achieve on the IQ-OTH/NCCD test set, is summarized in Table 12. The results demonstrate the clear superiority of the proposed EAOO-GA algorithm, which achieved the highest accuracy of 99.4%. This signifies its exceptional ability to navigate the complex, high-dimensional search space and find a highly optimal set of ensemble weights. The GWO and SCA delivered strong, competitive performances with accuracies of 96.5% and 96.8% respectively, showcasing their inherent robustness.
Notably, the basic AOO algorithm achieved a respectable accuracy of 95.8%. The performance gap of 3.6 percentage points between AOO and our enhanced version (EAOO-GA) quantitatively validates the contribution of the integrated genetic operators (crossover and mutation), which bolster exploration and prevent premature convergence, leading to more consistent performance across iterations. The classic GA and WOA yielded lower performances, with accuracies of 94.2% and 93.1%, indicating their relative inefficiency for this specific optimization landscape. DE recorded the lowest accuracy at 91.7%. The distinct weight configuration discovered by EAOO-GA, as shown in the table, highlights its balanced and effective approach to assigning influence to each fusion model within the ensemble. Its superior performance underscores its potential as a powerful tool for hyperparameter and weight optimization tasks in complex machine learning pipelines.
Comparison with conventional ensemble methods
This subsection presents a comparative analysis evaluating the efficacy of the proposed EAOO-optimized ensemble framework against established conventional ensemble methods. The baseline techniques under investigation include Max, Mean, Weighted Average, Product, and Hard Voting ensembles. As quantitatively demonstrated in Table 13, the experimental results unequivocally establish the superior performance of our metaheuristic-optimized approach over all conventional fusion strategies. The results reveal that while conventional methods provide a baseline improvement over the best individual model (Max Ensemble accuracy of 95.77%), they are fundamentally limited by their static, non-optimized aggregation rules. The Mean, Weighted Average, Product, and Hard Voting ensembles all achieved an identical accuracy of 96.32%, failing to differentiate themselves in this context. In stark contrast, our proposed EAOO-optimized weighted ensemble achieved a significantly higher accuracy of 99.40%, outperforming all conventional techniques by a considerable margin. This substantial performance gain of approximately 3% is not trivial in the medical diagnostics domain, where even fractional percentage improvements can have significant clinical implications.
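The five baseline rules differ only in how the per-model probability vectors are aggregated. A compact NumPy sketch of each rule on a toy single-sample input (all numbers invented; the static weights in the weighted-average rule are illustrative):

```python
import numpy as np

P = np.stack([          # predictions of 3 models for one sample, 3 classes
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.2, 0.7, 0.1],
])

mean_rule    = P.mean(axis=0).argmax()    # average the probabilities
max_rule     = P.max(axis=0).argmax()     # take the most confident score per class
product_rule = P.prod(axis=0).argmax()    # multiply probabilities (geometric-style)
weighted     = np.average(P, axis=0, weights=[0.5, 0.3, 0.2]).argmax()  # static weights
votes        = P.argmax(axis=1)           # hard voting: each model casts one vote
hard_vote    = np.bincount(votes, minlength=3).argmax()
```

Even on this toy input the rules disagree (the weighted and hard-voting rules pick a different class from the other three), which illustrates why a fixed aggregation rule cannot adapt to the relative reliability of the base models the way the EAOO-GA-learned weights can.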
The superior performance of our method is directly attributed to the integration of the EAOO-GA, which systematically fine-tunes and optimizes the weights assigned to each base model within the ensemble. Unlike static methods that assign weights arbitrarily or based on simple heuristics, EAOO-GA performs a global search to discover a highly effective weight configuration that maximizes the collective decision-making power of the ensemble. This strategic optimization allows the ensemble to leverage the unique strengths of each constituent model while mitigating their individual weaknesses, resulting in a more robust and accurate diagnostic system for lung nodule classification in CT scans.
Comparison with state-of-the-art methods
This section presents a rigorous comparative analysis between the proposed SE-FusionEAOO Ensemble framework and contemporary state-of-the-art methods for lung cancer detection from CT scans. The performance comparison, detailed quantitatively in Table 14, was conducted on the publicly available IQ-OTH/NCCD dataset to ensure a fair and unbiased evaluation. The results unequivocally demonstrate that our approach not only achieves competitive performance but sets a new state-of-the-art benchmark for accuracy in this domain. The proposed SE-FusionEAOO Ensemble attains a remarkable accuracy of 99.40%, surpassing all existing methods included in this comparison. Notably, our framework outperforms recent advanced techniques such as the hybrid 3D-CNN with geometric feature analysis by Safta et al.73 (97.84%) and the ensemble method with weight optimization by Gautam et al.17 (97.23%). It also exceeds the performance of other recent works utilizing advanced preprocessing and integration techniques, such as Shariff et al.26 (98.78% with data augmentation) and Kumaran et al.7 (98.18% with integrated deep learning).
The superior performance of our framework can be attributed to its multi-faceted architectural innovation: the strategic integration of SE blocks enables more discriminative feature representation, the fusion of diverse architectures captures complementary patterns, and the EAOO-GA optimization algorithm precisely determines the optimal ensemble weights. Furthermore, the incorporation of SMOTE effectively addresses class imbalance, ensuring robust performance across all categories. This comprehensive approach demonstrates a significant advancement over methods that rely on single-model architectures or less sophisticated ensemble strategies. The consistent outperformance across various methodological approaches, including hybrid models, ensemble methods, and advanced preprocessing techniques, validates the effectiveness of our integrated framework. The SE-FusionEAOO Ensemble establishes a new performance benchmark for lung cancer detection on the IQ-OTH/NCCD dataset, highlighting its potential for clinical implementation and future research in medical image analysis.
Comparison with existing feature selection methods
While our framework relies on SE blocks for adaptive feature recalibration rather than explicit feature selection, Table 15 compares its performance with recent methods that employ feature selection for lung cancer detection from CT scans. Our approach achieves 99.40% accuracy on IQ-OTH/NCCD, outperforming a modified DenseNet with feature selection (95%), a custom CNN with hierarchical extraction (93.06%), LBP + CBO-DenseNet (98.17%), and radiomics-based feature selection (80–90%). This highlights the benefit of attention mechanisms in deep learning for achieving higher accuracy and robustness without manual feature engineering.
External validation on LIDC-IDRI dataset
To further validate the robustness and generalization capability of the proposed SE-FusionEAOO framework, an external validation experiment was conducted on the independent LIDC-IDRI dataset, which contains CT scans from 1,018 subjects with radiologist-verified annotations. The goal of this evaluation is to determine whether the proposed model, originally trained and optimized on the IQ-OTH/NCCD dataset, can maintain comparable diagnostic performance on a distinct and heterogeneous clinical dataset. All performance metrics are reported as mean ± 95% confidence interval across five independent runs, as presented in Table 16.
As shown in Table 16, the proposed SE-FusionEAOO Ensemble maintained outstanding performance on the unseen LIDC-IDRI dataset, confirming its strong generalization ability. Among the individual CNN models, DenseNet201 achieved the best generalization (93.8% accuracy), while lighter architectures such as VGG19 and MobileNetV2 demonstrated lower robustness when exposed to new data, dropping to around 87–89%. The fusion-based architectures significantly enhanced performance through complementary feature aggregation, with Fusion 1 achieving 95.4% accuracy and 95.3% F1-score. Most notably, the proposed SE-FusionEAOO Ensemble outperformed all baselines, attaining 97.9% accuracy and 97.8% F1-score, with narrow confidence intervals, indicating stable predictions across folds.
These findings demonstrate that the ensemble not only delivers superior performance on the training dataset (IQ-OTH/NCCD) but also generalizes effectively to an independent clinical dataset (LIDC-IDRI), confirming its robustness and potential for deployment in real-world CAD systems for lung cancer diagnosis.
Explainability of the proposed SE-FusionEAOO ensemble framework using Grad-CAM
To enhance the transparency of the proposed SE-FusionEAOO Ensemble framework, we employed Grad-CAM to generate visual explanations of the model’s decision-making process. This technique produces coarse localization heatmaps that highlight the critical regions in the input CT scan that were most influential for the model’s prediction. Figure 9 presents representative examples for each diagnostic category from the IQ-OTH/NCCD dataset. For malignant cases (Fig. 9b), the Grad-CAM heatmaps demonstrate the model’s precise focus on morphologically suspicious features, particularly spiculated margins and irregular nodule contours, findings that align closely with radiological expertise. This suggests that our framework learns to identify clinically relevant biomarkers of malignancy, rather than relying on spurious correlations.
In benign cases (Fig. 9a), the model exhibits attention patterns centered on homogeneous tissue textures and well-defined nodule boundaries, consistent with radiologists’ assessment criteria for non-cancerous lesions. The normal cases (Fig. 9c) show dispersed attention without concentrated focal points, reflecting the absence of distinctive pathological features. The heatmaps shown in Fig. 9 reveal the model’s focus on clinically relevant features in the IQ-OTH/NCCD dataset, such as irregular nodule edges and density variations for malignant cases (indicative of invasive growth), ground-glass opacities for benign, and uniform texture for normal. This aligns with classification challenges, such as distinguishing subtle asymptomatic tumors, and enhancing interpretability without explicit feature selection, as SE blocks adaptively prioritize these patterns for improved diagnostic relevance.
This section presents a comprehensive evaluation of the proposed SE-FusionEAOO Ensemble framework. We begin by detailing the experimental setup and hyperparameter configurations. The model’s performance is then rigorously assessed on the IQ-OTH/NCCD lung cancer dataset, demonstrating its diagnostic capabilities. To enhance interpretability, we employ Grad-CAM visualizations to elucidate the model’s decision-making process by highlighting critical regions in CT scans. A comparative analysis against both traditional optimization algorithms and state-of-the-art methods further establishes the superiority of the proposed approach. Collectively, these analyses provide compelling evidence for the efficacy and advancement of the SE-FusionEAOO Ensemble model in lung cancer detection.
Implementation environment
All experiments were performed using Google Colab Notebook, an open-source cloud-based development platform that provides a suitable environment for executing ML workflows. The model was implemented using the Keras high-level API, with TensorFlow serving as the backend framework. The hardware configuration included a 10th Generation Intel® Core™ i9 processor, 32 GB of RAM, and a 64-bit Windows 10 operating system for initial code development and testing. Python version 3.6.9 was used as the primary programming language. In addition to Keras and TensorFlow, essential libraries such as NumPy, OpenCV, scikit-learn, and Matplotlib were employed for data preprocessing. To ensure reproducibility and consistent training across all models, the following hyperparameters were uniformly applied to the individual pre-trained models, SE-enhanced fusion architectures, and the final ensemble fine-tuning:Optimizer: Adam (with default , )
Learning Rate: 1e-4 (chosen to balance convergence speed and stability in transfer learning on medical images, following common practice)
Batch Size: 32 (suitable for GPU memory constraints while providing stable gradients)
Epochs: Maximum 100, with Early Stopping (patience=10, monitor=’val_loss’) to prevent overfitting
Loss Function: Categorical Cross-Entropy
Data Split: 70% training, 15% validation, 15% testing (stratified to preserve class distribution)
Random Seed: Fixed at 42 for TensorFlow, NumPy, and Python random operations to ensure reproducible results
Cross-Validation: 5-fold stratified cross-validation used during model selection to robustly rank the eight candidate models
Evaluation metrics
The performance of the proposed framework is evaluated using standard classification metrics, including accuracy, precision, recall, and F1-score. Together, these metrics provide a comprehensive assessment of the model’s classification effectiveness. Table 4 summarizes the metrics used in this study.
Where indicates the number of true positives, refers to true negatives, denotes false positives, and corresponds to false negatives. Additionally, represents the actual label (ground truth) for the instance, is the corresponding predicted label generated by the model, and signifies the total number of data samples used in the evaluation process.
Performance evaluation of basic models and feature fusion impact
To ensure the statistical robustness and reliability of our evaluation, all experiments were conducted using 5-fold stratified cross-validation. Reported values represent the mean () and standard deviation () of each metric across the 5 independent folds. Tables 5 and 6 summarize the results before and after applying data augmentation and SMOTE.
Tables 5 and 6 present a comprehensive comparison of the classification performance for individual baseline networks, proposed feature fusion architectures, and the final EAOO-optimized ensemble model. All reported values represent the mean ± standard deviation over five independent cross-validation folds, ensuring a statistically reliable evaluation.
Before addressing class imbalance (Table 5), the single CNN backbones exhibited noticeable variation in performance. DenseNet201 achieved the highest mean accuracy and F1-score among the baselines, demonstrating strong representational capacity for lung nodule discrimination. Conversely, lighter or less expressive architectures such as VGG19 and Xception achieved considerably lower accuracies, confirming the dependence of diagnostic performance on network depth and architectural suitability. All three proposed fusion models consistently outperformed the individual baselines, indicating the advantage of complementary feature integration. Among them, Fusion 1 (DenseNet201 + EfficientNetB6) yielded the best pre-SMOTE accuracy.
After applying data augmentation and SMOTE (Table 6), overall performance improved markedly across all methods, especially for the minority classes (“Benign” and “Normal”). The weakest baseline models (VGG19 and Xception) increased their accuracies substantially, yet still lagged behind the hybrid models. The proposed fusion architectures achieved further gains, with Fusion 1 again attaining the highest accuracy and F1-score among the fusion models, confirming the effectiveness of combining diverse feature encoders.
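SMOTE obtains these minority-class gains by interpolating synthetic samples between a real minority sample and one of its nearest minority-class neighbors. A sketch of the core generation step (the feature vectors and the choice of neighbor are illustrative; a full implementation selects neighbors via k-NN in feature space):

```python
import random

def smote_sample(x, neighbor, rng):
    """Create one synthetic point on the segment between x and a minority neighbor."""
    gap = rng.random()  # uniform in [0, 1)
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbor)]

rng = random.Random(0)
x, neighbor = [1.0, 2.0], [3.0, 6.0]
synthetic = smote_sample(x, neighbor, rng)
# Each coordinate of the synthetic sample lies between the two parents
assert all(min(a, b) <= s <= max(a, b) for a, b, s in zip(x, neighbor, synthetic))
```

Because synthetic points lie on segments between genuine minority samples, the classifier sees a denser, but still plausible, minority region rather than exact duplicates.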
Most notably, the proposed SE-FusionEAOO Ensemble attained a mean accuracy of 99.40% and an F1-score of 99.3%, outperforming all competing models by a statistically significant margin. The very low standard deviation values reflect consistent and stable generalization across folds. These findings demonstrate that the EAOO-GA optimization effectively assigns adaptive weights to the constituent models, maximizing their complementary strengths while mitigating individual weaknesses. Consequently, the framework establishes a new state-of-the-art benchmark on the IQ-OTH/NCCD dataset for lung cancer classification.
Ablation study
To systematically evaluate the contribution of each key component in the proposed SE-FusionEAOO Ensemble, we conducted an ablation study on the IQ-OTH/NCCD dataset. Table 7 reports the incremental performance gains when progressively adding the main innovations.
The reported results clearly validate the individual and collective contributions of each component within the proposed framework. Basic feature fusion through simple concatenation, without incorporating SE blocks, achieves a modest yet consistent improvement of approximately 0.9% in accuracy by effectively exploiting the complementary representations learned by different backbone architectures. Introducing SE blocks further enhances performance, yielding an additional gain of about 0.1–0.2% over plain fusion, which confirms the importance of adaptive channel-wise recalibration in emphasizing discriminative features while suppressing less informative responses. A substantial performance boost is observed when uniform ensemble averaging is replaced with EAOO-GA optimized weighting, resulting in an accuracy improvement ranging from 1.6% to 2.6%, thereby demonstrating the effectiveness of the proposed meta-heuristic optimization in accurately learning the relative contributions of individual models within the ensemble. Furthermore, applying SMOTE to address class imbalance contributes an additional improvement of approximately 3.2% in accuracy compared to the unbalanced full model, with a notable impact on enhancing sensitivity to minority classes. Collectively, these incremental enhancements elevate the framework from an already strong single-model baseline (95.1%) to state-of-the-art performance (99.4%), empirically justifying the proposed design choices and highlighting the synergistic integration of feature fusion, attention mechanisms, optimized ensemble weighting, and data balancing strategies.
Learning curves analysis
To visualize the training dynamics and performance convergence of the proposed SE-FusionEAOO Ensemble, we present learning curves for accuracy and ROC-AUC on the IQ-OTH/NCCD dataset (averaged over 5-fold cross-validation runs). Training was conducted for up to 100 epochs with early stopping (patience=10 on validation loss). Figure 7 shows the training and validation accuracy over epochs. The curves demonstrate rapid initial improvement, with training accuracy reaching 98% by epoch 30 and converging to 99.4% on the test set, while validation accuracy closely follows, indicating minimal overfitting and effective generalization. This trend highlights the framework’s efficient convergence, with the gap between training and validation curves narrowing to less than 1% after epoch 20, underscoring the role of transfer learning and SE blocks in promoting stable performance trends and robustness to data variability.
Figure 8 depicts the multi-class ROC curves for the final model, with AUC values of 0.99 for Normal, 0.99 for Benign, and 0.99 for Malignant classes. The high AUCs confirm the model’s strong discriminative power, particularly for minority classes after SMOTE.
Computational efficiency analysis
To evaluate the computational efficiency and practical feasibility of the proposed framework for clinical deployment, both training and inference costs were systematically analyzed. Training and inference times were averaged over five independent runs to ensure consistency and reproducibility. Table 8 reports the recorded execution times for different model configurations.
The results in Table 8 show that the inclusion of SE modules and feature fusion approximately doubles the training time relative to the best-performing single model (DenseNet201), primarily due to the increased parameter count and dual-backbone processing. The EAOO-GA optimization stage introduces a modest additional cost of about 12 minutes (for a population size of 50 and 100 iterations), which is a one-time offline procedure executed after model training. Despite these additions, the per-batch inference time remains consistently low, demonstrating real-time feasibility in clinical scenarios. The substantial improvement in diagnostic accuracy, from 95.1% (single model) to 99.4% (optimized ensemble), is therefore achieved with an acceptable computational overhead, confirming the efficiency and deployability of the proposed framework.
The results in Table 9 complement the empirical runtime findings by quantifying the relative computational demands of each model. Although the fusion networks increase the number of parameters and FLOPs, the observed rise in inference time remains moderate and proportional to the added complexity. The modest computational increase is justified by the substantial improvement in generalization capability and detection accuracy.
Statistical validation of model performance
To ensure the reliability and reproducibility of the reported results, a rigorous statistical validation was performed across 5-fold stratified cross-validation experiments. We applied two complementary statistical tests: the paired t-test to assess the significance of mean performance differences under the assumption of normality, and the non-parametric Wilcoxon signed-rank test to confirm robustness, thereby avoiding reliance on distributional assumptions. Both tests compared each baseline model against the proposed SE-FusionEAOO Ensemble using accuracy and macro-F1 as evaluation metrics. As shown in Table 10, the proposed SE-FusionEAOO Ensemble consistently achieved statistically significant improvements over all single models and intermediate fusion configurations, with p < 0.05 in the paired t-tests for both accuracy and macro-F1. Although the Wilcoxon test yielded slightly higher p-values, they remained below the 0.1 threshold in all cases, confirming consistent superiority across folds. These findings confirm that the observed improvements are not due to random variation but reflect genuine performance gains attributable to the ensemble optimization and attention-based fusion strategy.
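The paired t-test operates on the fold-wise differences between the two models' scores. Its statistic can be computed directly from those differences; the per-fold accuracies below are hypothetical values for illustration, not the paper's actual fold results:

```python
import math
import statistics

def paired_t_statistic(scores_a, scores_b):
    """t statistic for paired samples: mean difference over its standard error."""
    d = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(d)
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))

# Hypothetical per-fold accuracies: ensemble vs. one baseline (5 folds)
ensemble = [0.994, 0.992, 0.995, 0.993, 0.991]
baseline = [0.951, 0.948, 0.955, 0.950, 0.949]
t = paired_t_statistic(ensemble, baseline)
print(round(t, 2))
```

The resulting t value is compared against the Student-t distribution with n − 1 = 4 degrees of freedom to obtain the p-value; in practice a library routine (e.g., SciPy's paired t-test) would return both at once.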
Hyperparameter settings for optimization algorithms
To ensure a fair, consistent, and reproducible comparative analysis, the hyperparameters for all meta-heuristic optimization algorithms were carefully selected based on established literature recommendations and preliminary tuning experiments on a validation subset. A unified population size of 50 and 100 iterations was adopted across all algorithms to enable equitable comparison of their search capabilities. The search space for ensemble weights was constrained to [0, 1] with the normalization constraint Σᵢ wᵢ = 1. Table 11 provides details of the settings and their brief justifications.
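Every candidate solution visited by the optimizers must satisfy these two constraints. A common repair step, sketched below (this is a standard projection heuristic, not necessarily the authors' exact implementation), clips each weight to [0, 1] and renormalizes the vector so the weights sum to one:

```python
def repair_weights(w, eps=1e-12):
    """Clip each weight to [0, 1], then renormalize so the weights sum to 1."""
    clipped = [min(max(wi, 0.0), 1.0) for wi in w]
    total = sum(clipped)
    if total < eps:                       # degenerate candidate: fall back to uniform
        return [1.0 / len(w)] * len(w)
    return [wi / total for wi in clipped]

w = repair_weights([1.4, -0.2, 0.5])      # out-of-range candidate
print([round(wi, 3) for wi in w])         # [0.667, 0.0, 0.333]
```

Applying this repair after each position update keeps the whole population inside the feasible simplex, so the fitness function only ever evaluates valid ensemble weightings.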
Performance comparison with state-of-the-art optimization algorithms
A critical aspect of our study was to evaluate the efficacy of the proposed EAOO-GA algorithm against a suite of state-of-the-art meta-heuristic optimizers for the task of ensemble weight optimization. The performance of each algorithm, measured by the final classification accuracy it enabled the ensemble to achieve on the IQ-OTH/NCCD test set, is summarized in Table 12. The results demonstrate the clear superiority of the proposed EAOO-GA algorithm, which achieved the highest accuracy of 99.4%. This signifies its exceptional ability to navigate the complex, high-dimensional search space and find a highly optimal set of ensemble weights. The GWO and SCA delivered strong, competitive performances with accuracies of 96.5% and 96.8% respectively, showcasing their inherent robustness.
Notably, the basic AOO algorithm achieved a respectable accuracy of 95.8%. The significant performance gap of 3.6% between AOO and our enhanced version (EAOO-GA) quantitatively validates the contribution of the integrated genetic operators (crossover and mutation), which crucially bolster exploration and prevent premature convergence, leading to more consistent performance trends across iterations, as implied by the superior final metrics. The classic GA and WOA yielded lower performances, with accuracies of 94.2% and 93.1%, indicating their relative inefficiency for this specific optimization landscape. DE recorded the lowest accuracy at 91.7%. The distinct weight configuration discovered by EAOO-GA, as shown in the table, highlights its balanced and effective approach to assigning influence to each fusion model within the ensemble. Its superior performance underscores its potential as a powerful tool for hyperparameter and weight optimization tasks in complex machine learning pipelines.
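The genetic operators grafted onto AOO act directly on candidate weight vectors. A minimal sketch of one common operator pair, arithmetic crossover and clipped Gaussian mutation (the operator forms, mutation rate, and step size here are illustrative assumptions, not the paper's exact definitions):

```python
import random

def crossover(parent_a, parent_b, rng):
    """Arithmetic crossover: a random convex blend of two weight vectors."""
    alpha = rng.random()
    return [alpha * a + (1 - alpha) * b for a, b in zip(parent_a, parent_b)]

def mutate(w, rng, rate=0.2, sigma=0.05):
    """Gaussian mutation, clipped back to [0, 1] to stay inside the search space."""
    return [min(max(wi + rng.gauss(0.0, sigma), 0.0), 1.0)
            if rng.random() < rate else wi
            for wi in w]

rng = random.Random(42)
child = mutate(crossover([0.5, 0.3, 0.2], [0.2, 0.4, 0.4], rng), rng)
assert len(child) == 3 and all(0.0 <= c <= 1.0 for c in child)
```

Crossover recombines good regions of the search space found by different individuals, while mutation injects fresh diversity; together they counteract the premature convergence observed with the basic AOO.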
Comparison with conventional ensemble methods
This subsection presents a comparative analysis evaluating the efficacy of the proposed EAOO-optimized ensemble framework against established conventional ensemble methods. The baseline techniques under investigation include Max, Mean, Weighted Average, Product, and Hard Voting ensembles. As quantitatively demonstrated in Table 13, the experimental results unequivocally establish the superior performance of our metaheuristic-optimized approach over all conventional fusion strategies. The results reveal that while conventional methods provide a baseline improvement over the best individual model (Max Ensemble accuracy of 95.77%), they are fundamentally limited by their static, non-optimized aggregation rules. The Mean, Weighted Average, Product, and Hard Voting ensembles all achieved an identical accuracy of 96.32%, failing to differentiate themselves in this context. In stark contrast, our proposed EAOO-optimized weighted ensemble achieved a significantly higher accuracy of 99.40%, outperforming all conventional techniques by a considerable margin. This substantial performance gain of approximately 3% is not trivial in the medical diagnostics domain, where even fractional percentage improvements can have significant clinical implications.
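The conventional aggregation rules compared in Table 13 can all be written as simple operations on the per-model class-probability vectors. A dependency-free sketch with three hypothetical models and three classes (the probability values are made up for illustration):

```python
from collections import Counter

def _argmax(v):
    return v.index(max(v))

def mean_ensemble(probs):
    """Average the class probabilities of all models, then take the argmax."""
    n_cls = len(probs[0])
    return _argmax([sum(p[c] for p in probs) / len(probs) for c in range(n_cls)])

def product_ensemble(probs):
    """Multiply per-class probabilities across models, then take the argmax."""
    combined = []
    for c in range(len(probs[0])):
        prod = 1.0
        for p in probs:
            prod *= p[c]
        combined.append(prod)
    return _argmax(combined)

def hard_voting(probs):
    """Each model votes for its own argmax; the majority class wins."""
    votes = Counter(_argmax(p) for p in probs)
    return votes.most_common(1)[0][0]

# Hypothetical softmax outputs of three models for one CT slice (3 classes)
probs = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
print(mean_ensemble(probs), product_ensemble(probs), hard_voting(probs))  # 0 0 0
```

Because these rules apply the same fixed aggregation regardless of how reliable each model is, they cannot exploit the fact that some constituent models are systematically stronger, which is exactly the limitation the optimized weighting addresses.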
The superior performance of our method is directly attributed to the integration of the EAOO-GA, which systematically fine-tunes and optimizes the weights assigned to each base model within the ensemble. Unlike static methods that assign weights arbitrarily or based on simple heuristics, EAOO-GA performs a global search to discover a highly effective weight configuration that maximizes the collective decision-making power of the ensemble. This strategic optimization allows the ensemble to leverage the unique strengths of each constituent model while mitigating their individual weaknesses, resulting in a more robust and accurate diagnostic system for lung nodule classification in CT scans.
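The optimized ensemble instead scales each model's probability vector by its learned weight before averaging. A sketch of this weighted soft-vote step (the weight values below are illustrative, not the optimized weights reported in the table):

```python
def weighted_ensemble(probs, weights):
    """Weighted average of model class probabilities, then argmax."""
    n_cls = len(probs[0])
    combined = [sum(w * p[c] for w, p in zip(weights, probs))
                for c in range(n_cls)]
    return combined.index(max(combined))

# Hypothetical softmax outputs of three fusion models for one CT slice
probs = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]

# Weights sum to 1; a heavier weight on the third model flips the decision
print(weighted_ensemble(probs, [0.2, 0.2, 0.6]))  # 1
```

With uniform weights this reduces to the plain mean ensemble; the role of EAOO-GA is precisely to search the simplex of weight vectors for the configuration that maximizes validation accuracy.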
Comparison with state-of-the-art methods
This section presents a rigorous comparative analysis between the proposed SE-FusionEAOO Ensemble framework and contemporary state-of-the-art methods for lung cancer detection from CT scans. The performance comparison, detailed quantitatively in Table 14, was conducted on the publicly available IQ-OTH/NCCD dataset to ensure a fair and unbiased evaluation. The results unequivocally demonstrate that our approach not only achieves competitive performance but sets a new state-of-the-art benchmark for accuracy in this domain. The proposed SE-FusionEAOO Ensemble attains a remarkable accuracy of 99.40%, surpassing all existing methods included in this comparison. Notably, our framework outperforms recent advanced techniques such as the hybrid 3D-CNN with geometric feature analysis by Safta et al.73 (97.84%) and the ensemble method with weight optimization by Gautam et al.17 (97.23%). It also exceeds the performance of other recent works utilizing advanced preprocessing and integration techniques, such as Shariff et al.26 (98.78% with data augmentation) and Kumaran et al.7 (98.18% with integrated deep learning).
The superior performance of our framework can be attributed to its multi-faceted architectural innovation: the strategic integration of SE blocks enables more discriminative feature representation, the fusion of diverse architectures captures complementary patterns, and the EAOO-GA optimization algorithm precisely determines optimal ensemble weights. Furthermore, the incorporation of SMOTE effectively addresses class imbalance, ensuring robust performance across all categories. This comprehensive approach demonstrates a significant advancement over methods that rely on single-model architectures or less sophisticated ensemble strategies. The consistent outperformance across various methodological approaches—including hybrid models, ensemble methods, and advanced preprocessing techniques—validates the effectiveness of our integrated framework. The SE-FusionEAOO Ensemble establishes a new performance benchmark for lung cancer detection on the IQ-OTH/NCCD dataset, highlighting its potential for clinical implementation and future research directions in medical image analysis.
Comparison with existing feature selection methods
While our framework utilizes SE blocks for adaptive feature recalibration rather than explicit selection, Table 15 compares its performance with recent methods that employ feature selection for lung cancer detection from CT scans. Our approach achieves 99.40% accuracy on IQ-OTH/NCCD, outperforming modified DenseNet + FS (95%), custom CNN hierarchical extraction (93.06%), LBP + CBO-DenseNet (98.17%), and radiomics FS (80–90%). This highlights the benefits of attention mechanisms in DL for higher accuracy and robustness without manual feature engineering.
External validation on LIDC-IDRI dataset
To further validate the robustness and generalization capability of the proposed SE-FusionEAOO framework, an external validation experiment was conducted on the independent LIDC-IDRI dataset, which contains CT scans from 1,018 subjects with radiologist-verified annotations. The goal of this evaluation is to determine whether the proposed model, originally trained and optimized on the IQ-OTH/NCCD dataset, can maintain comparable diagnostic performance on a distinct and heterogeneous clinical dataset. All performance metrics are reported as mean ± 95% confidence interval across five independent runs presented in Table 16.
As shown in Table 16, the proposed SE-FusionEAOO Ensemble maintained outstanding performance on the unseen LIDC-IDRI dataset, confirming its strong generalization ability. Among the individual CNN models, DenseNet201 achieved the best generalization (93.8% accuracy), while lighter architectures such as VGG19 and MobileNetV2 demonstrated lower robustness when exposed to new data, dropping to around 87–89%. The fusion-based architectures significantly enhanced performance through complementary feature aggregation, with Fusion 1 achieving 95.4% accuracy and 95.3% F1-score. Most notably, the proposed SE-FusionEAOO Ensemble outperformed all baselines, attaining 97.9% accuracy and 97.8% F1-score, with narrow confidence intervals, indicating stable predictions across folds.
These findings demonstrate that the ensemble not only delivers superior performance on the training dataset (IQ-OTH/NCCD) but also generalizes effectively to an independent clinical dataset (LIDC-IDRI), confirming its robustness and potential for deployment in real-world CAD systems for lung cancer diagnosis.
Explainability of the proposed SE-FusionEAOO ensemble framework using Grad-CAM
To enhance the transparency of the proposed SE-FusionEAOO Ensemble framework, we employed Grad-CAM to generate visual explanations of the model’s decision-making process. This technique produces coarse localization heatmaps that highlight the critical regions in the input CT scan that were most influential for the model’s prediction. Figure 9 presents representative examples for each diagnostic category from the IQ-OTH/NCCD dataset. For malignant cases (Fig. 9b), the Grad-CAM heatmaps demonstrate the model’s precise focus on morphologically suspicious features, particularly spiculated margins and irregular nodule contours, findings that align closely with radiological expertise. This suggests that our framework learns to identify clinically relevant biomarkers of malignancy, rather than relying on spurious correlations.
In benign cases (Fig. 9a), the model exhibits attention patterns centered on homogeneous tissue textures and well-defined nodule boundaries, consistent with radiologists’ assessment criteria for non-cancerous lesions. The normal cases (Fig. 9c) show dispersed attention without concentrated focal points, reflecting the absence of distinctive pathological features. The heatmaps shown in Fig. 9 reveal the model’s focus on clinically relevant features in the IQ-OTH/NCCD dataset, such as irregular nodule edges and density variations for malignant cases (indicative of invasive growth), ground-glass opacities for benign, and uniform texture for normal. This aligns with classification challenges, such as distinguishing subtle asymptomatic tumors, and enhancing interpretability without explicit feature selection, as SE blocks adaptively prioritize these patterns for improved diagnostic relevance.
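The Grad-CAM heatmaps in Fig. 9 are produced by the standard weighting step: each activation map of the last convolutional layer is scaled by its gradient-derived channel weight, the maps are summed, and a ReLU keeps only the regions that support the predicted class. A dependency-free sketch of that combination step (the tiny activation maps and channel weights are hand-made; a real implementation obtains the weights by global-average-pooling the backpropagated gradients):

```python
def grad_cam_combine(activations, channel_weights):
    """ReLU(sum_k alpha_k * A_k): weighted sum of channel activation maps."""
    h, w = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for alpha, amap in zip(channel_weights, activations):
        for i in range(h):
            for j in range(w):
                cam[i][j] += alpha * amap[i][j]
    # ReLU: keep only positively contributing regions
    return [[max(v, 0.0) for v in row] for row in cam]

# Two 2x2 channel maps with hypothetical channel weights
A = [[[1.0, 0.0], [0.0, 2.0]],
     [[0.0, 1.0], [3.0, 0.0]]]
cam = grad_cam_combine(A, [0.5, -0.5])
print(cam)  # [[0.5, 0.0], [0.0, 1.0]]
```

The resulting coarse map is then upsampled to the CT slice resolution and overlaid as a heatmap, which is how the nodule-focused attention patterns described above are visualized.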
Advantages, limitations, and future research directions
Advantages
The proposed SE-FusionEAOO Ensemble offers several key advantages over existing methods:
Superior accuracy and robustness: Achieves state-of-the-art 99.40% accuracy on the IQ-OTH/NCCD dataset, outperforming individual models and conventional ensembles by leveraging diverse fusion pairs and EAOO-GA optimized weighting, as evidenced in Tables 5 and 6.
Enhanced interpretability: Integrates SE blocks for channel-wise feature importance and Grad-CAM for visual heatmaps, addressing the “black-box” nature of deep learning and fostering clinical trust.
Effective handling of imbalance: SMOTE significantly improves sensitivity to minority classes (Benign/Normal), with pre- and post-SMOTE comparisons showing gains of 3–4% in overall metrics.
Efficient optimization: The novel EAOO-GA converges quickly (within 100 iterations), providing precise weights that boost generalization without prohibitive costs, as shown in the ablation study (Table 7).
Limitations
Despite these strengths, the framework has some limitations:
Computational overhead: The multi-model fusion and EAOO-GA optimization increase training time (approximately 78 min plus 12 min for the optimization stage) compared to single models, though inference remains fast, as analyzed in Table 8.
Dataset dependency: Trained and optimized primarily on IQ-OTH/NCCD (with external validation on LIDC-IDRI); performance may vary on larger or multi-modal datasets due to transfer learning biases from pre-trained models.
Single-objective focus: EAOO-GA optimizes for accuracy alone; it does not explicitly balance other metrics such as sensitivity or energy efficiency.
Interpretability scope: While improved, full clinical explainability (e.g., causal reasoning) is not addressed.
Future research directions and recommendations
To build on this work, we recommend the following:
Extend to multi-objective optimization in EAOO-GA (e.g., accuracy + sensitivity + runtime) for more versatile applications.
Validate on larger, multi-modal datasets (e.g., CT + histopathology) and real-world clinical trials to assess generalizability.
Explore lightweight variants (e.g., model pruning) for edge deployment in resource-limited settings.
Integrate advanced interpretability techniques, such as SHAP or counterfactual explanations, for deeper clinical insights.
These directions aim to enhance the framework’s applicability and robustness in practical oncology workflows.
Conclusions and future work
This study was motivated by the persistent challenges in automated lung cancer diagnosis, particularly the issues of model generalizability, interpretability, and performance on imbalanced medical datasets. In response, we introduced a novel and comprehensive framework, named the EAOO-GA-Optimized Ensemble, for classifying lung nodules from CT scans. The core of our methodology involved a meticulously designed, two-stage architecture. First, we constructed three powerful feature fusion models by strategically pairing diverse pre-trained architectures: DenseNet201 with EfficientNet-B6, Inception v3 with MobileNetV2, and DenseNet-121 with ResNet-50. This selection was made after rigorous evaluation to ensure maximum architectural diversity, covering residual connections, dense connectivity, compound scaling, and inverted residuals. Each fusion model was further augmented with Squeeze-and-Excitation (SE) blocks, which adaptively recalibrated channel-wise feature responses, allowing the network to dynamically emphasize the most informative patterns indicative of malignancy. The second and most innovative stage involved the intelligent aggregation of these expert models. Rather than relying on simple averaging, we proposed the use of a novel metaheuristic, the Enhanced Animated Oat Optimization algorithm with Genetic Operators (EAOO-GA), to determine the optimal weighting scheme for the ensemble. This algorithm effectively performed a global search to fine-tune the contribution of each fusion model, ensuring that the most accurate and robust predictions were prioritized in the final decision. The entire framework was validated on the IQ-OTH/NCCD lung cancer dataset. A comprehensive pre-processing pipeline, including resizing, normalization, and data augmentation techniques (rotations, flips, etc.), was employed to standardize inputs and enhance model generalization. 
Crucially, the pervasive issue of class imbalance was directly addressed using the Synthetic Minority Over-sampling Technique (SMOTE), which ensured the model was not biased toward the majority class and improved its sensitivity to ‘Benign’ and ‘Normal’ cases.
The experimental results demonstrated the unequivocal superiority of our proposed framework. It achieved a state-of-the-art accuracy of 99.4%, precision of 99.2%, recall of 99.5%, and F1-score of 99.3%, significantly outperforming all individual base models (e.g., DenseNet201 at 95.1% accuracy, 94.3% precision, 95.8% recall, and 95.0% F1-score), the fusion architectures (e.g., Fusion 1 at 96.5% accuracy, 95.9% precision, 97.0% recall, and 96.4% F1-score), conventional ensemble fusion methods (e.g., Mean, Weighted Average), and other state-of-the-art metaheuristic optimizers (including DE, GWO, WOA, and the basic AOO). Furthermore, the integration of Grad-CAM provided compelling visual explanations of the model’s decision-making process, highlighting its focus on clinically relevant nodule features and thereby enhancing transparency and fostering potential clinical trust. In summary, this work makes a significant contribution to the field of medical image analysis by presenting a robust, accurate, and interpretable system that effectively addresses key limitations of existing deep learning models for lung cancer detection. The synergistic combination of feature fusion, SE attention mechanisms, evolutionary optimization, and strategic data handling provides a powerful blueprint for developing reliable computer-aided diagnostic tools. In real-life applications, the SE-FusionEAOO Ensemble addresses critical challenges in lung cancer detection, such as late-stage diagnoses that drastically reduce survival rates (from 56% early to 5% advanced) and impose heavy economic burdens on healthcare systems. By providing highly accurate (99.40%), interpretable, and efficient CT analysis, it enables earlier intervention in underserved areas, reducing misdiagnoses and treatment costs while improving patient outcomes. 
For small industries, including medical startups or rural clinics, the framework’s lightweight design (via transfer learning and parallelizable components) lowers barriers to entry, allowing for cost-effective integration on standard hardware without requiring large-scale resources. This fosters innovation in accessible AI diagnostics and supports equitable healthcare delivery.
While this study demonstrates the strong potential of the proposed SE-FusionEAOO framework for accurate and interpretable lung cancer detection, several well-defined avenues remain open for further exploration. Future research will initially focus on multi-modal data integration, incorporating CT-based imaging features with complementary patient information such as demographics, clinical records, and genomic profiles. This direction is expected to yield a more comprehensive and personalized diagnostic model that better captures the multifactorial nature of lung cancer.
Additionally, we aim to extend the ensemble architecture with transformer-based vision models (e.g., Swin Transformer, ViT variants) to further enhance contextual feature representation and robustness across imaging modalities. Beyond performance gains, we will also pursue the development of advanced interpretability frameworks that move beyond Grad-CAM, leveraging attention attribution maps and concept-based explanations to provide more quantitative, clinically actionable insights for radiologists. A critical next step involves conducting large-scale external validation using multi-institutional and heterogeneous datasets, followed by pilot deployment in real-world clinical settings through collaboration with healthcare professionals. This will allow rigorous assessment of the model’s generalizability, clinical reliability, and operational feasibility. Concurrently, the EAOO-GA optimization component will be further refined and benchmarked across diverse optimization problems to verify its adaptability and convergence stability.
Finally, to support practical deployment, the entire framework will be optimized for computational efficiency and resource-awareness, enabling its use in real-time or resource-constrained environments. Given the algorithm’s domain-agnostic design, we also plan to extend the EAOO-GA-optimized ensemble paradigm to other cancer types and medical imaging applications, thus broadening its clinical impact and translational potential.
This study was motivated by the persistent challenges in automated lung cancer diagnosis, particularly the issues of model generalizability, interpretability, and performance on imbalanced medical datasets. In response, we introduced a novel and comprehensive framework, named the EAOO-GA-Optimized Ensemble, for classifying lung nodules from CT scans. The core of our methodology involved a meticulously designed, two-stage architecture. First, we constructed three powerful feature fusion models by strategically pairing diverse pre-trained architectures: DenseNet201 with EfficientNet-B6, Inception v3 with MobileNetV2, and DenseNet-121 with ResNet-50. This selection was made after rigorous evaluation to ensure maximum architectural diversity, covering residual connections, dense connectivity, compound scaling, and inverted residuals. Each fusion model was further augmented with Squeeze-and-Excitation (SE) blocks, which adaptively recalibrated channel-wise feature responses, allowing the network to dynamically emphasize the most informative patterns indicative of malignancy. The second and most innovative stage involved the intelligent aggregation of these expert models. Rather than relying on simple averaging, we proposed the use of a novel metaheuristic, the Enhanced Animated Oat Optimization algorithm with Genetic Operators (EAOO-GA), to determine the optimal weighting scheme for the ensemble. This algorithm effectively performed a global search to fine-tune the contribution of each fusion model, ensuring that the most accurate and robust predictions were prioritized in the final decision. The entire framework was validated on the IQ-OTH/NCCD lung cancer dataset. A comprehensive pre-processing pipeline, including resizing, normalization, and data augmentation techniques (rotations, flips, etc.), was employed to standardize inputs and enhance model generalization. 
Crucially, the pervasive issue of class imbalance was directly addressed using the Synthetic Minority Over-sampling Technique (SMOTE), which ensured the model was not biased toward the majority class and improved its sensitivity to ‘Benign’ and ‘Normal’ cases.
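The core interpolation step of SMOTE can be sketched in a few lines. This is a simplified, hypothetical illustration of the technique, not the imbalanced-learn implementation typically used in practice: each synthetic sample is placed on the segment between a minority-class point and one of its k nearest minority-class neighbours.

```python
import math
import random

def smote_samples(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples by interpolating
    each chosen point toward one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x among the remaining minority samples
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([xi + lam * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic
```

Because each new point lies between two real minority samples, oversampling populates the minority region of feature space rather than merely duplicating examples, which is what improves sensitivity to the under-represented 'Benign' and 'Normal' classes.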
The experimental results demonstrated the clear superiority of the proposed framework. It achieved a state-of-the-art accuracy of 99.4%, precision of 99.2%, recall of 99.5%, and F1-score of 99.3%, significantly outperforming all individual base models (e.g., DenseNet201 at 95.1% accuracy, 94.3% precision, 95.8% recall, and 95.0% F1-score), the fusion architectures (e.g., Fusion 1 at 96.5% accuracy, 95.9% precision, 97.0% recall, and 96.4% F1-score), conventional ensemble fusion methods (e.g., Mean, Weighted Average), and other state-of-the-art metaheuristic optimizers (including DE, GWO, WOA, and the basic AOO).
Furthermore, the integration of Grad-CAM provided compelling visual explanations of the model's decision-making process, highlighting its focus on clinically relevant nodule features and thereby enhancing transparency and fostering potential clinical trust. In summary, this work contributes to medical image analysis a robust, accurate, and interpretable system that addresses key limitations of existing deep learning models for lung cancer detection. The synergistic combination of feature fusion, SE attention mechanisms, evolutionary optimization, and strategic data handling provides a powerful blueprint for developing reliable computer-aided diagnostic tools.
In real-life applications, the SE-FusionEAOO Ensemble addresses critical challenges in lung cancer detection, such as late-stage diagnoses that drastically reduce survival rates (from 56% at early stages to 5% at advanced stages) and impose heavy economic burdens on healthcare systems. By providing highly accurate (99.4%), interpretable, and efficient CT analysis, it enables earlier intervention in underserved areas, reducing misdiagnoses and treatment costs while improving patient outcomes.
For small industries, including medical startups or rural clinics, the framework’s lightweight design (via transfer learning and parallelizable components) lowers barriers to entry, allowing for cost-effective integration on standard hardware without requiring large-scale resources. This fosters innovation in accessible AI diagnostics and supports equitable healthcare delivery.
While this study demonstrates the strong potential of the proposed SE-FusionEAOO framework for accurate and interpretable lung cancer detection, several well-defined avenues remain open for further exploration. Future research will initially focus on multi-modal data integration, incorporating CT-based imaging features with complementary patient information such as demographics, clinical records, and genomic profiles. This direction is expected to yield a more comprehensive and personalized diagnostic model that better captures the multifactorial nature of lung cancer.
Additionally, we aim to extend the ensemble architecture with transformer-based vision models (e.g., Swin Transformer, ViT variants) to further enhance contextual feature representation and robustness across imaging modalities. Beyond performance gains, we will also pursue the development of advanced interpretability frameworks that move beyond Grad-CAM, leveraging attention attribution maps and concept-based explanations to provide more quantitative, clinically actionable insights for radiologists. A critical next step involves conducting large-scale external validation using multi-institutional and heterogeneous datasets, followed by pilot deployment in real-world clinical settings through collaboration with healthcare professionals. This will allow rigorous assessment of the model’s generalizability, clinical reliability, and operational feasibility. Concurrently, the EAOO-GA optimization component will be further refined and benchmarked across diverse optimization problems to verify its adaptability and convergence stability.
Finally, to support practical deployment, the entire framework will be optimized for computational efficiency and resource-awareness, enabling its use in real-time or resource-constrained environments. Given the algorithm’s domain-agnostic design, we also plan to extend the EAOO-GA-optimized ensemble paradigm to other cancer types and medical imaging applications, thus broadening its clinical impact and translational potential.