Transfer learning with Bayesian optimization for colorectal cancer histopathology classification.
APA
ALGhafri, H. S., & Lim, C. S. (2026). Transfer learning with Bayesian optimization for colorectal cancer histopathology classification. BMC Medical Imaging, 26(1), 74. https://doi.org/10.1186/s12880-026-02149-x
MLA
ALGhafri, H. S., and C. S. Lim. "Transfer Learning with Bayesian Optimization for Colorectal Cancer Histopathology Classification." BMC Medical Imaging, vol. 26, no. 1, 2026, p. 74.
PMID
41526835
Abstract
[BACKGROUND] Automated colorectal cancer (CRC) histopathology classification remains challenging due to variations in datasets, staining conditions, and tissue morphology across institutions. Many prior studies apply standard CNN architectures with fixed hyperparameters, leaving limited examination of how model choice and optimization strategies affect performance robustness across heterogeneous CRC data.
[METHODS] We evaluate eight transfer learning models on three-class CRC datasets and propose CRC-BayTune, which applies Bayesian optimization to tune key training parameters, including learning rate, batch size, and fine-tuning depth. All models are assessed in patch-level experimental settings, and statistical significance is quantified using Friedman tests, repeated-measures ANOVA, and post hoc analyses. Robustness is assessed by introducing controlled Gaussian noise perturbations. Grad-CAM provides qualitative visual explanations by highlighting regions that contribute to model predictions.
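To illustrate the kind of statistical comparison the methods describe, the sketch below runs a Friedman test over per-fold MCC scores of several models using SciPy. The fold scores and model names are synthetic placeholders for illustration only, not the paper's data or its exact analysis pipeline.

```python
# Hypothetical sketch: Friedman test across models' per-fold MCC scores.
# All numbers below are made-up placeholders, not results from the paper.
from scipy.stats import friedmanchisquare

# Each list: MCC per cross-validation fold for one model (synthetic).
densenet201 = [0.984, 0.981, 0.986, 0.983, 0.985]
inceptionv3 = [0.982, 0.979, 0.984, 0.981, 0.980]
resnet50v2  = [0.983, 0.980, 0.985, 0.982, 0.984]

# Non-parametric test of whether the models' rankings differ across folds.
stat, p = friedmanchisquare(densenet201, inceptionv3, resnet50v2)
print(f"Friedman chi-square = {stat:.3f}, p = {p:.4f}")
```

A small p-value here would indicate that model choice significantly affects per-fold performance, which is the style of claim the abstract's statistical analysis makes.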
[RESULTS] DenseNet201, InceptionV3, InceptionResNetV2, and ResNet50V2 achieved the highest median MCC values of 0.984, 0.982, 0.975, and 0.983, respectively. Statistical analysis confirms that both model architecture ([Formula: see text], Friedman) and hyperparameter configuration ([Formula: see text], RM-ANOVA) significantly affect performance. Models with deeper feature hierarchies demonstrated more stable convergence and smaller accuracy degradation under noise.
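The headline metric in these results is the Matthews correlation coefficient (MCC). As a minimal sketch of how MCC is computed for a three-class task like this one, using scikit-learn on synthetic labels (not the paper's predictions):

```python
# Illustration of the Matthews correlation coefficient (MCC) for a
# three-class problem; labels are synthetic examples, not study data.
from sklearn.metrics import matthews_corrcoef

y_true = [0, 0, 1, 1, 2, 2, 2, 1]  # ground-truth class per patch
y_pred = [0, 0, 1, 1, 2, 2, 1, 1]  # model prediction per patch

mcc = matthews_corrcoef(y_true, y_pred)
print(f"MCC = {mcc:.3f}")
```

MCC ranges from -1 to 1 and accounts for all cells of the confusion matrix, which is why values near 0.98 indicate near-perfect multiclass agreement.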
[CONCLUSION] The results show that systematic hyperparameter tuning can improve the training stability and classification performance of standard CNN models compared with fixed configurations in CRC histopathology tasks. The findings underscore that model performance in this setting is sensitive to choices such as learning rate, batch size, and fine-tuning depth, and that evaluating these factors explicitly can support more reliable use of deep learning models in computational pathology.