Assessing CNNs and LoRA-Fine-Tuned Vision-Language Models for Breast Cancer Histopathology Image Classification.
OpenAlex topics ·
AI in cancer detection
Advanced Neural Network Applications
Digital Imaging for Blood Diseases
APA
Zhaksylyk, T. M., Abdikenov, B., et al. (2026). Assessing CNNs and LoRA-fine-tuned vision-language models for breast cancer histopathology image classification. Journal of Imaging, 12(4). https://doi.org/10.3390/jimaging12040168
MLA
Zhaksylyk, Tomiris M., et al. "Assessing CNNs and LoRA-Fine-Tuned Vision-Language Models for Breast Cancer Histopathology Image Classification." Journal of Imaging, vol. 12, no. 4, 2026.
PMID
42042511
Abstract
Breast cancer histopathology classification remains a fundamental challenge in computational pathology due to variations in tissue morphology across magnification levels. Convolutional neural networks (CNNs) have long been the standard for image-based diagnosis, yet recent advances in vision-language models (VLMs) suggest they may provide strong and transferable representations for complex medical images. In this study, we present a systematic comparison between CNN baselines and large VLMs (Qwen2 and SmolVLM) fine-tuned with Low-Rank Adaptation (LoRA; r = 16, α = 32, dropout = 0.05) on the BreakHis dataset. Models were evaluated at 40×, 100×, 200×, and 400× magnifications using accuracy, precision, recall, F1-score, and area under the ROC curve (AUC). While Qwen2 achieved moderate performance across magnifications (e.g., 0.8736 accuracy and 0.9552 AUC at 200×), SmolVLM consistently outperformed Qwen2 and substantially narrowed the gap with CNN baselines, reaching up to 0.9453 accuracy and 0.9572 F1-score at 200×, approaching the performance of AlexNet (0.9543 accuracy) at the same magnification. CNN baselines, particularly ResNet34, remained the strongest models overall, achieving the highest performance across all magnifications (e.g., 0.9879 accuracy and 0.9984 AUC at 40×). These findings demonstrate that LoRA-fine-tuned VLMs, despite requiring gradient accumulation and memory-efficient optimizers and operating with a significantly smaller number of trainable parameters, can achieve competitive performance relative to traditional CNNs. However, CNN-based architectures still provide the highest accuracy and robustness for histopathology classification. Our results highlight the potential of VLMs as parameter-efficient alternatives for digital pathology tasks, particularly in resource-constrained settings.
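The LoRA scheme cited in the abstract freezes each pretrained weight matrix and learns only a low-rank correction, scaled by α/r. A minimal NumPy sketch of that update, using the paper's r = 16 and α = 32 (the matrix dimensions and initialization scale are illustrative assumptions, not from the paper):

```python
import numpy as np

# LoRA: W_eff = W + (alpha / r) * (B @ A), where only A and B are trained.
# r = 16 and alpha = 32 follow the paper; d and the init scale are illustrative.
rng = np.random.default_rng(0)
d, r, alpha = 64, 16, 32

W = rng.standard_normal((d, d))          # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01   # trainable rank-r factor (small random init)
B = np.zeros((d, r))                     # trainable; zero init so training starts at W

W_eff = W + (alpha / r) * (B @ A)
assert np.allclose(W_eff, W)             # zero-init B: model is unchanged before training
print(W_eff.shape)                       # → (64, 64)
```

Because only A and B (d·r + r·d entries) receive gradients, the trainable parameter count scales with r rather than d², which is why the VLMs in the study can be adapted with a "significantly smaller number of trainable parameters" than full fine-tuning.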