Assessing CNNs and LoRA-Fine-Tuned Vision-Language Models for Breast Cancer Histopathology Image Classification.
OpenAlex topics ·
AI in cancer detection
Advanced Neural Network Applications
Digital Imaging for Blood Diseases
APA
Zhaksylyk, T. M., Abdikenov, B., et al. (2026). Assessing CNNs and LoRA-fine-tuned vision-language models for breast cancer histopathology image classification. Journal of Imaging, 12(4). https://doi.org/10.3390/jimaging12040168
MLA
Zhaksylyk, Tomiris M., et al. "Assessing CNNs and LoRA-Fine-Tuned Vision-Language Models for Breast Cancer Histopathology Image Classification." Journal of Imaging, vol. 12, no. 4, 2026.
PMID
42042511
Abstract
Breast cancer histopathology classification remains a fundamental challenge in computational pathology due to variations in tissue morphology across magnification levels. Convolutional neural networks (CNNs) have long been the standard for image-based diagnosis, yet recent advances in vision-language models (VLMs) suggest they may provide strong and transferable representations for complex medical images. In this study, we present a systematic comparison between CNN baselines and large VLMs (Qwen2 and SmolVLM) fine-tuned with Low-Rank Adaptation (LoRA; r = 16, α = 32, dropout = 0.05) on the BreakHis dataset. Models were evaluated at 40×, 100×, 200×, and 400× magnifications using accuracy, precision, recall, F1-score, and area under the ROC curve (AUC). While Qwen2 achieved moderate performance across magnifications (e.g., 0.8736 accuracy and 0.9552 AUC at 200×), SmolVLM consistently outperformed Qwen2 and substantially narrowed the gap with CNN baselines, reaching up to 0.9453 accuracy and 0.9572 F1-score at 200×, approaching the performance of AlexNet (0.9543 accuracy) at the same magnification. CNN baselines, particularly ResNet34, remained the strongest models overall, achieving the highest performance across all magnifications (e.g., 0.9879 accuracy and 0.9984 AUC at 40×). These findings demonstrate that LoRA-fine-tuned VLMs, despite requiring gradient accumulation and memory-efficient optimizers and operating with a significantly smaller number of trainable parameters, can achieve competitive performance relative to traditional CNNs. However, CNN-based architectures still provide the highest accuracy and robustness for histopathology classification. Our results highlight the potential of VLMs as parameter-efficient alternatives for digital pathology tasks, particularly in resource-constrained settings.
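The LoRA scheme cited in the abstract freezes each pretrained weight matrix and learns only a low-rank correction, scaled by α/r. A minimal NumPy sketch of that update, using the paper's r = 16 and α = 32 (the matrix dimensions and initialization scale are illustrative assumptions, not from the paper):

```python
import numpy as np

# LoRA: W_eff = W + (alpha / r) * (B @ A), where only A and B are trained.
# r = 16 and alpha = 32 follow the paper; d and the init scale are illustrative.
rng = np.random.default_rng(0)
d, r, alpha = 64, 16, 32

W = rng.standard_normal((d, d))          # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01   # trainable rank-r factor (small random init)
B = np.zeros((d, r))                     # trainable; zero init so training starts at W

W_eff = W + (alpha / r) * (B @ A)
assert np.allclose(W_eff, W)             # zero-init B: model is unchanged before training
print(W_eff.shape)                       # → (64, 64)
```

Because only A and B (d·r + r·d entries) receive gradients, the trainable parameter count scales with r rather than d², which is why the VLMs in the study can be adapted with a "significantly smaller number of trainable parameters" than full fine-tuning.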