
Leveraging pretrained vision-language model for enhanced breast cancer diagnosis with multi-view mammography.

Medical Physics · journal (OA 33.8%) · 2026, Vol. 53(1), p. e70261


Chen X, Li Y, Hu M, Salari E, Chen X, Qiu RLJ, Zheng B, Yang X


Cite this paper

APA Chen X, Li Y, et al. (2026). Leveraging pretrained vision-language model for enhanced breast cancer diagnosis with multi-view mammography. Medical Physics, 53(1), e70261. https://doi.org/10.1002/mp.70261
MLA Chen X, et al. "Leveraging Pretrained Vision-Language Model for Enhanced Breast Cancer Diagnosis with Multi-View Mammography." Medical Physics, vol. 53, no. 1, 2026, p. e70261.
PMID 41532302
DOI 10.1002/mp.70261

Abstract

[BACKGROUND] Although fusing information from multiple mammographic views plays an important role in increasing the accuracy of breast cancer detection, developing multi-view mammogram-based computer-aided diagnosis (CAD) schemes still faces major challenges, and no such scheme has been adopted in clinical practice.

[PURPOSE] To overcome these challenges, we investigate a new approach based on contrastive language-image pre-training (CLIP), which has sparked interest across various medical imaging tasks. The aim is to address two problems: (1) effectively adapting the single-view CLIP model for multi-view feature fusion, and (2) efficiently fine-tuning this parameter-dense model with limited samples and computational resources.

[METHODS] We introduce Mammo-CLIP, the first multi-modal framework to process multi-view mammograms together with corresponding simple texts. Mammo-CLIP uses an early feature-fusion strategy to learn multi-view relationships across the four mammograms acquired from the craniocaudal (CC) and mediolateral oblique (MLO) views of the left and right breasts. To enhance learning efficiency, plug-and-play adapters are added to CLIP's image and text encoders so that fine-tuning updates only about 1% of the parameters. For evaluation, we retrospectively assembled two datasets. The first, comprising 470 malignant and 479 benign cases, was used for few-shot fine-tuning and internal evaluation of Mammo-CLIP via 5-fold cross-validation. The second, including 60 malignant and 294 benign cases, was used to test the generalizability of Mammo-CLIP.
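The parameter-efficient fine-tuning described above can be illustrated with a generic bottleneck adapter: a small down-project/up-project residual layer inserted into a frozen encoder, so that only the adapter weights are trained. This is a minimal NumPy sketch of the idea, not the authors' implementation; the feature dimension, bottleneck size, initialization, and the "one embedding per view" layout are all assumptions.

```python
import numpy as np

# Bottleneck adapter: down-project, nonlinearity, up-project, residual skip.
def adapter(x, w_down, w_up):
    h = np.maximum(x @ w_down, 0.0)  # ReLU inside the low-rank bottleneck
    return x + h @ w_up              # residual connection around the adapter

d, r = 512, 16                       # encoder feature dim and bottleneck dim (assumed)
rng = np.random.default_rng(0)
w_down = rng.normal(scale=0.02, size=(d, r))
w_up = np.zeros((r, d))              # zero-init: adapter starts as the identity map

# One feature vector per view: L-CC, L-MLO, R-CC, R-MLO
x = rng.normal(size=(4, d))
out = adapter(x, w_down, w_up)

# Trainable adapter parameters relative to one full d x d projection layer:
ratio = 2 * d * r / (d * d)          # 0.0625 here; scaled model-wide this can reach ~1%
```

Zero-initializing the up-projection is a common adapter trick: the frozen model's behavior is unchanged at the start of fine-tuning, and the adapter gradually learns a task-specific correction.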

[RESULTS] Mammo-CLIP outperforms the state-of-the-art (SOTA) cross-view transformer, evaluated using areas under the ROC curve (AUC = 0.841 ± 0.017 vs. 0.817 ± 0.012, and 0.837 ± 0.034 vs. 0.807 ± 0.036) on the two datasets. It also surpasses two previous CLIP-based methods by 20.3% and 14.3% in AUC.

[CONCLUSIONS] The proposed Mammo-CLIP demonstrates superior breast cancer diagnosis performance compared to SOTA methods. This study highlights the potential of applying fine-tuned vision-language models to develop multi-view, image-text-based CAD schemes for breast cancer.
