Vision-language model-based semantic-guided imaging biomarker for lung nodule malignancy prediction.
APA
Zhuang, L., Tabatabaei, S. M. H., et al. (2025). Vision-language model-based semantic-guided imaging biomarker for lung nodule malignancy prediction. Journal of Biomedical Informatics, 172, 104947. https://doi.org/10.1016/j.jbi.2025.104947
MLA
Zhuang, L., et al. "Vision-language model-based semantic-guided imaging biomarker for lung nodule malignancy prediction." Journal of Biomedical Informatics, vol. 172, 2025, p. 104947.
PMID
41161557
Abstract
[OBJECTIVE] Machine learning models have utilized semantic features, deep features, or both to assess lung nodule malignancy. However, their reliance on manual annotation during inference, limited interpretability, and sensitivity to imaging variations hinder their application in real-world clinical settings. Thus, this research aims to integrate semantic features derived from radiologists' assessments of nodules, guiding the model to learn clinically relevant, robust, and explainable imaging features for predicting lung cancer.
[METHODS] We obtained 938 low-dose CT scans from the National Lung Screening Trial (NLST) with 1,261 nodules and their semantic features. Additionally, the Lung Image Database Consortium dataset contains 1,018 CT scans, with 2,625 lesions annotated for nodule characteristics. Three external datasets were obtained from UCLA Health, the LUNGx Challenge, and the Duke Lung Cancer Screening. For imaging input, we extracted 2D nodule slices in nine directions from a 50×50×50 mm nodule crop. We converted structured semantic features into sentences using Gemini. We fine-tuned a pretrained Contrastive Language-Image Pretraining (CLIP) model with a parameter-efficient fine-tuning approach to align imaging and semantic text features and to predict the one-year lung cancer diagnosis.
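The paper uses Gemini to turn structured semantic features into sentences; a minimal template-based sketch illustrates the same idea (the feature names and values below are illustrative placeholders, not the study's exact annotation schema):

```python
# Hypothetical sketch: render a radiologist's structured nodule assessment
# as a natural-language prompt for a vision-language model. The study used
# Gemini for this conversion; a simple template conveys the concept.

def features_to_sentence(features: dict) -> str:
    """Join feature name/value pairs into one descriptive sentence."""
    parts = [f"{name.replace('_', ' ')} is {value}" for name, value in features.items()]
    return "A lung nodule whose " + ", ".join(parts) + "."

sentence = features_to_sentence({
    "margin": "spiculated",
    "consistency": "solid",
    "pleural_attachment": "present",
})
print(sentence)
# A lung nodule whose margin is spiculated, consistency is solid, pleural attachment is present.
```

Such sentences can then be fed to CLIP's text encoder so that image and text embeddings of the same nodule are pulled together during fine-tuning.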
[RESULTS] Our model outperformed the state-of-the-art (SOTA) models in the NLST test set with an AUROC of 0.901 and AUPRC of 0.776. It also showed robust results in external datasets. Using CLIP, we also obtained predictions on semantic features through zero-shot inference, such as nodule margin (AUROC: 0.807), nodule consistency (0.812), and pleural attachment (0.840).
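The zero-shot predictions above follow the standard CLIP recipe: encode candidate text prompts for each value of a semantic feature, then choose the prompt whose embedding is most similar to the image embedding. A minimal sketch with hand-made toy vectors (standing in for real CLIP encoder outputs, which they are not):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def zero_shot(image_emb, prompt_embs):
    """Return the candidate prompt most similar to the image embedding."""
    return max(prompt_embs, key=lambda label: cosine(image_emb, prompt_embs[label]))

# Toy embeddings for two candidate descriptions of nodule margin.
prompts = {
    "nodule margin is smooth": [0.9, 0.1, 0.0],
    "nodule margin is spiculated": [0.1, 0.9, 0.2],
}
image_embedding = [0.2, 0.8, 0.1]  # stand-in for a CT-slice encoder output
print(zero_shot(image_embedding, prompts))  # nodule margin is spiculated
```

In the actual model the embeddings come from the fine-tuned CLIP image and text encoders, and the per-feature similarity scores yield the reported AUROCs.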
[CONCLUSION] By incorporating semantic features into the vision-language model, our approach surpasses the SOTA models in predicting lung cancer from CT scans collected from diverse clinical settings. It provides explainable outputs, aiding clinicians in comprehending the underlying meaning of model predictions. The code is available at https://github.com/luotingzhuang/CLIP_nodule.