Multi-center benchmarking of large language models for clinical decision support in lung cancer screening.

Cell reports. Medicine, 2025, Vol. 6(12), p. 102465

Duan Z, Huang X, Lu R, Xu W, Liu H, Geng Y, Takahashi N, Wu Y, Wang Q, Song Y, Xu H, Tang H, Lan F, Eils R, Tan L


Key clinical statistics (automatically extracted from the abstract; verification against the original text is recommended)
  • Study design: cross-sectional

Cite this paper

APA Duan Z, Huang X, et al. (2025). Multi-center benchmarking of large language models for clinical decision support in lung cancer screening. Cell reports. Medicine, 6(12), 102465. https://doi.org/10.1016/j.xcrm.2025.102465
MLA Duan Z, et al. "Multi-center benchmarking of large language models for clinical decision support in lung cancer screening." Cell reports. Medicine, vol. 6, no. 12, 2025, p. 102465.
PMID 41274285

Abstract

Large language models (LLMs) are increasingly explored for clinical applications, but their ability to generate management recommendations for lung cancer screening remains uncertain. In this cross-sectional, multi-center study, 148 anonymized low-dose computed tomography (CT) reports from three healthcare institutions are used to assess the readability, accuracy, and consistency of four widely adopted models (GPT-3.5, GPT-4, Claude 3 Sonnet, and Claude 3 Opus). Among them, Claude 3 Opus produces the most readable recommendations, while GPT-4 achieves the highest clinical accuracy. Importantly, performance does not differ significantly across institutions, underscoring the robustness of these models to variations in reporting templates and their utility in diverse healthcare settings. In an exploratory analysis, two state-of-the-art models, proprietary GPT-4o and its open-source counterpart DeepSeek-R1, show comparable performance to GPT-4, outperforming GPT-3.5. These findings highlight the potential role of LLMs to enhance clinical decision support in lung cancer screening across diverse healthcare settings.
