Comparative Performance of 3 Artificial Intelligence Systems for Lung Nodule Characterization in Low-Dose Computed Tomography Screening.
APA
Khurelsukh K, Lin YP, et al. (2026). Comparative Performance of 3 Artificial Intelligence Systems for Lung Nodule Characterization in Low-Dose Computed Tomography Screening. Journal of Thoracic Imaging. https://doi.org/10.1097/RTI.0000000000000877
MLA
Khurelsukh K, et al. "Comparative Performance of 3 Artificial Intelligence Systems for Lung Nodule Characterization in Low-Dose Computed Tomography Screening." Journal of Thoracic Imaging, 2026.
PMID
41815002
Abstract
[PURPOSE] This study evaluates 3 artificial intelligence (AI) systems in detecting, characterizing, and classifying lung nodules on low-dose computed tomography (LDCT) scans of 100 subjects, assessing agreement with a reference standard and inter-vendor consistency.
[MATERIALS AND METHODS] The performance of 3 commercially available AI platforms (AI 1, AI 2, and AI 3) was retrospectively analyzed against evaluations by 2 thoracic radiologists, with discordances resolved by consensus to form the reference standard. Agreement was assessed for nodule presence, type (solid, part-solid, ground-glass), and Lung-RADS category using Cohen's kappa. Agreement for continuous measurements (nodule diameter and volume) across AI systems was evaluated using intraclass correlation coefficients (ICC). Group comparisons for continuous variables were performed using the Kruskal-Wallis test, with Mann-Whitney U tests for post hoc pairwise comparisons. Categorical variables were compared using χ² tests. Bland-Altman analysis evaluated variability in diameter and volume measurements.
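As an illustration of the categorical agreement statistic named above, the following is a minimal sketch of Cohen's kappa for two raters, written in plain Python. It is not the authors' analysis code: the function name `cohen_kappa` and the label lists `reference` and `ai_system` are hypothetical, and the labels are invented for demonstration only.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items labelled identically by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical nodule-type labels (illustrative only, not study data).
reference = ["solid", "solid", "part-solid", "ground-glass",
             "solid", "ground-glass", "part-solid", "solid"]
ai_system = ["solid", "solid", "solid", "ground-glass",
             "solid", "part-solid", "part-solid", "solid"]

kappa = cohen_kappa(reference, ai_system)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw percent agreement for the agreement expected by chance, which is why it can be much lower than the simple proportion of matching labels when one category dominates.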
[RESULTS] The 3 AI systems detected 435, 152, and 70 nodules, respectively, whereas radiologists identified 126 nodules (P<0.001). Sensitivity, specificity, and accuracy were 77.0%, 8.2%, and 25.7% for AI 1; 72.2%, 83.4%, and 80.6% for AI 2; and 42.9%, 95.7%, and 82.2% for AI 3. Agreement with the reference standard was perfect for AI 2 and almost perfect for AI 3, but absent for AI 1. Inter-AI agreement was substantial (κ=0.66 to 0.78), and diameter/volume measurements showed moderate to good reliability (ICC=0.57 to 0.87).
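The relationship between sensitivity, specificity, and accuracy reported above can be sketched from a 2x2 confusion table. The example below is a hedged illustration: the abstract does not state how true negatives were counted, so the function `diagnostic_metrics` and all counts passed to it are hypothetical and do not reproduce the study's data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, and accuracy from 2x2 confusion counts."""
    positives = tp + fn  # cases positive by the reference standard
    negatives = tn + fp  # cases negative by the reference standard
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "sensitivity": tp / positives,
        "specificity": tn / negatives,
        "accuracy": (tp + tn) / (positives + negatives),
    }


# Hypothetical counts (illustrative only, not study data).
metrics = diagnostic_metrics(tp=90, fp=10, fn=30, tn=170)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Because accuracy pools positives and negatives, a system that flags many false positives can combine high sensitivity with low specificity and low accuracy, the pattern the abstract reports for AI 1.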
[CONCLUSION] Commercial AI systems show variable performance in nodule detection and classification, underscoring the need for users to understand each system's characteristics and interpret results within clinical context.