Artificial intelligence for TNM staging in NSCLC: a critical appraisal of segmentation utility in [¹⁸F]FDG PET/CT.
PICO (automatic heuristic extraction, confidence 2/4)
P · Population
306 treatment-naïve NSCLC patients who underwent [¹⁸F]FDG PET/CT (retrospective, single-centre).
I · Intervention
Not extracted.
C · Comparison
Not extracted.
O · Outcome
On a lesion level, M-stage false positives and undersegmentation in the hilar region emerged as the main drivers of clinically relevant upstaging. Despite promising lesion detection sensitivity, only 67.7% of UICC stagings were accurate using AI masks, indicating that diagnostic AI may support, though not yet replace, manual lesion evaluation in NSCLC [¹⁸F]FDG PET/CT.
- Sample size (n): 306 patients
- Sensitivity (lesion level): 95.8%
APA
Heimer, M. M., Dexl, J., et al. (2026). Artificial intelligence for TNM staging in NSCLC: A critical appraisal of segmentation utility in [¹⁸F]FDG PET/CT. European Journal of Nuclear Medicine and Molecular Imaging, 53(5), 3117-3127. https://doi.org/10.1007/s00259-025-07677-2
MLA
Heimer, M. M., et al. "Artificial intelligence for TNM staging in NSCLC: a critical appraisal of segmentation utility in [¹⁸F]FDG PET/CT." European Journal of Nuclear Medicine and Molecular Imaging, vol. 53, no. 5, 2026, pp. 3117-3127.
PMID
41275455
Abstract
[PURPOSE] This study aims to investigate whether a diagnostic AI model can effectively support lesion detection and staging in non-small cell lung cancer (NSCLC) [¹⁸F]FDG PET/CT studies, focusing on the distinction between technical segmentation accuracy and clinically meaningful performance.
[METHODS] In this retrospective single-centre study, [⁸F]FDG PET/CT scans from 306 treatment-naïve NSCLC patients were reviewed with reference to multidisciplinary team decisions. Tumour lesions were manually segmented for reference and compared with predictions from the top-performing algorithm of the autoPET III challenge. Quantitative segmentation metrics were calculated, and lesion-level errors were assessed for impact on patient-level TNM and UICC staging.
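The abstract does not say how predicted and reference lesions were matched or how the mean DSC was aggregated. A minimal sketch, assuming masks are represented as sets of voxel coordinates, lesions are already split into separate sets, and any voxel overlap counts as a match, shows how Dice and lesion-level sensitivity and precision relate to each other. The names `dice` and `lesion_metrics` and the any-overlap rule are hypothetical illustrations, not the study's code or the autoPET III evaluation protocol.

```python
"""Hypothetical sketch of Dice and lesion-level sensitivity/precision.

Assumptions (not stated in the abstract): masks are sets of voxel
coordinates, lesions are already separated into individual sets, and any
voxel overlap between a predicted and a reference lesion counts as a match.
"""
from typing import List, Set, Tuple

Voxel = Tuple[int, int, int]
Lesion = Set[Voxel]


def dice(reference: Set[Voxel], prediction: Set[Voxel]) -> float:
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    total = len(reference) + len(prediction)
    if total == 0:
        return 1.0  # both masks empty: treated here as perfect agreement
    return 2 * len(reference & prediction) / total


def lesion_metrics(reference: List[Lesion], predicted: List[Lesion]) -> Tuple[float, float]:
    """Return (sensitivity, precision) under an any-overlap matching rule."""
    # Reference lesions touched by at least one prediction count as detected.
    detected = sum(1 for ref in reference if any(ref & pred for pred in predicted))
    # Predicted lesions touching no reference lesion are false positives.
    true_positive_preds = sum(1 for pred in predicted if any(pred & ref for ref in reference))
    sensitivity = detected / len(reference) if reference else float("nan")
    precision = true_positive_preds / len(predicted) if predicted else float("nan")
    return sensitivity, precision


if __name__ == "__main__":
    # Toy example: one reference lesion is found, one is missed,
    # and one prediction is a false positive.
    ref = [{(0, 0, 0), (0, 0, 1)}, {(5, 5, 5)}]
    pred = [{(0, 0, 1), (0, 0, 2)}, {(9, 9, 9)}]
    print(lesion_metrics(ref, pred))  # (0.5, 0.5)
    print(dice(set().union(*ref), set().union(*pred)))
```

The toy case illustrates the point made in the purpose statement: a prediction can score well on lesion detection while overlap-based Dice stays modest, because partial overlap is enough for a lesion-level match.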
[RESULTS] The algorithm achieved a mean Dice Similarity Coefficient (DSC) of 0.64. Lesion-level sensitivity was 95.8% across all patients, with a precision of 87.5%. False-positive M-category lesions (n = 196) were the most frequent error. Of all false positives, 35.7% were benign and 34.7% were non-oncologic pathologies. UICC staging matched ground truth in 207/306 patients, with most discordances due to upstaging (88/306).
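The patient-level counts reported in the results can be turned into rates with simple arithmetic. The worked values below use only the numbers given in the abstract; the category of the remaining false positives is not stated there.

```latex
\begin{aligned}
\text{UICC concordance} &= \frac{207}{306} \approx 0.6765 \\
\text{discordant patients} &= 306 - 207 = 99 \\
\text{upstaged, share of all patients} &= \frac{88}{306} \approx 0.2876 \\
\text{upstaged, share of discordant patients} &= \frac{88}{99} \approx 0.8889 \\
\text{remaining false positives (category not given)} &= 100\% - 35.7\% - 34.7\% = 29.6\%
\end{aligned}
```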
[CONCLUSION] Clinically driven metrics and cause-based error analysis offer valuable insight into AI segmentation performance. The evaluated model showed excellent lesion sensitivity but a tendency towards systematic overprediction across TNM categories. On a lesion level, M-stage false positives and undersegmentation in the hilar region emerged as the main drivers of clinically relevant upstaging. Despite promising lesion detection sensitivity, only 67.7% of UICC stagings were accurate using AI masks, indicating that diagnostic AI may support, though not yet replace, manual lesion evaluation in NSCLC [¹⁸F]FDG PET/CT.
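The conclusion links lesion-level M-category false positives to patient-level upstaging. In the UICC 8th edition, any M1 category places a patient in stage IV, so a single spurious distant lesion can shift the stage group sharply. A minimal sketch of how concordance and the direction of discordance might be tallied, assuming a simplified ordered list of stage groups (IA sub-groups collapsed); `STAGE_ORDER`, `classify` and `summarize` are hypothetical names, not the authors' code.

```python
"""Hypothetical sketch: direction of patient-level stage discordance.

Assumption: a simplified ordering of UICC 8th-edition NSCLC stage groups
(IA sub-groups collapsed); the study's exact grouping is not given in the
abstract.
"""
from typing import Dict, Iterable, Tuple

# Simplified stage groups, ordered from lowest to highest (hypothetical list).
STAGE_ORDER = ("IA", "IB", "IIA", "IIB", "IIIA", "IIIB", "IIIC", "IVA", "IVB")
RANK: Dict[str, int] = {stage: i for i, stage in enumerate(STAGE_ORDER)}


def classify(reference: str, ai: str) -> str:
    """Compare the AI-derived stage with the reference stage."""
    if RANK[ai] > RANK[reference]:
        return "upstaged"
    if RANK[ai] < RANK[reference]:
        return "downstaged"
    return "concordant"


def summarize(pairs: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count concordant, upstaged and downstaged patients."""
    counts = {"concordant": 0, "upstaged": 0, "downstaged": 0}
    for reference, ai in pairs:
        counts[classify(reference, ai)] += 1
    return counts


if __name__ == "__main__":
    # A single false-positive distant (M1) lesion can move a stage IIIA
    # patient to stage IVA, since any M1 category maps to stage IV.
    toy = [("IIIA", "IVA"), ("IB", "IB"), ("IIB", "IIA")]
    print(summarize(toy))  # {'concordant': 1, 'upstaged': 1, 'downstaged': 1}
```

Ranking stages on an ordinal scale keeps the comparison independent of how individual T, N and M errors combine, which matches the abstract's focus on patient-level UICC agreement rather than per-category accuracy.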
MeSH Terms
Humans; Positron Emission Tomography Computed Tomography; Fluorodeoxyglucose F18; Carcinoma, Non-Small-Cell Lung; Lung Neoplasms; Neoplasm Staging; Female; Male; Artificial Intelligence; Middle Aged; Aged; Retrospective Studies; Image Processing, Computer-Assisted; Aged, 80 and over; Adult; Radiopharmaceuticals