Radiomic approach to support multidisciplinary tumor board decision-making in locally advanced non-small cell lung cancer.
1/5 보강
PICO 자동 추출 (휴리스틱, conf 2/4)
유사 논문P · Population 대상 환자/모집단
환자: LA-NSCLC whose treatments had been decided by an expert MTB
I · Intervention 중재 / 시술
추출되지 않음
C · Comparison 대조 / 비교
추출되지 않음
O · Outcome 결과 / 결론
[CONCLUSIONS] ML models combining clinicopathological and radiomic features can reproduce MTB treatment recommendations for LA-NSCLC with good accuracy. This approach may provide decision in settings with limited MTB expertise and promote more consistent treatment allocation.
[BACKGROUND AND OBJECTIVE] Selecting the optimal treatment for locally advanced non-small cell lung cancer (LA-NSCLC) is complex and typically requires multidisciplinary tumor board (MTB) evaluation.
APA
Pasello G, Kotler H, et al. (2025). Radiomic approach to support multidisciplinary tumor board decision-making in locally advanced non-small cell lung cancer.. Frontiers in oncology, 15, 1713847. https://doi.org/10.3389/fonc.2025.1713847
MLA
Pasello G, et al.. "Radiomic approach to support multidisciplinary tumor board decision-making in locally advanced non-small cell lung cancer.." Frontiers in oncology, vol. 15, 2025, pp. 1713847.
PMID
41487562 ↗
Abstract 한글 요약
[BACKGROUND AND OBJECTIVE] Selecting the optimal treatment for locally advanced non-small cell lung cancer (LA-NSCLC) is complex and typically requires multidisciplinary tumor board (MTB) evaluation. This study investigated whether machine learning (ML) models trained on MTB decisions could support treatment selection by integrating clinicopathological characteristics with radiomic features from both the primary tumor and mediastinal lymph nodes (LN).
[MATERIALS AND METHODS] We retrospectively analyzed patients with LA-NSCLC whose treatments had been decided by an expert MTB. Patients were categorized into three pathways: (A) upfront surgery, (B) neoadjuvant systemic treatment followed by surgery, (C) concurrent chemoradiotherapy. Baseline CT scans were segmented to extract radiomic features from primary tumors and mediastinal LNs. Two ML models were developed based on clinicopathological and radiomic data, using MTB decisions as ground truth: (1) A vs. Rest and (2) B vs. C. Performance was assessed in independent training and test cohorts using the area under the receiver operating characteristic curve (AUC) and accuracy.
[RESULTS] In the training cohort, the A vs. Rest achieved an AUC of 0.847 and accuracy of 0.795 with 13 features, while the B vs. C model reached an AUC of 0.740 and accuracy of 0.700 with 9 features. In the test cohort, results remained robust, with an AUC of 0.808 (accuracy 0.700) for A vs. Rest and an AUC of 0.754 (accuracy 0.740) for B vs. C.
[CONCLUSIONS] ML models combining clinicopathological and radiomic features can reproduce MTB treatment recommendations for LA-NSCLC with good accuracy. This approach may provide decision in settings with limited MTB expertise and promote more consistent treatment allocation.
[MATERIALS AND METHODS] We retrospectively analyzed patients with LA-NSCLC whose treatments had been decided by an expert MTB. Patients were categorized into three pathways: (A) upfront surgery, (B) neoadjuvant systemic treatment followed by surgery, (C) concurrent chemoradiotherapy. Baseline CT scans were segmented to extract radiomic features from primary tumors and mediastinal LNs. Two ML models were developed based on clinicopathological and radiomic data, using MTB decisions as ground truth: (1) A vs. Rest and (2) B vs. C. Performance was assessed in independent training and test cohorts using the area under the receiver operating characteristic curve (AUC) and accuracy.
[RESULTS] In the training cohort, the A vs. Rest achieved an AUC of 0.847 and accuracy of 0.795 with 13 features, while the B vs. C model reached an AUC of 0.740 and accuracy of 0.700 with 9 features. In the test cohort, results remained robust, with an AUC of 0.808 (accuracy 0.700) for A vs. Rest and an AUC of 0.754 (accuracy 0.740) for B vs. C.
[CONCLUSIONS] ML models combining clinicopathological and radiomic features can reproduce MTB treatment recommendations for LA-NSCLC with good accuracy. This approach may provide decision in settings with limited MTB expertise and promote more consistent treatment allocation.
🏷️ 키워드 / MeSH 📖 같은 키워드 OA만
같은 제1저자의 인용 많은 논문 (1)
📖 전문 본문 읽기 PMC JATS · ~43 KB · 영문
Introduction
1
Introduction
Lung cancer (LC) is the second most commonly diagnosed cancer and the leading cause of cancer-related mortality worldwide, with high death rates observed in both men and women (1). Non-small cell lung cancer (NSCLC) accounts for approximately 85% of all LC cases, and about 30% of these patients present with locally advanced (LA) disease (stage III according to the eight edition of the TNM classification) at diagnosis. LA-NSCLC is a highly heterogeneous condition, with wide variations in tumor size, local extension, and nodal involvement, making optimal treatment selection a persistent challenge in thoracic oncology (2–4). Although LA-NSCLC is potentially curable, it remains associated with a high tumor burden and poor outcomes, with 5-year survival rates of 36%, 26%, and 13% for stages IIIA, IIIB, and IIIC, respectively (5). This scenario underscores the need for treatment standardization, careful optimization, ideally guided by experienced multidisciplinary teams (6).
Multidisciplinary tumor boards (MTBs) play a crucial role in the management of LA-NSCLC. However, heterogeneity in MTB expertise, case volumes, and access to advanced surgical, radiotherapy, and systemic treatment options can influence the appropriateness of treatment strategies and, ultimately, patient outcomes (7, 8). Moreover, the rapid introduction of novel therapeutic approaches, including targeted agents and immunotherapy in both perioperative and post-chemoradiotherapy (CRT) settings, has improved survival but blurred traditional treatment boundaries, further emphasizing the importance of precise patient stratification to guide personalized care (9–23).
In this evolving landscape, additional prognostic and predictive biomarkers are needed to complement established markers such PD-L1. Inflammatory signatures in blood and tumor samples, tumor immune microenvironment features and radiomic descriptors are increasingly recognized as potential contributors to refining treatment allocation (24, 25). Radiomics, in particular, enables extraction of high-dimensional quantitative features from routine imaging and has shown promise in oncology as a non-invasive biomarker discovery tool (26). In LA-NSCLC, computed tomography (CT)-based radiomic signatures from primary tumors has been explored for predicting occult mediastinal LN involvement (27, 28). Furthermore, preliminary evidence suggests that radiomic features derived directly from LNs may provide even more relevant information than those from the primary tumor alone (29–31). However, LN-based radiomics remains underexplored, and the integration of these features with clinicopathological data has been scarcely investigated.
In this proof-of-concept study, we retrospectively analyzed 176 patients who underwent radical treatment for stage III NSCLC over a 10–year period (2012-2022). We collected clinicopathological, radiological and treatment data, and integrated them with radiomic features extracted from both the primary tumor and mediastinal LNs. Treatment allocation, determined by the local expert MTB, served as the reference standard. Our aim was to explore the feasibility and potential utility of machine learning models trained on these data to reproduce MTB decisions. Such an approach could ultimately provide decision-support tools for centers with limited MTB expertise, thereby promoting more consistent and individualized treatment strategies in LA-NSCLC.
Introduction
Lung cancer (LC) is the second most commonly diagnosed cancer and the leading cause of cancer-related mortality worldwide, with high death rates observed in both men and women (1). Non-small cell lung cancer (NSCLC) accounts for approximately 85% of all LC cases, and about 30% of these patients present with locally advanced (LA) disease (stage III according to the eight edition of the TNM classification) at diagnosis. LA-NSCLC is a highly heterogeneous condition, with wide variations in tumor size, local extension, and nodal involvement, making optimal treatment selection a persistent challenge in thoracic oncology (2–4). Although LA-NSCLC is potentially curable, it remains associated with a high tumor burden and poor outcomes, with 5-year survival rates of 36%, 26%, and 13% for stages IIIA, IIIB, and IIIC, respectively (5). This scenario underscores the need for treatment standardization, careful optimization, ideally guided by experienced multidisciplinary teams (6).
Multidisciplinary tumor boards (MTBs) play a crucial role in the management of LA-NSCLC. However, heterogeneity in MTB expertise, case volumes, and access to advanced surgical, radiotherapy, and systemic treatment options can influence the appropriateness of treatment strategies and, ultimately, patient outcomes (7, 8). Moreover, the rapid introduction of novel therapeutic approaches, including targeted agents and immunotherapy in both perioperative and post-chemoradiotherapy (CRT) settings, has improved survival but blurred traditional treatment boundaries, further emphasizing the importance of precise patient stratification to guide personalized care (9–23).
In this evolving landscape, additional prognostic and predictive biomarkers are needed to complement established markers such PD-L1. Inflammatory signatures in blood and tumor samples, tumor immune microenvironment features and radiomic descriptors are increasingly recognized as potential contributors to refining treatment allocation (24, 25). Radiomics, in particular, enables extraction of high-dimensional quantitative features from routine imaging and has shown promise in oncology as a non-invasive biomarker discovery tool (26). In LA-NSCLC, computed tomography (CT)-based radiomic signatures from primary tumors has been explored for predicting occult mediastinal LN involvement (27, 28). Furthermore, preliminary evidence suggests that radiomic features derived directly from LNs may provide even more relevant information than those from the primary tumor alone (29–31). However, LN-based radiomics remains underexplored, and the integration of these features with clinicopathological data has been scarcely investigated.
In this proof-of-concept study, we retrospectively analyzed 176 patients who underwent radical treatment for stage III NSCLC over a 10–year period (2012-2022). We collected clinicopathological, radiological and treatment data, and integrated them with radiomic features extracted from both the primary tumor and mediastinal LNs. Treatment allocation, determined by the local expert MTB, served as the reference standard. Our aim was to explore the feasibility and potential utility of machine learning models trained on these data to reproduce MTB decisions. Such an approach could ultimately provide decision-support tools for centers with limited MTB expertise, thereby promoting more consistent and individualized treatment strategies in LA-NSCLC.
Materials and methods
2
Materials and methods
2.1
Study design and population
This proof-of-concept retrospective observational study included patients with stage III NSCLC consecutively referred to the Thoracic Oncology MTB at the Veneto Institute of Oncology IOV – IRCCS and the University Hospital of Padua (Italy) between November 2012 and July 2022. The study was approved by the Ethics Committee of the IOV-IRCCS (Cod. Int CESC IOV: 2021-89). Written informed consent for data collection, analysis and publication was obtained from all participants in accordance with the regulations of the Italian Data Protection Authority.
The study scheme is depicted in Figure 1.
Inclusion criteria were: (i) histologically confirmed stage III NSCLC; (ii) eligibility for multimodality treatment with radical intent as determined by the MTB; (iii) availability of clinical, hematological, and pathological data; (iv) availability of a baseline total-body CT scan and 18F-fluorodeoxyglucose positron emission tomography (18F-FDG PET-CT); (v) presence of at least one pulmonary target lesion (longest diameter greater than or equal to 10 mm) and -if available- one or two mediastinal LNs that were PET-avid and/or pathologically confirmed as positive; and (vi) a minimum follow-up of 6 months after the completion of radical treatment.
Patients were excluded if their CT images had a slice thickness > 5 mm or were acquired using a single non-replicated CT-scanner model. PET-CT was mandatory to integrate CT-derived LN status. The study population and detailed eligibility criteria are presented in Figure 2.
2.2
Clinicopathological data and treatment pathways
For each patient, we collected clinicopathological variables including age, sex, Eastern Cooperative Oncology Group Performance Status (ECOG PS), smoking history, tumor histology, molecular profile, PDL1 expression levels, and TNM stage (American Joint Commission on Cancer (AJCC) – 8th edition. The data was anonymized and collected as an electronic case report form.
Three treatment courses with curative intent were defined by the MTB according to clinical guidelines and patient characteristics:
Course A: upfront surgery, with or without adjuvant systemic therapy.
Course B: neoadjuvant systemic therapy followed by surgery.
Course C: concomitant or sequential CRT followed by consolidation immunotherapy when indicated.
Patients receiving radical radiotherapy alone were not included. During the study timeframe, treatment standards evolved: platinum-based ChT remained the backbone in neoadjuvant and adjuvant settings, with immunotherapy gradually introduced in Italy from 2018 (consolidation after CRT) and perioperative chemo-immunotherapy or targeted agents accessible through clinical trials or compassionate programs. Details about the diagnostic-therapeutic pathway were previously reported (32). A reproduction of the decision-making clinical algorithm is summarized in Supplementary Figure 1.
To assess the appropriateness of each clinical decision, patient outcomes were analyzed in terms of median relapse-free survival (RFS) and median overall survival (OS) within the intention–to-treat (ITT) population for each treatment course.
2.3
Image segmentation and feature extraction
Baseline CT and FDG-PET/CT scans were reviewed for each patient. PET/CT was used to identify pathological mediastinal LNs based on abnormal FDG uptake or histological confirmation, but all segmentations were performed exclusively on the CT images.
Semi-automatic CT-based segmentation of the primary tumors and mediastinal LNs were performed using HealthMyne software v. 13.6 (HealthMyne, Madison, WI) by a chest radiologist with 10 years of experience, in consensus with a thoracic oncologist. Up to four lesions were segmented per patient: one or two primary lung parenchymal lesions and one or two mediastinal LNs. LN selection was based on short-axis >15 mm on CT, PET evidence of pathological uptake, or histological confirmation of involvement. Each patient, therefore, had up to four segmentations: 1–2 lung parenchymal tumors and 1–2 mediastinal LNs. All segmentations were subsequently verified by a senior radiologist with >25 years of experience.
Radiomic features (first-order statistics, shape, and higher-order texture descriptors) were extracted from each CT segmentation. To reduce variability, pre-processing included exclusion of zero-variance features, harmonization across scanners using the NeuroCombat method (33), and removal of extreme outliers (upper and lower 10% of the interquartile range).
2.4
Machine learning pipeline: RadiomiX
Radiomic and clinicopathological features were analyzed using RadiomiX, an automated machine learning framework designed for radiomic analysis. RadiomiX systematically evaluates multiple combinations of feature selection strategies and classifiers to identify robust and reproducible models while mitigating the risk of overfitting (34).
The classification was structured in two sequential tasks:
1. Course A vs Courses B+C (upfront surgery vs. multimodality treatment).
2. Course B vs. Course C (f neoadjuvant therapy plus surgery vs. CRT-based treatment).
For each task, RadiomiX pipeline evaluated 532 model combinations. The modelling workflow consisted of 10 randomized train–test splits, each preserving the original class distribution. Within every split, a nested 5-fold cross-validation was applied. All preprocessing steps were executed exclusively within the training folds to avoid information leakage.
Model performance was evaluated using area under the receiver operating characteristic curve (AUC) and accuracy. AUC and accuracy were computed for every folds and then averaged across folds and across all 10 splits to obtain final cross-validated performance estimates. Models achieving a mean AUC and accuracy ≥0.70 were considered well-performing. Among these, the model with the highest mean cross-validated AUC was selected as the final model for each task.
Across the 532 combinations, Minimum Redundancy Maximum Relevance (MRMR) emerged as the best-performing feature selection method for both classification tasks. The top classifiers identified were a Support Vector Machine (SVM) for the Course A vs. Rest and a Random Forest for the Course B vs. C.
Feature stability within the modelling process was assessed by quantifying the recurrence of MRMR-selected features across all 10×5 cross-validation iterations. features retained in the final models consistently ranked among those with the highest selection frequencies, typically appearing in >70% of resampling iterations, indicating good robustness to repeated validation. A detailed summary of feature frequencies and their relationship with model coefficients is provided in the Supplementary Material.
Final models were tested on the independent test cohort. To derive stable performance estimates, 100 bootstrap resamplings of the test cohort were performed, and performance metrics were averaged across iterations.
2.5
Statistical analysis
Clinical outcomes were analyzed descriptively to contextualize MTB treatment decisions. Relapse-free survival (RFS) was defined from radical surgery or CRT initiation to first recurrence; overall survival (OS) was defined from histological diagnosis to death. Patients without events were censored at last follow-up.
Although the primary endpoint was to replicate MTB treatment decisions, survival outcomes were analyzed across treatment groups as an exploratory check. The aim was to demonstrate that the MTB-based treatment decisions used as reference labels were consistent with clinical outcomes. Kaplan–Meier curves were generated according to treatment groups and model-predicted classes.
All analyses were conducted using Python 3.9.16. The optimized configurations of the models selected for each classification task are detailed in the Supplementary Material. Statistical evaluations were performed using OriginPro 2024 (OriginLab Corporation, Northampton, MA, USA) and MedCalc Statistical version 22.021 (MedCalc Software Ltd, Ostend, Belgium).
Materials and methods
2.1
Study design and population
This proof-of-concept retrospective observational study included patients with stage III NSCLC consecutively referred to the Thoracic Oncology MTB at the Veneto Institute of Oncology IOV – IRCCS and the University Hospital of Padua (Italy) between November 2012 and July 2022. The study was approved by the Ethics Committee of the IOV-IRCCS (Cod. Int CESC IOV: 2021-89). Written informed consent for data collection, analysis and publication was obtained from all participants in accordance with the regulations of the Italian Data Protection Authority.
The study scheme is depicted in Figure 1.
Inclusion criteria were: (i) histologically confirmed stage III NSCLC; (ii) eligibility for multimodality treatment with radical intent as determined by the MTB; (iii) availability of clinical, hematological, and pathological data; (iv) availability of a baseline total-body CT scan and 18F-fluorodeoxyglucose positron emission tomography (18F-FDG PET-CT); (v) presence of at least one pulmonary target lesion (longest diameter greater than or equal to 10 mm) and -if available- one or two mediastinal LNs that were PET-avid and/or pathologically confirmed as positive; and (vi) a minimum follow-up of 6 months after the completion of radical treatment.
Patients were excluded if their CT images had a slice thickness > 5 mm or were acquired using a single non-replicated CT-scanner model. PET-CT was mandatory to integrate CT-derived LN status. The study population and detailed eligibility criteria are presented in Figure 2.
2.2
Clinicopathological data and treatment pathways
For each patient, we collected clinicopathological variables including age, sex, Eastern Cooperative Oncology Group Performance Status (ECOG PS), smoking history, tumor histology, molecular profile, PDL1 expression levels, and TNM stage (American Joint Commission on Cancer (AJCC) – 8th edition. The data was anonymized and collected as an electronic case report form.
Three treatment courses with curative intent were defined by the MTB according to clinical guidelines and patient characteristics:
Course A: upfront surgery, with or without adjuvant systemic therapy.
Course B: neoadjuvant systemic therapy followed by surgery.
Course C: concomitant or sequential CRT followed by consolidation immunotherapy when indicated.
Patients receiving radical radiotherapy alone were not included. During the study timeframe, treatment standards evolved: platinum-based ChT remained the backbone in neoadjuvant and adjuvant settings, with immunotherapy gradually introduced in Italy from 2018 (consolidation after CRT) and perioperative chemo-immunotherapy or targeted agents accessible through clinical trials or compassionate programs. Details about the diagnostic-therapeutic pathway were previously reported (32). A reproduction of the decision-making clinical algorithm is summarized in Supplementary Figure 1.
To assess the appropriateness of each clinical decision, patient outcomes were analyzed in terms of median relapse-free survival (RFS) and median overall survival (OS) within the intention–to-treat (ITT) population for each treatment course.
2.3
Image segmentation and feature extraction
Baseline CT and FDG-PET/CT scans were reviewed for each patient. PET/CT was used to identify pathological mediastinal LNs based on abnormal FDG uptake or histological confirmation, but all segmentations were performed exclusively on the CT images.
Semi-automatic CT-based segmentation of the primary tumors and mediastinal LNs were performed using HealthMyne software v. 13.6 (HealthMyne, Madison, WI) by a chest radiologist with 10 years of experience, in consensus with a thoracic oncologist. Up to four lesions were segmented per patient: one or two primary lung parenchymal lesions and one or two mediastinal LNs. LN selection was based on short-axis >15 mm on CT, PET evidence of pathological uptake, or histological confirmation of involvement. Each patient, therefore, had up to four segmentations: 1–2 lung parenchymal tumors and 1–2 mediastinal LNs. All segmentations were subsequently verified by a senior radiologist with >25 years of experience.
Radiomic features (first-order statistics, shape, and higher-order texture descriptors) were extracted from each CT segmentation. To reduce variability, pre-processing included exclusion of zero-variance features, harmonization across scanners using the NeuroCombat method (33), and removal of extreme outliers (upper and lower 10% of the interquartile range).
2.4
Machine learning pipeline: RadiomiX
Radiomic and clinicopathological features were analyzed using RadiomiX, an automated machine learning framework designed for radiomic analysis. RadiomiX systematically evaluates multiple combinations of feature selection strategies and classifiers to identify robust and reproducible models while mitigating the risk of overfitting (34).
The classification was structured in two sequential tasks:
1. Course A vs Courses B+C (upfront surgery vs. multimodality treatment).
2. Course B vs. Course C (f neoadjuvant therapy plus surgery vs. CRT-based treatment).
For each task, RadiomiX pipeline evaluated 532 model combinations. The modelling workflow consisted of 10 randomized train–test splits, each preserving the original class distribution. Within every split, a nested 5-fold cross-validation was applied. All preprocessing steps were executed exclusively within the training folds to avoid information leakage.
Model performance was evaluated using area under the receiver operating characteristic curve (AUC) and accuracy. AUC and accuracy were computed for every folds and then averaged across folds and across all 10 splits to obtain final cross-validated performance estimates. Models achieving a mean AUC and accuracy ≥0.70 were considered well-performing. Among these, the model with the highest mean cross-validated AUC was selected as the final model for each task.
Across the 532 combinations, Minimum Redundancy Maximum Relevance (MRMR) emerged as the best-performing feature selection method for both classification tasks. The top classifiers identified were a Support Vector Machine (SVM) for the Course A vs. Rest and a Random Forest for the Course B vs. C.
Feature stability within the modelling process was assessed by quantifying the recurrence of MRMR-selected features across all 10×5 cross-validation iterations. features retained in the final models consistently ranked among those with the highest selection frequencies, typically appearing in >70% of resampling iterations, indicating good robustness to repeated validation. A detailed summary of feature frequencies and their relationship with model coefficients is provided in the Supplementary Material.
Final models were tested on the independent test cohort. To derive stable performance estimates, 100 bootstrap resamplings of the test cohort were performed, and performance metrics were averaged across iterations.
2.5
Statistical analysis
Clinical outcomes were analyzed descriptively to contextualize MTB treatment decisions. Relapse-free survival (RFS) was defined from radical surgery or CRT initiation to first recurrence; overall survival (OS) was defined from histological diagnosis to death. Patients without events were censored at last follow-up.
Although the primary endpoint was to replicate MTB treatment decisions, survival outcomes were analyzed across treatment groups as an exploratory check. The aim was to demonstrate that the MTB-based treatment decisions used as reference labels were consistent with clinical outcomes. Kaplan–Meier curves were generated according to treatment groups and model-predicted classes.
All analyses were conducted using Python 3.9.16. The optimized configurations of the models selected for each classification task are detailed in the Supplementary Material. Statistical evaluations were performed using OriginPro 2024 (OriginLab Corporation, Northampton, MA, USA) and MedCalc Statistical version 22.021 (MedCalc Software Ltd, Ostend, Belgium).
Results
3
Results
3.1
Population description
From the initial ITT population of the train cohort (N = 131), 55 patients were excluded due to missing clinical data or technical reasons (Figure 2), leaving 76 patients for analysis. The majority were males (N = 45; 59%), with ECOG PS 0 –1 (N = 73; 96%), and adenocarcinoma histology (N = 55; 72%). Patient characteristics and the clinicopathological features are summarized in Table 1. Treatment distribution was: course A (upfront surgery) N = 27 (36%), course B (neoadjuvant therapy followed by surgery) N = 30 (39%), and course C (chemo-radiotherapy, with or without consolidation immunotherapy) N = 19 (25%).
In the test cohort, 26 of 45 patients were excluded using the same criteria, leaving 19 patients for analysis. Among them, 8 (42%) were assigned to course A, 3 (16%) to course B; and 8 (42%) to course C.
Systemic therapy details for both cohorts are provided in Table 2. Notably, a few patients were treated within clinical trials [IMpower030, AEGEAN (35, 36)] or received adjuvant osimertinib through compassionate use programs.
3.2
First treatment course classification: Course A versus Rest (B+C)
For the first classification task, 13 features were selected by the final model as most informative for distinguishing course A from the Rest. These included four clinicopathological features (smoking status and mediastinal LN stage), and nine radiomic features (four from the primary tumor and five from mediastinal LNs). Lesion-derived features captured density, heterogeneity, intensity uniformity, and spatial distribution, whereas LN-derived features reflected morphological complexity, intensity variability, and malignancy suspicion (LUNG-RADS).
Interpretability was enhanced using a bar plot of the signed model coefficients (Figure 3a), which illustrates both the magnitude and direction of each feature’s contribution and identifies whether features originate from the primary tumor, lymph nodes, or clinical data. Positive coefficients increased the likelihood of assignment to upfront surgery (Course A), while negative coefficients favor multimodality treatment (Courses B+C). These patterns highlight a coherent clinico-radiomic profile associated with surgical selection.
The model achieved an AUC of 0.847 (95% CI: 0.615 – 1.000) and accuracy of 0.795 (95% CI: 0.600 – 0.937) in the train cohort. Performance remained consistent in the test cohort, with an AUC of 0.808 (95% CI: 0.600–1.000) and accuracy of 0.700 (95% CI: 0.533–0.902) (Figure 3b). Features retained in the final model corresponded to those with the highest recurrence across resampling iterations, supporting their stability (see Supplementary Material).
3.3
Second treatment course classification: Course B vs. C
For the second classification task, nine features were selected: four clinicopathological (age, tumor size category, mediastinal LN stage, and platelet count) and five radiomic features (three from the primary tumor and two from LNs). Tumor-related features described elongation, percentage ground-glass opacity, and calcifications, while LN features included sphericity and intensity variability (RMS).
A signed-coefficient bar plot was generated for the final model (Figure 4a). In this task, all coefficients were positive, indicating that higher feature values were consistently associated with allocation to Course B (neoadjuvant therapy followed by surgery). The plot highlights relative importance rather than directional contrast and provides a clear visualization of how clinical, primary-tumor, and LN-derived radiomic descriptors contribute to discrimination.
The model achieved an AUC of 0.740 (95% CI: 0.514 – 0.990) and accuracy of 0.700 (95% CI: 0.540 – 0.900) in the train cohort. Test performance was comparable, with an AUC of 0.754 (95% CI: 0.500–0.980) and accuracy of 0.740 (95% CI: 0.500–0.960) (Figure 4b). As with the first classifier, the features selected for the final model were among those with the highest recurrence frequencies across validation iterations, supporting their robustness Supplementary Material.
3.4
Survival outcomes
Although MTB decisions were used as the reference standard, survival curves were included to demonstrate clinical consistency.
In the ITT train population (N = 131), with median follow-up of 63.6 months, median RFS and OS were 27.2 months (95% CI: 19.6–45.1) and 51.6 months (95% CI: 43.1–65.0), respectively. Median OS was 65.0, 53.0, and 31.2 months for courses A, B, and C, respectively, while median RFS was 45.1, 26.6, and 17.5 months. For patients in course C, consolidation durvalumab was associated with improved OS (43.6 vs. 15.9 months) and RFS (29.3 vs. 9.7 months) (Figure 5). These curves support the clinical relevance of MTB treatment choices.
A multi-parametric linear regression analysis was performed to investigate the association of selected clinicopatological and radiomic features with OS and RFS.
In the course A versus rest classification, LUNG_RADS_Lymph and INTENSITY_HISTOGRAM_ROBUST_MEAN_DEVIATION_HU of the primary lesion were significantly correlated with OS (p=0.0306) and RFS (p = 0.0050), respectively (R2-adj = 0.09083). In the course B versus C classification, ROOT_MEAN_SQUARE_VOXELS_Lymph was significantly associated with OS (p=0.0003; R2-adj=0.1492) and RFS (p=0.0136; R2-adj= 0.06797).
These results suggest that certain radiomic features may carry prognostic information beyond their predictive value for treatment choice, complementing the interpretability analyses provided by the signed-coefficient visualizations.
Results
3.1
Population description
From the initial ITT population of the train cohort (N = 131), 55 patients were excluded due to missing clinical data or technical reasons (Figure 2), leaving 76 patients for analysis. The majority were males (N = 45; 59%), with ECOG PS 0 –1 (N = 73; 96%), and adenocarcinoma histology (N = 55; 72%). Patient characteristics and the clinicopathological features are summarized in Table 1. Treatment distribution was: course A (upfront surgery) N = 27 (36%), course B (neoadjuvant therapy followed by surgery) N = 30 (39%), and course C (chemo-radiotherapy, with or without consolidation immunotherapy) N = 19 (25%).
In the test cohort, 26 of 45 patients were excluded using the same criteria, leaving 19 patients for analysis. Among them, 8 (42%) were assigned to course A, 3 (16%) to course B; and 8 (42%) to course C.
Systemic therapy details for both cohorts are provided in Table 2. Notably, a few patients were treated within clinical trials [IMpower030, AEGEAN (35, 36)] or received adjuvant osimertinib through compassionate use programs.
3.2
First treatment course classification: Course A versus Rest (B+C)
For the first classification task, 13 features were selected by the final model as most informative for distinguishing course A from the Rest. These included four clinicopathological features (smoking status and mediastinal LN stage), and nine radiomic features (four from the primary tumor and five from mediastinal LNs). Lesion-derived features captured density, heterogeneity, intensity uniformity, and spatial distribution, whereas LN-derived features reflected morphological complexity, intensity variability, and malignancy suspicion (LUNG-RADS).
Interpretability was enhanced using a bar plot of the signed model coefficients (Figure 3a), which illustrates both the magnitude and direction of each feature’s contribution and identifies whether features originate from the primary tumor, lymph nodes, or clinical data. Positive coefficients increased the likelihood of assignment to upfront surgery (Course A), while negative coefficients favor multimodality treatment (Courses B+C). These patterns highlight a coherent clinico-radiomic profile associated with surgical selection.
The model achieved an AUC of 0.847 (95% CI: 0.615 – 1.000) and accuracy of 0.795 (95% CI: 0.600 – 0.937) in the train cohort. Performance remained consistent in the test cohort, with an AUC of 0.808 (95% CI: 0.600–1.000) and accuracy of 0.700 (95% CI: 0.533–0.902) (Figure 3b). Features retained in the final model corresponded to those with the highest recurrence across resampling iterations, supporting their stability (see Supplementary Material).
3.3
Second treatment course classification: Course B vs. C
For the second classification task, nine features were selected: four clinicopathological (age, tumor size category, mediastinal LN stage, and platelet count) and five radiomic features (three from the primary tumor and two from LNs). Tumor-related features described elongation, percentage ground-glass opacity, and calcifications, while LN features included sphericity and intensity variability (RMS).
A signed-coefficient bar plot was generated for the final model (Figure 4a). In this task, all coefficients were positive, indicating that higher feature values were consistently associated with allocation to Course B (neoadjuvant therapy followed by surgery). The plot highlights relative importance rather than directional contrast and provides a clear visualization of how clinical, primary-tumor, and LN-derived radiomic descriptors contribute to discrimination.
The model achieved an AUC of 0.740 (95% CI: 0.514 – 0.990) and accuracy of 0.700 (95% CI: 0.540 – 0.900) in the train cohort. Test performance was comparable, with an AUC of 0.754 (95% CI: 0.500–0.980) and accuracy of 0.740 (95% CI: 0.500–0.960) (Figure 4b). As with the first classifier, the features selected for the final model were among those with the highest recurrence frequencies across validation iterations, supporting their robustness Supplementary Material.
3.4
Survival outcomes
Although MTB decisions were used as the reference standard, survival curves were included to demonstrate clinical consistency.
In the ITT train population (N = 131), with median follow-up of 63.6 months, median RFS and OS were 27.2 months (95% CI: 19.6–45.1) and 51.6 months (95% CI: 43.1–65.0), respectively. Median OS was 65.0, 53.0, and 31.2 months for courses A, B, and C, respectively, while median RFS was 45.1, 26.6, and 17.5 months. For patients in course C, consolidation durvalumab was associated with improved OS (43.6 vs. 15.9 months) and RFS (29.3 vs. 9.7 months) (Figure 5). These curves support the clinical relevance of MTB treatment choices.
A multi-parametric linear regression analysis was performed to investigate the association of selected clinicopatological and radiomic features with OS and RFS.
In the course A versus rest classification, LUNG_RADS_Lymph and INTENSITY_HISTOGRAM_ROBUST_MEAN_DEVIATION_HU of the primary lesion were significantly correlated with OS (p=0.0306) and RFS (p = 0.0050), respectively (R2-adj = 0.09083). In the course B versus C classification, ROOT_MEAN_SQUARE_VOXELS_Lymph was significantly associated with OS (p=0.0003; R2-adj=0.1492) and RFS (p=0.0136; R2-adj= 0.06797).
These results suggest that certain radiomic features may carry prognostic information beyond their predictive value for treatment choice, complementing the interpretability analyses provided by the signed-coefficient visualizations.
Discussion
4
Discussion
Optimizing treatment strategies for LA-NSCLC remains a major clinical challenge, as decisions must balance resectability, patient fitness, and evolving systemic therapies within a multidisciplinary framework (37). Determining surgical eligibility is particularly complex, with tumor size and lymph node involvement serving as critical (38, 39). In resectable disease, perioperative strategies are increasingly characterized by the availability of immunotherapy and targeted agents (40). In unresectable cases, consolidation immunotherapy after concurrent chemoradiation has significantly improved survival and shifted real-world treatment patterns (22, 32, 41). More recently, neoadjuvant and perioperative chemo-immunotherapy have shown high pathological complete response rates, further blurring the boundaries of resectability (42–44). These advances raise new questions, including how best to identify patients benefiting from adjuvant or consolidation immunotherapy beyond PDL1 expression levels (45) and how long adjuvant immunotherapy after radical surgery should be administered (46).
In this context, the role of a multidisciplinary tumor board is critical, particularly in borderline cases (47). A high-volume, experienced MTB including oncologists, radiation oncologists, thoracic surgeons and pulmonologists, interventional radiologists, nuclear medicine physicians, pathologists, and molecular biologists, is essential for evidence-based decision-making (48). However, the availability of such expertise is heterogeneous, and hub-and-spoke models are often required (49). Tools that integrate clinical and imaging data, such as radiomics, may help refine MTB decisions and support treatment personalization.
Our proof-of-concept study demonstrates the feasibility of using an AI-based radiomics pipeline (RadiomiX) (34) to classify stage III NSCLC patients into three therapeutic groups: upfront surgery, neoadjuvant therapy followed by surgery, or CRT. The models integrated both clinical variables and radiomic features from primary tumors and LNs, with LN-derived features providing complementary and clinically relevant information. Classification performance was robust in both training and test cohorts, supporting the potential role of this approach in real-world decision-making. In addition to overall performance, we assessed feature stability across the full modeling workflow and observed a high recurrence of top-ranked features across cross-validation iterations. This consistency supports the robustness of the radiomic and clinical signatures identified. To enhance interpretability, we also examined the direction and magnitude of feature influence using signed model coefficients, which provided an intuitive representation of the clinico-radiomic phenotype associated with each therapeutic pathway.
While survival analysis was not a primary endpoint, the alignment of survival curves with MTB recommendations provides reassurance that treatment allocation was clinically appropriate and consistent with published benchmarks (7–15, 20, 21, 32). Among selected clinical variables, age, smoking status, platelet levels, and LN stage influenced treatment stratification, in agreement with previous reports linking these factors to treatment tolerance, immune response, and prognosis (50–53). Radiomics added quantitative imaging biomarkers that complemented standard staging, which is subject to limitations such as false-negative or false-positive results in endoscopic or metabolic LN assessment (33, 53). Compared with earlier attempts to integrate AI into MTB workflows (e.g., in hepatocellular carcinoma (54)), our model achieved higher accuracy, likely due to the combination of clinical and radiomic features rather than clinical data alone.
The main limitations of our study are the modest sample size, retrospective design, and heterogeneity of imaging protocols, which led to the exclusion of several patients. Recruitment of stage III NSCLC is inherently challenging, as this group represents only about one-third of all lung cancer cases.
Evolving standards of care during the study period (2012–2022) also constrained the integration of molecular features, such as EGFR or ALK status, which are now routinely incorporated into treatment decision-making (18–20). Access to neoadjuvant or perioperative chemo-immunotherapy was similarly limited to clinical trials during most of the study timeframe. These factors emphasize the exploratory and hypothesis-generating nature of our findings. Another limitation concerns model interpretability. Inter-observer variability in lesion segmentation is a recognized source of uncertainty in radiomics. Although consensus and senior review were employed to minimize variability, no formal ICC-based reproducibility analysis was performed. Future work will include robustness assessment according to Zwanenburg et al. (55).
While the signed-coefficient analysis offered insight into the contribution of individual features, more advanced explainability frameworks (e.g., SHAP) were not implemented due to the exploratory nature and sample size of this study. Future work will incorporate model-agnostic explainability methods to better characterize radiomics-derived phenotypes and to support transparent clinical translation.
Future validation in prospective, multicenter settings will be essential to confirm these results under standardized imaging protocols, updated staging, and comprehensive molecular profiling. Such studies will determine whether integrated clinico-radiomic models can reliably support MTB decisions and improve treatment personalization.
Discussion
Optimizing treatment strategies for LA-NSCLC remains a major clinical challenge, as decisions must balance resectability, patient fitness, and evolving systemic therapies within a multidisciplinary framework (37). Determining surgical eligibility is particularly complex, with tumor size and lymph node involvement serving as critical (38, 39). In resectable disease, perioperative strategies are increasingly characterized by the availability of immunotherapy and targeted agents (40). In unresectable cases, consolidation immunotherapy after concurrent chemoradiation has significantly improved survival and shifted real-world treatment patterns (22, 32, 41). More recently, neoadjuvant and perioperative chemo-immunotherapy have shown high pathological complete response rates, further blurring the boundaries of resectability (42–44). These advances raise new questions, including how best to identify patients benefiting from adjuvant or consolidation immunotherapy beyond PDL1 expression levels (45) and how long adjuvant immunotherapy after radical surgery should be administered (46).
In this context, the role of a multidisciplinary tumor board is critical, particularly in borderline cases (47). A high-volume, experienced MTB including oncologists, radiation oncologists, thoracic surgeons and pulmonologists, interventional radiologists, nuclear medicine physicians, pathologists, and molecular biologists, is essential for evidence-based decision-making (48). However, the availability of such expertise is heterogeneous, and hub-and-spoke models are often required (49). Tools that integrate clinical and imaging data, such as radiomics, may help refine MTB decisions and support treatment personalization.
Our proof-of-concept study demonstrates the feasibility of using an AI-based radiomics pipeline (RadiomiX) (34) to classify stage III NSCLC patients into three therapeutic groups: upfront surgery, neoadjuvant therapy followed by surgery, or CRT. The models integrated both clinical variables and radiomic features from primary tumors and LNs, with LN-derived features providing complementary and clinically relevant information. Classification performance was robust in both training and test cohorts, supporting the potential role of this approach in real-world decision-making. In addition to overall performance, we assessed feature stability across the full modeling workflow and observed a high recurrence of top-ranked features across cross-validation iterations. This consistency supports the robustness of the radiomic and clinical signatures identified. To enhance interpretability, we also examined the direction and magnitude of feature influence using signed model coefficients, which provided an intuitive representation of the clinico-radiomic phenotype associated with each therapeutic pathway.
While survival analysis was not a primary endpoint, the alignment of survival curves with MTB recommendations provides reassurance that treatment allocation was clinically appropriate and consistent with published benchmarks (7–15, 20, 21, 32). Among selected clinical variables, age, smoking status, platelet levels, and LN stage influenced treatment stratification, in agreement with previous reports linking these factors to treatment tolerance, immune response, and prognosis (50–53). Radiomics added quantitative imaging biomarkers that complemented standard staging, which is subject to limitations such as false-negative or false-positive results in endoscopic or metabolic LN assessment (33, 53). Compared with earlier attempts to integrate AI into MTB workflows (e.g., in hepatocellular carcinoma (54)), our model achieved higher accuracy, likely due to the combination of clinical and radiomic features rather than clinical data alone.
The main limitations of our study are the modest sample size, retrospective design, and heterogeneity of imaging protocols, which led to the exclusion of several patients. Recruitment of stage III NSCLC is inherently challenging, as this group represents only about one-third of all lung cancer cases.
Evolving standards of care during the study period (2012–2022) also constrained the integration of molecular features, such as EGFR or ALK status, which are now routinely incorporated into treatment decision-making (18–20). Access to neoadjuvant or perioperative chemo-immunotherapy was similarly limited to clinical trials during most of the study timeframe. These factors emphasize the exploratory and hypothesis-generating nature of our findings. Another limitation concerns model interpretability. Inter-observer variability in lesion segmentation is a recognized source of uncertainty in radiomics. Although consensus and senior review were employed to minimize variability, no formal ICC-based reproducibility analysis was performed. Future work will include robustness assessment according to Zwanenburg et al. (55).
While the signed-coefficient analysis offered insight into the contribution of individual features, more advanced explainability frameworks (e.g., SHAP) were not implemented due to the exploratory nature and sample size of this study. Future work will incorporate model-agnostic explainability methods to better characterize radiomics-derived phenotypes and to support transparent clinical translation.
Future validation in prospective, multicenter settings will be essential to confirm these results under standardized imaging protocols, updated staging, and comprehensive molecular profiling. Such studies will determine whether integrated clinico-radiomic models can reliably support MTB decisions and improve treatment personalization.
Conclusions
5
Conclusions
This study provides initial evidence that AI-based radiomics can support MTB decision-making in stage III NSCLC. By integrating clinical and imaging-derived features, our models achieved consistent classification performance across major therapeutic pathways. Although limited by its retrospective design and sample size, this proof-of-concept work highlights the feasibility of radiomics-driven decision support and provides the rationale for prospective validation in treatment personalizing for LA-NSCLC.
Conclusions
This study provides initial evidence that AI-based radiomics can support MTB decision-making in stage III NSCLC. By integrating clinical and imaging-derived features, our models achieved consistent classification performance across major therapeutic pathways. Although limited by its retrospective design and sample size, this proof-of-concept work highlights the feasibility of radiomics-driven decision support and provides the rationale for prospective validation in treatment personalizing for LA-NSCLC.
출처: PubMed Central (JATS). 라이선스는 원 publisher 정책을 따릅니다 — 인용 시 원문을 표기해 주세요.
🏷️ 같은 키워드 · 무료전문 — 이 논문 MeSH/keyword 기반
- Reforming the delivery of smoking cessation: a distributional cost-effectiveness analysis of providing smoking cessation as part of targeted lung cancer screening.
- A Phase II Study of Durvalumab, Doxorubicin, and Ifosfamide in Recurrent and/or Metastatic Pulmonary Sarcomatoid Carcinoma (KCSG LU-19-24).
- A herbal formulation inhibits growth and survival of lung cancer cells through DNA damage and apoptosis - in vitro and in vivo studies.
- Negative trial but positive lesson: reframing immunotherapy resistance from one-size-fits-all to precision strategies.
- Lung Cancer Screening in Adults: State-of-the-Art and Policy Mapping (2025).
- Retrospective dosimetric evaluation of the collapsed cone, AAA, and Acuros XB algorithms for lung cancer Halcyon VMAT plans.