Machine learning-driven PET-CT and clinical pathology model for predicting mediastinal lymph node metastasis in non-small cell lung cancer: a retrospective cohort study.
APA
Bi T, Qiang M, et al. (2026). Machine learning-driven PET-CT and clinical pathology model for predicting mediastinal lymph node metastasis in non-small cell lung cancer: a retrospective cohort study. PeerJ, 14, e20788. https://doi.org/10.7717/peerj.20788
MLA
Bi T, et al. "Machine learning-driven PET-CT and clinical pathology model for predicting mediastinal lymph node metastasis in non-small cell lung cancer: a retrospective cohort study." PeerJ, vol. 14, 2026, e20788.
PMID
41660073
Abstract
[OBJECTIVE] This study aims to evaluate whether Positron Emission Tomography-Computed Tomography (PET-CT) imaging features of primary tumors and lymph nodes, combined with clinical and pathological data, can accurately predict mediastinal lymph node metastasis (MLNM) in resectable non-small cell lung cancer (NSCLC) using machine learning models.
[METHODS] A retrospective study was conducted on 390 NSCLC patients who underwent tumor resection and lymph node dissection between January 2017 and December 2023. All patients received 18F-fluorodeoxyglucose (18F-FDG) PET-CT scans within two weeks before surgery. Data from 390 primary tumors and 1,026 lymph node stations were analyzed. Clinical and PET-CT imaging features were extracted, and feature selection was performed using a random forest algorithm. Eight machine learning models were evaluated, including Logistic Regression, classification and regression tree (CART), support vector machine (SVM), gradient boosting decision tree (GBDT), Random Forest, multi-layer perceptron (MLP), extreme gradient boosting tree (XGBoost) and k-nearest neighbor algorithm (KNN).
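As an illustration of the simplest of the eight classifiers listed above, the following is a minimal k-nearest neighbor sketch in plain Python. It is not the authors' implementation: the two features (a lymph node SUVmax and a short-axis diameter in mm), the toy values, and the function name `knn_predict` are all hypothetical, and a real pipeline would standardize the features before computing distances.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training rows."""
    # Euclidean distance from x to every training row, paired with its label.
    dists = sorted((math.dist(row, x), label) for row, label in zip(train_X, train_y))
    # Count the labels of the k closest rows and return the most common one.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Synthetic toy data, NOT from the study: [SUVmax, short-axis diameter in mm].
train_X = [[2.1, 6.0], [2.5, 7.0], [3.0, 8.0], [8.5, 12.0], [9.0, 14.0], [10.2, 15.0]]
train_y = [0, 0, 0, 1, 1, 1]  # 1 = metastatic station, 0 = non-metastatic

high_uptake_label = knn_predict(train_X, train_y, [9.1, 13.0])
low_uptake_label = knn_predict(train_X, train_y, [2.4, 6.5])
```

With these toy values, the high-uptake station is assigned to the metastatic class and the low-uptake station to the non-metastatic class.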
Three models were developed: Tumor-Pathology-Clinical (TPC), Lymph-Pathology-Clinical (LPC), and Tumor-Lymph-Pathology-Clinical (TLPC). Model performance was assessed using Receiver Operating Characteristic (ROC) curves, Decision Curve Analysis (DCA), and confusion matrices.
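The three evaluation tools named above can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the study's code: the labels and scores are synthetic, the function names are hypothetical, the AUC uses the standard rank (Mann-Whitney) formulation, and the net benefit follows the usual decision-curve definition, true-positive rate minus false-positive rate weighted by the odds of the threshold probability.

```python
def confusion_counts(y_true, y_score, threshold):
    """Return (tp, fp, tn, fn), calling scores >= threshold positive."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(tp, fp, tn, fn):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def auc_rank(y_true, y_score):
    """AUC as the chance a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(tp, fp, n, pt):
    """Decision-curve net benefit at threshold probability pt (0 < pt < 1)."""
    return tp / n - (fp / n) * (pt / (1 - pt))


# Synthetic labels and predicted probabilities, NOT data from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

tp, fp, tn, fn = confusion_counts(y_true, y_score, threshold=0.5)
sens, spec = sensitivity_specificity(tp, fp, tn, fn)
auc = auc_rank(y_true, y_score)
nb = net_benefit(tp, fp, n=len(y_true), pt=0.5)
```

A decision curve is obtained by repeating the `net_benefit` calculation over a range of threshold probabilities and comparing the result with the treat-all and treat-none strategies.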
[RESULTS] The TLPC model, based on the XGBoost algorithm, showed the best performance, with an Area Under the Curve (AUC) of 0.90 (95% CI [0.883-0.957]), specificity of 0.84, and sensitivity of 0.96 (P = 0.0069; significant at P < 0.05). In comparison, the TPC model achieved an AUC of 0.67 (95% CI [0.647-0.703]), specificity of 0.46, and sensitivity of 0.56 (P = 0.7037; not significant). The LPC model showed intermediate performance, with an AUC of 0.78 (95% CI [0.713-0.751]), specificity of 0.73, and sensitivity of 0.84 (P = 0.0269; significant at P < 0.05). All P-values were derived from DeLong's test comparing AUCs between models, with statistical significance defined as P < 0.05. Of the 1,026 lymph node stations analyzed, 204 showed metastasis, while 822 did not. XGBoost consistently outperformed other models in predicting MLNM.
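To put the reported sensitivity and specificity in context, the sketch below works out what predictive values they would imply at the station-level prevalence given above (204 of 1,026 stations). This is illustrative arithmetic only. It assumes, which the abstract does not state, that the TLPC sensitivity of 0.96 and specificity of 0.84 apply at station level across all 1,026 stations; the function name is hypothetical and the resulting values are not reported by the study.

```python
# Counts and operating point as stated in the abstract.
N_METASTATIC, N_BENIGN = 204, 822
TLPC_SENSITIVITY, TLPC_SPECIFICITY = 0.96, 0.84


def implied_predictive_values(n_pos, n_neg, sensitivity, specificity):
    """Derive expected confusion-matrix cells, then PPV and NPV (illustrative)."""
    tp = sensitivity * n_pos   # expected true positives
    fn = n_pos - tp            # expected false negatives
    tn = specificity * n_neg   # expected true negatives
    fp = n_neg - tn            # expected false positives
    return {
        "prevalence": n_pos / (n_pos + n_neg),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


implied = implied_predictive_values(
    N_METASTATIC, N_BENIGN, TLPC_SENSITIVITY, TLPC_SPECIFICITY
)
```

Under that assumption, the prevalence is about 0.20, the implied positive predictive value is about 0.60, and the implied negative predictive value is about 0.99. In other words, at this prevalence a negative call would be far more reliable than a positive one.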
[CONCLUSION] Combining PET-CT imaging features of primary tumors and lymph nodes with clinical and pathological data shows promise for accurately predicting MLNM in NSCLC. The TLPC model offers a non-invasive method for identifying lymph node metastasis, supporting personalized treatment strategies. However, since PET-CT was performed selectively rather than routinely acquired, external validation across diverse clinical settings is warranted to confirm model generalizability.
MeSH Terms
Humans; Carcinoma, Non-Small-Cell Lung; Positron Emission Tomography Computed Tomography; Retrospective Studies; Female; Male; Lung Neoplasms; Machine Learning; Middle Aged; Lymphatic Metastasis; Aged; Mediastinum; Lymph Nodes; Fluorodeoxyglucose F18; Adult; ROC Curve; Aged, 80 and over