Discrimination, calibration, and variable importance in statistical and machine learning models for predicting overall survival in advanced non-small cell lung cancer patients treated with immune checkpoint inhibitors.

Li LX; Hopkins AM; Woodman R; Abuhelwa AY; Gao Y; Parent N; Rowland A; Sorich MJ

doi:10.1016/j.jclinepi.2025.112082

Journal of clinical epidemiology 2026 Vol. 190 p. 112082

Cite this paper

APA Li LX, Hopkins AM, et al. (2026). Discrimination, calibration, and variable importance in statistical and machine learning models for predicting overall survival in advanced non-small cell lung cancer patients treated with immune checkpoint inhibitors. Journal of clinical epidemiology, 190, 112082. https://doi.org/10.1016/j.jclinepi.2025.112082

MLA Li LX, et al. "Discrimination, calibration, and variable importance in statistical and machine learning models for predicting overall survival in advanced non-small cell lung cancer patients treated with immune checkpoint inhibitors." Journal of clinical epidemiology, vol. 190, 2026, pp. 112082.

PMID 41276092

DOI 10.1016/j.jclinepi.2025.112082

Abstract

[BACKGROUND AND OBJECTIVES] Prognostic models can enhance clinician-patient communication and guide treatment decisions. Numerous machine learning (ML) algorithms are available and offer a novel approach to predicting survival in patients treated with immune checkpoint inhibitors. However, their performance, particularly in terms of calibration, has not been benchmarked at scale across multiple independent cohorts. This study aimed to develop, evaluate, and compare statistical and ML models regarding discrimination, calibration, and variable importance for predicting overall survival across seven clinical trial cohorts of patients with advanced non-small cell lung cancer (NSCLC) undergoing immune checkpoint inhibitor treatment.

[METHODS] This study included atezolizumab-treated patients with advanced NSCLC from seven clinical trials. We compared two statistical models (Cox proportional hazards [Coxph] and accelerated failure time models) and six ML models: CoxBoost, extreme gradient boosting (XGBoost), gradient-boosting machines (GBMs), random survival forest, regularized Coxph models (least absolute shrinkage and selection operator [LASSO]), and support vector machines (SVMs). Models were evaluated on discrimination and calibration using a leave-one-study-out nested cross-validation (nCV) framework. Discrimination was assessed using Harrell's concordance index (Cindex), while calibration was assessed using the integrated calibration index (ICI) and calibration plots. Variable importance was assessed using Shapley Additive exPlanations (SHAP) values.
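As an illustration of two ideas named in the methods, the following minimal Python sketch shows how a leave-one-study-out split can be formed and how Harrell's concordance index can be computed for right-censored data. It is not the authors' implementation: the function names, the toy follow-up times, event flags, risk scores, and study labels are all hypothetical, and the inner tuning loop of the nested cross-validation, model fitting, calibration, and SHAP steps are omitted.

```python
from collections import defaultdict


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when subject i has the shorter follow-up
    time and an observed event. It is concordant when subject i also has
    the higher predicted risk; ties in predicted risk count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Subject i must fail first, and that failure must be observed.
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def leave_one_study_out(study_ids):
    """Yield (held_out_study, train_indices, test_indices), one per study."""
    by_study = defaultdict(list)
    for idx, study in enumerate(study_ids):
        by_study[study].append(idx)
    for held_out in sorted(by_study):
        test_idx = by_study[held_out]
        train_idx = [i for s, idxs in by_study.items() if s != held_out for i in idxs]
        yield held_out, train_idx, test_idx


# Hypothetical toy data (not taken from the study).
times = [5, 8, 12, 3, 9, 15]              # follow-up time
events = [1, 0, 1, 1, 1, 0]               # 1 = death observed, 0 = censored
risk = [0.9, 0.4, 0.3, 0.8, 0.5, 0.1]     # higher = worse predicted prognosis
studies = ["A", "A", "B", "B", "C", "C"]  # trial each patient belongs to

folds = list(leave_one_study_out(studies))
for held_out, train_idx, test_idx in folds:
    # A real pipeline would fit and tune a model on train_idx here,
    # then score the held-out study; this sketch only lists the split.
    print(held_out, train_idx, test_idx)

print(round(harrell_c_index(times, events, risk), 3))
```

Holding out an entire trial, rather than a random subset of patients, means each evaluation cohort is independent of the data used to fit the model, which is what lets performance be compared across cohorts.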

[RESULTS] In a cohort of 3203 patients, the two statistical models and five of the six ML models demonstrated comparable, moderate discrimination (aggregated Cindex: 0.69-0.70), while SVM exhibited poor discrimination (aggregated Cindex: 0.57). Regarding calibration, the models appeared largely comparable in aggregated plots, except for LASSO, although the XGBoost models demonstrated superior calibration numerically. Across the evaluation cohorts, individual performance measures varied, and no single model consistently outperformed the others. Pretreatment neutrophil-to-lymphocyte ratio (NLR) and Eastern Cooperative Oncology Group Performance Status (ECOGPS) were ranked among the top five most important predictors across all models.

[CONCLUSION] There was no clear best-performing model for either discrimination or calibration, although XGBoost models showed possibly superior calibration numerically. The performance of a given model varied across evaluation cohorts, highlighting the importance of assessing models on multiple independent datasets. All models identified pretreatment NLR and ECOGPS as key prognostic factors.

MeSH Terms

Humans; Carcinoma, Non-Small-Cell Lung; Lung Neoplasms; Machine Learning; Immune Checkpoint Inhibitors; Male; Female; Middle Aged; Aged; Calibration; Models, Statistical; Prognosis; Antibodies, Monoclonal, Humanized
