Development of a machine learning-based model to predict prognosis of resected invasive pulmonary adenocarcinoma.

Journal of thoracic disease, 2025, Vol. 17(12), pp. 11057-11067

Huang J, Qian J, Zhong Y, Xia B, Feng X, Meng W

Cite this article

APA Huang J, Qian J, et al. (2025). Development of a machine learning-based model to predict prognosis of resected invasive pulmonary adenocarcinoma. Journal of thoracic disease, 17(12), 11057-11067. https://doi.org/10.21037/jtd-2025-1669
MLA Huang J, et al. "Development of a machine learning-based model to predict prognosis of resected invasive pulmonary adenocarcinoma." Journal of thoracic disease, vol. 17, no. 12, 2025, pp. 11057-11067.
PMID 41522139

Abstract

[BACKGROUND] Invasive pulmonary adenocarcinoma (IPA) poses a significant threat to global health, and some patients still experience tumor recurrence and metastasis. This study aimed to construct an optimized prognostic model using machine learning to predict the disease-free survival (DFS) of IPA patients.

[METHODS] A total of 670 patients with resected IPA treated between 2015 and 2020 were enrolled. Clinicopathological information was collected, and patients were followed up for outcomes. Patients were divided into a training set and a test set at a ratio of 4:1. Four machine learning models were compared to build the DFS model, and 5-fold cross-validation was performed. The area under the receiver operating characteristic curve (AUC), C-index, calibration curves, and decision curve analysis (DCA) were used to evaluate the model.

[RESULTS] Among the four models, the least absolute shrinkage and selection operator (Lasso) model showed the best performance in predicting DFS at 2 years (training set: AUC = 0.906; test set: AUC = 0.862), 3 years (training set: AUC = 0.894; test set: AUC = 0.879), 4 years (training set: AUC = 0.901; test set: AUC = 0.902), and 5 years (training set: AUC = 0.927; test set: AUC = 0.887). The calibration curves and DCA also indicated good predictive performance.

[CONCLUSIONS] Our study successfully constructed a machine learning-based prognostic model to predict DFS, which may provide oncologists with an effective tool for early medical intervention and survival improvement.

Introduction

Invasive pulmonary adenocarcinoma (IPA) is the most prevalent subtype of lung cancer and poses a significant threat to global health. Although neoadjuvant and adjuvant immunotherapies and targeted therapies have markedly improved patient outcomes, some patients still experience tumor recurrence and metastasis (1). Therefore, disease-free survival (DFS) prediction models for patients with IPA are needed to estimate recurrence risk accurately and guide individualized management.
The International Association for the Study of Lung Cancer (IASLC) has proposed an updated histological grading system to better discriminate the prognosis of heterogeneous IPA (2), and recent studies have validated its robustness in various populations, including patients in Asia, Europe, and America (3-7). Several prognostic models have been proposed for lung adenocarcinoma using histopathomics (8), genomics (9,10), transcriptomics (11,12), and radiomics (13,14). However, these models are less convenient to apply clinically than the novel grading system. Other clinicopathological factors may also affect prognosis. Previous studies have shown that visceral pleural invasion (VPI) and lymphovascular invasion (LVI) are independent risk factors in early-stage lung cancer after surgery (15-17), consistent with our results. Carcinoembryonic antigen (CEA), one of the most common tumor markers used in lung cancer screening and diagnosis, has also been shown to be an independent prognostic factor for resected non-small cell lung cancer (18,19). Hence, whether a prognostic model integrating clinicopathological features with the novel grading system could improve risk discrimination warrants further exploration.
Machine learning, a branch of artificial intelligence (AI), has emerged as a major area of interest, offering sophisticated methods, techniques, and tools for analyzing biological data. Because it learns directly from data samples, machine learning can forecast individual patient survival and has enabled clinical AI models with substantially improved predictive accuracy (20). Accordingly, the present study aimed to construct a machine learning-based prognostic model, built on the novel grading system, to predict DFS in patients with IPA. We present this article in accordance with the TRIPOD+AI reporting checklist (available at https://jtd.amegroups.com/article/view/10.21037/jtd-2025-1669/rc).

Methods

Patient enrollment and screening
This retrospective study was approved by the Institutional Ethics Committee of Hangzhou First People’s Hospital (No. ZN-20230928-0224-01). Informed consent was waived due to the retrospective design of the study, in accordance with national legislation and institutional requirements. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. A total of 1,150 patients with lung cancer who underwent surgical resection between January 2015 and October 2020 at the Department of Thoracic Surgery, Hangzhou First People’s Hospital were screened, and 670 IPA patients were finally included in this retrospective study. Patients were excluded for any of the following: (I) squamous cell carcinoma, large cell carcinoma, carcinoid, or other histological types; (II) adenocarcinoma in situ (AIS), minimally invasive adenocarcinoma (MIA), invasive mucinous adenocarcinoma (IMA), or other adenocarcinoma variants; (III) receipt of neoadjuvant anti-tumor treatment; or (IV) loss to follow-up.

Clinicopathological characteristics
Clinicopathological characteristics were obtained by reviewing patients’ electronic medical records, including age, gender, smoking history, pack-years (1 pack-year = 20 cigarettes/day for 1 year), tumor location, surgical procedure, pathological TNM (p-TNM) stage (according to the 8th edition of the American Joint Committee on Cancer staging manual), tumor size, tumor stage, nodal status, VPI, LVI, CEA level, body mass index (BMI), and treatment history. Patients were followed up through clinic visits or by telephone until September 2022. DFS was defined as the time from surgical resection to the first documented recurrence or death from any cause; patients who died without documented recurrence were counted as events.
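The two derived quantities above, pack-years and the DFS time/event pair, can be made concrete with a minimal sketch. It assumes a hypothetical record layout in which surgery, recurrence, death, and last-contact dates are `datetime.date` objects, and it converts days to months with an average month length of 30.4375 days; the paper specifies neither detail, and the names `pack_years` and `dfs_endpoint` are illustrative only.

```python
from datetime import date
from typing import Optional, Tuple

DAYS_PER_MONTH = 30.4375  # assumption: average Gregorian month (365.25 / 12)


def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Pack-years, where 1 pack-year = 20 cigarettes/day for 1 year."""
    return cigarettes_per_day / 20.0 * years_smoked


def dfs_endpoint(
    surgery: date,
    last_follow_up: date,
    recurrence: Optional[date] = None,
    death: Optional[date] = None,
) -> Tuple[float, bool]:
    """Return (DFS time in months, event flag) for one patient.

    The event is the first documented recurrence or death from any cause
    (death without recurrence counts as an event). Patients with neither
    are censored at their last follow-up contact.
    """
    event_dates = [d for d in (recurrence, death) if d is not None]
    if event_dates:
        end, event = min(event_dates), True
    else:
        end, event = last_follow_up, False
    return (end - surgery).days / DAYS_PER_MONTH, event


# Hypothetical example values (not study data).
print(pack_years(30, 10))  # 15.0
```

Taking the earlier of the recurrence and death dates is what makes "first documented recurrence or death" a single composite endpoint.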

Histological evaluation and grading criteria
Following the novel IASLC grading system (2), all pathological sections were re-evaluated by professional pathologists, and grades were defined as follows: grade 1, lepidic-predominant tumor with no or less than 20% high-grade patterns (solid, micropapillary, or complex gland); grade 2, acinar- or papillary-predominant tumor with no or less than 20% high-grade patterns; and grade 3, any tumor with 20% or more high-grade patterns.
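The grading rule above is a deterministic function of the pattern percentages recorded for each tumor, and a minimal sketch makes the order of the checks explicit: the 20% high-grade cut-off is tested first, then the predominant pattern. It assumes a hypothetical input format (a dict mapping pattern names such as `"lepidic"` or `"complex_gland"` to percentages summing to about 100); ties for the predominant pattern are broken by dict order, a simplification not taken from the paper.

```python
HIGH_GRADE = ("solid", "micropapillary", "complex_gland")


def iaslc_grade(patterns: dict) -> int:
    """Assign IASLC grade 1-3 from per-pattern percentages (hypothetical input format)."""
    high_grade_pct = sum(patterns.get(p, 0.0) for p in HIGH_GRADE)
    if high_grade_pct >= 20:
        return 3  # any tumor with >=20% high-grade patterns
    predominant = max(patterns, key=patterns.get)  # ties resolved by dict order
    if predominant == "lepidic":
        return 1  # lepidic-predominant, <20% high-grade
    if predominant in ("acinar", "papillary"):
        return 2  # acinar- or papillary-predominant, <20% high-grade
    raise ValueError(f"unexpected predominant pattern: {predominant!r}")


print(iaslc_grade({"lepidic": 70, "acinar": 20, "solid": 10}))  # 1
print(iaslc_grade({"acinar": 50, "lepidic": 25, "micropapillary": 25}))  # 3
```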

Model building
The Python package ‘scikit-survival’ (21) was used for model building and evaluation. Four survival models were compared: (I) the conventional Cox proportional hazards model; (II) a penalized Cox model with the least absolute shrinkage and selection operator (Lasso); (III) a gradient boosted (GB) model; and (IV) a random survival forest (RSF) model. All clinicopathological features, including the new grade, were used as model inputs. Gene mutation data were excluded because only 201 of 670 cases had complete mutation details, which limited the integration of molecular predictors in the current study. Five-fold cross-validation was performed to maximize Harrell's concordance index (C-index), the chosen performance metric; the cross-validation procedure was repeated 100 times and the mean C-index was calculated. In total, 105 recurrence events were observed among the 670 patients. The final Lasso model included 16 predictors, giving an events-per-predictor (EPP) ratio of approximately 6.6, within the commonly recommended range of 5 to 10 for stable model development. No candidate predictor variable had missing values.
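The tuning criterion described above can be illustrated without the scikit-survival dependency. The sketch below is not the authors' code: it is a plain-Python Harrell's C-index (pairs tied in follow-up time are skipped for simplicity, so values may differ slightly from library implementations) plus a repeated, shuffled 5-fold split whose held-out C-indices are averaged. `fit_predict` is a hypothetical callback standing in for fitting any of the four survival models on the training folds and returning risk scores for the held-out fold.

```python
import random
from statistics import mean


def harrell_c_index(time, event, risk):
    """Harrell's C: share of comparable pairs that the risk score orders correctly.

    Pair (i, j) is comparable when patient i had an event and a strictly shorter
    follow-up time than patient j; it is concordant when risk[i] > risk[j].
    Tied risk scores count as 0.5. Pairs tied in time are skipped (simplification).
    """
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue
        for j in range(len(time)):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def repeated_kfold(n, k=5, repeats=100, seed=0):
    """Yield (train_idx, val_idx) for `repeats` independently shuffled k-fold splits."""
    rng = random.Random(seed)
    idx = list(range(n))
    for _ in range(repeats):
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]
        for f in range(k):
            train = [i for g in range(k) if g != f for i in folds[g]]
            yield train, folds[f]


def mean_cv_c_index(fit_predict, time, event, k=5, repeats=100, seed=0):
    """Average held-out C-index over all folds of all repetitions."""
    scores = []
    for train, val in repeated_kfold(len(time), k, repeats, seed):
        risk = fit_predict(train, val)  # hypothetical: fit on train, score val
        scores.append(harrell_c_index([time[i] for i in val],
                                      [event[i] for i in val], risk))
    return mean(scores)
```

To tune a hyperparameter such as the Lasso penalty, one would call `mean_cv_c_index` once per candidate value and keep the value with the highest mean. For reference, the EPP figure above follows from 105 events / 16 predictors ≈ 6.56.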

Model evaluation
A time-dependent receiver operating characteristic (ROC) curve was drawn, and the area under the ROC curve (AUC) and the C-index were calculated to assess the discriminative ability of the model. Calibration curves were used to assess calibration, and decision curve analysis (DCA) was used to assess clinical value. SHapley Additive exPlanations (SHAP) quantifies the contribution of each feature to the model's predictions; the Python package ‘shap’ (22) was used to calculate the SHAP value of each variable and examine its impact on the prediction. Finally, the Python package ‘Pynomo’ (23) was used to draw nomograms for predicting survival. Patients in the test set were stratified by predicted risk into low-risk (<10%), intermediate-risk (10–20%), and high-risk (>20%) groups. The number of patients in each group, as well as sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), were reported. Kaplan-Meier curves and calibration plots were generated for each risk stratum.
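The stratification and threshold metrics described above can be sketched as follows. The cut-offs (<10%, 10–20%, >20%) come from the text, but the rule defining a positive test is not stated, so the sketch assumes a hypothetical one: positive when the predicted probability of recurrence by the time horizon exceeds `threshold` (default 0.20, i.e., the high-risk group). It also assumes each patient's observed status at that horizon is known, which in practice requires handling patients censored before the horizon separately.

```python
from collections import Counter


def risk_group(p: float) -> str:
    """Map a predicted recurrence probability to the study's risk strata."""
    if p < 0.10:
        return "low"
    if p <= 0.20:
        return "intermediate"
    return "high"


def _ratio(num: int, den: int) -> float:
    return num / den if den else float("nan")


def threshold_metrics(pred_risk, observed_event, threshold=0.20):
    """Sensitivity, specificity, PPV, NPV for the hypothetical rule `pred_risk > threshold`.

    `observed_event[i]` is True if patient i had a DFS event by the horizon.
    """
    tp = fp = tn = fn = 0
    for p, y in zip(pred_risk, observed_event):
        positive = p > threshold
        if positive and y:
            tp += 1
        elif positive:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


# Hypothetical predictions and outcomes (not study data).
preds = [0.05, 0.15, 0.25, 0.30, 0.08, 0.22, 0.02]
events = [False, True, True, True, False, False, False]
print(Counter(risk_group(p) for p in preds))
print(threshold_metrics(preds, events))
```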

Code availability
The source code used in this study is deposited on GitHub and freely accessible (https://github.com/Huang111999/LC_ML).

Statistical analysis
Categorical variables were expressed as percentages and compared using the Pearson chi-squared test or Fisher’s exact test. DFS was estimated using Kaplan-Meier curves, and survival differences were assessed using the log-rank test. All tests were two-sided, and a P value <0.05 was considered statistically significant.
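The Kaplan-Meier estimate used for the DFS curves can be written in a few lines, which also makes the handling of censoring explicit. This is a minimal plain-Python sketch of the standard product-limit estimator, not the authors' code; it follows the usual convention that patients censored at a given time are still at risk for events at that time. The log-rank test is not shown.

```python
def kaplan_meier(time, event):
    """Product-limit estimate S(t) = prod over event times t_k <= t of (1 - d_k / n_k).

    Returns a list of (event_time, survival) pairs; survival is constant between them.
    """
    data = sorted(zip(time, event))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed  # events and censored cases leave the risk set
    return curve
```

For example, follow-up times [1, 2, 2, 3, 4] with events [1, 1, 0, 1, 0] give survival of about 0.8, 0.6, and 0.3 at times 1, 2, and 3.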

Results

Patient characteristics
A total of 670 IPA patients who underwent surgical resection between January 2015 and October 2020 at our center were included in this retrospective study (Figure 1). The baseline clinicopathological characteristics, including the IASLC grade, of the training and test sets are shown in Table 1 and summarized below. Baseline data did not differ between the training and test sets except for smoking history and tumor laterality.
Of these patients, 62.7% were female, 57.3% were older than 60 years, and 74.9% were non-smokers. Among ever-smokers, the median pack-years was 35 (21.875, 45). Tumors originated in the right lung in 60.7% of cases, and 66.7% of tumors were located in the upper or middle lobes. About 75.8% of patients underwent lobectomy. Regarding pathology, most tumors were grade 2 (53.3%), without VPI (73.3%), without LVI (93.9%), <2 cm (69.3%), pT1 (69.3%), pN0 (86.6%), and stage I (84.1%). Most patients had normal CEA levels (81.2%) and normal BMI values (61.2%). The median follow-up time was 29 months, and 105 (15.7%) recurrence events had occurred by the last follow-up.

Construction of machine learning based model for predicting the DFS of IPA patients
In the training set, four models available in the scikit-survival package, namely Cox proportional hazards (Cox), Lasso-penalized Cox regression (Lasso), RSF, and gradient boosting (GB), were compared, and 5-fold cross-validation was performed for iterative testing and tuning. The model with the highest C-index was selected for further validation. As shown in Figure 2A, the 2-year AUC values of the Lasso, Cox, RSF, and GB models were 0.906, 0.895, 0.902, and 0.878, respectively. The 3-year AUC values were 0.894, 0.886, 0.884, and 0.870; the 4-year values were 0.901, 0.893, 0.889, and 0.877; and the 5-year values were 0.927, 0.923, 0.908, and 0.894, respectively. All models had higher 2-, 3-, and 4-year AUC values than the grading system alone. The C-index values of the Lasso, Cox, RSF, and GB models were 0.880, 0.870, 0.874, and 0.855, respectively. Hence, the Lasso model outperformed the other models at almost all time points in predicting DFS. During construction of the Lasso model, the penalty parameter alpha was chosen to maximize the C-index, and the selected factors were used for model fitting and nomogram development (Figure S1). The Lasso model performed well in predicting DFS of IPA patients in both the training and test sets at 2 years (training set: AUC = 0.906; test set: AUC = 0.862), 3 years (training set: AUC = 0.894; test set: AUC = 0.879), 4 years (training set: AUC = 0.901; test set: AUC = 0.902), and 5 years (training set: AUC = 0.927; test set: AUC = 0.887) (Figure 2).

Evaluation of the constructed model
The calibration curves of the Lasso model showed good agreement between the predicted and actual 2-, 3-, 4-, and 5-year DFS probabilities in the training and test sets (Figure 3). The DCA curves for 2-, 3-, 4-, and 5-year DFS in the training and test sets also demonstrated good clinical utility, with a positive net benefit (Figure 4).
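For reference, decision curves of this kind plot the standard net benefit of acting on a model at each threshold probability p_t. How the authors estimated true and false positives under censoring is not stated here, so the expressions below are the generic definitions rather than their exact computation:

```latex
\mathrm{NB}(p_t) = \frac{\mathrm{TP}(p_t)}{n} - \frac{\mathrm{FP}(p_t)}{n}\cdot\frac{p_t}{1 - p_t},
\qquad
\mathrm{NB}_{\text{treat all}}(p_t) = \pi - (1 - \pi)\,\frac{p_t}{1 - p_t},
\qquad
\mathrm{NB}_{\text{treat none}} = 0
```

Here n is the number of patients, TP and FP count patients whose predicted risk exceeds p_t and who do or do not have an event by the time horizon, and π is the event proportion. A model adds clinical value at p_t when its net benefit exceeds both reference strategies.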
The contribution of each predictor to the model was further evaluated using SHAP values; the novel pathological grade had the highest value, indicating its essential role in predicting DFS (Figure 5A). A nomogram for 2-, 3-, and 5-year DFS was then established (Figure 5B), and patients in the test set were stratified into low-risk (<10%), intermediate-risk (10–20%), and high-risk (>20%) groups based on the risk score. DFS in the low-risk group was significantly higher than in the intermediate-risk group (P=0.003) and the high-risk group (P<0.001) (Figure 5C). For predicting 3-year DFS, the model achieved a sensitivity of 0.75 and a specificity of 0.913 with good calibration (Figure S2), indicating strong potential for risk stratification.

Discussion

In current clinical practice, genomic or transcriptomic profiling can add substantially to economic burden and inconvenience. Reliable, convenient, and widely applicable models are therefore needed. In this study, we aimed to develop a prognostic model for patients with IPA based on the updated IASLC grading system and other routinely available clinicopathological factors.
To construct an accurate prognostic model, several machine learning algorithms were compared, and Lasso regression was found to be the most suitable. The final model incorporated 16 basic clinicopathological features of patients with IPA, which may help provide a comprehensive and accurate prediction. Our findings revealed that the grading system, nodal status, tumor size, and CEA level were among the most important variables affecting DFS. Among these, the grading system was the most important, further supporting the value of the updated IASLC grading system.
Nodal status and tumor size are well-established prognostic factors, reflecting tumor burden and dissemination potential. Elevated CEA levels likely indicate biological aggressiveness and subclinical metastasis. The integration of these factors enhances DFS prediction by capturing both histopathological severity and clinical indicators of tumor progression.
Recently, machine learning-based prognostic models have attracted increasing attention in clinical and translational cancer research. For instance, artificial neural networks and the ComplEx-N3 model have been employed to predict recurrence in patients with early-stage non-small cell lung cancer (24,25). However, these studies treated recurrence as a binary outcome without accounting for time to event. Kinoshita et al. showed that an AI prognostic model for resected early-stage non-small cell lung cancer built with XGBoost performed well, with a 5-year DFS AUC of 0.890 (26), whereas our study focused on patients with early-stage IPA.
To the best of our knowledge, this is the first study to develop AI prognostic models for patients with IPA that incorporate the updated IASLC grading system. Four machine learning algorithms were implemented, five-fold cross-validation was used for iterative testing and tuning, and the resulting model achieved robust performance in both the training and test sets. Notably, our model showed higher AUC values than the grading system alone, implying that integrating other clinical parameters is also important for precise recurrence prediction. Unlike previously reported models that integrate multi-omics data, which makes prediction costly, our model relies on routinely available clinicopathological variables and could be used freely and easily by clinical oncologists or patients. Nonetheless, our study had some limitations. Only 15.7% of patients experienced recurrence events by the last follow-up timepoint, which may restrict model development and limit the model’s applicability. The impact of gene status on prognosis was not assessed because more than half of the patients did not undergo next-generation sequencing. The sample size was limited, especially for grade 3 cases and stage II and III disease. Additionally, all cases were from a single center and only internal validation was performed; external validation will be needed to support use of the model in clinical settings. Future studies should focus on (I) multicenter external validation to confirm generalizability; (II) incorporation of gene mutation or multi-omics data to refine biological insight; and (III) prospective studies to evaluate the model’s impact on clinical decision-making and patient outcomes.

Conclusions

Our study successfully constructed a prognostic model for risk stratification based on the grading system using a machine learning technique, which may provide oncologists and surgeons with an effective tool for recurrence prediction and early medical intervention to improve the survival of patients with IPA. Further external validation and integration of molecular data are warranted to enhance clinical applicability and generalizability.

Supplementary

The article’s supplementary files as

Source: PubMed Central (JATS). Licensing follows the original publisher's policy; please cite the original article when quoting.
