CT based quantification of intratumoral and peritumoral heterogeneity for diagnosing lymphovascular invasion for early stage non-small cell lung cancer.
APA
Long Y, Zhang H, et al. (2025). CT based quantification of intratumoral and peritumoral heterogeneity for diagnosing lymphovascular invasion for early stage non-small cell lung cancer. BMC Medical Imaging, 25(1), 505. https://doi.org/10.1186/s12880-025-02041-0
MLA
Long Y, et al. "CT based quantification of intratumoral and peritumoral heterogeneity for diagnosing lymphovascular invasion for early stage non-small cell lung cancer." BMC Medical Imaging, vol. 25, no. 1, 2025, pp. 505.
PMID
41420150
Abstract
[OBJECTIVE] To establish a model that integrates clinical, traditional radiologic, intratumoral and peritumoral radiomics (ITR and PTR), and intratumoral and peritumoral heterogeneity (ITH and PTH) features to diagnose lymphovascular invasion (LVI) status in early-stage non-small cell lung cancer (NSCLC).
[MATERIALS AND METHODS] Clinical data and chest CT imaging data were collected from NSCLC patients who underwent surgical lung resection between January 2019 and May 2021. Surgical pathology was the diagnostic gold standard for LVI status. ITR and PTR features and ITH and PTH features were extracted from the total tumor volume and the peritumoral tumor volume. Clinical, traditional radiologic, ITR, PTR, ITH, and PTH models were then established to diagnose LVI status. Finally, a nomogram-based diagnostic model was constructed and its diagnostic efficacy was evaluated.
[RESULTS] A total of 366 NSCLC patients from 2 institutions were enrolled in this retrospective study. Institution 1 provided the training (n = 154) and internal validation (n = 154) sets, and Institution 2 provided the external validation set (n = 58). Among the three peritumoral radiomics models (PTR_0–3, PTR_-3-3, and PTR_0–6), the PTR_0–6 model had the best predictive performance, with areas under the curve (AUCs) of 0.882 and 0.824 in the training and validation groups, respectively. Gender, Vascular Convergence Sign, and N stage were significantly related to LVI status. The combined model, which integrated the ITH, PTR_0–6, and PTH_0–6 models with N stage and Vascular Convergence Sign, had the highest diagnostic accuracy. Its AUCs in the training, internal validation, and external validation sets were 0.963, 0.882, and 0.743, respectively.
[CONCLUSIONS] A comprehensive diagnostic model based on clinical features, traditional radiological features, radiomic features, and heterogeneity features was established to diagnose LVI in early-stage NSCLC. It showed the highest diagnostic performance among the models compared and can help to guide treatment decisions.
[SUPPLEMENTARY INFORMATION] The online version contains supplementary material available at 10.1186/s12880-025-02041-0.
Introduction
Cancer is a major societal, public health, and economic problem in the 21st century, and lung cancer has the highest morbidity and mortality among all cancer types globally [1]. About half of all new lung cancer cases occur in Asia [2]. There are 2 main forms of lung cancer, non-small cell lung cancer (NSCLC) and small cell lung cancer, and NSCLC accounts for 85% of cases [3]. Treatment of NSCLC is stage specific. Patients with stage I or II disease should be treated with complete surgical resection when not contraindicated, and nonsurgical patients should be considered for radiotherapy or chemotherapy [3]. However, 25%-70% of surgical patients eventually relapse despite complete resection, with 5-year survival of only 35%-65% [4]. Distant metastasis is the most common form of postoperative recurrence [5]. Studies have shown that tumor cells must first enter the vasculature before they can spread and metastasize through blood vessels or lymphatic vessels [6, 7].
The detection of neoplastic cells in an arterial, venous, or lymphatic lumen on hematoxylin and eosin (H&E) staining is called lymphovascular invasion (LVI) [8–10]. The presence of LVI in patients with lung cancer is associated with poor prognosis [11], and LVI is an independent prognostic factor for recurrence-free survival in patients with stage I lung adenocarcinoma undergoing lobectomy [12]. Studies have shown that LVI can be used as an indication for preoperative neoadjuvant chemotherapy in NSCLC patients. Preoperative neoadjuvant chemotherapy can reduce tumor volume, downstage the tumor, and provide long-term survival benefit [13, 14]. Lobectomy plus lymphadenectomy is more effective than sublobar resection alone in patients with LVI-positive NSCLC [15]. Therefore, it is very important to identify LVI before surgery or other therapy. At present, LVI in lung cancer is diagnosed mainly by postoperative H&E staining, immunohistochemistry, and special staining. LVI is thus a histological finding that can only be recognized postoperatively in surgical specimens [16, 17].
NSCLC exhibits strong intratumor heterogeneity (ITH), and more heterogeneous tumors have higher invasiveness and metastatic potential [18]. Studies have shown that invasive tumors with LVI have significantly higher ITH scores than non-invasive tumors [19]. Because ITH differs between tumors, patients with otherwise similar clinical features, such as clinical stage and molecular subtype, can have different LVI status. Therefore, ITH could help to predict LVI status.
Previous studies have focused on the relationship between clinical outcomes and radiological features within the primary tumor volume, yet the lung parenchyma surrounding the primary tumor may also be involved in tumor invasion and metastasis [20, 21]. Pathological studies have shown that lung tumors can cause worse clinical manifestations through lymphatic metastasis, hematogenous metastasis, or direct invasion of surrounding lung tissues [22–29]. Furthermore, cancer cell infiltration at the tumor periphery was significantly more strongly associated with local recurrence or distant metastasis than intratumoral cancer tissue [22, 25, 27].
Among preoperative non-invasive diagnostic methods, conventional computed tomography (CT) is the most commonly used. However, studies have shown that CT-based diagnosis of vascular and mediastinal tumor invasion has limited utility [30–32], as subtle differences in some of the underlying features cannot be identified with the naked eye. Radiomics, an image analysis technique, can extract complex information that is difficult for the human eye to recognize or quantify from a diagnostic image and convert it into quantitative data [33, 34].
Thus, given the importance of diagnosing LVI before surgery and the lack of a corresponding method, we established a model integrating clinical, traditional radiologic, intratumoral and peritumoral radiomics (ITR and PTR), and intratumoral and peritumoral heterogeneity (ITH and PTH) features to diagnose LVI status in early-stage NSCLC.
Methods
Patients
This retrospective study was approved by the Ethics Committee of Zhongnan Hospital of Wuhan University (IRB approval number 2021048). In accordance with the CIOMS guidelines, informed consent was waived for this study. Patients with pathologically confirmed NSCLC were recruited from 2 institutions: Zhongnan Hospital of Wuhan University (Institution 1) and Wuhan Central Hospital (Institution 2). Pathological data and chest CT images were collected from these patients (Figure S1). All patients underwent surgery within two weeks of their CT scans, and they also completed routine blood tests measuring tumor markers such as Carbohydrate Antigen 125 (CA125, reference value ≤ 35 U/mL), Carcinoembryonic Antigen (CEA, reference value 0.0–5.0 ng/mL), and Neuron Specific Enolase (NSE, reference value ≤ 16.0 ng/mL). The inclusion criteria were: (1) pathologically confirmed NSCLC; (2) availability of a pathological report with clear LVI status; (3) a chest CT of diagnostic quality performed within two weeks prior to surgery; and (4) complete baseline clinical data. The exclusion criteria were: (1) lack of complete clinical-pathological data or preoperative CT images; (2) presence of other malignant tumors; and (3) prior anti-tumor treatment before surgery.
Data flowchart
As shown in Fig. 1, the data analysis process in this study consists of two main parts: clinical risk factor analysis/modeling and image data analysis/modeling. Clinical risk factors were first screened by univariate analysis and then entered into multivariate analysis to identify significant variables, after which a multivariable logistic regression model (generalized linear model, binomial family with logit link) was constructed. Image data processing was carried out in three steps: preprocessing, segmentation, and model construction. Regions of interest (ROIs) within the tumor (intratumoral) and around the tumor (peritumoral) were analyzed separately, and for each region both a traditional radiomics model and a heterogeneity model, designed to reflect tumor heterogeneity, were developed. Finally, by integrating the selected clinical risk factors with the intratumoral and peritumoral models, we established a combined model and drew the corresponding nomogram to enhance its clinical utility.
During data processing, all patients in Institution 1 were randomly divided into training and internal validation sets in a 1:1 ratio by stratified random sampling, with stratification based on LVI status. The training set was used for model development and construction, while the validation set served as an independent sample for model validation. All model performance metrics were calculated and compared across both the training and validation sets.
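The stratified 1:1 split can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the function name `stratified_split`, the seed, and the label encoding (1 = LVI-positive, 0 = LVI-negative) are assumptions, and the class counts are those reported for Institution 1 in the Results.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.5, seed=0):
    """Split sample indices into two sets while preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, valid = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)  # random assignment within each LVI class
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        valid.extend(members[cut:])
    return sorted(train), sorted(valid)


# Hypothetical label vector with the Institution 1 class counts.
labels = [1] * 117 + [0] * 191
train_idx, valid_idx = stratified_split(labels)
```

Because both class counts are odd, rounding places the extra sample of each class in one set or the other, so per-class counts from this sketch may differ by one from those reported.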
Image processing
Before the formal analysis, all patients’ CT images underwent two preprocessing steps: resampling and normalization of CT values. These steps improve the consistency and comparability of images acquired from different devices and batches, which is intended to enhance the accuracy and generalizability of the diagnostic models. Specifically, we resampled all patients’ chest CT images to a resolution of 1 mm × 1 mm × 1 mm using linear interpolation, and normalized the CT values to a standard normal distribution N(0,1) using the Z-scoring method.
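The Z-scoring step can be illustrated with a short NumPy sketch. It assumes the statistics are computed per image over all voxels, which the text does not specify; the function name `zscore_normalize` and the example patch are hypothetical, and the resampling step is not shown.

```python
import numpy as np


def zscore_normalize(volume):
    """Rescale voxel intensities to zero mean and unit standard deviation."""
    volume = np.asarray(volume, dtype=np.float64)
    std = volume.std()
    if std == 0:
        raise ValueError("constant image: z-score is undefined")
    return (volume - volume.mean()) / std


# Hypothetical 3 x 3 x 3 patch of CT values in Hounsfield units.
patch = np.linspace(-1000.0, 200.0, 27).reshape(3, 3, 3)
normalized = zscore_normalize(patch)
```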
Subsequently, we manually segmented the tumor regions using ITK-SNAP (version 3.8.0; http://www.itksnap.org) to obtain the tumor regions of interest (ROIs). Based on this, the SimpleITK package was used to automatically contract the tumor ROI (gross tumor volume, GTV) by 3 mm or expand it by 3 mm and 6 mm, defining three different peritumoral tumor volume (PTV) areas (Fig. 2): from 3 mm inside to 3 mm outside the tumor boundary (PTV_-3-3), from the tumor boundary to 3 mm outside (PTV_0–3), and from the tumor boundary to 6 mm outside (PTV_0–6). We manually corrected the automatic expansion results to exclude any adjacent pleura, bone, and mediastinum captured by the expansion. The delineation of the tumor ROI and the adjustment of the peritumoral areas were completed jointly by two radiologists under CT lung window settings (width 1500 HU; level −500 HU), and the intraclass correlation coefficient (ICC) was used to assess the stability and repeatability of the results. This process ultimately generated one intratumoral ROI and three peritumoral ROIs for subsequent modeling and analysis.
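The geometry of the three peritumoral regions can be sketched with a brute-force distance computation in NumPy. This is an illustration only: the study used SimpleITK, whereas this sketch assumes isotropic 1 mm voxels, uses Euclidean distances between voxel centers, and treats the boundary thresholds as inclusive. The helper names `min_dist` and `peritumoral_rings` and the cubic test "tumor" are hypothetical, and the brute-force search is practical only for small volumes.

```python
import numpy as np


def min_dist(query, targets):
    """Distance (in voxels) from each query voxel to its nearest target voxel."""
    return np.array([np.sqrt(((targets - q) ** 2).sum(axis=1).min()) for q in query])


def peritumoral_rings(tumor):
    """Build the three peritumoral masks from a boolean tumor mask (1 mm voxels)."""
    inside = np.argwhere(tumor)
    outside = np.argwhere(~tumor)

    dist_out = np.zeros(tumor.shape)  # distance to the tumor; 0 inside it
    dist_in = np.zeros(tumor.shape)   # distance to background; 0 outside the tumor
    dist_out[~tumor] = min_dist(outside, inside)
    dist_in[tumor] = min_dist(inside, outside)

    return {
        "PTV_0-3": ~tumor & (dist_out <= 3),
        "PTV_0-6": ~tumor & (dist_out <= 6),
        # Region expanded by 3 mm minus the core contracted by 3 mm.
        "PTV_-3-3": (dist_out <= 3) & (dist_in <= 3),
    }


# Hypothetical 7 x 7 x 7 mm cubic "tumor" centered in a 19 x 19 x 19 mm volume.
tumor = np.zeros((19, 19, 19), dtype=bool)
tumor[6:13, 6:13, 6:13] = True
rings = peritumoral_rings(tumor)
```

In this sketch PTV_0–3 is by construction a subset of PTV_0–6, and PTV_-3-3 is PTV_0–3 plus the outer 3 mm shell of the tumor itself. The manual exclusion of pleura, bone, and mediastinum described above is not modeled.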
Model construction
First, univariate analysis was used to evaluate clinical variables and radiological semantic features, and variables with p < 0.05 were selected as clinical risk factors. A multivariate logistic regression method was then employed to construct the Clinical model.
Next, we conducted traditional radiomics modeling. For the four ROIs (GTV, PTV_-3-3, PTV_0–3, and PTV_0–6), 105 radiomics features were extracted (details in the supplementary document). Features were then reduced in four sequential steps: (1) features with variances below a threshold of 1 were removed using the variance threshold method; (2) highly correlated features (R > 0.9) were eliminated using Pearson correlation analysis; (3) features showing significant differences between classification groups were identified using the t-test (p < 0.05); (4) the least absolute shrinkage and selection operator (LASSO) method was used to select the optimal feature subset and construct the corresponding traditional radiomics models (ITR, PTR_-3-3, PTR_0–3, and PTR_0–6).
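The first two reduction steps can be sketched in NumPy as below. This is an assumed implementation, not the authors' pipeline: it applies the correlation threshold to the absolute value of Pearson's R and keeps the earlier column of each highly correlated pair, neither of which is stated in the text. The function name `filter_features` and the synthetic feature matrix are hypothetical, and the t-test and LASSO steps are omitted.

```python
import numpy as np


def filter_features(X, var_threshold=1.0, corr_threshold=0.9):
    """Return column indices that survive variance and correlation filtering."""
    # Step 1: drop features whose variance is below the threshold.
    keep = np.flatnonzero(X.var(axis=0) >= var_threshold)

    # Step 2: greedily drop features highly correlated with one already kept.
    corr = np.abs(np.corrcoef(X[:, keep], rowvar=False))
    selected = []
    for j in range(len(keep)):
        if all(corr[j, k] <= corr_threshold for k in selected):
            selected.append(j)
    return keep[selected]


# Synthetic feature matrix: 50 samples, 4 features.
t = np.linspace(0.0, 1.0, 50)
X = np.column_stack([
    10.0 * t,                            # retained feature
    10.0 * t + 0.01 * np.sin(40.0 * t),  # near-duplicate of column 0
    0.1 * np.cos(7.0 * t),               # variance far below 1
    10.0 * np.cos(9.0 * t),              # weakly correlated with column 0
])
selected = filter_features(X)
```

Here column 2 is removed by the variance filter and column 1 by the correlation filter, leaving columns 0 and 3.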
Following this, we constructed heterogeneity models. The predictive performance of the traditional radiomics models based on the three peritumoral ROIs was compared to determine the best peritumoral ROI. Then, intratumoral heterogeneity (ITH) and peritumoral heterogeneity (PTH) models were developed separately for the intratumoral ROI and the optimal peritumoral ROI. The construction of the heterogeneity models involved the following steps: (1) subregion segmentation of the tumor, objectively identifying and segmenting subregions inside the ROI to better understand the internal structure and heterogeneity of the tumor; (2) unsupervised clustering analysis, grouping tumor subregions with similar features to reveal different ecological “populations” within the tumor ROI; (3) calculation of ecological diversity indices for the different ecological “populations” within the tumor ROI; (4) construction of the “Tumor Ecological Diversity Feature Vector” (TED) based on the ecological diversity indices, i.e., the heterogeneity model. For specific details on the design and implementation of each step, refer to the supplementary materials (Figure S1).
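Step (3) can be illustrated with two widely used ecological diversity indices. The specific indices used in the study are given only in the supplementary materials, so the choice of Shannon and Simpson indices here is an assumption, as are the function name `diversity_indices` and the example cluster labels.

```python
import math
from collections import Counter


def diversity_indices(cluster_labels):
    """Compute richness, Shannon index and Simpson index from subregion labels."""
    counts = Counter(cluster_labels)
    total = sum(counts.values())
    proportions = [count / total for count in counts.values()]
    shannon = -sum(p * math.log(p) for p in proportions)
    simpson = 1.0 - sum(p * p for p in proportions)
    return {"richness": len(counts), "shannon": shannon, "simpson": simpson}


# Hypothetical cluster assignments of subregions within one ROI.
mixed = diversity_indices([0, 0, 1, 1])      # two equally sized "populations"
uniform = diversity_indices([2, 2, 2, 2])    # a single "population"
```

Both indices are zero when every subregion falls in one cluster and increase as subregions spread more evenly across clusters, which is the sense in which they quantify heterogeneity.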
Finally, the selected clinical risk factors, the intratumoral and peritumoral radiomics models (ITR and PTR), and the intratumoral and peritumoral heterogeneity models (ITH and PTH) were entered as independent variables for predicting LVI. A multivariate logistic regression method (generalized linear model, binomial family, logit link) was used to construct a combined model, and a nomogram was created to visualize the final model, enhancing its clinical interpretability and usability.
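In generic form, a binomial generalized linear model with a logit link relates the predicted probability of LVI to the predictors as written below. The notation is introduced here for illustration: x_k denotes the k-th predictor (a clinical risk factor or the output of a radiomics or heterogeneity model), and β_0 and β_k denote the fitted intercept and coefficients, whose values are not reproduced here.

```latex
\operatorname{logit}(P) = \ln\frac{P}{1-P} = \beta_0 + \sum_{k=1}^{K} \beta_k x_k,
\qquad
P = P(\mathrm{LVI}=1 \mid x_1,\dots,x_K)
  = \frac{1}{1 + \exp\!\left(-\beta_0 - \sum_{k=1}^{K} \beta_k x_k\right)}
```

A nomogram is a graphical rendering of this linear predictor: each term β_k x_k is mapped to a points scale, and the total points are mapped to the probability P.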
Model validation
In this study, eight models were established: four traditional radiomics models (ITR, PTR_-3-3, PTR_0–3, and PTR_0–6) for the tumor ROI and the three peritumoral ROIs, two heterogeneity models (ITH and PTH_0–6) for the intratumoral and the optimal peritumoral ROIs, a clinical model, and the final combined model. The predictive accuracy of these models was evaluated using receiver operating characteristic (ROC) curves and their derived metrics: area under the curve (AUC), accuracy (Acc), sensitivity (Sens), and specificity (Spec). The DeLong test was used to determine whether differences in ROC curves between the models were significant. Additionally, the consistency between model predictions and actual observations was assessed using calibration curves, and the models’ goodness of fit was examined with the Hosmer-Lemeshow (HL) test. Finally, decision curves were used to evaluate the clinical benefits of the models at different risk thresholds.
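The ROC-derived metrics can be computed as in the following standard-library sketch. The AUC is obtained from its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half). The function names, the 0.5 threshold, and the toy scores are assumptions for illustration; the study's cut-off selection is not described in the text.

```python
def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def threshold_metrics(y_true, scores, threshold=0.5):
    """Accuracy, sensitivity and specificity at a fixed probability threshold."""
    pairs = list(zip(y_true, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return {
        "acc": (tp + tn) / len(pairs),
        "sens": tp / (tp + fn),
        "spec": tn / (tn + fp),
    }


# Toy example: 1 = LVI-positive, 0 = LVI-negative.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(y_true, scores)                  # 0.75
toy_metrics = threshold_metrics(y_true, scores)
```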
Statistics
Image processing, statistical analyses, and modeling were conducted using Python (v3.8.10) and R (v4.1.2). The key packages used were as follows: scikit-learn (v1.0.2) for implementing the machine learning algorithms, SimpleITK (v2.1.1) for image processing, NumPy (v1.21.5) for matrix operations, and Pandas (v1.4.2) for data frame operations.
Quantitative data following a normal distribution were presented as mean ± standard deviation (mean ± SD) and compared between groups using independent-samples t-tests. Data not following a normal distribution were reported as median and interquartile range (IQR) and compared between groups using the Mann-Whitney U test. Categorical data were analyzed using the Chi-square test or Fisher’s exact test. A p-value of less than 0.05 was considered statistically significant. The Benjamini-Hochberg procedure was used to adjust all p-values obtained from the multiple pairwise DeLong tests between different models to control the False Discovery Rate (FDR).
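The Benjamini-Hochberg adjustment can be written in a few lines. The sketch below is a generic standard-library implementation with a hypothetical function name and made-up p-values, not values from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values from four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
```

Each raw p-value is multiplied by m divided by its rank, and the running minimum ensures that a smaller raw p-value never receives a larger adjusted value than a larger one.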
Results
Patients
In this study, institution 1 included 308 patients with NSCLC, of whom 117 were LVI-positive and 191 were LVI-negative. After 1:1 random assignment, there were 154 patients in the training group and 154 in the internal validation group. The training group consisted of 59 LVI-positive and 95 LVI-negative patients, while the internal validation group included 58 LVI-positive and 96 LVI-negative patients. Institution 2, serving as the external validation set, included 58 patients with NSCLC, of whom 28 were LVI-positive and 30 were LVI-negative. Clinical and imaging characteristics of the patients are detailed in Table 1. Patient enrollment details are shown in Figure S2.
Clinical risk factors and modeling
In the univariate analysis of the training group, statistically significant differences were observed between the LVI-positive and LVI-negative groups in gender and CEA levels (p < 0.05). Among the semantic features, significant differences were noted between the two groups in TNM stage, N stage, Vascular Convergence Sign (VCS), and Vacuole Sign (p < 0.05). All significant clinical and semantic variables were entered into a multivariable logistic regression model to construct the Clinical model. According to the results of the GLM analysis (Table S1), gender, VCS, and N stage were selected for the Clinical model, in which N stage showed the most significant association with LVI status (OR = 1.62, 95% CI: 1.30–2.00, p < 0.001).
Radiomics and heterogeneity modeling
This study extracted 105 radiomic features from four regional ROIs (GTV, PTV_0–3, PTV_-3-3, and PTV_0–6). Initially, features with ICC values below 0.75 were excluded. Subsequently, a four-step feature selection process reduced the number of features for GTV (Table S2), PTV_0–3, PTV_-3-3, and PTV_0–6 (Table S3) to 3, 9, 11, and 10, respectively.
Among the three PTV models, the PTV_0–6 model exhibited AUC values of 0.882 in the training group and 0.824 in the validation group, significantly outperforming (DeLong test: p < 0.05) the PTV_-3-3 model (AUCs of 0.771 and 0.696) and the PTV_0–3 model (AUCs of 0.805 and 0.798). Additionally, the accuracy, sensitivity, and specificity of the three models were comprehensively evaluated. Table 2 (Tables S4-S5) indicates that the performance of the PTV_0–6 model was superior to the other two models.
Consequently, the intratumoral ITH model and the peritumoral PTH model were constructed from the GTV and PTV_0–6 ROIs, respectively. First, subregions were identified within the GTV and PTV_0–6 ROIs using the simple linear iterative clustering (SLIC) method. Radiomic features were then extracted, and subregions with similar features were grouped by unsupervised clustering with a Gaussian mixture model. Dimensionality reduction was subsequently carried out using the minimum redundancy maximum relevance method. Ultimately, five imaging features were selected to represent the spatial heterogeneity of the GTV (Table S6) and of the PTV_0–6 (Table S7), and the ITH and PTH_0–6 models were constructed from them, respectively.
Finally, for the combined model, a multivariable logistic regression framework (generalized linear model with binomial family and logit link) was applied to integrate the selected clinical, radiomics (ITR and PTR), and heterogeneity (ITH and PTH) predictors. According to the regression results (Table S8), vascular convergence sign (VCS), N stage, PTR 0–6, PTH 0–6, and ITH were retained as independent predictors of LVI (p < 0.05). Among them, PTR 0–6 (OR = 1.04, 95% CI: 1.02–1.06, p < 0.001) and N stage (OR = 1.32, 95% CI: 1.14–1.52, p < 0.001) showed the strongest associations with LVI status. The comprehensive nomogram diagnostic model (Fig. 3) was then drawn. In this nomogram, each “independent risk factor” of a patient’s lesion is converted into a score on the “Points” axis. The sum of the scores from the five risk factors is read on the “Total Points” axis, and the corresponding value on the “Risk of metastasis” axis gives the probability of LVI for that lesion.
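To make the reported odds ratios more concrete, the following sketch converts them into log-odds coefficients and into the change in odds for a multi-unit increase. Only the two odds ratios are taken from the text; the helper names are hypothetical, the unit and scale of the PTR 0–6 score are not stated in this section, and the model intercept is not reported here, so no absolute probability is computed for a real patient.

```python
import math


def log_odds_coefficient(odds_ratio):
    """Logistic regression coefficient corresponding to an odds ratio."""
    return math.log(odds_ratio)


def odds_multiplier(odds_ratio, delta):
    """Factor by which the odds change when the predictor rises by `delta` units."""
    return odds_ratio ** delta


def probability_from_logit(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


OR_PTR_0_6 = 1.04  # reported for PTR 0-6
OR_N_STAGE = 1.32  # reported for N stage

# A 10-unit increase in the PTR 0-6 score (unit as entered in the model,
# which is an assumption here) multiplies the odds of LVI by about 1.48.
ptr_ten_units = odds_multiplier(OR_PTR_0_6, 10)
beta_n_stage = log_odds_coefficient(OR_N_STAGE)
print(round(ptr_ten_units, 3), round(beta_n_stage, 3))
```

This illustrates why an odds ratio close to 1 can still matter: its effect compounds over the range of a continuous score, whereas the N stage odds ratio applies per stage increment.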
Models validation and comparison
Figure 4 displays the ROC curves for the constructed intratumoral and peritumoral traditional radiomics models, heterogeneity models, clinical model, and combined model, while Table 3 (Tables S9-S11) lists more comprehensive ROC-related metrics (AUC, accuracy, sensitivity, and specificity) for these models. Figure 5 and Figures S3-S7 show the calibration curves with HL test results for all six constructed models. The analysis of these figures and tables reveals:
(1) Combined Model is Best Overall, but PTR_0–6 is More Robust: The Combined model showed the highest predictive power in the training (AUC 0.962) and internal validation (AUC 0.882) sets. However, the PTR_0–6 (peritumoral traditional radiomics) model demonstrated more stable and reliable performance across all three datasets, especially in the external validation set (AUC 0.812).
(2) Peritumoral is Better Than Intratumoral: Models built using features from the area around the tumor (peritumoral) consistently outperformed models based on features from within the tumor (intratumoral). This was true for both traditional radiomics (PTR_0–6 vs. ITR) and heterogeneity models (PTH_0–6 vs. ITH).
(3) Traditional Radiomics Outperforms Heterogeneity: For both tumor and peritumoral regions, traditional radiomics models (PTR_0–6, ITR) achieved better predictive performance than the heterogeneity models (PTH_0–6, ITH).
In summary, although the Combined model shows absolute superiority in training and internal validation, its performance stability in external validation is not as robust as the PTR_0–6 (peritumoral traditional radiomics) model. Information from the peritumoral region demonstrates greater predictive value compared to intratumoral information, and traditional radiomics features are more effective than the heterogeneity features in this study.
In the training cohort, the calibration curve for the Combined Model (Fig. 5A) demonstrates excellent agreement between the predicted LVI risk and the observed frequency, with the curve closely following the ideal 45-degree diagonal line. The accompanying Hosmer-Lemeshow test yielded a non-significant p-value of 0.813 (p > 0.05), indicating no statistical evidence of poor fit, which is consistent with good calibration. Similarly, in the internal and external validation cohorts (Fig. 5B and C), the model maintained good calibration, with the HL test showing a p-value of 0.704 (p > 0.05), which suggests the model generalizes well to new data.
A comprehensive review of all constructed models (Figures S3-S7) shows that all six models exhibited good calibration in both the training and validation sets, with all Hosmer-Lemeshow tests returning p-values greater than 0.05. This consistency across models further strengthens our confidence in their predictive reliability. Overall, these results confirm that our final Combined Model is not only highly discriminative but also well-calibrated, making it a reliable tool for clinical risk estimation.
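The Hosmer-Lemeshow statistic used above can be sketched in plain Python as follows. This is a generic illustration, not the authors' implementation: the equal-size grouping by sorted predicted risk, the function names and the toy data are assumptions, and the chi-square tail probability uses a closed form that is only valid for an even number of degrees of freedom (for example, 8 when 10 groups are used).

```python
import math


def hosmer_lemeshow(labels, probs, groups=10):
    """Hosmer-Lemeshow chi-square statistic over risk-sorted groups.
    Assumes every predicted probability lies strictly between 0 and 1."""
    pairs = sorted(zip(probs, labels))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        # Combined contribution of events and non-events in this group.
        stat += (observed - expected) ** 2 * size / (expected * (size - expected))
    return stat


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability, closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical toy data; with 4 groups the test has 4 - 2 = 2 degrees of freedom.
toy_probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]
hl_stat = hosmer_lemeshow(toy_labels, toy_probs, groups=4)
hl_p = chi2_sf_even_df(hl_stat, df=2)
print(hl_stat, hl_p)
```

A p-value above 0.05 means the test found no evidence of miscalibration; it does not prove that the model is well calibrated, particularly in small cohorts where the test has limited power.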
Clinical use, benefit and explanation
To illustrate the clinical use of the constructed nomogram, we listed two examples in Fig. 6. As shown in the figure, we found that we could easily obtain the risk probability of LVI for the NSCLC patient by combining the clinical risk factors and radiomics and heterogeneity models constructed based on their CT images. Case 1 achieved a very high-risk score and was proven to be LVI-positive pathologically, whereas case 2 was categorized as LVI-negative (low risk score).
Figure 7 displays the Decision Curve Analysis (DCA) curves for the constructed intratumoral and peritumoral traditional radiomics models, heterogeneity models, clinical model, and combined model. The observations from the figure are as follows: (1) Across most risk threshold ranges, using these six models yields higher net benefits than the “treat all” and “treat none” strategies. (2) Except in the risk threshold range of approximately 0.83–0.9, the combined model achieves the highest net benefit. (3) Across most risk threshold ranges, the clinical benefits of the models align with the conclusions drawn from the model comparisons in the previous subsection: the combined model shows the highest net benefit, the peritumoral model achieves higher net benefits than the intratumoral model, and the traditional radiomics model provides higher net benefits than the heterogeneity model.
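The net benefit plotted in a decision curve follows the standard definition, net benefit = TP/n - (FP/n) x pt/(1 - pt), where pt is the risk threshold. A minimal sketch under that definition is given below; the function names and toy data are hypothetical and this is not the authors' code.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold."""
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Net benefit of treating every patient regardless of predicted risk."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data: 1 = LVI-positive, 0 = LVI-negative.
dca_labels = [1, 1, 0, 0]
dca_probs = [0.9, 0.6, 0.7, 0.1]

nb_model = net_benefit(dca_labels, dca_probs, threshold=0.5)
nb_all = net_benefit_treat_all(dca_labels, threshold=0.5)
nb_none = 0.0  # "treat none" has zero net benefit by definition
print(nb_model, nb_all, nb_none)
```

Evaluating these functions over a grid of thresholds and plotting the results reproduces the shape of a decision curve; a model is clinically useful at a given threshold when its net benefit exceeds both reference strategies.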
Discussion
In this study, a model combining clinical data, radiologic features, radiomic features and tumor heterogeneity features was established to diagnose LVI status in early stage NSCLC. The ITR, ITH, PTR and PTH models were established by analyzing the subregions of the tumor and the peritumoral area on CT images. The final model included VCS, N stage, the ITH model, the PTR_0–6 model and the PTH_0–6 model, and was used to quantify the probability of LVI in patients. This model has high diagnostic accuracy, with an AUC of 0.962 (95% CI: 0.937–0.988) in the training group, 0.882 (95% CI: 0.831–0.933) in the internal validation group, and 0.743 (95% CI: 0.607–0.879) in the external validation group.
Several previous studies have constructed models to diagnose the LVI status of NSCLC using either imaging features alone or imaging features combined with clinical features. Chen et al. [35] developed an LVI predictive model that combined independent predictors (smoking and clinical stage) with the GPTV9 radiomic score, with AUCs of 0.89, 0.83, and 0.66 in the training, internal validation, and external validation groups, respectively. Zhang et al. [36] developed an LVI predictive model based on 2D and 3D tumor and peritumoral features, with an average AUC of 0.759. The AUCs of the combined model established in this study were 0.962, 0.882, and 0.743 in the training, internal validation, and external validation sets, respectively, which were higher than those of the previous studies. The reason may be that we not only integrated clinical data, imaging features, and ITR and PTR features, but also added ITH and PTH features.
NSCLC exhibits substantial intratumoral heterogeneity [37]. A good ITH model requires both imaging features and their spatial distribution, but current research typically captures only part of this information. Traditional radiomics features are defined under the assumption that heterogeneity is uniformly distributed, and they do not quantify the local characteristics of tumors [38]. In contrast, the ITH model groups pixels by intensity to identify similar subregions and associates the statistical features of each subregion with the outcome of interest (such as LVI status), thereby integrating local radiomic features with global pixel distribution patterns. The model established by Shi et al. [39] integrated clinicopathological information, ITH features and conventional radiomics (C-radiomics) features, and predicted pathologic complete response (pCR) to neoadjuvant chemotherapy in breast cancer patients well (AUCs of 0.83–0.87 in the test sets). However, the AUCs of the combined clinical and conventional radiomics model were only 0.78–0.81 in the test sets, and those of the ITH model alone were 0.74–0.76. Thus, adding ITH features to pCR prediction models for breast cancer may increase clinical utility.
The heterogeneity models in this study discriminated LVI status poorly on their own: the AUCs in the internal and external validation groups were 0.607 and 0.585 for the GTV-ITH model and 0.692 and 0.708 for the PTH_0–6 model. The corresponding AUCs of the traditional radiomics models were 0.665 and 0.690 for the GTV model and 0.824 and 0.812 for the PTR_0–6 model. However, when the heterogeneity models were added to the traditional radiomics model, the performance of the combined model improved, with AUCs of 0.882 and 0.743 in the internal and external validation groups, respectively. A possible explanation is that ITH models use multi-region image features to characterize intratumoral spatial heterogeneity, which could compensate for the shortcomings of traditional radiomics models by adding information on tumor heterogeneity within the ROI.
Tumor cells are often highly aggressive and disrupt the normal structure of the surrounding parenchymal tissue, leading to cancerous infiltration of the small blood vessels and lymphatic vessels around the lesion, which is often overlooked in studies that focus on intratumoral areas [24, 40]. Therefore, examining the boundary of lung cancer may help to quantify the invasiveness of the tumor. Prediction in NSCLC from peritumoral imaging features has become an active and important field, but only a few studies have used peritumoral imaging features to predict the invasiveness of NSCLC. Previous studies defined the peritumoral area as extending 1.5 to 20 mm from the tumor [24, 41–43]. One histopathological study of lung cancer quantified the distance from the tumor to surrounding micrometastases, reporting mean distances of 2.94 mm for adenocarcinoma and 2.69 mm for squamous cell carcinoma [44]. Based on this, we defined the peritumoral extent in 3 mm increments to explore the relationship between vascular invasion status in NSCLC and peritumoral imaging. The results showed that, compared with PTR_-3–3 and PTR_0–3 (validation AUCs of 0.696 and 0.798), PTR_0–6 had the highest diagnostic value (validation AUC of 0.824). The reason may be that the reproducibility of imaging features increases with distance from the tumor, which may be related to the presence of homogeneous lung parenchyma in the distal peritumoral region [42]. In addition, the peritumoral models generally performed better than the intratumoral models in this study, for both the traditional radiomics models and the heterogeneity models. This is consistent with the findings of Dou et al. [45], whose model for predicting distant metastasis in NSCLC patients showed a higher prognostic value for tumor-margin radiological features than for tumor radiological features alone; the difference between the two was statistically significant (p = 0.048). This may be due to increased cancer invasion and metastatic activity around the tumor, such as epithelial-mesenchymal transition [46], tumor-associated macrophages [47, 48], tumor budding [49] and lymphovascular invasion [50, 51].
GLM analysis found that the vascular convergence sign and N stage were independent predictors of LVI, and that gender was marginally associated with LVI. The clinical-imaging model in this study was established on this basis. Of these factors, N stage was most strongly associated with LVI status in NSCLC patients, as LVI is an initial manifestation of nodal metastasis [52]. The vascular convergence sign describes vessels that are pulled toward the lesion and converge on it, or that are truncated within the lesion or at its edge. This is mainly because vascular endothelial growth factor secreted by the cancer cells of malignant nodules promotes the growth of microvessels in tumor tissue and the destruction of some blood vessel walls, which is consistent with the results of this study. One study [53] found that the density of memory B cells, which play an important role in human anti-tumor immunity, is higher in the lung adenocarcinoma tissues of female patients. Additional studies have suggested that lung cancer in women may have a different genetic profile from that in men because of different natural histories as well as characteristics of female patients (younger age at diagnosis, more often non-smokers, more likely to have adenocarcinoma than men) [54, 55]. This may explain some of the gender differences in tumor invasion of vessels observed in this study.
However, this study has some limitations. Firstly, a semi-automatic method was used to segment the ROIs, which may introduce operator-dependent variability. An accurate automatic segmentation method should be considered in future research. Secondly, radiomics features were extracted only from non-contrast CT images. Contrast-enhanced CT or PET images may contain additional valuable information. Thirdly, because of the small sample size of the external validation set, the reliability of the performance estimates in that cohort may be limited. Larger studies are needed to further validate the model.
In summary, we constructed a total of 6 models, and within most risk threshold ranges these 6 models achieve higher net benefits than the “treat all” and “treat none” strategies. Among them, the combined model that integrates clinical data, traditional radiological features, peritumoral radiomic features, and intratumoral and peritumoral heterogeneity features has the highest diagnostic accuracy for LVI status in early stage NSCLC patients. DCA shows that, within most risk thresholds, combining multiple models improves the clinical value of the models. The comprehensive nomogram model not only enables non-invasive preoperative risk assessment of lung lesions, but also helps to provide objective guidance for rational clinical decision-making.
In this study, a model combining clinical data, radiologic features, radiomic features and tumor heterogeneity features was established to diagnose LVI status for early stage NSCLC. ITR, ITH and PTR, PTH models were established by analyzing the subregions of tumor and peritumor on CT images. The final model included VCS, N stage, ITH model, PTR_0 + 6 model and PTH_0 + 6 model, which was used to quantify the probability of LVI in patients. This model has high diagnostic accuracy, with an AUC of 0.962(95% CI: 0.937–0.988) in the training group, 0.882(95% CI: 0.831–0.933) in the internal validation group, and 0.743(95% CI:0.607–0.879) in the external validation group.
Several previous studies have constructed models to diagnose LVI status of NSCLC by either imaging features or combining imaging features with clinical features. Chen et al. [35] developed an LVI predictive model combined independent predictors (smoking and clinical stage) and the GPTV9 radiomic score, with an AUC of 0.89,0.83, and 0.66 in the training, internal validation, and external validation groups, respectively. Zhang et al. [36] developed an LVI predictive model based on 2D and 3D tumor and peritumoral features, with an average AUC of 0.759. The AUC of the synthetic model established in this study is respectively 0.962, 0.882, and 0.743 in the training, internal validation, and external set, which was better than the previous studies. The reason may be that we not only integrated clinical data, imaging features, ITR and PTR features, but also added ITH and PTH features.
NSCLC presents a large intratumor heterogeneity [37]. A good ITH model requires both imaging features and their spatial distribution. However, current research typically only captures a portion of the information. The definition of computational features in traditional radiomics involves the assumption of uniform distribution heterogeneity, without quantifying the local features of tumors [38]. While the ITH model uses the intensity of different tumors to group pixels and identify similar subregions, and associates the statistical features of each sub region (such as its LVI state), while integrating local radiomic features and global pixel distribution patterns. The model established by Shi et al. [39] integrates the clinicopathological information, ITH characteristics and c-radiation group characteristics, and has a good prediction effect on the complete remission of neoadjuvant chemotherapy pathology in breast cancer patients (AUC values were 0.83–0.87 in the test set). However, the AUC of the combined model of clinical imaging and traditional histology was only 0.78–0.81 in the test set, and 0.74–0.76 in ITH model alone. Thus, the addition of ITH features to the pathologic complete response (pCR) prediction model of breast cancer may increase clinical utility.
Despite the ITH model in this study has a poor ability to discriminate the LVI state, the AUC of internal and external validation groups in GTV-ITH and PTH_0-6models were 0.607, 0.585 and 0.692, 0.708, respectively. The AUC in GTV and PTR_0–6 models were 0.665, 0.690 and 0.824, 0.812, respectively. However, when we add the heterogeneity model to the traditional radiomic model, the performance of the combinatorial model is greatly improved, and the AUC value reaches 0.882, 0.743. It may be that ITH models use multi-region image features to characterize intratumoral spatial heterogeneity, which could further refine the shortcomings of traditional radiomic models, adding abundant information on tumor heterogeneity within the ROI.
Tumor cells are often highly aggressive, disrupting the normal structure of surrounding parenchymal tissue, leading to carcinogenic infiltration of small blood vessels and lymphatic vessels around the lesion, which is often overlooked in studies that focus on intratumoral areas [24, 40]. Therefore, detecting the boundary of lung cancer may help to quantify the invasiveness of the tumor. The study of NSCLC prediction by peritumoral imaging features has become an active and important field. However, the use of peritumoral imaging features in predicting the invasiveness of NSCLC has only been studied in a few studies. Previous studies defined the peritumoral area as 1.5 to 20 mm [24, 41–43]. One study quantified the distance to micrometastases in histopathology lung cancer, resulting in mean distances of 2.94 mm and 2.69 mm for adenocarcinoma and squamous cell carcinoma of the lung to surrounding micrometastases, respectively [44]. Based on this, we defined the peritumoral extent on a 3 mm gradient to explore the relationship between vascular invasion status in NSCLC and peritumoral imaging. The results showed that compared with PTR_0–3 and PTR_-3–3, the AUC of PTR_0–6 has the highest diagnostic value (0.696,0.798 and 0.824). The reason may be that the farther away from the tumor, the higher the reproducibility of imaging features. This finding may be related to the presence of homogeneous lung parenchyma in the distal peritumour [42]. Therefore, PTR_0–6 model performs better than other models in our study. In addition, the performance of the peritumoral model is generally better than that of the intratumoral model in this study, whether it is the traditional radiomic model or the heterogeneity model. Consistent with the findings of Dou TH, et al. 
[45], their model for predicting distant metastasis in NSCLC patients has a higher prognostic value for tumor marginal radiological features than for tumor radiological features alone; the comparison between the two was statistically significant (p = 0.048) .This may be due to the presence of increased cancer invasion and metastatic activity around the tumour, such as epithelial-mesenchymal transition [46], Tumor-associated macrophages [47, 48], tumor budding [49] and lymphatic vascular invasion [50, 51].
GLM model analysis found that vessel convergence and N stage were independent predictors of LVI, and gender was a marginal correlation factor of LVI. This study established a clinical-imaging model based on this. Of these, N stage was most strongly associated with LVI status in NSCLC patients, as LVI is an initial manifestation of nodal metastasis [52]. The vessel convergence sign is the manifestation of vascular structure being pulled by the focus to concentrate in the direction of the focus or truncated through the focus or at the edge of the focus. This is mainly due to the fact that the Vascular endothelial growth factor secreted by malignant nodules through cancer cells promotes the growth of microvessels in tumor tissues and the destruction of some blood vessel walls, which is consistent with the results of this study. A study [53] has found that the density of memory B cells, which play an important role in human anti-tumor immunity, is higher in lung adenocarcinoma tissues of female patients. Additional studies have suggested that lung cancer in women may have a different genetic profile from that in men because of different natural histories as well as female characteristics (younger age at diagnosis, non-smokers, more likely to have adenocarcinoma than men) [54, 55]. This may explain some of the gender differences observed in this study regarding tumor invasion of vessels.
However, this study has some limitations. First, the ROI was segmented with a semi-automatic method, which may introduce operator-dependent variability. An accurate automatic segmentation method should be considered in future research. Second, radiomics features were extracted only from non-contrast CT images; contrast-enhanced CT or PET images may contain additional valuable information. Third, the external validation set was small, which may limit the reliability of the performance estimates in that cohort. Larger studies are needed for further validation.
In summary, we constructed a total of 6 models. Within most risk threshold ranges, each of these 6 models achieved a higher net benefit than the “treat all” and “treat none” strategies. Among them, the combined model that integrates clinical data, traditional radiological features, peritumoral radiomic features, and peritumoral heterogeneity features had the highest diagnostic accuracy for LVI status in early stage NSCLC patients. Decision curve analysis (DCA) further showed that, within most risk thresholds, combining multiple models improved their clinical value. The comprehensive nomogram model not only enables non-invasive preoperative risk assessment of lung lesions, but also helps to provide objective guidance for rational clinical decision-making.
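For readers unfamiliar with decision curve analysis, the net benefit compared above can be computed as in the following minimal sketch. It uses the standard definition, net benefit = TP/n − (FP/n) × pt/(1 − pt), where pt is the risk threshold. The function names and the toy labels and probabilities are hypothetical and are not data from this study.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating every patient whose predicted risk >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # True and false positives among patients flagged at this threshold.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # A false positive is weighted by the odds of the risk threshold.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference strategy that treats every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy data (hypothetical): 1 = LVI positive, 0 = LVI negative.
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
risks = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05]

for pt in (0.2, 0.5):
    model_nb = net_benefit(labels, risks, pt)
    all_nb = net_benefit_treat_all(labels, pt)
    # Treating nobody always has a net benefit of 0.
    print(f"pt={pt}: model={model_nb:.3f}, treat all={all_nb:.3f}, treat none=0.000")
```

A decision curve plots these quantities over a range of thresholds; a model is considered clinically useful at a given threshold when its net benefit exceeds both reference strategies.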
Supplementary Information
Below is the link to the electronic supplementary material.
Source: PubMed Central (JATS). Licensing follows the original publisher's policy; please cite the original article when quoting.