Prognostic stratification in non-small cell lung cancer using a TIDE-informed transcriptomic signature: model development and validation.
OpenAlex topics
Lung Cancer Treatments and Mutations
Lung Cancer Diagnosis and Treatment
Ferroptosis and cancer prognosis
- Sample size (n): 1,153
- P value: P<0.001
APA
Jiaxuan Zhou, Na Li, et al. (2026). Prognostic stratification in non-small cell lung cancer using a TIDE-informed transcriptomic signature: model development and validation. Translational Cancer Research, 15(3), 158. https://doi.org/10.21037/tcr-2025-aw-2300
MLA
Jiaxuan Zhou, et al. "Prognostic stratification in non-small cell lung cancer using a TIDE-informed transcriptomic signature: model development and validation." Translational Cancer Research, vol. 15, no. 3, 2026, p. 158.
PMID
41969460
Abstract
[BACKGROUND] Non-small cell lung cancer (NSCLC) remains a major cause of cancer mortality. The Tumor Immune Dysfunction and Exclusion (TIDE) score is widely used to estimate immune-checkpoint blockade response, but its broader prognostic relevance in unselected NSCLC populations is unclear. This study aimed to determine whether TIDE-informed strata carry prognostic information beyond immunotherapy settings, and to develop and externally validate an immune gene expression-based prognostic signature derived from differentially expressed genes (DEGs) between these strata.
[METHODS] Gene expression data and clinical information for NSCLC patients (n=1,153) were obtained from The Cancer Genome Atlas (TCGA). TIDE scores were calculated to stratify patients, and least absolute shrinkage and selection operator (LASSO) and Cox regressions were used to identify prognosis-related immune DEGs. A prognostic model was developed and validated using an external dataset (GSE50081, n=127). Immune cell infiltration was assessed using CIBERSORT, while drug sensitivity predictions were made based on the Genomics of Drug Sensitivity in Cancer (GDSC) database. Pathway enrichment analyses, including gene set variation analysis (GSVA) and gene set enrichment analysis (GSEA), were conducted to explore key molecular mechanisms.
[RESULTS] The prognostic model, based on 24 immune-related DEGs, effectively stratified NSCLC patients into high- and low-risk groups, with significant differences in survival outcomes (P<0.001). Key signaling pathways, including interleukin (IL)-17, p53, and tumor necrosis factor (TNF), were found to be associated with immune-related genes such as , , , , , , and . Exploratory drug-response modeling with pRRophetic suggested lower estimated half-maximal inhibitory concentration (IC50) values for agents including MS-275 (entinostat), PF-4708671, and roscovitine in the high-risk group. External validation confirmed the model's reproducible prognostic performance.
[CONCLUSIONS] The TIDE algorithm carries prognostic information in NSCLC beyond immunotherapy settings. The proposed TIDE-informed gene signature reproduced prognostic stratification across cohorts, suggesting potential applicability to a broader NSCLC population and supporting future personalized risk stratification.
Introduction
Non-small cell lung cancer (NSCLC) accounts for the majority of lung cancer cases and is a leading cause of cancer-related mortality worldwide (1,2). Although advancements in treatment have improved outcomes for certain patients, the overall prognosis remains poor, particularly in advanced stages of the disease (2). Immunotherapy has revolutionized the treatment landscape of NSCLC (3,4). However, response rates to immunotherapy vary widely, emphasizing the need for more reliable prognostic models to aid in patient stratification and treatment selection (5).
Previous research has increasingly focused on the tumor immune microenvironment and its role in determining patient outcomes, especially in relation to immunotherapy (6,7). In addition, bioinformatics-driven workflows have emerged as powerful tools for identifying prognostic biomarkers and dissecting molecular mechanisms across cancer types. For instance, recent studies have used multi-omics approaches to identify epigenetic biomarkers in NSCLC (8) and systematic gene-family analyses to uncover potential biomarkers in breast cancer (9). These studies highlight the utility of integrative bioinformatics pipelines in advancing precision oncology. The Tumor Immune Dysfunction and Exclusion (TIDE) algorithm was introduced as a tool for predicting immunotherapy response (10). However, its use has been largely limited to predicting therapeutic outcomes in patients undergoing immunotherapy, and its broader prognostic potential, particularly in patients who have not received immunotherapy, remains underexplored.
In this study, we repurposed TIDE-derived transcriptional signals to construct a prognostic signature for NSCLC. Specifically, we (I) evaluated the prognostic association of TIDE-predicted strata in NSCLC, and (II) derived a TIDE-informed immune gene risk score from differentially expressed genes (DEGs) between these strata. We further characterized immune cell composition, pathway signatures, and exploratory drug-response estimates to provide biological context for the observed associations. Our intent was to assess whether TIDE-related transcriptional signals provide incremental prognostic information in unselected NSCLC cohorts. We present this article in accordance with the TRIPOD reporting checklist (available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-aw-2300/rc).
Methods
Data acquisition
We obtained 1,153 NSCLC tumor samples from The Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov/). We downloaded the GSE50081 series matrix file from the Gene Expression Omnibus (GEO) public database and annotated probes using the GPL570 platform. From GSE50081, we extracted data for 127 patients with complete expression profiles and survival information for external validation. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.
TIDE immunotherapy response prediction
We predicted immunotherapy response for NSCLC patients in the TCGA cohort using the TIDE algorithm, which quantifies the likely immune response from a patient's tumor expression profile. The algorithm integrates multiple published immune signatures, modeling T cell dysfunction and T cell exclusion, to predict a patient's immune response. All TIDE outputs were downloaded from the TIDE online portal (http://tide.dfci.harvard.edu/).
Differential expression analysis
To identify DEGs between the TIDE-predicted non-responder and responder groups, we used the R package “limma”. DEGs were defined as genes with |log2 fold change (FC)| >0.585 (corresponding to a 1.5-fold change) and a Benjamini-Hochberg false discovery rate (FDR)-adjusted P value <0.05. Volcano plots and heat maps were generated to visualize the DEGs.
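The DEG thresholds above can be sketched in Python as follows. This is a minimal illustration, assuming per-gene log2 fold changes and raw P values have already been computed (for example, by limma's moderated t-test, which is not reproduced here); the function names `benjamini_hochberg` and `select_degs` are hypothetical.

```python
# Minimal sketch of BH-adjusted DEG filtering (the thresholding step only, not limma).

LOG2FC_CUTOFF = 0.585  # log2(1.5) is about 0.585, i.e. a 1.5-fold change
FDR_CUTOFF = 0.05


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, log2fc, pvalues, lfc=LOG2FC_CUTOFF, fdr=FDR_CUTOFF):
    """Split genes passing both thresholds into up- and down-regulated lists."""
    padj = benjamini_hochberg(pvalues)
    up, down = [], []
    for gene, fc, q in zip(genes, log2fc, padj):
        if q < fdr and abs(fc) > lfc:
            (up if fc > 0 else down).append(gene)
    return up, down


up, down = select_degs(["A", "B", "C", "D"], [1.2, -0.9, 0.3, 2.0], [0.001, 0.01, 0.0001, 0.5])
print(up, down)  # prints ['A'] ['B']
```

Gene C passes the FDR cutoff but not the fold-change cutoff, so both criteria must hold, as in the text.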
Analysis of Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) functions
We performed functional annotation of the DEGs with the R package “clusterProfiler”, using the GO and KEGG databases to identify enriched functional categories. GO terms and KEGG pathways with both P<0.05 and q<0.05 were considered significantly enriched.
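Over-representation analysis of this kind is commonly based on a one-sided hypergeometric test, which, to our understanding, is the test clusterProfiler applies for GO and KEGG enrichment. A minimal sketch of that P value, with a hypothetical function name and argument names:

```python
from math import comb


def hypergeom_enrichment_p(k, n_set, n_query, n_universe):
    """One-sided hypergeometric P value, P(X >= k).

    k          -- DEGs that fall in the pathway (observed overlap)
    n_set      -- background genes annotated to the pathway
    n_query    -- number of DEGs tested
    n_universe -- size of the background gene universe
    """
    total = comb(n_universe, n_query)
    upper = min(n_set, n_query)
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(
        comb(n_set, x) * comb(n_universe - n_set, n_query - x)
        for x in range(k, upper + 1)
    )
    return tail / total
```

For example, `hypergeom_enrichment_p(12, 200, 358, 20000)` would give the chance of seeing at least 12 pathway genes among 358 DEGs; the 20,000-gene background and the pathway size are illustrative assumptions. P values across all tested terms would then be multiple-testing adjusted before applying the q<0.05 cutoff.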
Model design and prognosis
To avoid circularity, our analysis followed a strict two-stage design. First, DEGs were identified between TIDE-stratified groups using the entire TCGA cohort (gene discovery stage). Second, using only these pre-identified DEGs, we constructed a prognostic model with least absolute shrinkage and selection operator (LASSO) regression based on survival data from a randomly split training subset of TCGA. The model was then validated in the held-out TCGA testing subset and in the independent GSE50081 cohort. This design ensured that the model's prognostic performance was assessed on survival outcomes not used for gene selection.
For each patient, we calculated a risk score as the sum of the expression values of the signature genes, each weighted by its estimated LASSO regression coefficient. Patients were divided into low- and high-risk groups using the median risk score as the cutoff. Survival differences between groups were estimated and compared using Kaplan-Meier curves and the log-rank test. Stratified analyses were conducted to evaluate how effectively the risk score predicted patient prognosis, and receiver operating characteristic (ROC) curves were used to verify the model's predictive performance.
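The risk score and median split described above can be sketched as below. The coefficients are the 24 LASSO coefficients reported in the Results; the expression scale (for example, normalized log-transformed values) is an assumption, and assigning scores exactly equal to the median to the low-risk group is a hypothetical choice the text does not specify. Function names are illustrative.

```python
from statistics import median

# LASSO coefficients for the 24-gene signature, as reported in the Results
# (HLA-DMA appears as HLA_DMA in R-style column names).
COEFFICIENTS = {
    "CYP4B1": -0.055853115, "SCGB3A1": -0.045877387, "HLA-DMA": -0.019213216,
    "PLK1": 0.006017918, "TGFBI": 0.006198455, "JAG1": 0.00889568,
    "MMP3": 0.014191727, "SERPINE1": 0.014816346, "ANLN": 0.018560479,
    "AHNAK2": 0.019041753, "GJB3": 0.023170764, "LOXL2": 0.024198478,
    "SPP1": 0.026760772, "PLAU": 0.029470445, "FGG": 0.035051844,
    "CILP2": 0.036715756, "LY6D": 0.041299382, "LGALS1": 0.041888419,
    "MMP12": 0.043954687, "CPXM1": 0.076709178, "DDIT4": 0.078239392,
    "TESC": 0.105496298, "GPRC5A": 0.116656173, "SLC7A5": 0.148245012,
}


def risk_score(expression):
    """Coefficient-weighted sum of signature-gene expression for one patient.

    `expression` maps gene symbol -> expression value; a missing gene raises KeyError.
    """
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def split_by_median(scores):
    """Label patients high/low risk; scores equal to the median are labeled low."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}
```

Because the cutoff is the cohort median, the same patient can change group between cohorts; a fixed cutoff taken from the training set is a common alternative when applying a score to new data.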
Analysis of immune cell infiltration
CIBERSORT is a deconvolution method based on support vector regression that estimates the composition of immune cell subtypes in a tissue microenvironment from bulk expression data. It uses a reference signature matrix of 547 genes to distinguish 22 human immune cell phenotypes, including B cells, T cells, plasma cells, and myeloid cell subsets. In this study, we applied CIBERSORT to patient expression data to estimate the relative fractions of the 22 immune cell types, and we used Spearman correlation analysis to assess the relationship between immune cell content and gene expression.
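CIBERSORT itself is not reproduced here. The sketch below shows only the downstream Spearman correlation between an estimated immune cell fraction and a gene's expression across samples, computed as the Pearson correlation of ranks, with tied values receiving their average rank. Function names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError for a constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(cell_fraction, gene_expression):
    """Spearman correlation across samples for one cell type and one gene."""
    return pearson(average_ranks(cell_fraction), average_ranks(gene_expression))
```

Because Spearman correlation uses only ranks, it captures monotone but nonlinear relationships and is less sensitive to the many near-zero fractions CIBERSORT can return.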
Drug sensitivity analysis
We used the pharmacogenomic database Genomics of Drug Sensitivity in Cancer (GDSC; https://www.cancerrxgene.org/) and the R package “pRRophetic” to predict the chemosensitivity of each tumor sample, obtaining half-maximal inhibitory concentration (IC50) estimates for each drug. Prediction and regression accuracy were assessed by 10-fold cross-validation on the GDSC training set. All options were kept at their default settings, including “combat” to remove batch effects and averaging of replicate gene expression values.
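pRRophetic's prediction step is, to our understanding, built on ridge regression: a model is fitted on GDSC cell-line expression and IC50 values and then applied to tumor expression profiles. The sketch below shows closed-form ridge regression with an unpenalized intercept using NumPy. It omits pRRophetic's preprocessing (batch correction, gene filtering, phenotype transformation) and cross-validation, and the penalty `alpha` and the function names are assumptions for illustration.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept.

    X: (n_cell_lines, n_genes) expression matrix; y: (n_cell_lines,) IC50 values.
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    # Centering keeps the intercept out of the L2 penalty.
    Xc, yc = X - x_mean, y - y_mean
    n_genes = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_genes), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return beta, intercept


def predict_ic50(beta, intercept, X_tumor):
    """Apply the cell-line model to tumor profiles (same gene order as training)."""
    return X_tumor @ beta + intercept
```

The penalty shrinks coefficients toward zero, which is what makes fitting thousands of genes on a few hundred cell lines feasible; larger `alpha` values trade variance for bias.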
Gene set variation analysis (GSVA)
GSVA is an unsupervised, nonparametric method for evaluating gene set enrichment in transcriptomic data. It scores each gene set of interest in each sample, converting gene-level changes into pathway-level changes so that the biological function of the samples can be assessed. We retrieved gene sets from the Molecular Signatures Database (MSigDB, version 7.0) and used the GSVA algorithm to score each gene set and evaluate potential differences in biological function among samples.
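GSVA itself estimates per-sample enrichment using kernel-based cumulative distribution functions and a random-walk statistic, which is beyond a short example. As a simpler and clearly different stand-in that illustrates the same gene-to-pathway idea, the sketch below computes a combined z-score pathway score: each gene in the set is standardized across samples and the z-scores are averaged per sample. Names are hypothetical.

```python
from statistics import mean, pstdev


def pathway_zscores(expression, gene_set):
    """Combined z-score per sample for one gene set (a simpler stand-in for GSVA).

    expression: {gene: [value per sample]}, all lists the same length.
    """
    genes = [g for g in gene_set if g in expression]
    if not genes:
        raise ValueError("no genes from the set are present in the expression data")
    standardized = []
    for g in genes:
        values = expression[g]
        mu, sd = mean(values), pstdev(values)
        # Genes with zero variance contribute 0 rather than dividing by zero.
        standardized.append([(v - mu) / sd if sd > 0 else 0.0 for v in values])
    n_samples = len(standardized[0])
    return [mean(row[j] for row in standardized) for j in range(n_samples)]
```

Each standardized gene sums to zero across samples, so the pathway scores are centered as well; a sample scores high when the set's genes are jointly above their cohort averages.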
Gene set enrichment analysis (GSEA)
Patients were divided into high- and low-risk groups according to the model's risk categories, and GSEA was used to compare signaling pathways between the two groups. Annotated gene sets downloaded from the MSigDB database served as the reference gene sets. Gene sets significantly enriched between groups (adjusted P<0.05) were ranked by enrichment score. GSEA is frequently used to link tumor classifications to their biological significance.
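The core GSEA statistic is a running-sum enrichment score over a ranked gene list. A minimal sketch of the classic weighted version (weight 1) is shown below, assuming genes are already sorted by a ranking statistic in decreasing order; permutation-based significance, normalization, and multiple-testing adjustment are omitted, and the function name is hypothetical.

```python
def enrichment_score(ranked_genes, ranked_stats, gene_set, weight=1.0):
    """Running-sum enrichment score for one gene set.

    ranked_genes and ranked_stats are sorted by the statistic, largest first.
    """
    in_set = [g in gene_set for g in ranked_genes]
    n, n_hit = len(ranked_genes), sum(in_set)
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must be a proper, non-empty subset of the ranked list")
    hit_norm = sum(abs(s) ** weight for s, h in zip(ranked_stats, in_set) if h)
    if hit_norm == 0:
        raise ValueError("all gene-set members have a zero ranking statistic")
    miss_step = 1.0 / (n - n_hit)
    running, es = 0.0, 0.0
    for s, hit in zip(ranked_stats, in_set):
        # Step up by the gene's weighted statistic on a hit, down by a constant on a miss.
        running += (abs(s) ** weight) / hit_norm if hit else -miss_step
        if abs(running) > abs(es):
            es = running
    return es
```

A positive score means the set's genes cluster at the top of the ranking (higher in one risk group), and a negative score means they cluster at the bottom.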
Nomogram construction
To describe the relationships among variables in the prediction model, we built a nomogram based on regression analysis. A nomogram represents each variable, such as gene expression level or a clinical characteristic, as a line segment scaled to its effect and drawn on a common plane. Using a multivariable regression model, we assigned each variable level a score according to its influence on the outcome (i.e., the size of its regression coefficient), and the predicted value was then determined from the total score.
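The point scale of a nomogram can be sketched as below, under a common convention: the predictor with the widest contribution range (|coefficient| × value range) spans 0 to 100 points, and the lowest-risk end of each axis scores 0. This convention, the function name, and the example variables are assumptions; the text does not specify how its nomogram was scaled.

```python
def nomogram_points(coefs, ranges):
    """Return a function mapping (variable, value) to nomogram points.

    coefs:  {variable: regression coefficient}
    ranges: {variable: (min_value, max_value)} shown on each axis
    """
    spans = {v: abs(coefs[v]) * (hi - lo) for v, (lo, hi) in ranges.items()}
    max_span = max(spans.values())

    def points(variable, value):
        lo, hi = ranges[variable]
        beta = coefs[variable]
        # The axis end that lowers the linear predictor most is worth 0 points.
        base = lo if beta >= 0 else hi
        return 100.0 * beta * (value - base) / max_span

    return points


# Hypothetical example: a risk score and age with illustrative coefficients.
pts = nomogram_points({"risk_score": 2.0, "age": 0.05}, {"risk_score": (0, 1), "age": (40, 90)})
total = pts("risk_score", 0.6) + pts("age", 65)
```

Total points are a linear transform of the model's linear predictor; converting them to 3- or 5-year survival probabilities additionally requires the fitted model's baseline survival, which is not shown.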
MicroRNA (miRNA) network construction
MicroRNAs (miRNAs) are small non-coding RNAs that regulate gene expression by promoting mRNA degradation or inhibiting translation. To explore whether specific miRNAs regulate the expression of key risk genes, we used the TargetScan database to identify miRNAs associated with the most important genes and visualized the miRNA-gene network with Cytoscape software.
Statistical analysis
Survival curves were generated with the Kaplan-Meier method and compared using the log-rank test, and Cox proportional hazards models were used for multivariate analyses. All statistical analyses were performed in R (version 4.2.2), with P<0.05 considered statistically significant.
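The Kaplan-Meier estimator and two-group log-rank test referred to above can be sketched in plain Python as follows. This is an illustrative O(n²) implementation with hypothetical function names, not the R code used in the study; `events` uses 1 for an observed death and 0 for censoring, and the P value comes from the chi-square distribution with 1 degree of freedom.

```python
from math import erfc, sqrt


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(time, survival)] at each distinct event time.

    events: 1 = death observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, removed = data[i][0], 0, 0
        # Group all observations sharing this time (deaths and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P value) with 1 df."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_minus_expected, variance = 0.0, 0.0
    for t in sorted({t for t, e, _ in pooled if e}):
        n = sum(1 for tt, _, _ in pooled if tt >= t)
        n_a = sum(1 for tt, _, g in pooled if tt >= t and g == 0)
        d = sum(1 for tt, e, _ in pooled if tt == t and e)
        d_a = sum(1 for tt, e, g in pooled if tt == t and e and g == 0)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of chi-square(1): P(chi2 > x) = erfc(sqrt(x / 2)).
    return chi2, erfc(sqrt(chi2 / 2))
```

The test raises `ZeroDivisionError` when the variance is zero (for example, no informative event times), which a production implementation would need to handle explicitly.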
Results
Exploring the differential genes in the NSCLC cohort
Based on TIDE predictions, we stratified the TCGA NSCLC cohort into a response group (low risk, 398 cases) and a non-response group (high risk, 755 cases). In total, we identified 358 DEGs, including 141 up-regulated and 217 down-regulated genes (Figure 1).
Functional enrichment of differential genes
Pathway analysis of the 358 DEGs showed that GO terms were enriched primarily in serine-type endopeptidase inhibitor activity, collagen metabolic process, and peptidase regulator activity (Figure 2A). KEGG analysis showed enrichment in the AGE-RAGE and PI3K-Akt signaling pathways (Figure 2B).
Prognostic model using prognosis-related genes
Univariate Cox regression identified 54 genes associated with prognosis (P<0.05). To identify the key genes in this prognostic gene set, we combined the expression data with clinical data for patients with NSCLC and selected 24 genes using the LASSO algorithm (Figure 3A-3C). For further analysis, we calculated each sample's risk score and randomly assigned the patients to a validation set and a training set at a ratio of 1:4 [risk score = CYP4B1 × (−0.055853115) + SCGB3A1 × (−0.045877387) + HLA-DMA × (−0.019213216) + PLK1 × 0.006017918 + TGFBI × 0.006198455 + JAG1 × 0.00889568 + MMP3 × 0.014191727 + SERPINE1 × 0.014816346 + ANLN × 0.018560479 + AHNAK2 × 0.019041753 + GJB3 × 0.023170764 + LOXL2 × 0.024198478 + SPP1 × 0.026760772 + PLAU × 0.029470445 + FGG × 0.035051844 + CILP2 × 0.036715756 + LY6D × 0.041299382 + LGALS1 × 0.041888419 + MMP12 × 0.043954687 + CPXM1 × 0.076709178 + DDIT4 × 0.078239392 + TESC × 0.105496298 + GPRC5A × 0.116656173 + SLC7A5 × 0.148245012]. Based on these risk scores, patients were placed into high- and low-risk groups and compared using Kaplan-Meier curves. Overall survival (OS) of the high-risk group was significantly lower than that of the low-risk group in both the training and test sets (Figure 3D,3E). ROC analysis of the training and test sets indicated good predictive performance (Figure 4A,4B).
Clinical predictive value according to multiomics research
Figure 5A shows the immune cell content in the high- and low-risk groups. Compared with the high-risk group, the low-risk group had significantly lower levels of resting NK cells, activated memory CD4 T cells, and M0 macrophages (Figure 5B). In the training dataset, the high-risk group had higher estimated IC50 values for roscovitine, salubrinal, MS-275, and PF-4708671 than the low-risk group, and these IC50 differences were reproduced in the testing dataset (Figure S1). These predictions are based on cell-line data, and these agents are not standard-of-care for NSCLC; they are presented here as exploratory, hypothesis-generating findings for future experimental validation.
We also compared the expression of immune regulatory genes between the high- and low-risk groups, including immunosuppressive and immunostimulatory genes (Figure 6A,6B) as well as chemokines and their receptors (Figure S2A,S2B).
Specific signaling mechanism related to the prognosis model
According to GSVA, pathways differing between the two groups were enriched primarily in tumor necrosis factor (TNF)α signaling via nuclear factor kappa B (NF-κB), epithelial-mesenchymal transition (EMT), and hypoxia (Figure 7A). GSEA showed involvement of the interleukin (IL)-17, p53, and TNF signaling pathways (Figure 7B). Figure 7C shows the molecular interaction network among these pathways, and Figure 7D shows the correlations between the 24 immune-related DEGs and the signaling pathways.
Prognostic model validation with external datasets
In the GEO external validation set, OS of the low-risk group was significantly higher than that of the high-risk group (Figure 8A). ROC analysis in this external dataset further showed that the model significantly predicted patient prognosis (Figure 8B). To evaluate the added prognostic value of our signature, we also compared it head-to-head with the original TIDE score in the TCGA cohort. In time-dependent ROC analysis, the 24-gene risk score consistently achieved higher area under the curve (AUC) values than the TIDE score for 1-, 2-, and 3-year OS (Figure S3), indicating better discriminative performance.
Risk score and independent prognosis analysis
According to the logistic regression analysis, the risk score contributed significantly to the nomogram prediction model (Figure 9A). We also performed 3- and 5-year prognostic predictions for patients with NSCLC (Figure 9B), and the predicted results were consistent. Univariate and multivariate analyses further showed that the risk score was an independent prognostic factor for patients with NSCLC (Figure 10A,10B). In addition, correlation analyses revealed statistically significant but modest associations between the risk score and the mutation status of key drivers (EGFR and KRAS) (Figure S4).
Correlation analysis of clinical indicators and risk score
The box plots in Figure 11 show the risk score distribution for each group of clinical indicators. According to the rank-sum test, risk score values differed significantly (P<0.05) by gender, state, survival status (fustat), and T stage. These results suggest that the risk score from the model could be used to classify NSCLC. We also used the miRcode database to predict miRNAs targeting the 24 genes and visualized the resulting 84 miRNAs and 654 mRNA-miRNA relationship pairs with Cytoscape (Figure 12). This hypothesis-generating analysis identifies candidate miRNAs that may fine-tune the expression of key risk genes, suggesting additional regulatory layers that could influence the prognostic phenotype.
Exploring the differential genes in the NSCLC cohort
We stratified the NSCLC cohort in the TCGA database into response group (low risk, 398 cases) and non-response group (high risk, 755 cases). In total, we discovered 358 differential genes, which included 217 down-regulated and 141 up-regulated genes (Figure 1).
Functional enrichment of differential genes
On 358 differential genes, we carried out a pathway analysis. According to the GO analysis, we found that the genes were enriched in pathways primarily related to serine-type endopeptidase inhibitor activity, collagen metabolic process, and peptidase regulator activity (Figure 2A). The KEGG showed that these genes were enriched in the AGE-RAGE and PI3K-Akt pathways (Figure 2B).
Prognostic model using prognosis-related genes
Univariate Cox regression identified 54 genes associated with prognosis (P<0.05). To identify the key genes within this prognostic gene set, we combined them with clinical data for patients with NSCLC and selected 24 genes using the LASSO algorithm (Figure 3A-3C). We then calculated a risk score for each sample as follows: risk score = CYP4B1 × (−0.055853115) + SCGB3A1 × (−0.045877387) + HLA_DMA × (−0.019213216) + PLK1 × 0.006017918 + TGFBI × 0.006198455 + JAG1 × 0.00889568 + MMP3 × 0.014191727 + SERPINE1 × 0.014816346 + ANLN × 0.018560479 + AHNAK2 × 0.019041753 + GJB3 × 0.023170764 + LOXL2 × 0.024198478 + SPP1 × 0.026760772 + PLAU × 0.029470445 + FGG × 0.035051844 + CILP2 × 0.036715756 + LY6D × 0.041299382 + LGALS1 × 0.041888419 + MMP12 × 0.043954687 + CPXM1 × 0.076709178 + DDIT4 × 0.078239392 + TESC × 0.105496298 + GPRC5A × 0.116656173 + SLC7A5 × 0.148245012. Patients were randomly assigned to validation and training sets at a ratio of 1:4. Based on the risk scores, we divided patients into high- and low-risk groups and performed Kaplan-Meier survival analysis. Overall survival (OS) was significantly worse in the high-risk group than in the low-risk group in both the training and test sets (Figure 3D,3E). ROC curves for the training and test sets indicated good predictive performance (Figure 4A,4B).
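The sketch below makes the formula concrete. It evaluates the reported linear predictor for one sample and then dichotomizes a cohort. Two points are assumptions not stated in the text:

- the expression scale the coefficients expect (for example, normalized and log-transformed values);
- the high/low cutoff, which is taken here as the cohort median.

The function names are hypothetical.

```python
from statistics import median
from typing import Dict, List

# Coefficients as reported in the text (gene symbol -> coefficient).
COEFFICIENTS: Dict[str, float] = {
    "CYP4B1": -0.055853115, "SCGB3A1": -0.045877387, "HLA_DMA": -0.019213216,
    "PLK1": 0.006017918, "TGFBI": 0.006198455, "JAG1": 0.00889568,
    "MMP3": 0.014191727, "SERPINE1": 0.014816346, "ANLN": 0.018560479,
    "AHNAK2": 0.019041753, "GJB3": 0.023170764, "LOXL2": 0.024198478,
    "SPP1": 0.026760772, "PLAU": 0.029470445, "FGG": 0.035051844,
    "CILP2": 0.036715756, "LY6D": 0.041299382, "LGALS1": 0.041888419,
    "MMP12": 0.043954687, "CPXM1": 0.076709178, "DDIT4": 0.078239392,
    "TESC": 0.105496298, "GPRC5A": 0.116656173, "SLC7A5": 0.148245012,
}


def risk_score(expression: Dict[str, float]) -> float:
    """Linear predictor: sum of coefficient x expression over the 24 genes.

    Raises KeyError if any signature gene is missing from `expression`.
    """
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def split_by_median(scores: List[float]) -> List[str]:
    """Assumed rule: scores above the cohort median are labelled high-risk."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


# Hypothetical usage: a sample whose 24 expression values are all 1.0
# scores the plain sum of the coefficients.
sample = {gene: 1.0 for gene in COEFFICIENTS}
print(round(risk_score(sample), 6))
print(split_by_median([0.2, 0.9, 0.5, 1.4]))  # → ['low', 'high', 'low', 'high']
```

Three genes have negative coefficients (CYP4B1, SCGB3A1, and HLA_DMA), so they lower the score and behave as protective factors. The remaining 21 genes raise it. Because the coefficients were fitted on one particular normalization, applying them to another platform would normally require comparable preprocessing.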
Clinical predictive value according to multiomics research
Figure 5A shows the estimated immune cell composition in the high- and low-risk groups. Compared with the high-risk group, the low-risk group had significantly lower levels of resting NK cells, activated CD4 memory T cells, and M0 macrophages (Figure 5B). In the training dataset, the high-risk group had higher predicted IC50 values for roscovitine, salubrinal, MS.275, and PF.4708671 than the low-risk group, and these differences were reproduced in the testing dataset (Figure S1). These predictions are based on cell-line data, and none of these agents is a standard-of-care treatment for NSCLC. We present them as exploratory findings to generate hypotheses for future experimental validation.
We also analyzed immune regulatory genes. Figure 6A,6B presents expression differences in immunoinhibitory and immunostimulatory genes between the high- and low-risk groups. Figure S2A,S2B presents the corresponding differences for chemokines and their receptors.
Specific signaling mechanism related to the prognosis model
GSVA showed that the pathways differing between the two groups were primarily signaling pathways, including tumor necrosis factor (TNF)α signaling via nuclear factor kappa B (NF-κB), epithelial-mesenchymal transition (EMT), and hypoxia (Figure 7A). GSEA implicated the interleukin (IL)-17, p53, and TNF signaling pathways (Figure 7B). Figure 7C shows the molecular interaction network among these pathways, and Figure 7D shows the correlations between the 24 immune-related DEGs and the signaling pathways.
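GSEA scores each gene set by walking down a ranked gene list. Along the way it tracks a running sum that rises at set members and falls at all other genes. The sketch below shows only this enrichment-score calculation, under assumed inputs: a pre-ranked list and a hypothetical ranking statistic. It omits the permutation-based significance testing and the normalization of full GSEA. It also does not reproduce GSVA, which computes sample-wise scores by a different, kernel-based procedure.

```python
from typing import Sequence, Set


def enrichment_score(ranked_genes: Sequence[str],
                     correlations: Sequence[float],
                     gene_set: Set[str],
                     weight: float = 1.0) -> float:
    """GSEA-style running-sum enrichment score for one gene set.

    ranked_genes must already be sorted by `correlations` in decreasing order
    (e.g. a high-risk vs low-risk ranking statistic). Hits step the running sum
    up in proportion to |correlation|**weight; misses step it down uniformly.
    The score is the running-sum value farthest from zero.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_miss = len(hits) - sum(hits)
    hit_norm = sum(abs(c) ** weight for c, h in zip(correlations, hits) if h)
    if hit_norm == 0 or n_miss == 0:
        raise ValueError("gene set must hit some, but not all, ranked genes")
    running = 0.0
    best = 0.0
    for c, h in zip(correlations, hits):
        running += abs(c) ** weight / hit_norm if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy example: the set sits at the top of the ranking, so the score is positive.
print(enrichment_score(["a", "b", "c", "d"], [4.0, 3.0, 2.0, 1.0], {"a", "b"}))
```

A positive score means the set is concentrated at the top of the ranking, and a negative score means it is concentrated at the bottom.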
Prognostic model validation with external datasets
In the GEO external validation set, OS was significantly better in the low-risk group than in the high-risk group (Figure 8A). ROC analysis in the external dataset further showed that the model had significant predictive value for patient prognosis (Figure 8B). To evaluate the added prognostic value of our signature, we also performed a head-to-head comparison with the original TIDE score in the TCGA cohort. Time-dependent ROC analysis showed that the 24-gene risk score consistently achieved higher area under the curve (AUC) values than the TIDE score for predicting 1-, 2-, and 3-year OS (Figure S3), indicating better discriminative ability in this cohort.
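The head-to-head comparison with TIDE rests on time-dependent AUC. The sketch below uses the cumulative/dynamic definition:

- At a horizon t, cases are patients with an event by t.
- Controls are patients still event-free after t.
- The AUC is the fraction of case-control pairs that the score orders correctly.

This simplified version drops patients censored before t. It is not necessarily the estimator used in the study, which is not specified here. Inverse-probability-of-censoring-weighted (IPCW) estimators, such as the one in the R package timeROC, are the usual choice. The function name is hypothetical.

```python
from typing import Sequence


def naive_time_dependent_auc(scores: Sequence[float],
                             times: Sequence[float],
                             events: Sequence[int],
                             horizon: float) -> float:
    """Cumulative/dynamic AUC at `horizon`, without censoring weights.

    Cases: event (events == 1) at or before horizon.
    Controls: follow-up time beyond horizon.
    Patients censored at or before horizon are simply excluded.
    Ties in score count as half-concordant.
    """
    cases = [s for s, t, e in zip(scores, times, events) if t <= horizon and e == 1]
    controls = [s for s, t in zip(scores, times) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Toy data (times in years): higher scores belong to earlier deaths.
print(naive_time_dependent_auc([0.9, 0.8, 0.3, 0.1], [1.0, 2.0, 5.0, 6.0],
                               [1, 1, 0, 1], horizon=3.0))
```

Comparing two markers then amounts to evaluating each one at the same horizons, such as 1, 2, and 3 years, with times in consistent units. Here the two markers would be the 24-gene risk score and the TIDE score.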
Risk score and independent prognosis analysis
According to the logistic regression analysis, the risk score contributed significantly to the nomogram prediction model (Figure 9A). We also performed 3- and 5-year prognostic predictions for patients with NSCLC (Figure 9B), and the predicted results were consistent. Univariate and multivariate analyses further showed that the risk score was an independent prognostic factor for patients with NSCLC (Figure 10A,10B). In addition, correlation analyses revealed statistically significant, though modest, associations between the risk score and the mutation status of key drivers (EGFR and KRAS) (Figure S4).
Correlation analysis of clinical indicators and risk score
The box plots in Figure 11 show the distribution of risk scores across clinical subgroups. Rank-sum tests showed that risk scores differed significantly (P<0.05) by gender, stage, survival status (fustat), and T stage. These results suggest that the risk score is associated with clinicopathological features and may help classify patients with NSCLC. Using the miRcode database, we also predicted miRNAs that may target the 24 signature genes. We then visualized the resulting network of 84 miRNAs and 654 mRNA-miRNA pairs in Cytoscape (Figure 12). This hypothesis-generating analysis identifies candidate miRNAs that may fine-tune the expression of key risk genes, pointing to additional regulatory layers that could influence the prognostic phenotype.
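The group comparisons in Figure 11 use rank-sum tests. For two groups (for example, gender), the Wilcoxon rank-sum test is equivalent to the Mann-Whitney U test. The sketch below implements its normal approximation with a tie correction. For variables with more than two categories, such as T stage, the Kruskal-Wallis test is the usual rank-based extension; which variant the study applied is not stated. In practice, library routines such as `scipy.stats.mannwhitneyu` or R's `wilcox.test` would be used. The function names here are hypothetical.

```python
from math import erfc, sqrt
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Uses the normal approximation with a tie-corrected variance and no
    continuity correction. Returns (U for x, approximate p-value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    pooled = list(x) + list(y)
    n = n1 + n2
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    counts: Dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u1, 1.0  # every value tied: no evidence of a shift
    z = (u1 - n1 * n2 / 2) / sqrt(var_u)
    return u1, erfc(abs(z) / sqrt(2))  # two-sided normal tail


# Toy example: hypothetical risk scores in two subgroups.
print(rank_sum_test([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

With samples this small, an exact test would normally be preferred over the normal approximation.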
Discussion
Immune-related genes are key factors in modulating the tumor microenvironment and treatment response, and the immune system plays a central role in cancer development and progression (11,12). In this study, we developed and validated a TIDE-informed prognostic model for NSCLC built on immune-related DEGs. The model stratifies patients into distinct risk groups. The high-risk group shows elevated TIDE scores, poorer prognosis, and increased infiltration of immune-suppressive cells. The low-risk group is associated with lower TIDE scores, better prognosis, and enhanced immune activity. The model showed good performance in predicting OS in both internal and external validation, supporting its robustness.
Unlike previous studies that focus on TIDE as a predictor of immunotherapy response, we used TIDE-stratified cohorts to derive a concise 24-gene prognostic signature that applies beyond the immunotherapy context. This approach translates TIDE’s immune-dysfunction logic into a generalizable prognostic tool, offering a novel integration of immune microenvironment insights with conventional survival prediction. To evaluate whether our model adds value beyond known immune biology, we compared it directly with the original TIDE score. In the same patient cohort, our 24-gene risk score achieved higher AUCs than the TIDE score alone for predicting 1-, 2-, and 3-year survival. This suggests that the signature does not merely recapitulate the TIDE signal. By focusing on a smaller set of genes related to immune dysfunction, it provides better survival prediction. Comparisons with other immune scores are still needed, but this finding supports the distinct prognostic value of our model.
All 24 DEGs in our prognostic model carry prognostic value; notable examples include SLC7A5, PLAU, ANLN, MMP12, SCGB3A1, AHNAK2, and GJB3.

- **SLC7A5** is an amino acid transporter that supplies amino acids to cancer cells and maintains intracellular leucine, a master regulator of the mTORC1 signaling pathway (13). It is overexpressed in a variety of cancers and has been identified as an indicator of poor prognosis (14,15).
- **PLAU** has been found to promote cell proliferation and EMT during gastric cancer progression (14). It has also shown prognostic value in head and neck squamous cell carcinoma (16,17).
- **ANLN** overexpression is associated with progression and metastasis in lung adenocarcinoma (18,19).
- **MMP12** is up-regulated in more than 10 cancer types. However, in contrast to our findings, survival analyses have shown that its prognostic value is limited to clear-cell renal cell carcinoma (20).
- **SCGB3A1** is an independent prognostic biomarker in lung adenocarcinoma and has been associated with tumor immune cell infiltration and with acquired EGFR-tyrosine kinase inhibitor (TKI) resistance (21).
- **AHNAK2** is significantly overexpressed in lung adenocarcinoma tissue and serves as an independent prognostic marker for patients with lung adenocarcinoma (22).
- **GJB3**: patients with lung adenocarcinoma and high GJB3 expression tend to have poorer prognosis (23).

Collectively, the 24-gene signature reflects an integrated biological state encompassing immune suppression, enhanced invasiveness, and pro-tumor remodeling of the microenvironment. The coordinated activity of these genes is associated with a high-risk phenotype characterized by immune evasion and aggressive tumor behavior, in line with the pathway enrichments identified by GSVA and GSEA. This integrative perspective moves beyond individual gene annotations and provides a plausible biological rationale for the prognostic utility of the signature.
We also investigated the molecular mechanisms underlying our prognostic model. Signaling pathways that play important roles in tumor initiation and progression (e.g., EMT, hypoxia, and TNFα signaling via NF-κB) differed between the high- and low-risk groups. EMT is closely linked to tumor initiation, progression, and metastasis (24), and hypoxia is associated with treatment resistance and poor survival (25). One study showed that hypoxia suppressed miR-27a expression and promoted lung cancer cell proliferation, migration, invasion, and EMT (26). TNFα signaling via NF-κB plays a critical role in inflammation, immune regulation, and apoptosis.
Our study also identified several key signaling pathways implicated in immune responses and tumor progression in NSCLC, including the IL-17, p53, and TNF pathways. These pathways correlated with specific immune-related DEGs, such as SLC7A5, PLAU, ANLN, and MMP12, suggesting a potential mechanistic link between these genes and the tumor immune microenvironment.

- **IL-17:** The IL-17 signaling pathway is associated with inflammatory and antitumor immune responses. It has been shown to promote inflammation in the tumor microenvironment, thereby affecting the initiation and progression of lung cancer (27).
- **p53:** Abnormalities in the p53 signaling pathway are commonly associated with tumor development and poor prognosis (28). TP53 mutation is one of the most common genetic alterations in lung cancer. Disruption of this pathway dysregulates cell-cycle control and promotes the proliferation and progression of lung cancer cells.
- **TNF:** The TNF signaling pathway influences tumor initiation and development by regulating inflammatory responses, promoting cell proliferation, and inducing angiogenesis (29). Activation of TNF signaling has been correlated with enhanced invasiveness and malignancy of lung cancer cells. The pathway also affects the efficacy of immunotherapy by regulating immune cells in the tumor microenvironment.
Tumor prognosis and treatment sensitivity are strongly affected by the tumor microenvironment. We therefore examined the relationship between the risk score and tumor immune infiltration to explore how the risk score relates to NSCLC progression. The risk score was significantly correlated with tumor immune infiltration: the levels of resting NK cells, activated CD4 memory T cells, and M0 macrophages were significantly lower in the low-risk group than in the high-risk group. These findings suggest that the immune microenvironment in high-risk patients may be more conducive to tumor progression and evasion of immune surveillance.
Although drug sensitivity analysis suggested potential differential responses to certain agents (roscovitine, salubrinal, MS.275, and PF.4708671) between risk groups, these findings are preliminary and require validation in preclinical or clinical settings. Future studies should prioritize experimental validation of these predictions and explore whether the identified gene signature aligns with known mechanisms of drug response or resistance. The miRNA-mRNA network served as an exploratory extension of our signature, highlighting potential upstream regulators that could modulate the expression of risk-associated genes. While not integral to the prognostic model, this analysis suggests regulatory avenues for future experimental validation.
This study has several limitations.

- **Data sources.** Our reliance on TCGA data and a relatively small external validation set may limit the generalizability of our findings. The heterogeneity of GSE50081 may better reflect clinical reality, but its sample size and platform differences (microarray vs. RNA-seq) pose challenges for transportability. Larger and more diverse cohorts would strengthen generalizability.
- **Drug sensitivity.** Our analysis predicts differential drug sensitivity, but these predictions have not been confirmed by in vitro or in vivo experiments. Future research should validate them experimentally and explore the biological mechanisms underlying the observed sensitivity patterns.
- **Clinical validation.** Prospective clinical trials are needed to relate our predicted treatment responses to actual clinical outcomes, providing a stronger foundation for the clinical application of our model.
Conclusions
In summary, TIDE-related transcriptional signals are consistently associated with prognosis in NSCLC, and a TIDE-informed 24-gene signature reproduces prognostic stratification across cohorts with coherent immune and pathway correlations. These results are associational and hypothesis-generating, yet may be applicable to a broader NSCLC population beyond those treated with immunotherapy, thereby extending the practical utility of TIDE and supporting future personalized stratification efforts. Further analytical validation, transportability testing, and mechanistic studies are needed before clinical adoption.
Supplementary
The article’s supplementary files as
Source: PubMed Central (JATS). Licensing follows the original publisher's policy; please cite the original article when quoting.
🏷️ Same keywords · Free full text (related articles based on this paper's MeSH terms/keywords)
- DIP-like Adenocarcinoma Presenting as a Part-Solid Nodule: A Case Report.
- Dynamic fluorine-18 fluorodeoxyglucose PET for evaluating different-sized metastatic lymph nodes in patients with non-small cell lung cancers.
- Lactotransferrin upregulation affects the pathological changes of non-small cell lung cancer by regulating ferroptosis.
- Adaptive therapy for perioperative non-small cell lung cancer: strategies guided by dynamic minimal residual disease adjustment.
- Predicting Stereotactic Body Radiation Therapy Response Using an AI-Based Tumor Vessel Biomarker.
- Artificial Intelligence Approaches for Predictive Biomarker Discovery in Non-Small Cell Lung Cancer.