ITPG: an immune-related transcriptomic predictive model for gastric cancer prognosis.
APA
Li M, Sun Y, et al. (2026). ITPG: an immune-related transcriptomic predictive model for gastric cancer prognosis. Translational Cancer Research, 15(2), 125. https://doi.org/10.21037/tcr-2025-aw-2368
MLA
Li M, et al. "ITPG: an immune-related transcriptomic predictive model for gastric cancer prognosis." Translational Cancer Research, vol. 15, no. 2, 2026, p. 125.
PMID
41815176
Abstract
[BACKGROUND] Although the global incidence of gastric cancer (GC) has declined over the past 5 years, it remains the fourth leading cause of cancer-related mortality worldwide. Given the molecular heterogeneity of GC, survival outcomes can vary significantly among patients receiving the same treatment at the same stage. Therefore, this study aimed to develop and validate a robust prognostic model for GC that complements the current staging system, to ultimately facilitate better clinical decision-making.
[METHODS] Utilizing gene expression data from four independent cohorts comprising 1,305 GC patients, we developed and validated the immune-related transcriptomic predictive model for gastric cancer prognosis (ITPG), which incorporates transcriptomic biomarkers and explores gene-gene interactions. Specifically, the ITPG model integrates two genes with main effects (, ) and two pairs of genes with gene-gene interactions (×, ×), in addition to clinical variables including age and pathological stage. Prognostic biomarkers were identified in The Cancer Genome Atlas (TCGA) cohort. The model's risk stratification ability, predictive performance, and clinical utility were subsequently evaluated in three external cohorts: GSE66229, GSE15459, and GSE84437.
[RESULTS] The ITPG demonstrated strong risk stratification potential in identifying high-risk patients. Compared with patients whose ITPG scores fell in the lowest 25%, patients with scores above the 90th percentile had significantly shorter overall survival [hazard ratio (HR) =9.79, 95% confidence interval (CI): 7.25-13.21, P=2.78×10]. Furthermore, ITPG exhibited robust predictive performance across the four cohorts, with pooled area under the curve (AUC) values of 0.769 (95% CI: 0.735-0.803), 0.762 (95% CI: 0.723-0.802), and 0.765 (95% CI: 0.704-0.826) for 1-, 3-, and 5-year survival, respectively, and a C-index of 0.704 (95% CI: 0.678-0.729). Additionally, the model displayed substantial clinical utility in identifying GC patients at high risk of mortality [net benefit (NB) at 1, 3, and 5 years =1.8%, 15.8%, and 23.7%; net reduction (NR) at 1, 3, and 5 years =58.6%, 20.4%, and 11.7%]. Subgroup analyses confirmed the model's robustness across different population stratifications.
[CONCLUSIONS] The ITPG model is an efficient and clinically relevant tool for prognostic prediction in GC.
Introduction
Gastric cancer (GC) ranks as the fifth most prevalent and fourth most lethal malignancy globally, with disproportionately high incidence and mortality rates observed in East Asia and Eastern Europe (1,2). Despite advancements in early detection, surgical interventions, and chemotherapeutic regimens, the prognosis for GC patients remains poor (3). The overall five-year survival rate for GC is approximately 20% to 30% in most regions worldwide, with exceptions in countries such as Japan and Korea (4). Prognosis is highly dependent on tumor stage at diagnosis, with lymph node involvement, tumor grade, and depth of invasion serving as critical determinants of survival outcomes (5).
Recent advances in high-throughput technologies have facilitated the identification of molecular biomarkers for GC, including genomic (6), epigenomic (7), transcriptomic (8), and proteomic alterations (9). Such biomarkers could help improve survival and quality of life for GC patients.
Several studies have underscored the critical role of immune cells within the tumor microenvironment (TME), which profoundly influences tumor initiation, progression, and prognosis (10,11). GC, as an immunosuppressive tumor, has demonstrated significant associations with TME dynamics and immune-related genes (IRGs) (12). For instance, co-expression of programmed cell death protein 1 (PD-1) and T cell immunoglobulin and mucin-domain containing-3 (TIM-3) has been linked to highly dysfunctional T cells. These cells are prevalent among tumor-infiltrating lymphocytes (TILs) in advanced-stage gastric tumors, suggesting that these checkpoints contribute to tumor immune evasion by impairing T cell function (13,14). Moreover, the activation and function of immune cells are regulated by changes in IRG expression, further influencing GC progression (15). Additionally, regulatory elements such as microRNAs (miRNAs) and DNA methylation play pivotal roles in modulating gene expression, contributing to the oncogenesis and progression of GC (16,17).
Given these insights, IRGs have emerged as promising biomarkers for predicting tumor prognosis. Prognostic models that integrate IRG-associated alterations with overall survival outcomes have been widely explored across various cancers (18-20). However, many GC prognostic models face limitations, such as small sample sizes and a lack of external validation, which undermine their generalizability (21,22). Furthermore, most models built on prognostic biomarkers emphasize the main effects of predictors while neglecting gene-gene (G×G) interactions (23,24). Yet disease progression is regulated by intricate biological networks, in which G×G interactions may reveal deeper biological mechanisms and pathophysiological processes (25). Recent studies have also demonstrated that incorporating both main effects and G×G interactions can significantly enhance the accuracy of prognostic models for complex diseases (26,27). Nevertheless, most current GC prognostic signatures remain largely additive and seldom assess the robustness of interaction effects across independent cohorts (28,29). Collectively, these gaps underscore the need for an interaction-informed prognostic framework that is both integrative and rigorously validated.
In this study, we aimed to address these limitations by developing and validating an immune-related transcriptomic predictive model for gastric cancer prognosis (ITPG). Distinct from previous additive models, the ITPG uniquely integrates both the main effects of transcriptomic biomarkers and their G×G interactions. Utilizing data from four independent cohorts obtained from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO), we constructed the model and, more importantly, performed extensive validation to ensure its robustness and generalizability. Furthermore, we explored the relationships between the transcriptomic score and TME to provide deeper insights into the immune landscape of GC. We present this article in accordance with the TRIPOD reporting checklist (available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-aw-2368/rc).
Methods
GC patient cohorts and data preparation
Gene expression data from 410 samples, methylation data from 393 samples, and miRNA expression data from 444 samples were obtained from the discovery cohort (TCGA-STAD, RRID:SCR_003193) using the R package “TCGAbiolinks”. For the TCGA dataset, RNA-sequencing data (Illumina RNA-seq) were normalized to transcripts per million (TPM) values. Log2 transformation and z-score normalization were then applied before prognostic modeling. Similarly, miRNA sequencing data (Illumina miRNA-seq) were converted into reads per million mapped reads (RPM). Methylation data (Infinium HumanMethylation450K) were filtered and normalized using the “ChAMP” R package.
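The normalization chain described above (TPM, then log2, then per-gene z-scores) can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes a pseudocount of 1 before the log2 transform, which the text does not state. It also assumes that z-scores are computed per gene across samples using the sample standard deviation, as R's `scale()` does. The function names and the gene label `GENE_X` are hypothetical.

```python
import math
from statistics import mean, stdev


def counts_to_tpm(counts, lengths_bp):
    """Convert one sample's raw read counts to TPM.

    counts and lengths_bp are parallel lists (one entry per gene);
    lengths are gene lengths in base pairs.
    """
    # Reads per kilobase: correct for gene length first.
    rpk = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    # Then rescale so the sample's values sum to one million.
    per_million = sum(rpk) / 1e6
    return [r / per_million for r in rpk]


def log2_zscore(expr):
    """expr maps gene -> list of TPM values across samples.

    Applies log2(x + 1) (pseudocount is an assumption) and standardizes
    each gene across samples to mean 0 and sample SD 1.
    """
    out = {}
    for gene, values in expr.items():
        logged = [math.log2(v + 1.0) for v in values]
        mu, sd = mean(logged), stdev(logged)
        # A constant gene carries no information; map it to 0.
        out[gene] = [(x - mu) / sd if sd > 0 else 0.0 for x in logged]
    return out


tpm = counts_to_tpm([10, 20], [1000, 2000])
print(sum(tpm))  # approximately 1e6 by construction
print(log2_zscore({"GENE_X": [0.0, 1.0, 3.0]}))  # {'GENE_X': [-1.0, 0.0, 1.0]}
```

Length normalization comes before library-size scaling. That ordering is what distinguishes TPM from RPKM/FPKM, and it makes every sample's TPM values sum to the same total.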
Three independent cohorts with clinical annotations and gene expression data were retrieved from the Gene Expression Omnibus (GEO, RRID:SCR_005012) via the “GEOquery” R package and utilized for external validation. These cohorts include GSE66229 (30) (Asian Cancer Research Group, ACRG, n=300), GSE15459 (31) (Gastric Cancer Project of Singapore Patient Cohort, SPC, n=191) and GSE84437 (32) (Yonsei Gastric Cancer Cohort, YGC, n=431). For all these gene expression datasets, log2 transformation and z-score normalization were performed as described above. To ensure comparability of gene expression data across different cohorts, we performed batch effect correction using the ComBat algorithm from the “sva” R package. The effectiveness of batch correction was validated through principal component analysis (PCA), with results presented in Figure S1.
Patients diagnosed with GC and with available transcriptomics data were retained across all cohorts. A total of 1,305 GC patients were included in the study, and their demographic and clinical characteristics are summarized in Table S1. Figure 1 illustrates the study design and workflow. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.
Transcriptomic feature selection in TCGA cohort
Differential expression analyses of mRNA and miRNA between early-stage (Stage I) and advanced-stage (Stage III and Stage IV) GC were performed on read count matrices using the “DESeq2” R package. Genes with an absolute log2 fold change greater than 0.585 (approximately a 1.5-fold change) and a false discovery rate (FDR) below 0.05 were classified as differentially expressed genes (DEGs). Additionally, differential methylation analysis of probes between the two groups was conducted with the ChAMP package. Probes with an absolute delta beta value greater than 0.05 and an FDR below 0.05 were designated as differentially methylated probes (DMPs).
Subsequently, the differentially expressed miRNAs and differentially methylated CpG sites were mapped to their corresponding genes using miRTarBase (33) (RRID:SCR_017355) and ChAMP (34) (RRID:SCR_012891), respectively, linking these regulatory elements to gene expression. Finally, genes associated with immune function were extracted for further analysis based on the ImmPort (35) (RRID:SCR_012804) and InnateDB (36) (RRID:SCR_006714) databases.
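The thresholding, mapping, and immune filtering steps in this subsection can be sketched with plain set operations. Several points are assumptions rather than statements from the text:

- The same |log2FC| > 0.585 cutoff is applied to miRNAs.
- The miRNA-target and probe-to-gene lookups are simple dictionaries standing in for miRTarBase and the ChAMP annotation.
- The immune filter is applied to the union of the three gene sets. This is our reading of the Results, which report 127 genes shared by all three layers and 12,323 genes in total.

All names and inputs are hypothetical.

```python
def select_candidate_genes(deg_stats, mirna_stats, dmp_stats,
                           mirna_targets, probe_to_gene, immune_genes,
                           lfc_cut=0.585, db_cut=0.05, fdr_cut=0.05):
    """Return (genes shared by all three layers, union, immune subset of union).

    deg_stats / mirna_stats: name -> (log2 fold change, FDR)
    dmp_stats: CpG probe -> (delta beta, FDR)
    mirna_targets: miRNA -> iterable of target genes (miRTarBase-style lookup)
    probe_to_gene: CpG probe -> gene symbol (array-annotation-style lookup)
    immune_genes: immune-related gene symbols (ImmPort/InnateDB-style list)
    """
    # |log2FC| > 0.585 is roughly a 1.5-fold change in either direction.
    deg_genes = {g for g, (lfc, q) in deg_stats.items()
                 if abs(lfc) > lfc_cut and q < fdr_cut}
    de_mirnas = {m for m, (lfc, q) in mirna_stats.items()
                 if abs(lfc) > lfc_cut and q < fdr_cut}
    # Regulatory layers are expressed as the genes they point to.
    mirna_genes = {g for m in de_mirnas for g in mirna_targets.get(m, ())}
    dmp_genes = {probe_to_gene[p] for p, (db, q) in dmp_stats.items()
                 if abs(db) > db_cut and q < fdr_cut and p in probe_to_gene}

    shared = deg_genes & mirna_genes & dmp_genes
    union = deg_genes | mirna_genes | dmp_genes
    return shared, union, union & set(immune_genes)


shared, union, immune = select_candidate_genes(
    deg_stats={"G1": (1.2, 0.01), "G2": (0.3, 0.01), "G3": (-0.9, 0.02)},
    mirna_stats={"miR-x": (-1.0, 0.001)},
    dmp_stats={"cg01": (0.08, 0.01), "cg02": (0.02, 0.01)},
    mirna_targets={"miR-x": ["G1", "G4"]},
    probe_to_gene={"cg01": "G1", "cg02": "G3"},
    immune_genes={"G1", "G3", "G9"},
)
print(sorted(shared), sorted(union), sorted(immune))
# ['G1'] ['G1', 'G3', 'G4'] ['G1', 'G3']
```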
Development and validation of the model
To develop and validate the prognostic model, we employed a 3-D strategy previously proposed in the literature (27), consisting of two types of effects, two stages of screening, and two phases of modeling.
First, we aimed to integrate transcriptomic predictors encompassing both main effects and G×G interactions:

(I) For main effects, we used Cox proportional hazards (Cox-PH) models adjusted for covariates such as age, gender, and stage.

(II) For G×G interaction effects, we added statistical interaction terms to Cox-PH models with the same covariate adjustments. Multiplicative interaction terms (e.g., A×B) were included to evaluate interaction effects between gene pairs, and their statistical significance was assessed with likelihood ratio tests.

Pathological stage was classified according to the American Joint Committee on Cancer (AJCC) staging system into four categories: Stage I, Stage II, Stage III, and Stage IV.
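A G×G term and its likelihood ratio test can be sketched as follows. The interaction column is the elementwise product of two genes' standardized expression. The test then compares the partial log-likelihoods of two Cox fits, one with and one without that column. The Cox fitting itself is not shown; the log-likelihoods are assumed to come from any Cox implementation, such as R's `survival::coxph`. We also assume a hierarchical specification in which both models contain the main effects of A and B plus the covariates, so the two fits differ by exactly one parameter. The function names are hypothetical.

```python
import math


def interaction_term(expr_a, expr_b):
    """Multiplicative G×G term: elementwise product of two standardized genes."""
    return [a * b for a, b in zip(expr_a, expr_b)]


def lrt_pvalue_1df(loglik_reduced, loglik_full):
    """Likelihood ratio test for one added parameter (the interaction coefficient).

    Under H0, LR = 2 * (l_full - l_reduced) follows a chi-square
    distribution with 1 df, whose upper tail is erfc(sqrt(LR / 2)).
    """
    lr = 2.0 * (loglik_full - loglik_reduced)
    if lr <= 0:
        # The larger model did not improve the fit at all.
        return 1.0
    return math.erfc(math.sqrt(lr / 2.0))


print(interaction_term([1.0, -0.5, 2.0], [0.5, 2.0, -1.0]))  # [0.5, -1.0, -2.0]
# LR = 3.841459 is the 95th percentile of chi-square(1), so P is about 0.05.
print(round(lrt_pvalue_1df(-500.0, -500.0 + 3.841459 / 2), 3))  # 0.05
```

The closed form with `math.erfc` only holds for a single added term. Testing several terms jointly would need the general chi-square upper tail.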
Next, we examined the selected immune genes to identify potential candidate genes and their interactions, which were validated in an independent dataset. In the TCGA cohort, models were fitted for each gene and interaction separately, with significant features selected while controlling the FDR at 5% (FDR <0.05). These genes and interactions were subsequently validated in the ACRG cohort, with only those exhibiting P<0.05 and consistent effect directions with the discovery phase considered as candidate biomarkers for the next modeling stage.
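The two-stage screen can be sketched in two steps:

1. Adjust the discovery-cohort P values with the Benjamini-Hochberg procedure and keep features with FDR < 0.05.
2. Keep a feature only if it reaches P < 0.05 in the validation cohort with a coefficient of the same sign.

The text only says "controlling the FDR at 5%", so using Benjamini-Hochberg is our assumption. Feature names and numbers below are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (q-values) in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def two_stage_screen(discovery, validation, fdr_cut=0.05, p_cut=0.05):
    """discovery / validation: feature -> (Cox coefficient, p-value).

    Keep features with BH FDR < fdr_cut in discovery that also reach
    P < p_cut with the same coefficient sign in validation.
    """
    names = list(discovery)
    qvals = benjamini_hochberg([discovery[n][1] for n in names])
    kept = []
    for name, q in zip(names, qvals):
        if q >= fdr_cut or name not in validation:
            continue
        beta_d = discovery[name][0]
        beta_v, p_v = validation[name]
        if p_v < p_cut and beta_d * beta_v > 0:
            kept.append(name)
    return kept


discovery = {"GENE_A": (0.8, 0.001), "GENE_B": (-0.5, 0.002),
             "GENE_A*GENE_C": (0.4, 0.004), "GENE_D": (0.3, 0.30)}
validation = {"GENE_A": (0.6, 0.01), "GENE_B": (0.2, 0.03),
              "GENE_A*GENE_C": (0.5, 0.20), "GENE_D": (0.3, 0.001)}
print(two_stage_screen(discovery, validation))  # ['GENE_A']
```

In this toy example, GENE_B fails because its effect flips sign in validation. The interaction fails on validation P, and GENE_D never passes the discovery FDR.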
Using the candidate genes and interactions identified, we performed stepwise regression on the TCGA cohort with Cox models adjusted for age and AJCC stage to derive a final multivariable Cox model and construct ITPG, using the R package “MASS”. To assess model stability, we performed internal validation with 1,000 bootstrap resamples of the TCGA cohort and summarized performance as the mean C-index across iterations. The prognostic transcriptomic model derived from the TCGA cohort was then validated in three GEO cohorts.

The risk stratification ability of ITPG was assessed using Kaplan-Meier survival analysis and the log-rank test via the “survival” R package. The cutoff for risk stratification was set at the median ITPG score in the TCGA training cohort and applied unchanged to all validation cohorts. The predictive performance of ITPG was further evaluated using the concordance index (C-index), time-dependent receiver operating characteristic (ROC) curves, and the corresponding area under the curve (AUC), using the “survival” and “timeROC” R packages. Additionally, a meta-analysis was conducted with the “meta” R package to summarize ITPG’s prediction accuracy across all four cohorts.

The clinical utility of the model was calculated using the “ggDCA” and “dcurves” R packages. It was expressed as the net benefit (NB) of correctly identifying high-risk patients and the net reduction (NR) in unnecessary interventions.
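The evaluation quantities named above can be sketched in a few lines: a concordance index, the training-median split, and decision-curve net benefit and net reduction. This is a simplified illustration under stated assumptions. The text does not name the C-index variant, so Harrell's C is assumed. The NB/NR functions treat death by a fixed horizon as a binary outcome and ignore censoring, unlike the time-to-event versions in `dcurves`/`ggDCA`. Scores exactly at the median are labeled low, which is our choice. Function names are hypothetical.

```python
from statistics import median


def harrell_c_index(times, events, risks):
    """Share of usable pairs in which the patient who died first had the higher score.

    A pair (i, j) is usable when i has an observed event and j is still
    under follow-up beyond i's event time; tied scores count as 0.5.
    """
    concordant, usable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue
        for j in range(len(times)):
            if times[j] > times[i]:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


def median_split(train_scores, scores):
    """Label scores high/low using the median of the training-cohort scores."""
    cutoff = median(train_scores)
    return ["high" if s > cutoff else "low" for s in scores]


def net_benefit(outcomes, risks, threshold):
    """Decision-curve net benefit at a threshold probability (binary outcome)."""
    n = len(outcomes)
    odds = threshold / (1.0 - threshold)
    tp = sum(1 for y, r in zip(outcomes, risks) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(outcomes, risks) if r >= threshold and y == 0)
    return tp / n - fp / n * odds


def net_reduction(outcomes, risks, threshold):
    """Net reduction in interventions per patient, relative to treating everyone."""
    odds = threshold / (1.0 - threshold)
    prevalence = sum(outcomes) / len(outcomes)
    nb_treat_all = prevalence - (1.0 - prevalence) * odds
    return (net_benefit(outcomes, risks, threshold) - nb_treat_all) / odds


print(harrell_c_index([1, 2, 3, 4], [1, 1, 1, 1], [4, 3, 2, 1]))  # 1.0
print(median_split([1, 2, 3, 4, 5], [2.9, 3.1]))  # ['low', 'high']
y, r = [1, 1, 0, 0, 0], [0.9, 0.6, 0.4, 0.2, 0.1]
print(round(net_benefit(y, r, 0.5), 3), round(net_reduction(y, r, 0.5), 3))  # 0.4 0.6
```

Fixing the cutoff from the training cohort, rather than re-deriving it in each cohort, keeps the validation cohorts from influencing where the high/low boundary falls.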
Finally, to improve the accessibility and application of the model, we developed a free online tool that predicts survival rates and 95% confidence intervals (CIs) for GC patients over time (0 to 72 months), based on an interactive web-based Kaplan-Meier survival curve (https://yilab5-njmu.shinyapps.io/itpg/).
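Under a Cox model, a patient's predicted survival can be obtained from the linear predictor: S(t | x) = S0(t)^exp(lp). Here lp sums the main-effect terms, the G×G product terms, and the clinical covariates. The sketch below shows that calculation. It is an assumption about how a tool like this could work; we do not know the web tool's internals. The gene names of the published model are not recoverable from this text, so the coefficients and the names `GENE_A`, `GENE_B`, `GENE_C`, and `age_decades` are hypothetical placeholders. AJCC stage would typically enter as indicator terms in the same way.

```python
import math


def linear_predictor(features, coefficients):
    """Sum of beta * value over model terms.

    A term is either a feature name (main effect) or a pair of feature
    names (multiplicative G×G interaction).
    """
    lp = 0.0
    for term, beta in coefficients.items():
        if isinstance(term, tuple):
            a, b = term
            lp += beta * features[a] * features[b]
        else:
            lp += beta * features[term]
    return lp


def cox_survival(baseline_survival_t, lp, lp_reference=0.0):
    """S(t | x) = S0(t) ** exp(lp - lp_reference) under proportional hazards."""
    return baseline_survival_t ** math.exp(lp - lp_reference)


# Hypothetical coefficients and placeholder gene names, for illustration only.
coefs = {"GENE_A": 0.5, ("GENE_B", "GENE_C"): 1.0, "age_decades": 0.2}
patient = {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": -1.0, "age_decades": 0.0}
print(linear_predictor(patient, coefs))  # 0.0
print(round(cox_survival(0.8, math.log(2.0)), 4))  # 0.64
```

Doubling the hazard (lp = ln 2) squares the baseline survival probability. In this example that turns 80% into 64% at the same time point.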
EMT biological pathway
Because ITPG was constructed in a data-driven manner, we also considered established immune-oncology pathways relevant to GC, such as the epithelial-mesenchymal transition (EMT) pathway and the ZEB1-PD-L1 axis. These pathways play critical roles in GC progression and metastasis and can therefore influence patient prognosis (37,38). To explore the relationship between our model and these canonical pathways, we performed two analyses. First, we obtained EMT-related genes from the dbEMT database (39) and examined the overlap between this gene set and the candidate predictive factors. Second, to formally test whether including known key markers would improve model performance, we constructed an extended model, denoted ITPG_plus, by adding the expression levels of ZEB1 and PD-L1 (CD274) as predictive factors. We then evaluated the predictive performance of ITPG_plus across cohorts and compared it with that of the original ITPG model.
Bioinformatics analysis
To explore the potential biological functions of the identified transcriptomic predictors, we performed pathway enrichment analysis with the “clusterProfiler” R package, using the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG, RRID:SCR_012773) databases. The Estimation of STromal and Immune cells in MAlignant Tumor tissues using Expression data (ESTIMATE) algorithm was used to characterize the tumor immune microenvironment (TIME) (40) from gene expression data. CIBERSORT, a deconvolution algorithm based on linear support vector regression, was applied to estimate the proportions of 22 immune cell types within tumor samples (41). Additionally, we evaluated the differential expression of immune checkpoint genes (ICGs) across subgroups defined by the transcriptomic score. We also examined the correlation of these genes with the transcriptomic score using Pearson correlation analysis. Finally, we explored immune-related drugs targeting the transcriptomic predictors using the DrugBank (RRID:SCR_002700) database (42).
Statistical analysis
All statistical analyses were conducted using R version 4.3.0. Continuous variables were described as means ± standard deviation, and categorical variables were summarized as frequencies (n) and proportions (%). The Wilcoxon test was applied to compare two groups, and Chi-square tests were used to assess associations between categorical variables. Missing covariates in the TCGA cohort were imputed by multiple imputation with the “mice” R package. Associations between patient features and overall survival were analyzed using Cox-PH models with the “survival” R package. The Kaplan-Meier method was used to estimate survival probabilities for each group, and log-rank tests were performed to compare survival distributions between two groups. Restricted mean survival times were calculated with the “survRM2” R package. The nomogram and calibration curves for the model were constructed with the “rms” R package. All P values were two-sided unless otherwise specified.
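The Kaplan-Meier estimator and the restricted mean survival time (RMST) are closely linked: the RMST up to a horizon τ is the area under the Kaplan-Meier step curve from 0 to τ. This is the quantity that packages such as `survRM2` report per group, although the sketch below is not that package's code. It is a minimal stdlib version with hypothetical function names. At tied times, deaths are processed before censorings are removed from the risk set, which is the usual convention.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] steps at each distinct event time.

    events[i] is 1 for an observed death and 0 for a censored follow-up.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, steps, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all records sharing this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            steps.append((t, surv))
        n_at_risk -= removed
    return steps


def rmst(times, events, tau):
    """Area under the Kaplan-Meier curve from 0 to tau."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    # The last step extends flat up to the horizon.
    area += prev_s * (tau - prev_t)
    return area


print(kaplan_meier([1, 2, 3], [1, 0, 1]))  # survival drops to 2/3 at t=1, then 0.0 at t=3
print(round(rmst([1, 2, 3, 4], [1, 1, 1, 1], 4), 6))  # 2.5
```

Without censoring, the RMST up to the last event time equals the mean survival time. In the example above, that is the mean of 1, 2, 3, and 4.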
GC patient cohorts and data preparation
Gene expression data from 410 samples, methylation data from 393 samples, and miRNA expression data from 444 samples were obtained from the discovery cohort (TCGA-STAD, RRID:SCR_003193) using the R package “TCGAbiolinks”. For the TCGA dataset, RNA-sequencing data (Illumina RNA-seq) were normalized into transcripts per kilobase million (TPM) values. Subsequently, log2 transformation and z-score normalization were applied to the prognostic modeling phase. Similarly, miRNA sequencing data (Illumina miRNA-seq) were converted into reads per million mapped (RPM) values. Methylation data (Infinium HumanMethylation450K) were filtered and normalized using the “ChAMP” R package.
Three independent cohorts with clinical annotations and gene expression data were retrieved from the Gene Expression Omnibus (GEO, RRID:SCR_005012) via the “GEOquery” R package and utilized for external validation. These cohorts include GSE66229 (30) (Asian Cancer Research Group, ACRG, n=300), GSE15459 (31) (Gastric Cancer Project of Singapore Patient Cohort, SPC, n=191) and GSE84437 (32) (Yonsei Gastric Cancer Cohort, YGC, n=431). For all these gene expression datasets, log2 transformation and z-score normalization were performed as described above. To ensure comparability of gene expression data across different cohorts, we performed batch effect correction using the ComBat algorithm from the “sva” R package. The effectiveness of batch correction was validated through principal component analysis (PCA), with results presented in Figure S1.
Patients diagnosed with GC and with available transcriptomics data were retained across all cohorts. A total of 1,305 GC patients were included in the study, and their demographic and clinical characteristics are summarized in Table S1. Figure 1 illustrates the study design and workflow. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.
Transcriptomic feature selection in TCGA cohort
Differential expression analyses of mRNA and miRNA between the early-stage (Stage I) and advanced-stage (Stage III and Stage IV) GC were performed on read count matrixes using the “DESeq2” R package. Genes with absolute log2 fold-change greater than 0.585 and a false discovery rate (FDR) of less than 0.05 were classified as differentially expressed genes (DEGs). Additionally, differential methylation analysis was conducted on probes between the two groups using the ChAMP package. Probes with delta beta value greater than 0.05 and FDR less than 0.05 were designated as differentially methylated probes (DMPs).
Subsequently, the identified differential miRNAs and CpG sites were mapped to corresponding genes using the miRTarBase (33) (RRID:SCR_017355) and ChAMP (34) (RRID:SCR_012891) databases, facilitating the connection between the regulatory elements and gene expression. Finally, genes associated with immune function were extracted for further analysis, based on the ImmPort (35) (RRID:SCR_012804) and InnateDB (36) (RRID:SCR_006714) databases.
Development and validation of the model
To develop and validate the prognostic model, we employed a 3-D strategy previously proposed in the literature (27), consisting of two types of effects, two stages of screening, and two phases of modeling.
First, we aimed to integrate transcriptomic predictors encompassing both main effects and G×G interactions. Specifically, (I) for the main effect, we utilized the Cox proportional hazards (Cox-PH) model, adjusting for covariates such as age, gender, and stage; (II) for the G×G interaction effect, we incorporated statistical interaction terms in the Cox-PH model with the same covariate adjustments. Specifically, we included multiplicative interaction terms (e.g., A×B) to evaluate the interaction effects between gene pairs, with statistical significance assessed through likelihood ratio tests. Pathological stage was classified according to the American Joint Committee on Cancer (AJCC) into four categories: Stage I, Stage II, Stage III, and Stage IV.
Next, we examined the selected immune genes to identify potential candidate genes and their interactions, which were validated in an independent dataset. In the TCGA cohort, models were fitted for each gene and interaction separately, with significant features selected while controlling the FDR at 5% (FDR <0.05). These genes and interactions were subsequently validated in the ACRG cohort, with only those exhibiting P<0.05 and consistent effect directions with the discovery phase considered as candidate biomarkers for the next modeling stage.
Using the candidate genes and interactions identified, we performed stepwise regression analysis on the TCGA cohort with Cox models adjusted for age and AJCC stage, to derive a final multivariable Cox model and construct ITPG, utilizing the R package “MASS”. To assess model stability, we performed internal validation using 1000 iterations of Bootstrap resampling on the TCGA cohort. This approach allowed us to robustly evaluate the model’s performance by calculating the mean C-index values. The prognostic transcriptomic model derived from the TCGA cohort was then validated in three GEO cohorts. The risk stratification ability of ITPG was assessed using the log-rank test and Kaplan-Meier survival analysis via the “survival” R package. The optimal cutoff value for risk stratification was determined as the median ITPG score in the TCGA training cohort and was consistently applied across all validation cohorts. The predictive performance of ITPG was further evaluated using the concordance index (C-index) and time-dependent receiver operating characteristic (ROC) curves and their area under the curve (AUC), employing the “survival” and “timeROC” R packages. Additionally, a meta-analysis was conducted to summarize ITPG’s prediction accuracy across all four cohorts using the “meta” R package. The clinical efficacy of the model, including the net benefit (NB) of correctly identifying high-risk patients and the net reduction (NR) of unnecessary interventions, was calculated using the “ggDCA” and “dcurves” R packages.
Finally, to improve the accessibility and application of the model, we developed a free online tool that predicts survival rates and 95% confidence intervals (CIs) for GC patients over time (0 to 72 months), based on an interactive web-based Kaplan-Meier survival curve (https://yilab5-njmu.shinyapps.io/itpg/).
EMT biological pathway
Given that ITPG was constructed in a data-driven manner, we recognized the importance of considering established immune-oncology pathways relevant to GC, such as epithelial-mesenchymal transition (EMT) pathway and the ZEB1-PD-L1 axis. These pathways play critical roles in GC progression and metastasis, which can influence patient prognosis (37,38). To explore the relationship between our model and these canonical pathways, we performed the following analyses. First, we obtained EMT-related genes from the dbEMT database and examined the overlap between this gene set and candidate predictive factors (39). Second, to formally test whether the inclusion of known key markers would improve model performance, we constructed an extended model, denoted as ITPG_plus, by incorporating the expression levels of ZEB1 and PD-L1 (CD274) genes as additional predictive factors. Then we evaluated the predictive performance of the ITPG_plus model across different cohorts and compared it with the original ITPG model.
Bioinformatics analysis
To explore the potential biological functions of the identified transcriptomic predictors, we performed gene enrichment pathway analysis utilizing the “clusterProfiler” R package, integrating the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG, RRID:SCR_012773) databases. The Estimation of STromal and Immune cells in MAlignant Tumor tissues using Expression data (ESTIMATE) algorithm was employed to investigate the pattern of the tumor immune microenvironment (TIME) (40) based on gene expression data. CIBERSORT, a deconvolution algorithm based on linear support vector regression, was applied to estimate the proportions of 22 immune cell types within tumor samples (41). Additionally, we evaluated the differential expression of immune checkpoint genes (ICGs) across subgroups categorized by transcriptomic scores and examined their correlation with transcriptomic score by Pearson correlation analysis. Finally, we explored immune-related drugs targeting the transcriptomic predictors through the resources available in the DrugBank (SCR_002700) database (42).
Statistical analysis
All statistical analyses were conducted using R version 4.3.0. Continuous variables were described as means ± standard deviation, and categorical variables were summarized as frequencies (n) and proportions (%). The Wilcoxon test was applied to compare two paired groups, and Chi-square tests were used to assess associations between categorical variables. Missing covariates in the TCGA cohort were imputed by multiple imputation with the “mice” R package. Associations between patient features and overall survival were analyzed with Cox proportional hazards (Cox-PH) models using the “survival” R package. Survival probabilities for each group were estimated with the Kaplan-Meier method, and survival distributions between two groups were compared with log-rank tests. Restricted mean survival times were calculated with the “survRM2” R package. The nomogram and calibration curves for the model were constructed with the “rms” R package. All P values were two-sided unless otherwise specified.
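The log-rank comparison used throughout the Results can be illustrated with a minimal pure-Python sketch. The analyses themselves used the R “survival” package; the function below is hypothetical and implements the standard two-group log-rank statistic (observed minus expected deaths in one group, with hypergeometric variance) referred to a chi-square distribution with 1 degree of freedom.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test returning (chi-square statistic, two-sided P value).

    Events are coded 1 = death, 0 = censored.
    """
    # Pool both groups, tagging group A as 0 and group B as 1
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    death_times = sorted({t for t, e, _ in data if e == 1})

    o_minus_e = 0.0  # observed minus expected deaths in group A
    variance = 0.0   # hypergeometric variance summed over death times
    for t in death_times:
        n = sum(1 for ti, _, _ in data if ti >= t)
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        d = sum(1 for ti, e, _ in data if ti == t and e == 1)
        d_a = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 0)
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / variance
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(chi2 / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

Two groups with identical survival give a statistic of 0 and P = 1, while a group whose deaths all precede those of the other group gives a small P value.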
Results
IRGs of GC progression
Patients with GC in the TCGA cohort were stratified into two groups: early-stage (stage I) and advanced-stage (stages III and IV). The early-stage group comprised 58 gene expression samples, 52 methylation samples, and 58 miRNA samples, whereas the advanced-stage group comprised 206 gene expression samples, 205 methylation samples, and 221 miRNA samples.
Through differential expression analysis, we identified 1,870 DEGs, 33 differentially expressed miRNAs, and 11,533 differentially methylated sites in GC patients (Figure S2). The overlap among these gene sets was examined, and a Venn diagram was used to illustrate the intersection of the three gene sets (Figure S3).
In total, we identified 127 genes associated with GC progression across all three omics layers. Among the 12,323 genes potentially linked to GC progression, 3,215 were classified as immune functional genes based on annotations from the ImmPort and InnateDB databases, and 506 of these were present in the transcriptomic data of the TCGA cohort. GO and KEGG enrichment analysis of the 506 genes revealed significant enrichment in immune response pathways (Figure S4). GO analysis showed that these genes were primarily enriched in processes such as immune response activation, immune cell proliferation, and immune signaling, while KEGG analysis further confirmed their enrichment in immune-related pathways, including Cytokine-cytokine receptor interaction and the B cell receptor signaling pathway.
Model construction
Univariate Cox regression analysis identified 87 genes with main effects and 391 gene pairs with G×G interactions potentially associated with overall survival in the TCGA cohort (FDR <0.05). Of these, 35 genes and 55 gene pairs were validated as candidate transcriptomic predictors in the ACRG cohort (Tables S2,S3). Stepwise regression in the TCGA training cohort then yielded a transcriptomic model incorporating two genes with main effects (KCNQ1 and FLRT2, which are enriched in pathways of potassium ion homeostasis and transmembrane transport) and two gene pairs with G×G interactions (ATP4B×CD84 and NPY×ITGBL1). The final model, which includes age, pathological stage, and the transcriptomic predictors, is specified in Table S4.
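For orientation, the structure of such a model can be written as a Cox proportional hazards model. This is a generic sketch under stated assumptions, not the fitted ITPG equation: β₁ to β₆ are placeholders for the coefficients reported in Table S4, stage is shown as a single term although it may be coded categorically, and each G×G term is assumed to enter as the product of the two genes' expression values.

$$
h(t \mid \mathbf{x}) = h_0(t)\,\exp\!\Big(\beta_1\,\mathrm{Age} + \beta_2\,\mathrm{Stage} + \beta_3\,\mathrm{KCNQ1} + \beta_4\,\mathrm{FLRT2} + \beta_5\,(\mathrm{ATP4B}\times\mathrm{CD84}) + \beta_6\,(\mathrm{NPY}\times\mathrm{ITGBL1})\Big)
$$

Under this form, the clinical score corresponds to β₁·Age + β₂·Stage, the transcriptomic score to the remaining four terms, and the ITPG risk score to their sum (the linear predictor).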
To evaluate model robustness and discriminative performance, we performed 1,000 bootstrap iterations on the TCGA cohort. The analysis yielded a mean C-index of 0.6895 (95% CI: 0.6728–0.7062) for the ITPG model and 0.6313 (95% CI: 0.5841–0.6805) for the Clinic model, indicating robust internal consistency of the model’s predictive performance (Tables S5,S6).
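The bootstrap procedure can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' R code: it uses Harrell's C with tied survival times skipped and a percentile 95% interval, and the function names are hypothetical; the software and interval method behind Tables S5 and S6 are not specified here.

```python
import math
import random


def harrell_c_index(times, events, risk_scores):
    """Harrell's C: share of usable pairs in which the higher-risk patient dies first.

    A pair (i, j) is usable when patient i has an observed death before time j.
    Tied times are skipped for simplicity; tied risk scores count as 0.5.
    Returns NaN when no usable pair exists.
    """
    concordant, usable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / usable if usable else math.nan


def bootstrap_c_index(times, events, risk_scores, n_boot=1000, seed=2024):
    """Mean C-index and percentile 95% interval over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(times)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients with replacement
        c = harrell_c_index([times[k] for k in idx],
                            [events[k] for k in idx],
                            [risk_scores[k] for k in idx])
        if not math.isnan(c):  # drop degenerate resamples with no usable pairs
            estimates.append(c)
    estimates.sort()
    m = len(estimates)
    mean_c = sum(estimates) / m
    return mean_c, estimates[int(0.025 * (m - 1))], estimates[int(0.975 * (m - 1))]
```

With 1,000 resamples, as in the text, the returned mean and interval play the role of the reported 0.6895 (0.6728–0.7062) for ITPG.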
Multivariable Cox regression analysis demonstrated that the transcriptomic score was a robust independent prognostic factor across cohorts after adjusting for age and pathological stage (Figure S5). A sensitivity analysis in which pathological stage was categorized as Stage I (reference), Stage II, and combined Stage III/IV further confirmed the consistent and significant prognostic value of the transcriptomic score (Figure S6).
Risk stratification ability of the model
Patients in the TCGA training cohort were stratified into low- and high-risk groups based on the median risk score (cutoff =0.598), while those in the testing cohorts were categorized using the same cutoff. The distribution of risk scores, survival status of GC patients, and the expression of genes as prognostic predictors between the high-risk and low-risk groups of the TCGA cohort are illustrated in Figure S7. The model demonstrated robust stratification ability in both the training and testing sets.
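The key design point is that the cutoff is estimated once on the training cohort and then frozen, so that external cohorts are classified without using their own score distributions. A minimal sketch with hypothetical function names; assigning scores equal to the cutoff to the low-risk group is an assumption.

```python
from statistics import median


def fit_median_cutoff(training_scores):
    """Learn the risk-score cutoff once, on the training cohort only."""
    return median(training_scores)


def assign_risk_group(scores, cutoff):
    """Label patients; scores strictly above the cutoff are 'high' (tie rule assumed)."""
    return ["high" if s > cutoff else "low" for s in scores]
```

For example, with the TCGA median of 0.598, external scores of 0.2, 0.598, and 0.7 would be labelled low, low, and high.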
Compared to the low-risk group in the corresponding cohort, the high-risk group was associated with worse survival outcomes in the TCGA (training set) and ACRG cohort (internal testing cohort), exhibiting substantial hazard ratios (HR) (HRTCGA =3.06, 95% CI: 2.18–4.29, P=1.06×10−10; HRACRG =2.80, 95% CI: 2.00–3.92, P<1.91×10−9) (Figure 2A,2B). Similarly, in the two external testing sets, significant differences in survival were observed (HRSPC =3.64, 95% CI: 2.25–5.91, P=1.61×10−7; HRYGC =2.68, 95% CI: 2.02–3.56, P=7.49×10−12) (Figure 2C,2D).
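As a consistency check on reported HRs, a minimal sketch (assuming Wald-type 95% CIs on the log-HR scale with z = 1.96; the helper name is hypothetical) recovers the standard error and a Wald P value from an HR and its CI. The reported P values may come from log-rank tests and rounded inputs, so only order-of-magnitude agreement should be expected.

```python
import math


def wald_from_hr_ci(hr, lower, upper, z_crit=1.959964):
    """Recover the log-HR standard error, Wald z and two-sided P from an HR and its 95% CI."""
    log_hr = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)  # CI width on the log scale
    z = log_hr / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail
    return se, z, p_value


se, z, p = wald_from_hr_ci(3.06, 2.18, 4.29)  # TCGA high- vs low-risk, as reported
```

For the TCGA comparison this gives a standard error of about 0.173 and a Wald P value on the order of 10⁻¹⁰, in line with the reported P=1.06×10−10.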
Furthermore, we assessed the stratification ability of ITPG in the TCGA cohort by dividing patients into low- and high-risk groups at the median of (I) the clinical score, a weighted linear combination of demographic (age) and clinical (AJCC stage) factors, and (II) the transcriptomic score, which incorporates both main-effect genes and G×G interactions. To assess the generalizability of these stratifications, we applied the same TCGA-derived median cutoffs to the three independent validation cohorts (ACRG, SPC, and YGC). In the TCGA cohort, the HR increased progressively from the clinical model (HRclinic =2.19, 95% CI: 1.58–3.04, P=2.29×10−6) to the transcriptomic model (HRtranscriptomic =2.29, 95% CI: 1.65–3.17, P=7.28×10−7) and finally to the full model integrating all predictive factors (Figure S8). In all external cohorts, the transcriptomic score remained a significant and independent prognostic factor, indicating that its prognostic value is not driven by age or tumor stage. The full integrated ITPG score showed robust and consistent stratification across all cohorts.
We also evaluated the discriminative capability of ITPG by stratifying GC patients in the combined cohort, after adjustment for batch effects between cohorts, into five groups defined by the quartiles and the 90th percentile of the risk score. With the truncation time set at 10 years, the restricted mean survival time (RMST) declined significantly from 8.03 years in the level 1 group (below the 25th percentile) to 1.92 years in the level 5 group (above the 90th percentile). A clear dose-response relationship was observed, with higher-percentile groups showing progressively shorter survival and a higher risk of death. Specifically, the HRs increased stepwise (HRlevel2 =2.00, 95% CI: 1.50–2.66, P=2.04×10−6; HRlevel3 =2.78, 95% CI: 2.11–3.66, P=2.68×10−13; HRlevel4 =6.14, 95% CI: 4.63–8.14, P=2.03×10−36; HRlevel5 =9.79, 95% CI: 7.25–13.21, P=2.78×10−50) (Figure 2E,2F).
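RMST is the area under the Kaplan-Meier curve up to the truncation time (10 years here). The paper computed it with the R “survRM2” package; the following is an independent pure-Python sketch of the same quantity for a single group, with hypothetical function names, and it does not include the batch-effect adjustment described above.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier curve as [(death time, survival just after that time)]."""
    pairs = sorted(zip(times, events))
    n_at_risk = len(pairs)
    surv, curve, i = 1.0, [], 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = removed = 0
        # Gather all patients (deaths and censorings) sharing this time
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def rmst(times, events, tau):
    """Restricted mean survival time: area under the KM step function on [0, tau]."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)
```

Calling `rmst` on each of the five risk levels with `tau=10` gives per-level values of the kind reported above. Without censoring, RMST reduces to the mean of survival times truncated at tau.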
Predictive performance of the model
The model demonstrated strong predictive ability for 1-, 3-, and 5-year survival probabilities across both the TCGA training set and the ACRG testing set, with the following AUC values: AUC1-year =0.737 and 0.816; AUC3-year =0.730 and 0.783; AUC5-year =0.797 and 0.753 (Figure 3A,3B). Additionally, the model exhibited notable predictive performance in the external SPC and YGC testing cohorts, with AUC values: AUC1-year =0.762 and 0.769; AUC3-year =0.809 and 0.727; AUC5-year =0.833 and 0.702 (Figure 3C,3D). Meta-analysis results further reinforced the model’s predictive ability across combined data, with AUC values: AUC1-year =0.769, 95% CI: 0.735–0.803; AUC3-year =0.762, 95% CI: 0.723–0.802; AUC5-year =0.765, 95% CI: 0.704–0.826.
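The pooled AUCs above come from a meta-analysis whose model (fixed- or random-effects) is not stated in this section. A minimal inverse-variance fixed-effect sketch, assuming that per-cohort 95% CIs are symmetric and Wald-type, is shown below; a random-effects model (e.g., DerSimonian-Laird) would give wider intervals when cohorts are heterogeneous. The function name is hypothetical.

```python
import math


def pool_fixed_effect(estimates, lowers, uppers, z_crit=1.959964):
    """Inverse-variance fixed-effect pooling of estimates reported with 95% CIs."""
    weights, weighted_sum = [], 0.0
    for est, lo, hi in zip(estimates, lowers, uppers):
        se = (hi - lo) / (2 * z_crit)  # assumes a symmetric Wald-type CI
        w = 1.0 / se ** 2              # inverse-variance weight
        weights.append(w)
        weighted_sum += w * est
    total_w = sum(weights)
    pooled = weighted_sum / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    return pooled, pooled - z_crit * pooled_se, pooled + z_crit * pooled_se
```

The same function applies to the per-cohort C-indices; cohorts with narrower CIs (typically larger cohorts) receive more weight.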
Moreover, the model achieved favorable C-index scores in the TCGA training cohort (0.703), ACRG internal testing cohort (0.729), and two external testing cohorts: SPC (0.715) and YGC (0.675), resulting in an overall pooled C-index of 0.704 (95% CI: 0.678–0.729) (Figure 3E-3H).
By incorporating the transcriptomic predictor, the full model significantly outperformed the basic clinical model, which included demographic and clinical factors, in the TCGA cohort. Adding the transcriptomic predictor improved the time-dependent AUC for overall survival by 0.088 (13.6%) at 1 year, 0.092 (14.4%) at 3 years, and 0.187 (30.7%) at 5 years (Figure S9). This improvement was consistently observed in the three independent validation cohorts (ACRG, SPC, and YGC), particularly for long-term prognosis. The ITPG model achieved higher AUC values than the clinical model in all cohorts, indicating that the transcriptomic score provides incremental and generalizable prognostic information beyond conventional clinical factors.
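The relative improvements can be reproduced under one assumption: that each percentage is the absolute AUC gain divided by the clinical model's AUC, with the full-model TCGA AUCs taken from Figure 3A (0.737, 0.730, 0.797). The helper below is a hypothetical sketch of that arithmetic.

```python
def auc_gain(full_auc, absolute_gain):
    """Return (implied clinical-model AUC, relative gain) from a full-model AUC and its absolute gain."""
    clinical_auc = full_auc - absolute_gain
    return clinical_auc, absolute_gain / clinical_auc


# Full-model TCGA AUCs paired with the reported absolute gains
for label, full, gain in [("1-year", 0.737, 0.088), ("3-year", 0.730, 0.092), ("5-year", 0.797, 0.187)]:
    clinical, rel = auc_gain(full, gain)
    print(f"{label}: clinical AUC ~ {clinical:.3f}, relative gain {rel:.1%}")
```

This prints implied clinical-model AUCs of 0.649, 0.638, and 0.610 with relative gains of 13.6%, 14.4%, and 30.7%, matching the reported percentages.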
Clinical efficacy of the model
Decision curve analysis (DCA) indicated that the ITPG model provided a higher clinical net benefit (NB) than several alternative strategies: intervention for all, no intervention, and intervention based on a basic model incorporating clinical and demographic factors. At a threshold probability of 0.4 (Pt=0.4) and relative to the “no intervention” strategy, the ITPG model achieved a higher NB than the basic model: NBITPG =0.018 vs. NBBasic =0.014 for 1-year survival, NBITPG =0.158 vs. NBBasic =0.142 for 3-year survival, and NBITPG =0.237 vs. NBBasic =0.202 for 5-year survival (Figure 4A-4C). In practical terms, with 5-year survival as the endpoint, the ITPG model corresponds to 237 net true positives (true positives minus threshold-weighted false positives) per 1,000 patients, compared with 202 for the basic model.
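For a binary outcome at a fixed horizon, the standard decision-curve net benefit at threshold probability pt is TP/n − (FP/n) × pt/(1 − pt). The sketch below implements that definition for a model and for “intervention for all”. It assumes no censoring before the horizon, whereas survival DCA typically uses Kaplan-Meier-based estimates, so it is illustrative rather than the authors' implementation; the function names are hypothetical.

```python
def net_benefit(predicted_risk, died, threshold):
    """Net benefit of intervening in patients whose predicted risk is >= threshold."""
    n = len(died)
    tp = sum(1 for r, d in zip(predicted_risk, died) if r >= threshold and d)
    fp = sum(1 for r, d in zip(predicted_risk, died) if r >= threshold and not d)
    # False positives are penalised by the odds of the threshold probability
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(died, threshold):
    """Net benefit of intervening in every patient."""
    prevalence = sum(died) / len(died)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)
```

At pt = 0.4 each false positive costs 0.4/0.6 of a true positive, which is why an NB of 0.237 reads as 237 net true positives per 1,000 patients.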
In addition, relative to the “intervention for all” strategy, the ITPG model yielded a larger net reduction (NR) in unnecessary interventions than the basic model: NRITPG =58.6% vs. NRBasic =58.0% for 1-year survival, NRITPG =20.4% vs. NRBasic =18.1% for 3-year survival, and NRITPG =11.7% vs. NRBasic =6.7% for 5-year survival (Figure 4D-4F). Thus, for 5-year survival, the ITPG model could reduce unnecessary interventions by 11.7% without missing any high-risk patients, compared with a 6.7% reduction for the basic model.
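Net reduction in interventions follows from the net benefits at the same threshold: NR = (NB_model − NB_all) × (1 − pt)/pt, where NB_all is the net benefit of intervening in everyone. A minimal sketch assuming this standard decision-curve definition (the function name is hypothetical):

```python
def net_reduction(nb_model, nb_treat_all, threshold):
    """Net reduction in interventions per patient versus 'intervention for all'."""
    # Convert the net-benefit gain into avoided false-positive interventions
    return (nb_model - nb_treat_all) * (1 - threshold) / threshold
```

At pt = 0.4 the factor (1 − pt)/pt equals 1.5, so each 0.01 gain in net benefit over intervention for all corresponds to 1.5 fewer unnecessary interventions per 100 patients.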
Sensitivity analysis, which varied the threshold probability from 0 to 0.5, showed that the decision curves for the ITPG model consistently outperformed other strategies across this range of probabilities. The ITPG model achieved the highest average NB and NR for 1-, 3-, and 5-year survival predictions: NB1-year =0.083, NR1-year =38.30%; NB3-year =0.221, NR3-year =11.10%; and NB5-year =0.284, NR5-year =6.05%, confirming its robustness and suitability for clinical application (Figure 4A-4F).
To support individualized prognostic assessment and the identification of high-risk patients, we developed an ITPG nomogram for estimating 1-, 3-, and 5-year survival (Figure 4G). Calibration curves in both the training and testing cohorts showed good agreement between predicted and observed survival (Figure 4H; Figure S10).
Sensitivity analysis of the model prediction
To assess the robustness of the ITPG model, we performed subgroup analyses based on age, gender, and AJCC stage. The ITPG model consistently exhibited strong predictive ability across different subgroups, with HRs reflecting the association between risk scores and overall survival ranging from 1.92 (95% CI: 1.51–2.46, P=1.66×10−7) to 2.80 (95% CI: 2.35–3.33, P<1.00×10−16) (Figure 5). Additionally, the model achieved favorable AUC values across all subgroups, with AUCs spanning from 0.635 (95% CI: 0.388–0.882) to 0.808 (95% CI: 0.758–0.858) for 1-year survival, from 0.672 (95% CI: 0.626–0.717) to 0.799 (95% CI: 0.752–0.846) for 3-year survival, and from 0.663 (95% CI: 0.595–0.730) to 0.768 (95% CI: 0.656–0.879) for 5-year survival (Figure 5).
Transcriptomic predictors of ITPG and their immune relevance
KEGG annotation showed that the genes serving as transcriptomic predictors were significantly enriched in the “Gastric acid secretion” pathway. GO annotation identified 160 biological process, 19 molecular function, and 22 cellular component terms, pointing to potential biological functions of these genes (Table S7).
In the TIME analysis, the transcriptomic score was significantly correlated with the stromal, immune, and ESTIMATE scores (Figure 6A-6C). We then compared the proportions of 22 immune cell types between the high- and low-risk groups defined by the median transcriptomic score. Ten immune cell types differed significantly between the two groups: 6 were positively correlated with the transcriptomic score (e.g., Monocytes) and 4 were negatively correlated (e.g., T cells CD4 memory) (Figure 6D,6E). In addition, 19 ICGs were differentially expressed between the high- and low-risk subgroups; 11 of these were positively and 8 negatively correlated with the transcriptomic score, suggesting that the transcriptomic predictors may influence immune responses (Figure S11). Multiple immune-related drugs targeting the transcriptomic predictors are recorded in the DrugBank database (Table S8), suggesting that ITPG may help inform immunotherapy strategies.
Association of ITPG with the EMT biological pathway
We observed overlaps between DEGs across multiple omics layers and genes involved in EMT. Specifically, 12 EMT-related genes showed differential expression across all three omics layers (Figure S3). Within the candidate immune gene set, 40 EMT-related genes were identified (Table S9). Additionally, a Cox proportional hazards model (adjusted for age and stage) in the TCGA training cohort revealed that higher expression of ZEB1 was significantly associated with worse prognosis (β=0.2953, P=3.95×10−4) (Table S10).
However, when ZEB1 and PD-L1 were included in the extended model (ITPG_plus), neither ZEB1 (β=0.0018, P=0.99) nor PD-L1 (β=−0.1690, P=0.15) was statistically significant in the multivariable Cox model (Table S11). Compared with the original ITPG model, ITPG_plus provided no additional predictive contribution across the validation cohorts (Table S12), possibly because ZEB1 is correlated with existing predictors in the original model (Figure S12).
Discussion
GC is a highly aggressive malignancy characterized by marked heterogeneity, with patient survival ranging from less than 5 months to over 10 years (43,44). Histological classification of GC is primarily based on the WHO classification (papillary, tubular, mucinous, and poorly cohesive types) (45) and the Lauren classification (intestinal, diffuse, and mixed types) (46). However, several studies have reported that these classifications have limited prognostic discriminative power (47,48). A likely explanation is the molecular heterogeneity of GC tumors, which is associated with diverse clinical phenotypes, immune marker expression, and prognostic outcomes (49). Therefore, integrating molecular prognostic markers with the pathological staging system is essential for accurately identifying patients at high risk of poor prognosis and for better guiding adjuvant treatment decisions.
Transcriptomic analysis, which requires only small quantities of RNA, enables comprehensive profiling of malignant cells and the tumor microenvironment (TME) across various cancers (50). Numerous studies have reported that transcriptional changes in specific genes are closely linked to survival outcomes in GC (51-53). In this study, we used transcriptomic data from four publicly available, independent cohorts from different regions to develop and validate a prognostic model for GC, named ITPG.
ITPG shows potential value in screening for high-risk patients. Importantly, the consistent prognostic significance of the transcriptomic score under different specifications of tumor stage suggests that it does not merely recapitulate pathological staging information but captures additional biological heterogeneity associated with tumor aggressiveness and disease progression that the conventional staging system does not fully account for. According to the Global Cancer Statistics 2022, there were 968,350 new cases of GC worldwide (54). With the threshold for clinical intervention set at a mortality probability of ≥0.4, our model can reduce unnecessary interventions by 58.6%, 20.4%, and 11.7% for 1-, 3-, and 5-year survival outcomes, respectively. Compared with a strategy in which all GC patients receive intervention, the ITPG model could therefore help avoid approximately 567,453 (968,350×58.6%), 197,543 (968,350×20.4%), and 113,297 (968,350×11.7%) unnecessary interventions for 1-, 3-, and 5-year survival outcomes, respectively.
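The absolute figures above are simple products of the 2022 incidence and the net reductions at a threshold of 0.4; the sketch below reproduces them. It inherits the text's simplifying assumption that the net reduction observed in these cohorts applies uniformly to all incident cases worldwide, and the variable names are illustrative.

```python
new_cases_2022 = 968_350  # global GC incidence cited above (54)
# Net reduction in unnecessary interventions at threshold probability 0.4
net_reduction_rate = {"1-year": 0.586, "3-year": 0.204, "5-year": 0.117}

avoided = {horizon: round(new_cases_2022 * rate) for horizon, rate in net_reduction_rate.items()}
print(avoided)
```

This prints `{'1-year': 567453, '3-year': 197543, '5-year': 113297}`, matching the figures in the text.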
We briefly summarize the biological roles of the genes serving as transcriptomic biomarkers in ITPG. KCNQ1 encodes the pore-forming α-subunit of a voltage-gated potassium channel, which generates K⁺ currents following membrane depolarization (55). In parietal cells, KCNQ1 plays a pivotal role in gastric acid secretion through its function as a luminal K⁺ channel (56). Biallelic mutations in KCNQ1 result in Jervell and Lange-Nielsen syndrome (JLNS). Research indicates that patients with JLNS are more likely than monoallelic KCNQ1 mutation carriers to exhibit elevated gastrin levels, impaired gastric acid secretion, and an increased risk of gastric adenocarcinoma (57). Previous studies have identified KCNQ1 as a tumor suppressor gene with functional significance in gastrointestinal cancers, where its low expression is strongly associated with poor overall survival (51).
FLRT2, a member of the FLRT family, encodes cell adhesion molecules involved in cell adhesion, migration, and axon guidance (58). Recent studies have revealed that FLRT2 is involved in tumor progression and correlates negatively with the long-term survival of patients with gastric and colorectal cancers (59). NPY encodes a brain-gut peptide that is widely expressed in both the central and peripheral nervous systems (60,61). Its overexpression has been reported to be associated with reduced survival rates in GC patients (62,63). ITGBL1, an integrin, encodes a beta integrin-related extracellular matrix protein and has been found to be dysregulated in various cancers, such as colorectal cancer, hepatocellular carcinoma, and non-small cell lung cancer (64-66). Recent research has demonstrated that ITGBL1 overexpression significantly enhances the resistance of GC cells to anoikis and promotes their metastatic potential (53). ATP4B encodes the β-subunit of the proton pump H+/K+-ATPase, which mediates gastric acid secretion by parietal cells. Recent research has confirmed ATP4B as a tumor suppressor that restricts GC progression by modulating mitochondrial metabolism and apoptotic signaling pathways (52). Interestingly, although previous studies have identified the prognostic significance of ATP4B for GC patients based on different datasets (67,68), our study observed the positive association between ATP4B and overall survival time in the TCGA cohort through G×G interaction rather than the main effect. CD84, a member of the SLAM family of cell-surface immunoreceptors, is widely expressed across various immune cell subsets and acts as a homophilic adhesion molecule, modulating leukocyte functions by either activating or inhibiting their responses (69). 
Previous studies have revealed that CD84 is overexpressed in Epstein-Barr virus (EBV)-positive GC, a subtype exhibiting an improved prognosis compared to other GC subtypes, suggesting a potential association between CD84 and patient survival outcomes (70,71).
The construction of the ITPG model was primarily data-driven. We anticipate that integrating the model with established biological pathways, such as the EMT and ZEB1-PD-L1 regulatory axis, will enable a more refined characterization of cancer patient prognosis from a biomedical perspective. EMT is a biological process through which epithelial cells transition to a mesenchymal state, characterized by the loss of intercellular adhesion and cell polarity, along with the acquisition of migratory and invasive properties (72). Aberrant activation of EMT plays a critical role in GC initiation, invasion, and metastasis (73). The EMT transcription factor ZEB1 is a key inducer of cellular plasticity and promotes tumor progression towards metastasis. The GRHL2/ZEB1 feedback loop can upregulate PD-L1 expression, thereby helping GC cells evade immune attack (38,74). We further evaluated the model by incorporating core pathway genes ZEB1 and PD-L1. Although the predictive performance of the extended model was not significantly improved, comparative analyses revealed that one of the original predictor, FLRT2, exhibited a strong correlation with ZEB1 expression across different cohorts. Moreover, ITGBL1, another predictor, has been reported to be associated with the EMT signaling pathway in gastric, liver, and prostate cancers, and may contribute to cancer cell invasion and metastasis by inducing EMT (65,75-77).
There are several strengths in this study. First, the development and validation of ITPG were based on four independent cohorts, encompassing a total sample size of 1,305 cases. The model demonstrated strong predictive performance across different cohorts and subgroup analyses, exhibiting its transferability and robustness. Second, this study investigates the impact of G×G interactions on the survival of GC patients at the transcriptomic level, offering insights into the complex biological mechanisms underlying disease progression and providing novel perspectives on prognosis for GC patients. Third, we employed a two-step validation strategy for identifying prognostic biomarkers, focusing on those with significant main effects or G×G interactions. This approach enhances the predictive accuracy and generalizability of the model. Finally, we developed an online visualization and calculation tool to facilitate the practical application of ITPG model.
However, several limitations should also be considered. First, the experimental methods and technical platforms used to measure gene expression varied across cohorts, including RNA sequencing and expression microarrays, which may contribute to data heterogeneity. We addressed this issue by applying normal transformation and standardization techniques, which partially minimized these differences. Second, some cohorts lacked complete data on established prognostic factors for GC, such as microsatellite instability status (78) and EBV infection status (79). We anticipate that the availability of more comprehensive clinical annotations in the future will provide opportunities to further refine the model. Third, the ITPG model was primarily developed and validated in European and Asian populations, so its application to patients from other ancestries should be interpreted with caution. Finally, as this study is based on retrospective analysis, our findings require validation through prospective studies. Additionally, further biological experiments are needed to elucidate the underlying mechanisms of the predictive factors.
GC is a highly aggressive malignancy characterized by marked heterogeneity, with patient survival ranging from less than 5 months to more than 10 years (43,44). The histological classification of GC is based primarily on the WHO classification (papillary, tubular, mucinous, and poorly cohesive types) (45) and the Lauren classification (intestinal, diffuse, and mixed types) (46). However, several studies have reported that these classifications have limited prognostic discriminative power (47,48). A likely explanation is that the molecular heterogeneity of GC tumors is associated with diverse clinical phenotypes, immune marker expression profiles, and prognostic outcomes (49). Therefore, integrating molecular prognostic markers with the pathological staging system is essential for accurately identifying patients at high risk of poor prognosis and for better guiding adjuvant treatment decisions.
Transcriptomic analysis, which requires only small quantities of RNA, enables comprehensive profiling of malignant cells and the tumor microenvironment (TME) across various cancers (50). Numerous studies have reported that transcriptional changes in specific genes are closely linked to survival outcomes in GC (51-53). In this study, we developed and validated a transcriptomic prognostic model for GC, named ITPG, using data from four publicly available, independent cohorts from different regions.
ITPG shows potential value in screening for high-risk patients. Importantly, the transcriptomic score retained its prognostic significance across different tumor stage specifications, suggesting that it does not merely recapitulate pathological staging information; rather, it captures additional biological heterogeneity related to tumor aggressiveness and disease progression that the conventional staging system does not fully account for. According to the Global Cancer Statistics 2022, there were 968,350 new cases of GC worldwide (54). With the threshold for clinical intervention set at a mortality probability of ≥0.4, our model could reduce unnecessary interventions by 58.6%, 20.4%, and 11.7% for 1-, 3-, and 5-year survival outcomes, respectively, compared with a strategy in which all GC patients receive intervention. Extrapolated to the global case count, this corresponds to approximately 567,453 (968,350×58.6%), 197,543 (968,350×20.4%), and 113,297 (968,350×11.7%) avoidable interventions for 1-, 3-, and 5-year survival outcomes, respectively.
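Net benefit and net reduction in interventions are decision curve analysis quantities. As a minimal sketch, assuming the standard binary-outcome formulas (net benefit = TP/n − FP/n × pt/(1 − pt); net reduction per 100 patients = (NB_model − NB_treat-all) × (1 − pt)/pt × 100), the calculation and the global extrapolation could look like the following. The `Counts` class and all function names are hypothetical, the toy counts are illustrative rather than study data, and a survival analysis at a fixed horizon would also need to handle censoring, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical summary at one time horizon (binary outcome, censoring ignored)."""
    tp: int      # flagged high-risk and died by the horizon
    fp: int      # flagged high-risk but alive at the horizon
    events: int  # total deaths by the horizon
    n: int       # total patients


def threshold_odds(pt: float) -> float:
    # The odds at the threshold probability weight false positives against true positives.
    return pt / (1.0 - pt)


def net_benefit_model(c: Counts, pt: float) -> float:
    return c.tp / c.n - (c.fp / c.n) * threshold_odds(pt)


def net_benefit_treat_all(c: Counts, pt: float) -> float:
    # Under "treat all", every death is a true positive and every survivor a false positive.
    prevalence = c.events / c.n
    return prevalence - (1.0 - prevalence) * threshold_odds(pt)


def net_reduction_per_100(c: Counts, pt: float) -> float:
    # Net number of unnecessary interventions avoided per 100 patients versus treat-all.
    gain = net_benefit_model(c, pt) - net_benefit_treat_all(c, pt)
    return gain / threshold_odds(pt) * 100.0


def avoided_interventions(new_cases: int, net_reduction: float) -> int:
    # Scale a net reduction (given as a fraction) to a population of new cases.
    return round(new_cases * net_reduction)


if __name__ == "__main__":
    toy = Counts(tp=30, fp=20, events=40, n=100)  # illustrative numbers only
    print(net_reduction_per_100(toy, pt=0.4))
    for horizon, nr in [("1-year", 0.586), ("3-year", 0.204), ("5-year", 0.117)]:
        print(horizon, avoided_interventions(968_350, nr))
```

With the toy counts, the model avoids about 25 unnecessary interventions per 100 patients relative to treating everyone, and the loop reproduces the 567,453, 197,543, and 113,297 figures quoted in the text.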
We briefly summarize the biological roles of the genes that serve as transcriptomic biomarkers in ITPG. KCNQ1 encodes the pore-forming α-subunit of a voltage-gated potassium channel that generates K⁺ currents following membrane depolarization (55). In parietal cells, KCNQ1 plays a pivotal role in gastric acid secretion by functioning as a luminal K⁺ channel (56). Biallelic KCNQ1 mutations cause Jervell and Lange-Nielsen syndrome (JLNS). Compared with carriers of a single KCNQ1 mutation, patients with JLNS are more likely to exhibit elevated gastrin levels, impaired gastric acid secretion, and an increased risk of gastric adenocarcinoma (57). Previous studies have identified KCNQ1 as a functionally significant tumor suppressor gene in gastrointestinal cancers, in which its low expression is strongly associated with poor overall survival (51).
FLRT2, a member of the fibronectin leucine-rich transmembrane (FLRT) protein family, encodes a protein involved in cell adhesion, migration, and axon guidance (58). Recent studies have shown that FLRT2 is involved in tumor progression and is negatively correlated with long-term survival in patients with gastric and colorectal cancers (59). NPY encodes neuropeptide Y, a brain-gut peptide widely expressed in both the central and peripheral nervous systems (60,61); its overexpression has been associated with reduced survival in GC patients (62,63). ITGBL1 (integrin subunit beta like 1) encodes a β-integrin-related extracellular matrix protein and is dysregulated in various cancers, including colorectal cancer, hepatocellular carcinoma, and non-small cell lung cancer (64-66). Recent research has demonstrated that ITGBL1 overexpression markedly enhances the resistance of GC cells to anoikis and promotes their metastatic potential (53). ATP4B encodes the β-subunit of the H⁺/K⁺-ATPase proton pump, which mediates gastric acid secretion by parietal cells. Recent research has identified ATP4B as a tumor suppressor that restricts GC progression by modulating mitochondrial metabolism and apoptotic signaling pathways (52). Interestingly, although previous studies based on other datasets have reported the prognostic significance of ATP4B in GC (67,68), in the TCGA cohort we observed a positive association between ATP4B and overall survival through a G×G interaction rather than through a main effect.
CD84, a member of the SLAM family of cell-surface immunoreceptors, is widely expressed across immune cell subsets and acts as a homophilic adhesion molecule that modulates leukocyte functions by either activating or inhibiting leukocyte responses (69). Previous studies have shown that CD84 is overexpressed in Epstein-Barr virus (EBV)-positive GC, a subtype with a more favorable prognosis than other GC subtypes, suggesting a potential association between CD84 and patient survival (70,71).
The construction of the ITPG model was primarily data-driven. We anticipate that integrating the model with established biological pathways, such as epithelial-mesenchymal transition (EMT) and the ZEB1-PD-L1 regulatory axis, will enable a more refined, biologically grounded characterization of patient prognosis. EMT is a biological process in which epithelial cells transition to a mesenchymal state, characterized by loss of intercellular adhesion and cell polarity together with the acquisition of migratory and invasive properties (72). Aberrant activation of EMT plays a critical role in GC initiation, invasion, and metastasis (73). The EMT transcription factor ZEB1 is a key inducer of cellular plasticity and promotes tumor progression toward metastasis. The GRHL2/ZEB1 feedback loop can upregulate PD-L1 expression, thereby helping GC cells evade immune attack (38,74). We therefore further evaluated the model by incorporating the core pathway genes ZEB1 and PD-L1. Although the extended model did not show significantly improved predictive performance, comparative analyses revealed that one of the original predictors, FLRT2, was strongly correlated with ZEB1 expression across cohorts. Moreover, another predictor, ITGBL1, has been reported to be associated with the EMT signaling pathway in gastric, liver, and prostate cancers and may promote cancer cell invasion and metastasis by inducing EMT (65,75-77).
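Whether adding ZEB1 and PD-L1 improves prediction is usually judged with a discrimination metric such as Harrell's concordance index (C-index), the metric reported for ITPG in the abstract. The sketch below is a plain O(n²) implementation of Harrell's C under simple assumptions: a pair is comparable when the patient with the shorter follow-up had an event, tied follow-up times are skipped, and tied risk scores count as half-concordant. It is not the authors' code, the data and function name are hypothetical, and a formal comparison of two models would also need a test or confidence interval for the difference, which is omitted here.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int], risk: Sequence[float]) -> float:
    """Share of comparable pairs in which the higher-risk patient fails first."""
    if not (len(times) == len(events) == len(risk)):
        raise ValueError("times, events and risk must have equal length")
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(len(times)):
            if times[j] > times[i]:  # j was still at risk when i had the event
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical follow-up (months), death indicator, and two competing risk scores.
    t = [5, 12, 20, 31, 40, 58]
    e = [1, 1, 0, 1, 0, 0]
    base = [2.1, 1.4, 0.9, 1.1, 0.3, 0.5]
    extended = [2.0, 1.5, 1.0, 1.0, 0.4, 0.5]
    print(harrell_c_index(t, e, base), harrell_c_index(t, e, extended))
```

Evaluating both scores on the same patients keeps the comparison paired, which is what a later significance test for the difference would rely on.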
This study has several strengths. First, ITPG was developed and validated in four independent cohorts comprising a total of 1,305 patients. The model showed strong predictive performance across cohorts and in subgroup analyses, demonstrating its transferability and robustness. Second, this study investigated the impact of G×G interactions on the survival of GC patients at the transcriptomic level, offering insight into the complex biological mechanisms underlying disease progression and providing new perspectives on GC prognosis. Third, we employed a two-step validation strategy to identify prognostic biomarkers with significant main effects or G×G interactions; this approach helps improve the predictive accuracy and generalizability of the model. Finally, we developed an online visualization and calculation tool to facilitate the practical application of the ITPG model.
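In a Cox-type model, a G×G interaction enters the linear predictor as the product of two standardized expression values, so a gene can influence risk only through its partner, as observed for ATP4B in the TCGA cohort. The sketch below illustrates this structure only; the gene labels (`GENE_A` to `GENE_F`), all coefficients, and the numeric coding of age and stage are hypothetical placeholders, not the published ITPG genes, weights, or gene-to-term assignments.

```python
import math
from typing import Dict, Mapping, Tuple

# Hypothetical placeholders: NOT the published ITPG genes or coefficients.
MAIN_EFFECTS: Dict[str, float] = {"GENE_A": 0.30, "GENE_B": -0.25}
INTERACTIONS: Dict[Tuple[str, str], float] = {
    ("GENE_C", "GENE_D"): -0.20,
    ("GENE_E", "GENE_F"): 0.15,
}
CLINICAL: Dict[str, float] = {"age_per_10y": 0.10, "stage": 0.40}


def linear_predictor(expr: Mapping[str, float], clin: Mapping[str, float]) -> float:
    """Cox-style risk score from standardized expression and clinical covariates."""
    lp = sum(beta * expr[g] for g, beta in MAIN_EFFECTS.items())
    # Each interaction contributes beta * x_a * x_b, so neither gene acts alone.
    lp += sum(beta * expr[a] * expr[b] for (a, b), beta in INTERACTIONS.items())
    lp += sum(beta * clin[k] for k, beta in CLINICAL.items())
    return lp


def hazard_ratio(lp_1: float, lp_2: float) -> float:
    """Hazard of patient 1 relative to patient 2 under proportional hazards."""
    return math.exp(lp_1 - lp_2)


if __name__ == "__main__":
    genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E", "GENE_F"]
    baseline = {g: 0.0 for g in genes}
    clin = {"age_per_10y": 6.0, "stage": 3.0}
    # GENE_C is high, but its partner GENE_D sits at the cohort mean (z = 0).
    c_only = {**baseline, "GENE_C": 2.0}
    both = {**baseline, "GENE_C": 2.0, "GENE_D": 1.0}
    print(hazard_ratio(linear_predictor(c_only, clin), linear_predictor(baseline, clin)))
    print(hazard_ratio(linear_predictor(both, clin), linear_predictor(baseline, clin)))
```

The first ratio is 1 because the interaction term vanishes when GENE_D is at its cohort mean; only when both partners deviate does the term shift the hazard.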
However, several limitations should be considered. First, gene expression was measured with different experimental methods and technical platforms across cohorts, including RNA sequencing and expression microarrays, which may introduce data heterogeneity. We addressed this issue by applying normal transformation and standardization, which partially mitigated these differences. Second, some cohorts lacked complete data on established prognostic factors for GC, such as microsatellite instability status (78) and EBV infection status (79). More comprehensive clinical annotations, as they become available, should provide opportunities to further refine the model. Third, the ITPG model was developed and validated primarily in European and Asian populations, so its application to patients of other ancestries should be interpreted with caution. Finally, because this study is based on retrospective analyses, our findings require validation in prospective studies. In addition, further biological experiments are needed to elucidate the mechanisms underlying the predictive factors.
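One common way to put RNA-seq and microarray measurements on a comparable scale, and one plausible reading of the "normal transformation and standardization" described above, is a rank-based inverse normal transformation applied to each gene within each cohort, followed by z-score standardization. The sketch below assumes Blom's offset (c = 3/8) and average ranks for ties; the exact transformation used in the study is not specified in this section, and all function names are hypothetical.

```python
from statistics import NormalDist, mean, pstdev
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def rank_inverse_normal(values: Sequence[float], c: float = 3 / 8) -> List[float]:
    """Map ranks to standard-normal quantiles (Blom offset by default)."""
    n = len(values)
    nd = NormalDist()
    return [nd.inv_cdf((r - c) / (n - 2 * c + 1)) for r in average_ranks(values)]


def zscore(values: Sequence[float]) -> List[float]:
    """Center to mean 0 and scale to population standard deviation 1."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        raise ValueError("constant input cannot be standardized")
    return [(v - mu) / sd for v in values]


if __name__ == "__main__":
    # Hypothetical expression of one gene in two cohorts on different platform scales.
    rnaseq_log2_tpm = [3.2, 5.8, 4.1, 7.9, 6.3]
    microarray_log2 = [8.1, 9.4, 8.7, 11.2, 10.0]
    for cohort in (rnaseq_log2_tpm, microarray_log2):
        print([round(x, 3) for x in zscore(rank_inverse_normal(cohort))])
```

Because both hypothetical cohorts share the same within-cohort ordering, they map to identical values. This removes platform-specific scale but also discards absolute between-cohort differences, which is consistent with the text's note that the differences were only partially mitigated.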
Conclusions
We introduced ITPG, an immune-related transcriptomic predictive model for GC prognosis, which demonstrated notable predictive accuracy and robustness in external validation. This model offers a cost-effective approach to identifying GC patients at high risk of mortality. Moreover, a free, user-friendly online application is available at https://yilab5-njmu.shinyapps.io/itpg/.
Supplementary
The article’s supplementary files as