A nine-gene nicotine-metabolism signature predicts prognosis and characterizes the immune landscape in colon adenocarcinoma.
1/5 보강
[BACKGROUND] Smoking is a preventable cause of colorectal cancer (CRC), and nicotine metabolism may promote tumorigenesis.
APA
Li J, Zhang J, et al. (2026). A nine-gene nicotine-metabolism signature predicts prognosis and characterizes the immune landscape in colon adenocarcinoma.. Journal of gastrointestinal oncology, 17(1), 10. https://doi.org/10.21037/jgo-2025-aw-905
MLA
Li J, et al.. "A nine-gene nicotine-metabolism signature predicts prognosis and characterizes the immune landscape in colon adenocarcinoma.." Journal of gastrointestinal oncology, vol. 17, no. 1, 2026, pp. 10.
PMID
41816596 ↗
Abstract 한글 요약
[BACKGROUND] Smoking is a preventable cause of colorectal cancer (CRC), and nicotine metabolism may promote tumorigenesis. We aimed to investigate the prognostic significance of nicotine metabolism‑related genes (NRGs) in colon adenocarcinoma (COAD) and to develop a gene signature for patient stratification.
[METHODS] Using the Cancer Genome Atlas COAD cohort, we performed differential expression analysis and weighted gene co-expression network analysis to identify NRGs. Functional enrichment, Cox regression, and least absolute shrinkage and selection operator (LASSO) modelling were applied to build a multigene signature, which was validated in external cohorts. Single-cell transcriptomic data and immune infiltration analyses were used to evaluate gene expression patterns and tumor microenvironmental features.
[RESULTS] We identified 767 differentially expressed NRGs enriched in extracellular matrix (ECM) organization and signaling pathways. A nine‑gene signature () was derived; this risk model independently predicted overall survival and showed robust performance across multiple datasets. Single‑cell analyses confirmed cell‑type‑specific expression of these genes. High‑risk scores were associated with altered immune cell infiltration, distinct mutation profiles, and differential drug sensitivity patterns.
[CONCLUSIONS] We propose a novel NRG-based prognostic signature that accurately predicts outcomes in COAD. The model highlights the interplay between nicotine metabolism, ECM remodeling, and immune responses, highlighting genes and pathways that may inform future therapeutic stratification.
[METHODS] Using the Cancer Genome Atlas COAD cohort, we performed differential expression analysis and weighted gene co-expression network analysis to identify NRGs. Functional enrichment, Cox regression, and least absolute shrinkage and selection operator (LASSO) modelling were applied to build a multigene signature, which was validated in external cohorts. Single-cell transcriptomic data and immune infiltration analyses were used to evaluate gene expression patterns and tumor microenvironmental features.
[RESULTS] We identified 767 differentially expressed NRGs enriched in extracellular matrix (ECM) organization and signaling pathways. A nine‑gene signature () was derived; this risk model independently predicted overall survival and showed robust performance across multiple datasets. Single‑cell analyses confirmed cell‑type‑specific expression of these genes. High‑risk scores were associated with altered immune cell infiltration, distinct mutation profiles, and differential drug sensitivity patterns.
[CONCLUSIONS] We propose a novel NRG-based prognostic signature that accurately predicts outcomes in COAD. The model highlights the interplay between nicotine metabolism, ECM remodeling, and immune responses, highlighting genes and pathways that may inform future therapeutic stratification.
🏷️ 키워드 / MeSH 📖 같은 키워드 OA만
같은 제1저자의 인용 많은 논문 (5)
- Ultrasound guided transabdominal botulinum toxin injection for refractory overactive bladder treatment.
- Short Nose Lengthening in Primary and Revision Rhinoplasty in Asians.
- Histopathological changes of fibrosis in human extra-ocular muscle caused by botulinum toxin A.
- Corrigendum to "Dual-energy CT radiomics for predicting neoadjuvant chemotherapy response in locally advanced gastric cancer: A dual-vendor validation study" [Eur. J. Surg. Oncol. 51 (2025) 110548].
- A case report of breast cancer recurrence with cystitis: the impact of immune checkpoint inhibitor therapy on the incidence of cystitis.
📖 전문 본문 읽기 PMC JATS · ~71 KB · 영문
Introduction
Introduction
Colorectal cancer (CRC) remains one of the most common malignancies and a leading cause of cancer-related death worldwide. Despite advances in screening and therapy, the global burden continues to grow. In 2023, about 153,020 people were diagnosed with CRC, and 52,550 succumbed to the disease in the United States, with an alarming increase in early-onset cases and a shift toward more advanced stages (1). Globally, over 1.9 million new cases and 930,000 deaths were estimated in 2020, and this burden is expected to reach 3.2 million cases and 1.6 million deaths by 2040 (2). Established risk factors for CRC include advancing age, diet, obesity, and hereditary syndromes, but modifiable exposures such as tobacco use play a significant role (3). Smoking has been identified as a major preventable cause of CRC. A recent meta-analysis of 26 studies reported that smokers have a 40% higher risk of CRC compared to nonsmokers (4). A large multi-ethnic cohort further demonstrated sex-specific and anatomical differences, with male ever-smokers showing a 39% increased risk of left-sided colon cancer and female ever-smokers having a 20% higher risk of right-sided colon cancer (5). These observations highlight that lifestyle-related exposures continue to influence CRC incidence, and the molecular mechanisms by which smoking encourages tumor initiation and progression remain not fully understood.
The addictive alkaloid in tobacco and its metabolites have emerged as key mediators of smoking-associated carcinogenesis. Experimental studies show that nicotine enhances colon cancer cell migration through activation of α7‑nicotinic acetylcholine receptors and upregulation of fibronectin via cyclooxygenase2 signaling (6), providing mechanistic evidence that nicotine itself can promote metastatic behavior. Beyond direct cell stimulation, nicotine is primarily metabolized by cytochrome P450 enzymes, such as CYP2A6 and CYP2B6, into cotinine and tobacco-specific N-nitrosamines (TSNAs). These TSNAs are potent carcinogens that activate oncogenic pathways, stimulate cell proliferation and angiogenesis, and undermine apoptosis (7). A comprehensive review emphasized that nicotine influences key steps in cancer development and may cause disease recurrence; it also noted that metabolic variations in CYP2A6 determine nicotine oxidation and can affect the activation of procarcinogens and the efficacy of chemopreventive agents (8). Within the colorectal tumor microenvironment, metabolic reprogramming and extracellular matrix (ECM) remodeling further promote tumor growth. The ECM, a major component of the tumor milieu, becomes disorganized during CRC progression. Type I collagen and matrix metalloproteinase‑2 increase with stage, while type IV collagen and tissue inhibitor of metalloproteinase3 decline, creating a permissive “soil” for malignant proliferation (9). Despite recognition that smoking and nicotine metabolism contribute to CRC, there remains a paucity of research linking these processes to specific gene expression programs and clinical outcomes. This gap underscores a pressing need to dissect the complex interplay between nicotine metabolism, metabolic reprogramming, and the tumor microenvironment.
Prognostic evaluation for CRC relies on anatomical staging, yet, the tumor-node-metastasis (TNM) system inadequately captures molecular heterogeneity and fails to stratify patients for targeted therapies. Even with improved screening, the 5year survival remains poor, and the decline in incidence has slowed down (2). Recent studies highlight that metabolic reprogramming is a hallmark of cancer cells; they outcompete neighboring cells for nutrients and generate immunosuppressive metabolites that remodel the microenvironment (10). Multiomics analyses have demonstrated that metabolic gene signatures correlate with survival across cancer types, yet, similar signatures for CRC are scarce (10). Concurrently, the tumor immune microenvironment (TIME) has a profound influence on disease progression and treatment response. Immunotherapy is effective in only a minority of CRC patients—those with deficient DNA mismatch repair or microsatellite instabilityhigh tumors—while most patients derive little benefit (11). The abundance of tumor‑infiltrating immune cells and expression of immune checkpoints have been shown to correlate with prognosis; for instance, CD8+ T‑cell infiltration portends better outcomes, whereas macrophage‑derived interleukin‑10 and transforming growth factorβ promote tumor growth (12). These findings suggest that integrating metabolic and immunological information could refine prognostic assessment and guide therapy, yet, how nicotine metabolism‑related genes (NRGs) interact with the immune landscape in colon adenocarcinoma (COAD) remains unknown.
Evidence from other malignancies supports the prognostic value of NRGs. In bladder cancer, nicotine and its metabolites accumulate at high concentrations in smokers’ urine, and continuous smoking is associated with higher incidence, recurrence, and chemoresistance. A recent study constructed an NRG‑based prognostic signature that predicted survival and immunotherapy response, emphasizing the importance of CYP2A6‑ and CYP2B6mediated nicotine metabolism and downstream pathways such as PI3K/AKT/mTOR in tumor progression (13). Such findings raise the possibility that similar gene signatures could stratify COAD patients. Genetic variation in CYP2A6 has been shown to influence colorectal tumor risk and modulate the chemopreventive effects of aspirin. At the same time, wholegene deletions or polymorphisms alter nicotine oxidation and the activation of procarcinogens (14). Despite these insights, there is still no comprehensive analysis of NRGs in CRC. Furthermore, smoking delivers thousands of chemicals, including polycyclic aromatic hydrocarbons and TSNAs, that damage DNA, disrupt cell cycle regulation, and impede apoptosis (15). A mechanistic study demonstrated that nicotine enhances colon cancer cell migration by inducing fibronectin expression through α7nicotinic acetylcholine receptor signaling (6). However, the broader transcriptional consequences and clinical relevance of NRG expression in COAD remain to be elucidated. Therefore, a critical gap exists in our understanding of how nicotine metabolism intersects with CRC heterogeneity, immune infiltration, and patient outcomes.
In this study, we aimed to bridge these gaps by systematically dissecting the role of NRGs in COAD. Using publicly available transcriptomic datasets, we first performed differential expression analysis and weighted gene co‑expression network analysis to identify candidate NRGs. We assessed their functional enrichment, highlighting their involvement in ECM organization, cytokine-cytokine receptor interaction, and PI3K-Akt signaling. We then constructed and validated a robust nine‑gene nicotine metabolism-related signature that accurately stratified patients into high‑ and low‑risk groups across multiple independent cohorts. This signature demonstrated superior prognostic performance compared with many existing models and was independently associated with clinicopathological features. Single‑cell sequencing data further delineated the cellular distribution of prognostic genes, while immune infiltration analyses revealed significant correlations between risk scores and diverse immune cell subsets, immune checkpoints, and predicted immunotherapy response. Finally, we examined mutation profiles and drug sensitivity patterns to explore the therapeutic implications of the signature. Together, these findings provide a comprehensive framework linking nicotine metabolism to COAD prognosis and the TIME, and they offer a novel tool for patient risk stratification and potential therapeutic targeting. We present this article in accordance with the TRIPOD reporting checklist (available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-aw-905/rc).
Colorectal cancer (CRC) remains one of the most common malignancies and a leading cause of cancer-related death worldwide. Despite advances in screening and therapy, the global burden continues to grow. In 2023, about 153,020 people were diagnosed with CRC, and 52,550 succumbed to the disease in the United States, with an alarming increase in early-onset cases and a shift toward more advanced stages (1). Globally, over 1.9 million new cases and 930,000 deaths were estimated in 2020, and this burden is expected to reach 3.2 million cases and 1.6 million deaths by 2040 (2). Established risk factors for CRC include advancing age, diet, obesity, and hereditary syndromes, but modifiable exposures such as tobacco use play a significant role (3). Smoking has been identified as a major preventable cause of CRC. A recent meta-analysis of 26 studies reported that smokers have a 40% higher risk of CRC compared to nonsmokers (4). A large multi-ethnic cohort further demonstrated sex-specific and anatomical differences, with male ever-smokers showing a 39% increased risk of left-sided colon cancer and female ever-smokers having a 20% higher risk of right-sided colon cancer (5). These observations highlight that lifestyle-related exposures continue to influence CRC incidence, and the molecular mechanisms by which smoking encourages tumor initiation and progression remain not fully understood.
The addictive alkaloid in tobacco and its metabolites have emerged as key mediators of smoking-associated carcinogenesis. Experimental studies show that nicotine enhances colon cancer cell migration through activation of α7‑nicotinic acetylcholine receptors and upregulation of fibronectin via cyclooxygenase2 signaling (6), providing mechanistic evidence that nicotine itself can promote metastatic behavior. Beyond direct cell stimulation, nicotine is primarily metabolized by cytochrome P450 enzymes, such as CYP2A6 and CYP2B6, into cotinine and tobacco-specific N-nitrosamines (TSNAs). These TSNAs are potent carcinogens that activate oncogenic pathways, stimulate cell proliferation and angiogenesis, and undermine apoptosis (7). A comprehensive review emphasized that nicotine influences key steps in cancer development and may cause disease recurrence; it also noted that metabolic variations in CYP2A6 determine nicotine oxidation and can affect the activation of procarcinogens and the efficacy of chemopreventive agents (8). Within the colorectal tumor microenvironment, metabolic reprogramming and extracellular matrix (ECM) remodeling further promote tumor growth. The ECM, a major component of the tumor milieu, becomes disorganized during CRC progression. Type I collagen and matrix metalloproteinase‑2 increase with stage, while type IV collagen and tissue inhibitor of metalloproteinase3 decline, creating a permissive “soil” for malignant proliferation (9). Despite recognition that smoking and nicotine metabolism contribute to CRC, there remains a paucity of research linking these processes to specific gene expression programs and clinical outcomes. This gap underscores a pressing need to dissect the complex interplay between nicotine metabolism, metabolic reprogramming, and the tumor microenvironment.
Prognostic evaluation for CRC relies on anatomical staging, yet, the tumor-node-metastasis (TNM) system inadequately captures molecular heterogeneity and fails to stratify patients for targeted therapies. Even with improved screening, the 5year survival remains poor, and the decline in incidence has slowed down (2). Recent studies highlight that metabolic reprogramming is a hallmark of cancer cells; they outcompete neighboring cells for nutrients and generate immunosuppressive metabolites that remodel the microenvironment (10). Multiomics analyses have demonstrated that metabolic gene signatures correlate with survival across cancer types, yet, similar signatures for CRC are scarce (10). Concurrently, the tumor immune microenvironment (TIME) has a profound influence on disease progression and treatment response. Immunotherapy is effective in only a minority of CRC patients—those with deficient DNA mismatch repair or microsatellite instabilityhigh tumors—while most patients derive little benefit (11). The abundance of tumor‑infiltrating immune cells and expression of immune checkpoints have been shown to correlate with prognosis; for instance, CD8+ T‑cell infiltration portends better outcomes, whereas macrophage‑derived interleukin‑10 and transforming growth factorβ promote tumor growth (12). These findings suggest that integrating metabolic and immunological information could refine prognostic assessment and guide therapy, yet, how nicotine metabolism‑related genes (NRGs) interact with the immune landscape in colon adenocarcinoma (COAD) remains unknown.
Evidence from other malignancies supports the prognostic value of NRGs. In bladder cancer, nicotine and its metabolites accumulate at high concentrations in smokers’ urine, and continuous smoking is associated with higher incidence, recurrence, and chemoresistance. A recent study constructed an NRG‑based prognostic signature that predicted survival and immunotherapy response, emphasizing the importance of CYP2A6‑ and CYP2B6mediated nicotine metabolism and downstream pathways such as PI3K/AKT/mTOR in tumor progression (13). Such findings raise the possibility that similar gene signatures could stratify COAD patients. Genetic variation in CYP2A6 has been shown to influence colorectal tumor risk and modulate the chemopreventive effects of aspirin. At the same time, wholegene deletions or polymorphisms alter nicotine oxidation and the activation of procarcinogens (14). Despite these insights, there is still no comprehensive analysis of NRGs in CRC. Furthermore, smoking delivers thousands of chemicals, including polycyclic aromatic hydrocarbons and TSNAs, that damage DNA, disrupt cell cycle regulation, and impede apoptosis (15). A mechanistic study demonstrated that nicotine enhances colon cancer cell migration by inducing fibronectin expression through α7nicotinic acetylcholine receptor signaling (6). However, the broader transcriptional consequences and clinical relevance of NRG expression in COAD remain to be elucidated. Therefore, a critical gap exists in our understanding of how nicotine metabolism intersects with CRC heterogeneity, immune infiltration, and patient outcomes.
In this study, we aimed to bridge these gaps by systematically dissecting the role of NRGs in COAD. Using publicly available transcriptomic datasets, we first performed differential expression analysis and weighted gene co‑expression network analysis to identify candidate NRGs. We assessed their functional enrichment, highlighting their involvement in ECM organization, cytokine-cytokine receptor interaction, and PI3K-Akt signaling. We then constructed and validated a robust nine‑gene nicotine metabolism-related signature that accurately stratified patients into high‑ and low‑risk groups across multiple independent cohorts. This signature demonstrated superior prognostic performance compared with many existing models and was independently associated with clinicopathological features. Single‑cell sequencing data further delineated the cellular distribution of prognostic genes, while immune infiltration analyses revealed significant correlations between risk scores and diverse immune cell subsets, immune checkpoints, and predicted immunotherapy response. Finally, we examined mutation profiles and drug sensitivity patterns to explore the therapeutic implications of the signature. Together, these findings provide a comprehensive framework linking nicotine metabolism to COAD prognosis and the TIME, and they offer a novel tool for patient risk stratification and potential therapeutic targeting. We present this article in accordance with the TRIPOD reporting checklist (available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-aw-905/rc).
Methods
Methods
Data sources
In this study, high-quality datasets from various public databases were utilized to ensure the reliability and broad applicability of the results. A dataset comprising 471 COAD samples and 41 normal control tissues was obtained from The Cancer Genome Atlas (TCGA) project via the UCSC Xena platform (https://xenabrowser.net/datapages/). Of these samples, 438 COAD cases provided available survival data, along with detailed clinical features, including patient age, T, N, M classifications, and overall tumor stage. To further assess the risk model and verify the expression of the identified biomarkers, the COAD datasets GSE17536 (n=177) and GSE39582 (n=562) were obtained from the Gene Expression Omnibus (GEO) database. Additionally, the single-cell RNA sequencing (scRNA-seq) dataset GSE132257, with 10 colon rectal cancer samples, was downloaded to highlight the single-cell transcriptomic expression of prognostic features. The selection of NRGs was gathered from MSigDB (http://gsea-msigdb.org) (Table S1). The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.
Differential analysis and construction of the weighted gene co-expression network analysis (WGCNA) network
The “DESeq2” package (v.1.44.0) (16) was applied to identify differentially expressed genes (DEGs) between COAD samples and normal tissues in the TCGA-COAD cohort, using log2 fold change (FC) >1 and adjusted P<0.05 as the selection criteria. To further pinpoint module genes linked to NRGs in COAD, NRG scores for tumor and control samples in the TCGA-COAD dataset were computed using the single-sample gene set enrichment analysis (ssGSEA) method implemented in the “Gene Set Variation Analysis (GSVA)” package (v.1.52.3) (17), and the outcomes were visualized with the “ggstatsplot” package (18). Subsequently, to screen for module genes correlated with NRGs scores, we employed WGCNA (v.1.73) (19) to build the corresponding co-expression network. These samples in the TCGA-COAD dataset were subjected to clustering analysis to identify and exclude any outlier samples, thereby ensuring the accuracy of subsequent analyses. To balance the feasibility of the WGCNA analysis and focus on biologically informative signals, we excluded genes with low expression levels—mean expression across all TCGA-COAD samples ≤0.5 {log2 [transcripts per million (TPM) + 1]}—and then calculated the standard deviation (SD) of expression for each remaining gene. The top 35% most variable genes (n=5,116) were selected for WGCNA to ensure the expression matrix includes genes that are dynamically regulated across the heterogeneous COAD samples, which is central to co-expression network discovery as input data. We calculated the optimal soft-thresholding power, followed by computing gene adjacency and generating a dissimilarity matrix to perform hierarchical clustering, which produced the gene dendrogram. Subsequently, all samples were regarded as individual traits and were correlated with the WGCNA-derived modules using Pearson correlation to identify the module that showed the strongest association. The overlap between DEGs and the key module genes was then visualized with the “ggvenn” package (v.0.1.10), yielding the final set of differentially expressed NRGs (DE-NRGs).
Enrichment analysis
To gain deeper insight into the biological roles and signaling pathways linked to the DE-NRGs, we carried out enrichment analyses using the “clusterProfiler” package (v4.12.6) (20), incorporating Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, with statistical significance defined as P<0.05.
Construction and assessment of a risk model
To improve feature selection robustness and reduce overfitting, we used a multi-stage analysis approach to construct the risk model. Univariate Cox regression using the “survival” package (v.3.7.0) (21) was performed on DE-NRGs. Genes that met the permissive criteria of hazard ratio (HR) ≠1 and P<0.10 were categorized as initial candidate genes—a strategic decision to mitigate false-negative exclusion of biologically relevant features. All candidate genes then underwent proportional hazards (PH) assumption testing via Schoenfeld residuals. Genes that satisfied the PH assumption were subsequently analyzed using the least absolute shrinkage and selection operator (LASSO) method implemented in the “glmnet” package (v.4.1.8) (22) to address high dimensionality and mitigate overfitting in the candidate gene set. To address LASSO’s inherent coefficient shrinkage bias, verify the independent prognostic value of each gene, and balance predictive accuracy with model complexity, the LASSO-selected gene subset was further refined using bidirectional stepwise multivariate Cox regression (survival package v.3.7.0). This iterative process involves alternately applying forward selection (adding the most significant predictors at α=0.05) and backward elimination (removing the least significant variables at α=0.10), continuing until the Akaike Information Criterion (AIC) is minimized. The final prognostic model was subsequently constructed, and the corresponding risk score was computed according to the following equation:
Here, Xi indicates the normalized expression level of the i-th prognostic gene, and coefi refers to its corresponding Cox regression coefficient. Using the median value of the calculated risk score, patients were stratified into high- and low-risk categories. The receiver operating characteristic (ROC) curve illustrates model performance by plotting sensitivity (true-positive rate) on the y-axis against 1-specificity (false-positive rate) on the x-axis across various decision thresholds used for binary classification. The area under the ROC curve (AUC), which approaches 1 with increasing diagnostic accuracy, was computed to evaluate the model’s discriminative ability in terms of sensitivity and specificity. The model’s predictive accuracy in 1-, 3-, and 5-year was further evaluated by plotting ROC curves using the “survival ROC” package (v.1.0.3) (23). Patients were further divided into distinct expression subgroups according to the optimal cutoff derived from each prognostic gene’s expression level combined with survival outcomes. Kaplan-Meier (K-M) survival curves were plotted using the “survminer” package (v.0.5.0) in R (24). To determine whether the risk signature provides clinically independent prognostic value, we also performed univariate and multivariate Cox regression analyses in conjunction with the prognostic model and other clinicopathological parameters. Finally, the validity of the risk model was tested using the GSE17536 and GSE39582 datasets, and the HRs of the developed prognostic signature were also compared with those of seven published signatures.
Clinical characteristic analysis
To investigate how various clinical features relate to the calculated risk scores, COAD patients were grouped according to T stage, M stage, tumor stage, and other relevant clinical factors. Differences in risk scores across these subgroups were evaluated using the Wilcoxon test, and the results were illustrated with the “ggstatsplot” package.
Circos plotting and single cell sequencing analysis
Chromosomal locations of the prognostic signature genes were mapped using the human reference genome build hg38 (25). Visualization was performed with the “circlize” package (v.0.4.16) (26). Single-cell sequencing is a next-generation high-throughput approach that enables comprehensive transcriptomic profiling at the resolution of individual cells. This technique allows us to examine how prognostic genes are expressed across diverse cellular populations. At the beginning of the analysis, we conducted meticulous screening of the sequencing data (GSE132257) to ensure data quality. Subsequently, the “Seurat” package (v.5.1.0) was used to analyze the filtered dataset further (27), and cell samples were reasonably clustered using principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) dimensionality reduction techniques. To further resolve cell types, currently known cell markers were used for manual cell type annotation. These cell markers are detailed in Table S2.
GSEA
To explore the biological functions and signaling pathways linked to genes differing across the various risk groups, we selected the background gene sets “c2.cp.kegg_legacy.v2024.1.Hs.symbols.gmt” and “c5.go.v2024.1.Hs.symbols.gmt” from MSigDB. Differential expression analyses were performed to identify genes showing significant variation among the risk categories, and their log2 FC values were computed and ranked in descending order. These ranked differential genes were subsequently analyzed using GSEA in the “clusterProfiler” package, applying a significance threshold of P<0.05.
Tumor immunoassays
The influence of the prognostic genes on the TIME in COAD was assessed by applying ssGSEA to estimate the infiltration levels of 23 immune cell populations within tumor tissues. Immune cell differences between the high- and low-risk groups were compared using the Wilcoxon test, and associations between immune cell variations and prognostic characteristics were further examined. Moreover, disparities in the expression of 14 immune checkpoint molecules and 13 immune-related functional signatures were evaluated across the two risk groups. To predict potential differences in immunotherapy responsiveness, tumor immune dysfunction and exclusion (TIDE) scores and immunophenoscores (IPS) were also analyzed. Lastly, mutation profiles of TCGA-COAD samples were explored using the “maftools” package (v.2.20.0) (28) to identify gene mutations characteristic of each risk group.
Cluster analysis and GSVA
To elucidate the associations and distinctions among different cancer subtypes, tumor samples from TCGA-COAD were clustered using the “ConsensusClusterPlus” (v.1.70.0) (29), with characterization based on cumulative distribution function (CDF) curves. K-M survival curves were generated to validate the survival differences among the defined subtypes and to conduct differential survival comparisons. PCA was then applied to assess how distinctly these subtypes were separated. To further elucidate the biological mechanisms underlying the observed survival discrepancies, enrichment analyses were performed for subtypes showing significant survival variation. Using c2.cp.kegg_legacy.v2024.1.Hs.symbols.gmt as the reference gene set, scores for each pathway were computed across samples to identify gene sets that differed markedly between the subgroups.
Drug sensitivity analysis and establishment of molecular network
To assess how patients in different risk categories respond to commonly used antineoplastic agents, the half-maximal inhibitory concentrations (IC50) of 198 drugs were estimated for TCGA-COAD tumor samples using the “oncoPredict” package (v.1.2) (30). The training set is derived from the Genomics of Drug Sensitivity in Cancer (GDSC) database (https://www.cancerrxgene.org/). Subsequently, variations in IC50 values among the different risk groups were examined to uncover distinct patterns of drug sensitivity. Additionally, the regulatory role of prognostic features in COAD was investigated by constructing the transcription factor-messenger RNA-microRNA (TF-mRNA-miRNA) regulatory network using the NetworkAnalyst platform. These interactions, obtained from the RegNetwork database, were visualized with Cytoscape, providing deeper insights into the regulatory mechanisms affecting COAD.
Expression analysis of prognostic characteristics
The differential expression of the prognostic features between TCGA-COAD tumor tissues and normal controls was additionally verified using quantitative real-time polymerase chain reaction (qRT-PCR). A CRC cell line (Caco-2, human CRC cells) and a control cell line (NCM460, human colon mucosal epithelial cells) were used to assess the expression levels of genes that comprise the nicotine metabolism-related signature. All cell lines were cultured in Dulbecco’s modified Eagle’s medium (DMEM, Procell, PM150210, Wuhan, China) containing 10% fetal bovine serum (FBS, Procell, 164210), 1% Penicillin and 1% streptomycin (Procell, PB180120) under standard conditions (5% CO2, 37 ℃). Total RNA was extracted from the cells using FastPure Cell/Tissue Total RNA Isolation Kit V2 (Vazyme, RC112-01, Nanjing, China), and reverse transcription was performed using the HiScript III RT SuperMix for qPCR (+gDNA wiper) (Vazyme, R323-01). qRT-PCR was performed using a ChamQ Universal SYBR qPCR Master Mix (Vazyme, Q711-02) on LightCycler® 480 Instrument II (Roche, Basel, Switzerland). The primer sequences are listed in Table S3. The relative expression was computed using the 2−ΔΔCt approach, and GAPDH was used as the internal control gene.
Statistical analysis
All statistical analyses were conducted using R (v.4.4.1). Unless specified otherwise, differences between groups were considered statistically meaningful when the P value or adjusted P value was less than 0.05.
Data sources
In this study, high-quality datasets from various public databases were utilized to ensure the reliability and broad applicability of the results. A dataset comprising 471 COAD samples and 41 normal control tissues was obtained from The Cancer Genome Atlas (TCGA) project via the UCSC Xena platform (https://xenabrowser.net/datapages/). Of these samples, 438 COAD cases provided available survival data, along with detailed clinical features, including patient age, T, N, M classifications, and overall tumor stage. To further assess the risk model and verify the expression of the identified biomarkers, the COAD datasets GSE17536 (n=177) and GSE39582 (n=562) were obtained from the Gene Expression Omnibus (GEO) database. Additionally, the single-cell RNA sequencing (scRNA-seq) dataset GSE132257, with 10 colon rectal cancer samples, was downloaded to highlight the single-cell transcriptomic expression of prognostic features. The selection of NRGs was gathered from MSigDB (http://gsea-msigdb.org) (Table S1). The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.
Differential analysis and construction of the weighted gene co-expression network analysis (WGCNA) network
The “DESeq2” package (v.1.44.0) (16) was applied to identify differentially expressed genes (DEGs) between COAD samples and normal tissues in the TCGA-COAD cohort, using log2 fold change (FC) >1 and adjusted P<0.05 as the selection criteria. To further pinpoint module genes linked to NRGs in COAD, NRG scores for tumor and control samples in the TCGA-COAD dataset were computed using the single-sample gene set enrichment analysis (ssGSEA) method implemented in the “Gene Set Variation Analysis (GSVA)” package (v.1.52.3) (17), and the outcomes were visualized with the “ggstatsplot” package (18). Subsequently, to screen for module genes correlated with NRGs scores, we employed WGCNA (v.1.73) (19) to build the corresponding co-expression network. These samples in the TCGA-COAD dataset were subjected to clustering analysis to identify and exclude any outlier samples, thereby ensuring the accuracy of subsequent analyses. To balance the feasibility of the WGCNA analysis and focus on biologically informative signals, we excluded genes with low expression levels—mean expression across all TCGA-COAD samples ≤0.5 {log2 [transcripts per million (TPM) + 1]}—and then calculated the standard deviation (SD) of expression for each remaining gene. The top 35% most variable genes (n=5,116) were selected for WGCNA to ensure the expression matrix includes genes that are dynamically regulated across the heterogeneous COAD samples, which is central to co-expression network discovery as input data. We calculated the optimal soft-thresholding power, followed by computing gene adjacency and generating a dissimilarity matrix to perform hierarchical clustering, which produced the gene dendrogram. Subsequently, all samples were regarded as individual traits and were correlated with the WGCNA-derived modules using Pearson correlation to identify the module that showed the strongest association. The overlap between DEGs and the key module genes was then visualized with the “ggvenn” package (v.0.1.10), yielding the final set of differentially expressed NRGs (DE-NRGs).
Enrichment analysis
To gain deeper insight into the biological roles and signaling pathways linked to the DE-NRGs, we carried out enrichment analyses using the “clusterProfiler” package (v4.12.6) (20), incorporating Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, with statistical significance defined as P<0.05.
Construction and assessment of a risk model
To improve feature selection robustness and reduce overfitting, we used a multi-stage analysis approach to construct the risk model. Univariate Cox regression using the “survival” package (v.3.7.0) (21) was performed on DE-NRGs. Genes that met the permissive criteria of hazard ratio (HR) ≠1 and P<0.10 were categorized as initial candidate genes—a strategic decision to mitigate false-negative exclusion of biologically relevant features. All candidate genes then underwent proportional hazards (PH) assumption testing via Schoenfeld residuals. Genes that satisfied the PH assumption were subsequently analyzed using the least absolute shrinkage and selection operator (LASSO) method implemented in the “glmnet” package (v.4.1.8) (22) to address high dimensionality and mitigate overfitting in the candidate gene set. To address LASSO’s inherent coefficient shrinkage bias, verify the independent prognostic value of each gene, and balance predictive accuracy with model complexity, the LASSO-selected gene subset was further refined using bidirectional stepwise multivariate Cox regression (survival package v.3.7.0). This iterative process involves alternately applying forward selection (adding the most significant predictors at α=0.05) and backward elimination (removing the least significant variables at α=0.10), continuing until the Akaike Information Criterion (AIC) is minimized. The final prognostic model was subsequently constructed, and the corresponding risk score was computed according to the following equation:
Here, Xi indicates the normalized expression level of the i-th prognostic gene, and coefi refers to its corresponding Cox regression coefficient. Using the median value of the calculated risk score, patients were stratified into high- and low-risk categories. The receiver operating characteristic (ROC) curve illustrates model performance by plotting sensitivity (true-positive rate) on the y-axis against 1-specificity (false-positive rate) on the x-axis across various decision thresholds used for binary classification. The area under the ROC curve (AUC), which approaches 1 with increasing diagnostic accuracy, was computed to evaluate the model’s discriminative ability in terms of sensitivity and specificity. The model’s predictive accuracy in 1-, 3-, and 5-year was further evaluated by plotting ROC curves using the “survival ROC” package (v.1.0.3) (23). Patients were further divided into distinct expression subgroups according to the optimal cutoff derived from each prognostic gene’s expression level combined with survival outcomes. Kaplan-Meier (K-M) survival curves were plotted using the “survminer” package (v.0.5.0) in R (24). To determine whether the risk signature provides clinically independent prognostic value, we also performed univariate and multivariate Cox regression analyses in conjunction with the prognostic model and other clinicopathological parameters. Finally, the validity of the risk model was tested using the GSE17536 and GSE39582 datasets, and the HRs of the developed prognostic signature were also compared with those of seven published signatures.
Clinical characteristic analysis
To investigate how various clinical features relate to the calculated risk scores, COAD patients were grouped according to T stage, M stage, tumor stage, and other relevant clinical factors. Differences in risk scores across these subgroups were evaluated using the Wilcoxon test, and the results were illustrated with the “ggstatsplot” package.
Circos plotting and single cell sequencing analysis
Chromosomal locations of the prognostic signature genes were mapped using the human reference genome build hg38 (25). Visualization was performed with the “circlize” package (v.0.4.16) (26). Single-cell sequencing is a next-generation high-throughput approach that enables comprehensive transcriptomic profiling at the resolution of individual cells. This technique allows us to examine how prognostic genes are expressed across diverse cellular populations. At the beginning of the analysis, we conducted meticulous screening of the sequencing data (GSE132257) to ensure data quality. Subsequently, the “Seurat” package (v.5.1.0) was used to analyze the filtered dataset further (27), and cell samples were reasonably clustered using principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) dimensionality reduction techniques. To further resolve cell types, currently known cell markers were used for manual cell type annotation. These cell markers are detailed in Table S2.
GSEA
To explore the biological functions and signaling pathways linked to genes differing across the various risk groups, we selected the background gene sets “c2.cp.kegg_legacy.v2024.1.Hs.symbols.gmt” and “c5.go.v2024.1.Hs.symbols.gmt” from MSigDB. Differential expression analyses were performed to identify genes showing significant variation among the risk categories, and their log2 FC values were computed and ranked in descending order. These ranked differential genes were subsequently analyzed using GSEA in the “clusterProfiler” package, applying a significance threshold of P<0.05.
Tumor immunoassays
The influence of the prognostic genes on the TIME in COAD was assessed by applying ssGSEA to estimate the infiltration levels of 23 immune cell populations within tumor tissues. Immune cell differences between the high- and low-risk groups were compared using the Wilcoxon test, and associations between immune cell variations and prognostic characteristics were further examined. Moreover, disparities in the expression of 14 immune checkpoint molecules and 13 immune-related functional signatures were evaluated across the two risk groups. To predict potential differences in immunotherapy responsiveness, tumor immune dysfunction and exclusion (TIDE) scores and immunophenoscores (IPS) were also analyzed. Lastly, mutation profiles of TCGA-COAD samples were explored using the “maftools” package (v.2.20.0) (28) to identify gene mutations characteristic of each risk group.
Cluster analysis and GSVA
To elucidate the associations and distinctions among different cancer subtypes, tumor samples from TCGA-COAD were clustered using the “ConsensusClusterPlus” (v.1.70.0) (29), with characterization based on cumulative distribution function (CDF) curves. K-M survival curves were generated to validate the survival differences among the defined subtypes and to conduct differential survival comparisons. PCA was then applied to assess how distinctly these subtypes were separated. To further elucidate the biological mechanisms underlying the observed survival discrepancies, enrichment analyses were performed for subtypes showing significant survival variation. Using c2.cp.kegg_legacy.v2024.1.Hs.symbols.gmt as the reference gene set, scores for each pathway were computed across samples to identify gene sets that differed markedly between the subgroups.
Drug sensitivity analysis and establishment of molecular network
To assess how patients in different risk categories respond to commonly used antineoplastic agents, the half-maximal inhibitory concentrations (IC50) of 198 drugs were estimated for TCGA-COAD tumor samples using the “oncoPredict” package (v.1.2) (30). The training set is derived from the Genomics of Drug Sensitivity in Cancer (GDSC) database (https://www.cancerrxgene.org/). Subsequently, variations in IC50 values among the different risk groups were examined to uncover distinct patterns of drug sensitivity. Additionally, the regulatory role of prognostic features in COAD was investigated by constructing the transcription factor-messenger RNA-microRNA (TF-mRNA-miRNA) regulatory network using the NetworkAnalyst platform. These interactions, obtained from the RegNetwork database, were visualized with Cytoscape, providing deeper insights into the regulatory mechanisms affecting COAD.
Expression analysis of prognostic characteristics
The differential expression of the prognostic features between TCGA-COAD tumor tissues and normal controls was additionally verified using quantitative real-time polymerase chain reaction (qRT-PCR). A CRC cell line (Caco-2, human CRC cells) and a control cell line (NCM460, human colon mucosal epithelial cells) were used to assess the expression levels of genes that comprise the nicotine metabolism-related signature. All cell lines were cultured in Dulbecco’s modified Eagle’s medium (DMEM, Procell, PM150210, Wuhan, China) containing 10% fetal bovine serum (FBS, Procell, 164210), 1% Penicillin and 1% streptomycin (Procell, PB180120) under standard conditions (5% CO2, 37 ℃). Total RNA was extracted from the cells using FastPure Cell/Tissue Total RNA Isolation Kit V2 (Vazyme, RC112-01, Nanjing, China), and reverse transcription was performed using the HiScript III RT SuperMix for qPCR (+gDNA wiper) (Vazyme, R323-01). qRT-PCR was performed using a ChamQ Universal SYBR qPCR Master Mix (Vazyme, Q711-02) on LightCycler® 480 Instrument II (Roche, Basel, Switzerland). The primer sequences are listed in Table S3. The relative expression was computed using the 2−ΔΔCt approach, and GAPDH was used as the internal control gene.
Statistical analysis
All statistical analyses were conducted using R (v.4.4.1). Unless specified otherwise, differences between groups were considered statistically meaningful when the P value or adjusted P value was less than 0.05.
Results
Results
Identification of DE-NRGs in the TCGA-COAD dataset
Differential expression analysis of the TCGA-COAD dataset identified 5,334 DEGs, including 2,492 downregulated and 2,842 upregulated genes. A volcano plot illustrated the overall DEG distribution, whereas a heatmap highlighted the top 20 most prominently up- and downregulated genes between normal controls and COAD tissues (Figure 1A,1B). Comparison of NRG scores between COAD samples and healthy controls using the Wilcoxon test demonstrated a significant difference (P<0.001; Figure 2A). To further pinpoint genes strongly associated with NRG patterns, a WGCNA network was constructed using the calculated NRG scores. Clustering of all TCGA-COAD samples was initially performed, with outliers excluded. The optimal soft thresholds (β) for constructing the scale-free network were determined by selecting those with an R2 value close to 0.85 and average connectivity near zero (Figure 2B). Accordingly, the optimal soft-thresholding power was identified as 4. With the minimum module size set to 100 genes and the mergeCutHeight parameter defined as 0.2 (Figure 2C), a total of 7 modules were generated (Figure 2D). Among these modules, the MEturquoise cluster exhibited the highest correlation (cor) with the NRG score (cor =0.52, P<0.001) (Figure 2E). This module was identified as the key module, containing 1,443 genes. Finally, 767 DE-NRGs were identified as candidate genes by overlapping 5,334 DEGs with 1,453 NRG-related genes (Figure 2F).
Functional enrichment of candidate genes
GO analysis indicated that the 767 candidate genes were primarily upregulated in the ECM organization, endoplasmic reticulum lumen process, and ECM structural constituent, and downregulated in the muscle system process, and G protein-coupled receptor binding (Figure 3A-3C). KEGG pathway analysis further revealed upregulation in protein digestion and absorption, as well as cytokine-cytokine receptor interaction, and downregulation in the cGMP-PKG signaling pathway and the PI3K-Akt signaling pathway (Figure 3D).
Construction of a risk model based on nine prognostic characteristics
To screen for prognostic biomarkers in COAD, we conducted a series of regression analyses on the 767 candidate genes. Univariate Cox regression first identified 124 genes associated with patient survival, all of which met the PH assumption (Table S4). Next, 27 prognostic markers were selected through 10-fold cross-validation using the optimal model parameter λ = 0.02705889 (Figure 4A, Table S5). Finally, nine prognostic features were confirmed through bidirectional stepwise multivariate Cox regression to develop a risk model (Figure 4B). The risk score was calculated as follows: RiskScore = FOXC1 × (0.195) + TRIP6 × (0.207) + NRCAM × (0.248) + TIMP1 × (0.426) + TSPAN11 × (−1.076) + STC2 × (0.187) + CST2 × (0.178) + SIX2 × (0.258) + GPRASP1 × (0.517). Patients from the TCGA-COAD dataset were then stratified into risk groups based on the median score. In addition, patients were stratified based on the expression levels of each prognostic gene together with their associated survival outcomes. K-M survival curves showed the direction of survival differences between high- and low-expression subgroups of prognostic features, which aligned with the signs of the corresponding coefficients in the risk model (Figure 4C).
Excellent predictive ability of the COAD risk model and correlation with clinical characteristics
Univariate and multivariate Cox regression analyses, which included the prognostic model and other clinicopathological parameters, verified that our model provides independent prognostic value (Figure S1). To further verify the robustness of the model, we performed survival analyses and ROC evaluations in the TCGA-COAD, GSE17536, and GSE39582 cohorts. The initial visualizations of survival distributions showed that risk scores gradually increased from left to right within each dataset (the left of Figure 5A-5C). The expression patterns of the prognostic genes across the different risk categories were also displayed for the three cohorts (the left of Figure 5A-5C). K-M survival curves revealed significant stratification, consistently indicating that individuals classified into the high-risk group exhibited substantially poorer survival outcomes (the middle of Figure 5A-5C). Moreover, consistent with Figure 5A-5C, the AUC values for predicting 1-, 3-, and 5-year survival exceeded 0.74 in TCGA-COAD, were greater than 0.66 in GSE17536, and surpassed 0.60 in GSE39582, demonstrating stable prognostic performance across cohorts (the right of Figure 5A-5C). Time-dependent ROC curves also proved that the predictive ability of the risk model is superior to a single clinicopathological factor (Figure S2A-S2C). It has been reported that several prognostic signatures have been developed to predict the prognosis of COAD patients. We collected seven published prognostic signatures for COAD and calculated the HR values. Interestingly, our model demonstrated a significant superiority compared to other published signatures (Figure S2D). All these results collectively underscore the model’s strong predictive capability for COAD. In the subsequent analysis examining associations between the risk score and clinical features, patients with T3/T4 disease, M1 status, or stage III/IV tumors exhibited significantly higher risk scores (Figure 5D).
Analysis of prognostic gene genomic localization and expression level in single-cell
The chromosomal distribution of the nine prognostic characteristics was depicted in the circos plot (Figure 6A). Single-cell RNA-sequencing data from CRC patients (GSE132257) were retrieved from the GEO database. We first examined the nFeature and nCount distributions for each sample and determined the proportion of mitochondrial gene expression per cell. Using these quality metrics, cells were filtered according to the criteria 200≤ nFeature_RNA ≤5,000, 200≤ nCount_RNA ≤50,000, and percent_MT <20%, ensuring high-quality data for downstream analyses (Figure S3). Next, we identified 2,000 highly variable genes showing substantial variation across cells, which were used for PCA and subsequent t-SNE dimensionality reduction. Clustering based on these features successfully partitioned the dataset into 29 distinct cellular clusters (Figure 6B). Subsequently, we systematically annotated tumor cell identities in each cluster using established cell marker expression profiles (Figure 6C). The analysis revealed that the cells could be clearly assigned to nine major cell types, comprising T cells, B cells, fibroblasts, plasma cells, epithelial cells, myeloid cells, mast cells, endothelial cells, and smooth muscle cells (Figure 6D). Finally, we analyzed the expression levels of 9 prognostic genes at the single-cell level (Figure 6E). By examining these single-cell expression profiles, a more detailed and accurate comprehension of the expression patterns of target genes across different cellular components (CCs) was achieved.
Prognostic characteristics are involved in complex functions and pathways
To clarify the biological pathways and functional processes linked to the prognostic genes, GSEA was conducted. The CC category of the GO enrichment analysis indicated that these genes were predominantly associated with cellular structures such as the basement membrane, endoplasmic reticulum lumen, collagen-containing ECM, immunoglobulin complex, chromosome centromeric core domain, and nucleosome. The biological process (BP) category of the GO analysis showed that these genes were enriched in biological pathways associated with the organization of external encapsulating structures, intermediate filament-based processes, collagen fibril organization, sensory perception of chemical stimuli, sensory perception of smell, and nucleosome organization. The molecular function (MF) component of the GO analysis suggested that these genes were involved in heparin binding, ECM structural constituent, ECM binding, taste receptor activity, olfactory receptor activity, and structural constituent of chromatin (Figure 7A-7C). In addition, KEGG pathway enrichment revealed that these genes were associated with the TGF-β signaling pathway (Figure 7D). Together, these findings indicate that these genes contribute substantially to COAD progression by participating in a wide range of biological functions and signaling pathways.
Certain correlations of risk score with TIME
The involvement of the prognostic genes in shaping the TIME of COAD was further explored. Clear disparities in immune cell infiltration were detected between the two risk categories, with 9 immune cell types displaying significantly elevated abundance in the high-risk group (Figure 8A). Moreover, these differentially enriched immune cells were positively associated with each other, and risk scores demonstrated prominent correlations with myeloid-derived suppressor cells (MDSCs), macrophages, natural killer T cells, natural killer cells, regulatory T cells, T follicular helper cells, and type 1 T helper cells (Figure 8B). Furthermore, 14 immune checkpoints displayed significant differences between risk groups (Figure 8C), with notable positive correlations between risk scores and CD276, TNFSF4, NRP1, and VTCN1 (Figure 8D). Notably, there is also a significant positive correlation between risk scores and three types of immune functions that show significant differences between the high- and low-risk groups (Figure 8E,8F). The high-risk group displayed significantly elevated TIDE and exclusion scores, whereas no meaningful difference was observed in dysfunction scores (Figure 8G). In addition, IPS confirmed that the immunotherapy response in the low-risk group is significantly better than in the high-risk group (Figure 8H).
Genetic mutation landscape of high- and low-risk groups
Mutation profiles across different risk groups were analyzed, showing that KRAS (50%) and PIK3CA (39%) had higher mutation frequency in the high-risk group, while BRAF (16%) was more frequent in the low-risk group. Across all groups, missense mutations represented the most frequently occurring mutation type (Figure 9A,9B). Among the prognostic genes, GPRASP1 exhibited the highest overall mutation frequency, and NRCAM matched that frequency in the high-risk group (Figure 9C,9D).
DEGs between subtypes are associated with multiple pathways
Cluster analysis was performed on TCGA-COAD tumor samples (Figure 10A), and the CDF curves identified two distinct molecular subtypes, designated Cluster A and Cluster B (Figure 10B,10C). The expression patterns of the prognostic genes within these subtypes are illustrated in Figure 10D. K-M survival analysis indicated a significant divergence in overall survival between the two clusters (P=0.007) (Figure 10E). Furthermore, PCA demonstrated clear separation between Clusters A and B (Figure 10F), supporting the suitability of these subtypes for subsequent analyses. The exploration of pathways linked to differential gene expression among the subtypes identified several significant findings. Cluster B, associated with a worse prognosis, exhibited DEGs when compared to Cluster A. These genes were significantly involved in neuroactive ligand-receptor interaction, PI3K-Akt, MAPK, cAMP, calcium, and Ras signaling pathways. Also, pathways such as human papillomavirus infection, cytokine-cytokine receptor interaction, and hormone signaling were notably enriched (Figure 10G).
Prognostic characteristics play a vital role in COAD
Drug sensitivity analysis across the risk groups identified significant response differences for 56 of the 198 evaluated agents (Figure 11A). Specifically, high-risk patients showed the greatest sensitivity to IGF1R and AZD1332, whereas low-risk individuals were more responsive to AZD5991 and ABT737 (Figure 11B). Moreover, to investigate regulatory mechanisms operating in COAD, a TF-mRNA-miRNA network was constructed to predict transcription factors and miRNAs associated with the prognostic genes. This analysis uncovered several complex regulatory axes, such as ATF1-FOXC1-has-miR-921 and MZF1-NRCAM-has-miR-517c (Figure 11C).
Validation of the expression of prognostic characteristics
All nine prognostic characteristics were prominently expressed in TCGA-COAD. Except for TSPAN11 and GPRASP1, which exhibited significantly higher expression levels in the normal group, the other seven genes had markedly higher levels in the tumor group (Figure 12A). Additionally, qRT-PCR analysis confirmed the same expression pattern of prognostic characteristics (Figure 12B).
Identification of DE-NRGs in the TCGA-COAD dataset
Differential expression analysis of the TCGA-COAD dataset identified 5,334 DEGs, including 2,492 downregulated and 2,842 upregulated genes. A volcano plot illustrated the overall DEG distribution, whereas a heatmap highlighted the top 20 most prominently up- and downregulated genes between normal controls and COAD tissues (Figure 1A,1B). Comparison of NRG scores between COAD samples and healthy controls using the Wilcoxon test demonstrated a significant difference (P<0.001; Figure 2A). To further pinpoint genes strongly associated with NRG patterns, a WGCNA network was constructed using the calculated NRG scores. Clustering of all TCGA-COAD samples was initially performed, with outliers excluded. The optimal soft thresholds (β) for constructing the scale-free network were determined by selecting those with an R2 value close to 0.85 and average connectivity near zero (Figure 2B). Accordingly, the optimal soft-thresholding power was identified as 4. With the minimum module size set to 100 genes and the mergeCutHeight parameter defined as 0.2 (Figure 2C), a total of 7 modules were generated (Figure 2D). Among these modules, the MEturquoise cluster exhibited the highest correlation (cor) with the NRG score (cor =0.52, P<0.001) (Figure 2E). This module was identified as the key module, containing 1,443 genes. Finally, 767 DE-NRGs were identified as candidate genes by overlapping 5,334 DEGs with 1,453 NRG-related genes (Figure 2F).
Functional enrichment of candidate genes
GO analysis indicated that the 767 candidate genes were primarily upregulated in the ECM organization, endoplasmic reticulum lumen process, and ECM structural constituent, and downregulated in the muscle system process, and G protein-coupled receptor binding (Figure 3A-3C). KEGG pathway analysis further revealed upregulation in protein digestion and absorption, as well as cytokine-cytokine receptor interaction, and downregulation in the cGMP-PKG signaling pathway and the PI3K-Akt signaling pathway (Figure 3D).
Construction of a risk model based on nine prognostic characteristics
To screen for prognostic biomarkers in COAD, we conducted a series of regression analyses on the 767 candidate genes. Univariate Cox regression first identified 124 genes associated with patient survival, all of which met the PH assumption (Table S4). Next, 27 prognostic markers were selected through 10-fold cross-validation using the optimal model parameter λ = 0.02705889 (Figure 4A, Table S5). Finally, nine prognostic features were confirmed through bidirectional stepwise multivariate Cox regression to develop a risk model (Figure 4B). The risk score was calculated as follows: RiskScore = FOXC1 × (0.195) + TRIP6 × (0.207) + NRCAM × (0.248) + TIMP1 × (0.426) + TSPAN11 × (−1.076) + STC2 × (0.187) + CST2 × (0.178) + SIX2 × (0.258) + GPRASP1 × (0.517). Patients from the TCGA-COAD dataset were then stratified into risk groups based on the median score. In addition, patients were stratified based on the expression levels of each prognostic gene together with their associated survival outcomes. K-M survival curves showed the direction of survival differences between high- and low-expression subgroups of prognostic features, which aligned with the signs of the corresponding coefficients in the risk model (Figure 4C).
Excellent predictive ability of the COAD risk model and correlation with clinical characteristics
Univariate and multivariate Cox regression analyses, which included the prognostic model and other clinicopathological parameters, verified that our model provides independent prognostic value (Figure S1). To further verify the robustness of the model, we performed survival analyses and ROC evaluations in the TCGA-COAD, GSE17536, and GSE39582 cohorts. The initial visualizations of survival distributions showed that risk scores gradually increased from left to right within each dataset (the left of Figure 5A-5C). The expression patterns of the prognostic genes across the different risk categories were also displayed for the three cohorts (the left of Figure 5A-5C). K-M survival curves revealed significant stratification, consistently indicating that individuals classified into the high-risk group exhibited substantially poorer survival outcomes (the middle of Figure 5A-5C). Moreover, consistent with Figure 5A-5C, the AUC values for predicting 1-, 3-, and 5-year survival exceeded 0.74 in TCGA-COAD, were greater than 0.66 in GSE17536, and surpassed 0.60 in GSE39582, demonstrating stable prognostic performance across cohorts (the right of Figure 5A-5C). Time-dependent ROC curves also proved that the predictive ability of the risk model is superior to a single clinicopathological factor (Figure S2A-S2C). It has been reported that several prognostic signatures have been developed to predict the prognosis of COAD patients. We collected seven published prognostic signatures for COAD and calculated the HR values. Interestingly, our model demonstrated a significant superiority compared to other published signatures (Figure S2D). All these results collectively underscore the model’s strong predictive capability for COAD. In the subsequent analysis examining associations between the risk score and clinical features, patients with T3/T4 disease, M1 status, or stage III/IV tumors exhibited significantly higher risk scores (Figure 5D).
Analysis of prognostic gene genomic localization and expression level in single-cell
The chromosomal distribution of the nine prognostic characteristics was depicted in the circos plot (Figure 6A). Single-cell RNA-sequencing data from CRC patients (GSE132257) were retrieved from the GEO database. We first examined the nFeature and nCount distributions for each sample and determined the proportion of mitochondrial gene expression per cell. Using these quality metrics, cells were filtered according to the criteria 200≤ nFeature_RNA ≤5,000, 200≤ nCount_RNA ≤50,000, and percent_MT <20%, ensuring high-quality data for downstream analyses (Figure S3). Next, we identified 2,000 highly variable genes showing substantial variation across cells, which were used for PCA and subsequent t-SNE dimensionality reduction. Clustering based on these features successfully partitioned the dataset into 29 distinct cellular clusters (Figure 6B). Subsequently, we systematically annotated tumor cell identities in each cluster using established cell marker expression profiles (Figure 6C). The analysis revealed that the cells could be clearly assigned to nine major cell types, comprising T cells, B cells, fibroblasts, plasma cells, epithelial cells, myeloid cells, mast cells, endothelial cells, and smooth muscle cells (Figure 6D). Finally, we analyzed the expression levels of 9 prognostic genes at the single-cell level (Figure 6E). By examining these single-cell expression profiles, a more detailed and accurate comprehension of the expression patterns of target genes across different cellular components (CCs) was achieved.
Prognostic characteristics are involved in complex functions and pathways
To clarify the biological pathways and functional processes linked to the prognostic genes, GSEA was conducted. The CC category of the GO enrichment analysis indicated that these genes were predominantly associated with cellular structures such as the basement membrane, endoplasmic reticulum lumen, collagen-containing ECM, immunoglobulin complex, chromosome centromeric core domain, and nucleosome. The biological process (BP) category of the GO analysis showed that these genes were enriched in biological pathways associated with the organization of external encapsulating structures, intermediate filament-based processes, collagen fibril organization, sensory perception of chemical stimuli, sensory perception of smell, and nucleosome organization. The molecular function (MF) component of the GO analysis suggested that these genes were involved in heparin binding, ECM structural constituent, ECM binding, taste receptor activity, olfactory receptor activity, and structural constituent of chromatin (Figure 7A-7C). In addition, KEGG pathway enrichment revealed that these genes were associated with the TGF-β signaling pathway (Figure 7D). Together, these findings indicate that these genes contribute substantially to COAD progression by participating in a wide range of biological functions and signaling pathways.
Certain correlations of risk score with TIME
The involvement of the prognostic genes in shaping the TIME of COAD was further explored. Clear disparities in immune cell infiltration were detected between the two risk categories, with 9 immune cell types displaying significantly elevated abundance in the high-risk group (Figure 8A). Moreover, these differentially enriched immune cells were positively associated with each other, and risk scores demonstrated prominent correlations with myeloid-derived suppressor cells (MDSCs), macrophages, natural killer T cells, natural killer cells, regulatory T cells, T follicular helper cells, and type 1 T helper cells (Figure 8B). Furthermore, 14 immune checkpoints displayed significant differences between risk groups (Figure 8C), with notable positive correlations between risk scores and CD276, TNFSF4, NRP1, and VTCN1 (Figure 8D). Notably, there is also a significant positive correlation between risk scores and three types of immune functions that show significant differences between the high- and low-risk groups (Figure 8E,8F). The high-risk group displayed significantly elevated TIDE and exclusion scores, whereas no meaningful difference was observed in dysfunction scores (Figure 8G). In addition, IPS confirmed that the immunotherapy response in the low-risk group is significantly better than in the high-risk group (Figure 8H).
Genetic mutation landscape of high- and low-risk groups
Mutation profiles across different risk groups were analyzed, showing that KRAS (50%) and PIK3CA (39%) had higher mutation frequency in the high-risk group, while BRAF (16%) was more frequent in the low-risk group. Across all groups, missense mutations represented the most frequently occurring mutation type (Figure 9A,9B). Among the prognostic genes, GPRASP1 exhibited the highest overall mutation frequency, and NRCAM matched that frequency in the high-risk group (Figure 9C,9D).
DEGs between subtypes are associated with multiple pathways
Cluster analysis was performed on TCGA-COAD tumor samples (Figure 10A), and the CDF curves identified two distinct molecular subtypes, designated Cluster A and Cluster B (Figure 10B,10C). The expression patterns of the prognostic genes within these subtypes are illustrated in Figure 10D. K-M survival analysis indicated a significant divergence in overall survival between the two clusters (P=0.007) (Figure 10E). Furthermore, PCA demonstrated clear separation between Clusters A and B (Figure 10F), supporting the suitability of these subtypes for subsequent analyses. The exploration of pathways linked to differential gene expression among the subtypes identified several significant findings. Cluster B, associated with a worse prognosis, exhibited DEGs when compared to Cluster A. These genes were significantly involved in neuroactive ligand-receptor interaction, PI3K-Akt, MAPK, cAMP, calcium, and Ras signaling pathways. Also, pathways such as human papillomavirus infection, cytokine-cytokine receptor interaction, and hormone signaling were notably enriched (Figure 10G).
Prognostic characteristics play a vital role in COAD
Drug sensitivity analysis across the risk groups identified significant response differences for 56 of the 198 evaluated agents (Figure 11A). Specifically, high-risk patients showed the greatest sensitivity to IGF1R and AZD1332, whereas low-risk individuals were more responsive to AZD5991 and ABT737 (Figure 11B). Moreover, to investigate regulatory mechanisms operating in COAD, a TF-mRNA-miRNA network was constructed to predict transcription factors and miRNAs associated with the prognostic genes. This analysis uncovered several complex regulatory axes, such as ATF1-FOXC1-has-miR-921 and MZF1-NRCAM-has-miR-517c (Figure 11C).
Validation of the expression of prognostic characteristics
All nine prognostic characteristics were prominently expressed in TCGA-COAD. Except for TSPAN11 and GPRASP1, which exhibited significantly higher expression levels in the normal group, the other seven genes had markedly higher levels in the tumor group (Figure 12A). Additionally, qRT-PCR analysis confirmed the same expression pattern of prognostic characteristics (Figure 12B).
Discussion
Discussion
The present study establishes a nine‑gene nicotine‑metabolism‑related risk signature (FOXC1, TRIP6, NRCAM, TIMP1, TSPAN11, STC2, CST2, SIX2, and GPRASP1) that stratifies COAD patients into high‑ and low‑risk subgroups. We show that the risk score is an independent predictor of overall survival in multiple cohorts and outperforms clinicopathological variables. High‑risk patients exhibit increased gene expression associated with nicotine metabolism and a suppressed immune microenvironment, whereas low‑risk patients display immune activation and better prognoses.
Our work supports the concept that cigarette smoking promotes colon cancer progression by modulating metabolic and immune pathways. Nicotine enhances colon cancer cell migration through the α7‑nicotinic acetylcholine receptor and fibronectin/COX2 signaling (31), and epidemiologic studies show that smoking increases left‑sided colon cancer risk by 39% in men and rightsided cancer by 20% in women (5). Genetic polymorphisms in CYP2A6 alter nicotine metabolism, influence smoking behavior, and modulate aspirin chemoprevention of colorectal neoplasia (14). These observations suggest that nicotine metabolism genes may serve as biomarkers. Our nine‑gene signature includes FOXC1 and TRIP6, which have emerging roles in tumor metabolism. FOXC1 upregulates serine synthesis enzymes (PHGDH, PSAT1, and PSPH), promoting CRC growth and 5‑fluorouracil resistance via the ERK1/2‑pELK1 axis (32). TRIP6 is overexpressed in CRC, correlates with advanced stage, poor survival, and glycolysis, and modulates immune infiltration by increasing immature dendritic cells and reducing Th1 cells (33). High expression of TIMP1 and MMP‑8 is linked to poor prognosis (34). Although NRCAM (NCAM180) loss promotes cell detachment and metastasis (35), NRCAM’s relatively high expression is also significantly associated with lymph node and distant metastasis in colon cancer (36). STC2 overexpression predicts poor overall survival and correlates with immune checkpoint genes (37), and SIX2 is implicated in epithelial-mesenchymal transition (EMT) and metastasis (38), whereas CST1 (cystatin SN) is upregulated in CRC and independently predicts poor outcomes (39). Our signature therefore integrates genes with established roles in metabolism, ECM remodeling, and cell adhesion, linking nicotine metabolism to multiple oncogenic pathways. Although the nine genes in our risk signature are not canonical nicotine-metabolizing enzymes, they were enriched in modules associated with nicotine metabolism scores and may represent downstream effectors regulated by nicotine exposure. FOXC1 and SIX2 are transcription factors that drive metabolic reprogramming and EMT (32), and their expression can be upregulated by nicotine-triggered α7-nAChR/COX-2 and TGF-β pathways (40). TRIP6 and TIMP1 are involved in cytoskeletal rearrangement and ECM remodeling, processes enhanced by tobacco carcinogen-induced signaling (33,41). NRCAM and TSPAN11 are adhesion molecules whose expression correlates with nicotine-induced EGF signaling and cell migration (40). STC2, CST2, and GPRASP1 regulate calcium homeostasis, protease inhibition, and G-protein-coupled receptor (GPCR) trafficking, respectively (42-44); evidence suggests nicotine exposure can modulate these proteins via endoplasmic reticulum stress or dopaminergic signaling (45). Therefore, our signature captures genes that are responsive to nicotine metabolism-related pathways rather than enzymes that metabolize nicotine directly.
Notably, our functional enrichment analysis revealed that DEGs in the high‑risk group were predominantly enriched in ECM organization and transforming growth factor beta (TGFβ) signaling. ECM remodeling drives colorectal carcinogenesis: type I collagen, MMP-2, and MMP-9 increase with stage, while type IV collagen and TIMP-3 decrease (9). High circulating MMP-8 and TIMP-1 levels predict poor outcomes (34). Our finding that TIMP1 is a risk gene aligns with these observations. The risk score also correlated with KRAS and PIK3CA mutations, as well as low BRAF mutation rates. A population‑based study found KRAS mutations in 41% and PIK3CA mutations in 13.2% of CRCs and identified these mutations as independent predictors of poor survival (46). Emerging targeted therapies, such as KRAS G12C inhibitors, show limited single‑agent efficacy [objective response rate (ORR) 19.2%] but improved responses when combined with EGFR inhibition [ORR 45%, progressionfree survival (PFS) 7.5 months] (47). Similarly, dual BRAF/EGFR inhibition yields modest benefit in BRAFV600E-mutant CRC (ORR 16%, median survival 6.3 months) (48). Our model identifies patients with high mutational burden who might benefit from such combinations.
Our immune infiltration analysis reveals a complex tumor microenvironment that differs markedly between risk groups. High-risk tumors exhibit significantly higher infiltration of MDSCs, macrophages, and regulatory T cells. Specifically, MDSCs suppress anti‑tumor immunity by depleting l‑arginine and producing nitric oxide, blocking T‑cell proliferation and promoting Treg development. They also secrete TGF‑β and vascular endothelial growth factor (VEGF) to induce epithelial‑tomesenchymal transition and angiogenesis (49). Similarly, tumorassociated macrophages often polarize toward an M2 phenotype and secrete interleukin-10 (IL-10) and VEGF, supporting angiogenesis and immunosuppression (50). Notably, the accumulation of M2 macrophages increases as tumors progress (51). The enrichment of immunosuppressive myeloid cells and regulatory T cells in the high‑risk group, therefore, points to a hostile microenvironment for effector lymphocytes. Complementing this, Immune‑checkpoint analysis further delineates an immunosuppressive milieu. High‑risk tumors upregulate neuropilin‑1 (NRP1), V‑set domain‑containing T‑cell activation inhibitor 1 (VTCN1/B7‑H4) and CD276 (B7-H3) while downregulating costimulatory receptors including CD160, CD244, CD27, CD28 and CD48. B7-H3 modulates macrophage plasticity by inducing receptor-mediated M2 polarization and driving M1-to-M2 transition (52), while NRP1 and B7H4 deliver inhibitory signals to T cells (53,54). In contrast, CD27 and CD28 provide essential co‑stimulation for Tcell activation (55,56). The positive correlation between the risk score and these immunoinhibitory checkpoints indicates that high‑risk tumors exploit multiple pathways of immune evasion. Supporting this, the TIDE algorithm shows that high‑risk tumors have higher TIDE and exclusion scores, suggesting greater immune evasion driven by Tcell exclusion. Consequently, this pattern suggests these tumors are less likely to benefit from immune checkpoint blockade (57). Consistently, predicted responses to programmed death 1 (PD-1) or CTLA‑4 blockade were lower in the high‑risk group. Furthermore, these immune signatures mirror broader metabolic reprogramming in CRC. Tumor cells switch to aerobic glycolysis, glutamine and lipid metabolism, producing lactate that acidifies the microenvironment and suppresses Tcell proliferation while promoting M2 macrophage infiltration (58,59). Consistent with our observation that metabolic reprogramming shapes the tumor immune milieu, Chen et al. reported that a lipid-metabolism-related gene signature reveals dynamic shifts in immune infiltration across the colorectal adenoma-carcinoma sequence and noted that high-fat diets—a driver of lipid metabolic dysregulation—are major environmental risk factors for CRC and are associated with nearly 80% of cases (60). Thus, nicotine‑metabolism dysregulation captured by our signature may interact with metabolic and inflammatory pathways to shape the tumor immune landscape. In summary, the high-risk group identified by our nine-gene signature is characterized by a “hot but suppressed” tumor microenvironment enriched with MDSCs, M2‑like macrophages, and regulatory T cells, coupled with the upregulation of inhibitory checkpoints. These findings have therapeutic implications: targeting MDSCs or reprogramming tumor-associated macrophages (TAMs) toward an M1 phenotype, combined with metabolic modulators, may synergize with immune checkpoint blockade. Our risk signature may therefore help stratify patients for combined metabolic and immunotherapeutic strategies.
Despite the strengths of combining nicotine metabolism with immune and mutational landscapes, there are limitations in this study. It primarily used bioinformatics approaches to investigate the potential molecular mechanisms of nicotine-metabolism-related genes in COAD. However, it is important to recognize the constraints imposed by the limited sample size and the methodological restrictions inherent to computational analyses. Moreover, the mechanistic exploration was based solely on gene-expression data, which may not fully capture actual BPs. Consequently, additional validation through animal experiments and clinical studies is required to substantiate these findings under diverse experimental conditions. Furthermore, the clinical applicability of CRC-specific tumor markers requires confirmation with larger sample sizes, and the application of therapies for stratified patients requires additional clinical validation. Follow-up experiments are planned to strengthen the theoretical support for these findings.
For future research, we should investigate mechanistic connections between nicotine metabolism and metabolic reprogramming. FOXC1 upregulation of serine synthesis suggests that dietary serine restriction combined with FOXC1 or ERK inhibitors may improve chemotherapy (32). TRIP6’s role in glycolysis and angiogenesis provides a rationale for targeting TRIP6 to modulate metabolic pathways and immune infiltration (33). STC2 and CST1 are associated with immune checkpoints and mismatch repair deficiency (37,39); they could serve as biomarkers for immunotherapy. Clinical trials exploring combination therapies, such as KRAS G12C inhibitors with EGFR or BRAF inhibitors (47,48), and immunotherapy plus metabolic modulators can stratify patients based on our risk signature. Moreover, experimental studies should delineate how nicotine metabolism influences ECM remodeling and TGF‑β signaling. As TAMs switch from M1 to M2 phenotypes, interventions that repolarize macrophages or block IL‑10/VEGF signaling may benefit highrisk patients (50). Integrating multi‑omics data (genomics, transcriptomics, metabolomics, and proteomics) will refine risk models and reveal therapeutic vulnerabilities.
The present study establishes a nine‑gene nicotine‑metabolism‑related risk signature (FOXC1, TRIP6, NRCAM, TIMP1, TSPAN11, STC2, CST2, SIX2, and GPRASP1) that stratifies COAD patients into high‑ and low‑risk subgroups. We show that the risk score is an independent predictor of overall survival in multiple cohorts and outperforms clinicopathological variables. High‑risk patients exhibit increased gene expression associated with nicotine metabolism and a suppressed immune microenvironment, whereas low‑risk patients display immune activation and better prognoses.
Our work supports the concept that cigarette smoking promotes colon cancer progression by modulating metabolic and immune pathways. Nicotine enhances colon cancer cell migration through the α7‑nicotinic acetylcholine receptor and fibronectin/COX2 signaling (31), and epidemiologic studies show that smoking increases left‑sided colon cancer risk by 39% in men and rightsided cancer by 20% in women (5). Genetic polymorphisms in CYP2A6 alter nicotine metabolism, influence smoking behavior, and modulate aspirin chemoprevention of colorectal neoplasia (14). These observations suggest that nicotine metabolism genes may serve as biomarkers. Our nine‑gene signature includes FOXC1 and TRIP6, which have emerging roles in tumor metabolism. FOXC1 upregulates serine synthesis enzymes (PHGDH, PSAT1, and PSPH), promoting CRC growth and 5‑fluorouracil resistance via the ERK1/2‑pELK1 axis (32). TRIP6 is overexpressed in CRC, correlates with advanced stage, poor survival, and glycolysis, and modulates immune infiltration by increasing immature dendritic cells and reducing Th1 cells (33). High expression of TIMP1 and MMP‑8 is linked to poor prognosis (34). Although NRCAM (NCAM180) loss promotes cell detachment and metastasis (35), NRCAM’s relatively high expression is also significantly associated with lymph node and distant metastasis in colon cancer (36). STC2 overexpression predicts poor overall survival and correlates with immune checkpoint genes (37), and SIX2 is implicated in epithelial-mesenchymal transition (EMT) and metastasis (38), whereas CST1 (cystatin SN) is upregulated in CRC and independently predicts poor outcomes (39). Our signature therefore integrates genes with established roles in metabolism, ECM remodeling, and cell adhesion, linking nicotine metabolism to multiple oncogenic pathways. Although the nine genes in our risk signature are not canonical nicotine-metabolizing enzymes, they were enriched in modules associated with nicotine metabolism scores and may represent downstream effectors regulated by nicotine exposure. FOXC1 and SIX2 are transcription factors that drive metabolic reprogramming and EMT (32), and their expression can be upregulated by nicotine-triggered α7-nAChR/COX-2 and TGF-β pathways (40). TRIP6 and TIMP1 are involved in cytoskeletal rearrangement and ECM remodeling, processes enhanced by tobacco carcinogen-induced signaling (33,41). NRCAM and TSPAN11 are adhesion molecules whose expression correlates with nicotine-induced EGF signaling and cell migration (40). STC2, CST2, and GPRASP1 regulate calcium homeostasis, protease inhibition, and G-protein-coupled receptor (GPCR) trafficking, respectively (42-44); evidence suggests nicotine exposure can modulate these proteins via endoplasmic reticulum stress or dopaminergic signaling (45). Therefore, our signature captures genes that are responsive to nicotine metabolism-related pathways rather than enzymes that metabolize nicotine directly.
Notably, our functional enrichment analysis revealed that DEGs in the high‑risk group were predominantly enriched in ECM organization and transforming growth factor beta (TGFβ) signaling. ECM remodeling drives colorectal carcinogenesis: type I collagen, MMP-2, and MMP-9 increase with stage, while type IV collagen and TIMP-3 decrease (9). High circulating MMP-8 and TIMP-1 levels predict poor outcomes (34). Our finding that TIMP1 is a risk gene aligns with these observations. The risk score also correlated with KRAS and PIK3CA mutations, as well as low BRAF mutation rates. A population‑based study found KRAS mutations in 41% and PIK3CA mutations in 13.2% of CRCs and identified these mutations as independent predictors of poor survival (46). Emerging targeted therapies, such as KRAS G12C inhibitors, show limited single‑agent efficacy [objective response rate (ORR) 19.2%] but improved responses when combined with EGFR inhibition [ORR 45%, progressionfree survival (PFS) 7.5 months] (47). Similarly, dual BRAF/EGFR inhibition yields modest benefit in BRAFV600E-mutant CRC (ORR 16%, median survival 6.3 months) (48). Our model identifies patients with high mutational burden who might benefit from such combinations.
Our immune infiltration analysis reveals a complex tumor microenvironment that differs markedly between risk groups. High-risk tumors exhibit significantly higher infiltration of MDSCs, macrophages, and regulatory T cells. Specifically, MDSCs suppress anti‑tumor immunity by depleting l‑arginine and producing nitric oxide, blocking T‑cell proliferation and promoting Treg development. They also secrete TGF‑β and vascular endothelial growth factor (VEGF) to induce epithelial‑tomesenchymal transition and angiogenesis (49). Similarly, tumorassociated macrophages often polarize toward an M2 phenotype and secrete interleukin-10 (IL-10) and VEGF, supporting angiogenesis and immunosuppression (50). Notably, the accumulation of M2 macrophages increases as tumors progress (51). The enrichment of immunosuppressive myeloid cells and regulatory T cells in the high‑risk group, therefore, points to a hostile microenvironment for effector lymphocytes. Complementing this, Immune‑checkpoint analysis further delineates an immunosuppressive milieu. High‑risk tumors upregulate neuropilin‑1 (NRP1), V‑set domain‑containing T‑cell activation inhibitor 1 (VTCN1/B7‑H4) and CD276 (B7-H3) while downregulating costimulatory receptors including CD160, CD244, CD27, CD28 and CD48. B7-H3 modulates macrophage plasticity by inducing receptor-mediated M2 polarization and driving M1-to-M2 transition (52), while NRP1 and B7H4 deliver inhibitory signals to T cells (53,54). In contrast, CD27 and CD28 provide essential co‑stimulation for Tcell activation (55,56). The positive correlation between the risk score and these immunoinhibitory checkpoints indicates that high‑risk tumors exploit multiple pathways of immune evasion. Supporting this, the TIDE algorithm shows that high‑risk tumors have higher TIDE and exclusion scores, suggesting greater immune evasion driven by Tcell exclusion. Consequently, this pattern suggests these tumors are less likely to benefit from immune checkpoint blockade (57). Consistently, predicted responses to programmed death 1 (PD-1) or CTLA‑4 blockade were lower in the high‑risk group. Furthermore, these immune signatures mirror broader metabolic reprogramming in CRC. Tumor cells switch to aerobic glycolysis, glutamine and lipid metabolism, producing lactate that acidifies the microenvironment and suppresses Tcell proliferation while promoting M2 macrophage infiltration (58,59). Consistent with our observation that metabolic reprogramming shapes the tumor immune milieu, Chen et al. reported that a lipid-metabolism-related gene signature reveals dynamic shifts in immune infiltration across the colorectal adenoma-carcinoma sequence and noted that high-fat diets—a driver of lipid metabolic dysregulation—are major environmental risk factors for CRC and are associated with nearly 80% of cases (60). Thus, nicotine‑metabolism dysregulation captured by our signature may interact with metabolic and inflammatory pathways to shape the tumor immune landscape. In summary, the high-risk group identified by our nine-gene signature is characterized by a “hot but suppressed” tumor microenvironment enriched with MDSCs, M2‑like macrophages, and regulatory T cells, coupled with the upregulation of inhibitory checkpoints. These findings have therapeutic implications: targeting MDSCs or reprogramming tumor-associated macrophages (TAMs) toward an M1 phenotype, combined with metabolic modulators, may synergize with immune checkpoint blockade. Our risk signature may therefore help stratify patients for combined metabolic and immunotherapeutic strategies.
Despite the strengths of combining nicotine metabolism with immune and mutational landscapes, there are limitations in this study. It primarily used bioinformatics approaches to investigate the potential molecular mechanisms of nicotine-metabolism-related genes in COAD. However, it is important to recognize the constraints imposed by the limited sample size and the methodological restrictions inherent to computational analyses. Moreover, the mechanistic exploration was based solely on gene-expression data, which may not fully capture actual BPs. Consequently, additional validation through animal experiments and clinical studies is required to substantiate these findings under diverse experimental conditions. Furthermore, the clinical applicability of CRC-specific tumor markers requires confirmation with larger sample sizes, and the application of therapies for stratified patients requires additional clinical validation. Follow-up experiments are planned to strengthen the theoretical support for these findings.
For future research, we should investigate mechanistic connections between nicotine metabolism and metabolic reprogramming. FOXC1 upregulation of serine synthesis suggests that dietary serine restriction combined with FOXC1 or ERK inhibitors may improve chemotherapy (32). TRIP6’s role in glycolysis and angiogenesis provides a rationale for targeting TRIP6 to modulate metabolic pathways and immune infiltration (33). STC2 and CST1 are associated with immune checkpoints and mismatch repair deficiency (37,39); they could serve as biomarkers for immunotherapy. Clinical trials exploring combination therapies, such as KRAS G12C inhibitors with EGFR or BRAF inhibitors (47,48), and immunotherapy plus metabolic modulators can stratify patients based on our risk signature. Moreover, experimental studies should delineate how nicotine metabolism influences ECM remodeling and TGF‑β signaling. As TAMs switch from M1 to M2 phenotypes, interventions that repolarize macrophages or block IL‑10/VEGF signaling may benefit highrisk patients (50). Integrating multi‑omics data (genomics, transcriptomics, metabolomics, and proteomics) will refine risk models and reveal therapeutic vulnerabilities.
Conclusions
Conclusions
In summary, this study introduces a comprehensive risk signature that links nicotine metabolism to metabolic reprogramming, immune evasion, and mutational landscapes in COAD. The nine‑gene signature provides robust prognostic value, clarifies the influence of smoking on tumor biology, and highlights key genes and pathways that warrant further investigation as potential biomarkers or drivers of therapy selection. Future prospective validation and mechanistic studies will help translate these findings into personalized treatment strategies.
In summary, this study introduces a comprehensive risk signature that links nicotine metabolism to metabolic reprogramming, immune evasion, and mutational landscapes in COAD. The nine‑gene signature provides robust prognostic value, clarifies the influence of smoking on tumor biology, and highlights key genes and pathways that warrant further investigation as potential biomarkers or drivers of therapy selection. Future prospective validation and mechanistic studies will help translate these findings into personalized treatment strategies.
Supplementary
Supplementary
The article’s supplementary files as
The article’s supplementary files as
출처: PubMed Central (JATS). 라이선스는 원 publisher 정책을 따릅니다 — 인용 시 원문을 표기해 주세요.