본문으로 건너뛰기
← 뒤로

Identification of an E2Fs-based gene signature for predicting prognosis and therapeutic response in colorectal cancer.

1/5 보강
Discover oncology 📖 저널 OA 99.2% 2022: 2/2 OA 2023: 3/3 OA 2024: 36/36 OA 2025: 546/546 OA 2026: 337/344 OA 2022~2026 2025 Vol.16(1) p. 1893
Retraction 확인
출처

Zhang F, Sun Z, Zhang Z, Jiang K, Wei B, Yu X

📝 환자 설명용 한 줄

E2F family genes are common transcription factors, abnormal in several malignant tumors.

이 논문을 인용하기

↓ .bib ↓ .ris
APA Zhang F, Sun Z, et al. (2025). Identification of an E2Fs-based gene signature for predicting prognosis and therapeutic response in colorectal cancer.. Discover oncology, 16(1), 1893. https://doi.org/10.1007/s12672-025-02615-y
MLA Zhang F, et al.. "Identification of an E2Fs-based gene signature for predicting prognosis and therapeutic response in colorectal cancer.." Discover oncology, vol. 16, no. 1, 2025, pp. 1893.
PMID 41091262 ↗

Abstract

E2F family genes are common transcription factors, abnormal in several malignant tumors. However, their complex involvement in colorectal cancer, particularly in prognosis, immune infiltration, and mutational landscape, remains unclear. We conducted a study using gene expression data from the TCGA and GEO datasets to examine the abnormal expression of E2Fs in colorectal cancer. And we performed consensus clustering and differential gene expression analyses to identify E2Fs-related genes. Then, we used Lasso regression and multivariate Cox regression to create a prognostic risk model for colorectal cancer. We analyzed the differences between the E2Fs-based gene risk and various clinical characteristics, gene mutations, immune cell infiltration, immunotherapy responses, and drug sensitivity using clinicopathological data, single-cell RNA sequences, multiple immune algorithms. Finally, we have developed a prognostic risk model that includes FMO5, NDUFA11, LIPG, FIGNL1, MOGAT2, and GZMB. We observed significant differences in clinical characteristics, immune cell infiltration, gene mutation landscapes, immunotherapy responses, and drug sensitivity between the high-risk and low-risk groups. The novel E2Fs-based gene risk model shows significant potential for contributing to the evaluation of prognosis and predicting immunotherapeutic outcomes for colorectal cancer patients.

🏷️ 키워드 / MeSH 📖 같은 키워드 OA만

같은 제1저자의 인용 많은 논문 (5)

📖 전문 본문 읽기 PMC JATS · ~53 KB · 영문

Introduction

Introduction
Although extensive research on cancer has been conducted both domestically and internationally, it is evident that cancer continues to pose a significant threat to human health. Recent statistical studies have revealed that nearly 9.7 million cancer-related deaths occurred in 2022, with colorectal cancer (CRC) being identified as the third most commonly diagnosed malignancy and the second leading cause of cancer-related mortality [1]. The American Cancer Society has reported CRC as the second most common cause of death in the United States [2]. Despite an overall decrease in CRC incidence, notable trends include a shift towards diagnoses occurring at a younger age and in more advanced stages [3]. While advancements in screening modalities such as colonoscopy and computed tomography have improved the efficacy of screening and prevention efforts, there exists an urgent need to identify more suitable biomarkers and innovative approaches for precision cancer screening to mitigate the incidence and mortality of CRC [4]. Genetic factors, in addition to environmental risk factors, play a pivotal role in an individual’s predisposition to colorectal cancer [5, 6]. Therefore, the development of effective targeted therapy based on the distinct gene mutation landscape of CRC patients is considered an ideal therapeutic approach. In recent years, immunotherapy has demonstrated initial success in treating certain tumors, with immune checkpoint inhibitors emerging as a novel treatment for various solid tumors [7–9]. Moreover, novel approaches focused on targeting the immune microenvironment hold promise for a broader spectrum of cancer patients [10]. Accordingly, the pursuit of advanced prediction tools and novel therapeutic approaches remains paramount in enhancing the diagnosis, treatment, and prognosis of CRC [11].
The E2F family, comprising eight members designated as E2F1-E2F8, is a pivotal group of transcription factors within the human genome that play a crucial role in governing cell proliferation and cell cycle progression [12, 13]. These factors exhibit varied functions in transcriptional regulation: E2F1 and E2F2 predominantly function as transcriptional activators, while E2F4 and E2F5 act as transcriptional repressors, and E2F6 and E2F8 serve as transcriptional inhibitors [14]. Notably, E2F1 is implicated in the development of colorectal cancer through the E2F1/RB/HDAC1 axis [15]. Studies by Wang et al. have demonstrated that E2F1 can heighten the migration, survival, and invasion capabilities of non-small cell lung cancer by activating NELL2 gene transcription, thereby influencing tumor progression and metastasis [16]. Additionally, E2F2 has been shown to activate the long non-coding RNA DLEU2, thereby promoting the progression of prostate cancer [17]. Furthermore, reports indicate that E2F1 promotes the proliferation and metastasis of clear cell renal cell carcinoma through the activation of SREBP1-dependent fatty acid biosynthesis [18]. Xiang et al. reported that E2F1, in conjunction with its DNA-binding partner DP1, binds to the promoter region of KPNA2, inducing KPNA2 expression to promote tumor development, while E2F7 competes against DP1 and obstructs E2F1-induced KPNA2 gene activation [19]. Despite evidence of the involvement of E2Fs in tumor progression, research on constructing prognostic signatures based on E2Fs for cancers remains scarce. Cancer is now acknowledged as a multifactorial and intricate dynamic process involving continuous and dynamic interactions between cancer cells and the tumor microenvironment [20]. The acquisition and maintenance of cancer hallmarks, such as the maintenance of proliferative signals, activation of invasion and metastasis, and immune escape, are contingent on the contributions of the tumor microenvironment. Consequently, extensive research and exploration of the tumor microenvironment, taking into account its high heterogeneity, has been undertaken. Notably, immunotherapy, including immune checkpoint therapy, has shown varying effects within the same tumor. Therefore, the study of diverse molecular subtypes of colorectal cancer and the exploration of potential, reliable biomarkers to guide targeted therapy are imperative to achieve clinical efficacy [21].
In our study, it was observed that E2Fs exhibited notably elevated expression levels in CRC tissues when compared to normal tissues in the TCGA cohort. Subsequently, CRC patients were categorized according to the mRNA expression of E2Fs sourced from public databases, and a prognostic risk model rooted in E2Fs, encompassing six genes, was formulated utilizing a spectrum of statistical algorithms. Following this, an evaluation of overall survival prediction, tumor immune cell infiltration, gene mutation landscapes, immunotherapeutic response, and drug sensitivity within distinct risk groups was conducted. The primary aim was to establish an E2Fs-based prognostic signature, with a view to furnishing robust underpinnings for appraising survival outcomes and identifying personalized potential therapeutic targets for CRC patients.

Material and methods

Material and methods

Data acquisition
In this study, gene expression profile, copy number variation (CNV), single-nucleotide variant (SNV) and clinicopathological data were obtained from the COADREAD cohort in The Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov/), which contained 650 tumor tissue samples and 51 normal colorectal tissue samples. The gene expression profiles were collated and transformed into TPM format for subsequent analysis. And RNA-seq data and corresponding clinical data were obtained from the GSE39582 dataset (n = 585), GSE38832 dataset (n = 122) and GSE161158 dataset (n = 250) in the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo).

Differential expression analysis and correlation analysis of E2F family genes
The R package “ggpubr” was used to perform the difference analysis for E2 F family genes in tumor and normal samples. The R package “corrplot” was used to perform the difference analysis with Pearson correlation analysis. The STRING database (https://cn.string-db.org) was used to analyze protein–protein interactions (PPI), and Cytoscape (version 3.10.2) was used to visualize PPI networks.

Consensus clustering analysis
We identified the GSE39582 dataset as the data source and classified patients into distinct E2Fs -related clusters based on the expression levels of E2F family genes with k-means algorithms by using R package “ConsensusClusterPlus’’. The cluster analysis followed the following principles, including smooth and stable cumulative distribution function (CDF) curves, adequate sample sizes in each group and incresed correlations within groups and descended correlations between groups. The R package"scatterplot3 d"were used to visualize principal component analysis (PCA). Subsequently, we analyzed differentially expressed genes (DEGs) in distinct E2Fs-related clusters by “limma” R package with criteria of |log2fold change|≥ 1.5 and p-value < 0.05.

Functional enrichment analysis
The Gene Ontology (GO) analysis and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis were performed to explore the functional categories of the E2Fs-based DEGs by the “GOplot”, “enrichplot”, and “ggplot2” R packages (p < 0.05).

Construction of the E2Fs-related prognostic model
We obtained DEGs in distinct E2Fs-related clusters from the GSE39582 dataset, and then identified the overall survival-associated DEGs by univariate cox regression analysis. Subsequently, the least absolute shrinkage and selection operator (Lasso) regression was applied to further refine the gene selection and identify the hub genes which were strongly associated with prognosis. The best log lambda value was calculated for model fitting through tenfold cross-validation. The final hub genes were identified as potential prognostic signatures by using multivariate cox regression, the risk score was calculated using the following the formula: Risk score = expression of gene 1 × coefficient of gene 1 + expression of gene 2 × coefficient of gene 2 + expression of gene 3 × coefficient of gene 3 + … + expression of gene n × coefficient of gene n.

Evaluation and validation of the E2Fs‑related prognostic model
After the prognostic risk score model was established, colorectal cancer patients from the GSE39582 dataset were divided into the low-risk and high-risk groups by the median risk score. The Kaplan–Meier (KM) survival curves analysis was performed to compare the overall survival (OS) time between the two risk groups with log-rank tests by the R package “survival”. And the Receiver Operation Characteristic (ROC) curve was plotted to evaluate the sensitivity and specificity of the risk score by the R package “timeROC”. Univariate and multivariate cox regression analysis were used to evaluate the independence of the risk scores, and the results were shown in forest plots. According to the clinical characteristic parameters, the correlation analysis between the risk score and clinical characteristics were performed, and the nomogram was constructed to evaluate the consistency between predicted and actual survival rates by 1-year, 3-year and 5-year calibration maps. The COADREAD cohort from TCGA database and the cohort containing GSE38832 dataset and GSE161158 dataset from GEO database were utilized to validate the prognostic risk model by KM survival curves and ROC curves.

Gene mutation, DNA methylation, CNV and tumor mutation burden (TMB) in colorectal cancer
The frequency and the distribution of gene mutations in low-risk and high-risk groups were performed and visualized with the waterfall plots by the R package “maftools”. Then we analyzed the CNV frequency of six identified prognosis-associated genes including FMO5, NDUFA11, LIPG, FIGNL1, MOGAT2, GZMB in colorectal cancer. The DNA methylation data was available by the Shiny Methylation Analysis Resource Tool (SMART) database (https://www.bioinfo-zs.com/smartapp/) which was an interactive web server for analyzing DNA methylation of TCGA project. We calculated the TMB value for each sample according to gene mutation data to compare the difference between low-risk and high-risk groups.

Immune cell infiltration analysis and immunotherapy response prediction
CIBERSORT algorithm analysis was utilized to evaluate the abundance of 22 types of immune cells infiltration among the samples in the TCGA-COADREAD dataset. And we calculated the tumor microenvironment (TME) score including stromal score, immune score and estimate score by using ESTIMATE algorithm in the GSE38832 dataset. Meanwhile, single-sample gene set enrichment analysis (ssGSEA) algorithm was applied to calculate the enrichment scores of the 23 immune cells and 13 immune pathways for each tumor sample in the GSE38832 dataset. Different algorithms were used to compare the distribution of immune cell types and enrichment fraction of each immune cell between high-risk and low-risk groups. In addition, we utilized the Tumor Immune Dysfunction and Exclusion (TIDE) algorithm to assess potential response of patients to immunotherapy in between the high-risk and low-risk groups. Higher TIDE scores tended to indicate poorer immunotherapy response. Gene expression profiles were normalized We inputted normalized gene expression profiles and obtained TIDE values for each sample from TIDE online database (http://tide.dfci.harvard.edu/). Then different risk groups were compared to analyze the correlation between risk scores and prediction of immunotherapy response.

Single cell RNA sequence (ScRNA-seq) analysis of prognostic risk genes
Tumor Immune Single-cell Hub (TISCH, https://www.tisch1.comp-genomics.org/) was a scRNA-seq database focusing on TME. TISCH provided detailed cell-type annotation at the single-cell level, enabling the exploration of TME across different cancer types. We searched scRNA-seq datasets for colorectal cancer from TISCH and obtained the distribution of the six prognostic genes in the TME from the GSE108989 dataset.

Drug sensitivity analysis
The cancer gene expression data of different drugs sensitivity were downloaded from the Genomics of Drug Sensitivity in Cancer (GDSC, https://www.cancerrxgene.org/). The value of half maximal inhibitory concentration (IC50) was used to assess the chemotherapeutic response. We calculated the TCGA-COADREAD cohort’s medication susceptibility by the R package “oncoPredict” and the GSE39582 dataset cohort’s medication susceptibility by the R package “pRRophetic”. Then we examined the variations in the efficacy of chemotherapeutic medicines between the two risk groups of patients.

Statistical analysis
All statistical analyses and visualizations were conducted using R software (version 4.1.3). Correlations were assessed utilizing either Pearson’s or Spearman’s correlation analysis. Disparities between two groups were appraised employing the independent sample t-test or Wilcoxon rank test, while distinctions among three or more groups were evaluated through one-way ANOVA or Kruskal–Wallis test. Kaplan–Meier survival analysis was executed to compare overall survival between high-risk and low-risk groups, with the Log-rank test employed to evaluate variances in survival curves between the two groups. A p-value of less than 0.05 was deemed to indicate statistical significance.

Results

Results

The expression profile and correlation of E2Fs in colorectal cancer
The E2F family comprises 8 genes (E2F1-E2F8) closely involved in cell proliferation, apoptosis, and differentiation, all of which are fundamental in cancer onset and progression. Though abnormal E2F expression has been observed in various tumors, the specific association with colorectal cancer remains uncertain. In this study, we acquired gene expression profiles from the TCGA-COADREAD cohort and evaluated E2F expression in normal and tumor samples. Our analysis indicated significantly elevated E2F expression in colorectal cancer tissues compared to normal tissues (Fig. 1A). Despite the disparate genomic locations of E2F genes (Fig. 1B), substantial interactions were evident. Protein–protein interaction (PPI) analysis based on the STRING database illustrated known interactions from curated databases and experimentally determined and predicted interactions for E2Fs (Fig. 1C). Furthermore, correlation analysis revealed predominantly positive correlations among E2F expressions in colorectal cancer (Fig. 1D). Additionally, we observed numerous mutations, especially in colorectal cancer (Figure S1). The discernible correlations, mutation landscapes, and abnormal E2F expression profiles collectively suggested a potential contribution of all E2Fs to the initiation and progression of colorectal cancer.

Identification of E2Fs related clusters and construction of prognostic risk model
Subtyping based on gene expression is widely acknowledged as a method of disease classification. Integrating multi-omics data, including mutation, copy number variation, methylation, and proteomics, is beneficial for categorizing tumors and informing clinical treatment decisions [21, 22]. To investigate the profiles and characteristics of E2Fs in colorectal cancer, a consensus clustering analysis was conducted using the R package “ConsensusClusterPlus” to categorize colorectal cancer samples according to the expression of 8 E2F family genes in a specific dataset. The analysis involved calculating the consistency coefficient to identify the optimal clustering number (k value), and consensus heatmap matrices for k = 2–9 were presented in Fig. 2A–H. Notably, it was observed that k = 2 was the preferred selection for classifying the samples into distinct clusters, as illustrated in Fig. 2I and J. Furthermore, principal component analysis revealed that colorectal cancer patients were effectively distributed into two clusters using the E2Fs-based classification methods, as demonstrated in Fig. 2K and L.
The analysis of differentially expressed genes (DEGs) in two E2Fs-based clusters resulted in the identification of 516 DEGs, comprising 344 up-regulated genes and 172 down-regulated genes as visualized in the volcano plot (Fig. 3A). Subsequently, enrichment analysis of the DEGs was executed using Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis to elucidate the molecular characteristics of the E2 Fs-based clusters (Figure S2). The GO analysis underscored the involvement of DEGs in biological processes such as chromosome segregation, nuclear division, and organelle fission. Notably, the enriched cellular components included the chromosomal region, condensed chromosome, and chromosome centromeric region, while the molecular functions exhibited enrichment in DNA helicase activity, ATP-dependent activity acting on DNA, and ATP hydrolysis activity. Additionally, the KEGG analysis revealed enrichment in pathways associated with the cell cycle, oocyte meiosis, and DNA replication.
The dataset GSE39582 was subjected to univariate regression analysis of differentially expressed genes (DEGs) to identify prognostic genes associated with colorectal cancer. A total of 109 prognostic genes linked to overall survival time were discerned (Fig. 3B). To mitigate overfitting, Lasso regression analysis and multivariate Cox regression analysis were employed to pinpoint key prognostic genes. Lasso regression analysis yielded 18 potential genes, as depicted in Fig. 3C–E. Subsequent multivariate Cox regression analysis of the 18 potential prognostic genes resulted in the identification of 6 hub genes for the construction of the prognostic risk model: GZMB, FMO5, NDUFA11, LIPG, MOGAT2, and FIGNL1 (Fig. 3F). The risk score for each patient was computed using the formula: Risk score = (0.118 × NDUFA11 expression)—(0.203 × GZMB expression)—(0.234 × FMO5 expression)—(0.255 × LIPG expression)—(0.234 × MOGAT2 expression)—(0.225 × FIGNL1 expression). Subsequently, a predictive prognostic model for colorectal cancer patients, based on E2 Fs-related clusters, was established, and patients were stratified into high-risk and low-risk groups using the median risk score for further analysis.

Evaluation of the clinical characteristics and validation of the prognostic risk model
Following the establishment of the prognostic risk model, we conducted an analysis of the clinical characteristics of high-risk and low-risk groups in the GSE39582 dataset, serving as the training dataset. Initially, we examined the distribution of the risk score and patient survival outcomes. The analysis revealed a positive correlation between increased risk scores and elevated mortality rates among colorectal cancer patients (Fig. 4A). Subsequently, the expression levels of the six genes in the risk group were evaluated, demonstrating downregulation of five genes (GZMB, FMO5, LIPG, MOGAT2, and FIGNL1) alongside upregulation of the gene NDUFA11 in the high-risk group (Fig. 4C). Kaplan–Meier survival analysis further illustrated significantly improved prognostic outcomes for patients in the low-risk group as opposed to those in the high-risk group within the training dataset (p < 0.0001) (Fig. 4B). Subsequent validation using the TCGA-COADREAD dataset confirmed the unfavorable overall survival of patients in the high-risk group (p = 0.0357) (Fig. 4D). Additionally, univariate and multivariate prognostic analyses identified the risk score as an independent risk factor in the prognosis of colorectal cancer (Fig S3 A, B). Correlation analysis indicated a positive association between higher risk scores and unfavorable T stage, N stage, M stage, and clinical stage (Fig S3 C-F). Moreover, time-dependent ROC curves were constructed to assess the predictive capability of the prognostic risk model for colorectal cancer patients, with AUC values at 1-year, 3-year, and 5-year timeframes measuring 0.699, 0.694, and 0.705, respectively (Fig. 4F), indicating superior prognostic performance in predicting overall survival. Finally, a nomogram was developed based on the clinical characteristics and risk score to predict 1-year, 3-year, and 5-year OS outcomes, as depicted in Fig. 4E. Calibration curves were then designed, demonstrating close alignment with an ideal diagonal curve and affirming the high precision of the nomogram’s survival rate predictions for 1-year, 3-year, and 5-year periods (Fig. 4G).
To validate the prognostic risk model in colorectal cancer, the datasets labeled as GSE38832 and GSE161158 were obtained from the GEO database for use as validation cohorts. Kaplan–Meier survival curves were applied to both cohorts, demonstrating that patients classified in the high-risk group exhibited notably worse prognoses compared to their counterparts in the low-risk group (p = 0.0384; p = 0.0220) (Fig. 5A, B). Subsequently, the visualization of risk score distribution alongside patient survival outcomes reaffirmed the alignment between overall patient survival and the risk score (Fig. 5C, D). Moreover, the comprehensive ROC curves manifested skillful predictive capabilities for overall survival outcomes in colorectal cancer patients (Fig. 5E, F), mirroring the findings of the 1-year, 3-year, and 5-year ROC analyses conducted on the two GEO datasets (Fig. 5G, H).

Gene mutation landscape and DNA methylation in colorectal cancer
Somatic gene mutations are pivotal in the pathogenesis and progression of neoplastic diseases, and are fundamental in predicting the responsiveness to both immunotherapeutic and molecular targeted interventions. Our study aimed to delineate the molecular implications of genetic variances between cohorts characterized by high and low prognostic risk, as observed within the TCGA-COADREAD dataset. Our analysis revealed the prevalence of APC, TP53, TTN, KRAS, and SYNE1 mutations, exhibiting differential frequencies in the high-risk and low-risk groups (Fig. 6A, B). Notably, the CNV profiles of these genes displayed noteworthy alterations. Specifically, FMO5, MOGAT2, LIPG, GZMB, and FIGNL1 demonstrated a prevalent frequency of CNV gain, while NDUFA11 exhibited a widespread frequency of CNV loss (Fig. 6D). Assessing the distinction in tumor mutational burden (TMB) between the aforementioned groups indicated higher TMB values in the high-risk cohort, albeit without achieving statistical significance (p = 0.25) (Fig. 6E). Additionally, scrutiny of DNA methylation levels across the TCGA cohort unveiled higher methylation levels for FMO5, NDUFA11, LIPG, and FIGNL1 in tumor samples relative to normal samples, while MOGAT2 and GZMB displayed lower methylation levels in tumor samples compared to normal samples (Fig. 6C).

Immune cell infiltration analysis and immunotherapy response prediction in two risk groups
The tumor microenvironment is recognized for its pivotal role in tumor development, with immune cell infiltration being closely associated with the efficacy of chemotherapy and immunotherapy. In our study concerning the interplay between prognostic risk models and tumor microenvironment signatures, we utilized the CIBERSORT algorithm to determine the proportions of immune infiltration cells in each sample of the TCGA cohort (Fig. 7A). By comparing the immune cell proportions between the two risk groups, we found that T cells CD4 naive, T cells regulatory, macrophages M2, and neutrophils were notably enriched in the high-risk group, while B cell naive, B cell memory, T cells CD4 memory activated, NK cells resting, and dendritic cells activated showed greater enrichment in the low-risk group (p < 0.05) (Fig. 7B). Subsequently, leveraging the ssGSEA algorithm, we compared the proportions of infiltrating immune cells in the GSE cohort. Our results demonstrated that CD56 bright NK cell, immature dendritic cell, macrophage, regulatory T cell, and T help cell exhibited higher scores in the high-risk group, whereas activated CD4 T cell and activated CD8 T cell displayed higher scores in the low-risk group (Fig. 7C). Furthermore, our assessment of immune pathways between the two risk groups indicated elevated scores for APC-co-stimulation and Type II-IFN-response pathways in the high-risk group, while the Cytolytic-activity, Inflammation-promoting, and T cell-co-stimulation pathways had higher scores in the low-risk group (Fig. 7E). Notably, utilizing the ESTIMATE algorithm in the GSE38832 cohort, we computed TME scores and noted higher stromal and estimate scores in the high-risk group compared to those in the low-risk group (p < 0.01) (Fig. 7D).
To further investigate the association between immunotherapy response and distinct risk groups, we conducted an analysis of the TIDE scores for each sample within the GSE39582 cohort. Our findings revealed that samples in the high-risk group exhibited higher TIDE scores in comparison to those in the low-risk group, as depicted in Fig. 7F. This suggested that a higher risk score was indicative of a poorer immunotherapy response. In conclusion, the severity of the risk score had the potential to serve as an evaluative measure for both immune microenvironment infiltration and the immunotherapy responses among patients.

ScRNA-seq analysis of prognostic risk genes
In order to further investigate and identify the expression and distribution of prognostic risk genes in the tumor microenvironment, we analyzed a single-cell atlas (GSE108989) comprising 12 colorectal cancer samples retrieved from the TISCH online database. The T cell-related subgroups were segmented into five distinct cell types: CD4 Tconv, CD8 T, CD8 Tex, Tprolif, and Treg (Fig. 8A). The expression and distribution of the six prognostic risk genes are visually represented in this single-cell atlas (Fig. 8B–H). The gene NDUFA11 exhibited ubiquitous expression within the tumor microenvironment of CRC, while FMO5, MOGAT2, LIPG, and FIGNL1 demonstrated mild to moderate expression levels in CRC. Furthermore, GZMB predominantly manifested in CD8 T cells.

Drug sensitivity analysis of the two risk groups
In order to examine the response to chemotherapeutic treatment for CRC and assess the association between drug sensitivity and risk scores within two risk groups, we retrieved chemotherapeutic agent data from the GDSC database. Subsequently, we computed the half-maximal inhibitory concentration (IC50) of the samples within the TCGA cohort and cohort GSE39582. Notably, our analysis revealed significant variations in the sensitivity to various chemotherapy drugs, such as Cisplatin, Cytarabine, Dasatinib, Docetaxel, Epirubicin, Fludarabine, Lapatinib, Nilotinib, and Vincristine, between the two risk groups in the TCGA cohort (p < 0.01) (Fig. 9A), indicating that patients in the low-risk group exhibited heightened sensitivity to these chemotherapy drugs. Additionally, an analysis of commonly utilized chemotherapeutic and targeted therapeutic drugs in cohort GSE39582 showcased that the low-risk group displayed increased sensitivity to drugs including bicalutamide, bleomycin, doxorubicin, etoposide, methotrexate, midostaurin, obatociax, parthenolide, pazopanib, and roscovitine (Fig. 9B–C, E–H, J–M). Conversely, the high-risk group demonstrated elevated sensitivity to bortezomib and mitomycin C (Fig. 9D, I). These discernments suggested that these chemotherapy drugs might serve as potential candidates for patients with lower risk scores. Moreover, the risk scores derived from the E2Fs-related genes could serve as effective molecular markers for predicting responses to chemotherapeutic drugs and as potential therapeutic targets for CRC patients.

Discussion

Discussion
Numerous studies have indicated that E2Fs play a role in promoting cell proliferation and are involved in cell cycle regulation, thus contributing to cancer development [23–27]. Nevertheless, the clinical implications of E2Fs-based gene signatures remain elusive. This investigation sought to establish an E2Fs-based prognostic risk model and evaluate gene signatures associated with prognostic prediction, tumor mutation landscapes, and immune infiltration through the use of various public databases and advanced computational tools. In summary, our findings have led to the identification of a credible prognostic prediction signature based on E2Fs-based genes, with potential implications for clinical practice.
The development of effective models for predicting CRC risk holds significant promise and merits further exploration within the realm of precision medicine. Recent years have witnessed the emergence of numerous studies focused on the development of prediction models that furnish risk scores, thereby enhancing risk stratification and assessment among CRC patients [28, 29]. Additionally, machine learning-based genetic risk score models have proven instrumental in distinguishing individuals at high or low risk, thereby augmenting CRC patient screening and diagnosis [30, 31]. In this study, we conducted a comprehensive assessment of the clinical utility of the E2Fs-based gene signature in CRC through the application of multiple bioinformatics methods. Leveraging the expression profiles of E2Fs from the GEO database, we effectuated the delineation of two distinct clusters through consensus clustering analysis. The Lasso regression analysis, a widely employed machine learning algorithm, was utilized for the purpose of constructing a multi-gene prognostic signature [32]. Subsequently, we established the prognostic risk signature through the implementation of Lasso regression and multivariate Cox regression analysis. The validation cohort substantiated the validity of the prognostic signature via survival curve and ROC curve analyses, signifying that patients in the high-risk group exhibited poorer prognoses and shorter survival durations. Prior research has underscored the prognostic value of E2Fs in various cancers, including ovarian, lung, and gastric cancer [33–35]. Recognizing the inherent complexity and heterogeneity of tumors, the prognostic value of individual genes is inherently circumscribed, thus necessitating the construction of a prognostic model for CRC founded upon multiple genes. Our findings underscore that the established E2Fs-based prognostic signature effectively facilitates the evaluation of prognosis, thus representing a novel supplementary method for managing CRC patients.
The study presented a prognostic risk signature comprising six genes (GZMB, FMO5, NDUFA11, LIPG, MOGAT2, and FIGNL1). Notably, Granzyme B (GZMB) is a serine protease categorized within the granzyme subfamily of proteins [36]. Produced by natural killer (NK) cells and cytotoxic T lymphocytes (CTLs), GZMB prompts apoptosis in tumor cells. The impact of GZMB interference has been observed to enhance the metastatic capabilities of CRC cells while decreasing apoptosis [37]. Furthermore, high levels of GZMB expression have been associated with a favorable prognosis in skin cutaneous melanoma, indicating a potential negative role in diverse tumor development [38]. Additionally, GZMB, serving as an activation marker of cytotoxic lymphocytes, may reflect a patient’s immunity against cancer cells and can potentially serve as a useful prognostic marker for lung adenocarcinoma [39]. Our study findings also revealed a negative correlation between GZMB and CRC prognosis, with GZMB enrichment in CD8 + T cells as observed through single-cell sequencing analysis, suggesting a potential protective role in CRC outcomes.
Prior research has established a significant correlation between E2Fs and tumor immune cell infiltration as well as immune regulatory processes [40, 41]. We have developed a prognostic risk signature based on E2Fs expression and employed the CIBERSORT, ssGSEA, and ESTIMATE algorithms to investigate the distinct characteristics of the tumor immune microenvironment in different risk groups. Our investigation indicates that the high-risk group exhibits notable enrichment of T cells CD4 naive, T cells regulatory, macrophages M2, and neutrophils. In contrast, the low-risk group demonstrates stronger associations with B cells naive, B cells memory, T cells CD4 memory activated, NK cells resting, and activated dendritic cells. Furthermore, the ESTIMATE algorithm analysis reveals higher stromal and estimate scores in the high-risk group compared to the low-risk group. Additionally, our single-cell sequencing analysis reveals significant enrichment of GZMB in CD8 T cells, suggesting a close association between the identified prognostic gene signatures and tumor immune cell infiltration. These outcomes imply a strong correlation between the E2Fs-based prognostic risk signature and the tumor immune microenvironment, although further inquiry is essential to elucidate the specific molecular mechanisms involved.
Kim et al. demonstrated that breast cancer cells developed resistance to CDK4/6 inhibitors through a two-step E2F activation process involving Rb-protein degradation and c-Myc-mediated amplification of E2F activity [42]. Hamidi et al. found that silencing or inhibiting E2F1/E2F2 induced DNA damage during S phase and potentiated 5-FU-induced replication stress and cellular toxicity [43]. Several studies have indicated a potential connection between E2Fs and the development of drug resistance in tumors. Targeted treatment involving E2Fs shows promise for the treatment of tumors. Therefore, we conducted drug sensitivity analysis and prediction of potential immunotherapy response in patients to facilitate personalized treatment of colorectal cancer. We utilized the GDSC database to identify chemotherapeutic drugs. The results, based on IC50, indicated that the low-risk group is more responsive to most chemotherapy drugs, including Cisplatin, Cytarabine, Dasatinib, Fludarabine, Nilotinib, and Vincristine. Furthermore, our findings suggested that the E2Fs-based prognostic risk signature has the potential to predict treatment responses to chemotherapy drugs, with the low-risk group showing better responses. We also applied the TIDE algorithm to assess the response to immunotherapy in two distinct risk groups. The results revealed that the high-risk group had poorer immunotherapy efficacy. This study may help in the selection of specific molecular subtypes of patients for immunotherapy and in guiding the choice of individualized clinical therapy for CRC.
However, it is essential to acknowledge that our study has its limitations. All data obtained from TCGA database and the GEO database don’t include Chinese gene samples, and the conclusions may not fit all populations. The lack of relevant experimental data and our own clinical data are also shortcomings of this study. More in vitro and in vivo experiments and clinical studies are needed for future research.

Conclusion

Conclusion
In this study, we have identified six E2Fs-related genes to develop a prognostic risk model for predicting outcomes of colorectal cancer patients. Our integrative bioinformatic analysis encompassed multiple aspects and proved to be effective and accurate. Additionally, we observed significant differences in gene mutation landscape, tumor microenvironment, and drug sensitivity between distinct risk groups. These findings could greatly benefit colorectal cancer patients by guiding clinical chemotherapy and individualized targeted therapy. The six prognostic hub genes—FMO5, NDUFA11, LIPG, FIGNL1, MOGAT2, and GZMB—have the potential to serve as indicators for predicting prognosis and immunotherapeutic response in CRC patients.

Supplementary Information

Supplementary Information

출처: PubMed Central (JATS). 라이선스는 원 publisher 정책을 따릅니다 — 인용 시 원문을 표기해 주세요.

🏷️ 같은 키워드 · 무료전문 — 이 논문 MeSH/keyword 기반

🟢 PMC 전문 열기