Integrative machine learning identifies lactylation-related gene signature prognostic for chemotherapeutic efficacy in colorectal carcinoma.
1/5 보강
[BACKGROUND] Lactylation, a recently discovered post-translational modification, has emerged as a critical regulator in cancer biology.
APA
Hu H, Peng F, et al. (2025). Integrative machine learning identifies lactylation-related gene signature prognostic for chemotherapeutic efficacy in colorectal carcinoma.. Discover oncology, 17(1), 69. https://doi.org/10.1007/s12672-025-04233-0
MLA
Hu H, et al.. "Integrative machine learning identifies lactylation-related gene signature prognostic for chemotherapeutic efficacy in colorectal carcinoma.." Discover oncology, vol. 17, no. 1, 2025, pp. 69.
PMID
41353668 ↗
Abstract 한글 요약
[BACKGROUND] Lactylation, a recently discovered post-translational modification, has emerged as a critical regulator in cancer biology. Although chemotherapy remains the first-line treatment for metastatic colorectal cancer (CRC), only a subset of patients responds to it. This study aimed to identify key lactylation-related genes in CRC and evaluate their potential as predictive biomarkers for chemotherapy response.
[METHOD] Gene expression profiles and corresponding clinical data from CRC patients were obtained from The Cancer Genome Atlas (TCGA), Genotype-Tissue Expression (GTEx), and Gene Expression Omnibus (GEO) databases. Differentially expressed genes (DEGs) were identified using the limma R package, and key modules were selected through weighted gene co-expression network analysis (WGCNA). Intersecting genes were determined by aligning DEGs with WGCNA module genes. A predictive model was developed utilizing 11 machine learning algorithms and 92 algorithm combinations. Furthermore, the correlation between lactylation-related gene score and immune infiltration as well as drug sensitivity in CRC were also investigated with "CIBERSORT" and "oncoPredict" package.
[RESULTS] Eight lactylation-related genes in CRC were identified and used to construct a predictive model employing Random Forest (RF) and Gradient Boosting Machine (GBM) algorithms. The model demonstrated strong predictive efficacy for chemotherapy response in CRC patients. Using lactylation gene scores, we effectively stratified patients into high- and low-score groups, which showed distinct patterns in immune cell infiltration, tumor mutational profile, and response to conventional antitumor drugs. Notably, the high-lactylation score group exhibited reduced Treg immune characteristics and increased sensitivity to 5-Fluorouracil.
[CONCLUSIONS] In summary, our findings demonstrate that machine learning-driven analysis of lactylation biomarkers represents a promising approach for advancing personalized therapy and optimizing clinical management in CRC.
[METHOD] Gene expression profiles and corresponding clinical data from CRC patients were obtained from The Cancer Genome Atlas (TCGA), Genotype-Tissue Expression (GTEx), and Gene Expression Omnibus (GEO) databases. Differentially expressed genes (DEGs) were identified using the limma R package, and key modules were selected through weighted gene co-expression network analysis (WGCNA). Intersecting genes were determined by aligning DEGs with WGCNA module genes. A predictive model was developed utilizing 11 machine learning algorithms and 92 algorithm combinations. Furthermore, the correlation between lactylation-related gene score and immune infiltration as well as drug sensitivity in CRC were also investigated with "CIBERSORT" and "oncoPredict" package.
[RESULTS] Eight lactylation-related genes in CRC were identified and used to construct a predictive model employing Random Forest (RF) and Gradient Boosting Machine (GBM) algorithms. The model demonstrated strong predictive efficacy for chemotherapy response in CRC patients. Using lactylation gene scores, we effectively stratified patients into high- and low-score groups, which showed distinct patterns in immune cell infiltration, tumor mutational profile, and response to conventional antitumor drugs. Notably, the high-lactylation score group exhibited reduced Treg immune characteristics and increased sensitivity to 5-Fluorouracil.
[CONCLUSIONS] In summary, our findings demonstrate that machine learning-driven analysis of lactylation biomarkers represents a promising approach for advancing personalized therapy and optimizing clinical management in CRC.
🏷️ 키워드 / MeSH 📖 같은 키워드 OA만
같은 제1저자의 인용 많은 논문 (5)
- Anatomical Guidelines and Technical Tips for Neck Aesthetics with Botulinum Toxin.
- Correction: The lncRNA THOR interacts with and stabilizes hnRNPD to promote cell proliferation and metastasis in breast cancer.
- Tapentadol induces progressive hepatic damage via disrupting Hippo-YAP and bile acid-FXR pathways: An integrated experimental and computational perspective.
- Pulmonary infection by mimicking lung cancer with concurrent pulmonary embolism in an immunocompetent host: a case highlighting the diagnostic role of mNGS.
- Advances in natural killer cell immunotherapy for hematologic malignancies.
📖 전문 본문 읽기 PMC JATS · ~53 KB · 영문
Introduction
Introduction
Colorectal cancer (CRC) ranks as the third most prevalent malignancy globally, accounting for an estimated 1.9 million new cases and 903,859 deaths in 2022 [1]. Despite advances in treatment strategies and evolving medical standards, the prognosis of CRC patients remains unfavorable, largely due to frequent late-stage diagnosis [2]. Current first-line chemotherapy regimens for advanced CRC (including FOLFOX, FOLFIRI, FOLFOXIRI, and CAPEOX) demonstrate response rates of only approximately 50%, while non-responders experience delayed clinical benefits and treatment-related adverse effects [3, 4]. These limitations underscore the urgent need for predictive biomarkers to identify patients most likely to benefit from specific chemotherapy regimens.
Chemotherapy response in CRC is influenced by multiple molecular mechanisms. Key determinants include intratumoral heterogeneity, tumor microenvironment (TME) composition, and the presence of cancer stem cells, all of which significantly affect drug sensitivity [5]. Treatment outcomes are further modulated by complex interactions between pre-existing factors and chemotherapy-induced adaptations, involving apoptotic pathway disruptions, mitochondrial dysfunction, altered autophagy, dysregulated cellular signaling, reduced drug bioavailability and various genetic and epigenetic modifications [6, 7]. Recently, lactylation has emerged as a novel post-translational modification involving the conjugation of lactoyl groups to lysine residues, playing important regulatory roles in gene expression and tumor biology. Emerging evidence demonstrates its multifaceted roles in cancer progression, including promotion of tumor angiogenesis [8, 9] and suppression of immune cell cytotoxicity within the TME [10–12]. As both a metabolic byproduct and signaling molecule, lactate mediates intra- and intercellular communication [13, 14] and enhances chemoresistance through gene expression modulation [15]. Notably, recent studies have shown that upregulated histone lactylation in CRC promotes bevacizumab resistance via RUBCNL-mediated autophagy enhancement [16].
Recent advances in machine learning have demonstrated powerful predictive capabilities, emerging as transformative tools for disease diagnosis and prognosis across medical specialties. These algorithms excel at extracting meaningful patterns from complex datasets, identifying subtle relationships, and enhancing both predictive accuracy and clinical decision-making. In oncology particularly, machine learning offers unprecedented opportunities to decipher resistance mechanisms and develop personalized treatment strategies to overcome chemoresistance [17, 18]. Although machine learning approaches have been widely employed to predict drug responses in malignancies such as breast cancer [19], their application in CRC remains underexplored, representing a promising avenue for advancing precision medicine in CRC management.
Weighted Gene Co-expression Network Analysis (WGCNA) is a systems biology method for identifying biologically relevant gene modules and their associations with phenotypic traits, while simultaneously detecting hub genes [20]. In this study, we employed WGCNA to identify lactylation-associated module genes and subsequently selected candidate genes using multiple machine learning algorithms, including Gradient Boosting Machine (GBM) and Random Forest (RF). The predictive performance of these key genes for chemotherapy response in CRC was rigorously evaluated through receiver operating characteristic (ROC) curve analysis using independent validation datasets. Furthermore, we conducted comprehensive analyses to examine correlations between our lactylation gene signature score and various biological factors, including immune cell infiltration patterns, tumor mutational profiles, functional pathway enrichment and drug sensitivity.
Colorectal cancer (CRC) ranks as the third most prevalent malignancy globally, accounting for an estimated 1.9 million new cases and 903,859 deaths in 2022 [1]. Despite advances in treatment strategies and evolving medical standards, the prognosis of CRC patients remains unfavorable, largely due to frequent late-stage diagnosis [2]. Current first-line chemotherapy regimens for advanced CRC (including FOLFOX, FOLFIRI, FOLFOXIRI, and CAPEOX) demonstrate response rates of only approximately 50%, while non-responders experience delayed clinical benefits and treatment-related adverse effects [3, 4]. These limitations underscore the urgent need for predictive biomarkers to identify patients most likely to benefit from specific chemotherapy regimens.
Chemotherapy response in CRC is influenced by multiple molecular mechanisms. Key determinants include intratumoral heterogeneity, tumor microenvironment (TME) composition, and the presence of cancer stem cells, all of which significantly affect drug sensitivity [5]. Treatment outcomes are further modulated by complex interactions between pre-existing factors and chemotherapy-induced adaptations, involving apoptotic pathway disruptions, mitochondrial dysfunction, altered autophagy, dysregulated cellular signaling, reduced drug bioavailability and various genetic and epigenetic modifications [6, 7]. Recently, lactylation has emerged as a novel post-translational modification involving the conjugation of lactoyl groups to lysine residues, playing important regulatory roles in gene expression and tumor biology. Emerging evidence demonstrates its multifaceted roles in cancer progression, including promotion of tumor angiogenesis [8, 9] and suppression of immune cell cytotoxicity within the TME [10–12]. As both a metabolic byproduct and signaling molecule, lactate mediates intra- and intercellular communication [13, 14] and enhances chemoresistance through gene expression modulation [15]. Notably, recent studies have shown that upregulated histone lactylation in CRC promotes bevacizumab resistance via RUBCNL-mediated autophagy enhancement [16].
Recent advances in machine learning have demonstrated powerful predictive capabilities, emerging as transformative tools for disease diagnosis and prognosis across medical specialties. These algorithms excel at extracting meaningful patterns from complex datasets, identifying subtle relationships, and enhancing both predictive accuracy and clinical decision-making. In oncology particularly, machine learning offers unprecedented opportunities to decipher resistance mechanisms and develop personalized treatment strategies to overcome chemoresistance [17, 18]. Although machine learning approaches have been widely employed to predict drug responses in malignancies such as breast cancer [19], their application in CRC remains underexplored, representing a promising avenue for advancing precision medicine in CRC management.
Weighted Gene Co-expression Network Analysis (WGCNA) is a systems biology method for identifying biologically relevant gene modules and their associations with phenotypic traits, while simultaneously detecting hub genes [20]. In this study, we employed WGCNA to identify lactylation-associated module genes and subsequently selected candidate genes using multiple machine learning algorithms, including Gradient Boosting Machine (GBM) and Random Forest (RF). The predictive performance of these key genes for chemotherapy response in CRC was rigorously evaluated through receiver operating characteristic (ROC) curve analysis using independent validation datasets. Furthermore, we conducted comprehensive analyses to examine correlations between our lactylation gene signature score and various biological factors, including immune cell infiltration patterns, tumor mutational profiles, functional pathway enrichment and drug sensitivity.
Materials and methods
Materials and methods
Datasets searching and screening
The RNA transcriptome data and clinicopathological information of CRC patients were extracted from The Cancer Genome Atlas (TCGA) database(http://cancergenome.nih.gov), encompassing 473 tumor samples and 41 adjacent normal samples. Additionally, RNA-seq data from 307 normal samples were sourced from the Genotype-Tissue Expression (GTEx) project (https://www.gtexportal.org/). Four independent gene expression datasets, namely, GSE19860, GSE62080, GSE62322, and GSE69657 were downloaded from Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo). The chemotherapy response status (responder vs. non-responder) for patients in these GEO datasets was extracted from the accompanying clinical metadata, which was primarily assessed using the Response Evaluation Criteria in Solid Tumors (RECIST). These datasets include patients treated with FOLFOX or FOLFIRI regimens, allowing for analysis of gene expression profiles associated with differential treatment response.
Weighted gene correlation network analysis to determine the core lactylation-related genes
In this study, we utilized the ‘WGCNA’ package in R to investigate gene-phenotype correlations by constructing a gene co-expression network. We began by integrating TCGA COAD and GTEx tissue sample data, filtering out genes with low expression (counts below 5 across all samples). Next, we calculated Pearson correlation matrices for all gene pairs and derived a weighted adjacency matrix using the average linkage method with a weighted correlation coefficient. Applying the ‘soft’ thresholding power (β), we converted the adjacency into a topological overlap matrix (TOM). Using TOM-derived dissimilarity measures, we conducted average linkage hierarchical clustering to categorize genes with similar expression patterns into modules, setting a minimum module size of 200 genes. With a merge threshold of 0.25, we delineated 7 distinct modules. To quantitatively define the “lactylation” trait for the WGCNA analysis, we obtained a validated list of lactylation-related genes from a published authoritative study [21]. Using the GSVA method, we calculated the gene set enrichment score for this predefined gene set in each sample. All samples were then stratified into “high-lactylation” and “low-lactylation” groups based on the median of these GSVA scores. This binary trait was used as the key phenotypic input for the WGCNA. By correlating the constructed gene co-expression network with this “High/Low Lactylation” trait, we identified the specific modules most significantly associated with lactylation status. The genes within these core lactylation-associated modules were extracted and defined as candidate lactylation-related gene sets for further analysis.
Differentially expressed gene screening
We identified differentially expressed genes (DEGs) between chemotherapy responders and non-responders in the GSE19860 dataset using the following criteria: p < 0.05 and |log2(fold change)| >1.5. This analysis was performed using the limma package in R. Furthermore, we intersected these DEGs genes with the candidate lactylation-related gene set obtained from WGCNA to identify lactylation-related DEGs for subsequent analysis.
Function enrichment analysis
To elucidate the biological functions of the identified genes, we performed Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses using the “clusterProfiler” package in R, with a significance threshold of p < 0.05. The ensuing results were graphically represented using R (version 4.3.2).
Machine learning
The study employed a comprehensive machine learning framework incorporating 11 machine learning algorithms, including Ridge Regression (Ridge), Elastic Net (Enet) (with different alpha parameters), Stepwise Generalized Linear Model (Stepglm) (with forward/backward/both selection directions), Support Vector Machine (SVM), Linear Discriminant Analysis (LDA), Generalized Linear Model Boosting (glmBoost), Partial Least Squares Regression for Generalized Linear Models (plsRglm), Random Forest (RF), Gradient Boosting Machine (GBM), Extreme Gradient Boosting (XGBoost), and Naive Bayes. Through a two-layer approach combining feature selection and predictive modeling, 92 algorithm combinations were systematically constructed and evaluated. The signature development process followed these key stages: (a) Lactate-related module genes identified by WGCNA were screened through module-trait association analysis. The DE-lactylation-related genes were obtained by intersecting the candidate module genes with the DEGs between chemotherapy responders and non-responders; (b) Batch correction and normalization of expression profiles from the GEO datasets (GSE19860, GSE62080, GSE62322, GSE69657) were performed using R package “sva” (Figure S1). (c) GSE19860 cohort served as the training set, while the remaining three cohorts (GSE62080, GSE62322, GSE69657) were maintained as independent validation sets. All 92 algorithm combinations were applied to the DE-lactylation-related genes, with model performance evaluated through 10-fold cross-validation on the training set for unbiased model comparison. Model parameters were tuned based on inner cross-validation; (d) The C-index was calculated for each algorithm combination across both training and validation sets. The combination achieving the highest average C-index in validation was selected as optimal. The C-index ranges from 0 to 1, with values closer to 1 indicating better predictive performance.
Gene set variation analysis (GSVA) and gene set enrichment analysis (GSEA)
Based on the 8 lactylation-related genes identified through machine learning, and the GSVA score was computed for each sample using the R package GSVA. Subsequently, samples were categorized into GSVA_high score group and GSVA_low score group according to the median GSVA value for further analysis. To assess biological pathway activities across different groups, we performed GSVA enrichment analysis using the ‘c2.cp.kegg.Hs.symbols’ gene set obtained from the MSigDB online database (http://software.broadinstitute.org/gsea/msigdb). Additionally, Gene Set Enrichment Analysis (GSEA) was conducted to further characterize the functional differences between groups, employing the same gene set as reference. Significantly enriched pathways were identified using a false discovery rate (FDR) threshold of < 0.05.
Immune infiltration analysis
Using CIBERSORT, we analyzed gene expression profiles to quantify immune cell infiltration levels. Comparative analyses were conducted to examine immune cell composition between responders and non-responders in the GSE19860 dataset, as well as between high and low lactylation signature groups in the TCGA COAD cohort. Boxplots were generated to compare relative immune cell abundances across different groups. Furthermore, spearman’s correlation is used to calculate the correlation between immune cells and cells, as well as between the 8 lactylation-related genes and the abundance of immune cells. All correlation analyses were visualized using the “linkET” and “ggplot2” packages in R.
Drug sensitivity analysis
To investigate the heterogeneity in chemotherapy response, we retrieved pharmacological data for commonly used chemotherapeutic agents from the Cancer Drug Sensitivity Genomics (GDSC) Database (https://www.cancerrxgene.org/). Using the oncoPredict package, we estimated the drug sensitivity (IC50) profiles within the context of cancer treatment between high and low lactylation signature groups. The differences in IC50 among these subgroups (Wilcoxon rank test, p < 0.05 and fold change > 1.5) were visually represented through heatmap and box plots created using the “ComplexHeatmap”, “ggplot2” and “ggpubr” packages. Pearson correlation analysis was performed to get the correlation between gene mRNA expression of the 8 lactylation-related genes and drug IC50 values. P-value was adjusted by FDR.
Mutation characteristics of the two risk subgroups
To determine the correlations between gene mutations and the two subgroups defined by the 8 lactylation-related gene signature score (high and low GSVA score groups), we obtained CRC mutation data from the TCGA database. Somatic mutation patterns, including variant characteristics, co-occurrence, and mutual exclusivity between the two subgroups, were comprehensively analyzed using the “Maftools” R package.
Statistical analysis
Statistical analysis was conducted to analyze the data obtained in this study. The ROC curve and the area under the curve (AUC) were constructed using R software (version 4.3.2), and the 95% confidence intervals (CI) was calculated. Group comparisons of clinical characteristics from TCGA, immune cell proportions, and drug sensitivity (IC50) values were conducted using the Wilcoxon rank test. Survival differences between subgroups were assessed via Kaplan-Meier analysis, with statistical significance evaluated by the log-rank test A p-value < 0.05 was considered statistically significant.
Datasets searching and screening
The RNA transcriptome data and clinicopathological information of CRC patients were extracted from The Cancer Genome Atlas (TCGA) database(http://cancergenome.nih.gov), encompassing 473 tumor samples and 41 adjacent normal samples. Additionally, RNA-seq data from 307 normal samples were sourced from the Genotype-Tissue Expression (GTEx) project (https://www.gtexportal.org/). Four independent gene expression datasets, namely, GSE19860, GSE62080, GSE62322, and GSE69657 were downloaded from Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo). The chemotherapy response status (responder vs. non-responder) for patients in these GEO datasets was extracted from the accompanying clinical metadata, which was primarily assessed using the Response Evaluation Criteria in Solid Tumors (RECIST). These datasets include patients treated with FOLFOX or FOLFIRI regimens, allowing for analysis of gene expression profiles associated with differential treatment response.
Weighted gene correlation network analysis to determine the core lactylation-related genes
In this study, we utilized the ‘WGCNA’ package in R to investigate gene-phenotype correlations by constructing a gene co-expression network. We began by integrating TCGA COAD and GTEx tissue sample data, filtering out genes with low expression (counts below 5 across all samples). Next, we calculated Pearson correlation matrices for all gene pairs and derived a weighted adjacency matrix using the average linkage method with a weighted correlation coefficient. Applying the ‘soft’ thresholding power (β), we converted the adjacency into a topological overlap matrix (TOM). Using TOM-derived dissimilarity measures, we conducted average linkage hierarchical clustering to categorize genes with similar expression patterns into modules, setting a minimum module size of 200 genes. With a merge threshold of 0.25, we delineated 7 distinct modules. To quantitatively define the “lactylation” trait for the WGCNA analysis, we obtained a validated list of lactylation-related genes from a published authoritative study [21]. Using the GSVA method, we calculated the gene set enrichment score for this predefined gene set in each sample. All samples were then stratified into “high-lactylation” and “low-lactylation” groups based on the median of these GSVA scores. This binary trait was used as the key phenotypic input for the WGCNA. By correlating the constructed gene co-expression network with this “High/Low Lactylation” trait, we identified the specific modules most significantly associated with lactylation status. The genes within these core lactylation-associated modules were extracted and defined as candidate lactylation-related gene sets for further analysis.
Differentially expressed gene screening
We identified differentially expressed genes (DEGs) between chemotherapy responders and non-responders in the GSE19860 dataset using the following criteria: p < 0.05 and |log2(fold change)| >1.5. This analysis was performed using the limma package in R. Furthermore, we intersected these DEGs genes with the candidate lactylation-related gene set obtained from WGCNA to identify lactylation-related DEGs for subsequent analysis.
Function enrichment analysis
To elucidate the biological functions of the identified genes, we performed Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses using the “clusterProfiler” package in R, with a significance threshold of p < 0.05. The ensuing results were graphically represented using R (version 4.3.2).
Machine learning
The study employed a comprehensive machine learning framework incorporating 11 machine learning algorithms, including Ridge Regression (Ridge), Elastic Net (Enet) (with different alpha parameters), Stepwise Generalized Linear Model (Stepglm) (with forward/backward/both selection directions), Support Vector Machine (SVM), Linear Discriminant Analysis (LDA), Generalized Linear Model Boosting (glmBoost), Partial Least Squares Regression for Generalized Linear Models (plsRglm), Random Forest (RF), Gradient Boosting Machine (GBM), Extreme Gradient Boosting (XGBoost), and Naive Bayes. Through a two-layer approach combining feature selection and predictive modeling, 92 algorithm combinations were systematically constructed and evaluated. The signature development process followed these key stages: (a) Lactate-related module genes identified by WGCNA were screened through module-trait association analysis. The DE-lactylation-related genes were obtained by intersecting the candidate module genes with the DEGs between chemotherapy responders and non-responders; (b) Batch correction and normalization of expression profiles from the GEO datasets (GSE19860, GSE62080, GSE62322, GSE69657) were performed using R package “sva” (Figure S1). (c) GSE19860 cohort served as the training set, while the remaining three cohorts (GSE62080, GSE62322, GSE69657) were maintained as independent validation sets. All 92 algorithm combinations were applied to the DE-lactylation-related genes, with model performance evaluated through 10-fold cross-validation on the training set for unbiased model comparison. Model parameters were tuned based on inner cross-validation; (d) The C-index was calculated for each algorithm combination across both training and validation sets. The combination achieving the highest average C-index in validation was selected as optimal. The C-index ranges from 0 to 1, with values closer to 1 indicating better predictive performance.
Gene set variation analysis (GSVA) and gene set enrichment analysis (GSEA)
Based on the 8 lactylation-related genes identified through machine learning, and the GSVA score was computed for each sample using the R package GSVA. Subsequently, samples were categorized into GSVA_high score group and GSVA_low score group according to the median GSVA value for further analysis. To assess biological pathway activities across different groups, we performed GSVA enrichment analysis using the ‘c2.cp.kegg.Hs.symbols’ gene set obtained from the MSigDB online database (http://software.broadinstitute.org/gsea/msigdb). Additionally, Gene Set Enrichment Analysis (GSEA) was conducted to further characterize the functional differences between groups, employing the same gene set as reference. Significantly enriched pathways were identified using a false discovery rate (FDR) threshold of < 0.05.
Immune infiltration analysis
Using CIBERSORT, we analyzed gene expression profiles to quantify immune cell infiltration levels. Comparative analyses were conducted to examine immune cell composition between responders and non-responders in the GSE19860 dataset, as well as between high and low lactylation signature groups in the TCGA COAD cohort. Boxplots were generated to compare relative immune cell abundances across different groups. Furthermore, spearman’s correlation is used to calculate the correlation between immune cells and cells, as well as between the 8 lactylation-related genes and the abundance of immune cells. All correlation analyses were visualized using the “linkET” and “ggplot2” packages in R.
Drug sensitivity analysis
To investigate the heterogeneity in chemotherapy response, we retrieved pharmacological data for commonly used chemotherapeutic agents from the Cancer Drug Sensitivity Genomics (GDSC) Database (https://www.cancerrxgene.org/). Using the oncoPredict package, we estimated the drug sensitivity (IC50) profiles within the context of cancer treatment between high and low lactylation signature groups. The differences in IC50 among these subgroups (Wilcoxon rank test, p < 0.05 and fold change > 1.5) were visually represented through heatmap and box plots created using the “ComplexHeatmap”, “ggplot2” and “ggpubr” packages. Pearson correlation analysis was performed to get the correlation between gene mRNA expression of the 8 lactylation-related genes and drug IC50 values. P-value was adjusted by FDR.
Mutation characteristics of the two risk subgroups
To determine the correlations between gene mutations and the two subgroups defined by the 8 lactylation-related gene signature score (high and low GSVA score groups), we obtained CRC mutation data from the TCGA database. Somatic mutation patterns, including variant characteristics, co-occurrence, and mutual exclusivity between the two subgroups, were comprehensively analyzed using the “Maftools” R package.
Statistical analysis
Statistical analysis was conducted to analyze the data obtained in this study. The ROC curve and the area under the curve (AUC) were constructed using R software (version 4.3.2), and the 95% confidence intervals (CI) was calculated. Group comparisons of clinical characteristics from TCGA, immune cell proportions, and drug sensitivity (IC50) values were conducted using the Wilcoxon rank test. Survival differences between subgroups were assessed via Kaplan-Meier analysis, with statistical significance evaluated by the log-rank test A p-value < 0.05 was considered statistically significant.
Results
Results
Characteristics of the included datasets
The research flow chart depicts the study design (Fig. 1). Four GEO datasets were included in the further analysis and characteristics of datasets were displayed in Table 1. The dataset exhibited batch effects, which were successfully removed (Figure S1). In summary, all datasets employed the Affymetrix Human Genome U133 Plus 2.0 Array for microarray analysis. GSE19860 were composed of 25 non-responder (NR) and 15 responder (R) samples of CRCs (Table 1). GSE62080 had 12 NR and 9 R samples. GSE62322 had 31 NR and 26 R samples, while GSE69657 had 17 NR and 13 R samples. Treatment response evaluation was performed at the end of the first-line FOLFOX treatment, and response rates varied from 31.03% to 60.60% across datasets.
Identification of differentially expressed genes
To investigate the factors that contribute to the response to chemotherapy in CRC, we compared expression profiles of chemotherapy responders and non-responders within the GSE19860 cohort, identifying a total of 780 differentially expressed genes (DEGs), comprising 353 upregulated and 427 downregulated genes (Fig. 2A), following the removal of duplicate values and multi-annotated probes. We referred to these DEGs as CRC chemotherapy response-associated genes.
A heatmap plot of the DEGs is presented in Fig. 2B. KEGG analysis revealed that the DEGs were related to Diabetic cardiomyopathy, Proteasome, Glutathione metabolism, Chemical carcinogenesis-reactive oxygen species, Thermogenesis, Citrate cycle (TCA cycle) in the KEGG pathway (Fig. 2C), while DEGs were related to aerobic respiration, oxidative phosphorylation, mitochondrial translation and ATP synthesis coupled electron transport in the biological process (BP) category (Fig. 2D). Regarding the molecular function (MF) category, DEGs were associated with structural constituent of ribosome, peroxidase activity, electron transfer activity and antioxidant activity etc. Moreover, they were principally enriched in mitochondrial protein-containing complex, mitochondrial inner membrane, mitochondrial matrix in the cellular component (CC) category (Fig. 2D).
Weighted gene co-expression network analysis and critical module identification
Based on the integrated expression matrix of 34,972 genes from TCGA COAD and GTEx (473 tumor and 348 normal samples), we established a weighted gene coexpression network. In Fig. 3A, the horizontal axis represents the threshold, and the vertical axis represents the evaluation parameters of the scale-free networks. As the network’s evaluation parameters increase, they become more consistent with the characteristics of a scale-free network. The horizontal line in the graph indicates a threshold value of 0.90. To create a scale-free network (Fig. 3A), we established a soft threshold power β of 9 based on the connection between the soft threshold and mean connectivity. The genes with similar expression patterns clustered together, and modules with a cut height difference of < 0.25 were merged, which ultimately produced 7 coexpression modules (Fig. 3B). The results revealed that 1628 genes in the blue module were strongly associated with CRC (cor = 0.9, p < 0.0001) and lactylation high score (cor = 0.84, p < 0.0001) (Fig. 3C). Therefore, 1628 hub genes of the blue module were subjected to further investigation.
We implemented enrichment analysis for the intersection of genes from DEGs and blue module genes. A total of 50 common genes were obtained, as shown in Fig. 3D. KEGG analysis elucidated that common genes were involved in “Cell cycle”, “Cellular senescenece”, “DNA replication” and “p53 signaling pathway”, as shown in Figure S2A. The results of GO analysis revealed that common genes were enriched in BP terms, including “regulation of mitotic sister chromatid segregation”, “negative regulation of chromosome segregation”, and “mitotic cell cycle checkpoint signaling”, as shown in Figure S2B. For CC ontology, the common genes are involved in “spindle microtubule”, “spindle”, “cyclin-dependent protein kinase holoenzyme complex”, “chromosomal region”. For MF, the results showed that “protein serine kinase activity”, “cytokine receptor activity”, “immune receptor activity” was the most significant term in common genes.
Identification of candidate hub genes via machine learning
To develop a robust chemotherapy response signature, we utilized a comprehensive machine learning framework on 50 candidate genes from the GSE19860 training dataset. This framework integrated feature selection and predictive modeling phases, encompassing 92 algorithm combinations to construct candidate models. The predictive performance of each combination was evaluated and compared using 10-fold cross-validation applied internally to the training set to ensure unbiased model selection. Model parameters were tuned through inner cross-validation during this process. The C-index was calculated for each model across both training and independent validation datasets. Notably, the signature derived from the ‘RF + GBM’ combination demonstrated the highest average C-index across all validation sets, as shown in Fig. 4A. Through variable importance analysis derived from both RF and GBM algorithms, this optimal model consistently identified 8 critical genes—CCNB1, CENPK, CXADR, HN1, MGAT4B, UBE2T, UGT8, and ZDHHC23—that were closely associated with the chemotherapy response in CRC patients.
To further evaluate the performance of the prediction model in response to chemotherapy in different CRC patients, the receiver operating characteristic curves of 8 feature genes obtained by machine learning method were drawn, and the AUC values of 8 disease feature genes were all > 0.5. Finally, the AUC values in the test dataset (GSE62080, GSE62322, GSE69657) were 0.824, 0.640, and 0.769, respectively (Fig. 4B), indicating its relatively good performance in predicting the response of patients to chemotherapy. Further, the ROC of genes in all datasets indicating that these 8 genes were characteristic genes of chemotherapy response with certain accuracy, as shown in Figure S3.
Functional analysis of the lactylation-related risk signature
Further, the results showed that 8 lactylation-related genes were significantly highly expressed in the CRC chemotherapy response group (Fig. 5A, B). The GSVA score was calculated based on the 8 lactylation-related genes in GSE19860, and patients were divided into high and low groups using the median score as the cut-off value to evaluate the performance of the prediction model (Figure S4). For KEGG enrichment of GSEA analysis, the results showed that cell cycle, oxidative phosphorylation and proteasome etc. were significantly enriched in the high score group, while cytokine-cytokine receptor interaction and neuroactivate ligand receptor interaction pathways were significantly enriched in the low score group (Fig. 5C). According to the KEGG analysis of GSVA, the genes in high score groups were enriched in signaling pathways such as vasopressin regulated water reabsorption, vibrio cholerae infection, endocytosis, p53 signaling pathway, while low score group involved in JAK-STAT signaling pathway, Toll like receptor signaling pathway, MAPK signaling pathway etc. (Fig. 5D).
Evaluation of the performance of the lactylation-related risk signature
To assess the performance of the module, patients were stratified into high- and low-score categories using the median score as the cutoff of GSVA of 7 lactylation-related genes (HN1 was excluded due to missing expression data) in TCGA samples (Figure S4). Investigation of the association between the grouping characteristics and clinical data indicated significant differences in the tumor and normal comparison (Fig. 6A). However, no significant differences were observed in age, gender and TNM stage (Fig. 6B-D). The Kaplan-Meier (KM) curve demonstrated that the low-score group displayed markedly worse survival outcomes (p = 0.025) (Fig. 6E). Although none of the individual genes had significant survival (Figure S5). These findings suggest that the model-related response signature gene sets may serve as predictive markers for the survival prognosis of CRC patients.
Immune infiltration analysis
Considering the crucial role of immune infiltration in cancer progression, we used the CIBERSORT algorithm to quantify the difference in the abundance of immune cell infiltration between the high- and low-score groups in the TCGA dataset. As shown in Fig. 7A, there were significant differences in the infiltration levels of 9 immune cell types between the high- and low-score groups, such as high-score group contained more CD4 memory T cells, follicular helper T, dendritic activated cells and eosinophils, whereas the low-score group had higher contents in Tregs and M2 macrophages.
Notably, among the 8 genes utilized in constructing the signature, the expression levels of 7 genes (CCNB1, CENPK, CXADR, HN1, MGAT4B, UBE2T, UGT8, and ZDHHC23) were significantly negatively linked to the abundance of Tregs (P < 0.05) (Fig. 7B). Five genes were significantly positively correlated with the CD4 memory activated T cells. Three genes (CENPK, CXADR, and ZDHHC23) were significantly negatively correlated with CD8 T cells. Furthermore, we assessed the immune cell infiltration between the NR and R groups in GSE19860. Interestingly, there was no significant difference in the abundance of all cell subsets between the two groups (Figure S6A). However, 8 lactation genes were still significantly associated with many immune cell infiltrates (Figure S6B). For example, NK activated cells were significantly negatively correlated with three genes (MGAT4B, CXADR and HN1). Both MGAT4B and CXADR were significantly correlated with a large number of immune cell abundance. These findings suggest that the 8 lactylation-related gene predictive signature may be associated with immune cell infiltration levels and could potentially serve as a biomarker for predicting the tumor immune microenvironment of CRC.
Genomic variation landscape and intra-tumor heterogeneity in different lactylation-gene subgroups
To investigate the differences in genomic mutations between lactylation-gene subgroups, we depicted the mutation landscape between high- and low-score groups (Fig. 8A). We found a distinct mutation profile between the two groups. For instance, the APC gene encodes a tumor suppressor protein that acts as an antagonist of the Wnt signaling pathway, and is involved in the cell migration and adhesion, transcriptional activation, and apoptosis. Mutations in the APC gene occur in most CRCs. In this study, APC mutations showed a frequency of 75% in the high-score group, higher than the 67% in the low-score group. By contrast, mutations in SYNE1 (a gene associated with autosomal recessive spinocerebellar ataxia type 8) occurred at 28% in the high-score group compared to 32% in the low-score group. The divergent mutational profiles between subgroups could account for the varying prognostic outcomes observed in lactylation-gene subgroups. Additionally, the co-occurrence and exclusivity of mutations among the top 25 mutated genes in high- and low-score groups were analyzed. The results revealed a higher prevalence of mutually exclusive mutations in the low-score group, as depicted in Fig. 8B. For instance, TP53 was mutually exclusive with more than half of the genes in the low group, but only with three genes in the high group.
Analysis of the correlation between the lactylation-gene and drug sensitivity and validation of gene expression
We examined the sensitivity profiles of the lactylation-gene score subgroups using data from GDSC2 database (https://www.cancerrxgene.org/downloads/drug_data). Our findings indicate that the high-score group exhibited a significantly higher half-maximal inhibitory concentration (IC50) for nutlin-3a (Fig. 9A, B), and expression levels of CCNB1, CXADR, UBE2T, UGT8, ZDHHC23 showed positive correlation with nutlin-3a IC50 (Fig. 9C). Conversely, the high-score group demonstrated significantly lower IC50 for 5-Fluorouracil (Fig. 9A, B), while CCNB1, CXADR, MGAT4B expression showed positive correlation with 5-Fluorouracil IC50 (Fig. 9C). These findings suggest that patients in the high-score group may respond better to 5-Fluorouracil treatment, whereas those in the low-score group may be more sensitive to nutlin-3a.
Characteristics of the included datasets
The research flow chart depicts the study design (Fig. 1). Four GEO datasets were included in the further analysis and characteristics of datasets were displayed in Table 1. The dataset exhibited batch effects, which were successfully removed (Figure S1). In summary, all datasets employed the Affymetrix Human Genome U133 Plus 2.0 Array for microarray analysis. GSE19860 were composed of 25 non-responder (NR) and 15 responder (R) samples of CRCs (Table 1). GSE62080 had 12 NR and 9 R samples. GSE62322 had 31 NR and 26 R samples, while GSE69657 had 17 NR and 13 R samples. Treatment response evaluation was performed at the end of the first-line FOLFOX treatment, and response rates varied from 31.03% to 60.60% across datasets.
Identification of differentially expressed genes
To investigate the factors that contribute to the response to chemotherapy in CRC, we compared expression profiles of chemotherapy responders and non-responders within the GSE19860 cohort, identifying a total of 780 differentially expressed genes (DEGs), comprising 353 upregulated and 427 downregulated genes (Fig. 2A), following the removal of duplicate values and multi-annotated probes. We referred to these DEGs as CRC chemotherapy response-associated genes.
A heatmap plot of the DEGs is presented in Fig. 2B. KEGG analysis revealed that the DEGs were related to Diabetic cardiomyopathy, Proteasome, Glutathione metabolism, Chemical carcinogenesis-reactive oxygen species, Thermogenesis, Citrate cycle (TCA cycle) in the KEGG pathway (Fig. 2C), while DEGs were related to aerobic respiration, oxidative phosphorylation, mitochondrial translation and ATP synthesis coupled electron transport in the biological process (BP) category (Fig. 2D). Regarding the molecular function (MF) category, DEGs were associated with structural constituent of ribosome, peroxidase activity, electron transfer activity and antioxidant activity etc. Moreover, they were principally enriched in mitochondrial protein-containing complex, mitochondrial inner membrane, mitochondrial matrix in the cellular component (CC) category (Fig. 2D).
Weighted gene co-expression network analysis and critical module identification
Based on the integrated expression matrix of 34,972 genes from TCGA COAD and GTEx (473 tumor and 348 normal samples), we established a weighted gene coexpression network. In Fig. 3A, the horizontal axis represents the threshold, and the vertical axis represents the evaluation parameters of the scale-free networks. As the network’s evaluation parameters increase, they become more consistent with the characteristics of a scale-free network. The horizontal line in the graph indicates a threshold value of 0.90. To create a scale-free network (Fig. 3A), we established a soft threshold power β of 9 based on the connection between the soft threshold and mean connectivity. The genes with similar expression patterns clustered together, and modules with a cut height difference of < 0.25 were merged, which ultimately produced 7 coexpression modules (Fig. 3B). The results revealed that 1628 genes in the blue module were strongly associated with CRC (cor = 0.9, p < 0.0001) and lactylation high score (cor = 0.84, p < 0.0001) (Fig. 3C). Therefore, 1628 hub genes of the blue module were subjected to further investigation.
We implemented enrichment analysis for the intersection of genes from DEGs and blue module genes. A total of 50 common genes were obtained, as shown in Fig. 3D. KEGG analysis elucidated that common genes were involved in “Cell cycle”, “Cellular senescenece”, “DNA replication” and “p53 signaling pathway”, as shown in Figure S2A. The results of GO analysis revealed that common genes were enriched in BP terms, including “regulation of mitotic sister chromatid segregation”, “negative regulation of chromosome segregation”, and “mitotic cell cycle checkpoint signaling”, as shown in Figure S2B. For CC ontology, the common genes are involved in “spindle microtubule”, “spindle”, “cyclin-dependent protein kinase holoenzyme complex”, “chromosomal region”. For MF, the results showed that “protein serine kinase activity”, “cytokine receptor activity”, “immune receptor activity” was the most significant term in common genes.
Identification of candidate hub genes via machine learning
To develop a robust chemotherapy response signature, we utilized a comprehensive machine learning framework on 50 candidate genes from the GSE19860 training dataset. This framework integrated feature selection and predictive modeling phases, encompassing 92 algorithm combinations to construct candidate models. The predictive performance of each combination was evaluated and compared using 10-fold cross-validation applied internally to the training set to ensure unbiased model selection. Model parameters were tuned through inner cross-validation during this process. The C-index was calculated for each model across both training and independent validation datasets. Notably, the signature derived from the ‘RF + GBM’ combination demonstrated the highest average C-index across all validation sets, as shown in Fig. 4A. Through variable importance analysis derived from both RF and GBM algorithms, this optimal model consistently identified 8 critical genes—CCNB1, CENPK, CXADR, HN1, MGAT4B, UBE2T, UGT8, and ZDHHC23—that were closely associated with the chemotherapy response in CRC patients.
To further evaluate the performance of the prediction model in response to chemotherapy in different CRC patients, the receiver operating characteristic curves of 8 feature genes obtained by machine learning method were drawn, and the AUC values of 8 disease feature genes were all > 0.5. Finally, the AUC values in the test dataset (GSE62080, GSE62322, GSE69657) were 0.824, 0.640, and 0.769, respectively (Fig. 4B), indicating its relatively good performance in predicting the response of patients to chemotherapy. Further, the ROC of genes in all datasets indicating that these 8 genes were characteristic genes of chemotherapy response with certain accuracy, as shown in Figure S3.
Functional analysis of the lactylation-related risk signature
Further, the results showed that 8 lactylation-related genes were significantly highly expressed in the CRC chemotherapy response group (Fig. 5A, B). The GSVA score was calculated based on the 8 lactylation-related genes in GSE19860, and patients were divided into high and low groups using the median score as the cut-off value to evaluate the performance of the prediction model (Figure S4). For KEGG enrichment of GSEA analysis, the results showed that cell cycle, oxidative phosphorylation and proteasome etc. were significantly enriched in the high score group, while cytokine-cytokine receptor interaction and neuroactivate ligand receptor interaction pathways were significantly enriched in the low score group (Fig. 5C). According to the KEGG analysis of GSVA, the genes in high score groups were enriched in signaling pathways such as vasopressin regulated water reabsorption, vibrio cholerae infection, endocytosis, p53 signaling pathway, while low score group involved in JAK-STAT signaling pathway, Toll like receptor signaling pathway, MAPK signaling pathway etc. (Fig. 5D).
Evaluation of the performance of the lactylation-related risk signature
To assess the performance of the module, patients were stratified into high- and low-score categories using the median score as the cutoff of GSVA of 7 lactylation-related genes (HN1 was excluded due to missing expression data) in TCGA samples (Figure S4). Investigation of the association between the grouping characteristics and clinical data indicated significant differences in the tumor and normal comparison (Fig. 6A). However, no significant differences were observed in age, gender and TNM stage (Fig. 6B-D). The Kaplan-Meier (KM) curve demonstrated that the low-score group displayed markedly worse survival outcomes (p = 0.025) (Fig. 6E). Although none of the individual genes had significant survival (Figure S5). These findings suggest that the model-related response signature gene sets may serve as predictive markers for the survival prognosis of CRC patients.
Immune infiltration analysis
Considering the crucial role of immune infiltration in cancer progression, we used the CIBERSORT algorithm to quantify the difference in the abundance of immune cell infiltration between the high- and low-score groups in the TCGA dataset. As shown in Fig. 7A, there were significant differences in the infiltration levels of 9 immune cell types between the high- and low-score groups, such as high-score group contained more CD4 memory T cells, follicular helper T, dendritic activated cells and eosinophils, whereas the low-score group had higher contents in Tregs and M2 macrophages.
Notably, among the 8 genes utilized in constructing the signature, the expression levels of 7 genes (CCNB1, CENPK, CXADR, HN1, MGAT4B, UBE2T, UGT8, and ZDHHC23) were significantly negatively linked to the abundance of Tregs (P < 0.05) (Fig. 7B). Five genes were significantly positively correlated with the CD4 memory activated T cells. Three genes (CENPK, CXADR, and ZDHHC23) were significantly negatively correlated with CD8 T cells. Furthermore, we assessed the immune cell infiltration between the NR and R groups in GSE19860. Interestingly, there was no significant difference in the abundance of all cell subsets between the two groups (Figure S6A). However, 8 lactation genes were still significantly associated with many immune cell infiltrates (Figure S6B). For example, NK activated cells were significantly negatively correlated with three genes (MGAT4B, CXADR and HN1). Both MGAT4B and CXADR were significantly correlated with a large number of immune cell abundance. These findings suggest that the 8 lactylation-related gene predictive signature may be associated with immune cell infiltration levels and could potentially serve as a biomarker for predicting the tumor immune microenvironment of CRC.
Genomic variation landscape and intra-tumor heterogeneity in different lactylation-gene subgroups
To investigate the differences in genomic mutations between lactylation-gene subgroups, we depicted the mutation landscape between high- and low-score groups (Fig. 8A). We found a distinct mutation profile between the two groups. For instance, the APC gene encodes a tumor suppressor protein that acts as an antagonist of the Wnt signaling pathway, and is involved in the cell migration and adhesion, transcriptional activation, and apoptosis. Mutations in the APC gene occur in most CRCs. In this study, APC mutations showed a frequency of 75% in the high-score group, higher than the 67% in the low-score group. By contrast, mutations in SYNE1 (a gene associated with autosomal recessive spinocerebellar ataxia type 8) occurred at 28% in the high-score group compared to 32% in the low-score group. The divergent mutational profiles between subgroups could account for the varying prognostic outcomes observed in lactylation-gene subgroups. Additionally, the co-occurrence and exclusivity of mutations among the top 25 mutated genes in high- and low-score groups were analyzed. The results revealed a higher prevalence of mutually exclusive mutations in the low-score group, as depicted in Fig. 8B. For instance, TP53 was mutually exclusive with more than half of the genes in the low group, but only with three genes in the high group.
Analysis of the correlation between the lactylation-gene and drug sensitivity and validation of gene expression
We examined the sensitivity profiles of the lactylation-gene score subgroups using data from GDSC2 database (https://www.cancerrxgene.org/downloads/drug_data). Our findings indicate that the high-score group exhibited a significantly higher half-maximal inhibitory concentration (IC50) for nutlin-3a (Fig. 9A, B), and expression levels of CCNB1, CXADR, UBE2T, UGT8, ZDHHC23 showed positive correlation with nutlin-3a IC50 (Fig. 9C). Conversely, the high-score group demonstrated significantly lower IC50 for 5-Fluorouracil (Fig. 9A, B), while CCNB1, CXADR, MGAT4B expression showed positive correlation with 5-Fluorouracil IC50 (Fig. 9C). These findings suggest that patients in the high-score group may respond better to 5-Fluorouracil treatment, whereas those in the low-score group may be more sensitive to nutlin-3a.
Discussion
Discussion
Chemotherapy is a primary treatment option for advanced CRC patients, yet only approximately 50% of these patients achieve an objective response. Lactylation modification, which occurs widely on both histone and non-histone proteins, has been increasingly implicated in cellular metabolism and cancer immunity [22]. Emerging evidence indicates that protein lactylation is essential in cancer development and progression [23]. In this study, for the first time, we comprehensively explored the potential roles of lactylation-related genes in the role of CRC chemotherapy response. Firstly, DEGs related to CRC chemotherapy response were identified, and then we intersected them with key modules derived from WGCNA. This yielded 50 candidate genes, which were subsequently subjected to machine learning analysis using 92 algorithm combinations. An 8 lactylation-related gene signature derived from the ‘RF + GBM’ combination model demonstrated the highest mean C-index across all training and validation datasets. The signature genes—CCNB1, CENPK, CXADR, HN1, MGAT4B, UBE2T, UGT8, and ZDHHC23—were identified as the most critical genes closely related to the chemotherapy response in CRC patients.
Among the 8 essential genes comprising our lactylation-related signature, all were significantly over-expressed in CRC tissues from patients who responded to chemotherapy compared to those who did not. This collective expression pattern underscores their potential role in mediating chemotherapy response in CRC, though their individual functional contexts reveal considerable complexity. The gene CCNB1 encodes a regulatory protein critical for mitosis and the G2/M phase transition, forming the maturation-promoting factor with p34(cdc2). Aberrant CCNB1 expression is common in tumors and has been associated with chemosensitivity in triple-negative breast cancer [24], as well as with tumor grade and metastasis in many cancers [25, 26]. CENPK, essential for kinetochore function and chromosome segregation [27], is associated with poorer survival in CRC [28]. While high CENPK expression has been linked to chemoresistance, metastasis, and proliferation in cervical cancer [29]. CXADR, initially identified as a receptor, shows prognostic significance in pancreatic adenocarcinoma where high expression correlates with shorter overall and disease-specific survival [30]. HN1 is a ubiquitously expressed and highly conserved gene associated with malignant behaviors of the cells and poor prognosis [31]. Studies demonstrate that HN1 silencing disrupts cell cycle progression through effects on regulators including Cyclin B1 [32]. UBE2T is a member of the UBE2 superfamily and is essential for the second step of ubiquitination, being closely linked to oncogenesis, metastasis, survival, and prognosis in cancer patients [33]. UGT8, encoding a key enzyme in the glucuronate pathway, typically contributes to inactivate the metabolism of chemotherapeutic agents [34]. ZDHHC23, a palmitoyl-acyltransferase, modulates protein localization and function through palmitoylation, with recent evidence implicating such enzymes in cancer cell proliferation, survival, and therapeutic resistance [35]. Notably, while several signature genes (including CENPK and UGT8) have been reported in the literature to be associated with chemoresistance in specific contexts, their presence in our 8-gene signature, which collectively defines a high lactylation-score subgroup showing better chemotherapy response, suggests that these genes may exert dual functions depending on the cellular background. This observation highlights the complexity of how individual genes function within a multi-gene predictive signature.
Numerous studies have developed devise approaches to predict therapeutic responses of CRC patients to specific chemotherapy regimens. For example, an online prognostic tool based on XGBoost has been established, incorporating metabolism-related SNPs and inflammatory markers to improve treatment precision and facilitate personalized chemotherapy strategies [36]. In stage II CRC patients with elevated PAX8-AS1 levels, fluorouracil-based adjuvant chemotherapy significantly prolonged relapse-free survival. PAX8-AS1 was shown to enhance chemosensitivity in CRC cells in vitro and in vivo by stabilizing PAX8 mRNA [37]. Which suggests the genetic features represent crucial mechanisms underlying chemotherapy response. In our study, patients were divided into high- and low-lactylation gene score groups. KEGG and GSVA enrichment analyses showed that cell cycle, oxidative phosphorylation and proteasome, and p53 signaling pathway were significantly enriched in the high lactylation score group. The p53 signaling pathway is well-established as a key regulator of chemosensitivity [38, 39]. Our data suggest that lactylation may influence chemotherapy response in CRC through regulation of the cell cycle and activation of p53 signaling. Emerging evidence indicates that protein lactylation may affect the response of cancer cells to chemotherapy [40]. Lactate modulates cancer progression through various mechanisms, including cell cycle regulation, immune suppression, and energy metabolism, highlighting the therapeutic potential of targeting lactate-related processes [41]. Interestingly, p53 lactylation is associated with poor prognosis in cancer patients with wild-type p53, and suppression of p53 lactylation attenuates tumorigenesis in animal models [42], further supporting the functional importance of lactylation in cancer therapy.
Immune cell infiltration analysis revealed significant differences in 9 immune cell types between the high- and low-score groups, such as CD4 memory T cells, follicular helper T, Tregs, M2 macrophages, activated dendritic cells and eosinophils. Notably, the expression levels of the majority of lactylation signature genes were significantly negatively correlated with Treg infiltration frequency, consistent with recent findings [43]. Reports indicates that histone lactylation modulates gene expression in both tumor and immune cells, contributing to malignancy and immunosuppression, while non-histone protein lactylation also regulates tumor progression and treatment resistance [44]. Furthermore, our drug sensitivity analysis demonstrated the potential of lactylation risk score in guiding medication selection for CRC patients. Evaluation of IC50 levels of 138 widely used chemotherapeutic agents revealed that patients in high lactylation score group exhibited greater sensitivity to 5-Fluorouracil. This aligns with our survival and treatment response data, confirming that the high-score group, characterized by a specific lactylation-driven transcriptome, represents a distinct subtype with enhanced sensitivity to conventional chemotherapy.
Gene regulatory network modeling, focusing on how genes interact and collaborate within modules to fulfill biological roles, has been broadly utilized in genomic analyses and is recognized for its efficacy in elucidating the intricacies of complex biological systems [45]. In this study, we evaluated over 11 machine learning algorithms, with model performance rigorously assessed through cross-validation. While the 8 gene lactylation signature demonstrated overall strong predictive performance for chemotherapy response, we observed some variability in its efficacy across independent validation cohorts (AUCs: 0.824, 0.640, 0.769). This variation may be attributed to several factors, including differences in chemotherapy regimens (FOLFOX versus FOLFIRI) which involve distinct molecular mechanisms, inherent heterogeneity in patient populations across datasets, and the limited sample sizes of some validation cohorts. Ensuring consistency in data sources could enhance model relevance. Our study’s lactylation risk score not only demonstrated strong predictive performance for chemotherapy response but also connected gene mutation and immune cell infiltration, shedding light on potential mechanisms. However, to bolster the robustness of our findings, it is crucial to include a larger sample size and diverse validation cohorts. Moreover, the molecular mechanisms underlying the hub genes of lactylation in CRC remain elusive and warrant further investigation.
Chemotherapy is a primary treatment option for advanced CRC patients, yet only approximately 50% of these patients achieve an objective response. Lactylation modification, which occurs widely on both histone and non-histone proteins, has been increasingly implicated in cellular metabolism and cancer immunity [22]. Emerging evidence indicates that protein lactylation is essential in cancer development and progression [23]. In this study, for the first time, we comprehensively explored the potential roles of lactylation-related genes in the role of CRC chemotherapy response. Firstly, DEGs related to CRC chemotherapy response were identified, and then we intersected them with key modules derived from WGCNA. This yielded 50 candidate genes, which were subsequently subjected to machine learning analysis using 92 algorithm combinations. An 8 lactylation-related gene signature derived from the ‘RF + GBM’ combination model demonstrated the highest mean C-index across all training and validation datasets. The signature genes—CCNB1, CENPK, CXADR, HN1, MGAT4B, UBE2T, UGT8, and ZDHHC23—were identified as the most critical genes closely related to the chemotherapy response in CRC patients.
Among the 8 essential genes comprising our lactylation-related signature, all were significantly over-expressed in CRC tissues from patients who responded to chemotherapy compared to those who did not. This collective expression pattern underscores their potential role in mediating chemotherapy response in CRC, though their individual functional contexts reveal considerable complexity. The gene CCNB1 encodes a regulatory protein critical for mitosis and the G2/M phase transition, forming the maturation-promoting factor with p34(cdc2). Aberrant CCNB1 expression is common in tumors and has been associated with chemosensitivity in triple-negative breast cancer [24], as well as with tumor grade and metastasis in many cancers [25, 26]. CENPK, essential for kinetochore function and chromosome segregation [27], is associated with poorer survival in CRC [28]. While high CENPK expression has been linked to chemoresistance, metastasis, and proliferation in cervical cancer [29]. CXADR, initially identified as a receptor, shows prognostic significance in pancreatic adenocarcinoma where high expression correlates with shorter overall and disease-specific survival [30]. HN1 is a ubiquitously expressed and highly conserved gene associated with malignant behaviors of the cells and poor prognosis [31]. Studies demonstrate that HN1 silencing disrupts cell cycle progression through effects on regulators including Cyclin B1 [32]. UBE2T is a member of the UBE2 superfamily and is essential for the second step of ubiquitination, being closely linked to oncogenesis, metastasis, survival, and prognosis in cancer patients [33]. UGT8, encoding a key enzyme in the glucuronate pathway, typically contributes to inactivate the metabolism of chemotherapeutic agents [34]. ZDHHC23, a palmitoyl-acyltransferase, modulates protein localization and function through palmitoylation, with recent evidence implicating such enzymes in cancer cell proliferation, survival, and therapeutic resistance [35]. Notably, while several signature genes (including CENPK and UGT8) have been reported in the literature to be associated with chemoresistance in specific contexts, their presence in our 8-gene signature, which collectively defines a high lactylation-score subgroup showing better chemotherapy response, suggests that these genes may exert dual functions depending on the cellular background. This observation highlights the complexity of how individual genes function within a multi-gene predictive signature.
Numerous studies have developed devise approaches to predict therapeutic responses of CRC patients to specific chemotherapy regimens. For example, an online prognostic tool based on XGBoost has been established, incorporating metabolism-related SNPs and inflammatory markers to improve treatment precision and facilitate personalized chemotherapy strategies [36]. In stage II CRC patients with elevated PAX8-AS1 levels, fluorouracil-based adjuvant chemotherapy significantly prolonged relapse-free survival. PAX8-AS1 was shown to enhance chemosensitivity in CRC cells in vitro and in vivo by stabilizing PAX8 mRNA [37]. Which suggests the genetic features represent crucial mechanisms underlying chemotherapy response. In our study, patients were divided into high- and low-lactylation gene score groups. KEGG and GSVA enrichment analyses showed that cell cycle, oxidative phosphorylation and proteasome, and p53 signaling pathway were significantly enriched in the high lactylation score group. The p53 signaling pathway is well-established as a key regulator of chemosensitivity [38, 39]. Our data suggest that lactylation may influence chemotherapy response in CRC through regulation of the cell cycle and activation of p53 signaling. Emerging evidence indicates that protein lactylation may affect the response of cancer cells to chemotherapy [40]. Lactate modulates cancer progression through various mechanisms, including cell cycle regulation, immune suppression, and energy metabolism, highlighting the therapeutic potential of targeting lactate-related processes [41]. Interestingly, p53 lactylation is associated with poor prognosis in cancer patients with wild-type p53, and suppression of p53 lactylation attenuates tumorigenesis in animal models [42], further supporting the functional importance of lactylation in cancer therapy.
Immune cell infiltration analysis revealed significant differences in 9 immune cell types between the high- and low-score groups, such as CD4 memory T cells, follicular helper T, Tregs, M2 macrophages, activated dendritic cells and eosinophils. Notably, the expression levels of the majority of lactylation signature genes were significantly negatively correlated with Treg infiltration frequency, consistent with recent findings [43]. Reports indicates that histone lactylation modulates gene expression in both tumor and immune cells, contributing to malignancy and immunosuppression, while non-histone protein lactylation also regulates tumor progression and treatment resistance [44]. Furthermore, our drug sensitivity analysis demonstrated the potential of lactylation risk score in guiding medication selection for CRC patients. Evaluation of IC50 levels of 138 widely used chemotherapeutic agents revealed that patients in high lactylation score group exhibited greater sensitivity to 5-Fluorouracil. This aligns with our survival and treatment response data, confirming that the high-score group, characterized by a specific lactylation-driven transcriptome, represents a distinct subtype with enhanced sensitivity to conventional chemotherapy.
Gene regulatory network modeling, focusing on how genes interact and collaborate within modules to fulfill biological roles, has been broadly utilized in genomic analyses and is recognized for its efficacy in elucidating the intricacies of complex biological systems [45]. In this study, we evaluated over 11 machine learning algorithms, with model performance rigorously assessed through cross-validation. While the 8 gene lactylation signature demonstrated overall strong predictive performance for chemotherapy response, we observed some variability in its efficacy across independent validation cohorts (AUCs: 0.824, 0.640, 0.769). This variation may be attributed to several factors, including differences in chemotherapy regimens (FOLFOX versus FOLFIRI) which involve distinct molecular mechanisms, inherent heterogeneity in patient populations across datasets, and the limited sample sizes of some validation cohorts. Ensuring consistency in data sources could enhance model relevance. Our study’s lactylation risk score not only demonstrated strong predictive performance for chemotherapy response but also connected gene mutation and immune cell infiltration, shedding light on potential mechanisms. However, to bolster the robustness of our findings, it is crucial to include a larger sample size and diverse validation cohorts. Moreover, the molecular mechanisms underlying the hub genes of lactylation in CRC remain elusive and warrant further investigation.
Conclusions
Conclusions
In summary, we developed a robust 8 lactylation-related gene signature for predicting chemotherapy response in CRC patients by integrating multiple bioinformatics analyses and 92 combinatorial machine learning algorithms. Our findings provide novel insights into the mechanisms by which lactylation is involved in CRC chemotherapy response. Collectively, our results highlight the potential of combining machine learning with lactylation analysis as a valuable tool for advancing personalized therapy and optimizing clinical management of CRC.
In summary, we developed a robust 8 lactylation-related gene signature for predicting chemotherapy response in CRC patients by integrating multiple bioinformatics analyses and 92 combinatorial machine learning algorithms. Our findings provide novel insights into the mechanisms by which lactylation is involved in CRC chemotherapy response. Collectively, our results highlight the potential of combining machine learning with lactylation analysis as a valuable tool for advancing personalized therapy and optimizing clinical management of CRC.
Supplementary Information
Supplementary Information
출처: PubMed Central (JATS). 라이선스는 원 publisher 정책을 따릅니다 — 인용 시 원문을 표기해 주세요.
🏷️ 같은 키워드 · 무료전문 — 이 논문 MeSH/keyword 기반
- A Phase I Study of Hydroxychloroquine and Suba-Itraconazole in Men with Biochemical Relapse of Prostate Cancer (HITMAN-PC): Dose Escalation Results.
- Impact of Comorbidities on Clinical Outcomes and Quality of Life of Patients With Hormone Receptor-Positive/Human Epidermal Growth Factor Receptor 2-Negative (HR+/HER2-) Advanced Breast Cancer Treated With Palbociclib in the POLARIS Study.
- Key Considerations for Targeting in Pancreatic Cancer: Potential Impact on the Treatment Paradigm.
- Overcoming Chemoresistance in Glioblastoma: Mechanisms, Therapeutic Strategies, and Functional Precision Medicine.
- Current Systemic Treatment Options for Advanced Pancreatic Cancer-An Overview Article.
- A Phase II Study of Durvalumab, Doxorubicin, and Ifosfamide in Recurrent and/or Metastatic Pulmonary Sarcomatoid Carcinoma (KCSG LU-19-24).