
Multi-omics analysis of the HMGB2 tumor epithelial cells in lactylation subgroups in colorectal cancer.

Cell & bioscience · 2025 · Vol. 15(1) · p. 158

Hu S, Lou J, Ding M, Chen Y, Qin J, Liu Z

One-line summary for patients

Colorectal cancer (CRC) is a prevalent malignancy, yet the role of lactylation in its progression remains unclear.

Cite this paper

APA Hu S, Lou J, et al. (2025). Multi-omics analysis of the HMGB2 tumor epithelial cells in lactylation subgroups in colorectal cancer. Cell & bioscience, 15(1), 158. https://doi.org/10.1186/s13578-025-01491-x
MLA Hu S, et al. "Multi-omics analysis of the HMGB2 tumor epithelial cells in lactylation subgroups in colorectal cancer." Cell & bioscience, vol. 15, no. 1, 2025, p. 158.
PMID 41261427

Abstract

Colorectal cancer (CRC) is a prevalent malignancy, yet the role of lactylation in its progression remains unclear. This study investigates High Mobility Group Box 2-positive tumor epithelial cells (HMGB2Epi), a lactylation-associated subpopulation. By integrating multi-omics data, including proteomics, single-cell, spatial, and bulk transcriptomics, we explored the function of HMGB2Epi in CRC. Elevated lactylation levels in CRC tissues were correlated with poor prognosis. Single-cell analysis identified HMGB2Epi as a central lactylation-enriched subpopulation. Functionally, HMGB2 enhanced the Warburg effect, promoting CRC cell proliferation, migration, and invasion. HMGB2 knockout reduced lactylation levels and inhibited tumor progression. Mechanistically, NFYB directly bound to the HMGB2 promoter, forming the NFYB-HMGB2 axis that drives lactylation and metabolic reprogramming. Cell-cell communication analysis revealed enhanced interactions between HMGB2Epi and fibroblasts, endothelial cells, and T/NK cells. Molecular dynamics and in vitro assays suggest that BI-2536 downregulates HMGB2 and lactylation in CRC cells. A risk model based on HMGB2Epi outperformed 125 previously published models in independent cohorts. In summary, HMGB2Epi represents a key lactylation-enriched subgroup, with the NFYB-HMGB2 axis driving CRC progression via lactylation. BI-2536 serves as a tool compound implicating the HMGB2-lactylation axis, and, together with the HMGB2Epi-based risk model, this axis provides a novel target for precision CRC therapy.


Introduction

Colorectal cancer (CRC) is one of the most common malignancies worldwide, with rising incidence and mortality, and poses a serious threat to human health [1]. According to global cancer statistics, CRC has become one of the leading causes of cancer-related death, particularly in developed Western countries and certain developing nations [2]. Despite recent advances in comprehensive therapeutic strategies for CRC, including surgery, radiotherapy, chemotherapy, immunotherapy, and targeted therapy, tumor heterogeneity remains a significant challenge. As a result, many patients continue to face tumor recurrence, distant metastasis, and drug resistance [3]. Therefore, a deeper understanding of the mechanisms underlying CRC development and the identification of new molecular targets are crucial for improving patient prognosis.
In recent years, as research on cancer metabolism has advanced, metabolic reprogramming has emerged as a key driver of tumor cell survival and progression [4]. Enhanced glycolysis is a hallmark of CRC and many other cancers and leads to excessive lactate accumulation, which in turn affects the tumor microenvironment (TME) and immune regulation [5, 6]. However, lactate is not merely a byproduct of glycolysis. Recent studies have shown that lactate serves as a substrate for a post-translational modification (PTM) termed lactylation, which plays a significant role in various tumor processes [7, 8]. The discovery of lactylation has expanded our understanding of lactate’s role in cancer biology, suggesting its potential involvement in transcriptional regulation, metabolic adaptation, and immune evasion. Nevertheless, the specific role of lactylation in CRC and its regulatory mechanisms remain incompletely understood, particularly its function in distinct tumor cell subpopulations.
High Mobility Group Box 2 (HMGB2) is a nuclear chromatin-associated protein belonging to the High Mobility Group (HMG) family [9]. HMGB2 plays critical roles in various biological processes, including DNA structure regulation, gene transcription, cell cycle regulation, and DNA damage repair [10]. Previous studies have shown that HMGB2 is overexpressed in multiple malignancies and is closely associated with cell proliferation, invasion, drug resistance, and poor prognosis [11]. In CRC, HMGB2 not only promotes tumor cell proliferation and survival but may also influence tumor progression by regulating the TME [12]. However, the precise functions and regulatory mechanisms of HMGB2 in CRC remain unclear.
This study employs an integrated multi-omics approach, combining bulk transcriptomics, single-cell RNA sequencing (scRNA-seq), spatial transcriptomics, and proteomics, to systematically analyze the distribution of lactylation in different CRC cell subpopulations. The study focuses on the lactylation levels in HMGB2+ tumor epithelial cells and their impact on CRC initiation and progression. This research aims to provide insights into the crucial role of lactylation in CRC progression, enrich our understanding of HMGB2 function, and identify novel molecular targets for precision therapies in CRC.

Materials and methods

CRC transcriptome sequencing cohort collection and processing
This study collected bulk RNA sequencing data from TCGA-CRC (TCGA-COAD + TCGA-READ) and eight independent multi-center GEO cohorts (GSE14333, GSE161158, GSE17538, GSE29621, GSE31595, GSE38832, GSE39582, and GSE72970), all containing survival information (overall survival [OS] or relapse-free survival [RFS]). Prior to combined analysis, all GEO datasets underwent batch effect removal using the “sva” R package, and a unified GEO cohort was constructed. Additionally, this study included four independent CRC single-cell RNA sequencing (scRNA-seq) datasets (GSE188711, GSE221575, GSE200997, GSE132465). Batch effect correction during data integration was performed using the “harmony” R package, and data quality control was conducted using the “Seurat” R package. Moreover, this study incorporated four CRC spatial transcriptomics (ST) datasets from the study by Wu et al. [13]. Differentially expressed genes (DEGs) for cell populations were identified using the “FindAllMarkers” function of the Seurat R package (min.pct = 0.25, logfc.threshold = 0.25, p < 0.05). Preprocessing and cell annotation of single-cell and spatial transcriptomics data followed the methods outlined in a previous study [14] to ensure accuracy and consistency in data analysis.

Inference of copy number variation
To infer large-scale chromosomal copy number variations (CNVs) and identify malignant epithelial cells within the single-cell RNA sequencing dataset, we applied the CopyKAT algorithm implemented in the R package copykat [15]. This method uses an integrative Bayesian segmentation approach to detect genome-wide aneuploidy from gene expression profiles, distinguishing tumor cells from normal diploid cells. In this study, single-cell gene expression matrices were normalized and log-transformed as input. The algorithm was run with default parameters, classifying cells into two categories: aneuploid (malignant) and diploid (non-malignant). The inferred CNV heatmap was visualized to assess chromosomal gains and losses across individual cells.
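CopyKAT's Bayesian segmentation is more involved than can be shown here, but the intuition behind expression-based CNV inference can be sketched. First, each gene's expression in a cell is centered on a presumed-diploid reference. Then neighboring genes along a chromosome are averaged, so that broad gains or losses stand out from gene-level noise. The sketch below is a simplified moving-average illustration in the spirit of inferCNV-style smoothing, not the CopyKAT algorithm. The input layout and the `window` size are assumptions.

```python
from statistics import mean


def smoothed_cnv_profile(genes, window=3):
    """Reference-centered, moving-average expression along genomic order.

    genes: list of (chrom, position, cell_expr, ref_expr) tuples, where cell_expr
    is log-normalized expression in one cell and ref_expr is the mean in a
    presumed-diploid reference population (hypothetical input layout).
    Returns (chrom, position, smoothed_value) tuples; positive values hint at
    gains and negative values at losses.
    """
    half = window // 2
    profile = []
    for chrom in sorted({g[0] for g in genes}):
        # Order genes by position and smooth within each chromosome only.
        on_chrom = sorted((g for g in genes if g[0] == chrom), key=lambda g: g[1])
        relative = [cell - ref for _, _, cell, ref in on_chrom]
        for i, (_, pos, _, _) in enumerate(on_chrom):
            lo, hi = max(0, i - half), min(len(relative), i + half + 1)
            profile.append((chrom, pos, mean(relative[lo:hi])))
    return profile
```

A region whose smoothed values stay well above zero across many consecutive genes would suggest a gain. CopyKAT replaces this ad hoc averaging with statistical segmentation and a diploid/aneuploid classification.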

Inference of Epi_3 cell infiltration
To estimate the infiltration level of the Epi_3 epithelial subpopulation in bulk RNA-seq datasets, we employed a gene signature–based deconvolution approach using single-sample Gene Set Enrichment Analysis (ssGSEA). The Epi_3 gene signature was derived from our single-cell RNA-seq analysis, consisting of differentially expressed marker genes specifically enriched in the Epi_3 cluster (log2FC > 1, adjusted p-value < 0.05). The final gene set included the following 24 genes: UBE2C, HMGB2, PTTG1, H2AFZ, TUBA1B, STMN1, HMGB1, TUBB, CKS1B, CDKN3, KIAA0101, CENPW, CKS2, HMGN2, RANBP1, CCNB1, BIRC5, TOP2A, CDC20, CENPF, UBE2S, MAD2L1, MKI67, and LDHB. We applied the GSVA R package to compute ssGSEA enrichment scores for each sample in the TCGA and GEO CRC bulk RNA-seq cohorts. The resulting ssGSEA score for each sample was interpreted as a surrogate measure of Epi_3 infiltration.
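For readers unfamiliar with ssGSEA, the per-sample score can be sketched in plain Python. This is a minimal, unnormalized version of the Barbie et al. (2009) running-sum statistic as we understand it. Genes are ranked within one sample. Signature genes contribute rank-weighted steps (weight rank^alpha, with alpha = 0.25 assumed), non-signature genes contribute uniform steps, and the differences are summed over all positions. GSVA's implementation additionally normalizes scores across samples (ssgsea.norm = TRUE), which this sketch omits. The toy data in any usage are hypothetical.

```python
# The 24-gene Epi_3 signature listed above.
EPI3_SIGNATURE = [
    "UBE2C", "HMGB2", "PTTG1", "H2AFZ", "TUBA1B", "STMN1", "HMGB1", "TUBB",
    "CKS1B", "CDKN3", "KIAA0101", "CENPW", "CKS2", "HMGN2", "RANBP1", "CCNB1",
    "BIRC5", "TOP2A", "CDC20", "CENPF", "UBE2S", "MAD2L1", "MKI67", "LDHB",
]


def ssgsea_score(expression, gene_set, alpha=0.25):
    """Unnormalized single-sample GSEA score for one sample.

    expression: dict mapping gene -> expression value in ONE sample.
    gene_set: iterable of gene symbols, e.g. EPI3_SIGNATURE.
    """
    members = set(gene_set)
    # Highest expression first; the top gene receives rank N, the last rank 1.
    ordered = sorted(expression, key=expression.get, reverse=True)
    n = len(ordered)
    hits = [g in members for g in ordered]
    n_hit = sum(hits)
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must cover some, but not all, genes")
    weights = [(n - i) ** alpha if hit else 0.0 for i, hit in enumerate(hits)]
    total_weight = sum(weights)
    p_hit = p_miss = score = 0.0
    for weight, hit in zip(weights, hits):
        if hit:
            p_hit += weight / total_weight      # rank-weighted step for signature genes
        else:
            p_miss += 1.0 / (n - n_hit)         # uniform step for all other genes
        score += p_hit - p_miss                 # ssGSEA sums the gap over every position
    return score
```

A signature concentrated among a sample's most highly expressed genes yields a positive score, and one concentrated at the bottom yields a negative score. That is why the score can serve as a per-sample surrogate for Epi_3 abundance.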

Cell–cell communication analysis
To investigate cell–cell communication patterns, we used the CellChat R package. The single-cell RNA-seq expression matrix and cell annotations (11 epithelial subpopulations and 7 major cell types: Mast cells, Fibroblasts, Myeloid cells, B cells, T/NK cells, Plasma cells, and Endothelial cells) were input into the analysis. Using the CellChatDB.human ligand–receptor database, we calculated intercellular communication probabilities via the computeCommunProb() and computeCommunProbPathway() functions, followed by filtering low-confidence interactions with filterCommunication(). Communication networks were visualized using netVisual_circle() and netVisual_heatmap().

High-dimensional weighted gene co-expression network analysis (hdWGCNA) and WGCNA
High-dimensional weighted gene co-expression network analysis (hdWGCNA) was conducted using the “hdWGCNA” R package[16], while traditional weighted gene co-expression network analysis (WGCNA) was performed using the "WGCNA" R package to identify co-expression gene modules closely associated with HMGB2+Epi cells and lactylation modifications[17].

Prognostic analysis
Prognostic analyses of overall survival (OS) and relapse-free survival (RFS) were performed with univariate Cox regression using the “survival” and “survminer” R packages to assess the association of each variable with patient survival.
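The univariate Cox step can be illustrated with a small Newton–Raphson fit of a one-covariate proportional hazards model. This is a teaching sketch under stated assumptions: it uses Breslow handling of tied event times and a two-sided Wald test. It does not reproduce the internals of the R "survival" package, and any example data are invented.

```python
import math


def univariate_cox(times, events, x, max_iter=50, tol=1e-9):
    """Fit h(t | x) = h0(t) * exp(beta * x) by Newton-Raphson (Breslow ties).

    times: follow-up times; events: 1 = event observed, 0 = censored;
    x: one covariate per patient (e.g. a score or a 0/1 group label).
    Returns (beta, hazard_ratio, standard_error, two_sided_wald_p).
    """
    n = len(times)
    beta = 0.0
    for _ in range(max_iter):
        score = info = 0.0
        for i in range(n):
            if not events[i]:
                continue
            # Risk set: patients still under observation at this event time.
            risk = [j for j in range(n) if times[j] >= times[i]]
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
            score += x[i] - s1 / s0                 # first derivative of log partial likelihood
            info += s2 / s0 - (s1 / s0) ** 2        # negative second derivative
        if info <= 0:
            raise ValueError("covariate does not vary within any risk set")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info)   # information from the final iteration
    p_value = math.erfc(abs(beta / se) / math.sqrt(2))
    return beta, math.exp(beta), se, p_value
```

A hazard ratio above 1 for a feature such as the lactylation score or a high/low group label means higher values are associated with shorter survival.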

Enrichment analysis
Gene set variation analysis (GSVA) was conducted using the ssGSEA algorithm from the “GSVA” R package to estimate activity scores of key gene sets. GO/KEGG enrichment and gene set enrichment analysis (GSEA) were performed using the “clusterProfiler” R package to analyze molecular pathways. The KEGG and HALLMARK gene sets used for enrichment analysis were obtained from the Molecular Signatures Database (MSigDB) [18].
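Over-representation analysis of GO/KEGG terms, of the kind performed by clusterProfiler, is commonly based on a one-sided hypergeometric test. The sketch below computes that upper-tail probability for a single term. It leaves out multiple-testing correction (for example Benjamini–Hochberg), which a real analysis would apply across all terms. The argument names and example counts are hypothetical.

```python
from math import comb


def ora_pvalue(n_universe, n_term, n_selected, n_overlap):
    """One-sided hypergeometric P(X >= n_overlap).

    n_universe: background genes; n_term: background genes annotated to the term;
    n_selected: DEGs submitted; n_overlap: DEGs annotated to the term.
    """
    denominator = comb(n_universe, n_selected)
    upper = min(n_term, n_selected)
    numerator = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return numerator / denominator
```

For example, finding 8 of 10 submitted genes in a 10-gene term drawn from a 100-gene background gives a very small p-value, while an overlap of 0 gives exactly 1.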

Calculation of lactylation score
Based on prior literature, we curated a lactylation-related gene set (n = 332 genes; see Supplementary Table 1) and applied single-sample gene set enrichment analysis (ssGSEA) to quantify lactylation-associated transcriptional activity for each sample (and each single cell) [19]. For bulk RNA-seq and single-cell RNA-seq data, raw counts were normalized; ssGSEA scores for each sample or cell were then computed using GSVA::gsva (expression matrix, method = “ssgsea”, kcdf = “Gaussian”, abs.ranking = TRUE, ssgsea.norm = TRUE). Samples (or single cells) were subsequently stratified into high- and low-lactylation groups based on the median ssGSEA score.
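The median split described above is simple, but the text leaves one detail open: how samples exactly at the median are assigned. The sketch below assumes they go to the low group. The function name and example scores are hypothetical.

```python
from statistics import median


def split_by_median(scores):
    """Map sample -> 'high' or 'low' using the median score as the cutoff.

    Samples exactly at the median are labeled 'low' (an assumed convention).
    """
    cutoff = median(scores.values())
    return {sample: "high" if value > cutoff else "low" for sample, value in scores.items()}


groups = split_by_median({"a": 0.1, "b": 0.5, "c": 0.9, "d": 0.3})
```

With an even number of samples, the median falls between two scores, so the groups are balanced. With an odd number, the tie rule decides where the middle sample lands.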

Protein-level analysis based on public databases
HMGB2 protein expression levels in normal and tumor tissues were analyzed using The University of Alabama at Birmingham Cancer Data Analysis Portal database [20]. The Human Protein Atlas database [21] was used to examine the immunohistochemical (IHC) expression levels of HMGB2 and NFYB in CRC tissues.

Construction of a lactylation-associated prognostic model
In this study, we first performed WGCNA on the TCGA-CRC and GEO cohorts using the WGCNA R package. Two phenotypic traits were used for the analysis: the ssGSEA-derived lactylation score and the enrichment score of HMGB2+ epithelial cells (HMGB2+Epi). Module–trait correlations were assessed to identify key gene modules associated with both features. The intersection of genes from the two most correlated modules was subjected to univariate Cox proportional hazards regression analysis in both cohorts to identify genes significantly associated with CRC prognosis (p < 0.05). These prognostic genes were used as input features for model construction. A leave-one-out cross-validation (LOOCV) framework, as described by Liu et al., was applied to identify the optimal gene combination and construct a multigene risk model. The prognostic value of the final model was validated across multiple independent CRC cohorts and compared with previously published prognostic signatures using the concordance index (C-index) (Supplementary Table 3) [22].
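Because the model comparison rests on the C-index, a plain implementation of Harrell's concordance index may help. In this sketch, a pair of patients is comparable when the one with the shorter follow-up had an observed event. Pairs with tied times are simply skipped, which is a simplification compared with full survival-package implementations.

```python
def harrell_c_index(times, events, risk):
    """Harrell's C: share of comparable pairs in which the higher-risk patient failed first.

    A pair (i, j) is comparable when times[i] < times[j] and patient i had an
    observed event; pairs with tied times are skipped (a simplification).
    Tied risk scores count as half-concordant.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking. Comparing signatures by C-index therefore compares how well each orders patients by outcome.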
All bioinformatics analyses strictly followed the methods outlined in previous research [14].

Collection of external tissue sample cohorts
A total of 28 paired tumor and adjacent non-tumor tissue samples from CRC patients were collected to assess HMGB2 protein expression levels. This study was approved by the Ethics Committee of the First Affiliated Hospital of Nanjing Medical University, and all patients provided written informed consent.

Cell culture
This study used the normal human colon epithelial cell line NCM460 and eight CRC cell lines (SW480, SW620, HCT116, DLD1, HCT15, HT-29, LoVo, and HCT8), all purchased from the Shanghai Cell Bank. All cells were cultured in Dulbecco's Modified Eagle Medium (DMEM) or RPMI-1640 medium (Gibco, USA) supplemented with 10% fetal bovine serum (FBS) and 1% penicillin–streptomycin, and were maintained at 37 °C in a humidified incubator with 5% CO2. Cells were passaged at split ratios of 1:3 to 1:5, and the medium was replaced every 2 to 3 days.

Generation of stable HMGB2 knockout and overexpression cell models
The CRISPR-Cas9 gene editing technology was used to establish stable HMGB2 knockout (KO) cell lines. Specific sgRNA sequences targeting HMGB2 were designed based on the Genetic Perturbation Platform portal (https://portals.broadinstitute.org/gppx/crispick/public). The designed sgRNA sequences are as follows:

sgRNA-1-F: CACCGTGACAAAGCTCGCTATGACA
sgRNA-1-R: CTGTCATAGCGAGCTTTGTCACAAA
sgRNA-2-F: CACCGGAAGTGTTCGGAGAGATGGA
sgRNA-2-R: CTCCATCTCTCCGAACACTTCCAAA
sgRNA-3-F: CACCGCCGGCAGGTCTGCACGAAGA
sgRNA-3-R: CTCTTCGTGCAGACCTGCCGGCAAA

After annealing, the sgRNA oligonucleotides were cloned into the lentiCRISPRv2 vector. The HMGB2 knockout vector and packaging plasmids (psPAX2 and pMD2.G) were co-transfected into HEK293T cells using Lipofectamine 3000 reagent (Thermo Fisher, USA) according to a standard lentivirus packaging protocol. At 48 h after transfection, the cell culture supernatant was collected, centrifuged, and filtered through a 0.45 μm membrane to obtain high-titer lentiviral particles. DLD1 cells were infected in the presence of Polybrene (8 μg/mL) to enhance transduction, and the medium was replaced with fresh culture medium 48 h after infection. Stable knockout cell lines were selected with 2 μg/mL puromycin (Sigma-Aldrich) for 7 days. Western blotting was then performed to verify the knockout efficiency of HMGB2, and stable knockout cell lines were selected for further experiments. HMGB2-overexpressing cell lines were generated by lentivirus-mediated transduction. The HMGB2 lentiviral vector and packaging system provided by GeneChem (Shanghai GeneChem Biotechnology Co., Ltd.) were used for lentivirus packaging and infection, following the manufacturer’s instructions. SW620 cells were infected with the HMGB2-overexpressing lentivirus, and the medium was replaced with fresh culture medium 48 h after infection. Stable overexpression cell lines were selected with 2 μg/mL puromycin for 7 days, and Western blotting was used to confirm HMGB2 overexpression. Stable cell lines were used for subsequent functional cell assays and in vivo animal experiments.
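Before cloning, a quick consistency check on annealed oligo pairs can catch transcription errors in sequences like the listed sgRNAs. The hypothetical helper below does three things: it strips the 5′ CACCG overhang from each forward oligo, takes the next 20 nt as the spacer, and confirms that the reverse oligo contains the spacer's reverse complement. It checks only complementarity, not overhang design or on-target efficiency.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def check_oligo_pair(forward, reverse, overhang="CACCG", spacer_len=20):
    """Return (spacer, ok): ok is True when the reverse oligo contains the
    reverse complement of the 20-nt spacer that follows the forward overhang."""
    forward, reverse = forward.upper(), reverse.upper()
    if not forward.startswith(overhang):
        return None, False
    spacer = forward[len(overhang):len(overhang) + spacer_len]
    return spacer, reverse_complement(spacer) in reverse


PAIRS = {
    "sgRNA-1": ("CACCGTGACAAAGCTCGCTATGACA", "CTGTCATAGCGAGCTTTGTCACAAA"),
    "sgRNA-2": ("CACCGGAAGTGTTCGGAGAGATGGA", "CTCCATCTCTCCGAACACTTCCAAA"),
    "sgRNA-3": ("CACCGCCGGCAGGTCTGCACGAAGA", "CTCTTCGTGCAGACCTGCCGGCAAA"),
}

for name, (fwd, rev) in PAIRS.items():
    spacer, ok = check_oligo_pair(fwd, rev)
    print(name, spacer, "complementary" if ok else "MISMATCH")
```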

Transient transfection
In this study, small interfering RNA (siRNA) was used for transient transfection to knock down NFYB expression. NFYB siRNA (GenePharma, China) was transfected into DLD1 cells using Lipofectamine 3000 transfection reagent (Invitrogen, USA) according to the manufacturer's instructions. After 48 h, the cells were collected, and Western blotting was performed to assess the knockdown efficiency of NFYB. The siRNA sequences used were as follows:

siRNA_1 (5′ → 3′): GGAUCUACCUCUUCUUCAATT
siRNA_2 (5′ → 3′): CCUGAAGACCUACGAGAAATT
siRNA_3 (5′ → 3′): GCUACCUUCUACGACAUUATT

Recombinant human HMGB2 protein (rHMGB2, Catalog # H00003148-H01, Abnova) was added to the culture medium of DLD1 cells to supplement HMGB2 exogenously.

Total protein extraction and western blot (WB)
Total cellular proteins were extracted using RIPA lysis buffer (Beyotime, China), and protein concentrations were determined using the BCA protein assay kit (Thermo Fisher, USA). Protein samples were boiled for 5 min, separated by SDS-PAGE, and transferred onto PVDF membranes (Millipore). After blocking with 5% BSA for 2 h, the membranes were incubated overnight at 4 °C with primary antibodies. The membranes were then washed three times with PBS (5 min per wash) and incubated for 1 h with HRP-conjugated secondary antibodies (Biosharp). Protein bands were detected using ECL chemiluminescent substrate (Thermo Fisher, USA) and imaged with a Bio-Rad imaging system. GAPDH was used as a loading control, and ImageJ software was used for densitometric analysis of the bands. The primary antibodies used for Western blotting were: HMGB2 (Abcam, Cat. No. ab124670), NFYB (Abcam, Cat. No. ab6559), Pan-Lactate (ABclonal, Cat. No. A23004), Histone H3 (Proteintech, Cat. No. 17168-1-AP), FBP1 (ABclonal, Cat. No. A11664), HK2 (ABclonal, Cat. No. A22319), LDHA (ABclonal, Cat. No. A20991), PKM2 (ABclonal, Cat. No. A20991), and GAPDH (ABclonal, Cat. No. A19056). Secondary antibodies were goat anti-mouse IgG (H + L), peroxidase conjugated (Biosharp) and goat anti-rabbit IgG (H + L), peroxidase conjugated (Biosharp).
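Densitometric values are typically converted to relative expression in two steps: divide each target band by its loading-control band (GAPDH here), then scale to a reference lane. A minimal sketch follows. It assumes lane 0 is the control and that the intensities are background-subtracted ImageJ values; the function name is hypothetical.

```python
def relative_expression(target, loading, reference_lane=0):
    """Target/loading-control ratio per lane, scaled so the reference lane equals 1."""
    if len(target) != len(loading):
        raise ValueError("one loading-control value is needed per lane")
    ratios = [t / l for t, l in zip(target, loading)]
    return [r / ratios[reference_lane] for r in ratios]
```

For example, a knockout lane whose HMGB2/GAPDH ratio is half that of the control lane comes out as 0.5.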

Immunohistochemistry (IHC)
IHC was performed to evaluate the expression of HMGB2 and NFYB in CRC tissues. Tissue sections were deparaffinized in xylene and rehydrated through a graded ethanol series. Antigen retrieval was performed in citrate buffer (pH 6.0) under high temperature and pressure. Endogenous peroxidase activity was blocked with 3% H2O2, and non-specific binding was blocked with 5% BSA. The sections were incubated overnight at 4 °C with primary antibodies against HMGB2 (Abcam, Cat. No. ab124670) and NFYB (Abcam, Cat. No. ab6559). The next day, after PBS washing, HRP-conjugated secondary antibodies (Biosharp) were applied, followed by DAB staining. Sections were counterstained with hematoxylin, dehydrated, and mounted.

Cellular functional assays
Cellular functional assays included CCK-8 proliferation assays, colony formation assays, Transwell migration and invasion assays, and flow cytometry.
CCK-8 Proliferation Assay: Cells (2 × 10³ per well) were seeded in 96-well plates, with six replicates per group. After 24 h of culture, 10 μL of CCK-8 reagent (Dojindo, Japan) was added to each well. The cells were then incubated for an additional 2 h, and absorbance (OD value) was measured at a wavelength of 450 nm.
Colony Formation Assay: Cells (500 per well) were seeded in six-well plates and cultured for 14 days. After PBS washing, the cells were fixed with 4% paraformaldehyde for 15 min, stained with crystal violet for 20 min, and then photographed. The number of colonies was counted.
Transwell Migration and Invasion Assays: Transwell chambers (8 μm pore size, Corning, USA) were used for the migration and invasion assays. For the migration assay, cells (2 × 10⁵ per well) were resuspended in serum-free medium and added to the upper chamber, while the lower chamber contained medium with 10% FBS. After 24 h, non-migrated cells were removed using a cotton swab, and the remaining cells were fixed with 4% paraformaldehyde for 15 min. Crystal violet staining was performed, and migration was quantified by counting cells under a microscope. For the invasion assay, the upper chamber was pre-coated with Matrigel (Corning, USA), and the remaining steps were performed as in the migration assay.
Flow Cytometry: The effect of HMGB2 on apoptosis in CRC cells was assessed using Annexin V-FITC/PI double staining and flow cytometry. DLD1 HMGB2 knockout cells were harvested with EDTA-free trypsin, washed with PBS, and resuspended. Cells were stained using the Annexin V-FITC/PI Apoptosis Detection Kit (Beyotime, China), incubated at room temperature in the dark for 15 min, and analyzed by flow cytometry (BD FACSCanto II). The proportions of early apoptotic (Annexin V+/PI⁻) and late apoptotic (Annexin V+/PI+) cells were determined.
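The quadrant logic used to report early and late apoptosis can be sketched as follows. The gate cutoffs are assumptions; in practice they are set from unstained and single-stained controls. The Annexin V−/PI+ quadrant is labeled "necrotic" by common convention rather than from the text.

```python
def apoptosis_percentages(events, annexin_cutoff, pi_cutoff):
    """Percent of events per Annexin V / PI quadrant.

    events: iterable of (annexin_v_intensity, pi_intensity) pairs.
    Cutoffs are hypothetical gate values, normally set from control samples.
    """
    counts = {"live": 0, "early_apoptotic": 0, "late_apoptotic": 0, "necrotic": 0}
    total = 0
    for annexin, pi in events:
        total += 1
        annexin_pos, pi_pos = annexin > annexin_cutoff, pi > pi_cutoff
        if annexin_pos and pi_pos:
            counts["late_apoptotic"] += 1      # Annexin V+/PI+
        elif annexin_pos:
            counts["early_apoptotic"] += 1     # Annexin V+/PI-
        elif pi_pos:
            counts["necrotic"] += 1            # Annexin V-/PI+
        else:
            counts["live"] += 1                # Annexin V-/PI-
    return {k: 100.0 * v / total for k, v in counts.items()}
```

The total apoptotic fraction usually reported is the sum of the early and late apoptotic percentages.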

Xenograft tumorigenesis in nude mice
The xenograft tumorigenesis assay was conducted to evaluate the role of HMGB2 in CRC cell proliferation and tumor formation. Female BALB/c nude mice (4–6 weeks old, SPF grade) were used, and all procedures were performed according to institutional animal ethics guidelines. HMGB2-overexpressing cells (SW620) and HMGB2 knockout cells (DLD1) were suspended in PBS (2 × 10⁶ cells/100 μL) and subcutaneously injected into the right axillary region of nude mice under sterile conditions, with four mice per group. Tumor volume was measured every 7 days, and tumor growth was monitored for approximately 5 weeks. When tumor volume reached approximately 1000 mm³ or if the mice showed signs of distress, the experiment was terminated. Mice were euthanized, and tumors were excised, weighed, and stored for subsequent immunohistochemical analysis and H&E staining.
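The protocol does not state how tumor volume was computed. A widely used caliper convention is V = L × W² / 2, with L the longest diameter and W the perpendicular width. The sketch below assumes that formula and checks the 1000 mm³ humane endpoint. The measurements are hypothetical.

```python
def tumor_volume_mm3(length_mm, width_mm):
    """Caliper estimate V = L * W^2 / 2 (assumed formula; L = longest diameter)."""
    return length_mm * width_mm ** 2 / 2


def reached_endpoint(volumes_mm3, limit_mm3=1000.0):
    """True once any measurement reaches the humane-endpoint volume."""
    return any(v >= limit_mm3 for v in volumes_mm3)


weekly = [tumor_volume_mm3(length, width) for length, width in [(6, 5), (9, 8), (14, 12)]]
print(weekly, reached_endpoint(weekly))  # [75.0, 288.0, 1008.0] True
```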

Identification of transcription factors in single-cell analysis and chromatin immunoprecipitation (ChIP) assay
In this study, the SCENIC (Single-Cell Regulatory Network Inference and Clustering) method was used to identify key transcription factors at the single-cell level in CRC. Using the “SCENIC” R package, a gene regulatory network was constructed from the single-cell RNA sequencing data. The pipeline comprised: (1) inferring gene co-expression networks using GENIE3; (2) identifying candidate regulatory transcription factors with RcisTarget; and (3) calculating transcription factor activity scores for each cell with AUCell to identify the most active regulatory modules. Chromatin immunoprecipitation (ChIP) was performed to validate the binding of the transcription factor NFYB to the HMGB2 promoter. The JASPAR database was used to predict potential NFYB binding sites in the HMGB2 promoter region, and the predicted motif site was TTATTGGTC. DLD1 cells were used following a standard ChIP protocol. Cells were crosslinked with 1% formaldehyde for 10 min, lysed, and sonicated to fragment chromatin to 200–500 bp. Chromatin was incubated with an NFYB antibody (Abcam, Catalog #ab6559), and the immunocomplexes were captured with Protein A/G magnetic beads. The precipitated DNA was purified and subjected to qPCR to detect NFYB binding at the HMGB2 promoter region, with IgG used as a negative control. The amplification primers were: forward (5′ → 3′) ACTGGTTACCTTTTTAGACAGT and reverse (5′ → 3′) CTTGGCACGATATGCAGCAA.
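Two small calculations sit behind this step. The first locates the predicted NFYB site (TTATTGGTC) on either strand of a promoter sequence. The second expresses ChIP-qPCR signal relative to input and to the IgG control. The text does not specify the qPCR quantification method, so the percent-input formula and the 1% input fraction below are assumptions, as are any example sequences and Ct values.

```python
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_motif(sequence, motif="TTATTGGTC"):
    """0-based start positions of exact motif matches on either strand."""
    seq = sequence.upper()
    reverse_motif = motif.translate(COMPLEMENT)[::-1]
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_motif)):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)
    return sorted(hits)


def percent_input(ct_sample, ct_input, input_fraction=0.01):
    """Percent input: the input Ct is first adjusted by log2 of its dilution factor."""
    ct_input_adjusted = ct_input - math.log2(1 / input_fraction)
    return 100 * 2 ** (ct_input_adjusted - ct_sample)


def fold_over_igg(ct_nfyb, ct_igg, ct_input, input_fraction=0.01):
    """Enrichment of the NFYB pull-down relative to the IgG negative control."""
    return (percent_input(ct_nfyb, ct_input, input_fraction)
            / percent_input(ct_igg, ct_input, input_fraction))
```

Because both pull-downs share the same input, the fold over IgG reduces to 2 raised to the Ct difference. A 3-cycle earlier NFYB signal therefore corresponds to roughly 8-fold enrichment.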

Risk prognostic model and nomogram construction
This study developed a risk prognostic model for HMGB2-positive tumor epithelial cells based on the leave-one-out cross-validation (LOOCV) machine learning framework developed by Zaoqu Liu et al. [22]. The model construction strictly followed the methodology of a previous study [14]. Additionally, to further validate the model's efficacy, previously published CRC risk prognostic models were retrieved and analyzed. The risk feature genes of these models were extracted, and their concordance index (C-index) was calculated. The C-index of this study's risk model was then compared to those of previously published models to assess relative predictive performance. Furthermore, a nomogram was constructed using the "rms" R package, where risk scores were calculated based on regression coefficients and combined with patient clinical features (such as stage) to establish a comprehensive survival prediction model. The model's predictive performance was evaluated using calibration curves and receiver operating characteristic (ROC) curves to assess accuracy and clinical applicability.
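In Cox-based signatures, the risk score is usually the linear predictor: each model gene's expression is multiplied by its regression coefficient, and the products are summed. The gene names and coefficients below are hypothetical placeholders, not the published HMGB2Epi signature.

```python
def risk_scores(expression, coefficients):
    """Linear predictor per sample: sum of coefficient * expression over model genes.

    expression: dict sample -> dict gene -> normalized expression.
    coefficients: dict gene -> regression coefficient (hypothetical values).
    """
    return {
        sample: sum(coef * genes[gene] for gene, coef in coefficients.items())
        for sample, genes in expression.items()
    }


coefs = {"GENE_A": 0.5, "GENE_B": -0.2}          # placeholders, not the published model
cohort = {"s1": {"GENE_A": 2.0, "GENE_B": 1.0}, "s2": {"GENE_A": 0.5, "GENE_B": 3.0}}
print(risk_scores(cohort, coefs))
```

The resulting scores can then be compared across cohorts by C-index, or combined with clinical variables such as stage in a nomogram.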

Drug screening and molecular dynamics simulation
In this study, the R package “oncoPredict” was used to screen potential inhibitors of HMGB2, and the binding interactions between HMGB2 and the candidate drugs were then evaluated using the CB-Dock2 molecular docking platform (https://cadd.labshare.cn/cb-dock2/index.php) [23]. Vina scores were calculated, and the drug with the lowest binding energy was selected for molecular dynamics (MD) simulations. The 3D structure files of the drugs were downloaded from the PubChem database, and the structure file for HMGB2 was obtained from the AlphaFold Protein Structure Database. AutoDockTools was used to prepare the protein and ligand structures. MD simulations were performed in GROMACS 2022 with the AMBER99SB-ILDN force field and the TIP3P water model; after energy minimization and NVT/NPT equilibration, a 100 ns production run was carried out. The analysis included root-mean-square deviation (RMSD) for protein–ligand stability, root-mean-square fluctuation (RMSF) for residue flexibility, hydrogen bonding, radius of gyration (Rg), and solvent-accessible surface area (SASA) to assess the binding stability of potential HMGB2 inhibitors. The inhibitor BI-2536 used in the in vitro experiments was purchased from MedChemExpress (MCE; Cat. No. HY-50698). DLD1 cells were seeded in 48-well plates at 10,000 cells per well and cultured for 24 h to allow adhesion. The medium was then replaced with fresh medium containing the drug at 0, 20, 40, 80, 160, and 320 nM, with each concentration tested in triplicate. After 72 h of incubation at 37 °C in 5% CO2, the medium was removed, and cells were gently washed with PBS. Cells were then stained with a crystal violet–methanol solution at room temperature for 10–20 min. After removal of unbound dye, 2% SDS solution was added to solubilize the stain, and absorbance was measured at 560 nm using a microplate reader.
Cell viability was calculated based on absorbance values, dose–response curves were plotted, and the IC50 value was determined through nonlinear regression analysis.
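The IC50 step can be sketched without a statistics library. The code below converts absorbance to percent viability relative to untreated wells. It then fits a two-parameter Hill curve (top fixed at 100%, bottom at 0%) by brute-force least squares over a grid of log10(IC50) and slope. This is a stand-in for the nonlinear regression mentioned in the text, whose exact model and software are not specified. The grid bounds and step counts are assumptions.

```python
import math


def viability_percent(a_treated, a_control, a_blank=0.0):
    """Percent viability from crystal-violet absorbance (blank-corrected)."""
    return 100.0 * (a_treated - a_blank) / (a_control - a_blank)


def hill(conc, ic50, slope):
    """Two-parameter logistic: 100% at zero dose, falling to 0% at high dose."""
    return 100.0 / (1.0 + (conc / ic50) ** slope)


def fit_ic50(concs, viability, grid_steps=400):
    """Least-squares grid search over log10(IC50) and Hill slope (0.05 to 4.0)."""
    positive = [c for c in concs if c > 0]
    lo = math.log10(min(positive)) - 1
    hi = math.log10(max(positive)) + 1
    best = None
    for i in range(grid_steps + 1):
        ic50 = 10 ** (lo + (hi - lo) * i / grid_steps)
        for k in range(1, 81):
            slope = k * 0.05
            sse = sum((hill(c, ic50, slope) - v) ** 2 for c, v in zip(concs, viability))
            if best is None or sse < best[0]:
                best = (sse, ic50, slope)
    return best[1], best[2]
```

With the concentration series used here (0 to 320 nM), the zero-dose wells anchor the 100% plateau. The grid spans one decade beyond the tested range on each side.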

Statistical analysis
All statistical analyses were conducted using R software (version 4.3.0). The “limma” R package was used to identify differentially expressed genes between two groups, Spearman's test was used to evaluate correlations between variables, and the Wilcoxon test was used for comparisons between groups. All in vitro experiments were repeated three times, and data are expressed as mean ± standard deviation (SD). All tests were two-sided, and P < 0.05 was considered statistically significant. Significance thresholds: *P < 0.05, **P < 0.01, ***P < 0.001.
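Spearman's correlation, used throughout the results (for example between the lactylation score and EP300), is the Pearson correlation of rank-transformed values, with tied values given their average rank. A dependency-free sketch follows; the p-value computation is omitted.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

Because only ranks matter, any monotonic relationship, even a strongly nonlinear one, yields rho = 1.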

Results

Lactylation landscape of bulk sequencing
To systematically explore the role of lactylation in CRC, we integrated bulk sequencing data from the TCGA and GEO databases (Fig. 1A). The GEO data comprised eight independent CRC cohorts, which were batch-corrected and merged into a single GEO cohort (Fig. 1B). The merged GEO cohort comprised 1705 samples, of which 1000 had OS data and 1327 had RFS data. The TCGA cohort consisted of 701 samples (51 normal and 650 tumor tissues), of which 571 had OS data and 235 had RFS data. Using ssGSEA, we computed a lactylation score for each sample; lactylation scores were significantly elevated in tumor tissues (Fig. 1C). In both cohorts, the score correlated positively with the core lactylation writer EP300 (R > 0.8, P < 0.01) and with four genes involved in lactate production: PFKM, PFKP, ALDOA, and ENO1 (each R > 0.6, P < 0.01). These correlations support the robustness and construct validity of the score (Supplementary Fig. 1A). Based on the median lactylation score, CRC samples were classified into high and low lactylation score groups. Survival analysis showed that the high lactylation score group had significantly poorer OS and RFS than the low lactylation score group, indicating that high lactylation scores were closely associated with CRC progression (Fig. 1D). GSVA based on the HALLMARK gene sets revealed that the high lactylation score group was primarily enriched in pathways related to cell proliferation and immune regulation, including G2M_CHECKPOINT, MYC_TARGETS_V1, E2F_TARGETS, IL6_JAK_STAT3_SIGNALING, and HYPOXIA (Fig. 1E, F). These findings suggest that lactylation may promote CRC progression by modulating the cell cycle, transcription factors, and immune signaling. Differential expression analysis between the high and low lactylation groups (Fig. 1G) identified differentially expressed genes shared by the TCGA and GEO cohorts (Fig. 1H).
KEGG and GO enrichment analyses showed that these differentially expressed genes (both upregulated and downregulated) were mainly involved in cell proliferation, energy metabolism, and immune regulation pathways (Fig. 1I), suggesting that lactylation may promote CRC progression.

Lactylation landscape of single-cell sequencing
To further investigate the role of lactylation in CRC, we integrated four single-cell sequencing datasets from the GEO database (Fig. 2A, B), comprising a total of 66 samples, including 47 tumor samples and 19 normal samples (Fig. 2C). These samples were categorized based on tissue type (normal vs. tumor), stage (TNM staging), age (≥ 65 years vs. < 65 years), sex (male vs. female), and tissue location (Fig. 2D–E). By calculating lactylation scores at the single-cell level, we found that lactylation scores were significantly lower in normal tissues compared to tumor tissues, with the highest scores observed in advanced samples (Fig. 2F–G).
Further analysis of the impact of clinical features on lactylation levels revealed that in normal tissues, the lactylation score in the < 65 years group was significantly lower than that in the ≥ 65 years group, and male samples had significantly higher lactylation scores than female samples. Among intestinal segments, the cecum had the lowest lactylation score, while the ascending colon had the highest (Fig. 2H). In tumor tissues, the lactylation score was significantly higher in the < 65 years group than in the ≥ 65 years group, with males again showing higher scores than females. The cecum again had the lowest lactylation score, while the rectum had the highest (Fig. 2I). Additionally, we identified eight major cell populations (Supplementary Fig. 1B, Fig. 2J). Among these, the cell types with significantly elevated lactylation scores in tumor tissues were epithelial cells, fibroblasts, endothelial cells, and mast cells (Fig. 2K). Furthermore, comparison of lactylation levels across all cell types revealed that epithelial cells had the lowest lactylation scores in normal tissues but the highest scores in tumor tissues (Fig. 2L). We re-clustered epithelial cells and further identified malignant cells within the epithelial compartment (Supplementary Fig. 1C, D). In normal tissues, all epithelial cells were classified as non-malignant, whereas in tumor tissues, 88.6% of epithelial cells were identified as malignant and 11.4% as non-malignant (Supplementary Fig. 1E). Notably, lactylation scores in malignant epithelial cells were significantly higher than those in non-malignant epithelial cells (Supplementary Fig. 1F). These findings suggest that tumor epithelial cells are the population most associated with lactylation in colorectal cancer and may play an important role in lactylation-driven CRC progression.

Identification of lactylation-associated tumor epithelial cell subpopulations
To explore the role of epithelial cells in the lactylation process, this study re-clustered and analyzed the epithelial cell populations within the single-cell data, identifying 11 distinct epithelial cell subpopulations (Epi_1 to Epi_11) (Fig. 3A). These subpopulations all exhibited varying degrees of expression of epithelial markers EPCAM, KRT18, KRT19, and KRT8 (Supplementary Fig. 2A), with each subpopulation demonstrating unique molecular expression profiles (Supplementary Fig. 2B). Further analysis revealed that the distribution of these epithelial subpopulations varied across tissue type, TNM stage, and tumor localization (Fig. 3B). Among the 11 epithelial subpopulations, Epi_3 exhibited the highest lactylation levels (Fig. 3C). Moreover, whether analyzing normal and tumor tissues combined or focusing solely on tumor tissues, Epi_3 consistently showed the highest proportion of malignant epithelial cells across all epithelial clusters (Supplementary Fig. 2C, D). We further stratified epithelial cells into three categories—normal epithelial cells, non-malignant epithelial cells, and malignant epithelial cells—and found that Epi_3 displayed the highest lactylation score across all three groups (Supplementary Fig. 2E). Based on these findings, we selected Epi_3 as the primary focus for further investigation in this study. Proportional analysis showed that Epi_3 was most abundant in tumor tissues, and its proportion increased progressively with advancing TNM stage, with the highest enrichment observed in rectal tissues (Fig. 3D). We further conducted GO and KEGG functional enrichment analyses based on the upregulated genes (log2FC > 0.25, p < 0.05) identified in each epithelial subpopulation from Epi_1 to Epi_11. 
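The per-cluster gene selection that feeds the GO and KEGG analyses can be sketched as a simple filter. This assumes a one-vs-rest differential-expression table for each cluster. The text does not state whether the p-values were adjusted for multiple testing, and the table layout and gene names here are hypothetical.

```python
def upregulated_genes(de_table, min_log2fc=0.25, max_p=0.05):
    """Genes up-regulated in one cluster relative to all other cells.

    de_table: list of (gene, log2fc, p_value) rows from a one-vs-rest test.
    Both cut-offs are strict, mirroring "log2FC > 0.25, p < 0.05".
    """
    return [gene for gene, lfc, p in de_table if lfc > min_log2fc and p < max_p]


def upregulated_by_cluster(de_tables, **cutoffs):
    """Apply the same filter to every cluster, e.g. {"Epi_1": rows, ..., "Epi_11": rows}."""
    return {cluster: upregulated_genes(rows, **cutoffs) for cluster, rows in de_tables.items()}


# Hypothetical rows: G2 fails log2FC > 0.25, G3 fails p < 0.05, G4 is down-regulated
rows = [("G1", 0.30, 0.01), ("G2", 0.25, 0.01), ("G3", 1.00, 0.05), ("G4", -1.0, 0.001)]
```

Each cluster's list would then go to an enrichment tool; that step is not shown here.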
Compared with other subclusters, Epi_3 was predominantly enriched in biological processes including gene expression regulation, protein translation, RNA processing, ubiquitin-mediated degradation, cell adhesion, mitochondrial function, glycolysis, and oxidative phosphorylation, the last three of which are directly linked to lactate metabolism (Fig. 3E–F; Supplementary Fig. 3). Pathway enrichment analysis showed that, compared with other epithelial subpopulations, Epi_3 was predominantly enriched in the MAPK and PI3K signaling pathways (Fig. 3G). Gene Set Enrichment Analysis (GSEA) further highlighted the significant involvement of Epi_3 in energy metabolism, oxidative balance, and cell cycle-related biological processes (Fig. 3H). In bulk sequencing data, high Epi_3 infiltration was associated with poor prognosis, with significantly reduced OS and RFS in CRC patients with high Epi_3 infiltration (Fig. 3I). Additionally, analysis of four CRC spatial transcriptomics datasets revealed spatial co-localization of lactylation levels and Epi_3 (Fig. 3J), further supporting a close link between Epi_3 and lactylation. Thus, this study identifies Epi_3 as the epithelial cell subpopulation most strongly associated with lactylation, which may play a critical role in CRC initiation and progression.
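The ranking metric, gene-set collection, and GSEA software used are not given in the text. As a sketch of what the enrichment score itself measures, the function below implements the standard weighted running-sum score from the original GSEA formulation, with weight exponent 1 by default. Significance testing by permutation is omitted, and the toy ranked list is hypothetical.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """GSEA running-sum enrichment score.

    ranked: list of (gene, metric) pairs sorted by metric in descending order.
    gene_set: set of gene names.
    weight: exponent on |metric| for hits (1.0 is the usual default).
    Returns the running sum's maximum deviation from zero (signed).
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    hit_norm = sum(abs(m) ** weight for (_, m), h in zip(ranked, hits) if h)
    if n_miss == 0 or hit_norm == 0:
        raise ValueError("gene set must overlap the list without covering all of it")
    running, es = 0.0, 0.0
    for (_, metric), is_hit in zip(ranked, hits):
        # hits step up in proportion to |metric|; misses step down uniformly
        running += abs(metric) ** weight / hit_norm if is_hit else -1.0 / n_miss
        if abs(running) > abs(es):
            es = running
    return es


ranked = [("A", 3.0), ("B", 2.0), ("C", 1.0), ("D", -1.0), ("E", -2.0)]
```

A set concentrated at the top of the ranking gives a positive score near 1, and a set at the bottom gives a negative score.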

Epi_3 defined as HMGB2-positive epithelial cells (HMGB2+Epi) and in vitro validation
To further characterize the representative genes of Epi_3, we performed high-dimensional weighted gene co-expression network analysis (hdWGCNA) on the epithelial cell populations. This analysis identified six co-expression modules (M1–M6) (Fig. 4A, B), with the M4 module being most significantly enriched in Epi_3 (Fig. 4C, D). We then extracted the top 10 hub genes of the M4 module (Fig. 4E) and intersected them with the genes highly expressed in Epi_3 (log2FC > 1, P < 0.05), identifying five shared genes: CDC20, UBE2C, CCNB1, HMGB2, and TOP2A (Fig. 4F). Ranked by differential expression from highest to lowest, these were UBE2C (log2FC = 2.1), HMGB2 (log2FC = 1.9), CCNB1 (log2FC = 1.2), TOP2A (log2FC = 1.2), and CDC20 (log2FC = 1.1). Bulk sequencing data analysis revealed that HMGB2 showed the highest correlation with Epi_3 (Fig. 4G), and its expression was significantly higher in Epi_3 than in other epithelial subpopulations (Fig. 4H). Moreover, HMGB2 was positively correlated with lactylation scores (Fig. 4I), suggesting a potential key role in lactylation regulation. We therefore defined Epi_3 as HMGB2-positive epithelial cells (HMGB2+Epi). Survival analysis of TCGA and GEO datasets showed that high HMGB2 expression was associated with poorer RFS in CRC patients, whereas the association with OS was weaker (Supplementary Fig. 4). HMGB2 expression was also significantly higher in tumor tissues than in normal tissues at both the RNA and protein levels (Fig. 4J, K). Immunohistochemical results from the Human Protein Atlas (HPA) database showed that HMGB2 was predominantly expressed in tumor cells (Fig. 4L). To validate these findings experimentally, we collected CRC tissue samples.
Immunohistochemistry and Western blot (WB) analysis confirmed that HMGB2 was expressed at low levels in normal tissues and at high levels in CRC tissues, primarily in tumor cells (Fig. 4M, N). Lactylation levels were also significantly elevated in tumor tissues compared with normal tissues (Fig. 4O). To further explore the relationship between HMGB2 and lactylation, we analyzed one normal colon cell line and eight CRC cell lines. HMGB2 expression was highest in DLD1 cells and lowest in SW620 cells (Fig. 4P). Correspondingly, DLD1 cells exhibited the highest lactylation levels, while SW620 cells showed relatively low lactylation levels (Fig. 4Q). Based on these findings, we generated stable HMGB2-overexpressing SW620 cells and HMGB2-knockout DLD1 cells; among the sgRNAs tested, sg_2 gave the highest knockout efficiency (Fig. 4R). These stable cell models were used in subsequent experiments to investigate the functional mechanisms of HMGB2 in lactylation and CRC progression.
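The hub-gene selection described in the preceding subsection (M4 hub genes intersected with Epi_3 up-regulated genes, then ranked by log2FC) can be sketched as follows. Only the five shared genes and their log2FC values come from the text. The other five hub names and the extra differentially expressed gene are hypothetical placeholders, and the text's "logFC > 1" cut-off is assumed to be on the log2 scale.

```python
def shared_ranked_genes(hub_genes, de_log2fc, min_log2fc=1.0):
    """Intersect module hub genes with a cluster's up-regulated genes, then rank.

    hub_genes: iterable of hub-gene names from the co-expression module.
    de_log2fc: dict gene -> log2FC for genes that already pass the p-value filter.
    Ties in log2FC are broken alphabetically so the output order is deterministic.
    """
    shared = {g for g in hub_genes if de_log2fc.get(g, float("-inf")) > min_log2fc}
    return sorted(shared, key=lambda g: (-de_log2fc[g], g))


# Five of the ten M4 hub genes are named in the text; HUB_6..HUB_10 are placeholders.
m4_hubs = ["CDC20", "UBE2C", "CCNB1", "HMGB2", "TOP2A"] + [f"HUB_{i}" for i in range(6, 11)]
# log2FC values quoted in the text; OTHER_GENE is a hypothetical non-hub DEG.
epi3_log2fc = {"UBE2C": 2.1, "HMGB2": 1.9, "CCNB1": 1.2, "TOP2A": 1.2, "CDC20": 1.1,
               "OTHER_GENE": 3.0}
print(shared_ranked_genes(m4_hubs, epi3_log2fc))
# -> ['UBE2C', 'HMGB2', 'CCNB1', 'TOP2A', 'CDC20']
```

The alphabetical tie-break is an illustrative choice. It happens to reproduce the order given for CCNB1 and TOP2A, which share a log2FC of 1.2.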

HMGB2 promotes the Warburg effect and malignant progression of CRC
HMGB2 enhanced lactylation levels: overexpression of HMGB2 in SW620 cells significantly increased lactylation, whereas HMGB2 knockout in DLD1 cells significantly reduced it (Fig. 5A). Aerobic glycolysis (the Warburg effect) is the main route by which tumor cells produce lactate. HMGB2 overexpression enhanced the Warburg effect, as indicated by decreased FBP1 expression and increased HK2, LDHA, and PKM2 expression, whereas the Warburg effect was inhibited in HMGB2-knockout cells (Fig. 5A). Both HMGB2 and the Warburg effect are closely associated with tumor invasion, migration, and proliferation. Transwell assays showed that HMGB2 knockout in DLD1 cells significantly reduced cell migration and invasion, whereas HMGB2 overexpression in SW620 cells significantly enhanced them (Fig. 5B, C). Clonogenic and CCK-8 assays further confirmed that HMGB2 significantly promotes CRC cell proliferation (Fig. 5D, E). Additionally, flow cytometry apoptosis assays revealed a marked increase in apoptosis in DLD1 cells following HMGB2 knockout (Fig. 5F). In vivo tumorigenesis experiments in nude mice further validated the role of HMGB2 in promoting CRC malignancy: tumors formed by HMGB2-overexpressing SW620 cells had significantly greater volumes and weights than controls, whereas tumors formed by HMGB2-knockout DLD1 cells were markedly smaller (Fig. 5G–I). Hematoxylin and eosin (H&E) staining and Ki67 immunohistochemistry further showed that tumors with high HMGB2 expression exhibited greater proliferative capacity (Fig. 5J). In summary, HMGB2 promotes lactate production and the Warburg effect, accelerating CRC cell proliferation, migration, and invasion while reducing apoptosis.
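The text does not say how xenograft tumor volume was calculated. A widely used caliper-based estimate is the modified ellipsoid formula V = L × W² / 2, where L is the longest diameter and W the perpendicular one. The sketch below assumes that formula, and its measurements are hypothetical.

```python
from statistics import mean


def tumor_volume_mm3(length_mm, width_mm):
    """Caliper-based tumor volume, V = L * W**2 / 2 (modified ellipsoid formula).

    The longer of the two perpendicular diameters is used as L.
    """
    length_mm, width_mm = max(length_mm, width_mm), min(length_mm, width_mm)
    return length_mm * width_mm ** 2 / 2


def mean_group_volume(measurements):
    """Mean volume for one group, given (length, width) pairs in mm, one per mouse."""
    return mean(tumor_volume_mm3(length, width) for length, width in measurements)


# Hypothetical caliper readings for one group of mice
control_group = [(10, 8), (4, 2)]
```

Tumor weight at sacrifice is measured directly, so only the volume curves depend on a formula like this one.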

NFYB-HMGB2 axis regulates the Warburg effect, proliferation, and metastasis of CRC
To further explore the upstream regulatory mechanisms of HMGB2, this study employed SCENIC analysis to investigate the transcriptional regulatory networks of different epithelial cell subpopulations, aiming to identify key regulators specific to each subpopulation. The analysis revealed that NFYB is the most active transcription factor in HMGB2+Epi (Fig. 6A, B). NFYB transcriptional activity was significantly higher in the tumor group than in the normal group and remained elevated across all TNM stages (Fig. 6C). Integration of TCGA and GEO bulk sequencing data confirmed that NFYB mRNA expression was significantly higher in tumor tissues than in normal tissues (Fig. 6D). Immunohistochemical data from the HPA database further showed that NFYB is predominantly expressed in tumor cells (Fig. 6E). Additionally, NFYB was significantly positively correlated with lactylation scores (Fig. 6F) and with HMGB2 expression (Fig. 6G). To further validate this regulatory relationship, immunohistochemistry was performed on an external cohort, which confirmed that NFYB was predominantly localized in tumor cells and was expressed at higher levels than in normal tissues (Fig. 6H). In the internal sample cohort, NFYB expression was also significantly positively correlated with HMGB2 expression (Fig. 6I). To test whether NFYB regulates HMGB2, three NFYB siRNAs were designed, and the most efficient, siRNA_1, was selected. After transfection with NFYB siRNA_1, HMGB2 expression in DLD1 cells was significantly reduced (Fig. 6J), indicating that NFYB positively regulates HMGB2 expression. JASPAR database predictions indicated potential NFYB binding sites in the HMGB2 promoter region (Fig. 6K), and ChIP-qPCR showed significant enrichment of the HMGB2 promoter region in NFYB immunoprecipitates (Fig. 6L), indicating that NFYB binds the HMGB2 promoter and directly activates its transcription. Furthermore, this study demonstrated that the NFYB-HMGB2 axis regulates the Warburg effect and lactylation levels (Fig. 6M). In cells with NFYB knockdown, lactylation levels decreased and the Warburg effect, assessed by the expression of FBP1, HK2, LDHA, and PKM2, was suppressed; HMGB2 overexpression rescued these changes (Fig. 6M). Functionally, CCK-8 assays showed that the NFYB-HMGB2 axis promotes CRC cell proliferation (Fig. 6N), and Transwell assays showed that it significantly enhances CRC cell migration and invasion (Fig. 6O, P). In summary, NFYB regulates HMGB2 transcription and, consequently, modulates the Warburg effect and lactylation levels, thereby promoting CRC cell proliferation and metastasis.
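The text does not report how the ChIP-qPCR signal was normalized. Two common readouts are percent input and fold enrichment over an IgG control. The sketch below computes both. It assumes that 1% of the chromatin was kept as input and that primers amplify with about 100% efficiency, so the product doubles each cycle. The Ct values in the example are hypothetical.

```python
import math


def percent_input(ct_ip, ct_input, input_fraction=0.01):
    """Percent-input normalization for ChIP-qPCR.

    input_fraction: fraction of chromatin saved as input (0.01 for a 1 % input);
    the input Ct is adjusted to 100 % by subtracting log2(1 / input_fraction).
    Assumes ~100 % primer efficiency (doubling per cycle).
    """
    ct_input_adj = ct_input - math.log2(1 / input_fraction)
    return 100 * 2 ** (ct_input_adj - ct_ip)


def fold_over_igg(ct_ip, ct_igg, ct_input, input_fraction=0.01):
    """Enrichment of the specific antibody (e.g. anti-NFYB) over the IgG control."""
    return (percent_input(ct_ip, ct_input, input_fraction)
            / percent_input(ct_igg, ct_input, input_fraction))


# Hypothetical Ct values at the HMGB2 promoter amplicon
nfyb_fold = fold_over_igg(ct_ip=24.0, ct_igg=27.0, ct_input=25.0)
```

Here a 3-cycle difference between the NFYB and IgG pull-downs corresponds to an 8-fold enrichment.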

Cell–cell communication analysis reveals the distinct interaction patterns of Epi_3 (HMGB2+Epi)
We systematically analyzed cell–cell communication between the 11 epithelial subpopulations (Epi_1–Epi_11) and seven major cell types in the CRC microenvironment (mast cells, fibroblasts, myeloid cells, B cells, T/NK cells, plasma cells, and endothelial cells) using the CellChat package. T/NK cells were the strongest signal receivers, while fibroblasts were the strongest signal senders. Notably, compared with lower-lactylation subpopulations, the high-lactylation epithelial subpopulation HMGB2+Epi showed greater capacity for both sending and receiving signals (Supplementary Fig. 5A). HMGB2+Epi also interacted markedly more strongly with fibroblasts, endothelial cells, and T/NK cells, suggesting an important role in modulating the tumor microenvironment (Supplementary Fig. 5B, C). Ligand–receptor pair analysis showed that HMGB2+Epi was enriched for a variety of ligands that could activate signaling pathways in these target cells, while also receiving signals from them, forming a bidirectional communication network (Supplementary Fig. 5D). These findings underscore the distinct communication patterns of high-lactylation epithelial cells and suggest that their interactions with the microenvironment may contribute to CRC progression.
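CellChat summarizes each cell group's sending and receiving roles with network centrality measures computed on the inferred communication network. The simplest of these are weighted out-degree and in-degree, that is, the row and column sums of the aggregated communication-strength matrix. The sketch below computes only those two sums on a hypothetical toy matrix. It illustrates how a "strongest sender" and "strongest receiver" can be read off; it is not a reimplementation of CellChat.

```python
def signaling_roles(weights):
    """Weighted out-degree (sending) and in-degree (receiving) per cell group.

    weights: dict mapping (sender, receiver) -> aggregated communication strength.
    Returns two dicts: outgoing strength and incoming strength per group.
    """
    outgoing, incoming = {}, {}
    for (sender, receiver), w in weights.items():
        outgoing[sender] = outgoing.get(sender, 0.0) + w
        incoming[receiver] = incoming.get(receiver, 0.0) + w
    return outgoing, incoming


# Hypothetical strengths, for illustration only
toy = {
    ("Fibroblasts", "T/NK cells"): 3.0,
    ("Fibroblasts", "Endothelial cells"): 1.0,
    ("Epi_3", "T/NK cells"): 2.0,
    ("T/NK cells", "Epi_3"): 0.5,
}
out_strength, in_strength = signaling_roles(toy)
top_sender = max(out_strength, key=out_strength.get)
top_receiver = max(in_strength, key=in_strength.get)
```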

BI-2536 as a tool compound implicating the HMGB2–lactylation axis
To screen for potential inhibitors of HMGB2, we used the R package “oncoPredict” to predict the IC50 values of 198 drugs for samples in the TCGA and GEO cohorts and identified drugs whose predicted IC50 was negatively correlated with HMGB2 expression (selection threshold: R < −0.3, P < 0.05), that is, drugs to which HMGB2-high samples were predicted to be more sensitive (Fig. 7A). Intersecting the candidates from the TCGA and GEO datasets yielded six drugs: BI-2536, Gallibiscoquinazole, AGI-6780, Wee1 Inhibitor, GSK591, and Tozasertib (Fig. 7B). In both cohorts, the IC50 values of these six drugs were negatively correlated with HMGB2 expression (Fig. 7C, Supplementary Fig. 6). Molecular docking showed that BI-2536 had the lowest, and thus most favorable, Vina score with HMGB2 (Fig. 7D), suggesting that it has the highest predicted binding affinity for HMGB2 among the candidates. Molecular dynamics simulations were then performed on the HMGB2–BI-2536 complex. The RMSD of the complex plateaued after 20 ns of simulation (Fig. 7E). RMSF analysis showed increased flexibility of key residues in HMGB2, suggesting that these regions may be involved in BI-2536 binding (Fig. 7F). Hydrogen bond analysis showed that BI-2536 forms dynamic hydrogen bonds with HMGB2 that persist throughout the simulation (Fig. 7G). Solvent-accessible surface area (SASA) analysis showed that the solvent exposure of HMGB2 remained stable after BI-2536 binding, with no significant changes (Fig. 7H). Radius of gyration (Rg) analysis further indicated that the compactness of the HMGB2–BI-2536 complex remained stable, with no significant expansion or contraction (Fig. 7I). Together, the docking and molecular dynamics results suggest that BI-2536 can form a stable complex with HMGB2 in silico.
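The MD engine and trajectory-analysis tools are not named in this excerpt. To make two of the reported metrics concrete, the sketch below computes RMSD and radius of gyration directly from their definitions on plain coordinate lists. It assumes the frame has already been superposed on the reference, as MD analysis tools do before computing RMSD, and it ignores periodic boundaries. The coordinates are hypothetical.

```python
import math


def rmsd(frame, reference):
    """RMSD between two equally ordered lists of (x, y, z) atom coordinates.

    Assumes the frame is already superposed on the reference (no fitting here).
    """
    if len(frame) != len(reference) or not frame:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    sq = sum((a - b) ** 2 for p, q in zip(frame, reference) for a, b in zip(p, q))
    return math.sqrt(sq / len(frame))


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration (unit masses if none are given)."""
    masses = masses or [1.0] * len(coords)
    total = sum(masses)
    center = [sum(m * p[k] for m, p in zip(masses, coords)) / total for k in range(3)]
    sq = sum(m * sum((p[k] - center[k]) ** 2 for k in range(3))
             for m, p in zip(masses, coords))
    return math.sqrt(sq / total)


pts = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)]
shifted = [(x + 1.0, y, z) for x, y, z in pts]
```

An RMSD time series that levels off and an Rg that stays flat are the signals the text reads as a stable complex.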
To experimentally test the effect of BI-2536 on HMGB2, we treated DLD1 colorectal cancer cells with a gradient of BI-2536 concentrations (0, 20, 40, 80, 160, and 320 nM) and determined an IC50 of 68.16 nM (Fig. 7J). Western blot analysis revealed a significant reduction in HMGB2 protein levels following treatment with BI-2536 at the IC50 concentration (Fig. 7K). These results support BI-2536 as a tool compound for probing the HMGB2–lactylation axis but do not establish direct HMGB2 inhibition.
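The curve-fitting procedure behind the IC50 is not stated. Nonlinear regression of a four-parameter logistic curve is typical. The sketch below uses a simpler linearized Hill (log-logit) regression, which assumes viability runs from 100% down to 0%. Concentrations are taken as nM (nmol/L), and the viability values are synthetic, generated from a curve with a known IC50, not the paper's raw data.

```python
import math


def estimate_ic50(concs_nM, viability_pct):
    """Estimate IC50 by a linearized Hill fit.

    Assumes viability follows v = 100 / (1 + (c / IC50) ** h), so
    ln((100 - v) / v) = h*ln(c) - h*ln(IC50) is a straight line in ln(c).
    Returns (ic50_nM, hill_slope).
    """
    xs, ys = [], []
    for c, v in zip(concs_nM, viability_pct):
        if c <= 0 or not 0 < v < 100:
            continue  # ln(0) and the logit at 0 % or 100 % are undefined
        xs.append(math.log(c))
        ys.append(math.log((100 - v) / v))
    if len(xs) < 2:
        raise ValueError("need at least two doses with 0 < viability < 100")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    # the line crosses zero (v = 50 %) where ln(c) = -intercept / slope
    return math.exp(-intercept / slope), slope


# Synthetic curve with IC50 = 68.16 nM and Hill slope 1.5 (not the paper's raw data)
doses = [0, 20, 40, 80, 160, 320]
viability = [100 / (1 + (c / 68.16) ** 1.5) for c in doses]
ic50, hill = estimate_ic50(doses, viability)
```

On noise-free Hill data the regression recovers the generating IC50 exactly. With real replicates, a nonlinear fit that also estimates the top and bottom plateaus would usually be preferred.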

Construction of an HMGB2+Epi-related risk prognostic model
Based on the preceding analyses, the lactylation-enriched HMGB2+Epi subpopulation is closely associated with poor prognosis in CRC, suggesting that its transcriptional features could serve as the basis for a risk prognostic model. Using bulk sequencing data from TCGA and GEO, we employed WGCNA to identify modules co-expressed with HMGB2+Epi and lactylation (Supplementary Fig. 7A–B). In the TCGA cohort, the MElightyellow module was the module associated with both HMGB2+Epi and lactylation, while in the GEO cohort, the MEyellow module was the corresponding module (Fig. 8A). Intersecting the two modules yielded 520 genes (Fig. 8B). Correlation analysis showed that the scores of these genes were positively correlated with lactylation (correlation coefficient > 0.6) and HMGB2+Epi (correlation coefficient > 0.8) (Fig. 8C), further indicating their close relationship with both. Combining the survival data from the TCGA and GEO cohorts, we then selected prognosis-associated genes: univariate Cox regression identified 38 genes that significantly influenced patient prognosis in both cohorts (Fig. 8D). Subsequently, 101 combinations of 10 machine learning algorithms were applied to construct the risk prognostic model, using the TCGA cohort as the training set and the GEO cohort as the validation set. The RSF + StepCox (Random Survival Forest + Stepwise Cox Regression) combination achieved the highest average C-index and the best predictive performance among all combinations (Fig. 8E). This combination ultimately identified 11 risk-associated feature genes: TRIP10, FLOT1, PMM2, LRRFIP2, TRIM2, PPARGC1A, MT2A, BNIP2, SERTAD2, SRSF3, and LSM3, with their respective risk coefficients listed in Supplementary Table 2.
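Model selection here rests on the C-index. The software pipeline is not named in this excerpt, so as a reference point the sketch below implements Harrell's concordance index from its definition. A pair is comparable when the patient with the shorter follow-up had an event. It is concordant when that patient also has the higher predicted risk, and tied risks count one half. Pairs with tied times are simply skipped, a simplification. The toy data are hypothetical.

```python
def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    times: follow-up times; events: 1 if the event occurred, 0 if censored;
    risks: predicted risk scores (higher = worse prognosis).
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical cohort: the last patient is censored
times, events = [1, 2, 3, 4], [1, 1, 1, 0]
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking. An "average C-index" would then be the mean of this value across the training and validation cohorts.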
These risk feature genes were predominantly expressed in epithelial cells (Supplementary Fig. 7C). Higher risk scores were associated with higher mortality, a trend consistent across the TCGA, GEO, and Meta (integrated TCGA and GEO) cohorts (Fig. 8F), and patients with higher risk scores had significantly worse OS and RFS (Fig. 8G). To benchmark the model, we collected 195 published CRC risk prognostic models, calculated their C-index values, and compared them with the model developed in this study. Of these, 125 models could be evaluated because the expression data they required were available in the TCGA and GEO cohorts. Our model outperformed all 125 evaluable models across the TCGA, GEO, and Meta cohorts (Supplementary Table 3, Supplementary Fig. 8), supporting its robustness and potential clinical applicability. To further improve prognostic prediction, we performed multivariate Cox regression analysis including TNM stage (Supplementary Fig. 9A), which showed that both the risk score and TNM stage were independent adverse prognostic factors for CRC patients. On this basis, we developed a nomogram incorporating TNM stage (Fig. 8H) to predict OS more accurately. Calibration curves showed good agreement between predicted and observed 1-, 3-, and 5-year survival (Fig. 8I), and ROC curve analysis confirmed that the nomogram outperformed the risk score or TNM stage alone (Fig. 8J). Similarly, we constructed a nomogram for predicting RFS (Supplementary Fig. 9B, C), which performed significantly better in predicting 5-year RFS than either the risk score or TNM stage alone (Supplementary Fig. 9D).
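The high- versus low-risk survival comparison can be sketched with the Kaplan-Meier estimator. The risk-group cut-off is not stated in this excerpt, so splitting at the median risk score is an assumption here. The log-rank test is omitted, and the toy data are hypothetical.

```python
from statistics import median


def split_by_median(risk_scores):
    """Label each patient 'high' or 'low' risk (median cut-off is an assumption)."""
    cut = median(risk_scores)
    return ["high" if r > cut else "low" for r in risk_scores]


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    events: 1 for an observed event, 0 for censoring.
    Returns a list of (time, survival) steps at each distinct event time;
    censored subjects leave the risk set without causing a drop.
    """
    data = sorted(zip(times, events))
    at_risk, surv, steps, i = len(data), 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group ties at time t
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            steps.append((t, surv))
        at_risk -= removed
    return steps


# Hypothetical follow-up for one risk group
km = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Running the estimator separately on each risk group gives the two curves typically plotted for OS or RFS.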
In conclusion, the HMGB2+Epi-related risk prognostic model constructed in this study demonstrates superior performance in predicting CRC patient prognosis and holds promise for clinical stratification and individualized management.

Discussion

CRC is one of the most common malignancies worldwide, and its onset, progression, and resistance to treatment are closely associated with tumor metabolic reprogramming [24]. In recent years, lactylation, a novel post-translational modification (PTM), has been shown to play a significant role in regulating tumor cell fate and remodeling the TME [25]. However, the role of lactylation in CRC remains largely unexplored, particularly at the single-cell level, where the influence of specific tumor cell subpopulations on lactylation and its driving mechanisms is still unclear.
In this study, we observed a significant elevation of lactylation levels in CRC tumor tissues, which was closely associated with poor patient prognosis. GSVA revealed that samples with high lactylation scores were significantly enriched in hallmark pathways including cell cycle (G2M_CHECKPOINT), MYC targets, E2F targets, IL6_JAK_STAT3 signaling, and hypoxia. These pathways are critically involved in cell proliferation, transcriptional regulation, inflammatory responses, and remodeling of the tumor microenvironment. Notably, although the glycolysis pathway was also enriched in the high-lactylation group, it was not among the top-ranked pathways. This suggests that the transcriptomic profile associated with elevated lactylation reflects not only upstream metabolic activity such as enhanced glycolysis but also downstream epigenetic consequences of lactate accumulation. Lactate, predominantly produced via the Warburg effect, serves as the substrate for protein lactylation, a covalent modification of lysine residues on histone and non-histone proteins, thereby influencing chromatin architecture and gene expression. Thus, an elevated lactylation score likely indicates an active metabolic–epigenetic coupling state rather than simply differential expression of glycolytic genes. Previous studies have demonstrated that lactylation, as an independent epigenetic modification, plays a pivotal role in promoting cell cycle progression [26, 27], enhancing immune evasion [28, 29], maintaining tumor cell stemness, and activating transcriptional programs associated with malignant phenotypes [30]. These findings are consistent with our observation that transcriptomes with high lactylation levels are more strongly enriched for pathways related to proliferation and immune modulation, rather than being confined to glycolytic signaling alone.
This study employed an integrated multi-omics approach to investigate the role of lactylation in CRC and identified a key driver subpopulation, HMGB2-positive lactylation-enriched epithelial cells (HMGB2+Epi). Single-cell analysis revealed that lactylation was predominantly enriched in epithelial cells, with the Epi_3 subpopulation exhibiting the highest lactylation levels and being most abundant in advanced TNM stages and in rectal tissue. CNV analysis showed that over 90% of Epi_3 cells were malignant, while a small fraction were CNV-negative. However, these CNV-negative cells clustered tightly with Epi_3 malignant cells, indicating highly similar transcriptomic features. Given the complexity and heterogeneity of the tumor microenvironment, we speculate that these cells may represent an intermediate state in the transition from non-malignant to malignant, or non-malignant epithelial cells that have adopted a tumor-like transcriptional profile under the influence of tumor-derived signals. Relying solely on CNV status to filter cells may underestimate the biological significance of Epi_3 in tumor–microenvironment interactions and lead to an incomplete understanding of lactylation-driven processes. More importantly, Epi_3 had the highest lactylation score of all subpopulations within each of the normal, non-malignant, and malignant epithelial cell groups. Therefore, rather than using CNV as the sole selection criterion, we analyzed Epi_3 as an integrated functional unit based on its overall transcriptional and functional characteristics. This approach aimed to comprehensively characterize the molecular features of lactylation-high epithelial subpopulations and their potential roles in CRC progression. Additionally, we identified HMGB2 as the core regulatory gene within Epi_3 and defined this cluster as HMGB2+Epi.
Research on HMGB2 in CRC has been limited, but recent studies have shown that extracellular secretion of HMGB2 is essential for the translocation of calnexin to the cell membrane, a process crucial for triggering immune responses. HMGB2 is also involved in regulating ferroptosis, thereby affecting cancer cell survival [31]. In other tumors, extensive research has shown that HMGB2 plays a critical role in the growth and metastasis of hepatocellular carcinoma (HCC) both in vivo and in vitro and can remodel the immunosuppressive microenvironment of HCC [32, 33]. HMGB2 has also been shown to promote the proliferation and epithelial–mesenchymal transition of non-small cell lung cancer [34]. Therefore, the lactylation subpopulation HMGB2+Epi may play an important role in the initiation and development of CRC.
Multiple lines of evidence support a strong association between HMGB2 and lactylation levels. At the transcriptional level, analysis across multiple CRC cohorts from TCGA and GEO revealed a consistent, significant positive correlation between HMGB2 expression and lactylation ssGSEA scores. Functional experiments in CRC cell models further confirmed this link: HMGB2 overexpression (in SW620 cells) significantly increased lactylation levels, while HMGB2 knockout (in DLD1 cells) markedly reduced them, providing direct evidence of its regulatory role. Mechanistically, prior research has implicated the HMGB protein family in lactylation pathways. In macrophages, for example, elevated lactate promotes lysine lactylation of HMGB1 via monocarboxylate transporters and p300/CBP [35]. Given the ~80% sequence homology between HMGB1 and HMGB2 (including conserved HMG-box domains and lysine-rich nuclear localization sequences), HMGB2 may undergo similar post-translational modifications [36], although this remains to be directly demonstrated. HMGB2 also plays a role in lactate metabolism: it stabilizes HIF-1α under hypoxia and promotes the expression of glycolytic genes such as GLUT1, HK2, and LDHA [37], and it upregulates LDHB and represses FBP1, further enhancing lactate production while inhibiting gluconeogenesis [38]. These findings suggest that HMGB2 contributes to metabolic reprogramming by linking chromatin remodeling to the regulation of lactate-associated enzymes. Additionally, HMGB2 has been identified as a diagnostic marker in gastric cancer, where its silencing suppresses proliferation and glycolysis [39], reinforcing its broader role in tumor metabolism.
The Warburg effect refers to the phenomenon in which cancer cells preferentially convert glucose to lactate via glycolysis, even in the presence of sufficient oxygen, rather than using mitochondrial oxidative phosphorylation for energy production [40]. This results in lactate accumulation in the TME, which significantly influences tumor growth, metastasis, and immune evasion [41, 42]. In breast cancer, HMGB2 has been shown to activate the Warburg effect, thereby promoting malignant progression [43]. In pancreatic cancer, high HMGB2 expression is associated with poor prognosis and is essential for maintaining the Warburg effect [37]. In this study, HMGB2 overexpression increased lactylation in CRC cells, whereas HMGB2 knockout suppressed it. Likewise, HMGB2 overexpression activated the Warburg effect, while HMGB2 knockout inhibited it. HMGB2 also accelerated CRC cell proliferation, migration, and invasion and reduced apoptosis. SCENIC analysis and ChIP-qPCR experiments revealed that NFYB directly binds to the HMGB2 promoter region, enhancing its transcription and forming an NFYB-HMGB2 axis. NFYB, a subunit of the trimeric NF-Y complex, is a transcription factor crucial for cell proliferation, differentiation, and metabolism [44]. NFYB has been reported to enhance oxaliplatin resistance in CRC [45], and another study demonstrated that NFYB influences glioma metabolism and promotes glioma cell proliferation and metastasis [46]. In this study, activation of the NFYB-HMGB2 axis promoted the Warburg effect by downregulating FBP1 and upregulating glycolysis-related genes such as HK2, LDHA, and PKM2, thereby increasing lactate production and accelerating CRC progression. These results suggest that lactylation is embedded in a transcription factor–metabolism regulatory network in tumor cells that may form a positive feedback loop sustaining tumor progression.
To our knowledge, this study is the first to identify HMGB2+Epi as a core lactylation-driven subpopulation and to elucidate the role of the NFYB-HMGB2 axis in lactylation-mediated metabolic reprogramming. These findings suggest new potential strategies for the precision diagnosis and targeted therapy of CRC.
In the tumor microenvironment, the high-lactylation epithelial subpopulation HMGB2+Epi exhibits distinct cell–cell communication patterns. HMGB2+Epi demonstrates significantly greater abilities as both a signal sender and receiver compared to lower-lactylation epithelial subpopulations, with particularly active interactions with key microenvironmental components, including fibroblasts, endothelial cells, and T/NK cells. Further ligand–receptor analysis revealed that HMGB2+Epi is enriched for a variety of ligands capable of activating signaling pathways in these target cells, while also receiving diverse signals to form a bidirectional communication network. This observation suggests that lactylation is not merely a marker of metabolic reprogramming but may also reshape the cell–cell communication network, facilitating signal exchange and dynamic balance within the tumor microenvironment, ultimately driving CRC progression and metastasis. Previous studies have shown that lactate plays an important role in regulating the tumor immune microenvironment, such as by suppressing CD8+ T cell activity and promoting immune evasion [28]. In addition, high-lactate conditions have been shown to enhance fibroblast activity, promote cancer-associated fibroblast (CAF) differentiation, and contribute to tumor progression [47]. The unique communication pattern of HMGB2+Epi provides a theoretical basis for further exploring the role of lactylation in CRC.
In terms of clinical application, drug screening and molecular dynamics simulations in this study identified BI-2536 as a candidate compound targeting HMGB2. BI-2536 can reduce HMGB2 stability, suppress lactylation levels, and inhibit CRC cell proliferation and migration. BI-2536 is a potent and selective inhibitor of Polo-like kinase 1 (PLK1) and has demonstrated antitumor activity in various cancer types [48, 49]. Before this study, BI-2536 had not been linked to HMGB2. This finding may therefore broaden the potential applications of BI-2536 and points to a new strategy for metabolic intervention targeting lactylation. In the future, BI-2536 or its derivatives may be further developed as HMGB2-targeted agents and combined with existing immunotherapies or metabolic inhibitors to enhance treatment efficacy in CRC patients. Machine learning is playing an increasingly important role in tumor prognosis prediction. By analyzing patients' multi-omics data and clinical information, machine learning models can provide more accurate survival predictions and risk assessments, thereby aiding clinical decision-making and improving therapeutic outcomes [50]. Compared with traditional CRC risk scoring models, the HMGB2+Epi-related prognostic model constructed in this study using single-cell analysis, WGCNA, and machine learning demonstrated superior predictive capability across multiple independent cohorts, outperforming all 125 evaluable models among 195 previously published CRC risk prognostic models. By integrating TNM staging, we further developed nomograms that improved predictions of 1-, 3-, and 5-year OS and of 5-year RFS in CRC patients, supporting their clinical applicability. These tools could be used for postoperative follow-up, prognostic evaluation, and treatment decision-making, providing quantitative indicators for the individualized management of CRC patients.
Although this study systematically elucidates the role of HMGB2+Epi in CRC and reveals the regulatory mechanism of the NFYB-HMGB2-lactylation axis, several limitations remain. First, this study is primarily based on public databases and in vitro experiments and lacks independent validation in large-scale clinical samples. Future studies should incorporate prospective clinical cohorts to further assess the clinical prognostic value of HMGB2+Epi. Second, although this study proposes that the NFYB-HMGB2 axis regulates lactylation, the upstream signals triggering this axis remain unclear. Future work could integrate epigenetic and metabolomic approaches to investigate this process. Third, although we found that BI-2536 downregulates HMGB2 protein levels and suppresses lactylation in HMGB2-high CRC cells, its specificity is limited. Originally developed as a PLK1 inhibitor, BI-2536 has broad cell cycle–regulatory activity. Its inhibitory effect on lactylation may therefore result not from direct targeting of HMGB2 but from indirect mechanisms such as disruption of cell cycle progression or upstream transcriptional networks. BI-2536 should thus be regarded as a preliminary tool compound for testing the therapeutic potential of targeting the HMGB2–lactylation axis, rather than as a highly selective HMGB2 inhibitor. We acknowledge the complexity introduced by using multi-target compounds in mechanistic studies. To address this, future work will combine CRISPR-mediated knockout of HMGB2 with BI-2536 treatment to determine whether its lactylation-suppressive effects depend on HMGB2. In parallel, we are conducting structure-based screening and molecular docking to identify compounds with stronger binding affinity and higher specificity for HMGB2, aiming to develop small molecules that directly modulate HMGB2-driven lactylation.
Lastly, despite the robust performance of our lactylation-associated prognostic model across multiple cohorts, several limitations should be noted. The model was primarily based on genes from the HMGB2+Epi transcriptional program, which may limit its generalizability to CRC subtypes lacking this population. Given the molecular heterogeneity of CRC, a single-cell-type-derived signature may not fully capture broader risk variability. Although we applied a LOOCV framework to minimize overfitting, further validation in prospective clinical cohorts is necessary to confirm its clinical relevance. Additionally, the model was trained on bulk RNA-seq data, which may mask cell-specific expression patterns. Finally, while the model reflects lactylation-associated transcriptional activity, it does not directly quantify lactylation levels or enzymatic activity. Future work should integrate functional assays to strengthen mechanistic links between the signature and lactylation biology.

Conclusions

This study identifies HMGB2+Epi as a tumor epithelial cell subpopulation highly enriched in lactylation and reveals that the NFYB-HMGB2 axis promotes CRC progression by regulating lactylation and the Warburg effect. Additionally, BI-2536 was identified by screening as a candidate tool compound that downregulates HMGB2, and an HMGB2+Epi-related risk prognostic model was constructed and validated for its clinical predictive value in independent cohorts. This study expands the understanding of lactylation in CRC and suggests new strategies for precise risk stratification and targeted therapy of CRC. Further clinical validation and targeted intervention studies are needed.

Supplementary Information


Source: PubMed Central (JATS). Licensing follows the original publisher's policy; please credit the original article when citing.
