Dynamic remodelling of epithelial plasticity in colorectal cancer from single-cell and spatially resolved perspectives.
1/5 보강
[BACKGROUND] Epithelial compartments play a central role in the progression and metastasis of colorectal cancer (CRC), yet the mechanisms underlying their transcriptional reprogramming across disease
APA
Xu C, Pan Y, et al. (2025). Dynamic remodelling of epithelial plasticity in colorectal cancer from single-cell and spatially resolved perspectives.. Journal of translational medicine, 23(1), 1341. https://doi.org/10.1186/s12967-025-07380-8
MLA
Xu C, et al.. "Dynamic remodelling of epithelial plasticity in colorectal cancer from single-cell and spatially resolved perspectives.." Journal of translational medicine, vol. 23, no. 1, 2025, pp. 1341.
PMID
41286944 ↗
Abstract 한글 요약
[BACKGROUND] Epithelial compartments play a central role in the progression and metastasis of colorectal cancer (CRC), yet the mechanisms underlying their transcriptional reprogramming across disease stages remain incompletely understood.
[METHODS] We constructed comprehensive single-cell and spatial transcriptomic atlases encompassing normal colorectal tissue, primary CRC, and liver metastases (LMs), profiling over 200,000 high-quality cells. Specific coexpression modules and transcriptional regulators were identified and spatially mapped. Additionally, the clinical and functional roles of SCAND1 were investigated using western blot, RT-qPCR, and immunohistochemistry in clinical samples, as well as functional assays in CRC cell lines.
[RESULTS] Distinct coexpression modules were selectively enriched in normal epithelial cells and associated with vesicle transport, ion homeostasis, and barrier function, whereas LM-specific modules were linked to Wnt signalling, ribosome biogenesis, and cell cycle processes. High-CNV epithelial cells exhibited progressive activation of key transcription factors such as CEBPB, E2F, and the AP-1 family, with concurrent suppression of ARGLU1 and MTNR2L8, indicating coordinated remodelling of transcriptional and metabolic programs during metastasis. These changes were spatially validated and correlated with alterations in epithelial composition within the tumour core. Consistently, multiplex immunofluorescence (mIHC) confirmed the expression patterns of CEBPB, ARGLU1, S100A4 and FOSL1, across normal epithelium, CRC, and LM. Notably, SCAND1 was identified as a potential therapeutic target for malignant epithelial plasticity. We demonstrated for the first time that SCAND1 expression is significantly upregulated in CRC patients and is associated with superior diagnostic and prognostic value. Functional experiments revealed that SCAND1 promotes tumour cell proliferation, inhibits apoptosis, and enhances metastatic potential. Furthermore, co-culture of SCAND1-overexpressing tumour cells with T cells showed that SCAND1 overexpression attenuates T cell–mediated cytotoxicity, supporting an additional immune-evasive role.
[CONCLUSIONS] Our study elucidates the molecular logic of epithelial state transitions during CRC metastasis and underscores the promise of SCAND1 as a novel biomarker and therapeutic target in colorectal cancer.
[SUPPLEMENTARY INFORMATION] The online version contains supplementary material available at 10.1186/s12967-025-07380-8.
[METHODS] We constructed comprehensive single-cell and spatial transcriptomic atlases encompassing normal colorectal tissue, primary CRC, and liver metastases (LMs), profiling over 200,000 high-quality cells. Specific coexpression modules and transcriptional regulators were identified and spatially mapped. Additionally, the clinical and functional roles of SCAND1 were investigated using western blot, RT-qPCR, and immunohistochemistry in clinical samples, as well as functional assays in CRC cell lines.
[RESULTS] Distinct coexpression modules were selectively enriched in normal epithelial cells and associated with vesicle transport, ion homeostasis, and barrier function, whereas LM-specific modules were linked to Wnt signalling, ribosome biogenesis, and cell cycle processes. High-CNV epithelial cells exhibited progressive activation of key transcription factors such as CEBPB, E2F, and the AP-1 family, with concurrent suppression of ARGLU1 and MTNR2L8, indicating coordinated remodelling of transcriptional and metabolic programs during metastasis. These changes were spatially validated and correlated with alterations in epithelial composition within the tumour core. Consistently, multiplex immunofluorescence (mIHC) confirmed the expression patterns of CEBPB, ARGLU1, S100A4 and FOSL1, across normal epithelium, CRC, and LM. Notably, SCAND1 was identified as a potential therapeutic target for malignant epithelial plasticity. We demonstrated for the first time that SCAND1 expression is significantly upregulated in CRC patients and is associated with superior diagnostic and prognostic value. Functional experiments revealed that SCAND1 promotes tumour cell proliferation, inhibits apoptosis, and enhances metastatic potential. Furthermore, co-culture of SCAND1-overexpressing tumour cells with T cells showed that SCAND1 overexpression attenuates T cell–mediated cytotoxicity, supporting an additional immune-evasive role.
[CONCLUSIONS] Our study elucidates the molecular logic of epithelial state transitions during CRC metastasis and underscores the promise of SCAND1 as a novel biomarker and therapeutic target in colorectal cancer.
[SUPPLEMENTARY INFORMATION] The online version contains supplementary material available at 10.1186/s12967-025-07380-8.
🏷️ 키워드 / MeSH 📖 같은 키워드 OA만
같은 제1저자의 인용 많은 논문 (5)
- PAIRWISE: Deep Learning-based Prediction of Effective Personalized Drug Combinations in Cancer.
- Potential and challenges of natural product inhaled formulations in the treatment of pulmonary diseases.
- Mechanisms of Antibreast Cancer Activity of Terpenoids from the Fruiting Body of in Cellular and Mice Models.
- Programmed Cell Death Protein 1-Interleukin-2 Bispecific Agents for Cancer Therapy.
- miR-3921 functions as a tumor suppressor and negatively regulates RIPK1 in gastric cancer.
📖 전문 본문 읽기 PMC JATS · ~91 KB · 영문
Introduction
Introduction
Colorectal cancer (CRC) is one of the most common gastrointestinal malignancies worldwide, ranking third in cancer incidence and second in cancer-related mortality [1,, 2]. Approximately 50% of CRC patients develop liver metastasis (LM) during disease progression, which is the main cause of CRC-related death [3, 4]. Despite recent advances in surgery, chemotherapy, radiotherapy, and targeted therapies, the prognosis of metastatic CRC remains poor. This is mainly due to the high heterogeneity of tumour cells, their capacity for immune evasion, and treatment resistance [1, 5, 6]. Large-scale gene expression studies have established molecular classification systems for CRC—most notably, the Consensus Molecular Subtypes (CMS), which stratify CRC into four subtypes (CMS1-4) [7]. However, these classifications are primarily based on bulk sequencing data and cannot precisely resolve the complex cellular heterogeneity of the tumour microenvironment.
In the pathogenesis of CRC, epithelial cells, which are the normal cell type in which oncogenic events arise and initiate tumorigenesis, are central to disease progression. In the normal intestine, epithelial cells maintain highly differentiated structures and functional homeostasis. However, during carcinogenesis, their transcriptional programs are dramatically remodelled, resulting in the progressive acquisition of stemness, proliferation, migration, and other malignant features [8, 9]. Previous studies have shown that CRC epithelial cells can increase their invasiveness and ability to adapt to distant sites by activating epithelial–mesenchymal transition (EMT), adapting to hypoxic and inflammatory stress environments, or reacquiring stem cell-like plasticity [10, 11]. These state transitions not only involve shifts in cell fate but also may drive immune evasion, chemoresistance, and the formation of a prometastatic microenvironment. However, there is still a lack of integrative research frameworks regarding the distribution patterns of these “epithelial plasticity states” across different disease stages and spatial regions, their underlying transcriptional regulatory mechanisms, and their interactions with the tumour immune microenvironment or stromal cells.
SCAND1 is a SCAN domain-only member of the conserved SCAN transcription factor family, most of which are SCAN–zinc finger (SCAN-ZF) proteins [12–14]. The SCAN domain mediates oligomerization; SCAND1 lacks zinc fingers but hetero-oligomerizes with SCAN-ZFs such as MZF1 (ZSCAN6/ZNF42) to form repressive complexes that counteract SCAN-ZF transactivation [15–17]. Functionally, MZF1 activates the co-chaperone CDC37 to promote cancer progression, whereas SCAND1 represses CDC37 by binding MZF1, acting as a potential tumor suppressor; SCAND1–MZF1 oligomers can also reverse EMT, tumor growth, and migration by repressing EMT drivers and MAPK genes [16–18]. Clinically, MZF1 often correlates with poor outcomes, while SCAND1/MZF1 expression can associate with better prognosis in specific contexts, indicating context-dependent oncogenic versus suppressor roles [17]. Despite these advances, SCAND1’s role in colorectal cancer (CRC) remains unclear.
Recent advances in single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics have enabled detailed characterization of cellular states and their spatial organization in complex tissues [19–22]. While these technologies have been used to map the immune microenvironment and CAF subsets in CRC [23, 24], the diversity and remodelling of epithelial states during metastasis remain underexplored. In this study, we integrated single-cell and spatial transcriptomic data from primary CRC, normal colorectal tissue, and liver metastases to systematically characterize epithelial lineage composition, transcriptional state transitions, and spatial distribution. Using hdWGCNA and pySCENIC, we identified epithelial coexpression modules and key regulators, highlighting metastasis-associated programs dominated by Wnt signalling, ribosome biogenesis, and cell cycle activation. CNV inference, trajectory, and spatial analyses further revealed that epithelial cells with high CNV burdens presented increased transcriptional plasticity and heterogeneity and extensive signalling interactions with the immune microenvironment. Our work reveals the dynamic remodelling of epithelial plasticity in CRC and provides a theoretical basis for targeting key epithelial states to inhibit metastasis.
Colorectal cancer (CRC) is one of the most common gastrointestinal malignancies worldwide, ranking third in cancer incidence and second in cancer-related mortality [1,, 2]. Approximately 50% of CRC patients develop liver metastasis (LM) during disease progression, which is the main cause of CRC-related death [3, 4]. Despite recent advances in surgery, chemotherapy, radiotherapy, and targeted therapies, the prognosis of metastatic CRC remains poor. This is mainly due to the high heterogeneity of tumour cells, their capacity for immune evasion, and treatment resistance [1, 5, 6]. Large-scale gene expression studies have established molecular classification systems for CRC—most notably, the Consensus Molecular Subtypes (CMS), which stratify CRC into four subtypes (CMS1-4) [7]. However, these classifications are primarily based on bulk sequencing data and cannot precisely resolve the complex cellular heterogeneity of the tumour microenvironment.
In the pathogenesis of CRC, epithelial cells, which are the normal cell type in which oncogenic events arise and initiate tumorigenesis, are central to disease progression. In the normal intestine, epithelial cells maintain highly differentiated structures and functional homeostasis. However, during carcinogenesis, their transcriptional programs are dramatically remodelled, resulting in the progressive acquisition of stemness, proliferation, migration, and other malignant features [8, 9]. Previous studies have shown that CRC epithelial cells can increase their invasiveness and ability to adapt to distant sites by activating epithelial–mesenchymal transition (EMT), adapting to hypoxic and inflammatory stress environments, or reacquiring stem cell-like plasticity [10, 11]. These state transitions not only involve shifts in cell fate but also may drive immune evasion, chemoresistance, and the formation of a prometastatic microenvironment. However, there is still a lack of integrative research frameworks regarding the distribution patterns of these “epithelial plasticity states” across different disease stages and spatial regions, their underlying transcriptional regulatory mechanisms, and their interactions with the tumour immune microenvironment or stromal cells.
SCAND1 is a SCAN domain-only member of the conserved SCAN transcription factor family, most of which are SCAN–zinc finger (SCAN-ZF) proteins [12–14]. The SCAN domain mediates oligomerization; SCAND1 lacks zinc fingers but hetero-oligomerizes with SCAN-ZFs such as MZF1 (ZSCAN6/ZNF42) to form repressive complexes that counteract SCAN-ZF transactivation [15–17]. Functionally, MZF1 activates the co-chaperone CDC37 to promote cancer progression, whereas SCAND1 represses CDC37 by binding MZF1, acting as a potential tumor suppressor; SCAND1–MZF1 oligomers can also reverse EMT, tumor growth, and migration by repressing EMT drivers and MAPK genes [16–18]. Clinically, MZF1 often correlates with poor outcomes, while SCAND1/MZF1 expression can associate with better prognosis in specific contexts, indicating context-dependent oncogenic versus suppressor roles [17]. Despite these advances, SCAND1’s role in colorectal cancer (CRC) remains unclear.
Recent advances in single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics have enabled detailed characterization of cellular states and their spatial organization in complex tissues [19–22]. While these technologies have been used to map the immune microenvironment and CAF subsets in CRC [23, 24], the diversity and remodelling of epithelial states during metastasis remain underexplored. In this study, we integrated single-cell and spatial transcriptomic data from primary CRC, normal colorectal tissue, and liver metastases to systematically characterize epithelial lineage composition, transcriptional state transitions, and spatial distribution. Using hdWGCNA and pySCENIC, we identified epithelial coexpression modules and key regulators, highlighting metastasis-associated programs dominated by Wnt signalling, ribosome biogenesis, and cell cycle activation. CNV inference, trajectory, and spatial analyses further revealed that epithelial cells with high CNV burdens presented increased transcriptional plasticity and heterogeneity and extensive signalling interactions with the immune microenvironment. Our work reveals the dynamic remodelling of epithelial plasticity in CRC and provides a theoretical basis for targeting key epithelial states to inhibit metastasis.
Methods
Methods
Data collection
Single-cell RNA sequencing (scRNA-seq) data were obtained from the GEO database (accession numbers: GSE132465, GSE178318, and GSE200997). The dataset includes a total of 18 normal colorectal tissue samples, 45 colorectal cancer (CRC) samples, and 6 colorectal cancer liver metastasis (LM) samples. After quality control, a total of 203,379 high-quality sequenced cells were analyzed. Spatial transcriptomics samples were obtained from http://www.cancerdiversity.asia/scCRLM/and GSE284061. Gene expression datasets (GSE87211, GSE71187, GSE41258, GSE44076, GSE103512 and GSE106582) were obtained from the Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/) as of October 2023. Diagnostic curves of a single hub gene were drawn using “pROC” package to evaluate the sensitivity and specificity of a single hub gene in these datasets. Bulk RNA-seq expression profiles and associated clinical data for colorectal cancer patients were acquired from The Cancer Genome Atlas (TCGA; https://portal.gdc.cancer.gov/).
scRNA-seq data processing
We used Seurat v4.1.3 for preprocessing, quality control, normalization, and dimensionality reduction and clustering of single-cell data. Doublets were identified and removed using DoubletFinder v2.0.3. Quality control criteria included a minimum of 400 genes expressed per cell and a mitochondrial gene content threshold of 20%. Normalization, identification of highly variable genes, and clustering were performed using Seurat’s default parameters and workflow. Batch correction and data integration across samples were performed using Harmony v1.2.3. Cell clusters were annotated using marker genes collected from the literature and manual curation. Differentially expressed genes (DEGs) between subpopulations were identified using the FindAllMarkers function based on the Wilcoxon rank-sum test, with selection criteria of adjusted p-value < 0.05 and |log2FC| > 0.5.
Spatial transcriptomics data processing
Spatial transcriptomics data were processed using Seurat v4.3.0. Raw spatial expression matrices were normalized using SCTransform, followed by PCA for dimensionality reduction. The top 30 principal components were used for graph-based neighborhood detection and clustering. UMAP was used for spatial domain visualization and to display spatial feature distributions. For scRNA-seq data with cell type annotation, log-normalization with a scale factor of 10,000 was applied, and 3,000 highly variable genes were identified using the “vst” method. Data were scaled and reduced using PCA. UMAP embeddings were generated based on the top 30 Harmony components. Graph-based clustering was performed using Harmony reduction and the Louvain algorithm (resolution = 0.5).
Cell type identification
We used Seurat’s FindAllMarkers function to perform differential gene expression analysis within each cluster to identify marker genes. Thresholds for defining marker genes were: adjusted p-value < 0.05, expression percentage > 0.25, and |log2 fold change| > 0.25. Cell clusters were annotated using the SingleR package based on marker gene composition and reference datasets, followed by manual validation and correction using the CellMarker database.
Spatial transcriptomics spot annotation by single-cell deconvolution
To deconvolute cell type composition in spatial transcriptomics data, we used the Robust Cell Type Decomposition (RCTD) algorithm implemented in the spacexr R package. Cell type labels from annotated scRNA-seq data were used as references. The reference matrix included raw counts, cell type annotations, and total UMI counts. For each spatial sample, spot-level count matrices, spatial coordinates, and UMI counts were used to create a spatial RNA object. RCTD was run in ‘full’ mode to flexibly assign cell types to each spatial spot. The resulting cell type weight matrix was integrated into Seurat’s spatial metadata. RCTD was also rerun in ‘doublet’ mode to identify and resolve spots containing two dominant cell types. The results were integrated into the spatial object for downstream analysis and visualization.
Cell–cell communication
We used the CellChat R package (v1.1.3) to infer cell–cell communication based on ligand–receptor interactions [25]. Communication probability and number of interactions were calculated to build a communication network. Interaction strength and frequency between any two cell groups were visualized. Scatter plots were used to show major sender (signal source) and receiver (target) cell types in 2D space, helping identify the key contributors to outgoing and incoming signals within immune cell populations. A global communication model was applied to identify how multiple immune cell types and signaling pathways coordinate.
Pseudotime trajectory analysis
Pseudotime trajectories were constructed using Monocle 2 for single-cell trajectory analysis [26]. The algorithm reduces high-dimensional expression data into a low-dimensional space and arranges cells along trajectories with branching points. Dynamic expression heatmaps were generated using the plot_pseudotime_heatmap function.
8. Gene set enrichment analysis (GSEA)
GSEA was performed using the GSEApy package. DEGs were ranked by log2 fold change in descending order to create a pre-ranked gene list. Hallmark gene sets were obtained from the MSigDB database (v2024.1, category: h.all) via the gseapy.Msigdb() interface. GSEA was conducted using gseapy.prerank with 1,000 permutations. Gene sets with sizes between 1 and 10,000 were included. The output included normalized enrichment scores (NES), nominal p-values, and FDR q-values. A fixed random seed ensured reproducibility.
Inference of copy number variation (CNV)
Single-cell CNV inference was performed using the inferCNV R package (v1.20.0). Gene genomic positions were curated and deduplicated. Raw count matrices and cell type annotations were extracted from the Seurat object, and cell type metadata was saved as a group annotation file. Immune cells (T cells, B cells, and myeloid cells) were used as the reference group. The analysis used a cutoff of 0.1 for UMI-based data (10x Genomics). Hierarchical clustering was performed using the “ward.D2” method. Denoising was enabled, and the HMM option was disabled to improve computational efficiency.
pySCENIC analysis
Gene regulatory network (GRN) analysis was conducted using the pySCENIC pipeline. Raw Loom-format expression data and a curated list of human transcription factors were input to GRNBoost2 to infer co-expression modules. Motif enrichment analysis was performed using a TF motif ranking database and annotation file in the ctx step with dropout masking and parallelization. Regulon activity was quantified per cell using the AUCell algorithm to generate a regulon-by-cell activity matrix.
hdWGCNA analysis
Gene co-expression networks in CRC epithelial cells were analyzed using the hdWGCNA package. A Seurat object containing epithelial cells was used to construct metacells with parameters: k = 25, assay = ‘SCT’, max_shared = 20, min_cells = 300, reduction = ‘umap’, slot = ‘data’. Cells of the same epithelial subtype within each sample were pooled to form metacells. Networks were built using ConstructNetwork with a soft-power threshold of 14. Functional enrichment of gene modules was performed using the EnrichR package.
Spatial transcriptomic analysis with SPATA2
Spatial transcriptomic data were analyzed using the SPATA2 R package. After log-normalization, highly variable genes were identified using the “vst” method. Spatially variable genes were detected using SPARK-X (p < 0.01). PCA was used for dimensionality reduction, followed by k-means clustering to define spatial domains. Spatial trajectories were either manually drawn or defined by spatial coordinates. Genes with significant expression gradients along these trajectories were identified using spatialTrajectoryScreening (FDR < 0.05). Both random and non-random gradients were visualized across tissues. Top genes were selected based on model fit for downstream analysis.
Clinical samples
Forty matched samples of colorectal cancer tissues and their corresponding adjacent normal tissues were collected from patients undergoing surgical resection at Yangpu Hospital, affiliated with Tongji University, between November 2018 and November 2019. The study protocol was approved by the Ethics Committee of Yangpu Hospital (Approval No. LL-2023-LW-012), and written informed consent was obtained from all participants.
RT-qPCR
Total RNA was isolated from matched tumor and adjacent normal tissue samples using TRIzol reagent (Invitrogen), followed by reverse transcription into complementary DNA (cDNA) using a commercial kit (Takara, Dalian, China), according to the manufacturer’s instructions. Quantitative real-time PCR (RT-qPCR) was subsequently carried out with gene-specific primers: SCAND1: 5’-CGCAGAGAAGCCAGAGACTT-3’ and 5’-TCAGCACTGCGTCTGCACC-3’; β-actin: 5’-CCTGGCACCCAGCACAAT-3’ and 5’-GGGCCGGACTGTCATAC-3’.
Western blot assay and immunohistochemistry (IHC)
Western blotting and immunohistochemistry (IHC) were independently performed in triplicate. For protein extraction, tissues were lysed in RIPA buffer (Solarbio, China) with protease inhibitors (1:100, Thermo Scientific). For IHC, Formalin-fixed, paraffin-embedded tissue blocks were sectioned at 4 μm thickness. Sections were dewaxed, rehydrated, and subjected to antigen retrieval using a pressure cooker for 30 minutes. Endogenous peroxidase activity was blocked using 3% hydrogen peroxide for 20 minutes. Non-specific binding was minimized with 5% BSA for 40 minutes. Sections were incubated overnight at 4 °C and visualization was achieved using a DAB detection kit and counterstaining with hematoxylin. The primary antibodies are used as follows: SCAND1 (1:1,000; Thermofisher, PA5-49816), β-actin (1:4,000; ProteinTech, 66009–1-Ig), ZEB2(1:1,000; ProteinTech, 14026–1-AP), Twist1(1:1,000; ProteinTech, 25465–1-AP), N-Cadherin(1:1,000; ProteinTech, 22018–1-AP), E-Cadherin(1:1,000; ProteinTech, 20874–1-AP), α-SMA(1:1,000; ProteinTech, 14395–1-AP), Vimentin(1:1,000; ProteinTech, 10366–1-AP), β-tubulin (1:4,000; ProteinTech, 10094–1-AP).
Multiplex immunohistochemistry staining (mIHC)
Paraffin-embedded tissue sections were deparaffinized in xylene and rehydrated using graded ethanol series. Antigen retrieval was carried out in EDTA buffer, followed by quenching of endogenous peroxidase with hydrogen peroxide. Non-specific binding was blocked using 3% bovine serum albumin (BSA). Sections were then sequentially incubated with primary antibodies against S100A4 (1:500, Proteintech, 16105–1-AP), CEBPB (1:500, Proteintech, 83791–6-RR), and ARGLU1 (1:500, Proteintech, 28155–1-AP), PanCK (1:400, ZSGB, ZM0069, 1:200), FOSL1(Proteintech, Ag25788), followed by species-matched secondary antibodies. Tyramide signal amplification was applied post-incubation, with antigen retrieval and antibody stripping repeated between cycles. Nuclei were counterstained with DAPI, and slides were mounted in antifade medium. LM samples, matched primary tumors, and corresponding adjacent normal tissues were obtained from six patients with metastatic colorectal cancer and subjected to mIHC analysis.
Cell culture and transfection
Human colorectal cancer (CRC) cell lines HCT116, LOVO, SW620, RKO along with the normal colonic epithelial cell line NCM460, were obtained from the Shanghai Institute of Biochemistry and Cell Biology. All cells were cultured in Dulbecco’s Modified Eagle Medium (DMEM) supplemented with 10% fetal bovine serum (FBS; Gibco, Carlsbad, CA, USA) and maintained at 37 °C in a humidified atmosphere containing 5% CO₂. Transfection was performed using Lipofectamine 3000 (Invitrogen, Carlsbad, CA, USA) with either a SCAND1-specific siRNA or a negative control (GeneChem, Shanghai, China). Cells were harvested for subsequent experiments 48 hours post-transfection. Each experiment was conducted in triplicate. For overexpression studies, a plasmid encoding SCAND1 was custom-synthesized by GenePharma (Shanghai, China). The sequences of the siRNAs are provided in Table S1.
Proliferation and apoptosis assays
To evaluate DNA synthesis activity, colorectal cancer (CRC) cell lines were incubated with 5-ethynyl-2’-deoxyuridine (EDU) at a final concentration of 50 μM. After a 30-minute exposure, nuclear DNA was counterstained with Hoechst 33342, and EDU-positive cells were visualized microscopically. HCT116 cells overexpressing SCAND1 and SW620 cells with SCAND1 knockdown were dissociated into single-cell suspensions using 0.25% trypsin. Cisplatin(1 μmol/L) was used as apoptosis inducer. Subsequently, the cells were stained with Annexin V-APC and 7-Aminoactinomycin D (7-AAD), and apoptotic populations were quantified by flow cytometry.
Cell viability assay
Cellular viability was evaluated via the Cell Counting Kit-8 (CCK-8) assay. HCT116 cells were plated in 96-well plates at 5 × 103 cells/well and incubated for 48 hours prior to absorbance measurement. Viability was determined using the CCK-8 kit (Beyotime Biotechnology, China), with optical density at 450 nm quantified on a SpectraMax i5x microplate reader (Molecular Devices).
Migration and invasion assays
Migration and invasion assays were performed using 24-well Transwell chambers (Nest, China). Cells were seeded in serum-free DMEM (250 μL) into the upper chamber, and 600 μL of DMEM with 10% FBS was added to the lower chamber. For invasion assays, inserts were pre-coated with Matrigel (2 mg/mL). After 24 hours, non-migrated/invaded cells were removed, and cells on the lower membrane surface were fixed with 4% paraformaldehyde and stained with crystal violet for 10 minutes. Cells were quantified in five non-overlapping fields under a microscope (Nikon, Japan). Confluent cells were scratched using a 10 μL pipette tip for the wound healing assay and cultured in a serum-free medium. Images were taken at 0 and 24 hours using phase-contrast microscopy to assess wound closure.
Cell co-culture assays
To investigate the functional interactions between T cells and CRC cells, indirect co-culture experiments were performed using a Transwell system (Corning, USA) with 0.4 μm pore-size polycarbonate membrane inserts, allowing soluble factor exchange while preventing direct cell-cell contact. The system was set up in 6-well plates. Jurkat cells (Human T cells) were obtained from commercial sources (QS-BIO, #BY-0094).
For each co-culture condition, 5 × 104 HCT116–SCAND1-OE or control group were seeded into the upper chamber (insert) in 1.5 mL of RPMI 1640 medium (Yamei, CB004) supplemented with 10% fetal bovine serum (FBS, Yamei, CY106) and 1% penicillin-streptomycin (Yamei, CB010). Cells were allowed to adhere overnight at 37 °C in a humidified 5% CO₂ incubator. Concurrently, 1 × 105 Jurkat cells (at a 2:1 Jurkat cell-to-cancer cell ratio) were seeded into the lower chamber in 2.5 mL of the same medium. Co-cultures were incubated for 48 h under standard conditions. All experimental conditions were performed in biological triplicate. Cellular viability and apoptosis assays of HCT116 cells was evaluated after 48 h of indirect co-culture.
Statistical analysis
All statistical analyses and visualizations were performed in R v4.1.3. Pearson correlation coefficients were used to assess relationships between continuous variables. For quantitative comparisons, two-tailed unpaired Student’s t-tests or one-way ANOVA with Tukey’s post hoc test were used. A p-value < 0.05 was considered statistically significant.
Data collection
Single-cell RNA sequencing (scRNA-seq) data were obtained from the GEO database (accession numbers: GSE132465, GSE178318, and GSE200997). The dataset includes a total of 18 normal colorectal tissue samples, 45 colorectal cancer (CRC) samples, and 6 colorectal cancer liver metastasis (LM) samples. After quality control, a total of 203,379 high-quality sequenced cells were analyzed. Spatial transcriptomics samples were obtained from http://www.cancerdiversity.asia/scCRLM/and GSE284061. Gene expression datasets (GSE87211, GSE71187, GSE41258, GSE44076, GSE103512 and GSE106582) were obtained from the Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/) as of October 2023. Diagnostic curves of a single hub gene were drawn using “pROC” package to evaluate the sensitivity and specificity of a single hub gene in these datasets. Bulk RNA-seq expression profiles and associated clinical data for colorectal cancer patients were acquired from The Cancer Genome Atlas (TCGA; https://portal.gdc.cancer.gov/).
scRNA-seq data processing
We used Seurat v4.1.3 for preprocessing, quality control, normalization, and dimensionality reduction and clustering of single-cell data. Doublets were identified and removed using DoubletFinder v2.0.3. Quality control criteria included a minimum of 400 genes expressed per cell and a mitochondrial gene content threshold of 20%. Normalization, identification of highly variable genes, and clustering were performed using Seurat’s default parameters and workflow. Batch correction and data integration across samples were performed using Harmony v1.2.3. Cell clusters were annotated using marker genes collected from the literature and manual curation. Differentially expressed genes (DEGs) between subpopulations were identified using the FindAllMarkers function based on the Wilcoxon rank-sum test, with selection criteria of adjusted p-value < 0.05 and |log2FC| > 0.5.
Spatial transcriptomics data processing
Spatial transcriptomics data were processed using Seurat v4.3.0. Raw spatial expression matrices were normalized using SCTransform, followed by PCA for dimensionality reduction. The top 30 principal components were used for graph-based neighborhood detection and clustering. UMAP was used for spatial domain visualization and to display spatial feature distributions. For scRNA-seq data with cell type annotation, log-normalization with a scale factor of 10,000 was applied, and 3,000 highly variable genes were identified using the “vst” method. Data were scaled and reduced using PCA. UMAP embeddings were generated based on the top 30 Harmony components. Graph-based clustering was performed using Harmony reduction and the Louvain algorithm (resolution = 0.5).
Cell type identification
We used Seurat’s FindAllMarkers function to perform differential gene expression analysis within each cluster to identify marker genes. Thresholds for defining marker genes were: adjusted p-value < 0.05, expression percentage > 0.25, and |log2 fold change| > 0.25. Cell clusters were annotated using the SingleR package based on marker gene composition and reference datasets, followed by manual validation and correction using the CellMarker database.
Spatial transcriptomics spot annotation by single-cell deconvolution
To deconvolute cell type composition in spatial transcriptomics data, we used the Robust Cell Type Decomposition (RCTD) algorithm implemented in the spacexr R package. Cell type labels from annotated scRNA-seq data were used as references. The reference matrix included raw counts, cell type annotations, and total UMI counts. For each spatial sample, spot-level count matrices, spatial coordinates, and UMI counts were used to create a spatial RNA object. RCTD was run in ‘full’ mode to flexibly assign cell types to each spatial spot. The resulting cell type weight matrix was integrated into Seurat’s spatial metadata. RCTD was also rerun in ‘doublet’ mode to identify and resolve spots containing two dominant cell types. The results were integrated into the spatial object for downstream analysis and visualization.
Cell–cell communication
We used the CellChat R package (v1.1.3) to infer cell–cell communication based on ligand–receptor interactions [25]. Communication probability and number of interactions were calculated to build a communication network. Interaction strength and frequency between any two cell groups were visualized. Scatter plots were used to show major sender (signal source) and receiver (target) cell types in 2D space, helping identify the key contributors to outgoing and incoming signals within immune cell populations. A global communication model was applied to identify how multiple immune cell types and signaling pathways coordinate.
Pseudotime trajectory analysis
Pseudotime trajectories were constructed using Monocle 2 for single-cell trajectory analysis [26]. The algorithm reduces high-dimensional expression data into a low-dimensional space and arranges cells along trajectories with branching points. Dynamic expression heatmaps were generated using the plot_pseudotime_heatmap function.
8. Gene set enrichment analysis (GSEA)
GSEA was performed using the GSEApy package. DEGs were ranked by log2 fold change in descending order to create a pre-ranked gene list. Hallmark gene sets were obtained from the MSigDB database (v2024.1, category: h.all) via the gseapy.Msigdb() interface. GSEA was conducted using gseapy.prerank with 1,000 permutations. Gene sets with sizes between 1 and 10,000 were included. The output included normalized enrichment scores (NES), nominal p-values, and FDR q-values. A fixed random seed ensured reproducibility.
Inference of copy number variation (CNV)
Single-cell CNV inference was performed using the inferCNV R package (v1.20.0). Gene genomic positions were curated and deduplicated. Raw count matrices and cell type annotations were extracted from the Seurat object, and cell type metadata was saved as a group annotation file. Immune cells (T cells, B cells, and myeloid cells) were used as the reference group. The analysis used a cutoff of 0.1 for UMI-based data (10x Genomics). Hierarchical clustering was performed using the “ward.D2” method. Denoising was enabled, and the HMM option was disabled to improve computational efficiency.
pySCENIC analysis
Gene regulatory network (GRN) analysis was conducted using the pySCENIC pipeline. Raw Loom-format expression data and a curated list of human transcription factors were input to GRNBoost2 to infer co-expression modules. Motif enrichment analysis was performed using a TF motif ranking database and annotation file in the ctx step with dropout masking and parallelization. Regulon activity was quantified per cell using the AUCell algorithm to generate a regulon-by-cell activity matrix.
hdWGCNA analysis
Gene co-expression networks in CRC epithelial cells were analyzed using the hdWGCNA package. A Seurat object containing epithelial cells was used to construct metacells with parameters: k = 25, assay = ‘SCT’, max_shared = 20, min_cells = 300, reduction = ‘umap’, slot = ‘data’. Cells of the same epithelial subtype within each sample were pooled to form metacells. Networks were built using ConstructNetwork with a soft-power threshold of 14. Functional enrichment of gene modules was performed using the EnrichR package.
Spatial transcriptomic analysis with SPATA2
Spatial transcriptomic data were analyzed using the SPATA2 R package. After log-normalization, highly variable genes were identified using the “vst” method. Spatially variable genes were detected using SPARK-X (p < 0.01). PCA was used for dimensionality reduction, followed by k-means clustering to define spatial domains. Spatial trajectories were either manually drawn or defined by spatial coordinates. Genes with significant expression gradients along these trajectories were identified using spatialTrajectoryScreening (FDR < 0.05). Both random and non-random gradients were visualized across tissues. Top genes were selected based on model fit for downstream analysis.
Clinical samples
Forty matched samples of colorectal cancer tissues and their corresponding adjacent normal tissues were collected from patients undergoing surgical resection at Yangpu Hospital, affiliated with Tongji University, between November 2018 and November 2019. The study protocol was approved by the Ethics Committee of Yangpu Hospital (Approval No. LL-2023-LW-012), and written informed consent was obtained from all participants.
RT-qPCR
Total RNA was isolated from matched tumor and adjacent normal tissue samples using TRIzol reagent (Invitrogen), followed by reverse transcription into complementary DNA (cDNA) using a commercial kit (Takara, Dalian, China), according to the manufacturer’s instructions. Quantitative real-time PCR (RT-qPCR) was subsequently carried out with gene-specific primers: SCAND1: 5’-CGCAGAGAAGCCAGAGACTT-3’ and 5’-TCAGCACTGCGTCTGCACC-3’; β-actin: 5’-CCTGGCACCCAGCACAAT-3’ and 5’-GGGCCGGACTGTCATAC-3’.
Western blot assay and immunohistochemistry (IHC)
Western blotting and immunohistochemistry (IHC) were independently performed in triplicate. For protein extraction, tissues were lysed in RIPA buffer (Solarbio, China) with protease inhibitors (1:100, Thermo Scientific). For IHC, Formalin-fixed, paraffin-embedded tissue blocks were sectioned at 4 μm thickness. Sections were dewaxed, rehydrated, and subjected to antigen retrieval using a pressure cooker for 30 minutes. Endogenous peroxidase activity was blocked using 3% hydrogen peroxide for 20 minutes. Non-specific binding was minimized with 5% BSA for 40 minutes. Sections were incubated overnight at 4 °C and visualization was achieved using a DAB detection kit and counterstaining with hematoxylin. The primary antibodies are used as follows: SCAND1 (1:1,000; Thermofisher, PA5-49816), β-actin (1:4,000; ProteinTech, 66009–1-Ig), ZEB2(1:1,000; ProteinTech, 14026–1-AP), Twist1(1:1,000; ProteinTech, 25465–1-AP), N-Cadherin(1:1,000; ProteinTech, 22018–1-AP), E-Cadherin(1:1,000; ProteinTech, 20874–1-AP), α-SMA(1:1,000; ProteinTech, 14395–1-AP), Vimentin(1:1,000; ProteinTech, 10366–1-AP), β-tubulin (1:4,000; ProteinTech, 10094–1-AP).
Multiplex immunohistochemistry staining (mIHC)
Paraffin-embedded tissue sections were deparaffinized in xylene and rehydrated using graded ethanol series. Antigen retrieval was carried out in EDTA buffer, followed by quenching of endogenous peroxidase with hydrogen peroxide. Non-specific binding was blocked using 3% bovine serum albumin (BSA). Sections were then sequentially incubated with primary antibodies against S100A4 (1:500, Proteintech, 16105–1-AP), CEBPB (1:500, Proteintech, 83791–6-RR), and ARGLU1 (1:500, Proteintech, 28155–1-AP), PanCK (1:400, ZSGB, ZM0069, 1:200), FOSL1(Proteintech, Ag25788), followed by species-matched secondary antibodies. Tyramide signal amplification was applied post-incubation, with antigen retrieval and antibody stripping repeated between cycles. Nuclei were counterstained with DAPI, and slides were mounted in antifade medium. LM samples, matched primary tumors, and corresponding adjacent normal tissues were obtained from six patients with metastatic colorectal cancer and subjected to mIHC analysis.
Cell culture and transfection
Human colorectal cancer (CRC) cell lines HCT116, LOVO, SW620, RKO along with the normal colonic epithelial cell line NCM460, were obtained from the Shanghai Institute of Biochemistry and Cell Biology. All cells were cultured in Dulbecco’s Modified Eagle Medium (DMEM) supplemented with 10% fetal bovine serum (FBS; Gibco, Carlsbad, CA, USA) and maintained at 37 °C in a humidified atmosphere containing 5% CO₂. Transfection was performed using Lipofectamine 3000 (Invitrogen, Carlsbad, CA, USA) with either a SCAND1-specific siRNA or a negative control (GeneChem, Shanghai, China). Cells were harvested for subsequent experiments 48 hours post-transfection. Each experiment was conducted in triplicate. For overexpression studies, a plasmid encoding SCAND1 was custom-synthesized by GenePharma (Shanghai, China). The sequences of the siRNAs are provided in Table S1.
Proliferation and apoptosis assays
To evaluate DNA synthesis activity, colorectal cancer (CRC) cell lines were incubated with 5-ethynyl-2’-deoxyuridine (EDU) at a final concentration of 50 μM. After a 30-minute exposure, nuclear DNA was counterstained with Hoechst 33342, and EDU-positive cells were visualized microscopically. HCT116 cells overexpressing SCAND1 and SW620 cells with SCAND1 knockdown were dissociated into single-cell suspensions using 0.25% trypsin. Cisplatin(1 μmol/L) was used as apoptosis inducer. Subsequently, the cells were stained with Annexin V-APC and 7-Aminoactinomycin D (7-AAD), and apoptotic populations were quantified by flow cytometry.
Cell viability assay
Cellular viability was evaluated via the Cell Counting Kit-8 (CCK-8) assay. HCT116 cells were plated in 96-well plates at 5 × 103 cells/well and incubated for 48 hours prior to absorbance measurement. Viability was determined using the CCK-8 kit (Beyotime Biotechnology, China), with optical density at 450 nm quantified on a SpectraMax i5x microplate reader (Molecular Devices).
Migration and invasion assays
Migration and invasion assays were performed using 24-well Transwell chambers (Nest, China). Cells were seeded in serum-free DMEM (250 μL) into the upper chamber, and 600 μL of DMEM with 10% FBS was added to the lower chamber. For invasion assays, inserts were pre-coated with Matrigel (2 mg/mL). After 24 hours, non-migrated/invaded cells were removed, and cells on the lower membrane surface were fixed with 4% paraformaldehyde and stained with crystal violet for 10 minutes. Cells were quantified in five non-overlapping fields under a microscope (Nikon, Japan). Confluent cells were scratched using a 10 μL pipette tip for the wound healing assay and cultured in a serum-free medium. Images were taken at 0 and 24 hours using phase-contrast microscopy to assess wound closure.
Cell co-culture assays
To investigate the functional interactions between T cells and CRC cells, indirect co-culture experiments were performed using a Transwell system (Corning, USA) with 0.4 μm pore-size polycarbonate membrane inserts, allowing soluble factor exchange while preventing direct cell-cell contact. The system was set up in 6-well plates. Jurkat cells (Human T cells) were obtained from commercial sources (QS-BIO, #BY-0094).
For each co-culture condition, 5 × 104 HCT116–SCAND1-OE or control group were seeded into the upper chamber (insert) in 1.5 mL of RPMI 1640 medium (Yamei, CB004) supplemented with 10% fetal bovine serum (FBS, Yamei, CY106) and 1% penicillin-streptomycin (Yamei, CB010). Cells were allowed to adhere overnight at 37 °C in a humidified 5% CO₂ incubator. Concurrently, 1 × 105 Jurkat cells (at a 2:1 Jurkat cell-to-cancer cell ratio) were seeded into the lower chamber in 2.5 mL of the same medium. Co-cultures were incubated for 48 h under standard conditions. All experimental conditions were performed in biological triplicate. Cellular viability and apoptosis assays of HCT116 cells was evaluated after 48 h of indirect co-culture.
Statistical analysis
All statistical analyses and visualizations were performed in R v4.1.3. Pearson correlation coefficients were used to assess relationships between continuous variables. For quantitative comparisons, two-tailed unpaired Student’s t-tests or one-way ANOVA with Tukey’s post hoc test were used. A p-value < 0.05 was considered statistically significant.
Results
Results
Cellular heterogeneity and transcriptomic perturbation in colorectal cancer and liver metastasis
To construct a comprehensive single-cell transcriptomic atlas of normal and malignant colorectal tissues, we systematically collected and analysed single-cell RNA sequencing (scRNA-seq) data from 18 normal colorectal tissue samples, 45 colorectal cancer (CRC) samples, and 6 liver metastasis (LM) samples [27–29]. After stringent quality control and preprocessing, a total of 203,379 high-quality cells were retained for downstream analysis (Fig. 1A, Figure S1A-B). To comprehensively delineate the cellular composition of normal colorectal tissue, we first performed batch effect correction across datasets, followed by graph-based unsupervised clustering, ultimately identifying 41 distinct cell clusters (Fig. 1A, Figure S1C-D). On the basis of Spearman correlation analysis and known cell type-specific marker genes [30–32], these clusters were further annotated into eight major cell types: T cells, B cells, epithelial cells (Epi), myeloid cells (Mye), plasma cells, NK cells, fibroblasts, and endothelial cells (Fig. 1B–C, Figure S1E-F). T cells were the most abundant population in our dataset (104,664 cells), followed by epithelial cells. Each cell type displayed typical and specific expression profiles, confirming the accuracy of the annotations (Fig. 1D–E, Table S4). We next quantified the proportions of each cell type in normal, CRC, and LM samples and used the MiloR tool to analyse changes in cell abundance between groups. Compared with normal tissue, CRC samples presented a significant increase in myeloid and epithelial cell abundance. Compared with normal tissues, B cells, fibroblasts, and plasma cells were decreased in the LM group, whereas myeloid and T cells were increased. These changes suggest substantial remodelling of the immune microenvironment and tissue architecture during tumour progression and metastasis (Fig. 1F–H).
We further compared the number of differentially expressed genes (DEGs) numbers for each cell type across groups to assess the extent of transcriptomic perturbation (Fig. 1I). Compared with normal tissues, both CRC and LM epithelial cells presented the greatest number of DEGs. Notably, T cells presented a much greater DEG count in LM vs. normal than in CRC vs. normal, suggesting more pronounced transcriptomic reprogramming and potential functional changes in the metastatic liver microenvironment (Table S2-S3).
To explore the functional significance of these DEGs, we performed pathway enrichment analysis using the Reactome pathway database (Fig. 1J–K, Figure S2A-B). In epithelial cells, DEGs between CRC and normal tissues were enriched mainly in translation-related pathways such as translation and eukaryotic translation initiation/elongation, reflecting the high dependency of cancer cells on protein synthesis and growth. L13a-mediated translational silencing also indicated posttranscriptional regulatory abnormalities in CRC. For T cells, DEGs in LM vs. normal samples were significantly enriched in pathways related to cellular responses to stimuli and stress, implying that T cells in liver metastases are under greater metabolic stress and functional impairment. The enrichment of stress response pathways, such as HSF1 activation, further suggests that T-cell exhaustion and immune escape occur [33, 34]. Additionally, we applied the Augur algorithm, a random forest-based tool [35], to evaluate the degree of transcriptomic perturbation in each cell type. Compared with normal tissue, epithelial cells presented the most prominent perturbation in both CRC and LM samples, whereas fibroblasts presented the greatest changes between LM and CRC samples, which were associated mainly with the coagulation and glycolysis pathways (Figure S2C-E). Given that epithelial cells are the principal origin of malignancy in colorectal cancer (CRC), these observations motivated a focused analysis of the epithelial compartment in subsequent sections.
Epithelial dynamics and CNV-linked inflammatory rewiring in CRC metastasis
We performed an in-depth analysis of the epithelial lineage. A total of 26,509 epithelial cells were analysed, and unsupervised clustering identified eight distinct subpopulations. On the basis of the literature [32] and canonical marker gene expression, these subtypes were annotated as follows: inflammatory enterocyte (Inflam.Enterocyte; CXCL1, MMP7, CEACAM6, KRT17, AREG), transit-amplifying cell (TA; KLF5, ELF3, MUC4), cycling transit-amplifying cell (Cycling TA; UBE2C, CCNB1, BIRC5, STMN1), enterocyte (KRT20, FABP1, GUCA2A, GUCA2B, MUC12), goblet cell (Goblet; MUC2, TFF3, ZG16, SPINK4), tuft cell (Tuft; TRPM5, HPGDS), secretory cell (Secretory; REG1A, REG3A, LCN2, CLDN15, PDZK1IP1), and progenitor enterocyte (OLFM4). We further quantified the proportions of each subtype across disease states. In both CRC and liver metastasis (LM) samples, the fractions of inflammatory enterocytes, transit-amplifying cells, and their cycling subsets were markedly increased compared with those in normal tissue, whereas mature enterocytes, goblet cells, and tuft cells were significantly reduced (Fig. 2A–B, Figure S3, S4A-E). These changes suggest that different epithelial subtypes have distinct biological functions and malignant potential during disease progression. To explore functional features, we applied GSVA to the MsigDB hallmark gene sets and detected enrichment of epithelial-mesenchymal transition (EMT), the inflammatory response, and KRAS signalling in inflammatory enterocytes, whereas proliferative TA cells were enriched for MYC and mTORC signalling (Fig. 2E).
To further assess the malignant potential of the epithelial subtypes, we inferred copy number variation (CNV) levels using inferCNV [36]. The results revealed significant heterogeneity in CNV levels among epithelial subtypes. CNV scores were divided into five intervals (median 0, ±1, ±2) and further categorized as normal, low, medium, or high, following previous approaches [37], and visualized using UMAP (Fig. 2C–D, Figure S4F-G). High-CNV epithelial cells exhibited significant activation of interferon responses (INTERFERON_ALPHA_RESPONSE, INTERFERON_GAMMA_RESPONSE), IL6_JAK_STAT3 signalling, oxidative phosphorylation, and reactive oxygen species pathways, indicating enhanced inflammatory signalling, metabolic reprogramming, and oxidative stress in these cells (Fig. 2E). Analysis of group-specific CNV distributions revealed that the proportion of high-CNV epithelial cells was significantly greater in LM than in CRC (Fig. 2F–G). Among the subtypes, TA, cycling TA, and progenitor enterocyte had the largest fractions of high-CNV cells (Fig. 2H).
This high-resolution structure enabled us to compare the functional states of cells with similar CNV levels across different groups, providing insights into the mechanisms of tumour progression. To reveal the molecular drivers of CRC progression and liver metastasis, we performed integrated differential gene expression analysis among normal epithelial cells (normal CNV), primary CRC cells (high CNV), and liver metastasis cells (high CNV), identifying 114 intersecting DEGs (Figure S5A). Among these, CEBPB—a core transcription factor regulating inflammation and EMT [38]—was specifically upregulated in CRC and further activated in LM, providing an additional extension to previous findings. In contrast, ARGLU1, a transcriptional coactivator involved in RNA processing and chromatin regulation [39] which had been rarely reported in previous studies of cancer, was specifically downregulated in LM, reflecting further suppression of nuclear gene regulatory networks in metastatic cells. The classical EMT marker S100A4 was significantly upregulated in both CRC and LM, underscoring its central role in promoting migration and invasion. Meanwhile, the mIHC confirmed the expression pattern of those genes in the clinical samples (Figure S6). Notably, the mitochondria-derived neuroprotective gene MTRNR2L8 was markedly suppressed in LM, which may reduce the antiapoptotic capacity of metastatic cells and affect their survival and adaptation in heterogeneous microenvironments (Fig. 2I). Additionally, we conducted GSEA for group-specific DEGs. In CRC (high CNV) vs. normal (normal CNV) samples, EMT was significantly increased, whereas oxidative phosphorylation was decreased (Figure S5B-D). In LM vs. normal, the apoptosis and EMT pathways were activated, whereas oxidative phosphorylation and fatty acid metabolism were suppressed (Figure S5E-F). In LM vs. CRC, allograft rejection was activated, and oxidative phosphorylation was downregulated (Figure S5G-I).
Distinct epithelial coexpression modules drive metastatic adaptation
To dissect the molecular changes in epithelial cells during CRC progression and metastasis, we performed high-dimensional weighted gene coexpression network analysis (hdWGCNA) [40] on all epithelial cells, identifying 12 coexpression modules and constructing cell type-specific gene interaction networks for each (Fig. 3A–B, Figure S7A-B). Modules M1 and M4 were specifically activated in normal epithelial cells. M1 was enriched for cytoskeletal vesicle transport, immune regulation, and apoptosis, with key hub genes such as KLF5 and CSKMT. M4 was associated with copper and zinc ion homeostasis, with the core genes GUCA2B, CA4, and OTOP2, suggesting a role in maintaining epithelial ion balance and barrier function [41, 42]. Notably, M1 and M4 were strongly correlated, indicating potential synergistic regulation of epithelial homeostasis (Fig. 3C–E).
The module scores map onto the UMAP distribution of epithelial subtypes, and multiple node genes within each module point to the physiological or pathological functions of these subtypes (Figure S7C). For M3—Goblet, TFF3 and SPINK4 are classic goblet cell–associated secreted molecules: TFF3 contributes to mucus-layer repair and epithelial restitution [43, 44]; SPINK4 is a serine protease inhibitor–like secreted factor that reflects goblet cell secretory activity and is implicated in intestinal barrier function and inflammation regulation [45]. In addition, REG4 is frequently overexpressed in colorectal cancer and is associated with epithelial regeneration and pro-growth signaling, suggesting a regenerative program and tumor-promoting role of the goblet/secretory lineage. For M4—Enterocyte, GUCA2A/GUCA2B indicate the ion and fluid homeostasis axis of mature absorptive epithelium [46]; OTOP2 is involved in ion transport, reflecting distal colonic absorption/secretion features and potentially linking to changes in the metabolic microenvironment in tumors [47]. For M6—Cycling TA, CCNB1 and UBE2C are key drivers of the G2/M phase and mitotic progression, indicating robust proliferative activity; BIRC5 (Survivin) is an anti-apoptotic and mitotic regulator associated with stemness, invasiveness, and poor prognosis in CRC, together delineating a highly proliferative, apoptosis-resistant pathological profile of transit-amplifying cells [48, 49].
In contrast, modules M9, M10, and M11 were specifically activated in liver metastasis (LM) epithelial cells. Functional annotation revealed that the M9 module regulated Wnt signalling and EMT, likely promoting tumour cell migration and invasion. M10 was enriched for protein synthesis and ribosome function, centred on ribosomal subunits such as RPS18 and RPL7, reflecting enhanced translational capacity in metastatic cells. M11 was linked to the cell cycle and RNA polymerase II-driven transcription, with hub genes including FOS, JUN, IER2, and JUNB, indicating increased proliferative and transcriptional reprogramming capacity in LM cells. Together, these results suggest that LM epithelial cells transition from homeostatic to malignant adaptation via activation of Wnt/EMT, protein synthesis, and cell cycle coexpression networks, conferring greater invasiveness and adaptability (Fig. 3E, Figure S7C).
Importantly, most core genes in the M9, M10, and M11 modules were closely associated with CNV levels and were upregulated in more malignant cells. For example, JUN, a classical AP-1 transcription factor, drives cell cycle progression and EMT, promoting tumour migration and drug resistance; ASCL2, a downstream Wnt target, is essential for CRC stemness and regeneration, and its upregulation may enhance self-renewal and adaptability in metastatic cells; and NKD1, a negative regulator of Wnt signalling, may reflect complex Wnt pathway regulation in tumour cells [50–52]. The high expression of these genes, which is correlated with increased CNV, highlights the molecular basis for functional diversity and adaptation in tumour progression and metastasis through coexpression network remodelling (Fig. 3F).
Transcription factor network remodelling drives CRC progression and metastatic adaptation
To systematically dissect the key regulatory mechanisms of epithelial subpopulations across CRC disease stages and metastasis, we used pySCENIC to infer transcription factor (TF) regulatory networks from single-cell transcriptomic data, revealing subtype-specific changes in TF activity (Fig. 4A, Figure S8A). Dimensionality reduction of TF activity profiles revealed clear stratification among epithelial subtypes at the regulon level (Fig. 4B). Heatmaps of the top 50 TFs (Fig. 4C) revealed that CEBPB was highly activated in inflammatory and absorptive enterocytes, which is consistent with its known roles in inflammation, tumour microenvironment remodelling, and EMT—key processes in CRC metastasis and immune escape [38, 53]. The E2F family (E2F1, E2F7, and E2F8) is specifically activated in proliferative subpopulations, whereas JUN, JUNB, and JUND (AP-1 family) are enriched in inflammatory enterocytes, corresponding to EMT, migration, and drug resistance phenotypes [50, 54, 55]. MYBL2 and NFYB are activated in cycling TA cells and are associated with the cell cycle and poor prognosis (Fig. 4D) [56–58].
We further compared TF network activation across disease groups and CNV states (Fig. 4E–F, Figure S8B-D), finding distinct TF activation patterns related to both disease stage and genomic instability. Shared modules such as E2F1, CEBPB, and FOSL1 are activated in both primary and metastatic cancer, driving proliferation, inflammation, and transcriptional reprogramming [59–61]. FOSL1, in particular, is known to regulate EMT and invasiveness [62], and its expression was validated by mIHC (Figure S6). Primary CRC also showed activation of angiogenesis- and migration-related TFs such as ERG, ETS1, ERF, and ETV2, whereas metastases displayed specific activation of FOS, CREM, FOXL1, FLI1, and ATF3, the latter promoting tumour survival and drug resistance [63–65]. High-CNV epithelial cells presented broader and more specific TF activation, especially for the E2F family (e.g., E2F2 and E2F7), suggesting that genomic instability drives sustained proliferation. FOXO3 was also highly activated in high-CNV cells. While cells with normal or low CNV maintained relatively stable TF activation, medium and high CNV states were characterized by proliferative and stress-adaptive TFs.
Linear modelling revealed that the expression of TFs, including E2F2, TRPS1, MEIS2, NFE2, and the FOXO family (increasing) and HMGA1, HMGA2, CREB2, and TFAP2A (decreasing) (Fig. 4G), tended to increase with increasing CNV, suggesting that high-CNV tumour cells adapt to adverse microenvironments by driving the cell cycle and stress response programs. In the disease groups, IKZF1 and NR3C1 were specifically activated in liver metastases (LM), indicating roles in distant metastasis and microenvironmental adaptation (Fig. 4H, Figure S8E-H) [66, 67]. Overall, our results provide a comprehensive map of TF regulatory networks in CRC epithelial cells across disease stages, metastatic states, and genomic instability, revealing how dynamic TF activity reprogramming underlies tumour progression and adaptation.
Trajectory differentiation and state-specific molecular features of epithelial subtypes during CRC progression
To systematically characterize the dynamic changes in epithelial subtypes during disease progression, we performed pseudotime analysis of eight epithelial subtypes using Monocle. This approach reconstructed the developmental and state transition trajectories of epithelial cells and identified five major cell states (Fig. 5A–B). Analysis of the distribution of each subtype across states revealed notable differences: for example, epithelial cells from the normal group were almost exclusively found in State 3, whereas those from the liver metastasis (LM) group were predominantly in States 2 and 4. Importantly, high-CNV epithelial cells were significantly enriched in States 3 and 4, suggesting that genomic instability profoundly shapes epithelial cell states (Fig. 5C–E).
To uncover the molecular mechanisms underlying state transitions, we used BEAM analysis to examine gene expression changes at key branch points. At Node 1, Cell fate 1 (towards State 4) was associated with the upregulation of stress- and malignancy-associated genes, including the stress-response TFs ATF3 and ANGPTL4, which are involved in angiogenesis and metastasis, indicating that their activation may promote tumour cell invasiveness and adaptability (Fig. 5F–G). At Node 2, Cell fate 1 (towards State 3) was characterized by sustained upregulation of RACK1, a key signalling scaffold known to promote cancer cell migration and invasion and associated with poor prognosis in CRC, underscoring its role in tumour progression. Additionally, GUCA2A and SPINK4—markers of secretory and barrier-protective epithelial lineages—were also upregulated along this branch, reflecting the involvement of secretory and progenitor enterocyte lineages in microenvironment regulation and epithelial homeostasis (Fig. 5F–H, 5K, Table S5-S6).
To validate the robustness of these pseudotime inferences, we independently reconstructed cell trajectories using the pyVIA algorithm. The results revealed that secretory and progenitor enterocyte subpopulations were located mainly at the trajectory endpoints, whereas enterocytes were concentrated at the origin (Fig. 5I–J).
Immune subset reprogramming underlies tumour progression and metastatic adaptation
Given the marked perturbation of immune responses and the microenvironment observed in tumour samples, we systematically analysed lymphoid and myeloid cell populations. Unsupervised clustering divided lymphoid cells into eight subpopulations, which were annotated on the basis of specific marker gene expression [32] (Fig. 6A–B, Figure S9A-C). Notably, effector memory CD8+ T cells and Tregs were significantly expanded in both CRC and LM samples, indicating the activation of both effector and immunosuppressive mechanisms during tumour progression and metastasis (Fig. 6C–D).
In terms of transcriptomic disturbance, Tregs presented the greatest changes in CRC vs. normal, Tph cells in LM vs. normal, and Tn/Tcm cells in LM vs. CRC (Fig. 6E). GSEA revealed that in CRC vs. normal tissues, Tregs exhibited activation of the IL6_JAK_STAT3_SIGNALING pathway, which enhances Treg immunosuppressive function and promotes tumour immune evasion [68]. In LM vs. normal, Tph cells presented significant enrichment of interferon responses (IFN-γ and IFN-α signalling), suggesting roles in antiviral and inflammatory regulation within the metastatic liver environment [69]. In LM vs. CRC, the complement pathway was strongly activated in Tn/Tcm cells, indicating the involvement of the complement system in remodelling the metastatic niche (Fig. 6F–G). Together, these results highlight that Tregs are both expanded and transcriptionally remodeled in CRC and LM, suggesting a key role in tumour development, whereas Tph cells are consistently altered during liver metastasis, indicating that they are critical for metastatic adaptation (Fig. 6E, Figure S9D-F).
Further analysis revealed that LGALS1 was upregulated in Tregs from both CRC and LM, with further elevation in LM; this gene is known to promote Treg tolerance and suppress effector T-cell function [70, 71]. Additionally, MHC II molecules (e.g., HLA-DRA and HLA-DRB1) were further activated in Tregs from LM, suggesting enhanced antigen presentation capacity and a greater contribution to tumour immune escape [72, 73] (Figure S9G). In Tph cells, the upregulation of MHC-I (B2M) and MHC-II genes (HLA-DRA, HLA-DRB1), as well as DUSP4 and TNFRSF18, indicated increased immune activity (Figure S8H-I).
For myeloid cells, we identified six subpopulations (TAMs, monocytes, cDC2s, macrophages, mature DCs, and cDC1s). Monocytes were significantly expanded in both CRC and LM, whereas cDC1s and cDC2s were reduced (Fig. 6I–J, Figure S9J). Pathway analysis revealed that macrophages and mature DCs displayed specific pathway activation in CRC and LM, participating in microenvironment remodelling via the regulation of inflammation, antigen presentation, and immunosuppression (Fig. 6K).
Inflammatory enterocytes and tams orchestrate key signalling hubs
On the basis of the identified epithelial and immune cell subsets in CRC, we systematically analysed cell-cell communication networks across different disease stages using CellChat. The results revealed that the number of cell–cell interactions was significantly greater in CRC and LM tissues than in normal tissues, with CRC exhibiting the highest levels (Fig. 7A). Directionality analysis revealed that in CRC, inflammatory enterocytes (Inflam.Enterocyte), tumour-associated macrophages (TAMs), and Th17 cells were the main signal receivers, whereas Inflam.Enterocyte and Th17 cells were also major signal senders. In contrast, in LM, effector memory T (Tem) cells were the main signal receivers, and TAMs were the most active signal senders (Fig. 7B), indicating a notable shift in immune regulation within liver metastases, where TAMs may promote immune adaptation by modulating Tem function [74].
Analysis of key signalling sources revealed that in CRC, Inflam.Enterocyte predominantly transmitted MIF, VEGF, CCL, and PDGF signalling axes. MIF and CCL are involved in immune cell recruitment and inflammation, whereas VEGF and PDGF directly participate in angiogenesis and tissue remodelling [75–77]. The active engagement of these pathways suggests that inflammatory epithelial cells may drive both immune evasion and tumour angiogenesis in a coordinated manner (Fig. 7C). In LM, TAMs communicate with surrounding cells mainly via SPP1 (osteopontin), GALECTIN, and BAFF signalling, indicating a role for TAMs in promoting microenvironmental adaptation and immune suppression through immune modulation, cell adhesion, and B-cell survival [78–81]. With respect to signal reception, Inflam.Enterocyte in CRC notably received IGFBP (insulin-like growth factor-binding protein) and PAR (protease-activated receptor) signals, which may increase epithelial cell survival and regeneration [82, 83]; in LM, Tem cells mainly received SPP1 and GALECTIN signals, suggesting that their functional state is profoundly influenced by TAM-derived signals, potentially affecting their antitumour activity or exhaustion status (Fig. 7D).
CRC ligand-receptor pair analysis revealed that Inflam. Enterocyte communicated with other cells via the MDK pathway axis and uniquely activated interactions such as LGALS9–P4HB/CD44, distinguishing them from TA and Th17 cells. LGALS9 is known to induce T-cell apoptosis and immune tolerance, potentially contributing to immune suppression in the TME [84]. On the receiving end, Th17 cells specifically received MIF-related signals. Previous studies have shown that MIF regulates Th17 cell differentiation and function, and our findings support a key role for MIF-driven Th17 responses in the CRC immune microenvironment [85, 86]. We observed significant enhancement of MIF and VEGF signalling in CRC. MIF is involved in inflammatory responses, immune cell recruitment, and tumour immune escape, whereas VEGF is a central driver of tumour angiogenesis [75, 77]. Our data indicate that Inflam.Enterocyte and TA cells are the main sources of MIF signalling, primarily via the MIF-(CD74+CXCR4) and MIF-(CD74+CD44) receptor pairs. For VEGF signaling, Inflam.Enterocytes and monocytes were the main senders, with endothelial cells as the principal receivers, predominantly via the VEGFA–FLT1 (VEGFR1) and VEGFA–KDR (VEGFR2) interactions (Fig. 7E–H).
Beyond CRC, we examined ligand–receptor pairs between intestinal epithelial subtypes and key immune communicators in Normal (B cells, Tn/Tcm) and LM (Tem, TAM, and CTL) samples (Figure S10). In Normal tissues, epithelial subtypes acted as signal senders predominantly via the MIF and LGALS9 pathways, with Cycling TA, Inflam.Enterocyte, and Progenitor enterocyte being the main sources. MIF signals likely operate through the CD74–CXCR4 or CD74–CD44 axes to regulate recruitment and survival of lamina propria immune cells; LGALS9, via P4HB/CD44, lowers T-cell activation thresholds, thereby preventing excessive inflammation and supporting barrier function. In LM, MIF and LGALS9 remained the dominant outgoing signals from epithelial cells but were more frequent and of higher intensity, indicating a shift of epithelial programs toward immune modulation within the hepatic metastatic niche: enhanced MIF signalling promotes inflammatory cascades and immune-cell reprogramming, while the LGALS9 axis drives T-cell suppression/exhaustion, forming a positive feedback loop with TAM enrichment. Epithelial subtypes primarily received PPIA–BSG signals, with PPIA expressed on the epithelial side responding to BSG on immune cells. This pathway likely mediates epithelial–matrix remodelling and increased motility via CD147 (BSG)-dependent activation of MAPK/ERK and MMPs [87, 88]. The same mechanism is present—and more pronounced—in LM, suggesting cooperation with the amplified MIF/LGALS9 axes to drive tissue remodelling and metastasis-related plasticity.
Tumour-associated epithelial gradients and spatial signalling networks
To further validate the spatial distribution and interactions of key epithelial subpopulations identified at the single-cell level, we analysed spatial transcriptomic data from three CRC patients in the GEO database [30, 89]. Using previously constructed single-cell annotations, we applied the RCTD algorithm to deconvolute the spatial transcriptomic data and infer the cell type composition at each spot. The results revealed that TAs (transit-amplifying cells) and Inflam.Enterocyte (inflammatory enterocytes) were the predominant epithelial subtypes within tumour regions, highlighting their close association with tumourigenesis and microenvironmental remodelling (Fig. 8A–G, Figure S11A-B).
We then used the SPATA platform to perform spatial gradient gene expression analysis from tumour-enriched areas to distant normal regions. With increasing distance from the tumour, the expression of the chemokines CXCL2 and CXCL3 and the tight junction protein CLDN1 gradually decreased, whereas the levels of the secreted proteins SFRP2 and SFRP4 increased. CXCL2 and CXCL3 are classic inflammation-related chemokines that recruit neutrophils and other immune cells, and CLDN1 is involved in epithelial barrier alterations; their high expression in tumour zones indicates a highly inflammatory and structurally remodelled microenvironment [90, 91]. SFRP2 and SFRP4, antagonists of Wnt signalling, are upregulated in distant normal regions, suggesting that normal epithelial cells may restrict aberrant proliferation through Wnt inhibition [92, 93].
To nominate genes with both malignant relevance and spatial informativeness, we intersected genes that increase monotonically from Normal to CRC to LM (Fig. 2I) with the top spatial gradient genes identified by SPATA, focusing on those expressed within epithelial compartments. For example, ASCL2 and TM4SF1 were more highly expressed near tumours and decreased with distance. ASCL2 is a key transcription factor for intestinal stem and transit-amplifying cells, indicating enhanced stemness and regenerative capacity near tumours, whereas TM4SF1 is associated with epithelial migration and invasion, supporting an active migratory state within tumour regions [94, 95] (Fig. 8H, Figure S11D-E).
Cell-cell communication analysis of the spatial transcriptomic data revealed that TA and goblet cells were central hubs in the tumour tissue network (Fig. 8I, Figure S11C). Pathway analysis revealed spatial enrichment of TGFβ, WNT, CXCL, and EGF signalling, indicating that these pathways play pivotal roles in regulating intercellular interactions, tumour growth, and immune modulation (Fig. 8J–M, Figure S11F-L). Collectively, these results demonstrate that specific tumour-associated epithelial subtypes orchestrate spatial signalling networks, driving microenvironmental inflammation, cellular remodelling, and tissue homeostasis changes, thus providing new molecular insights into spatial heterogeneity and potential therapeutic targets in CRC.
SCAND1 as a novel biomarker for diagnosis and prognosis in colorectal cancer
Using SPATA, we identified genes exhibiting gradient expression patterns across the tumor parenchymal region (Fig. 8H). We then intersected this gene list with genes previously found to be upregulated in high-CNV epithelial cells from both CRC and LM groups in our scRNA-seq analysis (Fig. 2I). This analysis highlighted SCAND1, which not only displays a distinct spatial gradient but also shows progressively increased expression in high-CNV cells from CRC to LM. Moreover, SCAND1 serves as a hub gene in the LM-specific coexpression module M9(Figure 3E). Based on these findings, we further investigated the functional role of SCAND1. We first confirmed elevated SCAND1 expression in both paired and unpaired tumor tissues within the TCGA database (Fig. 9A). Consistent results were further validated by Western blotting (n = 24) (Fig. 9B), RT-qPCR (n = 40) (Fig. 9C), and immunohistochemistry (n = 6) (Fig. 9D).
Logistic regression analysis revealed that elevated SCAND1 expression was significantly associated with advanced T stage (p = 0.045) (Fig. 9E). TCGA and our validation cohorts confirmed higher SCAND1 levels in patients with lower T stages (Figs. 9F–9G).
Subsequently, receiver operating characteristic (ROC) curve analysis demonstrated that SCAND1 exhibited robust diagnostic performance in the TCGA colorectal cancer dataset (Fig. 9H), as well as in the separate colon and rectal cancer cohorts (Fig. 9I–J). These findings were further corroborated across multiple GSE datasets (Fig. 9K). Survival analysis confirmed that higher SCAND1 expression was associated with poorer prognosis (Fig. 9L–N), which was validated in the patient cohort from Yangpu District Central Hospital (Fig. 9O). To better assess the prognostic value of SCAND1, stratified analyses were performed across various clinical subgroups. The results revealed that patients with high SCAND1 expression had significantly poorer prognosis in the female, T3&T4, M0, age > 65, Stage II&III&IV, as well as N0&N1 subgroups (Figs. 9P–9>U, Figure S12).
SCAND1 enhances malignant behaviors of CRC cells and confers resistance to T cell-mediated cytotoxicity
SCAND1 protein expression was first examined in the normal colonic epithelial cell line NCM460 and four CRC cell lines (HCT116, SW620, RKO, and LOVO). Compared with NCM460, all four cancer cell lines exhibited increased expression, with the lowest level observed in HCT116 and the highest in SW620 (Fig. 10A). In order to explore the functions of SCAND1 in CRC, it was knocked down by siRNA in SW620 and overexpressed in HCT116, and the efficiency was verified by western blotting and RT-qPCR (Fig. 10B-C). EDU results showed that SW620 had a decreased proliferation, while HCT116 had an increased viability (Fig. 10D). Furthermore, we detected the effect of SCAND1 on CRC apoptosis, and our study indicated that overexpression of SCAND1 significantly reduced the apoptosis rate of CRC cells, while knockdown of SCAND1 significantly increased the apoptosis rate of CRC cells (Fig. 10E). The wound healing assay showed a marked decrease in cell migration following SCAND1 knockdown (Fig. 10F) and an increase after its overexpression. Consistent with the results, transwell assays verified that SCAND1 knockdown inhibited SW620 invasion and migration, and its overexpression in HCT116 had the opposite trend (Fig. 10 G). In addition, we examined the expression of key epithelial–mesenchymal transition (EMT) markers, including ZEB2, E-cadherin, N-cadherin, vimentin, α-SMA, and Twist1. We found that SCAND1 overexpression reduced E-cadherin levels while increasing the expression of ZEB2, N-cadherin, vimentin, α-SMA, and Twist1. Conversely, SCAND1 knockdown produced the opposite effects. Furthermore, to investigate the changes in immune cells within the tumor microenvironment, we performed co-culture experiments of HCT116 cells overexpressing SCAND1 with Jurkat T cells, and subsequently assessed the proliferation and apoptosis of HCT116 cells. The results demonstrated that compared with the control group, Jurkat T cells significantly suppressed CRC cell proliferation and promoted apoptosis. However, SCAND1 overexpression markedly attenuated these inhibitory and pro-apoptotic effects of T cells on CRC cells (Fig. 10I-J).
Cellular heterogeneity and transcriptomic perturbation in colorectal cancer and liver metastasis
To construct a comprehensive single-cell transcriptomic atlas of normal and malignant colorectal tissues, we systematically collected and analysed single-cell RNA sequencing (scRNA-seq) data from 18 normal colorectal tissue samples, 45 colorectal cancer (CRC) samples, and 6 liver metastasis (LM) samples [27–29]. After stringent quality control and preprocessing, a total of 203,379 high-quality cells were retained for downstream analysis (Fig. 1A, Figure S1A-B). To comprehensively delineate the cellular composition of normal colorectal tissue, we first performed batch effect correction across datasets, followed by graph-based unsupervised clustering, ultimately identifying 41 distinct cell clusters (Fig. 1A, Figure S1C-D). On the basis of Spearman correlation analysis and known cell type-specific marker genes [30–32], these clusters were further annotated into eight major cell types: T cells, B cells, epithelial cells (Epi), myeloid cells (Mye), plasma cells, NK cells, fibroblasts, and endothelial cells (Fig. 1B–C, Figure S1E-F). T cells were the most abundant population in our dataset (104,664 cells), followed by epithelial cells. Each cell type displayed typical and specific expression profiles, confirming the accuracy of the annotations (Fig. 1D–E, Table S4). We next quantified the proportions of each cell type in normal, CRC, and LM samples and used the MiloR tool to analyse changes in cell abundance between groups. Compared with normal tissue, CRC samples presented a significant increase in myeloid and epithelial cell abundance. Compared with normal tissues, B cells, fibroblasts, and plasma cells were decreased in the LM group, whereas myeloid and T cells were increased. These changes suggest substantial remodelling of the immune microenvironment and tissue architecture during tumour progression and metastasis (Fig. 1F–H).
We further compared the number of differentially expressed genes (DEGs) numbers for each cell type across groups to assess the extent of transcriptomic perturbation (Fig. 1I). Compared with normal tissues, both CRC and LM epithelial cells presented the greatest number of DEGs. Notably, T cells presented a much greater DEG count in LM vs. normal than in CRC vs. normal, suggesting more pronounced transcriptomic reprogramming and potential functional changes in the metastatic liver microenvironment (Table S2-S3).
To explore the functional significance of these DEGs, we performed pathway enrichment analysis using the Reactome pathway database (Fig. 1J–K, Figure S2A-B). In epithelial cells, DEGs between CRC and normal tissues were enriched mainly in translation-related pathways such as translation and eukaryotic translation initiation/elongation, reflecting the high dependency of cancer cells on protein synthesis and growth. L13a-mediated translational silencing also indicated posttranscriptional regulatory abnormalities in CRC. For T cells, DEGs in LM vs. normal samples were significantly enriched in pathways related to cellular responses to stimuli and stress, implying that T cells in liver metastases are under greater metabolic stress and functional impairment. The enrichment of stress response pathways, such as HSF1 activation, further suggests that T-cell exhaustion and immune escape occur [33, 34]. Additionally, we applied the Augur algorithm, a random forest-based tool [35], to evaluate the degree of transcriptomic perturbation in each cell type. Compared with normal tissue, epithelial cells presented the most prominent perturbation in both CRC and LM samples, whereas fibroblasts presented the greatest changes between LM and CRC samples, which were associated mainly with the coagulation and glycolysis pathways (Figure S2C-E). Given that epithelial cells are the principal origin of malignancy in colorectal cancer (CRC), these observations motivated a focused analysis of the epithelial compartment in subsequent sections.
Epithelial dynamics and CNV-linked inflammatory rewiring in CRC metastasis
We performed an in-depth analysis of the epithelial lineage. A total of 26,509 epithelial cells were analysed, and unsupervised clustering identified eight distinct subpopulations. On the basis of the literature [32] and canonical marker gene expression, these subtypes were annotated as follows: inflammatory enterocyte (Inflam.Enterocyte; CXCL1, MMP7, CEACAM6, KRT17, AREG), transit-amplifying cell (TA; KLF5, ELF3, MUC4), cycling transit-amplifying cell (Cycling TA; UBE2C, CCNB1, BIRC5, STMN1), enterocyte (KRT20, FABP1, GUCA2A, GUCA2B, MUC12), goblet cell (Goblet; MUC2, TFF3, ZG16, SPINK4), tuft cell (Tuft; TRPM5, HPGDS), secretory cell (Secretory; REG1A, REG3A, LCN2, CLDN15, PDZK1IP1), and progenitor enterocyte (OLFM4). We further quantified the proportions of each subtype across disease states. In both CRC and liver metastasis (LM) samples, the fractions of inflammatory enterocytes, transit-amplifying cells, and their cycling subsets were markedly increased compared with those in normal tissue, whereas mature enterocytes, goblet cells, and tuft cells were significantly reduced (Fig. 2A–B, Figure S3, S4A-E). These changes suggest that different epithelial subtypes have distinct biological functions and malignant potential during disease progression. To explore functional features, we applied GSVA to the MsigDB hallmark gene sets and detected enrichment of epithelial-mesenchymal transition (EMT), the inflammatory response, and KRAS signalling in inflammatory enterocytes, whereas proliferative TA cells were enriched for MYC and mTORC signalling (Fig. 2E).
To further assess the malignant potential of the epithelial subtypes, we inferred copy number variation (CNV) levels using inferCNV [36]. The results revealed significant heterogeneity in CNV levels among epithelial subtypes. CNV scores were divided into five intervals (median 0, ±1, ±2) and further categorized as normal, low, medium, or high, following previous approaches [37], and visualized using UMAP (Fig. 2C–D, Figure S4F-G). High-CNV epithelial cells exhibited significant activation of interferon responses (INTERFERON_ALPHA_RESPONSE, INTERFERON_GAMMA_RESPONSE), IL6_JAK_STAT3 signalling, oxidative phosphorylation, and reactive oxygen species pathways, indicating enhanced inflammatory signalling, metabolic reprogramming, and oxidative stress in these cells (Fig. 2E). Analysis of group-specific CNV distributions revealed that the proportion of high-CNV epithelial cells was significantly greater in LM than in CRC (Fig. 2F–G). Among the subtypes, TA, cycling TA, and progenitor enterocyte had the largest fractions of high-CNV cells (Fig. 2H).
This high-resolution structure enabled us to compare the functional states of cells with similar CNV levels across different groups, providing insights into the mechanisms of tumour progression. To reveal the molecular drivers of CRC progression and liver metastasis, we performed integrated differential gene expression analysis among normal epithelial cells (normal CNV), primary CRC cells (high CNV), and liver metastasis cells (high CNV), identifying 114 intersecting DEGs (Figure S5A). Among these, CEBPB—a core transcription factor regulating inflammation and EMT [38]—was specifically upregulated in CRC and further activated in LM, providing an additional extension to previous findings. In contrast, ARGLU1, a transcriptional coactivator involved in RNA processing and chromatin regulation [39] which had been rarely reported in previous studies of cancer, was specifically downregulated in LM, reflecting further suppression of nuclear gene regulatory networks in metastatic cells. The classical EMT marker S100A4 was significantly upregulated in both CRC and LM, underscoring its central role in promoting migration and invasion. Meanwhile, the mIHC confirmed the expression pattern of those genes in the clinical samples (Figure S6). Notably, the mitochondria-derived neuroprotective gene MTRNR2L8 was markedly suppressed in LM, which may reduce the antiapoptotic capacity of metastatic cells and affect their survival and adaptation in heterogeneous microenvironments (Fig. 2I). Additionally, we conducted GSEA for group-specific DEGs. In CRC (high CNV) vs. normal (normal CNV) samples, EMT was significantly increased, whereas oxidative phosphorylation was decreased (Figure S5B-D). In LM vs. normal, the apoptosis and EMT pathways were activated, whereas oxidative phosphorylation and fatty acid metabolism were suppressed (Figure S5E-F). In LM vs. CRC, allograft rejection was activated, and oxidative phosphorylation was downregulated (Figure S5G-I).
Distinct epithelial coexpression modules drive metastatic adaptation
To dissect the molecular changes in epithelial cells during CRC progression and metastasis, we performed high-dimensional weighted gene coexpression network analysis (hdWGCNA) [40] on all epithelial cells, identifying 12 coexpression modules and constructing cell type-specific gene interaction networks for each (Fig. 3A–B, Figure S7A-B). Modules M1 and M4 were specifically activated in normal epithelial cells. M1 was enriched for cytoskeletal vesicle transport, immune regulation, and apoptosis, with key hub genes such as KLF5 and CSKMT. M4 was associated with copper and zinc ion homeostasis, with the core genes GUCA2B, CA4, and OTOP2, suggesting a role in maintaining epithelial ion balance and barrier function [41, 42]. Notably, M1 and M4 were strongly correlated, indicating potential synergistic regulation of epithelial homeostasis (Fig. 3C–E).
The module scores map onto the UMAP distribution of epithelial subtypes, and multiple node genes within each module point to the physiological or pathological functions of these subtypes (Figure S7C). For M3—Goblet, TFF3 and SPINK4 are classic goblet cell–associated secreted molecules: TFF3 contributes to mucus-layer repair and epithelial restitution [43, 44]; SPINK4 is a serine protease inhibitor–like secreted factor that reflects goblet cell secretory activity and is implicated in intestinal barrier function and inflammation regulation [45]. In addition, REG4 is frequently overexpressed in colorectal cancer and is associated with epithelial regeneration and pro-growth signaling, suggesting a regenerative program and tumor-promoting role of the goblet/secretory lineage. For M4—Enterocyte, GUCA2A/GUCA2B indicate the ion and fluid homeostasis axis of mature absorptive epithelium [46]; OTOP2 is involved in ion transport, reflecting distal colonic absorption/secretion features and potentially linking to changes in the metabolic microenvironment in tumors [47]. For M6—Cycling TA, CCNB1 and UBE2C are key drivers of the G2/M phase and mitotic progression, indicating robust proliferative activity; BIRC5 (Survivin) is an anti-apoptotic and mitotic regulator associated with stemness, invasiveness, and poor prognosis in CRC, together delineating a highly proliferative, apoptosis-resistant pathological profile of transit-amplifying cells [48, 49].
In contrast, modules M9, M10, and M11 were specifically activated in liver metastasis (LM) epithelial cells. Functional annotation revealed that the M9 module regulated Wnt signalling and EMT, likely promoting tumour cell migration and invasion. M10 was enriched for protein synthesis and ribosome function, centred on ribosomal subunits such as RPS18 and RPL7, reflecting enhanced translational capacity in metastatic cells. M11 was linked to the cell cycle and RNA polymerase II-driven transcription, with hub genes including FOS, JUN, IER2, and JUNB, indicating increased proliferative and transcriptional reprogramming capacity in LM cells. Together, these results suggest that LM epithelial cells transition from homeostatic to malignant adaptation via activation of Wnt/EMT, protein synthesis, and cell cycle coexpression networks, conferring greater invasiveness and adaptability (Fig. 3E, Figure S7C).
Importantly, most core genes in the M9, M10, and M11 modules were closely associated with CNV levels and were upregulated in more malignant cells. For example, JUN, a classical AP-1 transcription factor, drives cell cycle progression and EMT, promoting tumour migration and drug resistance; ASCL2, a downstream Wnt target, is essential for CRC stemness and regeneration, and its upregulation may enhance self-renewal and adaptability in metastatic cells; and NKD1, a negative regulator of Wnt signalling, may reflect complex Wnt pathway regulation in tumour cells [50–52]. The high expression of these genes, which is correlated with increased CNV, highlights the molecular basis for functional diversity and adaptation in tumour progression and metastasis through coexpression network remodelling (Fig. 3F).
Transcription factor network remodelling drives CRC progression and metastatic adaptation
To systematically dissect the key regulatory mechanisms of epithelial subpopulations across CRC disease stages and metastasis, we used pySCENIC to infer transcription factor (TF) regulatory networks from single-cell transcriptomic data, revealing subtype-specific changes in TF activity (Fig. 4A, Figure S8A). Dimensionality reduction of TF activity profiles revealed clear stratification among epithelial subtypes at the regulon level (Fig. 4B). Heatmaps of the top 50 TFs (Fig. 4C) revealed that CEBPB was highly activated in inflammatory and absorptive enterocytes, which is consistent with its known roles in inflammation, tumour microenvironment remodelling, and EMT—key processes in CRC metastasis and immune escape [38, 53]. The E2F family (E2F1, E2F7, and E2F8) is specifically activated in proliferative subpopulations, whereas JUN, JUNB, and JUND (AP-1 family) are enriched in inflammatory enterocytes, corresponding to EMT, migration, and drug resistance phenotypes [50, 54, 55]. MYBL2 and NFYB are activated in cycling TA cells and are associated with the cell cycle and poor prognosis (Fig. 4D) [56–58].
We further compared TF network activation across disease groups and CNV states (Fig. 4E–F, Figure S8B-D), finding distinct TF activation patterns related to both disease stage and genomic instability. Shared modules such as E2F1, CEBPB, and FOSL1 are activated in both primary and metastatic cancer, driving proliferation, inflammation, and transcriptional reprogramming [59–61]. FOSL1, in particular, is known to regulate EMT and invasiveness [62], and its expression was validated by mIHC (Figure S6). Primary CRC also showed activation of angiogenesis- and migration-related TFs such as ERG, ETS1, ERF, and ETV2, whereas metastases displayed specific activation of FOS, CREM, FOXL1, FLI1, and ATF3, the latter promoting tumour survival and drug resistance [63–65]. High-CNV epithelial cells presented broader and more specific TF activation, especially for the E2F family (e.g., E2F2 and E2F7), suggesting that genomic instability drives sustained proliferation. FOXO3 was also highly activated in high-CNV cells. While cells with normal or low CNV maintained relatively stable TF activation, medium and high CNV states were characterized by proliferative and stress-adaptive TFs.
Linear modelling revealed that the expression of TFs, including E2F2, TRPS1, MEIS2, NFE2, and the FOXO family (increasing) and HMGA1, HMGA2, CREB2, and TFAP2A (decreasing) (Fig. 4G), tended to increase with increasing CNV, suggesting that high-CNV tumour cells adapt to adverse microenvironments by driving the cell cycle and stress response programs. In the disease groups, IKZF1 and NR3C1 were specifically activated in liver metastases (LM), indicating roles in distant metastasis and microenvironmental adaptation (Fig. 4H, Figure S8E-H) [66, 67]. Overall, our results provide a comprehensive map of TF regulatory networks in CRC epithelial cells across disease stages, metastatic states, and genomic instability, revealing how dynamic TF activity reprogramming underlies tumour progression and adaptation.
Trajectory differentiation and state-specific molecular features of epithelial subtypes during CRC progression
To systematically characterize the dynamic changes in epithelial subtypes during disease progression, we performed pseudotime analysis of eight epithelial subtypes using Monocle. This approach reconstructed the developmental and state transition trajectories of epithelial cells and identified five major cell states (Fig. 5A–B). Analysis of the distribution of each subtype across states revealed notable differences: for example, epithelial cells from the normal group were almost exclusively found in State 3, whereas those from the liver metastasis (LM) group were predominantly in States 2 and 4. Importantly, high-CNV epithelial cells were significantly enriched in States 3 and 4, suggesting that genomic instability profoundly shapes epithelial cell states (Fig. 5C–E).
To uncover the molecular mechanisms underlying state transitions, we used BEAM analysis to examine gene expression changes at key branch points. At Node 1, Cell fate 1 (towards State 4) was associated with the upregulation of stress- and malignancy-associated genes, including the stress-response TFs ATF3 and ANGPTL4, which are involved in angiogenesis and metastasis, indicating that their activation may promote tumour cell invasiveness and adaptability (Fig. 5F–G). At Node 2, Cell fate 1 (towards State 3) was characterized by sustained upregulation of RACK1, a key signalling scaffold known to promote cancer cell migration and invasion and associated with poor prognosis in CRC, underscoring its role in tumour progression. Additionally, GUCA2A and SPINK4—markers of secretory and barrier-protective epithelial lineages—were also upregulated along this branch, reflecting the involvement of secretory and progenitor enterocyte lineages in microenvironment regulation and epithelial homeostasis (Fig. 5F–H, 5K, Table S5-S6).
To validate the robustness of these pseudotime inferences, we independently reconstructed cell trajectories using the pyVIA algorithm. The results revealed that secretory and progenitor enterocyte subpopulations were located mainly at the trajectory endpoints, whereas enterocytes were concentrated at the origin (Fig. 5I–J).
Immune subset reprogramming underlies tumour progression and metastatic adaptation
Given the marked perturbation of immune responses and the microenvironment observed in tumour samples, we systematically analysed lymphoid and myeloid cell populations. Unsupervised clustering divided lymphoid cells into eight subpopulations, which were annotated on the basis of specific marker gene expression [32] (Fig. 6A–B, Figure S9A-C). Notably, effector memory CD8+ T cells and Tregs were significantly expanded in both CRC and LM samples, indicating the activation of both effector and immunosuppressive mechanisms during tumour progression and metastasis (Fig. 6C–D).
In terms of transcriptomic disturbance, Tregs presented the greatest changes in CRC vs. normal, Tph cells in LM vs. normal, and Tn/Tcm cells in LM vs. CRC (Fig. 6E). GSEA revealed that in CRC vs. normal tissues, Tregs exhibited activation of the IL6_JAK_STAT3_SIGNALING pathway, which enhances Treg immunosuppressive function and promotes tumour immune evasion [68]. In LM vs. normal, Tph cells presented significant enrichment of interferon responses (IFN-γ and IFN-α signalling), suggesting roles in antiviral and inflammatory regulation within the metastatic liver environment [69]. In LM vs. CRC, the complement pathway was strongly activated in Tn/Tcm cells, indicating the involvement of the complement system in remodelling the metastatic niche (Fig. 6F–G). Together, these results highlight that Tregs are both expanded and transcriptionally remodeled in CRC and LM, suggesting a key role in tumour development, whereas Tph cells are consistently altered during liver metastasis, indicating that they are critical for metastatic adaptation (Fig. 6E, Figure S9D-F).
Further analysis revealed that LGALS1 was upregulated in Tregs from both CRC and LM, with further elevation in LM; this gene is known to promote Treg tolerance and suppress effector T-cell function [70, 71]. Additionally, MHC II molecules (e.g., HLA-DRA and HLA-DRB1) were further activated in Tregs from LM, suggesting enhanced antigen presentation capacity and a greater contribution to tumour immune escape [72, 73] (Figure S9G). In Tph cells, the upregulation of MHC-I (B2M) and MHC-II genes (HLA-DRA, HLA-DRB1), as well as DUSP4 and TNFRSF18, indicated increased immune activity (Figure S8H-I).
For myeloid cells, we identified six subpopulations (TAMs, monocytes, cDC2s, macrophages, mature DCs, and cDC1s). Monocytes were significantly expanded in both CRC and LM, whereas cDC1s and cDC2s were reduced (Fig. 6I–J, Figure S9J). Pathway analysis revealed that macrophages and mature DCs displayed specific pathway activation in CRC and LM, participating in microenvironment remodelling via the regulation of inflammation, antigen presentation, and immunosuppression (Fig. 6K).
Inflammatory enterocytes and tams orchestrate key signalling hubs
On the basis of the identified epithelial and immune cell subsets in CRC, we systematically analysed cell-cell communication networks across different disease stages using CellChat. The results revealed that the number of cell–cell interactions was significantly greater in CRC and LM tissues than in normal tissues, with CRC exhibiting the highest levels (Fig. 7A). Directionality analysis revealed that in CRC, inflammatory enterocytes (Inflam.Enterocyte), tumour-associated macrophages (TAMs), and Th17 cells were the main signal receivers, whereas Inflam.Enterocyte and Th17 cells were also major signal senders. In contrast, in LM, effector memory T (Tem) cells were the main signal receivers, and TAMs were the most active signal senders (Fig. 7B), indicating a notable shift in immune regulation within liver metastases, where TAMs may promote immune adaptation by modulating Tem function [74].
Analysis of key signalling sources revealed that in CRC, Inflam.Enterocyte predominantly transmitted MIF, VEGF, CCL, and PDGF signalling axes. MIF and CCL are involved in immune cell recruitment and inflammation, whereas VEGF and PDGF directly participate in angiogenesis and tissue remodelling [75–77]. The active engagement of these pathways suggests that inflammatory epithelial cells may drive both immune evasion and tumour angiogenesis in a coordinated manner (Fig. 7C). In LM, TAMs communicate with surrounding cells mainly via SPP1 (osteopontin), GALECTIN, and BAFF signalling, indicating a role for TAMs in promoting microenvironmental adaptation and immune suppression through immune modulation, cell adhesion, and B-cell survival [78–81]. With respect to signal reception, Inflam.Enterocyte in CRC notably received IGFBP (insulin-like growth factor-binding protein) and PAR (protease-activated receptor) signals, which may increase epithelial cell survival and regeneration [82, 83]; in LM, Tem cells mainly received SPP1 and GALECTIN signals, suggesting that their functional state is profoundly influenced by TAM-derived signals, potentially affecting their antitumour activity or exhaustion status (Fig. 7D).
CRC ligand-receptor pair analysis revealed that Inflam. Enterocyte communicated with other cells via the MDK pathway axis and uniquely activated interactions such as LGALS9–P4HB/CD44, distinguishing them from TA and Th17 cells. LGALS9 is known to induce T-cell apoptosis and immune tolerance, potentially contributing to immune suppression in the TME [84]. On the receiving end, Th17 cells specifically received MIF-related signals. Previous studies have shown that MIF regulates Th17 cell differentiation and function, and our findings support a key role for MIF-driven Th17 responses in the CRC immune microenvironment [85, 86]. We observed significant enhancement of MIF and VEGF signalling in CRC. MIF is involved in inflammatory responses, immune cell recruitment, and tumour immune escape, whereas VEGF is a central driver of tumour angiogenesis [75, 77]. Our data indicate that Inflam.Enterocyte and TA cells are the main sources of MIF signalling, primarily via the MIF-(CD74+CXCR4) and MIF-(CD74+CD44) receptor pairs. For VEGF signaling, Inflam.Enterocytes and monocytes were the main senders, with endothelial cells as the principal receivers, predominantly via the VEGFA–FLT1 (VEGFR1) and VEGFA–KDR (VEGFR2) interactions (Fig. 7E–H).
Beyond CRC, we examined ligand–receptor pairs between intestinal epithelial subtypes and key immune communicators in Normal (B cells, Tn/Tcm) and LM (Tem, TAM, and CTL) samples (Figure S10). In Normal tissues, epithelial subtypes acted as signal senders predominantly via the MIF and LGALS9 pathways, with Cycling TA, Inflam.Enterocyte, and Progenitor enterocyte being the main sources. MIF signals likely operate through the CD74–CXCR4 or CD74–CD44 axes to regulate recruitment and survival of lamina propria immune cells; LGALS9, via P4HB/CD44, lowers T-cell activation thresholds, thereby preventing excessive inflammation and supporting barrier function. In LM, MIF and LGALS9 remained the dominant outgoing signals from epithelial cells but were more frequent and of higher intensity, indicating a shift of epithelial programs toward immune modulation within the hepatic metastatic niche: enhanced MIF signalling promotes inflammatory cascades and immune-cell reprogramming, while the LGALS9 axis drives T-cell suppression/exhaustion, forming a positive feedback loop with TAM enrichment. Epithelial subtypes primarily received PPIA–BSG signals, with PPIA expressed on the epithelial side responding to BSG on immune cells. This pathway likely mediates epithelial–matrix remodelling and increased motility via CD147 (BSG)-dependent activation of MAPK/ERK and MMPs [87, 88]. The same mechanism is present—and more pronounced—in LM, suggesting cooperation with the amplified MIF/LGALS9 axes to drive tissue remodelling and metastasis-related plasticity.
Tumour-associated epithelial gradients and spatial signalling networks
To further validate the spatial distribution and interactions of key epithelial subpopulations identified at the single-cell level, we analysed spatial transcriptomic data from three CRC patients in the GEO database [30, 89]. Using previously constructed single-cell annotations, we applied the RCTD algorithm to deconvolute the spatial transcriptomic data and infer the cell type composition at each spot. The results revealed that TAs (transit-amplifying cells) and Inflam.Enterocyte (inflammatory enterocytes) were the predominant epithelial subtypes within tumour regions, highlighting their close association with tumourigenesis and microenvironmental remodelling (Fig. 8A–G, Figure S11A-B).
We then used the SPATA platform to perform spatial gradient gene expression analysis from tumour-enriched areas to distant normal regions. With increasing distance from the tumour, the expression of the chemokines CXCL2 and CXCL3 and the tight junction protein CLDN1 gradually decreased, whereas the levels of the secreted proteins SFRP2 and SFRP4 increased. CXCL2 and CXCL3 are classic inflammation-related chemokines that recruit neutrophils and other immune cells, and CLDN1 is involved in epithelial barrier alterations; their high expression in tumour zones indicates a highly inflammatory and structurally remodelled microenvironment [90, 91]. SFRP2 and SFRP4, antagonists of Wnt signalling, are upregulated in distant normal regions, suggesting that normal epithelial cells may restrict aberrant proliferation through Wnt inhibition [92, 93].
To nominate genes with both malignant relevance and spatial informativeness, we intersected genes that increase monotonically from Normal to CRC to LM (Fig. 2I) with the top spatial gradient genes identified by SPATA, focusing on those expressed within epithelial compartments. For example, ASCL2 and TM4SF1 were more highly expressed near tumours and decreased with distance. ASCL2 is a key transcription factor for intestinal stem and transit-amplifying cells, indicating enhanced stemness and regenerative capacity near tumours, whereas TM4SF1 is associated with epithelial migration and invasion, supporting an active migratory state within tumour regions [94, 95] (Fig. 8H, Figure S11D-E).
Cell-cell communication analysis of the spatial transcriptomic data revealed that TA and goblet cells were central hubs in the tumour tissue network (Fig. 8I, Figure S11C). Pathway analysis revealed spatial enrichment of TGFβ, WNT, CXCL, and EGF signalling, indicating that these pathways play pivotal roles in regulating intercellular interactions, tumour growth, and immune modulation (Fig. 8J–M, Figure S11F-L). Collectively, these results demonstrate that specific tumour-associated epithelial subtypes orchestrate spatial signalling networks, driving microenvironmental inflammation, cellular remodelling, and tissue homeostasis changes, thus providing new molecular insights into spatial heterogeneity and potential therapeutic targets in CRC.
SCAND1 as a novel biomarker for diagnosis and prognosis in colorectal cancer
Using SPATA, we identified genes exhibiting gradient expression patterns across the tumor parenchymal region (Fig. 8H). We then intersected this gene list with genes previously found to be upregulated in high-CNV epithelial cells from both CRC and LM groups in our scRNA-seq analysis (Fig. 2I). This analysis highlighted SCAND1, which not only displays a distinct spatial gradient but also shows progressively increased expression in high-CNV cells from CRC to LM. Moreover, SCAND1 serves as a hub gene in the LM-specific coexpression module M9(Figure 3E). Based on these findings, we further investigated the functional role of SCAND1. We first confirmed elevated SCAND1 expression in both paired and unpaired tumor tissues within the TCGA database (Fig. 9A). Consistent results were further validated by Western blotting (n = 24) (Fig. 9B), RT-qPCR (n = 40) (Fig. 9C), and immunohistochemistry (n = 6) (Fig. 9D).
Logistic regression analysis revealed that elevated SCAND1 expression was significantly associated with advanced T stage (p = 0.045) (Fig. 9E). TCGA and our validation cohorts confirmed higher SCAND1 levels in patients with lower T stages (Figs. 9F–9G).
Subsequently, receiver operating characteristic (ROC) curve analysis demonstrated that SCAND1 exhibited robust diagnostic performance in the TCGA colorectal cancer dataset (Fig. 9H), as well as in the separate colon and rectal cancer cohorts (Fig. 9I–J). These findings were further corroborated across multiple GSE datasets (Fig. 9K). Survival analysis confirmed that higher SCAND1 expression was associated with poorer prognosis (Fig. 9L–N), which was validated in the patient cohort from Yangpu District Central Hospital (Fig. 9O). To better assess the prognostic value of SCAND1, stratified analyses were performed across various clinical subgroups. The results revealed that patients with high SCAND1 expression had significantly poorer prognosis in the female, T3&T4, M0, age > 65, Stage II&III&IV, as well as N0&N1 subgroups (Figs. 9P–9>U, Figure S12).
SCAND1 enhances malignant behaviors of CRC cells and confers resistance to T cell-mediated cytotoxicity
SCAND1 protein expression was first examined in the normal colonic epithelial cell line NCM460 and four CRC cell lines (HCT116, SW620, RKO, and LOVO). Compared with NCM460, all four cancer cell lines exhibited increased expression, with the lowest level observed in HCT116 and the highest in SW620 (Fig. 10A). In order to explore the functions of SCAND1 in CRC, it was knocked down by siRNA in SW620 and overexpressed in HCT116, and the efficiency was verified by western blotting and RT-qPCR (Fig. 10B-C). EDU results showed that SW620 had a decreased proliferation, while HCT116 had an increased viability (Fig. 10D). Furthermore, we detected the effect of SCAND1 on CRC apoptosis, and our study indicated that overexpression of SCAND1 significantly reduced the apoptosis rate of CRC cells, while knockdown of SCAND1 significantly increased the apoptosis rate of CRC cells (Fig. 10E). The wound healing assay showed a marked decrease in cell migration following SCAND1 knockdown (Fig. 10F) and an increase after its overexpression. Consistent with the results, transwell assays verified that SCAND1 knockdown inhibited SW620 invasion and migration, and its overexpression in HCT116 had the opposite trend (Fig. 10 G). In addition, we examined the expression of key epithelial–mesenchymal transition (EMT) markers, including ZEB2, E-cadherin, N-cadherin, vimentin, α-SMA, and Twist1. We found that SCAND1 overexpression reduced E-cadherin levels while increasing the expression of ZEB2, N-cadherin, vimentin, α-SMA, and Twist1. Conversely, SCAND1 knockdown produced the opposite effects. Furthermore, to investigate the changes in immune cells within the tumor microenvironment, we performed co-culture experiments of HCT116 cells overexpressing SCAND1 with Jurkat T cells, and subsequently assessed the proliferation and apoptosis of HCT116 cells. The results demonstrated that compared with the control group, Jurkat T cells significantly suppressed CRC cell proliferation and promoted apoptosis. However, SCAND1 overexpression markedly attenuated these inhibitory and pro-apoptotic effects of T cells on CRC cells (Fig. 10I-J).
Discussion
Discussion
CRC liver metastasis is a complex, multifactorial process. In this study, we leveraged large-scale integration of public datasets to revisit the biological characteristics of CRC, constructing a comprehensive, multidimensional single-cell and spatial transcriptomic atlas of colorectal cancer (CRC) and liver metastases (LM). Our dataset integrates over 200,000 high-quality cells from normal tissue, primary tumours, and metastatic lesions. Our findings reveal profound transcriptional reprogramming and cellular heterogeneity across disease stages, highlighting diverse epithelial and immune remodelling events that regulate tumour progression and metastatic adaptation.
A key observation is the dynamic remodelling of epithelial subpopulations. We found significant expansion of inflammatory enterocytes and proliferative transit-amplifying cells in CRC and LM, with concomitant loss of mature absorptive and secretory epithelial subtypes. These transitions reflect increased lineage plasticity and the acquisition of malignant traits. Notably, high-CNV epithelial cells exhibited increased activation of the EMT, oxidative phosphorylation, and interferon response pathways, suggesting a link between genomic instability and inflammatory remodelling in the metastatic microenvironment [96]. Importantly, CEBPB has emerged as a central transcriptional regulator of the inflammatory epithelial program, with progressive upregulation from CRC to liver metastasis. Multiple studies have confirmed its oncogenic roles in CRC, including promoting EMT via the TRIM2/p53 axis [38], enhancing chemoresistance through the CCL20/FOXO1/CEBPB/NF-κB pathway [53], and driving STAT3-mediated tumour progression by upregulating SERPINA1 [97], underscoring its multifaceted effects on CRC malignancy and metastatic potential. Conversely, the nuclear coactivator and RNA processing factor ARGLU1 was downregulated in metastatic epithelial cells. Although rarely studied in CRC, ARGLU1 loss has been associated with chromatin dysregulation and therapy resistance in head and neck squamous cell carcinoma and breast cancer [98–100], suggesting its involvement in transcriptional instability during metastasis. Similarly, the mitochondrial-derived peptide MTNR2L8—known for its antiapoptotic function—was suppressed in liver metastases, potentially reflecting impaired mitochondrial stress responses and increased epithelial vulnerability in the metastatic microenvironment. Consistently, multiplex immunofluorescence (mIHC) validated the spatial and disease-stage-specific expression patterns of CEBPB, ARGLU1, S100A4, and FOSL1 across normal epithelium, CRC, and LM, providing orthogonal support for the inferred TF and effector dynamics.
Through high-dimensional gene coexpression network analysis, we mapped epithelial modules associated with either homeostatic maintenance or metastatic adaptation. Modules enriched in normal tissue featured ion transport and vesicle trafficking, whereas LM-specific modules promoted Wnt signalling, ribosome biogenesis, and cell cycle progression. These shifts were accompanied by TF network remodelling, with activation of E2F, FOSL1, and AP-1 family members in high-CNV and metastatic cells, supporting a model of transcriptional plasticity underlying immune evasion and invasion. Trajectory and pseudotime analyses further revealed branched differentiation paths of epithelial cells, with distinct molecular features driving specific fates. Stress-responsive TFs (e.g., ATF3) and prometastatic factors (such as ANGPTL4 and RACK1) were enriched along metastatic trajectories, indicating early initiation of invasive programs. Spatial transcriptomics confirmed the localization of high-risk epithelial subpopulations (e.g., inflammatory enterocytes and TA cells) within tumour cores, which participate in intense signalling interactions via the MIF, VEGF, and TGFβ axes, thereby reshaping the immune microenvironment.
Importantly, to validate our bioinformatic findings, we selected SCAND1—a gene identified as a key player in our integrated analysis—for further functional investigation. We confirmed elevated expression of SCAND1 in tumour tissues at both the mRNA and protein levels, and demonstrated its robust diagnostic potential across multiple independent datasets, with SCAND1 expression significantly correlated with poor prognosis. Notably, SCAND1 was differentially expressed across tumour stages and also exhibited differences in survival predictions between different subgroups, suggesting a dynamic role during CRC progression, which needs to be further validated in a larger cohort. Functional assays revealed that SCAND1 promotes colorectal cancer cell proliferation, inhibits apoptosis, and enhances migration and invasion. Mechanistically, SCAND1 upregulation was associated with increased epithelial-mesenchymal transition (EMT), further supporting its pro-metastatic function. Extending these observations to tumour–immune interactions, co-culture of SCAND1-overexpressing tumour cells with T cells demonstrated that SCAND1 overexpression attenuates T cell–mediated cytotoxicity, indicating a potential immune-evasive mechanism linked to SCAND1 activity. These results not only validate the predictions from our multi-omics analysis, but also highlight SCAND1 as a potential biomarker and therapeutic target in CRC metastasis.
A recent study profiling the “normal–inflammatory polyp–adenoma–carcinoma” continuum reported key features of colorectal tumorigenesis, including cellular heterogeneity, the utility of ligand–receptor axes, marked Treg expansion during progression, stage-dependent elevation of heat-shock programs in immune cells, and prominent EGF/EGFR-mediated epithelial–stromal crosstalk [101]. In contrast, our work emphasizes the extended “primary–liver metastasis” dimension, with a larger dataset and complementary spatial transcriptomics. We resolve CNV-driven epithelial state stratification and TF-network rewiring (e.g., CEBPB, E2F, AP‑1) linked to metastatic plasticity, define LM-specific coexpression modules (Wnt, ribosome biogenesis, cell cycle) with tumor-core spatial enrichment, and functionally validate SCAND1 as a pro-tumorigenic factor. The two studies are temporally complementary: the recent study focuses on early-to-invasive tumorigenesis, whereas our study delineates state remodeling across primary tumors and distant metastases in space, together outlining a single-cell and spatial continuum from initiation to dissemination.
In summary, our study provides a spatially resolved single-cell framework for understanding CRC progression and metastasis. We demonstrate that transcriptional, genomic, and signalling alterations collectively reshape epithelial and immune compartments, enabling malignant cells to adapt to and remodel their microenvironment. These insights may inform therapeutic strategies targeting key epithelial states, immune hubs, and intercellular signalling to inhibit CRC metastasis. Nevertheless, our study has several limitations. Our findings require further validation in larger clinical cohorts. Additionally, our work is primarily based on bioinformatic analyses and lacks functional experimental confirmation. In future studies, we plan to investigate the functional characteristics and mechanisms of epithelial subtypes in CRC progression through in vitro and in vivo experiments and assess their potential as therapeutic targets. We will also explore the relationship between these subpopulations and responses to immunotherapy, aiming to provide new avenues for improving immunotherapeutic efficacy.
CRC liver metastasis is a complex, multifactorial process. In this study, we leveraged large-scale integration of public datasets to revisit the biological characteristics of CRC, constructing a comprehensive, multidimensional single-cell and spatial transcriptomic atlas of colorectal cancer (CRC) and liver metastases (LM). Our dataset integrates over 200,000 high-quality cells from normal tissue, primary tumours, and metastatic lesions. Our findings reveal profound transcriptional reprogramming and cellular heterogeneity across disease stages, highlighting diverse epithelial and immune remodelling events that regulate tumour progression and metastatic adaptation.
A key observation is the dynamic remodelling of epithelial subpopulations. We found significant expansion of inflammatory enterocytes and proliferative transit-amplifying cells in CRC and LM, with concomitant loss of mature absorptive and secretory epithelial subtypes. These transitions reflect increased lineage plasticity and the acquisition of malignant traits. Notably, high-CNV epithelial cells exhibited increased activation of the EMT, oxidative phosphorylation, and interferon response pathways, suggesting a link between genomic instability and inflammatory remodelling in the metastatic microenvironment [96]. Importantly, CEBPB has emerged as a central transcriptional regulator of the inflammatory epithelial program, with progressive upregulation from CRC to liver metastasis. Multiple studies have confirmed its oncogenic roles in CRC, including promoting EMT via the TRIM2/p53 axis [38], enhancing chemoresistance through the CCL20/FOXO1/CEBPB/NF-κB pathway [53], and driving STAT3-mediated tumour progression by upregulating SERPINA1 [97], underscoring its multifaceted effects on CRC malignancy and metastatic potential. Conversely, the nuclear coactivator and RNA processing factor ARGLU1 was downregulated in metastatic epithelial cells. Although rarely studied in CRC, ARGLU1 loss has been associated with chromatin dysregulation and therapy resistance in head and neck squamous cell carcinoma and breast cancer [98–100], suggesting its involvement in transcriptional instability during metastasis. Similarly, the mitochondrial-derived peptide MTNR2L8—known for its antiapoptotic function—was suppressed in liver metastases, potentially reflecting impaired mitochondrial stress responses and increased epithelial vulnerability in the metastatic microenvironment. Consistently, multiplex immunofluorescence (mIHC) validated the spatial and disease-stage-specific expression patterns of CEBPB, ARGLU1, S100A4, and FOSL1 across normal epithelium, CRC, and LM, providing orthogonal support for the inferred TF and effector dynamics.
Through high-dimensional gene coexpression network analysis, we mapped epithelial modules associated with either homeostatic maintenance or metastatic adaptation. Modules enriched in normal tissue featured ion transport and vesicle trafficking, whereas LM-specific modules promoted Wnt signalling, ribosome biogenesis, and cell cycle progression. These shifts were accompanied by TF network remodelling, with activation of E2F, FOSL1, and AP-1 family members in high-CNV and metastatic cells, supporting a model of transcriptional plasticity underlying immune evasion and invasion. Trajectory and pseudotime analyses further revealed branched differentiation paths of epithelial cells, with distinct molecular features driving specific fates. Stress-responsive TFs (e.g., ATF3) and prometastatic factors (such as ANGPTL4 and RACK1) were enriched along metastatic trajectories, indicating early initiation of invasive programs. Spatial transcriptomics confirmed the localization of high-risk epithelial subpopulations (e.g., inflammatory enterocytes and TA cells) within tumour cores, which participate in intense signalling interactions via the MIF, VEGF, and TGFβ axes, thereby reshaping the immune microenvironment.
Importantly, to validate our bioinformatic findings, we selected SCAND1—a gene identified as a key player in our integrated analysis—for further functional investigation. We confirmed elevated expression of SCAND1 in tumour tissues at both the mRNA and protein levels, and demonstrated its robust diagnostic potential across multiple independent datasets, with SCAND1 expression significantly correlated with poor prognosis. Notably, SCAND1 was differentially expressed across tumour stages and also exhibited differences in survival predictions between different subgroups, suggesting a dynamic role during CRC progression, which needs to be further validated in a larger cohort. Functional assays revealed that SCAND1 promotes colorectal cancer cell proliferation, inhibits apoptosis, and enhances migration and invasion. Mechanistically, SCAND1 upregulation was associated with increased epithelial-mesenchymal transition (EMT), further supporting its pro-metastatic function. Extending these observations to tumour–immune interactions, co-culture of SCAND1-overexpressing tumour cells with T cells demonstrated that SCAND1 overexpression attenuates T cell–mediated cytotoxicity, indicating a potential immune-evasive mechanism linked to SCAND1 activity. These results not only validate the predictions from our multi-omics analysis, but also highlight SCAND1 as a potential biomarker and therapeutic target in CRC metastasis.
A recent study profiling the “normal–inflammatory polyp–adenoma–carcinoma” continuum reported key features of colorectal tumorigenesis, including cellular heterogeneity, the utility of ligand–receptor axes, marked Treg expansion during progression, stage-dependent elevation of heat-shock programs in immune cells, and prominent EGF/EGFR-mediated epithelial–stromal crosstalk [101]. In contrast, our work emphasizes the extended “primary–liver metastasis” dimension, with a larger dataset and complementary spatial transcriptomics. We resolve CNV-driven epithelial state stratification and TF-network rewiring (e.g., CEBPB, E2F, AP‑1) linked to metastatic plasticity, define LM-specific coexpression modules (Wnt, ribosome biogenesis, cell cycle) with tumor-core spatial enrichment, and functionally validate SCAND1 as a pro-tumorigenic factor. The two studies are temporally complementary: the recent study focuses on early-to-invasive tumorigenesis, whereas our study delineates state remodeling across primary tumors and distant metastases in space, together outlining a single-cell and spatial continuum from initiation to dissemination.
In summary, our study provides a spatially resolved single-cell framework for understanding CRC progression and metastasis. We demonstrate that transcriptional, genomic, and signalling alterations collectively reshape epithelial and immune compartments, enabling malignant cells to adapt to and remodel their microenvironment. These insights may inform therapeutic strategies targeting key epithelial states, immune hubs, and intercellular signalling to inhibit CRC metastasis. Nevertheless, our study has several limitations. Our findings require further validation in larger clinical cohorts. Additionally, our work is primarily based on bioinformatic analyses and lacks functional experimental confirmation. In future studies, we plan to investigate the functional characteristics and mechanisms of epithelial subtypes in CRC progression through in vitro and in vivo experiments and assess their potential as therapeutic targets. We will also explore the relationship between these subpopulations and responses to immunotherapy, aiming to provide new avenues for improving immunotherapeutic efficacy.
Electronic supplementary material
Electronic supplementary material
Below is the link to the electronic supplementary material.
Below is the link to the electronic supplementary material.
출처: PubMed Central (JATS). 라이선스는 원 publisher 정책을 따릅니다 — 인용 시 원문을 표기해 주세요.
🏷️ 같은 키워드 · 무료전문 — 이 논문 MeSH/keyword 기반
- Opposing prognostic roles of tumor-associated and circulating MMP8 in colorectal cancer.
- Copper-enriched zinc peroxides induced cuproptosis through concurrent metabolic and oxidative dysregulation for boosting immunotherapy in colorectal cancer.
- Editorial: Altered metabolic traits in gastro-intestinal tract cancers, volume II.
- Macrophage deficiency discordantly regulated tumor growth and metastasis through increased thrombospondin-1 production.
- Time-Resolved Oxygen Dynamics Reveals Redox-Selective Apoptosis Induced by Cold Atmospheric Plasma in HT-29 Colorectal Cancer Cells.
- System-Wide Implementation of Colorectal Cancer Screening in a Value-Based Care Setting.