Identification of tumor initiating cells and early marker genes in histologically normal colonic mucosa that lead to neoplastic transformation.

Neoplasia (New York, N.Y.), 2026, Vol. 75, p. 101300. Open access.

Jaiswal S, The S, Chang TS, Shi J, Wang TD

Cite this article
APA: Sangeeta Jaiswal, Stephanie The, et al. (2026). Identification of tumor initiating cells and early marker genes in histologically normal colonic mucosa that lead to neoplastic transformation. Neoplasia (New York, N.Y.), 75, 101300. https://doi.org/10.1016/j.neo.2026.101300
MLA: Sangeeta Jaiswal, et al. "Identification of tumor initiating cells and early marker genes in histologically normal colonic mucosa that lead to neoplastic transformation." Neoplasia (New York, N.Y.), vol. 75, 2026, pp. 101300.
PMID: 41903466

Abstract

[BACKGROUND & AIMS] Colorectal cancer (CRC) remains a leading cause of cancer‑related morbidity and mortality worldwide. Although the adenoma-carcinoma sequence and its genetic drivers are well described, the earliest cellular and molecular events initiating tumorigenesis within histologically normal colonic epithelium remain poorly defined. This study aims to identify tumor‑initiating cells (TICs), distinguish them from normal stem‑like cells (nSTMs), and delineate early transcriptional and signaling programs using single‑cell RNA sequencing (scRNA‑seq) from paired normal‑appearing and transformed human colonic tissues.

[METHODS] Fresh biopsies from histologically normal mucosa and matched polyps, including tubular adenomas, sessile serrated adenomas, and adenocarcinomas, were collected from seven subjects. Single‑cell transcriptomes were generated using the 10x Genomics platform and analyzed with Seurat, Monocle2, CytoTRACE, GSEA/GSVA, RNA velocity, InferCNV, CellChat, and NicheNet. Spatial validation was performed using RNA‑FISH.

[RESULTS] We resolved 51,054 high‑quality single‑cell transcriptomes into 33 clusters. Tumor-specific stem-like (tSTM) and deep crypt secretory (tDCS) populations were enriched in adenomas. Subclustering of tSTM identified TIC-like subsets predominantly derived from histologically normal mucosa that localized to the root of lineage trajectories leading to polyp-enriched tSTM states. Compared to nSTMs, TICs exhibited enhanced stemness potential, early epithelial-mesenchymal transition (EMT) and interferon signaling, suppression of oxidative phosphorylation, and distinct genomic and signaling features, indicating early neoplastic reprogramming. ETS2, SLC12A2, and LEFTY1 were identified as TIC‑specific markers; SOD3 and GPRC5A increased along the TIC‑to‑tSTM trajectory. RNA‑FISH confirmed candidate marker localization. Independent validation using the COLONMAP dataset (30 polyps, 35 normal samples) demonstrated that TIC-like cells were predominantly enriched in tubular adenomas but were scarce in serrated lesions. Across this independent cohort, TIC marker genes showed reproducible upregulation in TIC-like populations, supporting the robustness of these observations across cohorts.

[CONCLUSIONS] Our results identify TICs as the origin of neoplastic stem‑like states in the conventional tubular adenoma pathway and define early transcriptional, metabolic, and microenvironmental reprogramming events that distinguish TICs from nSTMs. In contrast to serrated pathways described in other atlases, our data support a stem‑like expansion model for tubular adenomas and nominate biomarkers with translational potential for early CRC detection and intervention.

Introduction

Colorectal cancer (CRC) contributes substantially to the worldwide health care burden. Globally, over 1.9 million cases are diagnosed each year, leading to more than 900,000 deaths annually [1]. In the U.S., about 152,810 people are diagnosed yearly, and annual mortality is about 53,010 [2]. The adenoma-carcinoma sequence is widely accepted as the underlying molecular process that leads to sporadic CRC development [3]. A series of genetic mutations occur in normal colonic mucosa that result in spontaneous formation of adenomas followed by invasive cancer [4]. Inactivation of tumor suppressor genes, such as APC, leads to dysregulated WNT signaling [5], and activation of oncogenes, such as KRAS, stimulates the MAPK pathway [6]. Sequential genetic and epigenetic changes over time then drive proliferative changes that may result in adenocarcinoma [7]. Thus, new approaches to detect CRC at an early stage, when treatment options are more effective, are urgently needed. Such approaches may be enabled by defining precursor cell populations, identifying early marker genes, and clarifying molecular pathways that initiate malignant transformation.
Previously, bulk RNA sequencing methods have been used primarily to investigate CRC molecular genetics [8]. Transcriptome profiling, biomarker discovery, cancer heterogeneity characterization, and investigation of therapeutic resistance mechanisms have been performed using mucosal tissues. However, these methods measure only average gene expression levels across diverse cell populations [9]. Recently, cancer stem cells (CSCs) have been implicated as key contributors to CRC initiation and growth [[10], [11], [12]]. These self-renewing, pluripotent cells have an innate capacity to regenerate as well as initiate tumors. Unlike established CSCs that sustain growth within overt tumors, tumor-initiating cells (TICs) are hypothesized to represent early precursor states that arise within histologically normal epithelium and give rise to neoplastic stem-like populations. Marker genes for CSCs in CRC have been reported, and include CD44, CD133, LGR5, DCLK1, CD166, CD26, and CD24 [[13], [14], [15], [16], [17], [18]]. Single-cell RNA sequencing (scRNA-seq) is an emerging approach that measures gene expression at the level of individual cells and can provide a more detailed analysis of cellular diversity [[19], [20], [21]]. This approach can be used to identify rare cell subpopulations, such as CSCs, subtle transcriptional variations, and temporal gene expression dynamics [22,23]. It therefore provides an opportunity to distinguish marker genes in premalignant versus malignant epithelium that may drive cancer initiation.
Tumor initiation within histologically normal colonic epithelium is a complex, multistep process involving the emergence of precursor cell states, transitional programs, and progressive microenvironmental remodeling [[24], [25], [26]]. While single-cell RNA sequencing (scRNA-seq) has been widely applied to characterize colorectal tumor heterogeneity, build molecular atlases, and define cancer stem cell states, few studies have directly interrogated the earliest transcriptional events that precede histologic transformation. In this study, we apply scRNA-seq to paired human colorectal polyps and adjacent normal-appearing mucosa to address three specific objectives: (1) to identify tumor-initiating cell–like populations within histologically normal epithelium, (2) to define the early transcriptional, metabolic, and signaling programs associated with their progression toward neoplastic stem-like states, and (3) to validate candidate TIC markers and pathway specificity using spatial approaches and independent single-cell datasets.

Methods

Human subject and tissue collection
Fresh human colonic tissues were obtained from 7 adult patients undergoing routine screening colonoscopy at Michigan Medicine under IRB-approved protocol HUM00102771 (IRBMED). Informed written consent was obtained from all participants prior to enrollment. For each subject, paired biopsies were collected from colonic polyps and adjacent normal-appearing mucosa. Specimen IDs and corresponding pathology are provided in Table S1. Each specimen was deidentified prior to processing to ensure patient confidentiality. All authors had full access to the study data and approved the final version of the manuscript.

Tissue processing and histological evaluation
Biopsied colonic tissues were bisected upon collection. One portion was immediately processed for scRNA-seq to preserve transcriptomic integrity while the other was fixed in 10% neutral buffered formalin and paraffin-embedded for histological evaluation. H&E staining was performed on 5-μm tissue sections, and diagnoses were confirmed by an expert GI pathologist (JS).

Single-cell suspension preparation and sequencing
Single-cell suspensions were generated from freshly collected colon tissues using the Neural Tissue Dissociation Kit (Miltenyi Biotec, #130-092-628) according to the manufacturer’s protocol. Following dissociation, single cells were encapsulated into nanodroplets using the Chromium Controller (10X Genomics), and single-cell libraries were constructed using the Chromium Single Cell 5′ Library & Gel Bead Kit (10X Genomics). High-throughput sequencing of the resulting libraries was performed on the NovaSeq 6000 sequencer to enable transcriptome-wide profiling at single-cell resolution.

Data preprocessing and integration
Raw sequencing reads were aligned and quantified using Cell Ranger (10X Genomics) against the GRCh38 human reference genome [27]. Processed data were analyzed using Seurat v3.1.0 within the R environment [28,29]. Standard quality control filtering was applied to exclude cells with fewer than 200 detected genes or with mitochondrial gene content >25% of total UMI counts. Following quality control, data were normalized, log-transformed, and integrated across all patient samples using the Seurat integration pipeline to correct for inter-sample batch effects. Batch correction was evaluated qualitatively through t-SNE visual inspection, confirming that cells from different patient samples were well mixed and did not segregate by batch.
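The quality control step above can be illustrated with a minimal sketch. It assumes the mitochondrial threshold is an upper bound (cells above 25% of total UMI counts are dropped), which is the usual convention. The function name `passes_qc` and the toy cells are hypothetical and are not taken from the authors' Seurat pipeline.

```python
def passes_qc(n_genes, mito_umis, total_umis, min_genes=200, max_mito_frac=0.25):
    """Return True if a cell survives both QC thresholds.

    Assumption: a cell is dropped when it has fewer than `min_genes`
    detected genes or when its mitochondrial UMI fraction exceeds
    `max_mito_frac`.
    """
    if total_umis <= 0:
        return False
    mito_frac = mito_umis / total_umis
    return n_genes >= min_genes and mito_frac <= max_mito_frac


# Hypothetical toy cells: (detected genes, mitochondrial UMIs, total UMIs)
cells = {
    "cell_a": (1500, 100, 4000),  # 2.5% mitochondrial, kept
    "cell_b": (150, 10, 300),     # too few detected genes, dropped
    "cell_c": (900, 600, 2000),   # 30% mitochondrial, dropped
}
kept = [name for name, (g, m, t) in cells.items() if passes_qc(g, m, t)]
print(kept)
```

In Seurat itself this filter is typically expressed as a `subset()` call on per-cell metadata; the sketch only makes the two thresholds explicit.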

Clustering and cell type annotation
Dimensionality reduction was performed using principal component analysis (PCA), with the top 18 principal components selected for downstream clustering based on variance explained. Unsupervised clustering was then applied to group cells into transcriptionally distinct populations. Cluster identities were visualized using both UMAP and t-SNE projections to assess spatial separation. Differentially expressed genes (DEGs) for each cluster were identified using the Seurat function FindAllMarkers. Cluster annotation was guided by the expression of established canonical marker genes and led to the identification of major cell types.

Identification of tumor-specific clusters
Cell clusters were annotated based on composition across tissue types and the expression of canonical marker genes to identify tumor-associated epithelial subpopulations. Clusters enriched in polyp-derived epithelial cells were examined for differential abundance compared to clusters derived from histologically normal mucosa. Marker genes associated with intestinal stem cells, e.g., OLFM4 and LGR5, and secretory lineages, e.g., REG4 and MUC2, were used to classify stem-like and deep crypt secretory (DCS) cell populations. Corresponding clusters with similar marker profiles in both normal and polyp tissue were annotated as normal stem-like (nSTM) and normal DCS (nDCS) populations. Immunofluorescence (IF) staining was performed on FFPE sections using antibodies against OLFM4 and REG4 to validate gene expression at the protein level. Confocal microscopy was used to image fluorescence signal and confirm spatial localization of marker expression in epithelial crypts.

Gene set enrichment analysis (GSEA)
Differentially expressed genes (DEGs) between tumor-specific and normal epithelial subtypes, specifically, tumor stem-like (tSTM) versus normal stem-like (nSTM), and tumor deep crypt secretory (tDCS) versus normal DCS (nDCS) cells, were subjected to gene set enrichment analysis (GSEA) using the clusterProfiler package in R [30]. Gene sets for Hallmark pathways were obtained from the Molecular Signatures Database (MSigDB) [[31], [32], [33], [34]]. DEGs with adjusted p-values <0.05 were ranked and used to compute enrichment scores. Enrichment was assessed using normalized enrichment scores (NES) and false discovery rate (FDR)-adjusted q-values. Visualization of enriched pathways was performed using dot plots to illustrate functional profiles of tumor-associated versus normal epithelial cell populations.
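The adjusted p-values and FDR q-values mentioned above depend on a multiple-testing correction. As a minimal sketch, assuming the Benjamini-Hochberg procedure named in the Statistical analysis section, the adjustment can be written as follows. The function name and the toy p-values are hypothetical; this is not the clusterProfiler or Seurat implementation.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(adj) if q < 0.05]
print(adj, significant)
```

With these toy inputs all four genes remain below the 0.05 cutoff after adjustment, although the two largest raw p-values are pulled up to the same adjusted value.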

CytoTRACE analysis
CytoTRACE (Cellular Trajectory Reconstruction Analysis using gene Counts and Expression) was performed to infer the differentiation potential of single cells using a pre-processed Seurat object [35]. The Seurat object, containing filtered and normalized single-cell RNA-seq data, was converted to a gene expression matrix using the as.matrix() function on the raw counts slot (arkov_object[[“RNA”]]@counts). This matrix was used as input for the CytoTRACE() function from the CytoTRACE R package (v0.3.3). Default parameters were applied unless stated otherwise. The resulting CytoTRACE scores, which estimate cellular plasticity based on transcriptional diversity, were mapped back onto the Seurat object metadata. These scores were then visualized on UMAP embeddings to reveal differentiation gradients across clusters. Higher CytoTRACE scores indicated less differentiated, more stem-like cell states, and were used to support downstream trajectory analyses.

InferCNV analysis
The copy number variation (CNV) score in stem cell populations was calculated from the single-cell transcriptomic profiles using InferCNV (https://github.com/broadinstitute/inferCNV, ver 1.22.0) [36]. Cells from clusters 11 and 12 obtained from normal specimens were selected as references. For the inferCNV analysis, the following parameters were used: “denoise,” default hidden Markov model settings, and a value of 0.1 as the “cutoff” value. Finally, the subclusters with relatively higher CNV scores were considered malignant cells. CNV scores were calculated at the single-cell level as the mean absolute deviation of inferred CNV signal across genes for each cell. Comparisons between TICs and tSTM populations were performed by aggregating per-cell CNV scores within each cluster or condition, without additional per-cluster normalization.
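The per-cell CNV score described above can be sketched as follows. The text does not say what the deviation is measured from, so this example assumes a neutral baseline of 1.0 for the inferred signal; that baseline, the function names, and the toy values are all assumptions made for illustration.

```python
def cnv_score(signal, baseline=1.0):
    """Mean absolute deviation of one cell's inferred CNV signal from baseline.

    Assumption: `baseline` is the copy-neutral value of the signal.
    """
    if not signal:
        raise ValueError("signal must contain at least one gene")
    return sum(abs(x - baseline) for x in signal) / len(signal)


def mean_score_by_group(scores, labels):
    """Aggregate per-cell scores into a plain mean per group label."""
    totals, counts = {}, {}
    for score, label in zip(scores, labels):
        totals[label] = totals.get(label, 0.0) + score
        counts[label] = counts.get(label, 0) + 1
    return {label: totals[label] / counts[label] for label in totals}


# Hypothetical inferred CNV signal for three cells over four genes.
cells = [
    [1.0, 1.0, 1.0, 1.0],  # copy-neutral profile
    [1.5, 0.5, 1.0, 1.0],  # one gain and one loss
    [1.5, 1.5, 1.0, 1.0],  # two gains
]
labels = ["TIC", "tSTM", "tSTM"]
scores = [cnv_score(c) for c in cells]
print(mean_score_by_group(scores, labels))
```

Because the group means are taken directly over per-cell scores, no per-cluster normalization is applied, matching the aggregation described in the text.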

Subclustering and tumor-initiating cell (TIC) identification
Epithelial clusters were re-clustered into transcriptionally distinct subpopulations using Seurat to investigate the transcriptional evolution of tumor-specific stem-like (tSTM) cells. Subclusters were annotated based on tissue origin, and specific subpopulations predominantly derived from histologically normal epithelium were flagged for further analysis as candidate tumor-initiating cells (TICs). Differential gene expression analysis was performed between TIC-enriched subclusters and the main tSTM population to identify early molecular events associated with tumor initiation. GSVA was subsequently conducted to examine pathway alterations associated with this transition, focusing on Hallmark gene sets. Genes showing progressive upregulation along the TIC-to-tSTM continuum were selected for further analysis to identify early transformation markers. Expression trajectories of candidate genes were visualized using pseudotime mapping.

Pseudotime trajectory analysis
Trajectory inference was performed using the Monocle 2 R package to investigate transcriptional transitions during epithelial transformation [37]. For pseudotime analysis, stem cell-associated clusters were extracted. Subclusters within the tSTM population were also isolated to evaluate the potential origin and differentiation trajectories of tumor-initiating cells (TICs) to further resolve lineage dynamics. Dimensionality reduction was carried out using the DDRTree algorithm, and cells were ordered along inferred trajectories using the orderCells function. Differential gene expression across pseudotime was computed using differentialGeneTest. Principal component-based visualization was used to map transcriptional transitions across pseudotemporal space. Gene expression changes and pathway activity along trajectory components were used to characterize phenotypic shifts associated with early tumorigenesis.

GSVA and correlation with pseudotime
GSVA was performed to quantify cell-level pathway activity across pseudotime trajectories with a focus on key Hallmark pathways such as EMT and oxidative phosphorylation (OXPHOS). GSVA scores were computed for each cell using predefined gene sets from the Molecular Signatures Database (MSigDB). Pearson’s correlation was calculated between GSVA scores and the primary trajectory component to evaluate the relationship between pathway activity and transcriptional progression. Gene expression trends for candidate early transformation markers were visualized along pseudotime. These analyses were used to characterize temporal dynamics of transcriptional reprogramming during the transition from tumor-initiating cells (TICs) to tumor stem-like (tSTM) cells.
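The correlation step above reduces to Pearson's r between a per-cell pathway score and the position of each cell on the primary trajectory component. A minimal sketch, using hypothetical toy values rather than real GSVA output:

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical cells ordered along trajectory component 1.
component_1 = [0.0, 1.0, 2.0, 3.0, 4.0]
emt_score = [0.1, 0.2, 0.3, 0.4, 0.5]     # rises along the trajectory
oxphos_score = [0.9, 0.7, 0.5, 0.3, 0.1]  # falls along the trajectory

r_emt = pearson_r(component_1, emt_score)
r_oxphos = pearson_r(component_1, oxphos_score)
print(round(r_emt, 3), round(r_oxphos, 3))
```

A positive r for a pathway indicates activity that increases along the component, and a negative r indicates activity that declines along it.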

RNA velocity analysis
RNA velocity analysis was conducted using the VeloVAE framework to infer directional transcriptional dynamics and predict future cell state transitions. Spliced and unspliced transcript count matrices were generated using the Kallisto|Bustools pipeline with a pre-built human reference index optimized for RNA velocity inference. The resulting matrices were processed through VeloVAE, a variational autoencoder-based model that estimates latent time, kinetic parameters, and RNA velocity vectors across cells. Following standard preprocessing, dimensionality reduction and clustering were re-applied to ensure alignment between velocity-derived trajectories and existing cell annotations. The inferred velocity field and latent temporal ordering were used to assess lineage progression and validate pseudotime-based trajectories of tumor-initiating cell populations.

RNA in situ hybridization
RNA fluorescence in situ hybridization (RNA-FISH) was performed to validate the spatial expression of early transformation markers in both normal and polyp tissue. FFPE sections were cut at 5 μm thickness and mounted on Superfrost Plus glass slides. Sections were deparaffinized, subjected to heat-mediated antigen retrieval, and hybridized with RNA probes. ViewRNA™ Tissue Fluorescence Assay (Thermo Scientific, QVT0646B) was performed to detect RNA expression. RNA probes for SOD3 and ETS2 (VX06, Assay ID: VA1-3004554-VT and VX01, Assay ID: VA6-3168063VT) were obtained from Thermo Scientific. The FISH assay was performed per the manufacturer’s protocol. Nuclei were counterstained with DAPI for cellular localization. Fluorescence imaging was conducted using a confocal microscope equipped with a 40× oil-immersion objective to assess marker expression patterns in situ. RNA-FISH was used to corroborate scRNA-seq–based identification of early transformation signatures in morphologically normal and polyp tissue compartments.

Analysis of publicly available data
The QC-filtered data from the Colorectal Molecular Atlas Project [38] were downloaded from the HTAN data portal: https://data.humantumoratlas.org. For this study, the processed Seurat objects for the discovery datasets were downloaded. All downstream processing was performed according to the methods described in the previous sections.

CellChat analysis
Intercellular communication was inferred using the R package CellChat [39], which models signaling interactions based on single-cell transcriptomic data and a curated ligand–receptor interaction database. Briefly, normalized gene expression data and cluster labels were used to create CellChat objects for each condition of interest. CellChat computes communication probabilities for ligand–receptor pairs between cell populations by integrating expression data with known interaction networks, accounting for complex ligand and receptor structures and cofactors. Significant signaling pathways and interactions were identified using permutation testing and compared between tumor-initiating cell (TIC) and normal stem-like (nSTM) conditions. Visualizations of communication networks and pathway-specific interactions were generated using built-in CellChat functions. This approach enabled quantitative inference and comparison of intercellular signaling landscapes from scRNA-seq data. Accordingly, inferred signaling differences between TIC and nSTM populations should be interpreted as putative communication programs that may represent stress-associated signaling, niche formation, or a combination of both.

Transcription factor activity inference
Transcription factor (TF) activity was inferred from single-cell RNA-seq data using the DoRothEA regulon collection coupled with the VIPER algorithm. TF-target interactions with high confidence (levels A–C) were used to estimate TF activities by evaluating the expression of their downstream targets rather than TF expression alone, yielding proxy activity scores for each TF across cells. Briefly, we subset the Seurat object to include TIC and nSTM cells and extracted normalized expression data. DoRothEA human regulons were filtered for confidence levels A–C and provided as the input network for the run_viper() function, which computes normalized enrichment scores representing TF activity. The resulting TF activity matrix was incorporated back into the Seurat object as a new assay, followed by scaling and dimensionality reduction to visualize TF activity patterns. Differential TF activity between TIC and nSTM populations was assessed using Seurat’s differential expression framework applied to the TF activity assay.

NicheNet ligand activity analysis
To infer which ligands expressed by TIC and nSTM cells may regulate gene expression in fibroblasts, we applied the NicheNet approach, which integrates prior knowledge of ligand–receptor interactions and ligand–target relationships with gene expression data to prioritize active ligands and their downstream targets. NicheNet predicts ligand activity by assessing how well the predicted targets of each ligand explain the observed expression patterns in a receiver cell population, allowing identification of candidate signaling drivers of intercellular communication. Briefly, TIC, nSTM, and fibroblast subsets were extracted from the Seurat object, and expressed genes were defined based on a minimum detection threshold. A curated ligand–receptor network (lr_network) and ligand–target regulatory prior (ligand_target_matrix) were used to identify ligands expressed in sender populations whose receptors are expressed in fibroblasts. For each ligand set, we performed ligand activity prediction using the predict_ligand_activities() function, ranking ligands by their predicted regulatory potential on a gene set of interest in fibroblasts.

Statistical analysis
All analyses were performed in R (v4.1.0) unless otherwise specified. Cells were analyzed as nested within patients, and no statistical test treated individual patients as independent replicates unless explicitly stated. Differentially expressed genes (DEGs) were identified using the FindAllMarkers function in Seurat (v3.1.0), which applies the Wilcoxon rank-sum test with Benjamini–Hochberg correction for multiple testing. Genes were required to be expressed in at least 10% of cells in either cluster (min.pct = 0.1) and to show an absolute log fold change greater than 0.25 (logfc.threshold = 0.25). Genes with adjusted p < 0.05 and average log2 fold-change > 0.25 were considered significant. GSEA was performed using clusterProfiler, with significance defined as FDR-adjusted q < 0.05. GSVA scores were correlated with pseudotime using Pearson’s correlation. ROC analyses were performed using the pROC package. For each gene, area-under-the-curve (AUC), sensitivity, and specificity across thresholds were calculated, and the optimal threshold was determined by Youden’s index. Visualizations were performed with ggplot2 and Seurat functions. Thresholds and statistical metrics are reported in figures and supplementary tables. ROC analyses were conducted using per-cell measurements, and resulting performance metrics reflect cell-level discrimination rather than patient-level classification.
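The threshold selection described above can be sketched as a search for the cutoff that maximizes Youden's index, J = sensitivity + specificity - 1. This is a simplified illustration and not the pROC procedure: it assumes a cell is called positive when its expression is at or above the threshold, and it tries only the observed values as candidate cutoffs. The function name and toy data are hypothetical.

```python
def youden_threshold(values, labels):
    """Return (threshold, sensitivity, specificity, J) with the largest J.

    `labels` holds 1 for positive cells and 0 for negative cells.
    Assumption: a cell is called positive when value >= threshold.
    Ties in J keep the lowest threshold.
    """
    n_pos = sum(1 for label in labels if label == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative cell")
    best = None
    for thr in sorted(set(values)):
        tp = sum(1 for v, l in zip(values, labels) if l == 1 and v >= thr)
        tn = sum(1 for v, l in zip(values, labels) if l == 0 and v < thr)
        sensitivity = tp / n_pos
        specificity = tn / n_neg
        j = sensitivity + specificity - 1
        if best is None or j > best[3]:
            best = (thr, sensitivity, specificity, j)
    return best


# Hypothetical per-cell expression of one marker; 1 = TIC, 0 = nSTM.
expression = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
is_tic = [0, 0, 1, 1, 1, 0]
threshold, sens, spec, j_stat = youden_threshold(expression, is_tic)
print(threshold, sens, round(spec, 3), round(j_stat, 3))
```

Because every input row is a cell, the resulting sensitivity and specificity describe cell-level discrimination, in line with the caveat stated in the text.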

Results

Single-cell transcriptomic profiling identifies tumor-associated epithelial subpopulations
Single-cell RNA sequencing (scRNA-seq) was performed on paired colonic biopsies from 7 human subjects and captured both polyps and adjacent histologically normal mucosa, Fig. 1A. The polyp cohort contained diverse histopathologic subtypes, including tubular adenomas, sessile and traditional serrated adenomas, and adenocarcinoma, as confirmed by pathology, Fig. S1, Table S1. After quality control, batch correction, and integration, Fig. S2A,B, 51,054 high-quality single-cell transcriptomes were obtained, including 31,376 normal and 19,678 polyp, Fig. 1B. After clustering, 33 transcriptionally distinct cell populations were identified, Table S2. These clusters were annotated into major epithelial, stromal, and immune cell types and used to investigate tumor-associated transcriptional reprogramming, Table S3. Clusters 0 and 10 were markedly enriched in polyp-derived epithelial cells compared to normal, and defined tumor-associated subpopulations, Fig. 1B,C. Dot plot analysis revealed distinct transcriptional programs across clusters, Fig. 1D, with cluster 0 exhibiting high expression of OLFM4 and LGR5, consistent with a tumor-specific stem-like (tSTM) epithelial phenotype. Cluster 10 showed elevated REG4 and MUC2, characteristic of a tumor-specific deep crypt secretory (tDCS) identity. In contrast, clusters 11 and 12 (normal stem-like, nSTM) and cluster 1 (normal deep crypt secretory, nDCS) were present in both normal and polyp tissues, reflecting homeostatic epithelial populations. Immunofluorescence validated strong upregulation of OLFM4 and REG4 in adenomatous epithelium relative to minimal expression in paired normal mucosa, Fig. 1E.

Subclustering and lineage trajectory analyses identify TIC precursors of tSTM cells
To investigate the developmental origin of tumor-specific stem-like cells, tSTM (cluster 0) was further resolved into 8 epithelial subclusters, Fig. 2A. Among these, subclusters 4 and 6 were predominantly derived from histologically normal mucosa and identified as candidate tumor-initiating cells (TICs), whereas subcluster 0 was enriched in polyp tissue, Fig. 2B. Subcluster 4 exhibited the highest stemness potential using CytoTRACE analysis, with intermediate levels in subcluster 6, Fig. 2C,D. Monocle 2 trajectory analysis of TICs (subclusters 4 and 6) and one of the tSTM clusters (subcluster 0) revealed a lineage continuum in which subclusters 4 and 6 mapped to early states and progressed directionally toward subcluster 0, representing a polyp-enriched epithelial branch, Fig. 3A,B. Projection of CytoTRACE scores along pseudotime confirmed that cells at the trajectory root exhibited the greatest stem cell potential, Fig. 3C,D. RNA velocity analysis was performed to further support this model, and showed transcriptional flow from subclusters 4 and 6 toward subcluster 0, consistent with a unidirectional differentiation pathway from early-stage TICs to tumor-specific stem-like states, Fig. 3E,F.

Trajectory analysis reveals progressive transcriptional reprogramming
To explore how TICs transition toward more aggressive phenotypes, we reconstructed their transcriptional trajectories along the principal component continuum. The progression of these clusters along component 1, Fig. S3A, revealed a coordinated increase in epithelial–mesenchymal transition (EMT) activity accompanied by a decline in oxidative phosphorylation (OXPHOS), Fig. S3B,C, consistent with a gradual shift toward mesenchymal-like and metabolically reprogrammed states. Gene set enrichment analysis of TICs further supported this trend, showing downregulation of OXPHOS and upregulation of EMT and interferon-α signaling pathways, indicative of a stress-responsive, pro-tumorigenic transcriptional program, Fig. S3D. Trajectory analysis reinforced these observations, revealing a progressive reorganization of cellular and metabolic pathways that accompanies the evolution of cancer stem cell–like states from TICs, Fig. S3E,F. Along the principal component 1 trajectory, OXPHOS-related genes (ATP5MC2, COX7C) showed a coordinated decrease, whereas EMT/stemness-associated genes (CD44, TGFBI) increased, peaking mid-trajectory, consistent with a transition from metabolic to mesenchymal-like programs in tumor-initiating cells, Fig. S3G. Together, these findings suggest that metabolic reprogramming and activation of stress signaling are integral features of the early transition from TICs to stem cell–like populations, potentially linking EMT dynamics to the establishment of tumor-initiating potential.

Early transformation markers revealed by pseudotime analysis and validated by RNA-FISH
Early transformation markers were identified by differential gene expression analysis and by their expression along the trajectory. Two genes, SOD3 and GPRC5A, rose steadily along pseudotime, highlighting them as candidate molecular indicators of neoplastic progression, Fig. 4A-C. The expression of SOD3 and GPRC5A was significantly higher in tSTM than in TICs, Fig. 4D. RNA-FISH analysis confirmed spatial upregulation of SOD3 in adenomatous crypts with minimal expression in adjacent normal mucosa, Fig. 4E.

CNV analysis reveals genomic instability in tumor-associated cells
Copy number variation (CNV) from the single-cell transcriptomes was evaluated using InferCNV to provide orthogonal evidence of neoplastic transformation. nSTM cells originating from normal epithelium were used as reference. Epithelial clusters from normal mucosa showed minimal CNV alterations, consistent with genomic stability, Fig. S4A. Polyp-derived tSTM and nSTM populations from the observation clusters displayed widespread chromosomal amplifications and deletions, Fig. S4B. Notably, TICs (Cluster 0_N) did not exhibit significant chromosomal aberrations, suggesting that CNV acquisition occurs downstream of TIC emergence. Accordingly, the mean CNV score was higher in the tumor cluster compared with normal clusters, Fig. S4C. These findings demonstrate that tumor-associated epithelial subtypes are defined not only by transcriptional and pathway reprogramming but also by underlying genomic instability, highlighting their central role in early neoplastic progression.

TICs are mostly associated with tubular adenoma
Previous work by Chen et al., 2021 reported that serrated polyps originate through metaplastic processes. The dataset used in our study included samples representing multiple histologic subtypes of colorectal polyps. To investigate whether the TIC-associated clusters identified in our analysis were linked to particular histologic subtypes, we performed an association analysis. A Chi-square test revealed a significant enrichment of TIC-high cells in tubular adenomas, suggesting that tubular adenomas may arise from the expansion of stem-like cell populations, Fig. 5A. To validate these findings, we analyzed an independent scRNA-seq dataset from the COLONMAP study, which includes transcriptomic profiles from 30 polyp and 35 normal (NL) colorectal specimens. Histologically, the cohort comprised 14 adenomas (AD), 10 serrated lesions (6 hyperplastic polyps and 4 sessile serrated polyps) (SER), and 6 specimens with unclassified histology (UNC). The dataset was preprocessed to include cell-type annotations, Fig. 5B. Consistent with our primary dataset, two distinct epithelial clusters, ASC (adenoma stem-like cells) and SSC (serrated stem-like cells), were enriched in polyp specimens, Fig. 5C. Notably, ASCs were predominantly enriched in tubular adenomas, whereas SSCs were more abundant in serrated polyps, mirroring the histologic distinctions observed in Chen et al., 2021. Using the “AddModuleScore” function in Seurat, we calculated a TIC-associated gene signature score for each cell in the COLONMAP dataset, identifying TIC-high cells based on their module scores rather than via label transfer, Fig. 5D. The majority of TIC-high cells were derived from adenoma specimens, Fig. 5E. A subsequent Chi-square analysis again demonstrated a significant association between TIC-high cells and tubular adenomas, reinforcing the notion that tubular adenomas likely originate from stem cell expansion, Fig. 5F.
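The idea behind scoring cells with a gene signature can be shown with a simplified sketch: the mean expression of the signature genes minus the mean expression of a set of control genes. Seurat's AddModuleScore selects its control genes from expression-matched bins, which this sketch does not reproduce. The signature below reuses the three TIC markers named in the abstract purely for illustration, and the control genes, the cutoff of 0, and the toy expression values are hypothetical.

```python
def module_score(expr, signature, controls):
    """Mean signature expression minus mean control expression for one cell.

    `expr` maps gene name -> normalized expression; genes missing from
    the cell are counted as 0.
    """
    if not signature or not controls:
        raise ValueError("signature and controls must be non-empty")
    sig_mean = sum(expr.get(g, 0.0) for g in signature) / len(signature)
    ctl_mean = sum(expr.get(g, 0.0) for g in controls) / len(controls)
    return sig_mean - ctl_mean


signature = ["ETS2", "SLC12A2", "LEFTY1"]  # illustrative signature
controls = ["ACTB", "GAPDH"]               # hypothetical control genes

# Hypothetical normalized expression for two cells.
cells = {
    "cell_1": {"ETS2": 3.0, "SLC12A2": 2.0, "LEFTY1": 1.0, "ACTB": 1.0, "GAPDH": 1.0},
    "cell_2": {"ETS2": 0.0, "SLC12A2": 1.0, "LEFTY1": 0.5, "ACTB": 2.0, "GAPDH": 1.0},
}
scores = {name: module_score(e, signature, controls) for name, e in cells.items()}

cutoff = 0.0  # hypothetical cutoff for calling a cell TIC-high
tic_high = [name for name, s in scores.items() if s > cutoff]
print(scores, tic_high)
```

Scoring each cell directly against a signature avoids transferring cluster labels between datasets, which is the distinction the text draws for the COLONMAP analysis.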

Integrated characterization of cell cycle, transcriptional programs, and cell–cell communication in TIC vs nSTM cells

Cell cycle differences between nSTM and TIC populations
We conducted a multi-layered comparison between TICs and nSTM cells within normal epithelium to define distinct cellular states. TICs exhibited a more quiescent cell cycle profile relative to nSTMs, with significantly lower S-phase and G2/M-phase scores (Wilcoxon rank-sum tests, p < 2.2 × 10⁻¹⁶ for both). The distribution of cell cycle phases also differed markedly (χ²(2) = 1296.6, p < 2.2 × 10⁻¹⁶), as nSTMs were predominantly assigned to S and G2/M phases, whereas TICs were enriched in G1, Fig. 6A–C.
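The phase-distribution test is a chi-square test of independence on a 2 × 3 table (population by cell cycle phase), which has 2 degrees of freedom. For 2 degrees of freedom the upper-tail probability has the closed form exp(−χ²/2), so the reported bound can be checked directly. The sketch below uses only the Python standard library; the cell counts in the toy table are hypothetical, and only the statistic 1296.6 comes from the text.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


def p_value_df2(stat):
    """Upper-tail probability of a chi-square variable with exactly 2 d.o.f."""
    return math.exp(-stat / 2.0)


# Hypothetical counts: rows are nSTM and TIC; columns are G1, S, G2/M.
toy_table = [
    [20, 40, 40],
    [70, 15, 15],
]
toy_stat, toy_dof = chi_square_statistic(toy_table)

# Statistic reported in the text for the real data.
reported_p = p_value_df2(1296.6)
```

With the reported statistic, `reported_p` falls far below 2.2 × 10⁻¹⁶, which is the smallest value R prints by default and explains why the bound appears in that form.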

Differential gene expression and pathway enrichment
Differential expression analysis revealed distinct transcriptional programs between TICs and nSTMs (Fig. S5A). Gene Set Enrichment Analysis (GSEA) using Hallmark pathways showed that inflammatory and hypoxia-related signatures, including TNFA_SIGNALING_VIA_NFKB and HYPOXIA, were enriched in TICs. In contrast, cell cycle–associated programs such as G2M_CHECKPOINT, E2F_TARGETS, and MITOTIC_SPINDLE were enriched in nSTMs (Fig. 6D, Table S4). These results highlight the transcriptional divergence between quiescent TICs and proliferative nSTMs.
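The enrichment direction reported by GSEA comes from a running-sum statistic over a ranked gene list. The sketch below implements the classic weighted form, assuming genes are ranked from most TIC-associated to most nSTM-associated and a weight exponent of 1. It computes the enrichment score only and omits the permutation-based normalization and significance testing. The gene names and ranking values are hypothetical placeholders, not results from the study.

```python
def enrichment_score(ranked_genes, ranked_scores, gene_set, weight=1.0):
    """Signed maximum deviation of the GSEA running sum.

    `ranked_genes` must already be sorted by descending `ranked_scores`.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_norm = sum(abs(s) ** weight for s, h in zip(ranked_scores, hits) if h)
    if hit_norm == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the list without covering it")

    running = 0.0
    best = 0.0
    for score, is_hit in zip(ranked_scores, hits):
        if is_hit:
            running += abs(score) ** weight / hit_norm  # step up on a hit
        else:
            running -= 1.0 / n_miss  # step down on a miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranking: positive values favor TICs, negative favor nSTMs.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E", "GENE_F"]
scores = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]

es_tic_like = enrichment_score(genes, scores, {"GENE_A", "GENE_B"})
es_nstm_like = enrichment_score(genes, scores, {"GENE_E", "GENE_F"})
```

A gene set concentrated at the top of the ranking gives a positive score (enriched in TICs under the assumed ranking), and one concentrated at the bottom gives a negative score (enriched in nSTMs).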

Transcription factor activity differences
We inferred transcription factor (TF) activities using the DoRothEA/VIPER framework. This analysis identified significant differential activity between TICs and nSTMs. Proliferation-linked TFs (e.g., E2F2, E2F3, E2F4, FOXM1, TFDP1) were attenuated in TICs, whereas TFs such as RUNX1, RARA, SMAD3, and FOXO1 exhibited divergent activity patterns consistent with distinct cellular programs, Fig. 6E, Fig. S5B.

Cell–cell communication landscape
CellChat analysis revealed distinct intercellular signaling profiles between TICs and nSTMs, Fig. S6. Pathways including APP, LAMININ, THBS, MIF, and MK were among the most differentially engaged, Fig. 6F. Focusing on the APP signaling pathway, TICs exhibited stronger interaction strength with most target populations compared with nSTMs. Notably, APP–CD74 signaling to fibroblasts was absent in nSTMs but present in TICs, Fig. S7A,B. Quantification of the communication probability for the fibroblast APP–CD74 interaction revealed a higher probability from TICs (0.0370, p = 0.05) than from nSTMs (0.0323, p = 0.24), indicating stronger TIC–fibroblast signaling, Fig. S7C. Consistently, APP expression was significantly higher in TICs compared with nSTMs, while the APP-associated receptor ITGB1 was comparable between the populations, Fig. S7D. Expression of APLP2, a homolog of APP, did not differ between TICs and nSTMs, suggesting that reduced APP ligand availability rather than receptor expression or compensation by related family members underlies the loss of APP signaling from nSTMs to fibroblasts. To assess ligand-driven regulation of fibroblast gene expression, we performed targeted NicheNet analysis using TICs and nSTMs as sender populations and fibroblasts as receivers. Ligands expressed in TICs or nSTMs with cognate receptors detected in fibroblasts were prioritized, and ligand regulatory activity was inferred based on their ability to predict fibroblast variable gene expression. Comparison of predicted ligand activities revealed largely overlapping regulatory potentials between TIC- and nSTM-derived ligands, with no major differences in top-ranked ligands influencing fibroblast transcriptional programs, Fig. S7E. Collectively, these results indicate that TICs engage distinct signaling pathways with the microenvironment, notably enhanced APP-mediated interactions with fibroblasts.
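The link between ligand expression and communication probability can be illustrated with a minimal sketch. It assumes a Hill-type saturation of the ligand × receptor expression product with a half-saturation constant of 0.5, which is how the core term of the CellChat model is usually described. It deliberately omits the cofactor and cell-population-size terms of the full model and the label-permutation step that produces p-values. The function name and all expression values are hypothetical.

```python
KH = 0.5  # assumed Hill half-saturation constant


def communication_probability(ligand_expr, receptor_expr, kh=KH):
    """Hill-type saturation of the ligand x receptor expression product."""
    product = ligand_expr * receptor_expr
    return product / (kh + product)


# Hypothetical average expression: the sender differs only in ligand level,
# while receptor expression on the receiving fibroblasts is held fixed.
receptor_on_fibroblasts = 0.5
prob_low_ligand = communication_probability(0.1, receptor_on_fibroblasts)
prob_high_ligand = communication_probability(0.2, receptor_on_fibroblasts)
```

With receptor expression held constant, the sender with more ligand always receives the higher probability, which matches the interpretation that ligand availability rather than receptor level drives the difference between the two sender populations.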

Integration of TF, cell–cell communication, and hallmark pathways
Integration of TF activity, differential signaling pathways, and enriched Hallmark programs revealed coordinated regulatory modules. Scaled associations between TFs and selected signaling pathways showed that classic cell cycle TFs link to specific communication axes, whereas other TFs associate with pathways reflecting stress and differentiation programs, Fig. 6G. Overlap analysis between CellChat ligand–receptor gene sets and Hallmark EMT and HYPOXIA pathways further supported functional links between intercellular signaling and core transcriptional processes, Fig. 6H. Finally, an integrated network combining differential TF activity, selected CellChat pathways, and Hallmark programs illustrated that key regulators (e.g., E2F family members, FOXM1, TFDP1) associate with both communication pathways (APP, LAMININ, THBS, MIF, MK) and enriched pathways such as epithelial–mesenchymal transition and hypoxia, reflecting coordinated shifts in regulatory and signaling states distinguishing TICs from nSTMs, Fig. 6I. Together, these analyses demonstrate that TICs and nSTMs occupy distinct molecular states defined by coordinated differences in cell cycle progression, transcriptional regulation, and intercellular communication.

TIC-specific biomarkers identified and validated by statistical and spatial analyses
To identify markers that specifically distinguish TICs from other epithelial cells in normal epithelium, we performed a differential gene expression analysis on normal epithelial cells within Cluster 0. These TIC-specific markers are distinct from the broader TIC gene signature used to score cells, providing a focused set of genes that uniquely define TIC identity in normal tissue. Comparative analysis of TICs versus normal epithelial cells revealed strong enrichment of ETS2, SLC12A2, and LEFTY1 within TIC populations, Fig. 7A,B. Each gene demonstrated high diagnostic performance in distinguishing TICs from normal epithelium, with sensitivities ranging from 0.73 to 0.86, Table S5. Spatial validation by RNA-FISH confirmed the presence of TICs in normal crypt epithelium marked by elevated ETS2 transcript levels, Fig. 7C. Notably, TICs were also identified in polyp specimens, Fig. 7C. The COLONMAP dataset was then analyzed to evaluate the diagnostic performance of the TIC marker genes. Supporting our primary data, ETS2, SLC12A2, and LEFTY1 showed higher expression in TICs than in normal epithelium, Fig. 7D, Table S5. Statistical analysis showed high sensitivity and specificity of these genes for the detection of TICs, Fig. 7E.
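Sensitivity and specificity for a single marker gene follow from thresholding its expression and comparing the calls with the TIC labels. The sketch below shows that calculation under an assumed fixed cutoff. The source does not state how cutoffs were chosen, and the expression values and labels are toy data rather than values from Table S5.

```python
def sensitivity_specificity(expression, is_tic, threshold):
    """Classify a cell as TIC when expression >= threshold, then score the calls."""
    tp = fn = tn = fp = 0
    for value, positive in zip(expression, is_tic):
        predicted = value >= threshold
        if positive and predicted:
            tp += 1
        elif positive:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy marker expression for eight cells: the first four are labeled TIC.
toy_expression = [2.1, 1.8, 0.4, 1.5, 0.2, 0.1, 0.9, 0.3]
toy_labels = [True, True, True, True, False, False, False, False]

sens, spec = sensitivity_specificity(toy_expression, toy_labels, threshold=1.0)
```

Sweeping the threshold over the observed expression range and plotting sensitivity against 1 − specificity would give the ROC curve from which an AUC is derived.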

Discussion

In this study, we applied single-cell RNA sequencing (scRNA-seq) to paired biopsies of colonic adenomas and adjacent normal mucosa from seven patients to investigate the earliest cellular and molecular events in colorectal tumorigenesis. Our analysis uncovered two tumor-associated epithelial populations: a tumor-specific stem-like (tSTM, cluster 0) state marked by OLFM4 and LGR5, and a tumor-specific deep crypt secretory (tDCS, cluster 10) state characterized by REG4 with limited MUC2 expression. Subclustering of the tSTM population revealed eight epithelial subsets, among which subclusters predominantly derived from histologically normal mucosa emerged as candidate tumor-initiating cells (TICs). Although our dataset did not directly measure canonical Wnt activity (e.g., β-catenin targets), the enrichment of LGR5 and OLFM4 in TICs is consistent with Wnt/β-catenin pathway engagement, as LGR5 is a well-characterized Wnt target that has been repeatedly identified as a defining marker of intestinal stem cells and colorectal cancer stem cells [40,41]. CytoTRACE, pseudotime ordering, and RNA velocity consistently placed TICs at the root of lineage trajectories progressing toward polyp-enriched tSTM cells, supporting their role as precursors of neoplastic stem-like states.
The transcriptional trajectory from TICs to tSTM cells was marked by progressive reprogramming, including activation of epithelial–mesenchymal transition (EMT) and suppression of oxidative phosphorylation, consistent with mounting evidence that metabolic plasticity and EMT are integral to early neoplastic transformation and cancer stem-like states in colorectal tissues [42–46]. GSVA and GSEA analyses further revealed enrichment of proliferative, inflammatory, and stress-associated pathways, including E2F, MYC, KRAS, and TNFα/NFκB signaling. Copy number variation (CNV) profiling provided orthogonal evidence of genomic instability in polyp tSTM cells relative to their normal counterparts, while TICs exhibited largely stable genomes, consistent with their origin from normal epithelium. Together, these observations provide a framework for understanding early cellular and molecular events in colorectal neoplasia and generate testable hypotheses regarding the mechanisms driving the transition from TICs to tumor-like stem cells.
Pseudotime and differential expression analyses identified SOD3 and GPRC5A as early transformation markers, validated by RNA-FISH in adenomatous crypts. Comparative analyses of TICs versus normal epithelium identified ETS2, SLC12A2, and LEFTY1 as robust TIC-specific biomarkers with high diagnostic sensitivity and specificity. To address specimen heterogeneity and validate these findings, we analyzed an independent scRNA-seq dataset from the COLONMAP study, comprising 30 polyp and 35 normal colorectal specimens across multiple histologic subtypes [38]. This analysis confirmed that TIC-like cells were enriched predominantly in tubular adenomas and that ETS2, SLC12A2, and LEFTY1 were consistently upregulated in these cells. By including specimens with varying histologies and from independent patients, the COLONMAP validation strengthens the generalizability of our observations, mitigating limitations associated with small cohort size and heterogeneity in the primary dataset.
The COLONMAP validation is particularly important in light of recent work by Chen et al., who generated a comprehensive single-cell atlas of human colorectal precancers that revealed distinct origins and microenvironmental programs for conventional adenomas versus serrated polyps [38]. Their study showed that conventional adenomas arise from Wnt-driven expansion of stem cells, whereas serrated polyps derive from differentiated cells through gastric metaplasia, with divergent immune microenvironments associated with these precancer pathways. Our data align with these observations by demonstrating that TICs in the conventional adenoma pathway resemble normal stem-like cells and follow a progression toward stem-like tumor states. The strong enrichment of TICs in tubular adenomas in both our primary and COLONMAP cohorts supports a model in which stem-like expansion, rather than differentiated or metaplastic processes, underlies early transformation in this route, distinguishing the conventional adenoma pathway from serrated neoplasia.
Importantly, our study extends prior work by evaluating intercellular communication dynamics between TICs and the surrounding microenvironment. CellChat analysis revealed that TICs engage enhanced APP–CD74 signaling with fibroblasts, which was largely absent in nSTM cells. Specifically, the APP–CD74 communication probability was higher from TICs (0.0370, p = 0.05) than from nSTMs (0.0323, p = 0.24), while expression of APP itself was elevated in TICs and the receptor ITGB1 remained comparable between populations. Expression of the APP homolog APLP2 did not differ, suggesting that reduced APP ligand availability, rather than receptor downregulation or compensatory family member expression, underlies the absence of APP signaling from nSTMs. These findings suggest that TICs not only acquire intrinsic transcriptional and metabolic reprogramming but may also remodel their local microenvironment via ligand-mediated crosstalk, which may support early neoplastic progression and stem cell niche establishment. Collectively, these insights highlight the possibility that early stromal modulation may represent a targetable axis for intervention in the initial stages of colorectal neoplasia.
Functionally, TIC–fibroblast APP signaling may reinforce tumor-initiating programs by influencing fibroblast behavior, consistent with prior studies linking APP–CD74 interactions to cellular adhesion, proliferation, and inflammatory responses. Our integrated analysis of transcription factor activity, signaling pathways, and Hallmark programs further revealed coordinated regulatory modules in TICs, where classic cell cycle TFs were coupled to communication pathways and stress/differentiation-associated TFs aligned with EMT and hypoxia signatures. This underscores the interplay between intrinsic regulatory networks and extrinsic microenvironmental signaling in establishing early tumorigenic states.
In summary, our study maps the earliest molecular events in colorectal tumorigenesis, revealing the emergence of TICs from normal epithelium, their progressive reprogramming toward stem-like and mesenchymal states, and their selective engagement of microenvironmental signaling such as APP–CD74. Validation in the COLONMAP cohort confirms these patterns across patient samples and histologic subtypes. These findings highlight TIC-associated programs as potential targets for early detection, prevention, and intervention, and motivate future functional studies using organoid or xenograft models to dissect their role in adenoma initiation and progression.

Abbreviations


•AUC – Area Under the Curve

•CRC – Colorectal Cancer

•CSC – Cancer Stem Cell

•DCS – Deep Crypt Secretory

•DEG – Differentially Expressed Gene

•EMT – Epithelial-to-Mesenchymal Transition

•FFPE – Formalin-Fixed Paraffin-Embedded

•FISH (RNA-FISH) – RNA Fluorescence In Situ Hybridization

•GSEA – Gene Set Enrichment Analysis

•GS – Gene Signature

•GSVA – Gene Set Variation Analysis

•H&E – Hematoxylin & Eosin

•MSigDB – Molecular Signatures Database

•nDCS – Normal Deep Crypt Secretory

•NES – Normalized Enrichment Score

•nSTM – Normal Stem

•PCA – Principal Component Analysis

•OXPHOS – Oxidative Phosphorylation

•R – R Statistical Computing Environment


•scRNA-seq – single cell RNA sequencing

•STM – Stem

•tDCS – Tumor-Specific Deep Crypt Secretory

•TIC – Tumor Initiating Cell

•t-SNE – t-distributed Stochastic Neighbor Embedding

•tSTM – Tumor-Specific Stem

•UMAP – Uniform Manifold Approximation and Projection

•UMI – Unique Molecular Identifier

Declarations


Ethics approval and consent to participate
Fresh human colon specimens were obtained with informed written consent from patients undergoing routine colonoscopy at the University of Michigan Hospital. All patient reports and human tissues were deidentified prior to the study. Patient specimens were collected with the approval of the Michigan Medicine IRB under protocol HUM00102771.

Consent for publication
Not Applicable

Availability of data and materials
The raw transcriptome data will be deposited in the Genome Sequence Archive (GSA). All other relevant data are available on request from the authors. The code used to generate the graphic presentation is available on GitHub (https://github.com/tstephie/Jaiswal_scRNA-seq_colon).

Funding

This study was funded in part by National Institutes of Health (NIH) grants U01 CA230669 and R01 CA249851 (TDW), and R37 CA262209 (JS).

Model for early colorectal tumor initiation and progression

Schematic illustrating a proposed stepwise trajectory from histologically normal colonic mucosa to adenoma through the emergence of tumor-initiating cells (TICs) and subsequent tumor-specific stem-like cells (tSTMs). Within normal mucosa, a subset of epithelial cells acquires a TIC state characterized by enhanced stemness, early activation of epithelial–mesenchymal transition (EMT) programs, and suppression of oxidative phosphorylation. TICs then progress toward tSTMs, marked by upregulation of early transformation markers (e.g., SOD3, GPRC5A), acquisition of copy number variations (CNVs), and increased interaction with the microenvironment. Notably, tSTMs engage fibroblasts through APP–CD74–mediated signaling, contributing to niche remodeling and aberrant epithelial expansion, ultimately culminating in adenoma formation.

CRediT authorship contribution statement

Sangeeta Jaiswal: Conceptualization, Formal analysis, Investigation, Visualization, Validation, Writing – original draft. Stephanie The: Formal analysis, Investigation, Validation, Visualization. Tse-Shao Chang: Data curation, Project administration. Jiaqi Shi: Funding acquisition, Supervision. Thomas D Wang: Supervision, Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Source: PubMed Central (JATS). Licensing follows the original publisher's policy; please cite the original article when quoting.
