Spatial gene expression analysis reveals drivers of extremely early lymph node metastasis in breast cancer.
1/5 보강
Lymph node metastasis correlates with breast cancer prognosis; however, the cellular mechanisms underlying the earliest metastatic events remain unclear.
APA
Nagasawa S, Kajiya K, et al. (2026). Spatial gene expression analysis reveals drivers of extremely early lymph node metastasis in breast cancer.. NPJ breast cancer, 12(1). https://doi.org/10.1038/s41523-026-00897-1
MLA
Nagasawa S, et al.. "Spatial gene expression analysis reveals drivers of extremely early lymph node metastasis in breast cancer.." NPJ breast cancer, vol. 12, no. 1, 2026.
PMID
41578129 ↗
Abstract 한글 요약
Lymph node metastasis correlates with breast cancer prognosis; however, the cellular mechanisms underlying the earliest metastatic events remain unclear. In spatial transcriptomic analysis of a patient with breast cancer at single-cell resolution, we identified 30 tumor cells representing the initial metastatic seeding in a lymph node. These cells originated from multiple epithelial-mesenchymal (EM) transition status and included six distinct subpopulations with biological significance. Only cells exhibiting a metabolic shift toward fatty acid metabolism successfully established lymph node colonies, implicating this shift in metastatic fitness. The tumor microenvironment surrounding these cells showed immunosuppressive and tumor-promoting features, supporting metastasis establishment. Cross-referencing these expression profiles with public datasets revealed that poor prognosis correlated not with fully mesenchymal or metastatic populations, but with hybrid EM cells exhibiting epithelial and mesenchymal traits. These findings highlight the metabolic and phenotypic plasticity of metastatic cells and serve as translational bridges between the spatial evolution of tumor cells in the extremely early stages of lymph node metastasis and clinical prognosis in breast cancer.
같은 제1저자의 인용 많은 논문 (1)
📖 전문 본문 읽기 PMC JATS · ~58 KB · 영문
Introduction
Introduction
Axillary lymph nodes are considered the “first stop” in breast cancer metastasis1,2. The presence of metastatic tumor cells in these nodes correlates with poor prognosis3–7. Tumor cell profiles in metastases have been studied extensively. For example, Lee et al. used a mouse model and found that a shift toward fatty acid oxidation (FAO), driven by the selective activation of the transcription coactivator YAP in lymph node metastases, is essential for their establishment8. Liu et al. used single-cell RNA sequencing (RNA-seq) and spatial transcriptomics to compare the microenvironments of primary and lymph node metastatic tumors, reporting that in lymph node metastases, compared with primary tumors, T cell activation, cytotoxicity, and proliferation are markedly suppressed while dendritic cells show reduced ability to prime and activate T cells9.
Although the requirements for lymph node metastasis and the niche environment are becoming clearer, how these molecular characteristics relate to cellular plasticity, possibly involving EMT and MET, remains uncertain. Notably, almost no studies have reported the first metastatic event in the lymph nodes. This is because it is difficult to track the trajectory of metastasis in humans, and it is extremely difficult to observe transient and plastic EMT states in vivo.
Substantial evidence supports the role of epithelial-to-mesenchymal transition (EMT) and mesenchymal-to-epithelial transition (MET) in metastasis10,11. EMT is a cellular process in which epithelial cells acquire mesenchymal cell characteristics, causing dramatic changes in tissue structure and function. Cells take on a fibroblast-like morphology and increase their motility (ability to move) and invasiveness (ability to invade surrounding tissues). MET is a process that reverses the changes in cellular phenotype induced by EMT. In cancer cells, suppression of EMT-Transcription factors like TWIST1 is necessary to promote MET, which is required for proliferation after metastasis12. Despite extensive debate, it remains unclear how EMT or MET contributes to metastatic potential and whether it is a necessary condition for metastasis10,13–16. A major challenge in addressing this question lies in EMT’s broad and evolving definition. EMT was once considered a binary switch between epithelial and mesenchymal states. However, several in vitro studies have shown that EMT progresses gradually, forming hybrid subpopulations that occupy intermediate states between epithelial and mesenchymal identities17,18. Interestingly, tumor cell subpopulations nearest to the mesenchymal state do not necessarily exhibit the highest metastatic potential19. Indeed, high metastatic potential often lies in the hybrid state. Conversely, evidence suggests that metastatic potential and the reverse process, MET, are not correlated, indicating that EMT and MET may need to be considered separately17. To understand the molecular events driving metastasis, it is essential to examine how EMT progresses in the primary tumor by dissecting its intermediate states and how, and to what extent, metastasized cells restore epithelial traits in the lymph node20.
In the present study, we used the Xenium in situ hybridization platform, which enables spatial gene expression profiling at the single-cell resolution, to track tumor cells in vivo during the metastasis process. The purpose of the study was to clarify the spatial evolution of tumor cells in the extremely early stages of breast cancer lymph node metastasis.
Axillary lymph nodes are considered the “first stop” in breast cancer metastasis1,2. The presence of metastatic tumor cells in these nodes correlates with poor prognosis3–7. Tumor cell profiles in metastases have been studied extensively. For example, Lee et al. used a mouse model and found that a shift toward fatty acid oxidation (FAO), driven by the selective activation of the transcription coactivator YAP in lymph node metastases, is essential for their establishment8. Liu et al. used single-cell RNA sequencing (RNA-seq) and spatial transcriptomics to compare the microenvironments of primary and lymph node metastatic tumors, reporting that in lymph node metastases, compared with primary tumors, T cell activation, cytotoxicity, and proliferation are markedly suppressed while dendritic cells show reduced ability to prime and activate T cells9.
Although the requirements for lymph node metastasis and the niche environment are becoming clearer, how these molecular characteristics relate to cellular plasticity, possibly involving EMT and MET, remains uncertain. Notably, almost no studies have reported the first metastatic event in the lymph nodes. This is because it is difficult to track the trajectory of metastasis in humans, and it is extremely difficult to observe transient and plastic EMT states in vivo.
Substantial evidence supports the role of epithelial-to-mesenchymal transition (EMT) and mesenchymal-to-epithelial transition (MET) in metastasis10,11. EMT is a cellular process in which epithelial cells acquire mesenchymal cell characteristics, causing dramatic changes in tissue structure and function. Cells take on a fibroblast-like morphology and increase their motility (ability to move) and invasiveness (ability to invade surrounding tissues). MET is a process that reverses the changes in cellular phenotype induced by EMT. In cancer cells, suppression of EMT-Transcription factors like TWIST1 is necessary to promote MET, which is required for proliferation after metastasis12. Despite extensive debate, it remains unclear how EMT or MET contributes to metastatic potential and whether it is a necessary condition for metastasis10,13–16. A major challenge in addressing this question lies in EMT’s broad and evolving definition. EMT was once considered a binary switch between epithelial and mesenchymal states. However, several in vitro studies have shown that EMT progresses gradually, forming hybrid subpopulations that occupy intermediate states between epithelial and mesenchymal identities17,18. Interestingly, tumor cell subpopulations nearest to the mesenchymal state do not necessarily exhibit the highest metastatic potential19. Indeed, high metastatic potential often lies in the hybrid state. Conversely, evidence suggests that metastatic potential and the reverse process, MET, are not correlated, indicating that EMT and MET may need to be considered separately17. To understand the molecular events driving metastasis, it is essential to examine how EMT progresses in the primary tumor by dissecting its intermediate states and how, and to what extent, metastasized cells restore epithelial traits in the lymph node20.
In the present study, we used the Xenium in situ hybridization platform, which enables spatial gene expression profiling at the single-cell resolution, to track tumor cells in vivo during the metastasis process. The purpose of the study was to clarify the spatial evolution of tumor cells in the extremely early stages of breast cancer lymph node metastasis.
Results
Results
Multi-omics analysis in each breast cancer region
Before analyzing spatial patterns at the primary site and metastatic lymph node, we first examined the molecular features of the primary cancer via multiregional bulk analysis. The patient, an 80-year-old female (BRC-26), underwent surgery without neoadjuvant therapy. Her clinical diagnosis was stage II invasive breast carcinoma of the HER2 type, i.e., ER-negative, PgR-negative, and HER2-positive (for full clinicopathologic details, refer to Supplementary Table 1). For high-quality molecular analysis, especially RNA-seq, we freshly froze part of the tumor tissue harvested during surgery. We roughly dissected samples from histologically noncancerous regions (NC regions; NC-1 and NC-2) and cancerous regions (Ca regions; Ca-3 and Ca-4) (Fig. 1a and Supplementary Fig. 1a). Histologically, Ca-4 appeared sparser and less adhesive compared with Ca-3 (Supplementary Fig. 1b). We subjected these tissues to RNA-seq, enzymatic methylation sequencing (EM-seq), and whole-genome sequencing (WGS) (refer to Supplementary Table 2 for statistical analyses).
We first performed transcriptome analysis using RNA-seq to compare gene expression between the NC and Ca regions. Of 4,584 differentially expressed genes (DEGs), 2,248 and 2,336 were significantly upregulated and downregulated in the cancer region (Supplementary Fig. 1c). Gene pathway enrichment analysis of these DEGs showed that “cell adhesion” pathways were enriched in NC regions, whereas pathways for “carboxylic acid metabolism,” and “cell cycle” were enriched in Ca regions (Fig. 1b), suggesting progressive malignant transformation from NC to Ca regions21. DNA methylation analysis revealed more extensive genome-wide hypomethylation in Ca regions compared with NC regions (Fig. 1c). Furthermore, promoters, intergenic regions, and gene bodies all showed reduced methylation (Supplementary Fig. 1d). Hypomethylation of intergenic and intron regions, especially repetitive sequences, can promote chromosomal instability and mutations, potentially increasing cancer risk22–24. These results confirm that breast cells undergo malignant transformation as they progress from NC to Ca regions.
Within Ca regions, we observed significant changes in key metastasis-related genes (Fig. 1d). Expression of EMT markers, such as Vimentin25, and transcription factors Snail and ZEB2, considered known EMT regulators26,27, increased from Ca-3 to Ca-4. Correspondingly, DNA methylation levels declined from Ca-3 to Ca-4 (Supplementary Fig. 1e), consistent with previous reports that EMT-related genes are epigenetically regulated28–30. Similarly, methylation of the THY1 gene, which was highly expressed in Ca-4, decreased from Ca-3 to Ca-4 (Fig. 1e, f). THY1, a GPI-anchored protein involved in cell adhesion, migration, and polarity, can suppress lung metastasis when its integrin signaling is inhibited31,32. THY1, which regulates cancer cell migration and invasion, also appears epigenetically controlled. These molecular profiles indicate that malignant transformation from NC to Ca regions, along with DNA methylation differences within Ca regions, may underlie varying EMT phenotypes in tumor cells33,34.
WGS results further supported cancer progression from NC to Ca regions. We detected mutations in KMT2C (Y987H), the most frequently mutated histone methyltransferase in breast cancer35, across all regions (Fig. 1g). PIK3CA (H1047R), a known cancer driver, was mutated in all regions except NC-1. Mutations in transcription factor FOXA1 (R262P) and histone demethylase KDM6A (P281R) were specific to Ca regions (Fig. 1h). Copy number variation (CNV) analysis revealed chromosomal abnormalities in Ca regions (Fig. 1i). These findings suggest that tumor development occurred against a background of clonal mammary gland expansion36. Overall, multiregional analyses indicated that breast cancer follows a well-characterized molecular trajectory from NC to Ca regions. Importantly, these molecular features aligned closely with histological observations, especially regarding gene regulation.
Spatial analysis of breast cancer at single-cell resolution
To investigate the molecular mechanisms of metastasis in detail, we performed spatial gene expression analysis at single-cell resolution on primary tumors and paired axillary metastatic lymph nodes from the patient shown in Fig. 1. In addition to the primary tumor adjacent to the bulk analysis region and lymph node metastases from the same case, we included a tumor-draining lymph node (TdL) as a comparison specimen (Fig. 2a). The TdL was clinically diagnosed as metastasis-negative by rapid intraoperative evaluation.
We applied Xenium spatial gene expression profiling using a custom 380-gene breast cancer panel (refer to Supplementary Table 3 for the gene list). Across the primary tumor, metastatic lymph node, and TdL, Xenium detected 1,005,436 cells (Fig. 2b). Using gene expression signatures37–39 and histological location information, we annotated clusters by major cell types. The luminal cell cluster was subdivided into two clusters, (i) and (ii), with (i) further branching into four subclusters: L1: luminal 1; L2: luminal 2; B: basal; and M: mesenchymal-like (Fig. 2c).
We identified genes significantly upregulated within each cluster. KRT8 and KRT18, markers of mammary ductal epithelial cells, were enriched in L1 and L2, whereas KRT5 and KRT14, myoepithelial markers, were enriched in cluster B (Supplementary Fig. 2). Consistent with expression information, spatial localization of L1, L2, and B clusters matched the lobular architecture of the mammary gland (Fig. 2d). High ANKRD30A expression in L2 and elevated KIT and KRT23 expression in L1 aligned with two previously reported mature luminal cell types40: mature luminal cell 2 (KRT18+/ANKRD30A+) and mature luminal cell 1 (KRT18+/KRT23 + /KIT+). L1, L2, and B clustered on the lobules, whereas M cells were distributed sporadically throughout the tumor, suggesting reduced cell adhesion (Fig. 2e). Consistent with this, the M cluster showed increased expression of EMT markers, including THY1, an EMT-related gene shown in Fig. 1e. Collectively, these findings support a role for THY1 expression in EMT initiation from normal lobule cells (L1, L2, and B), potentially regulated through DNA methylation.
Although cluster (i) encompassed lobular and early EMT-stage cells, cluster (ii) localized pathologically to tumor regions and comprised definitive cancer cells. Strong ANKRD30A expression in cluster (ii) (Fig. 2f) suggested that L2 may be the tumor’s cell of origin40. Uniform manifold approximation and projection (UMAP) showed that metastatic lymph node tumor cells formed distinct clusters. Based on high ANKRD30A expression, we defined L2 as the tumor-originating cell and applied pseudotemporal trajectory analysis using Monocle 341. The inferred trajectory showed branching from L2 into (ii)-1, progressing to (ii)-2, and finally differentiating into metastatic tumor cells in lymph nodes (Fig. 2g). Taken together, these findings suggest a progression pathway in which cancer cells originate from L2, acquire EMT potential in the M cluster, and ultimately establish metastases within cluster (ii) in lymph nodes.
Spatial analysis of the lymph nodes reveals six transcriptionally distinct EMT states
When inspecting spatial gene expression data to further investigate the mechanisms underlying metastasis, we unexpectedly identified a small cluster of only 30 cells within the TdL, a site clinically diagnosed as metastasis-free; these cells spanned a region approximately 200 μm in diameter (Fig. 3a). Gene expression analysis revealed clear KRT19 expression, an epithelial marker absent in normal lymph node cells. Additional hematoxylin and eosin (HE) staining confirmed their morphology was consistent with cancer cells. The identification of these cells in the TdL provided a rare opportunity to examine molecular events at an extremely early stage of metastasis. This form of early dissemination is clinically referred to as isolated tumor cells (ITCs), a term we adopt hereafter.
When we visualized ITCs on the UMAP plane, we observed transcriptional heterogeneity (Fig. 3b, left panel), indicating that ITCs comprise several distinct cell types. To better characterize the biology of ITCs and the roles of different cell states in metastasis, we applied nonhierarchical clustering to subclassify tumor cells into six groups (C1–C6) and assigned each ITC to one of these clusters (Fig. 3b, right panel). Owing to the small number of ITCs detected, subsequent analyses focused on the clusters to which each ITC mapped. For each subpopulation associated with an ITC, we identified distinct signaling pathways linking histology, spatial location, and gene expression (Fig. 3c–e and Supplementary Table 4).
C6 occupied a position on the UMAP closest to EMT initiation (corresponding to the M cluster in Fig. 2c). Its scattered distribution within the primary tumor suggested low intercellular adhesion. C6 cells expressed high levels of ETM-related markers, such as VIM and CDH3, as well as stemness markers characteristic of breast cancer, including CD24low and CD44high (Fig. 3d). In contrast, epithelial marker expression was low, indicating a sustained mesenchymal phenotype. Gene enrichment analysis showed activation of EMT, stem cell signaling, TGF-β signaling, WNT signaling, immune checkpoint pathways, and antianoikis signaling (Fig. 3e). Signals from the microenvironment, such as TGF-β and Wnt ligands, induce various EMT-related transcription factors42,43. EMT also promotes PD-L1 expression, activating immune checkpoint pathways44,45. These findings suggest that C6 represents tumor cells in an extreme mesenchymal phenotype, capable of evading anoikis and immune clearance. In contrast, C4 most strongly retained epithelial features and was the most predominant cell population within lymph node metastases, forming the tumor mass (Fig. 3c). C4 cells showed enrichment of cholesterol homeostasis and fatty acid metabolism pathways, with upregulation of fatty acid metabolism–related molecules in metastatic lymph nodes (Fig. 3e, f). A previous mouse-based lymph node metastasis study demonstrated that a metabolic shift toward FAO in tumor cells is required for lymph node colonization by cancer cells8. Thus, C4 represents a cancer cell population that has acquired an extreme epithelial phenotype and metabolic reprogramming suited for metastatic growth.
Clusters C1, C2, C3, and C5 represented hybrid epithelial–mesenchymal (EM) cell populations that coexpressed epithelial and mesenchymal markers (Fig. 3d). The degree of coexpression differs among clusters, each cluster exhibited distinct features. C5 showed the strongest enrichment for the G2M checkpoint pathway (Fig. 3g). Prior studies have indicated that breast cancers with G2M activity display higher proliferative activity, increased MYC pathway activation, earlier metastasis, and worse survival46, suggesting that C5 may represent a more aggressive hybrid EMT subpopulation. C3 was enriched for VEGF signaling and glycolysis, likely reflecting a hypoxic environment in the tumor core that promotes angiogenesis and glycolysis47. C2, localized at the tumor periphery, i.e., the invasive front, displayed activation of matrix metalloproteinases (MMPs) and canonical Wnt signaling. Cells at tumor margins exhibit EM plasticity and migratory behavior48, with MMP signaling42 and the Wnt/β catenin pathway49 known to promote invasion. Canonical Wnt signaling was also enriched in C2 and hybrid EMT subpopulations (C3 and C6), consistent with reports that Wnt7A/B maintains the hybrid EM state12,48.
C1 showed activation of retinoid metabolism and lipid transport pathways. Retinoic acid activates retinoic acid receptors and retinoid X receptors, driving expression of fatty acid metabolism genes50. Considering that C1 gene expression aligns closely with C4 (the most significantly proliferating cluster in lymph node metastases, involving elevated fatty acid metabolism), we speculate that C1 may act as a precursor population to C4.
To validate these results, we added spatial transcriptome and protein expression analysis of lymph node metastases in different patients. As a result, expression of the FASN, which was the molecule shown in Fig. 3f was confirmed in metastatic lymph nodes in both transcriptome and protein expression(Supplementary fig. 3). We also performed multiplex fluorescent immunostaining of EMT-associated molecules in tumor cells. As a result, we identified tumor cells co-expressing both epithelial markers and mesenchymal markers simultaneously(Supplementary Fig. 4). This observation in a patient of breast cancer demonstrates that a genuine EMT continuum does indeed exist.
Collectively, these findings suggest that hybrid EM cells undergo functional adaptation within the primary tumor, with C6 as a possible EMT origin, C2 as an intermediate, C3 and C5 representing divergent aggressive states, and C1 transitioning toward the epithelialized, proliferative C4. The important thing is that each hybrid EMT subtype has different functional capabilities and that these cells are present in the primary tumor from the very early stages of metastasis (Fig. 3h).
Analysis of the tumor microenvironment in the metastasized lymph node
The major bottleneck for cancer cells infiltrating a lymph node is the new tumor microenvironment (TME)51. To address this, we compared interactions between tumor cells and their surrounding TME in the TdL and the metastatic lymph node (Fig. 4a). Notably, we detected distinct interaction patterns specific to each TME (Fig. 4b). One interaction detected uniquely in the TdL was the CD45–MRC1 ligand–receptor interaction, occurring between CD45 on T cells and MRC1 on macrophages (Fig. 4c). MRC1, an endocytosis receptor belonging to the C-type lectin family, is expressed on cells like dendritic cells, macrophages, and endothelial cells52. It has been shown to inhibit CD45 activity on T cells via direct interaction, leading to upregulation of the immune checkpoint protein CTLA4 and induction of antigen-specific T cell tolerance53–55. Therefore, immune escape may begin through this mechanism in the TdL.
Metastatic lymph nodes uniquely exhibited coinhibitory ligand–receptor interactions, such as VEGFA–KDR and CD86–CTLA4, consistent with the promotion of angiogenesis and the establishment of a T cell–exhausted environment already formed at this stage (Fig. 4d). Among interactions involving collagen, cancer-associated fibroblasts (CAFs) exhibited the highest number of interactions (Supplementary Fig. 5). Representative interactions in metastatic lymph nodes included COL1A1–CD44 and COL4A1–CD44, both known to promote tumor proliferation and progression56,57. In the chemokine signaling category, the CXCL16–CXCR6 interaction was particularly prominent in metastatic lymph nodes (Supplementary Fig. 5). High CXCL16 expression is associated with histological malignancy and is implicated in tumor progression and metastasis via activation of the CXCL16–CXCR6 axis58. These TME-derived signals appear to constitute necessary conditions for successful metastasis.
Transcriptional profiling of hybrid EMT and its association with patient outcomes
Prompted by the preceding analyses, we sought to determine whether the transcriptional profiles we identified are associated with patient prognosis in a broader clinical context. To this end, we reanalyzed a previously reported scRNA-seq dataset from metastatic lymph nodes of patients with breast cancer4 (Fig. 5a). Further clustering of cancer cells from the dataset revealed five clusters (sc8,sc4,sc2,sc15 and sc10) (Fig. 5b). According to publicly available information, sc2,sc4 and sc8 were metastatic lymph node-derived samples, and sc15, sc10 was derived from the primary tumor. A metabolic shift toward fatty acids was observed in metastatic lymph node-derived samples(sc2,sc4, and sc8) (Fig. 5c). These observations are consistent with our results shown in Fig. 3 and support our findings that a metabolic shift toward fatty acid metabolism is necessary for the formation of metastatic colonies in lymph nodes. To investigate whether molecular subtype signatures derived from our Xenium data are associated with clinical outcomes, we evaluated their prognostic relevance using the METABRIC dataset59. Interestingly, the subtypes associated with poor prognosis were not clusters that formed metastatic colonies in lymph node (e.g., C4) or those showing strong mesenchymal features (C6) but rather clusters C5 and C3, which exhibited signatures of G2M cell cycle progression and increased glycolytic activity (Supplementary Fig. 6).
To validate this finding, we mapped the hybrid EM clusters identified in our Xenium analysis (C1–C6) to the corresponding cancer cell subclusters (sc2, sc4, sc8, sc10, and sc15) in the public scRNA-seq dataset. This revealed a one-to-one correspondence between C5 and sc8 and between C6 and sc15 (Fig. 5d). Consistent with our findings, only sc8 (corresponding to the C5 signature in our Xenium data) was significantly associated with poor prognosis [log-rank P = 2e−09, hazard ratio (HR) = 1.527, 95% confidence interval (CI) = 1.329–1.755] (Fig. 5e). In contrast, sc15, which displayed the highest mesenchymal programming and corresponded to the C6 signature in our Xenium data, tended to more favorable prognosis (log-rank P = 0.01, HR = 0.836, 95% CI = 0.727–0.961, Supplementary Fig. 7).
Multi-omics analysis in each breast cancer region
Before analyzing spatial patterns at the primary site and metastatic lymph node, we first examined the molecular features of the primary cancer via multiregional bulk analysis. The patient, an 80-year-old female (BRC-26), underwent surgery without neoadjuvant therapy. Her clinical diagnosis was stage II invasive breast carcinoma of the HER2 type, i.e., ER-negative, PgR-negative, and HER2-positive (for full clinicopathologic details, refer to Supplementary Table 1). For high-quality molecular analysis, especially RNA-seq, we freshly froze part of the tumor tissue harvested during surgery. We roughly dissected samples from histologically noncancerous regions (NC regions; NC-1 and NC-2) and cancerous regions (Ca regions; Ca-3 and Ca-4) (Fig. 1a and Supplementary Fig. 1a). Histologically, Ca-4 appeared sparser and less adhesive compared with Ca-3 (Supplementary Fig. 1b). We subjected these tissues to RNA-seq, enzymatic methylation sequencing (EM-seq), and whole-genome sequencing (WGS) (refer to Supplementary Table 2 for statistical analyses).
We first performed transcriptome analysis using RNA-seq to compare gene expression between the NC and Ca regions. Of 4,584 differentially expressed genes (DEGs), 2,248 and 2,336 were significantly upregulated and downregulated in the cancer region (Supplementary Fig. 1c). Gene pathway enrichment analysis of these DEGs showed that “cell adhesion” pathways were enriched in NC regions, whereas pathways for “carboxylic acid metabolism,” and “cell cycle” were enriched in Ca regions (Fig. 1b), suggesting progressive malignant transformation from NC to Ca regions21. DNA methylation analysis revealed more extensive genome-wide hypomethylation in Ca regions compared with NC regions (Fig. 1c). Furthermore, promoters, intergenic regions, and gene bodies all showed reduced methylation (Supplementary Fig. 1d). Hypomethylation of intergenic and intron regions, especially repetitive sequences, can promote chromosomal instability and mutations, potentially increasing cancer risk22–24. These results confirm that breast cells undergo malignant transformation as they progress from NC to Ca regions.
Within Ca regions, we observed significant changes in key metastasis-related genes (Fig. 1d). Expression of EMT markers, such as Vimentin25, and transcription factors Snail and ZEB2, considered known EMT regulators26,27, increased from Ca-3 to Ca-4. Correspondingly, DNA methylation levels declined from Ca-3 to Ca-4 (Supplementary Fig. 1e), consistent with previous reports that EMT-related genes are epigenetically regulated28–30. Similarly, methylation of the THY1 gene, which was highly expressed in Ca-4, decreased from Ca-3 to Ca-4 (Fig. 1e, f). THY1, a GPI-anchored protein involved in cell adhesion, migration, and polarity, can suppress lung metastasis when its integrin signaling is inhibited31,32. THY1, which regulates cancer cell migration and invasion, also appears epigenetically controlled. These molecular profiles indicate that malignant transformation from NC to Ca regions, along with DNA methylation differences within Ca regions, may underlie varying EMT phenotypes in tumor cells33,34.
WGS results further supported cancer progression from NC to Ca regions. We detected mutations in KMT2C (Y987H), the most frequently mutated histone methyltransferase in breast cancer35, across all regions (Fig. 1g). PIK3CA (H1047R), a known cancer driver, was mutated in all regions except NC-1. Mutations in transcription factor FOXA1 (R262P) and histone demethylase KDM6A (P281R) were specific to Ca regions (Fig. 1h). Copy number variation (CNV) analysis revealed chromosomal abnormalities in Ca regions (Fig. 1i). These findings suggest that tumor development occurred against a background of clonal mammary gland expansion36. Overall, multiregional analyses indicated that breast cancer follows a well-characterized molecular trajectory from NC to Ca regions. Importantly, these molecular features aligned closely with histological observations, especially regarding gene regulation.
Spatial analysis of breast cancer at single-cell resolution
To investigate the molecular mechanisms of metastasis in detail, we performed spatial gene expression analysis at single-cell resolution on primary tumors and paired axillary metastatic lymph nodes from the patient shown in Fig. 1. In addition to the primary tumor adjacent to the bulk analysis region and lymph node metastases from the same case, we included a tumor-draining lymph node (TdL) as a comparison specimen (Fig. 2a). The TdL was clinically diagnosed as metastasis-negative by rapid intraoperative evaluation.
We applied Xenium spatial gene expression profiling using a custom 380-gene breast cancer panel (refer to Supplementary Table 3 for the gene list). Across the primary tumor, metastatic lymph node, and TdL, Xenium detected 1,005,436 cells (Fig. 2b). Using gene expression signatures37–39 and histological location information, we annotated clusters by major cell types. The luminal cell cluster was subdivided into two clusters, (i) and (ii), with (i) further branching into four subclusters: L1: luminal 1; L2: luminal 2; B: basal; and M: mesenchymal-like (Fig. 2c).
We identified genes significantly upregulated within each cluster. KRT8 and KRT18, markers of mammary ductal epithelial cells, were enriched in L1 and L2, whereas KRT5 and KRT14, myoepithelial markers, were enriched in cluster B (Supplementary Fig. 2). Consistent with expression information, spatial localization of L1, L2, and B clusters matched the lobular architecture of the mammary gland (Fig. 2d). High ANKRD30A expression in L2 and elevated KIT and KRT23 expression in L1 aligned with two previously reported mature luminal cell types40: mature luminal cell 2 (KRT18+/ANKRD30A+) and mature luminal cell 1 (KRT18+/KRT23 + /KIT+). L1, L2, and B clustered on the lobules, whereas M cells were distributed sporadically throughout the tumor, suggesting reduced cell adhesion (Fig. 2e). Consistent with this, the M cluster showed increased expression of EMT markers, including THY1, an EMT-related gene shown in Fig. 1e. Collectively, these findings support a role for THY1 expression in EMT initiation from normal lobule cells (L1, L2, and B), potentially regulated through DNA methylation.
Although cluster (i) encompassed lobular and early EMT-stage cells, cluster (ii) localized pathologically to tumor regions and comprised definitive cancer cells. Strong ANKRD30A expression in cluster (ii) (Fig. 2f) suggested that L2 may be the tumor’s cell of origin40. Uniform manifold approximation and projection (UMAP) showed that metastatic lymph node tumor cells formed distinct clusters. Based on high ANKRD30A expression, we defined L2 as the tumor-originating cell and applied pseudotemporal trajectory analysis using Monocle 341. The inferred trajectory showed branching from L2 into (ii)-1, progressing to (ii)-2, and finally differentiating into metastatic tumor cells in lymph nodes (Fig. 2g). Taken together, these findings suggest a progression pathway in which cancer cells originate from L2, acquire EMT potential in the M cluster, and ultimately establish metastases within cluster (ii) in lymph nodes.
Spatial analysis of the lymph nodes reveals six transcriptionally distinct EMT states
When inspecting spatial gene expression data to further investigate the mechanisms underlying metastasis, we unexpectedly identified a small cluster of only 30 cells within the TdL, a site clinically diagnosed as metastasis-free; these cells spanned a region approximately 200 μm in diameter (Fig. 3a). Gene expression analysis revealed clear KRT19 expression, an epithelial marker absent in normal lymph node cells. Additional hematoxylin and eosin (HE) staining confirmed their morphology was consistent with cancer cells. The identification of these cells in the TdL provided a rare opportunity to examine molecular events at an extremely early stage of metastasis. This form of early dissemination is clinically referred to as isolated tumor cells (ITCs), a term we adopt hereafter.
When we visualized ITCs on the UMAP plane, we observed transcriptional heterogeneity (Fig. 3b, left panel), indicating that ITCs comprise several distinct cell types. To better characterize the biology of ITCs and the roles of different cell states in metastasis, we applied nonhierarchical clustering to subclassify tumor cells into six groups (C1–C6) and assigned each ITC to one of these clusters (Fig. 3b, right panel). Owing to the small number of ITCs detected, subsequent analyses focused on the clusters to which each ITC mapped. For each subpopulation associated with an ITC, we identified distinct signaling pathways linking histology, spatial location, and gene expression (Fig. 3c–e and Supplementary Table 4).
C6 occupied a position on the UMAP closest to EMT initiation (corresponding to the M cluster in Fig. 2c). Its scattered distribution within the primary tumor suggested low intercellular adhesion. C6 cells expressed high levels of ETM-related markers, such as VIM and CDH3, as well as stemness markers characteristic of breast cancer, including CD24low and CD44high (Fig. 3d). In contrast, epithelial marker expression was low, indicating a sustained mesenchymal phenotype. Gene enrichment analysis showed activation of EMT, stem cell signaling, TGF-β signaling, WNT signaling, immune checkpoint pathways, and antianoikis signaling (Fig. 3e). Signals from the microenvironment, such as TGF-β and Wnt ligands, induce various EMT-related transcription factors42,43. EMT also promotes PD-L1 expression, activating immune checkpoint pathways44,45. These findings suggest that C6 represents tumor cells in an extreme mesenchymal phenotype, capable of evading anoikis and immune clearance. In contrast, C4 most strongly retained epithelial features and was the most predominant cell population within lymph node metastases, forming the tumor mass (Fig. 3c). C4 cells showed enrichment of cholesterol homeostasis and fatty acid metabolism pathways, with upregulation of fatty acid metabolism–related molecules in metastatic lymph nodes (Fig. 3e, f). A previous mouse-based lymph node metastasis study demonstrated that a metabolic shift toward FAO in tumor cells is required for lymph node colonization by cancer cells8. Thus, C4 represents a cancer cell population that has acquired an extreme epithelial phenotype and metabolic reprogramming suited for metastatic growth.
Clusters C1, C2, C3, and C5 represented hybrid epithelial–mesenchymal (EM) cell populations that coexpressed epithelial and mesenchymal markers (Fig. 3d). The degree of coexpression differs among clusters, each cluster exhibited distinct features. C5 showed the strongest enrichment for the G2M checkpoint pathway (Fig. 3g). Prior studies have indicated that breast cancers with G2M activity display higher proliferative activity, increased MYC pathway activation, earlier metastasis, and worse survival46, suggesting that C5 may represent a more aggressive hybrid EMT subpopulation. C3 was enriched for VEGF signaling and glycolysis, likely reflecting a hypoxic environment in the tumor core that promotes angiogenesis and glycolysis47. C2, localized at the tumor periphery, i.e., the invasive front, displayed activation of matrix metalloproteinases (MMPs) and canonical Wnt signaling. Cells at tumor margins exhibit EM plasticity and migratory behavior48, with MMP signaling42 and the Wnt/β catenin pathway49 known to promote invasion. Canonical Wnt signaling was also enriched in C2 and hybrid EMT subpopulations (C3 and C6), consistent with reports that Wnt7A/B maintains the hybrid EM state12,48.
C1 showed activation of retinoid metabolism and lipid transport pathways. Retinoic acid activates retinoic acid receptors and retinoid X receptors, driving expression of fatty acid metabolism genes50. Considering that C1 gene expression aligns closely with C4 (the most significantly proliferating cluster in lymph node metastases, involving elevated fatty acid metabolism), we speculate that C1 may act as a precursor population to C4.
To validate these results, we added spatial transcriptome and protein expression analysis of lymph node metastases in different patients. As a result, expression of the FASN, which was the molecule shown in Fig. 3f was confirmed in metastatic lymph nodes in both transcriptome and protein expression(Supplementary fig. 3). We also performed multiplex fluorescent immunostaining of EMT-associated molecules in tumor cells. As a result, we identified tumor cells co-expressing both epithelial markers and mesenchymal markers simultaneously(Supplementary Fig. 4). This observation in a patient of breast cancer demonstrates that a genuine EMT continuum does indeed exist.
Collectively, these findings suggest that hybrid EM cells undergo functional adaptation within the primary tumor, with C6 as a possible EMT origin, C2 as an intermediate, C3 and C5 representing divergent aggressive states, and C1 transitioning toward the epithelialized, proliferative C4. The important thing is that each hybrid EMT subtype has different functional capabilities and that these cells are present in the primary tumor from the very early stages of metastasis (Fig. 3h).
Analysis of the tumor microenvironment in the metastasized lymph node
The major bottleneck for cancer cells infiltrating a lymph node is the new tumor microenvironment (TME)51. To address this, we compared interactions between tumor cells and their surrounding TME in the TdL and the metastatic lymph node (Fig. 4a). Notably, we detected distinct interaction patterns specific to each TME (Fig. 4b). One interaction detected uniquely in the TdL was the CD45–MRC1 ligand–receptor interaction, occurring between CD45 on T cells and MRC1 on macrophages (Fig. 4c). MRC1, an endocytosis receptor belonging to the C-type lectin family, is expressed on cells like dendritic cells, macrophages, and endothelial cells52. It has been shown to inhibit CD45 activity on T cells via direct interaction, leading to upregulation of the immune checkpoint protein CTLA4 and induction of antigen-specific T cell tolerance53–55. Therefore, immune escape may begin through this mechanism in the TdL.
Metastatic lymph nodes uniquely exhibited coinhibitory ligand–receptor interactions, such as VEGFA–KDR and CD86–CTLA4, consistent with the promotion of angiogenesis and the establishment of a T cell–exhausted environment already formed at this stage (Fig. 4d). Among interactions involving collagen, cancer-associated fibroblasts (CAFs) exhibited the highest number of interactions (Supplementary Fig. 5). Representative interactions in metastatic lymph nodes included COL1A1–CD44 and COL4A1–CD44, both known to promote tumor proliferation and progression56,57. In the chemokine signaling category, the CXCL16–CXCR6 interaction was particularly prominent in metastatic lymph nodes (Supplementary Fig. 5). High CXCL16 expression is associated with histological malignancy and is implicated in tumor progression and metastasis via activation of the CXCL16–CXCR6 axis58. These TME-derived signals appear to constitute necessary conditions for successful metastasis.
Transcriptional profiling of hybrid EMT and its association with patient outcomes
Prompted by the preceding analyses, we sought to determine whether the transcriptional profiles we identified are associated with patient prognosis in a broader clinical context. To this end, we reanalyzed a previously reported scRNA-seq dataset from metastatic lymph nodes of patients with breast cancer4 (Fig. 5a). Further clustering of cancer cells from the dataset revealed five clusters (sc8,sc4,sc2,sc15 and sc10) (Fig. 5b). According to publicly available information, sc2,sc4 and sc8 were metastatic lymph node-derived samples, and sc15, sc10 was derived from the primary tumor. A metabolic shift toward fatty acids was observed in metastatic lymph node-derived samples(sc2,sc4, and sc8) (Fig. 5c). These observations are consistent with our results shown in Fig. 3 and support our findings that a metabolic shift toward fatty acid metabolism is necessary for the formation of metastatic colonies in lymph nodes. To investigate whether molecular subtype signatures derived from our Xenium data are associated with clinical outcomes, we evaluated their prognostic relevance using the METABRIC dataset59. Interestingly, the subtypes associated with poor prognosis were not clusters that formed metastatic colonies in lymph node (e.g., C4) or those showing strong mesenchymal features (C6) but rather clusters C5 and C3, which exhibited signatures of G2M cell cycle progression and increased glycolytic activity (Supplementary Fig. 6).
To validate this finding, we mapped the hybrid EM clusters identified in our Xenium analysis (C1–C6) to the corresponding cancer cell subclusters (sc2, sc4, sc8, sc10, and sc15) in the public scRNA-seq dataset. This revealed a one-to-one correspondence between C5 and sc8 and between C6 and sc15 (Fig. 5d). Consistent with our findings, only sc8 (corresponding to the C5 signature in our Xenium data) was significantly associated with poor prognosis [log-rank P = 2e−09, hazard ratio (HR) = 1.527, 95% confidence interval (CI) = 1.329–1.755] (Fig. 5e). In contrast, sc15, which displayed the highest mesenchymal programming and corresponded to the C6 signature in our Xenium data, tended to more favorable prognosis (log-rank P = 0.01, HR = 0.836, 95% CI = 0.727–0.961, Supplementary Fig. 7).
Discussion
Discussion
In this study, we tracked the entire metastatic cascade of breast cancer at single-cell resolution and identified the relationship between the spatial evolution of tumor cells and dynamic heterogeneity within the EMT spectrum. Our capture of early-stage metastatic events, including exceptionally early dissemination to lymph nodes, provides a rare in vivo snapshot of EMT dynamics during initial colonization. Thus, we were able to uncover distinct subpopulations with specific biological roles and their potential clinical relevance.
Through detailed analysis of pioneering metastatic tumor cells within the metastatic cascade, we identified six distinct types of hybrid epithelial–mesenchymal (EM) cells that play key roles in the early stages of metastasis. The subpopulation that underwent a lipid metabolic shift successfully formed colonies in lymph nodes. This is consistent with previous studies indicating that FAO supports lymph node metastasis in breast cancer8. Interestingly, the subpopulation responsible for forming lymph node colonies that underwent a lipid metabolic shift, as well as the subpopulation with the most mesenchymal features, was not associated with poor clinical prognosis. This supports previous evidence that mesenchymal-like phenotypes may correlate with favorable outcomes in patients with breast cancer60. Instead, poor prognosis was linked to transcriptional programs associated with MYC and E2F signaling, along with increased aerobic glycolysis and G2/M cell cycle activity. In particular, MYC reprograms metabolism toward enhanced glycolytic flux61, and breast cancers with enriched G2M signaling have been associated with elevated MYC pathway activity, increased proliferation, early distant metastasis, and worse survival46. Collectively, these findings suggest that distinct EMT subpopulations drive different aspects of tumor progression: some mediate niche colonization, whereas others may drive aggressive metastasis and recurrence, implying the need for tailored therapeutic strategies11. A key question concerns how hybrid EM cells disseminate in lymph nodes: do metastases arise via the sequential model, where fully EMT-induced M cells colonize distant sites by MET, or via the cooperative model, in which mesenchymal cells support Epithelial cells, which serve as metastasis-initiating cells62? Both models are considered plausible based on prior studies. In the present study, the observation of multiple EMT states in ITCs at an extremely early metastatic stage (Fig. 3a), along with similar mixtures of hybrid EM cells in the primary tumor (Fig. 3h), supports the cooperative metastasis model, where different cell states collaborate to initiate metastasis.
We also examined the early interactions between metastatic tumor cells and the lymph node microenvironment. Comparisons between the TdL and metastatic lymph nodes showed that a coordinated, immunosuppressive environment forms around tumor cells in newly colonized niches. These environments are likely shaped, in part, by the metabolic phenotypes of the tumor cells themselves. For example, lactic acid, a byproduct of glycolysis, has been shown to promote alternative splicing in T cells and enhance CTLA-4 expression in a Foxp3-dependent manner63. The plasticity of EMT observed in this study may contribute to the ecosystem that supports metastasis outgrowth, including immunomodulatory effects64.
Despite the novel insights in this study, it has several limitations. First, our in-depth analysis was conducted on tissue from a single patient, limiting the generalizability of our conclusions. Second, owing to the limited number of cells available, we were unable to perform molecular or biochemical validations. However, by integrating our findings with prior knowledge and publicly available single-cell RNA-seq datasets, we were able to infer the likely roles of key subpopulations and validate their clinical relevance. On the other hand, it is important to note that prognostic validation is indirect. In this analysis, we attempted to correlate the observed gene expression modules with the patient outcomes.
We have to carefully evaluate the confounding factor when the detected prognostic signatures of the tumor cells at the single cell level should be applied to the signature of the bulk tumor tissues. Neoplastic heterogeneity as well as the non-malignant cell types, are represented in the latter profiles. Nevertheless, we believe this analysis has an important meaning, since the single-cell analysis at this scale is not easy. Finally, we could not definitively distinguish whether the observed metastasis occurred via hematogenous or lymphatic dissemination. Future studies should also aim to elucidate whether distinct molecular features exist between these metastatic pathways.
In this study, we tracked the entire metastatic cascade of breast cancer at single-cell resolution and identified the relationship between the spatial evolution of tumor cells and dynamic heterogeneity within the EMT spectrum. Our capture of early-stage metastatic events, including exceptionally early dissemination to lymph nodes, provides a rare in vivo snapshot of EMT dynamics during initial colonization. Thus, we were able to uncover distinct subpopulations with specific biological roles and their potential clinical relevance.
Through detailed analysis of pioneering metastatic tumor cells within the metastatic cascade, we identified six distinct types of hybrid epithelial–mesenchymal (EM) cells that play key roles in the early stages of metastasis. The subpopulation that underwent a lipid metabolic shift successfully formed colonies in lymph nodes. This is consistent with previous studies indicating that FAO supports lymph node metastasis in breast cancer8. Interestingly, the subpopulation responsible for forming lymph node colonies that underwent a lipid metabolic shift, as well as the subpopulation with the most mesenchymal features, was not associated with poor clinical prognosis. This supports previous evidence that mesenchymal-like phenotypes may correlate with favorable outcomes in patients with breast cancer60. Instead, poor prognosis was linked to transcriptional programs associated with MYC and E2F signaling, along with increased aerobic glycolysis and G2/M cell cycle activity. In particular, MYC reprograms metabolism toward enhanced glycolytic flux61, and breast cancers with enriched G2M signaling have been associated with elevated MYC pathway activity, increased proliferation, early distant metastasis, and worse survival46. Collectively, these findings suggest that distinct EMT subpopulations drive different aspects of tumor progression: some mediate niche colonization, whereas others may drive aggressive metastasis and recurrence, implying the need for tailored therapeutic strategies11. A key question concerns how hybrid EM cells disseminate in lymph nodes: do metastases arise via the sequential model, where fully EMT-induced M cells colonize distant sites by MET, or via the cooperative model, in which mesenchymal cells support Epithelial cells, which serve as metastasis-initiating cells62? Both models are considered plausible based on prior studies. In the present study, the observation of multiple EMT states in ITCs at an extremely early metastatic stage (Fig. 3a), along with similar mixtures of hybrid EM cells in the primary tumor (Fig. 3h), supports the cooperative metastasis model, where different cell states collaborate to initiate metastasis.
We also examined the early interactions between metastatic tumor cells and the lymph node microenvironment. Comparisons between the TdL and metastatic lymph nodes showed that a coordinated, immunosuppressive environment forms around tumor cells in newly colonized niches. These environments are likely shaped, in part, by the metabolic phenotypes of the tumor cells themselves. For example, lactic acid, a byproduct of glycolysis, has been shown to promote alternative splicing in T cells and enhance CTLA-4 expression in a Foxp3-dependent manner63. The plasticity of EMT observed in this study may contribute to the ecosystem that supports metastasis outgrowth, including immunomodulatory effects64.
Despite the novel insights in this study, it has several limitations. First, our in-depth analysis was conducted on tissue from a single patient, limiting the generalizability of our conclusions. Second, owing to the limited number of cells available, we were unable to perform molecular or biochemical validations. However, by integrating our findings with prior knowledge and publicly available single-cell RNA-seq datasets, we were able to infer the likely roles of key subpopulations and validate their clinical relevance. On the other hand, it is important to note that prognostic validation is indirect. In this analysis, we attempted to correlate the observed gene expression modules with the patient outcomes.
We have to carefully evaluate the confounding factor when the detected prognostic signatures of the tumor cells at the single cell level should be applied to the signature of the bulk tumor tissues. Neoplastic heterogeneity as well as the non-malignant cell types, are represented in the latter profiles. Nevertheless, we believe this analysis has an important meaning, since the single-cell analysis at this scale is not easy. Finally, we could not definitively distinguish whether the observed metastasis occurred via hematogenous or lymphatic dissemination. Future studies should also aim to elucidate whether distinct molecular features exist between these metastatic pathways.
Methods
Methods
Clinical sample
The analysis of tumor samples was performed in accordance with relevant national laws and recognized ethical guidelines (Declaration of Helsinki) for the protection of people participating in biomedical research. This study was approved by the Clinical Ethics Committee of St. Marianna University School of Medicine (approval number: 2297-i103). Informed consent was obtained from the patient. The surgical specimen was from an 80-year-old female patient who received surgery for breast cancer (BRC-26). This study is based on samples taken from surgical residues that were available after histopathological analyses and were not required for diagnosis. There is no interference with clinical practice. The patient had not received neoadjuvant or adjuvant chemotherapy (detailed clinicopathologic findings are provided in Supplementary Table 1). The same specimen was used in our previous study65. For multiregional, multi-omics analyses, fresh frozen samples of the primary tumor and adjacent normal tissue were used. Two nontumor regions (NC1 and NC2) and two tumor regions (Ca3, a solid tumor region, and Ca4, a sparse tumor region), as well as normal breast tissue from the same specimen (used as the control), were microdissected using the AVENIO MilliSect system (Roche, Pleasanton, CA) and subjected to multi-omics analysis, including RNA-seq, WGS, and EM-seq. For spatial analyses via Xenium, formalin-fixed paraffin-embedded (FFPE) samples of the primary tumor, lymph node metastases, and TdL were used.
RNA-seq
RNA-seq was conducted as previously described65. Briefly, total RNA was extracted from frozen normal breast tissue using the RNeasy Micro Kit (Qiagen). RNA-seq libraries were prepared using the SMART-Seq Stranded Kit (Takara Bio). Paired-end 150-bp sequencing was performed on the NovaSeq 6000 system (Illumina).
RNA-seq data analysis
RNA-seq data from the normal tissue obtained in this study and four tumor regions from our previous study65 were analyzed. Adapter trimming was performed using fastp (v0.23.2)66. Reads mapping to rRNA were removed using Bowtie 2 (v2.3.4.3)67. The retained reads were aligned to the human reference genome GRCh38.p12 using STAR (v2.7.5c)68. Gene-level read counts were obtained using featureCounts (v2.0.2)69, and RPKM values were calculated. DEGs between histological classes were identified using DESeq2 (v1.42.1)70 applying the Wald test with an adjusted P-value cutoff of <0.1. Gene enrichment analysis was performed using Metascape71.
Preparation of EM-seq and WGS libraries
EM-seq and WGS library preparations were conducted as described previously72. Briefly, gDNA was extracted from normal tissue using NucleoSpin Tissue XS (MACHEREY-NAGEL), and 100 ng of this gDNA was fragmented using the M220 Focused-ultrasonicator (Covaris). Adapter ligation was conducted using the NEBNext Enzymatic Methyl-seq Kit (New England BioLabs). Half of the adapter-ligated DNA was used for WGS library preparation (five cycles of PCR amplification), and the other half was employed for EM-seq library preparation, which included TET oxidation, APOBEC conversion, and PCR amplification (six cycles). Libraries were sequenced as paired-end 150-bp reads on the NovaSeq 6000 system (Illumina).
EM-seq data analysis
EM-seq data from the normal tissue obtained in this study and four tumor regions from our previous study were analyzed following established protocols65. Briefly, adapter trimming was conducted using Trim Galore (v0.6.4_dev; https://github.com/FelixKrueger/TrimGalore). Trimmed reads were aligned to the human reference genome using Bismark (v0.22.1)73. Duplicate reads were removed using deduplicate_bismark, and methylation information in a CpG context was extracted via bismark_methylation_extractor. Genome-wide methylation profiles, including patterns in CpG islands, CpG shores, and promoter regions, were visualized as described previously74. Differentially methylated regions (DMRs) among NC-1, NC-2, Ca-3, and Ca-4 and the broader NC and Ca regions were identified using metilene v0.2-875, as per our previous study65. Promotor methylation was analyzed among NC-1, NC-2, Ca-3, and Ca-4 spanning 1 kb upstream to 500 bp downstream of transcription start sites, focusing on DMRs with methylation differences >10%. Visualization was performed using IGV76.
Point mutation detection
WGS reads were aligned to the human reference genome using BWA-MEM (v0.7.17)77 with default settings. Duplicate reads were marked using Picard MarkDuplicates (v2.23.8) (https://broadinstitute.github.io/picard/). Somatic mutations were called using GATK Mutect2 and filtered with FilterMutectCalls (v4.1.3.0)78. Variants were annotated via ANNOVAR79.
CNV detection
CNVs in NC-1, NC-2, Ca-3, Ca-4, and normal samples were detected using FACETS (v0.6.2), as described previously74. Copy number gains and losses were defined as ≥4 copies and ≤1 copy, respectively.
Spatial heatmaps of gene expression levels and average methylation rates
Spatial heatmaps of gene expression levels and average methylation rates were generated using a custom workflow implemented on Nikon’s pilot analysis platform. Specifically, microscopic images (Olympus BX53, 2× objective) with substantial overlap were aligned using similarity transformation based on scale-invariant feature transform–detected feature point pairs. This alignment produced a composite image of the entire specimen with a common coordinate system. Manually dissected region contours were aligned to this coordinate system. Regions of interest were color-coded by gene expression levels or average methylation rates. The average methylation rate was calculated as the mean frequency of methylation at CpG sites covered by at least five reads within the predefined promoter regions.
In situ gene expression analysis via Xenium
Spatial subclonal analysis at single-cell resolution was performed using the Xenium Slides & Sample Prep Reagents (10× Genomics), a predesigned human breast panel, and a custom panel, as previously described80. Briefly, FFPE tissue sections (5 µm thickness) of the primary tumor, lymph node metastases, and TdL were mounted onto Xenium slides. Following deparaffinization and decrosslinking, probe hybridization, probe circularization, and rolling circle amplification were performed. Detection of amplified probes was conducted using the Xenium Analyzer (10× Genomics). In total, 380 target genes, comprising 280 from the predesigned panel and 100 from the custom panel, are listed in Supplementary Table 1.
Computational processing of Xenium in situ expression data and analysis
Raw output files from the Xenium Analyzer were processed using Seurat (v5.0.2)81. Data normalization was performed using the SCTransform method. Clustering analysis and UMAP visualization were conducted using the first 30 principal components, with the former performed via the FindNeighbors and FindClustering functions. The same parameters were applied to subclustering analyses (Figs. 2c and 3). For each cluster relevant to downstream analysis, DEGs were identified using the FindMarkers() function in Seurat with the default parameters (only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25). Cluster annotation was based on previously published gene expression signatures37–39 and spatial histological context.
Trajectory analysis
Trajectory and pseudotime analysis of luminal cells was performed using Monocle 3 (v1.3.4)41. Preprocessing was performed using the preprocess_cds function, with the number of dimensions set to 100. Dimensionality reduction and clustering were implemented using the reduce_dimension and cluster_cells functions, respectively. Principal graph learning was conducted with the learn_graph function. Based on spatial and Xenium data, mature luminal cell 2 was designated as the trajectory root. Pseudotime ordering was visualized using the plot_cells function.
Identifying enriched gene signatures in Xenium clusters
To investigate the functional characteristics of Xenium clusters (C1–C6 in Fig. 3e), pathway and gene set enrichment analyses were performed using Metascape71 based on the top 50 DEGs per cluster (Supplementary Table 5). Enrichment was assessed across GO categories, including Biological Process, Cellular Component, and Molecular Function, as well as other biological pathways.
Analysis of intercellular communication networks
Intercellular communication within metastatic lymph nodes and the TdL was analyzed using the CellChat package (v1.6.1)82, which quantitatively identifies cell–cell interactions and communications. Following the official pipeline (https://github.com/jinworks/CellChat), analyses were performed using default settings. Statistically significant ligand–receptor interactions (P < 0.05) were extracted. Visualization of cell–cell interactions was achieved using circle plots, bar plots, heatmaps, and violin plots.
Multiplexed immunostaining by PhenoCycler
Multiplexed immunostaining was performed using the PhenoCycler system (Akoya Biosciences) according to the manufacturer’s instructions. Briefly, FFPE tissue sections were prepared at 5-µm thickness adjacent to those for Visium analysis and mounted onto the coverslip. The section was deparaffinized, and antigen activation was performed using pressure cooker for 20 min. Then, tissue sections were stained with 11 antibodies for 3 h (Supplementary Table 8). The section was washed, and the antibodies were fixed. Imaging analysis was conducted using the PhenoCycler instrument (Akoya Biosciences) andthe BZ-X810 fluorescence microscope (Keyence).
Computational processing of PhenoCycler multiplexed immunostaining data
Data processing was performed using the CODEX Processor (version 1.8). Visualization was performed using the obtained QPTIFF file by QuPath (version 0.3.2)83. Cell segmentation was performed using StarDist (QuPath StarDist extension, version 0.3.2)84 on the QuPath software. Pixels in the QPTIFF images encoded as 8-bit integers (0–255) were used as expression levels of each protein.
Published dataset processing
We downloaded a publicly available single-cell RNA-seq dataset (GSE180286 from Guan et al. 4). Data integration and clustering were performed in Seurat (v5.0.2)81, focusing on HER2-positive breast tumors and two lymph node samples (GSM5457205, GSM5457206, and GSM5457207). Louvain clustering at a resolution of 0.1 yielded nine clusters: one T cell cluster, three B cell clusters, one NK cell cluster, one CAF1 cluster, and one macrophage cluster. Clusters of the same cell type were merged (Fig. 5a). Subclustering was performed on the cancer cell populations (Fig. 5b).
Clinical validation using METABRIC data
Clinical validation was conducted using the METABRIC dataset59. METABRIC gene transcriptome data, as well as clinical and sample-level metadata, were downloaded from cBioPortal (https://www.cbioportal.org). Analysis was performed on 1,980 breast cancer cases with available survival outcomes. For each cancer cluster defined in Fig. 5c, the top 100 DEGs (Supplementary Table 6) were used to score individual METABRIC samples. ROC curve analysis was used to define optimal cutoffs for stratifying patients into high and low score groups. Disease-free survival was analyzed using the survival (v3.5-7) and survminer (v0.4.9) R packages. HRs and 95% CIs were calculated using Cox proportional hazards models. Statistical significance was evaluated via the log-rank test.
Statistical analysis
Statistical methods and tests are detailed in the figure legends. All analyses were performed using R (v4.3.2) or Python.
Clinical sample
The analysis of tumor samples was performed in accordance with relevant national laws and recognized ethical guidelines (Declaration of Helsinki) for the protection of people participating in biomedical research. This study was approved by the Clinical Ethics Committee of St. Marianna University School of Medicine (approval number: 2297-i103). Informed consent was obtained from the patient. The surgical specimen was from an 80-year-old female patient who received surgery for breast cancer (BRC-26). This study is based on samples taken from surgical residues that were available after histopathological analyses and were not required for diagnosis. There is no interference with clinical practice. The patient had not received neoadjuvant or adjuvant chemotherapy (detailed clinicopathologic findings are provided in Supplementary Table 1). The same specimen was used in our previous study65. For multiregional, multi-omics analyses, fresh frozen samples of the primary tumor and adjacent normal tissue were used. Two nontumor regions (NC1 and NC2) and two tumor regions (Ca3, a solid tumor region, and Ca4, a sparse tumor region), as well as normal breast tissue from the same specimen (used as the control), were microdissected using the AVENIO MilliSect system (Roche, Pleasanton, CA) and subjected to multi-omics analysis, including RNA-seq, WGS, and EM-seq. For spatial analyses via Xenium, formalin-fixed paraffin-embedded (FFPE) samples of the primary tumor, lymph node metastases, and TdL were used.
RNA-seq
RNA-seq was conducted as previously described65. Briefly, total RNA was extracted from frozen normal breast tissue using the RNeasy Micro Kit (Qiagen). RNA-seq libraries were prepared using the SMART-Seq Stranded Kit (Takara Bio). Paired-end 150-bp sequencing was performed on the NovaSeq 6000 system (Illumina).
RNA-seq data analysis
RNA-seq data from the normal tissue obtained in this study and four tumor regions from our previous study65 were analyzed. Adapter trimming was performed using fastp (v0.23.2)66. Reads mapping to rRNA were removed using Bowtie 2 (v2.3.4.3)67. The retained reads were aligned to the human reference genome GRCh38.p12 using STAR (v2.7.5c)68. Gene-level read counts were obtained using featureCounts (v2.0.2)69, and RPKM values were calculated. DEGs between histological classes were identified using DESeq2 (v1.42.1)70 applying the Wald test with an adjusted P-value cutoff of <0.1. Gene enrichment analysis was performed using Metascape71.
Preparation of EM-seq and WGS libraries
EM-seq and WGS library preparations were conducted as described previously72. Briefly, gDNA was extracted from normal tissue using NucleoSpin Tissue XS (MACHEREY-NAGEL), and 100 ng of this gDNA was fragmented using the M220 Focused-ultrasonicator (Covaris). Adapter ligation was conducted using the NEBNext Enzymatic Methyl-seq Kit (New England BioLabs). Half of the adapter-ligated DNA was used for WGS library preparation (five cycles of PCR amplification), and the other half was employed for EM-seq library preparation, which included TET oxidation, APOBEC conversion, and PCR amplification (six cycles). Libraries were sequenced as paired-end 150-bp reads on the NovaSeq 6000 system (Illumina).
EM-seq data analysis
EM-seq data from the normal tissue obtained in this study and four tumor regions from our previous study were analyzed following established protocols65. Briefly, adapter trimming was conducted using Trim Galore (v0.6.4_dev; https://github.com/FelixKrueger/TrimGalore). Trimmed reads were aligned to the human reference genome using Bismark (v0.22.1)73. Duplicate reads were removed using deduplicate_bismark, and methylation information in a CpG context was extracted via bismark_methylation_extractor. Genome-wide methylation profiles, including patterns in CpG islands, CpG shores, and promoter regions, were visualized as described previously74. Differentially methylated regions (DMRs) among NC-1, NC-2, Ca-3, and Ca-4 and the broader NC and Ca regions were identified using metilene v0.2-875, as per our previous study65. Promotor methylation was analyzed among NC-1, NC-2, Ca-3, and Ca-4 spanning 1 kb upstream to 500 bp downstream of transcription start sites, focusing on DMRs with methylation differences >10%. Visualization was performed using IGV76.
Point mutation detection
WGS reads were aligned to the human reference genome using BWA-MEM (v0.7.17)77 with default settings. Duplicate reads were marked using Picard MarkDuplicates (v2.23.8) (https://broadinstitute.github.io/picard/). Somatic mutations were called using GATK Mutect2 and filtered with FilterMutectCalls (v4.1.3.0)78. Variants were annotated via ANNOVAR79.
CNV detection
CNVs in NC-1, NC-2, Ca-3, Ca-4, and normal samples were detected using FACETS (v0.6.2), as described previously74. Copy number gains and losses were defined as ≥4 copies and ≤1 copy, respectively.
Spatial heatmaps of gene expression levels and average methylation rates
Spatial heatmaps of gene expression levels and average methylation rates were generated using a custom workflow implemented on Nikon’s pilot analysis platform. Specifically, microscopic images (Olympus BX53, 2× objective) with substantial overlap were aligned using similarity transformation based on scale-invariant feature transform–detected feature point pairs. This alignment produced a composite image of the entire specimen with a common coordinate system. Manually dissected region contours were aligned to this coordinate system. Regions of interest were color-coded by gene expression levels or average methylation rates. The average methylation rate was calculated as the mean frequency of methylation at CpG sites covered by at least five reads within the predefined promoter regions.
In situ gene expression analysis via Xenium
Spatial subclonal analysis at single-cell resolution was performed using the Xenium Slides & Sample Prep Reagents (10× Genomics), a predesigned human breast panel, and a custom panel, as previously described80. Briefly, FFPE tissue sections (5 µm thickness) of the primary tumor, lymph node metastases, and TdL were mounted onto Xenium slides. Following deparaffinization and decrosslinking, probe hybridization, probe circularization, and rolling circle amplification were performed. Detection of amplified probes was conducted using the Xenium Analyzer (10× Genomics). In total, 380 target genes, comprising 280 from the predesigned panel and 100 from the custom panel, are listed in Supplementary Table 1.
Computational processing of Xenium in situ expression data and analysis
Raw output files from the Xenium Analyzer were processed using Seurat (v5.0.2)81. Data normalization was performed using the SCTransform method. Clustering analysis and UMAP visualization were conducted using the first 30 principal components, with the former performed via the FindNeighbors and FindClustering functions. The same parameters were applied to subclustering analyses (Figs. 2c and 3). For each cluster relevant to downstream analysis, DEGs were identified using the FindMarkers() function in Seurat with the default parameters (only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25). Cluster annotation was based on previously published gene expression signatures37–39 and spatial histological context.
Trajectory analysis
Trajectory and pseudotime analysis of luminal cells was performed using Monocle 3 (v1.3.4)41. Preprocessing was performed using the preprocess_cds function, with the number of dimensions set to 100. Dimensionality reduction and clustering were implemented using the reduce_dimension and cluster_cells functions, respectively. Principal graph learning was conducted with the learn_graph function. Based on spatial and Xenium data, mature luminal cell 2 was designated as the trajectory root. Pseudotime ordering was visualized using the plot_cells function.
Identifying enriched gene signatures in Xenium clusters
To investigate the functional characteristics of Xenium clusters (C1–C6 in Fig. 3e), pathway and gene set enrichment analyses were performed using Metascape71 based on the top 50 DEGs per cluster (Supplementary Table 5). Enrichment was assessed across GO categories, including Biological Process, Cellular Component, and Molecular Function, as well as other biological pathways.
Analysis of intercellular communication networks
Intercellular communication within metastatic lymph nodes and the TdL was analyzed using the CellChat package (v1.6.1)82, which quantitatively identifies cell–cell interactions and communications. Following the official pipeline (https://github.com/jinworks/CellChat), analyses were performed using default settings. Statistically significant ligand–receptor interactions (P < 0.05) were extracted. Visualization of cell–cell interactions was achieved using circle plots, bar plots, heatmaps, and violin plots.
Multiplexed immunostaining by PhenoCycler
Multiplexed immunostaining was performed using the PhenoCycler system (Akoya Biosciences) according to the manufacturer’s instructions. Briefly, FFPE tissue sections were prepared at 5-µm thickness adjacent to those for Visium analysis and mounted onto the coverslip. The section was deparaffinized, and antigen activation was performed using pressure cooker for 20 min. Then, tissue sections were stained with 11 antibodies for 3 h (Supplementary Table 8). The section was washed, and the antibodies were fixed. Imaging analysis was conducted using the PhenoCycler instrument (Akoya Biosciences) andthe BZ-X810 fluorescence microscope (Keyence).
Computational processing of PhenoCycler multiplexed immunostaining data
Data processing was performed using the CODEX Processor (version 1.8). Visualization was performed using the obtained QPTIFF file by QuPath (version 0.3.2)83. Cell segmentation was performed using StarDist (QuPath StarDist extension, version 0.3.2)84 on the QuPath software. Pixels in the QPTIFF images encoded as 8-bit integers (0–255) were used as expression levels of each protein.
Published dataset processing
We downloaded a publicly available single-cell RNA-seq dataset (GSE180286 from Guan et al. 4). Data integration and clustering were performed in Seurat (v5.0.2)81, focusing on HER2-positive breast tumors and two lymph node samples (GSM5457205, GSM5457206, and GSM5457207). Louvain clustering at a resolution of 0.1 yielded nine clusters: one T cell cluster, three B cell clusters, one NK cell cluster, one CAF1 cluster, and one macrophage cluster. Clusters of the same cell type were merged (Fig. 5a). Subclustering was performed on the cancer cell populations (Fig. 5b).
Clinical validation using METABRIC data
Clinical validation was conducted using the METABRIC dataset59. METABRIC gene transcriptome data, as well as clinical and sample-level metadata, were downloaded from cBioPortal (https://www.cbioportal.org). Analysis was performed on 1,980 breast cancer cases with available survival outcomes. For each cancer cluster defined in Fig. 5c, the top 100 DEGs (Supplementary Table 6) were used to score individual METABRIC samples. ROC curve analysis was used to define optimal cutoffs for stratifying patients into high and low score groups. Disease-free survival was analyzed using the survival (v3.5-7) and survminer (v0.4.9) R packages. HRs and 95% CIs were calculated using Cox proportional hazards models. Statistical significance was evaluated via the log-rank test.
Statistical analysis
Statistical methods and tests are detailed in the figure legends. All analyses were performed using R (v4.3.2) or Python.
Supplementary information
Supplementary information
출처: PubMed Central (JATS). 라이선스는 원 publisher 정책을 따릅니다 — 인용 시 원문을 표기해 주세요.