SpNeigh: spatial neighborhood and differential expression analysis for high-resolution spatial transcriptomics.
3/5 보강
OpenAlex 토픽 ·
Single-cell and spatial transcriptomics
Gene expression and cancer classification
Bioinformatics and Genomic Networks
Spatial transcriptomics technologies such as Xenium, MERFISH, and Visium HD enable high-resolution profiling of gene expression while preserving tissue architecture.
APA
Jinming Cheng, Pierce K. H. Chow, Nan Liu (2026). SpNeigh: spatial neighborhood and differential expression analysis for high-resolution spatial transcriptomics.. NAR genomics and bioinformatics, 8(2), lqag039. https://doi.org/10.1093/nargab/lqag039
MLA
Jinming Cheng, et al.. "SpNeigh: spatial neighborhood and differential expression analysis for high-resolution spatial transcriptomics.." NAR genomics and bioinformatics, vol. 8, no. 2, 2026, pp. lqag039.
PMID
41972009 ↗
Abstract 한글 요약
Spatial transcriptomics technologies such as Xenium, MERFISH, and Visium HD enable high-resolution profiling of gene expression while preserving tissue architecture. However, most computational methods for spatial analysis do not explicitly model local tissue context, such as boundaries, neighborhoods, or gradients. Here, we present SpNeigh (https://github.com/jinming-cheng/SpNeigh/), an R package for spatial neighborhood analysis and spatially aware differential expression modeling. SpNeigh includes tools for boundary detection, spatial neighborhood extraction, distance-based weighting, and gradient-based statistical testing. It supports both region-based differential expression and smooth spatial modeling using spline-based regression, along with a spatial enrichment index that identifies genes enriched near defined spatial features. We demonstrate the utility of SpNeigh across multiple platforms and tissues, including mouse brain, human breast cancer, and human liver, revealing intermediate populations at tissue interfaces, immune microenvironment differences, and spatially zonated gene expression patterns. SpNeigh offers a flexible and interpretable framework for dissecting spatial gene expression dynamics in complex tissues.
🏷️ 키워드 / MeSH 📖 같은 키워드 OA만
같은 제1저자의 인용 많은 논문 (5)
- Real-World Survival Outcomes of Patients With High PD-L1 Advanced NSCLC Who Received Chemoimmunotherapy Versus Immunotherapy.
- Hypoxia-induced acidic microenvironment alters cellular pharmacokinetics of gastric cancer in a Mongolian population.
- [Biological characteristics of p63 and the strategies and pitfalls in the application of pathological diagnosis of breast diseases].
- USP20, a Super-enhancer Regulated Gene, Promotes Acute Myeloid Leukemia Progression through CTNNB1 Deubiquitination.
- Association of different autoimmune diseases with gastric cancer incidence and risk: a global population-based analysis.
📖 전문 본문 읽기 PMC JATS · ~96 KB · 영문
Introduction
Introduction
Spatial transcriptomics technologies such as 10× Genomics Xenium [1], Vizgen MERFISH (MERSCOPE) [2], and 10× Visium HD [3] enable high-throughput measurement of spatially resolved gene expression across hundreds to thousands of genes, preserving spatial context within tissues. These platforms provide rich opportunities to investigate cellular organization, tissue structure, and microenvironmental interactions at unprecedented resolution. To analyze such data, a growing number of computational frameworks have been developed, including general-purpose toolkits such as Giotto [4] and Squidpy [5], which support spatial clustering, neighborhood graph construction, visualization, and exploratory analysis.
While these frameworks offer broad analytical capabilities, many spatial transcriptomics analyses ultimately rely on differential expression (DE) to detect transcriptional differences across spatial contexts. Existing DE methods for spatial data, such as SpatialDE [6] and SPARK [7], are primarily designed for the global detection of spatially variable genes (SVGs) using Gaussian process regression or spatial autocorrelation. However, they do not explicitly model local spatial contexts—such as boundaries between regions or proximity to specific tissue structures—which are critical for understanding tissue interfaces, transitional zones, and niche-specific gene regulation.
To address this gap, we developed SpNeigh, an R package that provides a spatially informed neighborhood analysis framework tailored for high-resolution spatial transcriptomics data. SpNeigh introduces a novel design that leverages boundary-aware and centroid-aware spatial weights to quantify local context, enabling downstream analyses such as differential expression between boundary-adjacent and interior cells, gradient-based modeling of gene expression along spatial gradients, and spatial enrichment scoring. The package integrates multiple components—including spatial boundary detection, neighborhood interaction analysis, and spatial weight computation—into a unified and reproducible workflow, with built-in visualization and statistical testing functionality.
Building on this framework, SpNeigh introduces a novel analytical approach that explicitly models spatial neighborhoods and structural boundaries. Unlike existing methods that treat spatial variation in a global or continuous manner, SpNeigh enables directional, region-aware comparisons by incorporating spatial geometry into gene expression modeling. This allows for more interpretable analyses of tissue organization and cellular interactions at boundaries and interfaces.
SpNeigh supports data from multiple spatial transcriptomics platforms, including Xenium, MERFISH, and Visium HD, and can be flexibly applied across a range of tissues and biological contexts. By modeling local spatial neighborhoods and continuous gradients, SpNeigh complements existing tools and provides a unified framework for dissecting region-specific transcriptional variation.
We demonstrate the utility of SpNeigh across diverse datasets, including mouse brain, human breast cancer, and healthy human liver. In each case, SpNeigh reveals spatially enriched genes and local transcriptional gradients that are not easily captured by global or unsupervised methods. Together, these results highlight the value of neighborhood-aware spatial modeling and establish SpNeigh as a versatile toolkit for spatial expression analysis.
Spatial transcriptomics technologies such as 10× Genomics Xenium [1], Vizgen MERFISH (MERSCOPE) [2], and 10× Visium HD [3] enable high-throughput measurement of spatially resolved gene expression across hundreds to thousands of genes, preserving spatial context within tissues. These platforms provide rich opportunities to investigate cellular organization, tissue structure, and microenvironmental interactions at unprecedented resolution. To analyze such data, a growing number of computational frameworks have been developed, including general-purpose toolkits such as Giotto [4] and Squidpy [5], which support spatial clustering, neighborhood graph construction, visualization, and exploratory analysis.
While these frameworks offer broad analytical capabilities, many spatial transcriptomics analyses ultimately rely on differential expression (DE) to detect transcriptional differences across spatial contexts. Existing DE methods for spatial data, such as SpatialDE [6] and SPARK [7], are primarily designed for the global detection of spatially variable genes (SVGs) using Gaussian process regression or spatial autocorrelation. However, they do not explicitly model local spatial contexts—such as boundaries between regions or proximity to specific tissue structures—which are critical for understanding tissue interfaces, transitional zones, and niche-specific gene regulation.
To address this gap, we developed SpNeigh, an R package that provides a spatially informed neighborhood analysis framework tailored for high-resolution spatial transcriptomics data. SpNeigh introduces a novel design that leverages boundary-aware and centroid-aware spatial weights to quantify local context, enabling downstream analyses such as differential expression between boundary-adjacent and interior cells, gradient-based modeling of gene expression along spatial gradients, and spatial enrichment scoring. The package integrates multiple components—including spatial boundary detection, neighborhood interaction analysis, and spatial weight computation—into a unified and reproducible workflow, with built-in visualization and statistical testing functionality.
Building on this framework, SpNeigh introduces a novel analytical approach that explicitly models spatial neighborhoods and structural boundaries. Unlike existing methods that treat spatial variation in a global or continuous manner, SpNeigh enables directional, region-aware comparisons by incorporating spatial geometry into gene expression modeling. This allows for more interpretable analyses of tissue organization and cellular interactions at boundaries and interfaces.
SpNeigh supports data from multiple spatial transcriptomics platforms, including Xenium, MERFISH, and Visium HD, and can be flexibly applied across a range of tissues and biological contexts. By modeling local spatial neighborhoods and continuous gradients, SpNeigh complements existing tools and provides a unified framework for dissecting region-specific transcriptional variation.
We demonstrate the utility of SpNeigh across diverse datasets, including mouse brain, human breast cancer, and healthy human liver. In each case, SpNeigh reveals spatially enriched genes and local transcriptional gradients that are not easily captured by global or unsupervised methods. Together, these results highlight the value of neighborhood-aware spatial modeling and establish SpNeigh as a versatile toolkit for spatial expression analysis.
Materials and methods
Materials and methods
SpNeigh framework for spatial neighborhood modeling
Spatial boundary detection and outlier removal
To identify the spatial boundaries of a cell population or cluster, we first remove spatial outliers based on local density. Let denote the spatial coordinate of cell . For each cell, we compute the average Euclidean distance to its nearest neighbors using the -nearest neighbor (k-NN) algorithm:
where denotes the -th nearest neighbor of cell , and is the mean k-NN distance. Cells with greater than a user-defined threshold are considered spatial outliers and removed.
After filtering, spatial boundaries are identified using the concave hull algorithm [8], which estimates a smoothed outer boundary encompassing each connected spatial region. If a cluster contains multiple disjoint regions, subregions can be detected using either DBSCAN [9] (density-based clustering) or -means. Each subregion is then processed independently to construct its boundary polygon.
The resulting boundaries are represented as a set of spatial polygons or LINESTRINGs, which are used for computing spatial weights, enrichment indices, or visualization.
To remove spatial outliers prior to boundary detection, we used a default of 5 for the k-NN search. This choice aims to capture immediate local neighborhoods while minimizing the influence of spurious or isolated cells. The optimal may vary depending on dataset resolution, cell density, and tissue architecture. We recommend adjusting this parameter based on the spatial scale and expected continuity of the cell populations.
Spatial neighborhood interaction matrix
To characterize spatial relationships between distinct cell clusters, we compute a neighborhood interaction matrix based on k-NN counts. For each cell, we identify its nearest spatial neighbors using Euclidean distance and record their cluster identities.
Let denote the set of all clusters, and define as the set of all cells belonging to cluster . Let be the set of the nearest neighbors of cell , and let denote the cluster identity of neighbor cell .
We define the spatial interaction matrix such that:
where is the indicator function, equal to 1 when the condition is true and 0 otherwise.
This matrix quantifies how often cells from cluster are spatially adjacent to cells from cluster in their local neighborhood. It can be visualized directly or row-normalized (e.g. Z-scored) to highlight enriched spatial associations.
For constructing the spatial neighborhood interaction matrix, we used a default of 10 for defining each cell’s nearest neighbors. This setting provides a balance between capturing sufficient local context and avoiding overly diffuse neighborhoods. As with outlier removal, the appropriate depends on the spatial resolution and structure of the tissue. Users are encouraged to tune this parameter based on their specific data and biological questions.
Spatial weights from boundaries and centroids
To model gene expression variation relative to spatial structures, we assign each cell a spatial weight based on its distance to a reference geometry such as a boundary or a centroid. Let denote the 2D spatial coordinate of cell .
For centroid-based weighting, we compute the centroid position of a selected cell population as:
where is the number of selected cells. The Euclidean distance between each cell and the centroid is
For boundary-based weighting, the shortest distance from cell to the nearest boundary location is
where denotes the set of all boundary points, and represents a specific boundary location.
To ensure comparability across regions of varying scale, we apply min-max normalization:
The spatial weight for cell is then computed using one of several decay functions applied to the normalized distance
where is a user-defined scale parameter used in the Gaussian kernel. These weights can be used directly in enrichment analysis or as input covariates in downstream modeling of gene expression.
Gradient-based spatial differential expression
To identify genes with expression patterns that vary smoothly along a spatial distance gradient, we modeled gene expression as a continuous function of spatial distance using a reparameterized spline basis, similar to approaches used in pseudotime trajectory analysis [10]. Let be a vector of spatial distances for cells, such as proximity to a boundary or centroid. We construct a natural cubic spline basis using the splines::ns() function with degrees of freedom (default ), yielding a design matrix .
An augmented matrix is formed by concatenating an intercept vector , the distance vector , and the spline basis. We then apply QR decomposition to orthonormalize the matrix (excluding the intercept), producing a final design matrix , with orthogonal and unit-norm columns. The first column of is adjusted to be positively correlated with , ensuring alignment with the primary gradient direction.
For each gene , expression is modeled as a linear combination:
where is the expression vector for gene , is the coefficient vector, and represents residual errors. Hypothesis testing is performed across all coefficients using the limma [11] framework.
To aid interpretability, we report the sign of the first coefficient as the primary trend direction. A positive value indicates increasing expression with distance, while a negative value indicates decreasing expression.
Spatial enrichment index
To quantify the spatial bias of a gene toward spatially defined regions (e.g. near boundaries or centroids), we compute a SEI for each gene. For gene and cell , let denote the expression level of gene in cell , and let denote the spatial weight assigned to cell . The SEI is defined as the weighted mean expression of gene across all cells
where is the unweighted mean expression of gene , is the number of cells, and is a small constant (e.g. ) added to avoid division by zero. The normalized SEI, , is used to compare spatial enrichment independently of absolute gene expression levels. Genes with large normalized SEI values are considered spatially enriched relative to the provided spatial weights.
Spatial transcriptomics and single-cell datasets
The following spatial transcriptomics and scRNA-seq datasets have been used in this study:
The raw mouse brain tiny Xenium dataset was downloaded from 10× Genomics dataset website: https://www.10xgenomics.com/datasets/fresh-frozen-mouse-brain-for-xenium-explorer-demo-1-standard.
The processed human breast cancer Xenium dataset (Sample 1) was obtained from Cheng et al. [12], and the raw data were from 10× Genomics dataset website: https://www.10xgenomics.com/products/xenium-in-situ/preview-dataset-human-breast [1].
The processed human healthy liver MERFISH dataset was downloaded from https://datadryad.org/stash/dataset/doi:10.5061/dryad.37pvmcvsg [13], and the hepatocytes were used in this study.
The processed mouse brain scRNA-seq dataset was downloaded from Gene Expression Omnibus with accession number of GSE185862 [14]. A subset of 10 000 cells was extracted from the processed scRNA-seq data as a reference for cell type annotation of mouse brain Xenium data.
The mouse brain Visium HD dataset was downloaded from 10× Genomics dataset website: https://www.10xgenomics.com/datasets/visium-hd-cytassist-gene-expression-mouse-brain-fresh-frozen.
Data preparation for SpNeigh
The mouse brain tiny Xenium dataset was processed using the Seurat (v5.1.0) pipeline [15]. Cells with fewer than 10 detected genes or fewer than 15 total counts were excluded as part of the quality control procedure. The filtered dataset was normalized using NormalizeData, followed by variance stabilization with ScaleData. Dimensionality reduction was performed via RunPCA and RunUMAP, and clustering was carried out using FindNeighbors and FindClusters.
To perform cell type annotation at the single-cell level, a small reference scRNA-seq dataset consisting of 10 000 cells was sampled from the mouse brain dataset published by Yao et al. [14], which contains ~1 million cells. During the sampling process, at least 50 cells were retained for each cell type to ensure sufficient representation of all annotated populations in the reference. Annotation of the Xenium dataset was then conducted using SingleR (v2.8.0) [16], and predicted cell labels were manually merged into broader categories following the taxonomy defined by Yao et al. to reduce annotation granularity.
The processed human breast cancer Xenium dataset from Cheng et al., including SingleR-based cell type annotations, was used directly for downstream analysis.
The human healthy liver MERFISH dataset underwent similar quality control, removing cells with low gene or UMI counts. After normalization with NormalizeData, author-annotated hepatocyte populations were extracted for downstream analysis.
For the mouse brain Visium HD dataset, a similar Seurat preprocessing and clustering pipeline was applied. The dataset was read using the Load10X_Spatial function with bin.size = 8. Clustering was performed at a resolution of 0.2.
For all datasets, the spatial coordinate data frame (with columns x, y, cell, and cluster) was extracted from the Seurat object using the ExtractCoords function from SpNeigh. The normalized expression matrix was retrieved using GetAssayData from Seurat. These coordinate and expression inputs were used throughout the SpNeigh analysis pipeline.
Analysis of mouse brain Xenium dataset using SpNeigh
To identify spatial boundaries for cluster 2 in the mouse brain Xenium dataset, we applied the GetBoundary function with a refined eps parameter of 120 to detect spatially coherent outlier-free contours. Boundary polygons were subsequently constructed from these points using the BuildBoundaryPoly function. To define the surrounding neighborhood, ring regions were generated using the GetRingRegion function, which by default applies a 100-unit buffer to the original boundary polygons via GetOuterBoundary.
Cells located within the cluster 2 boundaries or neighborhood rings were extracted using the GetCellsInside function. Cell composition statistics for these regions were computed with StatsCellsInside. Differential expression analysis between cluster 2 cells located inside the boundaries and those in the surrounding rings was conducted using RunLimmaDE.
To further dissect spatial structure, the primary boundary polygon of cluster 2 was segmented into individual edges using the SplitBoundaryPolyByAnchor function. Focusing on the cerebral cortex, spatial weights to a specific edge (edge 2) were computed for relevant cells using ComputeBoundaryWeights. Finally, spatial differential expression along this boundary gradient was assessed using the RunSpatialDE function.
Analysis of mouse brain Visium HD dataset using SpNeigh
To identify spatial boundaries for cluster 3 in the mouse brain Visium HD dataset, we applied the GetBoundary function with refined parameters (eps = 100, k = 100, and distance_cutoff = 200). The GetRingRegion function was then used to construct the neighborhood ring surrounding the primary boundary, with a buffer distance of dist = 1000. Cells located inside the primary boundary and within the corresponding neighborhood ring were extracted using GetCellsInside and used to examine spatial expression patterns of selected marker genes.
Benchmarking differential expression and spatial modeling in SpNeigh
To evaluate the performance of differential expression analysis in SpNeigh, we compared the RunLimmaDE function against the widely used FindAllMarkers function from the Seurat package. Both methods were applied to the mouse brain Xenium dataset using consistent parameter settings (min.pct = 0) to identify marker genes for each cell type based on cluster-level annotations. We recorded the total runtime for each method and computed the Spearman correlation between log fold changes (logFC) across all identified marker genes to assess the concordance between the two approaches.
The spatial modeling functions RunSpatialDE and ComputeSEI represent novel contributions introduced in SpNeigh, and are not directly comparable to existing methods. While SpatialDE [6] is a popular tool for identifying SVGs, it is primarily designed for unsupervised detection of broad spatial patterns and does not explicitly account for spatial boundaries or region-specific structures. Moreover, SpatialDE’s reliance on Gaussian process modeling introduces substantial computational overhead, limiting its scalability for large-scale datasets such as Xenium or MERFISH. In contrast, SpNeigh models spatial gradients directly using user-defined spatial weights derived from boundaries or centroids, enabling region-aware and computationally efficient analysis.
We assessed the concordance between RunSpatialDE and ComputeSEI by analyzing cluster 0 cells from the mouse brain Xenium dataset. Spatial weights were computed based on both boundary and centroid distances. Genes were ranked by Z coefficients from RunSpatialDE and by normalized SEI from ComputeSEI. Spearman correlation was used to quantify agreement between rankings. Additionally, top-ranked genes were manually inspected to confirm spatial enrichment patterns and biological relevance. As RunSpatialDE shares the same limma-based modeling framework as RunLimmaDE, its runtime was not reported separately due to similar computational efficiency.
Analysis of human breast cancer Xenium dataset using SpNeigh
Neighborhood analysis of tumor and DCIS regions
To characterize the local neighborhoods of tumor and DCIS in the human breast cancer dataset, boundaries were computed for all spatially distinct tumor and DCIS regions using the GetBoundary function. Neighborhood ring regions surrounding the tumor and DCIS were then generated using GetRingRegion, which constructs outer boundaries at a specified distance from the original boundaries. Cells located within these ring regions were extracted with GetCellsInside, and their cell type composition was quantified using StatsCellsInside. Donut plots were generated with PlotStatsPie (plot_donut = TRUE) to visualize the proportion of each cell type in the tumor and DCIS neighborhoods. Additionally, Student’s t-test was used to assess statistically significant differences in the proportion of each cell type between the tumor and DCIS ring regions.
To quantify cell–cell interactions within these local neighborhoods, the ComputeSpatialInteractionMatrix function was applied to the ring-localized cells, generating a matrix of interactions between focal clusters and their neighboring clusters. This matrix was row-scaled and visualized using PlotInteractionMatrix.
Differential expression and spatial modeling of tumor and DCIS cells
To explore molecular differences between DCIS cells in distinct spatial locations, differential expression analysis was performed using RunLimmaDE to compare DCIS cells located inside versus outside of tumor boundaries.
To further investigate intra-tumoral heterogeneity and spatial gradients in gene expression, ComputeCentroidWeights was used to compute spatial weights based on proximity to the tumor centroid. These weights were used as a covariate in RunSpatialDE to identify genes with spatially varying expression patterns along the centroid-based gradient. Differentially expressed genes were ranked by adjusted P-values. In parallel, ComputeSEI was applied to the same cells using centroid weights to identify spatially enriched genes, which were ranked based on normalized SEI scores.
Analysis of human healthy liver MERFISH dataset using SpNeigh
To investigate spatially varying gene expression along liver zonation gradients, boundaries were first identified for hepatocyte populations corresponding to the periportal, mid-lobular, and pericentral zones using the GetBoundary function. Spatial weights reflecting proximity to each zone-specific boundary were then computed with ComputeBoundaryWeights. These weights were used as input to RunSpatialDE, which models gene expression as a smooth function of spatial distance using natural splines. To assess the strength and direction of spatial variation, Spearman correlation coefficients were calculated between normalized gene expression and spatial weights for the top-ranked genes identified by RunSpatialDE. Finally, PlotSpatialExpression was used to visualize the average expression trends of selected genes along the spatial gradients.
SpNeigh framework for spatial neighborhood modeling
Spatial boundary detection and outlier removal
To identify the spatial boundaries of a cell population or cluster, we first remove spatial outliers based on local density. Let denote the spatial coordinate of cell . For each cell, we compute the average Euclidean distance to its nearest neighbors using the -nearest neighbor (k-NN) algorithm:
where denotes the -th nearest neighbor of cell , and is the mean k-NN distance. Cells with greater than a user-defined threshold are considered spatial outliers and removed.
After filtering, spatial boundaries are identified using the concave hull algorithm [8], which estimates a smoothed outer boundary encompassing each connected spatial region. If a cluster contains multiple disjoint regions, subregions can be detected using either DBSCAN [9] (density-based clustering) or -means. Each subregion is then processed independently to construct its boundary polygon.
The resulting boundaries are represented as a set of spatial polygons or LINESTRINGs, which are used for computing spatial weights, enrichment indices, or visualization.
To remove spatial outliers prior to boundary detection, we used a default of 5 for the k-NN search. This choice aims to capture immediate local neighborhoods while minimizing the influence of spurious or isolated cells. The optimal may vary depending on dataset resolution, cell density, and tissue architecture. We recommend adjusting this parameter based on the spatial scale and expected continuity of the cell populations.
Spatial neighborhood interaction matrix
To characterize spatial relationships between distinct cell clusters, we compute a neighborhood interaction matrix based on k-NN counts. For each cell, we identify its nearest spatial neighbors using Euclidean distance and record their cluster identities.
Let denote the set of all clusters, and define as the set of all cells belonging to cluster . Let be the set of the nearest neighbors of cell , and let denote the cluster identity of neighbor cell .
We define the spatial interaction matrix such that:
where is the indicator function, equal to 1 when the condition is true and 0 otherwise.
This matrix quantifies how often cells from cluster are spatially adjacent to cells from cluster in their local neighborhood. It can be visualized directly or row-normalized (e.g. Z-scored) to highlight enriched spatial associations.
For constructing the spatial neighborhood interaction matrix, we used a default of 10 for defining each cell’s nearest neighbors. This setting provides a balance between capturing sufficient local context and avoiding overly diffuse neighborhoods. As with outlier removal, the appropriate depends on the spatial resolution and structure of the tissue. Users are encouraged to tune this parameter based on their specific data and biological questions.
Spatial weights from boundaries and centroids
To model gene expression variation relative to spatial structures, we assign each cell a spatial weight based on its distance to a reference geometry such as a boundary or a centroid. Let denote the 2D spatial coordinate of cell .
For centroid-based weighting, we compute the centroid position of a selected cell population as:
where is the number of selected cells. The Euclidean distance between each cell and the centroid is
For boundary-based weighting, the shortest distance from cell to the nearest boundary location is
where denotes the set of all boundary points, and represents a specific boundary location.
To ensure comparability across regions of varying scale, we apply min-max normalization:
The spatial weight for cell is then computed using one of several decay functions applied to the normalized distance
where is a user-defined scale parameter used in the Gaussian kernel. These weights can be used directly in enrichment analysis or as input covariates in downstream modeling of gene expression.
Gradient-based spatial differential expression
To identify genes with expression patterns that vary smoothly along a spatial distance gradient, we modeled gene expression as a continuous function of spatial distance using a reparameterized spline basis, similar to approaches used in pseudotime trajectory analysis [10]. Let be a vector of spatial distances for cells, such as proximity to a boundary or centroid. We construct a natural cubic spline basis using the splines::ns() function with degrees of freedom (default ), yielding a design matrix .
An augmented matrix is formed by concatenating an intercept vector , the distance vector , and the spline basis. We then apply QR decomposition to orthonormalize the matrix (excluding the intercept), producing a final design matrix , with orthogonal and unit-norm columns. The first column of is adjusted to be positively correlated with , ensuring alignment with the primary gradient direction.
For each gene , expression is modeled as a linear combination:
where is the expression vector for gene , is the coefficient vector, and represents residual errors. Hypothesis testing is performed across all coefficients using the limma [11] framework.
To aid interpretability, we report the sign of the first coefficient as the primary trend direction. A positive value indicates increasing expression with distance, while a negative value indicates decreasing expression.
Spatial enrichment index
To quantify the spatial bias of a gene toward spatially defined regions (e.g. near boundaries or centroids), we compute a SEI for each gene. For gene and cell , let denote the expression level of gene in cell , and let denote the spatial weight assigned to cell . The SEI is defined as the weighted mean expression of gene across all cells
where is the unweighted mean expression of gene , is the number of cells, and is a small constant (e.g. ) added to avoid division by zero. The normalized SEI, , is used to compare spatial enrichment independently of absolute gene expression levels. Genes with large normalized SEI values are considered spatially enriched relative to the provided spatial weights.
Spatial transcriptomics and single-cell datasets
The following spatial transcriptomics and scRNA-seq datasets have been used in this study:
The raw mouse brain tiny Xenium dataset was downloaded from 10× Genomics dataset website: https://www.10xgenomics.com/datasets/fresh-frozen-mouse-brain-for-xenium-explorer-demo-1-standard.
The processed human breast cancer Xenium dataset (Sample 1) was obtained from Cheng et al. [12], and the raw data were from 10× Genomics dataset website: https://www.10xgenomics.com/products/xenium-in-situ/preview-dataset-human-breast [1].
The processed human healthy liver MERFISH dataset was downloaded from https://datadryad.org/stash/dataset/doi:10.5061/dryad.37pvmcvsg [13], and the hepatocytes were used in this study.
The processed mouse brain scRNA-seq dataset was downloaded from Gene Expression Omnibus with accession number of GSE185862 [14]. A subset of 10 000 cells was extracted from the processed scRNA-seq data as a reference for cell type annotation of mouse brain Xenium data.
The mouse brain Visium HD dataset was downloaded from 10× Genomics dataset website: https://www.10xgenomics.com/datasets/visium-hd-cytassist-gene-expression-mouse-brain-fresh-frozen.
Data preparation for SpNeigh
The mouse brain tiny Xenium dataset was processed using the Seurat (v5.1.0) pipeline [15]. Cells with fewer than 10 detected genes or fewer than 15 total counts were excluded as part of the quality control procedure. The filtered dataset was normalized using NormalizeData, followed by variance stabilization with ScaleData. Dimensionality reduction was performed via RunPCA and RunUMAP, and clustering was carried out using FindNeighbors and FindClusters.
To perform cell type annotation at the single-cell level, a small reference scRNA-seq dataset consisting of 10 000 cells was sampled from the mouse brain dataset published by Yao et al. [14], which contains ~1 million cells. During the sampling process, at least 50 cells were retained for each cell type to ensure sufficient representation of all annotated populations in the reference. Annotation of the Xenium dataset was then conducted using SingleR (v2.8.0) [16], and predicted cell labels were manually merged into broader categories following the taxonomy defined by Yao et al. to reduce annotation granularity.
The processed human breast cancer Xenium dataset from Cheng et al., including SingleR-based cell type annotations, was used directly for downstream analysis.
The human healthy liver MERFISH dataset underwent similar quality control, removing cells with low gene or UMI counts. After normalization with NormalizeData, author-annotated hepatocyte populations were extracted for downstream analysis.
For the mouse brain Visium HD dataset, a similar Seurat preprocessing and clustering pipeline was applied. The dataset was read using the Load10X_Spatial function with bin.size = 8. Clustering was performed at a resolution of 0.2.
For all datasets, the spatial coordinate data frame (with columns x, y, cell, and cluster) was extracted from the Seurat object using the ExtractCoords function from SpNeigh. The normalized expression matrix was retrieved using GetAssayData from Seurat. These coordinate and expression inputs were used throughout the SpNeigh analysis pipeline.
Analysis of mouse brain Xenium dataset using SpNeigh
To identify spatial boundaries for cluster 2 in the mouse brain Xenium dataset, we applied the GetBoundary function with a refined eps parameter of 120 to detect spatially coherent outlier-free contours. Boundary polygons were subsequently constructed from these points using the BuildBoundaryPoly function. To define the surrounding neighborhood, ring regions were generated using the GetRingRegion function, which by default applies a 100-unit buffer to the original boundary polygons via GetOuterBoundary.
Cells located within the cluster 2 boundaries or neighborhood rings were extracted using the GetCellsInside function. Cell composition statistics for these regions were computed with StatsCellsInside. Differential expression analysis between cluster 2 cells located inside the boundaries and those in the surrounding rings was conducted using RunLimmaDE.
To further dissect spatial structure, the primary boundary polygon of cluster 2 was segmented into individual edges using the SplitBoundaryPolyByAnchor function. Focusing on the cerebral cortex, spatial weights to a specific edge (edge 2) were computed for relevant cells using ComputeBoundaryWeights. Finally, spatial differential expression along this boundary gradient was assessed using the RunSpatialDE function.
Analysis of mouse brain Visium HD dataset using SpNeigh
To identify spatial boundaries for cluster 3 in the mouse brain Visium HD dataset, we applied the GetBoundary function with refined parameters (eps = 100, k = 100, and distance_cutoff = 200). The GetRingRegion function was then used to construct the neighborhood ring surrounding the primary boundary, with a buffer distance of dist = 1000. Cells located inside the primary boundary and within the corresponding neighborhood ring were extracted using GetCellsInside and used to examine spatial expression patterns of selected marker genes.
Benchmarking differential expression and spatial modeling in SpNeigh
To evaluate the performance of differential expression analysis in SpNeigh, we compared the RunLimmaDE function against the widely used FindAllMarkers function from the Seurat package. Both methods were applied to the mouse brain Xenium dataset using consistent parameter settings (min.pct = 0) to identify marker genes for each cell type based on cluster-level annotations. We recorded the total runtime for each method and computed the Spearman correlation between log fold changes (logFC) across all identified marker genes to assess the concordance between the two approaches.
The spatial modeling functions RunSpatialDE and ComputeSEI represent novel contributions introduced in SpNeigh, and are not directly comparable to existing methods. While SpatialDE [6] is a popular tool for identifying SVGs, it is primarily designed for unsupervised detection of broad spatial patterns and does not explicitly account for spatial boundaries or region-specific structures. Moreover, SpatialDE’s reliance on Gaussian process modeling introduces substantial computational overhead, limiting its scalability for large-scale datasets such as Xenium or MERFISH. In contrast, SpNeigh models spatial gradients directly using user-defined spatial weights derived from boundaries or centroids, enabling region-aware and computationally efficient analysis.
We assessed the concordance between RunSpatialDE and ComputeSEI by analyzing cluster 0 cells from the mouse brain Xenium dataset. Spatial weights were computed based on both boundary and centroid distances. Genes were ranked by Z coefficients from RunSpatialDE and by normalized SEI from ComputeSEI. Spearman correlation was used to quantify agreement between rankings. Additionally, top-ranked genes were manually inspected to confirm spatial enrichment patterns and biological relevance. As RunSpatialDE shares the same limma-based modeling framework as RunLimmaDE, its runtime was not reported separately due to similar computational efficiency.
Analysis of human breast cancer Xenium dataset using SpNeigh
Neighborhood analysis of tumor and DCIS regions
To characterize the local neighborhoods of tumor and DCIS in the human breast cancer dataset, boundaries were computed for all spatially distinct tumor and DCIS regions using the GetBoundary function. Neighborhood ring regions surrounding the tumor and DCIS were then generated using GetRingRegion, which constructs outer boundaries at a specified distance from the original boundaries. Cells located within these ring regions were extracted with GetCellsInside, and their cell type composition was quantified using StatsCellsInside. Donut plots were generated with PlotStatsPie (plot_donut = TRUE) to visualize the proportion of each cell type in the tumor and DCIS neighborhoods. Additionally, Student’s t-test was used to assess statistically significant differences in the proportion of each cell type between the tumor and DCIS ring regions.
To quantify cell–cell interactions within these local neighborhoods, the ComputeSpatialInteractionMatrix function was applied to the ring-localized cells, generating a matrix of interactions between focal clusters and their neighboring clusters. This matrix was row-scaled and visualized using PlotInteractionMatrix.
Differential expression and spatial modeling of tumor and DCIS cells
To explore molecular differences between DCIS cells in distinct spatial locations, differential expression analysis was performed using RunLimmaDE to compare DCIS cells located inside versus outside of tumor boundaries.
To further investigate intra-tumoral heterogeneity and spatial gradients in gene expression, ComputeCentroidWeights was used to compute spatial weights based on proximity to the tumor centroid. These weights were used as a covariate in RunSpatialDE to identify genes with spatially varying expression patterns along the centroid-based gradient. Differentially expressed genes were ranked by adjusted P-values. In parallel, ComputeSEI was applied to the same cells using centroid weights to identify spatially enriched genes, which were ranked based on normalized SEI scores.
Analysis of human healthy liver MERFISH dataset using SpNeigh
To investigate spatially varying gene expression along liver zonation gradients, boundaries were first identified for hepatocyte populations corresponding to the periportal, mid-lobular, and pericentral zones using the GetBoundary function. Spatial weights reflecting proximity to each zone-specific boundary were then computed with ComputeBoundaryWeights. These weights were used as input to RunSpatialDE, which models gene expression as a smooth function of spatial distance using natural splines. To assess the strength and direction of spatial variation, Spearman correlation coefficients were calculated between normalized gene expression and spatial weights for the top-ranked genes identified by RunSpatialDE. Finally, PlotSpatialExpression was used to visualize the average expression trends of selected genes along the spatial gradients.
Results
Results
Overview of the SpNeigh framework
To characterize local tissue architecture and spatial gene expression patterns, we developed SpNeigh, a modular R package for spatial neighborhood analysis and spatially aware differential expression analysis. The SpNeigh workflow consists of five core components (Fig. 1): (i) data input, (ii) boundary detection and neighborhood construction, (iii) spatial weighting, (iv) neighborhood interaction analysis, and (v) spatial differential and enrichment analysis.
Input data.SpNeigh operates on a coordinate data frame containing the x, y, cell, and cluster columns, together with a normalized gene expression matrix or dgCMatrix. These inputs can be extracted from commonly used spatial transcriptomics platforms, including 10× Genomics Xenium, Vizgen MERFISH, and 10× Visium HD..
Boundary detection and neighborhood construction. Spatial boundaries are identified for a specified cell population using clustering and density-based outlier removal. Optional buffer regions can be constructed around these boundaries to define concentric neighborhood rings, enabling comparative analysis between intra-region and peri-region cells.
Spatial weighting. For cells within a region, SpNeigh computes spatial weights based on distance to either the region boundary or the population centroid. These weights are scaled between 0 and 1, with higher values indicating closer proximity to the spatial feature of interest. Multiple decay schemes (e.g. inverse, Gaussian) are supported.
Neighborhood analysis. Cells located in neighborhood rings are extracted, and their composition is visualized using pie charts and bar plots. A spatial interaction matrix is computed by counting the nearest neighbors for each focal cluster and tabulating the frequencies of neighboring cluster identities.
Downstream spatial modeling. SpNeigh supports differential expression testing between cells inside and outside a region via RunLimmaDE, which leverages the widely used limma framework for linear modeling and empirical Bayes moderation [11]. Spatial expression gradients are modeled using RunSpatialDE, which fits gene expression as a smooth function of spatial weights using a spline-based linear model, also implemented within the limma framework. In parallel, spatial enrichment is quantified using the spatial enrichment index (SEI), which measures the weighted average expression of each gene relative to spatial proximity.
Together, these components provide a flexible and interpretable framework for dissecting fine-grained spatial organization and gene expression dynamics in diverse tissues and technologies. For convenience, several functions such as boundary detection and spatial plotting support direct input from Seurat and SpatialExperiment objects, though core modeling functions require explicit expression matrices and spatial weights.
To our knowledge, SpNeigh is the first method to combine boundary detection with directional distance-based weighting and spline-based spatial differential expression modeling in a unified framework. These components enable biologically meaningful comparisons of cells located near tissue interfaces or across spatial gradients.
SpNeigh reveals intermediate cell populations at brain region boundaries in mouse cortex
The mouse brain is anatomically well-characterized and widely used for benchmarking spatial transcriptomics tools, with extensive references available for cell type classification and regional annotations [17–19]. To demonstrate the utility of boundary-aware analysis in SpNeigh, we analyzed a publicly available mouse brain Xenium dataset (tiny version) from 10× Genomics. After quality control filtering to remove low-quality cells (total count or gene count thresholds), 36 402 high-quality cells remained out of 36 602 total. The Seurat [15] pipeline was applied for preprocessing and clustering, resulting in 12 clusters at a resolution of 0.1. These clusters were manually annotated based on known brain anatomical regions. To refine cellular identity, we also performed cell-level annotation using SingleR [16], referencing a curated subset of the large-scale mouse brain single-cell RNA sequencing (scRNA-seq) dataset from Yao et al. [14]. The resulting SingleR annotations (Supplementary Fig. S1a) were manually merged based on known taxonomy to reduce redundancy (see Supplementary Table S1). The clustering, region-level, and cell-level annotations are shown in Fig. 2a.
We focused our initial analysis on cluster 2, corresponding to the white matter region dominated by oligodendrocytes. This area connects the outer cerebral cortex to deeper brain regions and lies between the cortex and hippocampus. Using SpNeigh, two spatial boundaries and their corresponding neighborhood ring regions were detected for cluster 2. Cells within the primary (largest) boundary and surrounding ring were extracted for comparison (Fig. 2b). Inside the boundary, 70% of cells belonged to cluster 2, with clusters 10 and 0 contributing 13% and 10%, respectively. In contrast, within the neighborhood ring, cluster 4 accounted for 33% of cells, followed by clusters 2 (21%), 3 (16%), and 0 (15%). These distinct neighborhood compositions suggest that cluster 2 cells located within the boundary and in the surrounding ring may reside in different microenvironments.
To investigate this difference, we applied RunLimmaDE to compare cluster 2 cells inside the boundary versus those in the ring. We found that the cortical marker gene Slc17a7 was highly expressed in cluster 2 cells within the neighborhood ring but nearly absent within the boundary (Fig. 2c and Supplementary Fig. S1b). In contrast, the oligodendrocyte marker gene Sox10 was consistently expressed in both groups. This suggests that cluster 2 cells in the ring may represent intermediate cells at the interface between cortical neurons and white matter oligodendrocytes. We further examined the expression patterns of Slc17a7 and Sox10 in an independent mouse brain Visium HD dataset. Cells located near the boundary between cortical neurons and white matter oligodendrocytes exhibited concurrent expression of both Slc17a7 and Sox10 (Supplementary Fig. S2). This observation is consistent with the results obtained from the mouse brain Xenium dataset and supports the presence of transcriptionally intermediate cell populations at the cortex–white matter interface.
Given the layered architecture of the cortex, we next explored expression gradients in cerebral cortex cells adjacent to the white matter. The primary boundary of cluster 2 was split into two directional edges; edge 2 delineates the boundary between cortex and white matter (Fig. 2d). Spatial weights relative to edge 2 were computed and used as input for gradient modeling with RunSpatialDE (Fig. 2e). A heatmap of the top 10 spatially varying genes showed distinct expression trends along this axis (Fig. 2f). Notably, Ccn2 (also known as Ctgf) and Cplx3 were enriched near edge 2, consistent with prior studies identifying these genes as markers of layer 6b (L6b)—the deepest cortical layer [14, 20–22]. Both genes exhibited strong expression in L6b cells, with intermediate levels in neighboring L4/5/6 neurons, astrocytes, and oligodendrocytes (Fig. 2g and Supplementary Fig. S1c), further supporting the presence of transitional or mixed-identity cells near the white matter boundary.
These findings highlight the ability of SpNeigh to detect spatial gene expression gradients and reveal boundary-associated intermediate populations that may otherwise be overlooked. This boundary-aware modeling is especially valuable for understanding the fine-grained organization and developmental transitions in complex tissues such as the brain.
Benchmarking differential expression and spatial modeling methods in SpNeigh
To evaluate the performance of the DE function RunLimmaDE in SpNeigh, we benchmarked it against the widely used FindAllMarkers function in Seurat using the mouse brain Xenium dataset. Both methods were applied to identify marker genes for five major cell types based on cluster-level annotations. RunLimmaDE completed in ~2 s, >10 times faster than FindAllMarkers, which required 26 seconds (Fig. 3a). The top marker genes identified by RunLimmaDE clearly separated distinct cell types and showed strong agreement with the results from FindAllMarkers, as illustrated in the dot plot of average expression and detection percentage (Fig. 3b). DE results from both methods were highly concordant, with an overall Spearman correlation of 0.81 between log fold changes across all cell types (Fig. 3c).
The spatial modeling methods implemented in SpNeigh (RunSpatialDE and ComputeSEI) are not directly comparable to tools such as SpatialDE or SPARK, which are designed for detecting general SVGs but do not incorporate spatial features such as boundaries or centroids. To assess the concordance between the two SpNeigh approaches, we analyzed cluster 0 (meningeal fibroblasts) using both boundary- and centroid-derived spatial weights (Fig. 3d). We then compared the ranks of genes based on Z1 coefficients from RunSpatialDE and normalized SEI values from ComputeSEI. The rank correlation was high, with Spearman coefficients of 0.75 (boundary weights) and 0.80 (centroid weights) (Fig. 3e), suggesting strong agreement between the two methods.
Notably, RunSpatialDE identified genes with increasing or decreasing trends along spatial gradients. For example, Dcn and Col1a1, markers of fibroblasts enriched near boundaries, were ranked among the top positive spatial DE genes, whereas Slc17a7 and Car4, markers of cortical neurons, showed negative spatial trends (Fig. 3f). The same fibroblast markers were also recovered by ComputeSEI (Fig. 3g), reinforcing the biological relevance of both approaches.
Both functions are computationally efficient and completed in under one second per run. While RunSpatialDE provides statistical inference and directionality of expression changes, ComputeSEI offers a complementary enrichment-based perspective that may capture biologically relevant genes overlooked by hypothesis testing. Together, these methods offer a robust and flexible toolkit for modeling spatial expression gradients.
SpNeigh reveals immune microenvironment differences between breast tumor and DCIS neighborhoods
The tumor microenvironment plays a critical role in cancer progression by influencing tumor growth, immune surveillance, and mechanisms of immune evasion [23, 24]. Although most invasive breast cancers arise from ductal carcinoma in situ (DCIS), only 10%–30% of DCIS cases eventually progress to invasive disease [25]. To demonstrate the utility of neighborhood analysis in SpNeigh, we applied it to a processed human breast cancer Xenium dataset with SingleR-based cell type annotations from Cheng et al. [12].
We focused on the spatial neighborhoods surrounding tumor and DCIS regions (Fig. 4a). Neighborhood ring regions were computed for tumor and DCIS separately, and cells located within these rings were extracted for downstream analysis (Fig. 4b). Cell type composition within each neighborhood was summarized (Fig. 4c). In the tumor neighborhood, fibroblasts were most abundant (29%), followed by T cells (14%), DCIS cells (13%), tumor cells (12%), and macrophages (11%). In the DCIS neighborhood, fibroblasts (34%) and T cells (17%) remained the top two cell types, with macrophages (11%) and DCIS cells (7%) also present.
To investigate cellular interactions within these neighborhoods, we computed spatial interaction matrices using function ComputeSpatialInteractionMatrix. The row-scaled heatmap (Fig. 4d) revealed distinct interaction patterns: DCIS cells in the tumor neighborhood primarily interacted with other DCIS cells, whereas DCIS in the DCIS neighborhood showed interactions with basal cells, DCIS, and fibroblasts. These differences suggest potential heterogeneity in DCIS subpopulations depending on their proximity to tumor regions. Moreover, immune cells such as T, B, plasma, and macrophage populations were observed to interact with T cells in both neighborhoods, indicating potential immune modulation.
To assess immune infiltration visually, we plotted the spatial distributions of major immune cell types (T, B, plasma, and macrophages) relative to tumor and DCIS boundaries (Fig. 4e). Notably, T, B, and plasma cells appeared more enriched around DCIS regions. To quantify these differences, we compared cell type proportions using a two-sided Student’s t-test. T cells were significantly more abundant in DCIS neighborhoods than in tumor neighborhoods (P = .012; Fig. 4f), whereas no significant differences were observed for macrophages (P = .166), plasma cells (P = .413), or B cells (P = .448). These results suggest a potentially protective immune environment surrounding DCIS, which may contribute to its limited progression into invasive carcinoma.
Overall, this case study demonstrates how SpNeigh can reveal subtle yet biologically meaningful differences in the cellular and immune landscapes between tumor and pre-invasive lesions using neighborhood-aware spatial analysis.
SpNeigh reveals expression changes during early-stage tumor development from DCIS
To investigate transcriptional changes during the transition from DCIS to invasive breast cancer, we analyzed tumor and DCIS cells in the same human breast cancer Xenium dataset. Boundaries were identified for both tumor and DCIS populations, revealing one major tumor region and multiple spatially distinct DCIS regions (Fig. 5a). As only a subset of DCIS cases progress to invasive carcinoma, we hypothesized that DCIS cells near tumors might show distinct gene expression signatures.
One DCIS region was located entirely within the tumor boundary (highlighted in red), while the remaining DCIS regions were outside the tumor region (green) (Fig. 5b). We performed differential expression analysis using RunLimmaDE to compare DCIS cells located inside versus outside the tumor boundary. Compared to DCIS cells distant from the tumor, tumor-adjacent DCIS cells showed elevated expression of TENT5C and reduced expression of MZB1 (Fig. 5c). Notably, expression levels of these two genes in the tumor-adjacent DCIS cells resembled those in tumor cells, suggesting early transcriptional reprogramming. Previous work based on the same dataset also identified MZB1 as a specific marker of DCIS regions distant from the tumor [1]. Interestingly, both TENT5C and MZB1 are known plasma cell markers [26, 27] (Supplementary Fig. S3), and the observed shifts in their expression may reflect immune microenvironment remodeling associated with early tumor progression.
To further examine intratumoral heterogeneity, we focused on spatial gradients within the tumor itself. We reasoned that tumor cells near the boundary may represent earlier developmental stages, while cells closer to the tumor centroid may be more advanced. We therefore computed centroid-based spatial weights for tumor cells (Fig. 5d) and applied both RunSpatialDE and ComputeSEI to identify spatially varying and enriched genes along this inferred developmental axis.
CEACAM6 and AGR3 were among the top genes showing decreasing expression along the centroid weights (Fig. 5e), while DUSP2 emerged as a highly enriched gene in cells near the centroid with high average expression across tumor cells (Fig. 5f). Visualization of these genes in both DCIS and tumor cells showed distinct spatial expression patterns (Fig. 5g). Notably, the DCIS region located inside the tumor boundary contained both tumor and DCIS cells. CEACAM6, a canonical DCIS marker [1], was highly expressed in all DCIS cells, including those within the tumor region, but was downregulated in neighboring tumor cells, showing a clear negative trend (Z1 <0) along the centroid weights. AGR3 expression also decreased progressively from DCIS cells to adjacent tumor cells and was lowest in tumor cells near the centroid. In contrast, DUSP2 was most highly expressed in centroid-proximal tumor cells and nearly absent in DCIS cells.
These results highlight the ability of SpNeigh to reveal spatial expression gradients and transitional states during tumor evolution. By modeling spatial context and neighborhood relationships, SpNeigh provides valuable insights into early tumor development and cellular reprogramming at the invasive front.
SpNeigh reveals gene expression changes along spatial gradients of liver zonation
The liver exhibits a highly structured lobular architecture, consisting of three zones: periportal (zone 1), midlobular (zone 2), and pericentral (zone 3) [28]. Midlobular hepatocytes are thought to contribute to hepatocyte renewal during homeostasis and regeneration [29]. To investigate zonation-associated gene expression patterns, we applied the spatial differential expression analysis in SpNeigh to a publicly available MERFISH dataset of healthy human liver tissue [13].
Low-quality cells were filtered out, and 23,144 hepatocytes were retained for downstream analysis. The hepatocyte subpopulations annotated by the original authors—hepatocyte 1, 2, and 3—correspond to periportal, midlobular, and pericentral zones, respectively. Boundaries for each hepatocyte population were computed and visualized in spatial plots (Fig. 6a). We then computed spatial weights to the boundaries of each population (Fig. 6b) and applied RunSpatialDE to identify spatially varying genes along these gradients.
The top 10 spatially differential genes for each hepatocyte population, identified using boundary-based weights, showed clear zonation-related expression trends (Supplementary Fig. S4a). For example, ASS1 and SDS were enriched near the boundaries of hepatocyte 1, with highest expression in hepatocyte 1 and intermediate expression in hepatocyte 2. Similarly, CYP1A1 and CYP3A4 were enriched near the boundaries of hepatocytes 2 and 3, with highest expression in hepatocyte 3 and intermediate levels in hepatocyte 2 (Fig. 6c and Supplementary Fig. S4a).
Expression levels of representative genes were correlated with their corresponding spatial weights using Spearman correlation. The correlations for ASS1, CYP1A1, and CYP3A4 with boundary weights for hepatocytes 1, 2, and 3 were 0.48, 0.21, and 0.41, respectively (Fig. 6d). Binned expression plots further demonstrated smooth gradients in gene expression along these spatial axes (Fig. 6e). Additional examples include GLS2, which showed peak expression in hepatocyte 1 and intermediate expression in hepatocyte 2, and CYP2E1, which showed the opposite trend, peaking in hepatocyte 3 (Supplementary Fig. S4b).
These findings are consistent with previously reported liver zonation markers, including periportal genes (GLS2, ASS1, SDS) and pericentral genes (CYP1A1, CYP3A4, CYP2E1) [13,30–33]. Interestingly, although canonical midlobular markers such as HAMP, HAMP2, and CCND1 [29,30] were not included in the 317-gene MERFISH panel, we identified COL1A1 as a potential negative marker for hepatocyte 2 (midlobular zone). Its expression was lowest in hepatocyte 2 and showed a negative trend along the spatial weights to hepatocyte 2 boundaries (Supplementary Fig. S4a and b). As COL1A1 is a known marker of hepatic stellate cells—cell types implicated in regulating liver zonation [34]—its expression in hepatocytes may reflect zone-specific regulatory interactions.
Together, these results demonstrate the ability of SpNeigh to reveal spatial expression gradients aligned with liver zonation and highlight its utility in boundary-aware analysis of spatial transcriptomics data.
Overview of the SpNeigh framework
To characterize local tissue architecture and spatial gene expression patterns, we developed SpNeigh, a modular R package for spatial neighborhood analysis and spatially aware differential expression analysis. The SpNeigh workflow consists of five core components (Fig. 1): (i) data input, (ii) boundary detection and neighborhood construction, (iii) spatial weighting, (iv) neighborhood interaction analysis, and (v) spatial differential and enrichment analysis.
Input data.SpNeigh operates on a coordinate data frame containing the x, y, cell, and cluster columns, together with a normalized gene expression matrix or dgCMatrix. These inputs can be extracted from commonly used spatial transcriptomics platforms, including 10× Genomics Xenium, Vizgen MERFISH, and 10× Visium HD..
Boundary detection and neighborhood construction. Spatial boundaries are identified for a specified cell population using clustering and density-based outlier removal. Optional buffer regions can be constructed around these boundaries to define concentric neighborhood rings, enabling comparative analysis between intra-region and peri-region cells.
Spatial weighting. For cells within a region, SpNeigh computes spatial weights based on distance to either the region boundary or the population centroid. These weights are scaled between 0 and 1, with higher values indicating closer proximity to the spatial feature of interest. Multiple decay schemes (e.g. inverse, Gaussian) are supported.
Neighborhood analysis. Cells located in neighborhood rings are extracted, and their composition is visualized using pie charts and bar plots. A spatial interaction matrix is computed by counting the nearest neighbors for each focal cluster and tabulating the frequencies of neighboring cluster identities.
Downstream spatial modeling. SpNeigh supports differential expression testing between cells inside and outside a region via RunLimmaDE, which leverages the widely used limma framework for linear modeling and empirical Bayes moderation [11]. Spatial expression gradients are modeled using RunSpatialDE, which fits gene expression as a smooth function of spatial weights using a spline-based linear model, also implemented within the limma framework. In parallel, spatial enrichment is quantified using the spatial enrichment index (SEI), which measures the weighted average expression of each gene relative to spatial proximity.
Together, these components provide a flexible and interpretable framework for dissecting fine-grained spatial organization and gene expression dynamics in diverse tissues and technologies. For convenience, several functions such as boundary detection and spatial plotting support direct input from Seurat and SpatialExperiment objects, though core modeling functions require explicit expression matrices and spatial weights.
To our knowledge, SpNeigh is the first method to combine boundary detection with directional distance-based weighting and spline-based spatial differential expression modeling in a unified framework. These components enable biologically meaningful comparisons of cells located near tissue interfaces or across spatial gradients.
SpNeigh reveals intermediate cell populations at brain region boundaries in mouse cortex
The mouse brain is anatomically well-characterized and widely used for benchmarking spatial transcriptomics tools, with extensive references available for cell type classification and regional annotations [17–19]. To demonstrate the utility of boundary-aware analysis in SpNeigh, we analyzed a publicly available mouse brain Xenium dataset (tiny version) from 10× Genomics. After quality control filtering to remove low-quality cells (total count or gene count thresholds), 36 402 high-quality cells remained out of 36 602 total. The Seurat [15] pipeline was applied for preprocessing and clustering, resulting in 12 clusters at a resolution of 0.1. These clusters were manually annotated based on known brain anatomical regions. To refine cellular identity, we also performed cell-level annotation using SingleR [16], referencing a curated subset of the large-scale mouse brain single-cell RNA sequencing (scRNA-seq) dataset from Yao et al. [14]. The resulting SingleR annotations (Supplementary Fig. S1a) were manually merged based on known taxonomy to reduce redundancy (see Supplementary Table S1). The clustering, region-level, and cell-level annotations are shown in Fig. 2a.
We focused our initial analysis on cluster 2, corresponding to the white matter region dominated by oligodendrocytes. This area connects the outer cerebral cortex to deeper brain regions and lies between the cortex and hippocampus. Using SpNeigh, two spatial boundaries and their corresponding neighborhood ring regions were detected for cluster 2. Cells within the primary (largest) boundary and surrounding ring were extracted for comparison (Fig. 2b). Inside the boundary, 70% of cells belonged to cluster 2, with clusters 10 and 0 contributing 13% and 10%, respectively. In contrast, within the neighborhood ring, cluster 4 accounted for 33% of cells, followed by clusters 2 (21%), 3 (16%), and 0 (15%). These distinct neighborhood compositions suggest that cluster 2 cells located within the boundary and in the surrounding ring may reside in different microenvironments.
To investigate this difference, we applied RunLimmaDE to compare cluster 2 cells inside the boundary versus those in the ring. We found that the cortical marker gene Slc17a7 was highly expressed in cluster 2 cells within the neighborhood ring but nearly absent within the boundary (Fig. 2c and Supplementary Fig. S1b). In contrast, the oligodendrocyte marker gene Sox10 was consistently expressed in both groups. This suggests that cluster 2 cells in the ring may represent intermediate cells at the interface between cortical neurons and white matter oligodendrocytes. We further examined the expression patterns of Slc17a7 and Sox10 in an independent mouse brain Visium HD dataset. Cells located near the boundary between cortical neurons and white matter oligodendrocytes exhibited concurrent expression of both Slc17a7 and Sox10 (Supplementary Fig. S2). This observation is consistent with the results obtained from the mouse brain Xenium dataset and supports the presence of transcriptionally intermediate cell populations at the cortex–white matter interface.
Given the layered architecture of the cortex, we next explored expression gradients in cerebral cortex cells adjacent to the white matter. The primary boundary of cluster 2 was split into two directional edges; edge 2 delineates the boundary between cortex and white matter (Fig. 2d). Spatial weights relative to edge 2 were computed and used as input for gradient modeling with RunSpatialDE (Fig. 2e). A heatmap of the top 10 spatially varying genes showed distinct expression trends along this axis (Fig. 2f). Notably, Ccn2 (also known as Ctgf) and Cplx3 were enriched near edge 2, consistent with prior studies identifying these genes as markers of layer 6b (L6b)—the deepest cortical layer [14, 20–22]. Both genes exhibited strong expression in L6b cells, with intermediate levels in neighboring L4/5/6 neurons, astrocytes, and oligodendrocytes (Fig. 2g and Supplementary Fig. S1c), further supporting the presence of transitional or mixed-identity cells near the white matter boundary.
These findings highlight the ability of SpNeigh to detect spatial gene expression gradients and reveal boundary-associated intermediate populations that may otherwise be overlooked. This boundary-aware modeling is especially valuable for understanding the fine-grained organization and developmental transitions in complex tissues such as the brain.
Benchmarking differential expression and spatial modeling methods in SpNeigh
To evaluate the performance of the DE function RunLimmaDE in SpNeigh, we benchmarked it against the widely used FindAllMarkers function in Seurat using the mouse brain Xenium dataset. Both methods were applied to identify marker genes for five major cell types based on cluster-level annotations. RunLimmaDE completed in ~2 s, >10 times faster than FindAllMarkers, which required 26 seconds (Fig. 3a). The top marker genes identified by RunLimmaDE clearly separated distinct cell types and showed strong agreement with the results from FindAllMarkers, as illustrated in the dot plot of average expression and detection percentage (Fig. 3b). DE results from both methods were highly concordant, with an overall Spearman correlation of 0.81 between log fold changes across all cell types (Fig. 3c).
The spatial modeling methods implemented in SpNeigh (RunSpatialDE and ComputeSEI) are not directly comparable to tools such as SpatialDE or SPARK, which are designed for detecting general SVGs but do not incorporate spatial features such as boundaries or centroids. To assess the concordance between the two SpNeigh approaches, we analyzed cluster 0 (meningeal fibroblasts) using both boundary- and centroid-derived spatial weights (Fig. 3d). We then compared the ranks of genes based on Z1 coefficients from RunSpatialDE and normalized SEI values from ComputeSEI. The rank correlation was high, with Spearman coefficients of 0.75 (boundary weights) and 0.80 (centroid weights) (Fig. 3e), suggesting strong agreement between the two methods.
Notably, RunSpatialDE identified genes with increasing or decreasing trends along spatial gradients. For example, Dcn and Col1a1, markers of fibroblasts enriched near boundaries, were ranked among the top positive spatial DE genes, whereas Slc17a7 and Car4, markers of cortical neurons, showed negative spatial trends (Fig. 3f). The same fibroblast markers were also recovered by ComputeSEI (Fig. 3g), reinforcing the biological relevance of both approaches.
Both functions are computationally efficient and completed in under one second per run. While RunSpatialDE provides statistical inference and directionality of expression changes, ComputeSEI offers a complementary enrichment-based perspective that may capture biologically relevant genes overlooked by hypothesis testing. Together, these methods offer a robust and flexible toolkit for modeling spatial expression gradients.
SpNeigh reveals immune microenvironment differences between breast tumor and DCIS neighborhoods
The tumor microenvironment plays a critical role in cancer progression by influencing tumor growth, immune surveillance, and mechanisms of immune evasion [23, 24]. Although most invasive breast cancers arise from ductal carcinoma in situ (DCIS), only 10%–30% of DCIS cases eventually progress to invasive disease [25]. To demonstrate the utility of neighborhood analysis in SpNeigh, we applied it to a processed human breast cancer Xenium dataset with SingleR-based cell type annotations from Cheng et al. [12].
We focused on the spatial neighborhoods surrounding tumor and DCIS regions (Fig. 4a). Neighborhood ring regions were computed for tumor and DCIS separately, and cells located within these rings were extracted for downstream analysis (Fig. 4b). Cell type composition within each neighborhood was summarized (Fig. 4c). In the tumor neighborhood, fibroblasts were most abundant (29%), followed by T cells (14%), DCIS cells (13%), tumor cells (12%), and macrophages (11%). In the DCIS neighborhood, fibroblasts (34%) and T cells (17%) remained the top two cell types, with macrophages (11%) and DCIS cells (7%) also present.
To investigate cellular interactions within these neighborhoods, we computed spatial interaction matrices using function ComputeSpatialInteractionMatrix. The row-scaled heatmap (Fig. 4d) revealed distinct interaction patterns: DCIS cells in the tumor neighborhood primarily interacted with other DCIS cells, whereas DCIS in the DCIS neighborhood showed interactions with basal cells, DCIS, and fibroblasts. These differences suggest potential heterogeneity in DCIS subpopulations depending on their proximity to tumor regions. Moreover, immune cells such as T, B, plasma, and macrophage populations were observed to interact with T cells in both neighborhoods, indicating potential immune modulation.
To assess immune infiltration visually, we plotted the spatial distributions of major immune cell types (T, B, plasma, and macrophages) relative to tumor and DCIS boundaries (Fig. 4e). Notably, T, B, and plasma cells appeared more enriched around DCIS regions. To quantify these differences, we compared cell type proportions using a two-sided Student’s t-test. T cells were significantly more abundant in DCIS neighborhoods than in tumor neighborhoods (P = .012; Fig. 4f), whereas no significant differences were observed for macrophages (P = .166), plasma cells (P = .413), or B cells (P = .448). These results suggest a potentially protective immune environment surrounding DCIS, which may contribute to its limited progression into invasive carcinoma.
Overall, this case study demonstrates how SpNeigh can reveal subtle yet biologically meaningful differences in the cellular and immune landscapes between tumor and pre-invasive lesions using neighborhood-aware spatial analysis.
SpNeigh reveals expression changes during early-stage tumor development from DCIS
To investigate transcriptional changes during the transition from DCIS to invasive breast cancer, we analyzed tumor and DCIS cells in the same human breast cancer Xenium dataset. Boundaries were identified for both tumor and DCIS populations, revealing one major tumor region and multiple spatially distinct DCIS regions (Fig. 5a). As only a subset of DCIS cases progress to invasive carcinoma, we hypothesized that DCIS cells near tumors might show distinct gene expression signatures.
One DCIS region was located entirely within the tumor boundary (highlighted in red), while the remaining DCIS regions were outside the tumor region (green) (Fig. 5b). We performed differential expression analysis using RunLimmaDE to compare DCIS cells located inside versus outside the tumor boundary. Compared to DCIS cells distant from the tumor, tumor-adjacent DCIS cells showed elevated expression of TENT5C and reduced expression of MZB1 (Fig. 5c). Notably, expression levels of these two genes in the tumor-adjacent DCIS cells resembled those in tumor cells, suggesting early transcriptional reprogramming. Previous work based on the same dataset also identified MZB1 as a specific marker of DCIS regions distant from the tumor [1]. Interestingly, both TENT5C and MZB1 are known plasma cell markers [26, 27] (Supplementary Fig. S3), and the observed shifts in their expression may reflect immune microenvironment remodeling associated with early tumor progression.
To further examine intratumoral heterogeneity, we focused on spatial gradients within the tumor itself. We reasoned that tumor cells near the boundary may represent earlier developmental stages, while cells closer to the tumor centroid may be more advanced. We therefore computed centroid-based spatial weights for tumor cells (Fig. 5d) and applied both RunSpatialDE and ComputeSEI to identify spatially varying and enriched genes along this inferred developmental axis.
CEACAM6 and AGR3 were among the top genes showing decreasing expression along the centroid weights (Fig. 5e), while DUSP2 emerged as a highly enriched gene in cells near the centroid with high average expression across tumor cells (Fig. 5f). Visualization of these genes in both DCIS and tumor cells showed distinct spatial expression patterns (Fig. 5g). Notably, the DCIS region located inside the tumor boundary contained both tumor and DCIS cells. CEACAM6, a canonical DCIS marker [1], was highly expressed in all DCIS cells, including those within the tumor region, but was downregulated in neighboring tumor cells, showing a clear negative trend (Z1 <0) along the centroid weights. AGR3 expression also decreased progressively from DCIS cells to adjacent tumor cells and was lowest in tumor cells near the centroid. In contrast, DUSP2 was most highly expressed in centroid-proximal tumor cells and nearly absent in DCIS cells.
These results highlight the ability of SpNeigh to reveal spatial expression gradients and transitional states during tumor evolution. By modeling spatial context and neighborhood relationships, SpNeigh provides valuable insights into early tumor development and cellular reprogramming at the invasive front.
SpNeigh reveals gene expression changes along spatial gradients of liver zonation
The liver exhibits a highly structured lobular architecture, consisting of three zones: periportal (zone 1), midlobular (zone 2), and pericentral (zone 3) [28]. Midlobular hepatocytes are thought to contribute to hepatocyte renewal during homeostasis and regeneration [29]. To investigate zonation-associated gene expression patterns, we applied the spatial differential expression analysis in SpNeigh to a publicly available MERFISH dataset of healthy human liver tissue [13].
Low-quality cells were filtered out, and 23,144 hepatocytes were retained for downstream analysis. The hepatocyte subpopulations annotated by the original authors—hepatocyte 1, 2, and 3—correspond to periportal, midlobular, and pericentral zones, respectively. Boundaries for each hepatocyte population were computed and visualized in spatial plots (Fig. 6a). We then computed spatial weights to the boundaries of each population (Fig. 6b) and applied RunSpatialDE to identify spatially varying genes along these gradients.
The top 10 spatially differential genes for each hepatocyte population, identified using boundary-based weights, showed clear zonation-related expression trends (Supplementary Fig. S4a). For example, ASS1 and SDS were enriched near the boundaries of hepatocyte 1, with highest expression in hepatocyte 1 and intermediate expression in hepatocyte 2. Similarly, CYP1A1 and CYP3A4 were enriched near the boundaries of hepatocytes 2 and 3, with highest expression in hepatocyte 3 and intermediate levels in hepatocyte 2 (Fig. 6c and Supplementary Fig. S4a).
Expression levels of representative genes were correlated with their corresponding spatial weights using Spearman correlation. The correlations for ASS1, CYP1A1, and CYP3A4 with boundary weights for hepatocytes 1, 2, and 3 were 0.48, 0.21, and 0.41, respectively (Fig. 6d). Binned expression plots further demonstrated smooth gradients in gene expression along these spatial axes (Fig. 6e). Additional examples include GLS2, which showed peak expression in hepatocyte 1 and intermediate expression in hepatocyte 2, and CYP2E1, which showed the opposite trend, peaking in hepatocyte 3 (Supplementary Fig. S4b).
These findings are consistent with previously reported liver zonation markers, including periportal genes (GLS2, ASS1, SDS) and pericentral genes (CYP1A1, CYP3A4, CYP2E1) [13,30–33]. Interestingly, although canonical midlobular markers such as HAMP, HAMP2, and CCND1 [29,30] were not included in the 317-gene MERFISH panel, we identified COL1A1 as a potential negative marker for hepatocyte 2 (midlobular zone). Its expression was lowest in hepatocyte 2 and showed a negative trend along the spatial weights to hepatocyte 2 boundaries (Supplementary Fig. S4a and b). As COL1A1 is a known marker of hepatic stellate cells—cell types implicated in regulating liver zonation [34]—its expression in hepatocytes may reflect zone-specific regulatory interactions.
Together, these results demonstrate the ability of SpNeigh to reveal spatial expression gradients aligned with liver zonation and highlight its utility in boundary-aware analysis of spatial transcriptomics data.
Discussion
Discussion
Spatial transcriptomics technologies have transformed our ability to study gene expression in tissue architecture, but extracting biological meaning from spatial organization remains methodologically challenging. SpNeigh introduces a flexible and interpretable framework for spatial neighborhood modeling, integrating local boundary detection, distance-aware weighting, spatial differential expression, and enrichment analysis. Through modular components, SpNeigh enables the investigation of cell–cell interactions, microenvironmental heterogeneity, and spatially varying transcriptional programs across diverse tissue types and spatial platforms.
Our analysis demonstrates that SpNeigh effectively captures biologically relevant spatial features across a range of datasets. In mouse brain, boundary-aware modeling identified intermediate populations and layered expression gradients near white matter boundaries. In human breast cancer, neighborhood ring analysis revealed distinct immune microenvironments surrounding tumor and DCIS regions, highlighting spatially structured immune infiltration. In healthy liver, SpNeigh recovered known zonation patterns and identified new spatially varying genes, including zone-specific markers and potential regulatory candidates. Together, these examples showcase the utility of SpNeigh for discovering fine-grained transcriptional variation driven by spatial context.
Compared to existing methods such as SpatialDE [6], SPARK [7], and Giotto [4], which primarily focus on global spatial variability or spatial autocorrelation, SpNeigh is uniquely designed for neighborhood-aware modeling and boundary-centric analysis. The incorporation of spatial weights anchored to biologically meaningful structures—such as cluster boundaries or centroids—allows for directional testing of spatial expression gradients. Unlike Gaussian process-based methods that may suffer from scalability issues on large datasets, our use of spline-based linear modeling and weighted means ensures rapid computation without sacrificing interpretability.
While numerous computational tools have been developed for spatial transcriptomics analysis, they differ substantially in their underlying assumptions and analytical goals. Methods such as SpatialDE and SPARK are primarily designed to identify globally SVGs by modeling spatial autocorrelation across an entire tissue section. These approaches are well suited for detecting broad spatial patterns, but they do not explicitly model local tissue structure, such as boundaries, neighborhood composition, or directional gradients.
SpNeigh addresses a complementary analytical need by explicitly incorporating spatial geometry into gene expression modeling. By defining tissue boundaries, neighborhood rings, and distance-based spatial weights, SpNeigh enables region-aware and boundary-aware analyses that focus on local spatial context. This includes differential expression between boundary-adjacent and interior cells, gradient-based modeling along user-defined spatial axes, and enrichment analysis near specific spatial features. These analyses are difficult to perform using existing global spatial variability methods, and direct quantitative benchmarking against such tools is therefore not always appropriate.
We note that comprehensive benchmarking of neighborhood- and boundary-aware spatial analyses remains challenging due to the lack of ground-truth annotations for spatial interfaces and gradients. In this study, we focused on demonstrating biological plausibility, interpretability, and computational efficiency across multiple platforms and tissues. Future work may explore simulation-based benchmarking or community benchmarks as such resources become available.
While SpNeigh introduces novel capabilities, several directions for future development remain. For example, automated region selection, dynamic modeling of cell–cell interactions, and multivariate spatial analyses could further enhance its applicability. The current framework focuses primarily on 2D spatial datasets, but extensions to 3D tissue volumes and time-series spatial data are conceptually straightforward. Additionally, integrating histological features or multi-modal data may enable richer modeling of spatial niches.
Although SpNeigh was optimized for high-resolution spatial platforms such as Xenium and MERFISH, it is compatible with a broad range of spatial transcriptomics technologies, including sequencing-based platforms such as Visium HD [3] and Stereo-seq [19], provided spatial coordinates and expression data are available. This flexibility enables users to adapt SpNeigh across diverse datasets and experimental designs.
In summary, SpNeigh addresses a critical need for interpretable and efficient spatial modeling tools that account for local context and region-specific variation. Its flexible design and compatibility with modern spatial platforms make it a valuable addition to the spatial transcriptomics toolkit, supporting a broad range of biological and clinical investigations.
Spatial transcriptomics technologies have transformed our ability to study gene expression in tissue architecture, but extracting biological meaning from spatial organization remains methodologically challenging. SpNeigh introduces a flexible and interpretable framework for spatial neighborhood modeling, integrating local boundary detection, distance-aware weighting, spatial differential expression, and enrichment analysis. Through modular components, SpNeigh enables the investigation of cell–cell interactions, microenvironmental heterogeneity, and spatially varying transcriptional programs across diverse tissue types and spatial platforms.
Our analysis demonstrates that SpNeigh effectively captures biologically relevant spatial features across a range of datasets. In mouse brain, boundary-aware modeling identified intermediate populations and layered expression gradients near white matter boundaries. In human breast cancer, neighborhood ring analysis revealed distinct immune microenvironments surrounding tumor and DCIS regions, highlighting spatially structured immune infiltration. In healthy liver, SpNeigh recovered known zonation patterns and identified new spatially varying genes, including zone-specific markers and potential regulatory candidates. Together, these examples showcase the utility of SpNeigh for discovering fine-grained transcriptional variation driven by spatial context.
Compared to existing methods such as SpatialDE [6], SPARK [7], and Giotto [4], which primarily focus on global spatial variability or spatial autocorrelation, SpNeigh is uniquely designed for neighborhood-aware modeling and boundary-centric analysis. The incorporation of spatial weights anchored to biologically meaningful structures—such as cluster boundaries or centroids—allows for directional testing of spatial expression gradients. Unlike Gaussian process-based methods that may suffer from scalability issues on large datasets, our use of spline-based linear modeling and weighted means ensures rapid computation without sacrificing interpretability.
While numerous computational tools have been developed for spatial transcriptomics analysis, they differ substantially in their underlying assumptions and analytical goals. Methods such as SpatialDE and SPARK are primarily designed to identify globally SVGs by modeling spatial autocorrelation across an entire tissue section. These approaches are well suited for detecting broad spatial patterns, but they do not explicitly model local tissue structure, such as boundaries, neighborhood composition, or directional gradients.
SpNeigh addresses a complementary analytical need by explicitly incorporating spatial geometry into gene expression modeling. By defining tissue boundaries, neighborhood rings, and distance-based spatial weights, SpNeigh enables region-aware and boundary-aware analyses that focus on local spatial context. This includes differential expression between boundary-adjacent and interior cells, gradient-based modeling along user-defined spatial axes, and enrichment analysis near specific spatial features. These analyses are difficult to perform using existing global spatial variability methods, and direct quantitative benchmarking against such tools is therefore not always appropriate.
We note that comprehensive benchmarking of neighborhood- and boundary-aware spatial analyses remains challenging due to the lack of ground-truth annotations for spatial interfaces and gradients. In this study, we focused on demonstrating biological plausibility, interpretability, and computational efficiency across multiple platforms and tissues. Future work may explore simulation-based benchmarking or community benchmarks as such resources become available.
While SpNeigh introduces novel capabilities, several directions for future development remain. For example, automated region selection, dynamic modeling of cell–cell interactions, and multivariate spatial analyses could further enhance its applicability. The current framework focuses primarily on 2D spatial datasets, but extensions to 3D tissue volumes and time-series spatial data are conceptually straightforward. Additionally, integrating histological features or multi-modal data may enable richer modeling of spatial niches.
Although SpNeigh was optimized for high-resolution spatial platforms such as Xenium and MERFISH, it is compatible with a broad range of spatial transcriptomics technologies, including sequencing-based platforms such as Visium HD [3] and Stereo-seq [19], provided spatial coordinates and expression data are available. This flexibility enables users to adapt SpNeigh across diverse datasets and experimental designs.
In summary, SpNeigh addresses a critical need for interpretable and efficient spatial modeling tools that account for local context and region-specific variation. Its flexible design and compatibility with modern spatial platforms make it a valuable addition to the spatial transcriptomics toolkit, supporting a broad range of biological and clinical investigations.
Supplementary Material
Supplementary Material
lqag039_Supplemental_File
lqag039_Supplemental_File
출처: PubMed Central (JATS). 라이선스는 원 publisher 정책을 따릅니다 — 인용 시 원문을 표기해 주세요.
🏷️ 같은 키워드 · 무료전문 — 이 논문 MeSH/keyword 기반
- A Phase I Study of Hydroxychloroquine and Suba-Itraconazole in Men with Biochemical Relapse of Prostate Cancer (HITMAN-PC): Dose Escalation Results.
- Self-management of male urinary symptoms: qualitative findings from a primary care trial.
- Clinical and Liquid Biomarkers of 20-Year Prostate Cancer Risk in Men Aged 45 to 70 Years.
- Diagnostic accuracy of Ga-PSMA PET/CT versus multiparametric MRI for preoperative pelvic invasion in the patients with prostate cancer.
- Clinical Presentation and Outcomes of Patients Undergoing Surgery for Thyroid Cancer.
- Association of patient health education with the postoperative health related quality of life in low- intermediate recurrence risk differentiated thyroid cancer patients.