| Title: | Spatially-Aware Cell Clustering Algorithm with Cluster Significant Assessment |
|---|---|
| Description: | A spatially-aware cell clustering algorithm with cluster significance assessment. It comprises four key modules: spatially-aware cell-gene co-embedding, cell clustering, signature gene identification, and cluster significance assessment. For more details, see Peng Xie, et al. (2025) <doi:10.1016/j.cell.2025.05.035>. |
| Authors: | Wei Liu [aut, cre], Xiao Zhang [aut], Yi Yang [aut], Peng Xie [aut], Chengqi Lin [aut], Jin Liu [aut] |
| Maintainer: | Wei Liu <[email protected]> |
| License: | GPL-3 |
| Version: | 0.2.0 |
| Built: | 2026-05-30 11:02:44 UTC |
| Source: | https://github.com/feiyoung/cofast |
Calculate the adjacency matrix from a spatial coordinate matrix with two, three, or more dimensions.
AddAdj(pos, type = "fixed_distance", platform = c("Others", "Visium", "ST"), neighbors = 6, ...)
| Argument | Description |
|---|---|
| pos | a matrix object whose columns are the spatial coordinates; any dimension (2, 3, or more) is allowed. |
| type | an optional string specifying how neighbors are defined. Two definitions are provided: "fixed_distance" and "fixed_number". |
| platform | a string specifying the platform of the provided data, "Others" by default. The available platforms are "Visium", "ST" and "Others" ("Others" represents any SRT platform other than 'Visium' and 'ST'). The platform helps to calculate the adjacency matrix by defining the neighborhoods when type = "fixed_distance" is chosen. |
| neighbors | an optional positive integer specifying how many neighbors are used in the calculation, 6 by default. |
| ... | Other arguments passed to |
When type = "fixed_distance", the spots within a Euclidean distance cutoff of a given spot are regarded as its neighbors. When type = "fixed_number", the K nearest spots are regarded as the neighbors of each spot.
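The two neighbor definitions can be illustrated with a minimal base-R sketch. It does not call AddAdj and makes no claim about how the package chooses its cutoff; the toy coordinates, the `cutoff` value, and `K` are all assumptions made for illustration, and the result is a dense matrix rather than the sparse matrix the package returns.

```r
# Toy 2D coordinates for five spots (hypothetical data).
pos <- cbind(x = c(0, 1, 2, 0, 5), y = c(0, 0, 0, 1, 5))
D <- as.matrix(dist(pos))  # pairwise Euclidean distances
n <- nrow(pos)

# "fixed_distance": every spot within the cutoff is a neighbor.
cutoff <- 1.1  # illustrative value, not the package's rule
adj_dist <- (D <= cutoff) * 1
diag(adj_dist) <- 0  # a spot is not its own neighbor

# "fixed_number": the K nearest spots are the neighbors of each spot.
K <- 2
adj_knn <- matrix(0, n, n)
for (i in seq_len(n)) {
  nearest <- setdiff(order(D[i, ]), i)[seq_len(K)]  # exclude the spot itself
  adj_knn[i, nearest] <- 1
}
```

Note that the distance-based matrix is symmetric and may leave isolated spots without neighbors, whereas the K-nearest-neighbor matrix gives every spot exactly K neighbors but is not necessarily symmetric.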
return a sparse matrix, representing the adjacency matrix.
data(CosMx_subset)
pos <- as.matrix(CosMx_subset@meta.data[,c("x", "y")])
Adj_sp <- AddAdj(pos)
Identify clusters of spots by a shared nearest neighbor (SNN) modularity optimization based on coFAST's embeddings.
AddCluster(seu, reduction = "cofast", cluster.name = "cofast.cluster", res = 0.8, K = NULL, res.start = 0.2, res.end = 2, step = 0.02)
| Argument | Description |
|---|---|
| seu | a Seurat object. |
| reduction | an optional string specifying the dimensional reduction name, 'cofast' by default. |
| cluster.name | an optional string specifying the column name in meta.data for the clusters, 'cofast.cluster' by default. |
| res | a positive real specifying the resolution parameter for Louvain clustering, 0.8 by default. |
| K | a positive integer or NULL specifying the number of clusters; the default NULL means that the number of clusters is not specified. |
| res.start | a positive real; when K is not NULL, the starting value of the resolution search, 0.2 by default. |
| res.end | a positive real; when K is not NULL, the ending value of the resolution search, 2 by default. |
| step | a positive real; when K is not NULL, the step size of the resolution search, 0.02 by default. |
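One plausible reading of the K, res.start, res.end and step arguments is a grid search over the resolution that stops at the first value producing exactly K clusters. The base-R sketch below illustrates that idea only. The helper names `cluster_at` and `search_resolution` are hypothetical, and hierarchical clustering is used as a stand-in for Louvain clustering, so this is not the package's implementation.

```r
# Toy one-dimensional data with three well-separated groups (hypothetical).
x <- c(0, 0.1, 0.2, 5, 5.1, 5.2, 10, 10.1, 10.2)
hc <- hclust(dist(x), method = "complete")

# Stand-in for resolution-based clustering: a larger `res` cuts the tree
# lower and therefore yields more clusters. This is NOT Louvain clustering.
cluster_at <- function(res) cutree(hc, h = 3 / res)

# Scan the resolution grid and stop at the first value giving K clusters.
search_resolution <- function(K, res.start = 0.2, res.end = 2, step = 0.02) {
  for (res in seq(res.start, res.end, by = step)) {
    cl <- cluster_at(res)
    if (length(unique(cl)) == K) {
      return(list(res = res, cluster = cl))
    }
  }
  NULL  # no resolution on the grid produced K clusters
}

found <- search_resolution(K = 3)
```

A smaller step makes it more likely that some resolution on the grid yields exactly K clusters, at the cost of more clustering runs.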
return a revised Seurat object with a new column in meta.data named cluster.name.
library(Seurat)
data(pbmc3k_subset)
pbmc3k_subset <- AddCluster(pbmc3k_subset, reduction='ncfm')
head(pbmc3k_subset)
Add the spatial coordinates of a Seurat object to its reductions slot as an embedding named 'Spatial'.
Addcoord2embed(seu, coord.name, assay = "RNA")
| Argument | Description |
|---|---|
| seu | a SeuratObject with spatial coordinate information in the |
| coord.name | a character vector, specify the names of spatial coordinates in the |
| assay | a string, specify the assay. |
return a revised Seurat object with a slot 'Spatial' in the reductions slot.
data(CosMx_subset)
library(Seurat)
Addcoord2embed(CosMx_subset, coord.name = c("x", "y"))
Calculate the aggregation score of each category (cluster/cell type) based on a given reduction.
AggregationScore(seu, reduction.name = "cofast", random.seed = 1)
| Argument | Description |
|---|---|
| seu | a SeuratObject with reductions not |
| reduction.name | a character string specifying the reduction name used for calculating the aggregation score. |
| random.seed | a positive integer specifying the random seed for reproducibility. |
return a data.frame with two columns: the first column is the number of spots in each category (cluster/cell type); the second column is the corresponding aggregation score.
library(Seurat)
data(CosMx_subset)
CosMx_subset <- Addcoord2embed(CosMx_subset, coord.name = c("x", "y"))
Idents(CosMx_subset) <- 'cell_type'
dat.sp.score <- AggregationScore(CosMx_subset, reduction.name = 'Spatial')
print(dat.sp.score)
Graph the output of a dimensional reduction technique on a 2D scatter plot where each point is a cell or feature, positioned according to the coembeddings determined by the reduction technique. By default, cells and their signature features are colored by their identity class (this can be changed with the cell_label parameter).
coembed_plot(seu, reduction, gene_txtdata = NULL, cell_label = NULL, xy_name = reduction, dims = c(1, 2), cols = NULL, shape_cg = c(1, 5), pt_size = 1, pt_text_size = 5, base_size = 16, base_family = "serif", legend.point.size = 5, legend.key.size = 1.5, alpha = 0.3)
| Argument | Description |
|---|---|
| seu | a Seurat object with a coembedding in the reductions slot under the component name reduction. |
| reduction | a string specifying the reduction component that holds the coembedding. |
| gene_txtdata | a data.frame object with columns including 'gene' and 'label', specifying the cell type/spatial domain and its signature genes. NULL by default, in which case all features in the coembeddings are used. |
| cell_label | an optional character naming a column of the metadata that specifies the group of cells/spots. NULL by default, in which case Idents is used as the group. |
| xy_name | an optional character specifying the names of the x- and y-axes, the same as reduction by default. |
| dims | a positive integer vector of length 2 specifying the two components for visualization. |
| cols | an optional string vector specifying the colors of the cell groups in the visualization. |
| shape_cg | a positive integer vector of length 2 specifying the shapes of cells/spots and features in the plot. |
| pt_size | an optional integer specifying the point size, 1 by default. |
| pt_text_size | an optional integer specifying the point size of the text, 5 by default. |
| base_size | an optional integer specifying the base size. |
| base_family | an optional character specifying the font. |
| legend.point.size | an optional integer specifying the point size in the legend. |
| legend.key.size | an optional integer specifying the size of the legend key. |
| alpha | an optional positive real between 0 and 1 specifying the transparency of points. |
return a ggplot object
library(Seurat)
data(pbmc3k_subset)
data(top5_signatures)
coembed_plot(pbmc3k_subset, reduction = "UMAPsig", gene_txtdata = top5_signatures, pt_text_size = 3, alpha=0.3)
Calculate UMAP projections for coembedding of cells and features
coembedding_umap(seu, reduction, reduction.name, gene.set = NULL, slot = "data", assay = "RNA", seed = 1)
| Argument | Description |
|---|---|
| seu | a Seurat object with a coembedding in the reductions slot under the component name reduction. |
| reduction | a string specifying the reduction component that holds the coembedding. |
| reduction.name | a string specifying the reduction name for the obtained UMAP projection. |
| gene.set | a string vector specifying the features (genes) used in calculating the UMAP projection, all features by default. |
| slot | an optional string specifying the slot in the assay, 'data' by default. |
| assay | an optional string specifying the assay name in the Seurat object when adding the UMAP projection. |
| seed | an optional integer specifying the random seed for reproducibility. |
return a revised Seurat object by adding a new reduction component named 'reduction.name'.
library(Seurat)
data(pbmc3k_subset)
data(top5_signatures)
pbmc3k_subset <- coembedding_umap(pbmc3k_subset, reduction = "ncfm", reduction.name = "UMAPsig", gene.set = top5_signatures$gene)
Run cell-feature coembedding for SRT data based on FAST model.
coFAST(object, Adj_sp, assay = NULL, slot = "data", nfeatures = 2000, q = 10, reduction.name = "cofast", var.features = NULL, ...)
| Argument | Description |
|---|---|
| object | a Seurat object. |
| Adj_sp | a sparse matrix specifying the adjacency matrix among spots. |
| assay | an optional string, the name of the assay used. |
| slot | an optional string, the name of the slot used. |
| nfeatures | an optional positive integer, the number of features to select as top variable features. Default is 2000. |
| q | an optional positive integer specifying the dimension of the low-dimensional embeddings to compute and store. Default is 10. |
| reduction.name | an optional string specifying the dimensional reduction name, 'cofast' by default. |
| var.features | an optional string vector specifying the variable features used to calculate the cell embedding. |
| ... | Other argument passed to the |
return a revised Seurat object with a new reduction slot reduction.name obtained by coFAST co-embedding, where default reduction.name is 'cofast'.
library(Seurat)
data(CosMx_subset)
pos <- as.matrix(CosMx_subset@meta.data[,c("x", "y")])
Adj_sp <- AddAdj(pos)
CosMx_subset <- coFAST(CosMx_subset, Adj_sp = Adj_sp, maxIter=3)  # maxIter = 3 is set only to keep this demonstration fast
This is a toy CosMx spatial transcriptomics dataset.
library(Seurat)
data(CosMx_subset)
head(CosMx_subset)
This function estimates the dimension of the low-dimensional embedding for a given cell-by-gene expression matrix. For more details, see Franklin et al. (1995) and Crawford et al. (2010).
diagnostic.cor.eigs(object, ...)
Default S3 method:
diagnostic.cor.eigs(object, q_max = 50, plot = TRUE, n.sims = 10, parallel = TRUE, ncores = 10, seed = 1, ...)
S3 method for class 'Seurat':
diagnostic.cor.eigs(object, assay = NULL, slot = "data", nfeatures = 2000, q_max = 50, seed = 1, ...)
| Argument | Description |
|---|---|
| object | A Seurat or matrix object. |
| ... | Other arguments passed to |
| q_max | the upper bound of the dimension of the low-dimensional embedding. Default is 50. |
| plot | an indicator of whether to plot the eigenvalues. |
| n.sims | the number of simulations. Default is 10. |
| parallel | an indicator of whether to run the parallel analysis on multiple cores. |
| ncores | the number of cores used in parallel analysis. Default is 10. |
| seed | a positive integer specifying the random seed for reproducibility. |
| assay | an optional string specifying the name of the assay in the Seurat object to be used. |
| slot | an optional string specifying the name of the slot. |
| nfeatures | an optional integer specifying the number of features to select as top variable features. Default is 2000. |
A data.frame with attributes 'q_est' and 'plot', where 'q_est' is the estimated dimension of the low-dimensional embedding. The returned object contains the following components:
q - The index of the eigenvalues.
eig_value - The eigenvalues of the observed data.
eig_sim - The mean of the eigenvalues over the n.sims simulated datasets.
q_est - The selected dimension, stored in attr(obj, 'q_est').
plot - The plot, stored in attr(obj, 'plot').
1. Franklin, S. B., Gibson, D. J., Robertson, P. A., Pohlmann, J. T., & Fralish, J. S. (1995). Parallel analysis: a method for determining significant principal components. Journal of Vegetation Science, 6(1), 99-106.
2. Crawford, A. V., Green, S. B., Levy, R., Lo, W. J., Scott, L., Svetina, D., & Thompson, M. S. (2010). Evaluation of parallel analysis methods for determining the number of factors. Educational and Psychological Measurement, 70(6), 885-901.
n <- 100
p <- 50
d <- 15
object <- matrix(rnorm(n*d), n, d) %*% matrix(rnorm(d*p), d, p)
diagnostic.cor.eigs(object, n.sims=2)
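The general idea of parallel analysis, as described in the cited references, can be sketched in base R: compare the eigenvalues of the observed correlation matrix with the average eigenvalues obtained from pure-noise data of the same size. This is a minimal illustration under assumed settings (toy data, Gaussian noise, a simple "first eigenvalue that fails to exceed the simulated mean" stopping rule); the names `eig_obs`, `eig_sim` and `q_est` mirror the returned components but the package's exact procedure may differ.

```r
set.seed(1)
n <- 200
p <- 30
d <- 3

# Toy data: a rank-d signal plus Gaussian noise (not from the package).
X <- matrix(rnorm(n * d), n, d) %*% matrix(rnorm(d * p), d, p) +
  matrix(rnorm(n * p), n, p)

# Eigenvalues of the observed correlation matrix.
eig_obs <- eigen(cor(X), symmetric = TRUE, only.values = TRUE)$values

# Mean eigenvalues over n_sims pure-noise datasets of the same size.
n_sims <- 10
eig_sim <- rowMeans(replicate(n_sims, {
  Z <- matrix(rnorm(n * p), n, p)
  eigen(cor(Z), symmetric = TRUE, only.values = TRUE)$values
}))

# Count the leading observed eigenvalues that exceed the simulated mean.
exceeds <- eig_obs > eig_sim
q_est <- if (all(exceeds)) p else which(!exceeds)[1] - 1
```

Averaging over more simulated datasets stabilizes `eig_sim`, which is the role the n.sims argument appears to play.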
Find the signature genes for each group of cell/spots based on coembedding distance and expression ratio.
find.signature.genes(seu, distce.assay = "distce", ident = NULL, expr.prop.cutoff = 0.1, assay = NULL, genes.use = NULL)
| Argument | Description |
|---|---|
| seu | a Seurat object with a coembedding in the reductions slot under the component name reduction. |
| distce.assay | an optional character specifying the name of the assay that contains the distance matrix between cells/spots and features, 'distce' (distance of coembeddings) by default. |
| ident | an optional character naming a column of the metadata that specifies the group of cells/spots. NULL by default, in which case Idents is used as the group. |
| expr.prop.cutoff | an optional positive real between 0 and 1 specifying the cutoff on the expression proportion of features, 0.1 by default. |
| assay | an optional character specifying the assay in seu; the default NULL means the default assay in seu. |
| genes.use | an optional string vector specifying the genes to consider as signature candidates. |
In each data.frame object of the returned value, the row.names are gene names, and these genes are sorted by decreasing order of 'distance'. Users can define the signature genes as the top n genes by distance whose 'expr.prop' is larger than a cutoff. We set the cutoff to 0.1.
return a list with each component a data.frame object having two columns: 'distance' and 'expr.prop'.
library(Seurat)
data(pbmc3k_subset)
pbmc3k_subset <- pdistance(pbmc3k_subset, reduction='ncfm')
df_list_rna <- find.signature.genes(pbmc3k_subset)
Obtain the top signature genes and related information.
get.top.signature.dat(df.list, ntop = 5, expr.prop.cutoff = 0.1)
| Argument | Description |
|---|---|
| df.list | a list that is obtained by the function |
| ntop | an optional positive integer specifying how many top signature genes are extracted, 5 by default. |
| expr.prop.cutoff | an optional positive real between 0 and 1 specifying the cutoff on the expression proportion of features, 0.1 by default. |
Using this function, we obtain the top signature genes and organize them into a data.frame. The 'row.names' are gene names. The column 'distance' is the distance between a gene (e.g., VPREB3) and the cells of a specific cell type (e.g., B cell), calculated from the coembeddings of genes and cells in the coembedding space. The smaller the distance, the stronger the association between the gene and the cell type. The column 'expr.prop' is the expression proportion of the gene (e.g., VPREB3) within the cell type (e.g., B cell). The column 'label' gives the cell type and the column 'gene' gives the gene name. From this data.frame object, we can see that 'VPREB3' is one of the top signature genes of B cells.
return a 'data.frame' object with four columns: 'distance','expr.prop', 'label' and 'gene'.
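The selection logic described above (keep genes whose expression proportion exceeds the cutoff, then take the genes closest to each cell type) can be sketched in base R. The helper `top_signatures`, the gene names and all numbers below are hypothetical, and the sketch assumes that a smaller distance means a stronger association; it is not the package's implementation of get.top.signature.dat.

```r
# Hypothetical per-cell-type tables with the two documented columns.
genes <- c("geneA", "geneB", "geneC", "geneD")
df_list <- list(
  Bcell = data.frame(distance = c(0.2, 0.5, 0.1, 0.9),
                     expr.prop = c(0.60, 0.05, 0.30, 0.80),
                     row.names = genes),
  Tcell = data.frame(distance = c(0.7, 0.3, 0.4, 0.2),
                     expr.prop = c(0.50, 0.90, 0.02, 0.40),
                     row.names = genes)
)

top_signatures <- function(df_list, ntop = 5, expr.prop.cutoff = 0.1) {
  out <- lapply(names(df_list), function(label) {
    df <- df_list[[label]]
    # Drop genes expressed in too few cells of this group.
    df <- df[df$expr.prop > expr.prop.cutoff, , drop = FALSE]
    # Closest genes first (assumption: smaller distance = stronger link).
    df <- head(df[order(df$distance), , drop = FALSE], ntop)
    df$label <- label
    df$gene <- rownames(df)
    df
  })
  res <- do.call(rbind, out)
  rownames(res) <- NULL
  res
}

dat_sig <- top_signatures(df_list, ntop = 2)
```

The result has the same four columns as the documented return value ('distance', 'expr.prop', 'label' and 'gene'), with at most `ntop` rows per cell type.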
library(Seurat)
data(pbmc3k_subset)
pbmc3k_subset <- pdistance(pbmc3k_subset, reduction='ncfm')
df_list_rna <- find.signature.genes(pbmc3k_subset)
dat.sig <- get.top.signature.dat(df_list_rna, ntop=5)
head(dat.sig)
Cell-feature coembedding for scRNA-seq data based on FAST model.
NCFM(object, assay = NULL, slot = "data", nfeatures = 2000, q = 10, reduction.name = "ncfm", weighted = FALSE, var.features = NULL)
| Argument | Description |
|---|---|
| object | a Seurat object. |
| assay | an optional string specifying the name of the assay in the Seurat object to be used; 'NULL' means the default assay of the object. |
| slot | an optional string specifying the name of the slot. |
| nfeatures | an optional integer specifying the number of features to select as top variable features. Default is 2000. |
| q | an optional positive integer specifying the dimension of the low-dimensional embeddings to compute and store. Default is 10. |
| reduction.name | an optional string specifying the dimensional reduction name, 'ncfm' by default. |
| weighted | an optional logical value specifying whether to use the weighted method. |
| var.features | an optional string vector specifying the variable features used to calculate the cell embedding. |
return a revised Seurat object with a new reduction slot reduction.name obtained by NCFM co-embedding method, where reduction.name is default as 'ncfm'.
data(pbmc3k_subset)
pbmc3k_subset <- NCFM(pbmc3k_subset)
This is a toy single-cell RNA-seq dataset, a subset of PBMC3K.
library(Seurat)
data(pbmc3k_subset)
head(pbmc3k_subset)
Calculate the cell-feature distance matrix based on coembeddings.
pdistance(object, reduction = "cofast", assay.name = "distce", eta = 1e-10)
| Argument | Description |
|---|---|
| object | a Seurat object. |
| reduction | an optional string specifying the dimensional reduction name, 'cofast' by default. |
| assay.name | an optional string specifying the name of the newly generated assay, 'distce' by default. |
| eta | an optional positive real, a small quantity used to avoid numerical errors, 1e-10 by default. |
This function calculates the distance matrix between cells/spots and features, and then stores it in a newly generated assay. This distance matrix is used in signature gene identification.
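As a rough illustration of a cell-feature distance matrix, the base-R sketch below computes Euclidean distances between two sets of embeddings that live in the same q-dimensional space. The helper `cross_dist`, the random embeddings, the choice of Euclidean distance and the way `eta` enters the formula are all assumptions made for illustration; the source does not state which distance pdistance uses.

```r
set.seed(1)
q <- 3
# Hypothetical embeddings of 5 cells and 4 genes in the same q-dim space.
cell_emb <- matrix(rnorm(5 * q), 5, q,
                   dimnames = list(paste0("cell", 1:5), NULL))
gene_emb <- matrix(rnorm(4 * q), 4, q,
                   dimnames = list(paste0("gene", 1:4), NULL))

cross_dist <- function(A, B, eta = 1e-10) {
  # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 * a'b, for every row pair (a, b).
  sq <- outer(rowSums(A^2), rowSums(B^2), "+") - 2 * tcrossprod(A, B)
  # Rounding can make tiny squared distances negative; clamp them at zero
  # and add eta so the square root stays well defined and positive.
  sqrt(pmax(sq, 0) + eta)
}

D <- cross_dist(gene_emb, cell_emb)  # genes in rows, cells in columns
```

Storing features in rows and cells in columns matches the usual layout of an assay, which is presumably why the distance matrix can be kept as a new assay.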
return a revised Seurat object with an assay slot 'assay.name'.
data(pbmc3k_subset)
pbmc3k_subset <- NCFM(pbmc3k_subset)
pbmc3k_subset <- pdistance(pbmc3k_subset, "ncfm")
A data.frame containing the top five signature genes for each cell type of PBMC3k.
library(Seurat)
data(top5_signatures)
head(top5_signatures)