Package 'ProFAST' reference manual

Title:	Probabilistic Factor Analysis for Spatially-Aware Dimension Reduction
Description:	Probabilistic factor analysis for spatially-aware dimension reduction across multi-section spatial transcriptomics data with millions of spatial locations. For details, see Wei Liu et al. (2023) <doi:10.1101/2023.07.11.548486>.
Authors:	Wei Liu [aut, cre], Xiao Zhang [aut], Jin Liu [aut]
Maintainer:	Wei Liu <[email protected]>
License:	GPL-3
Version:	1.4
Built:	2025-03-09 06:03:03 UTC
Source:	https://github.com/feiyoung/profast

Calculate the adjacency matrix given a spatial coordinate matrix

Description

Calculate the adjacency matrix from a spatial coordinate matrix with two, three, or more dimensions.

Usage

AddAdj(
  pos,
  type = "fixed_distance",
  platform = c("Others", "Visium", "ST"),
  neighbors = 6,
  ...
)

Arguments

`pos`	a matrix object, with columns representing the spatial coordinates, which can be of any dimension, i.e., 2, 3 or more than 3.
`type`	an optional string, specify how the neighbors are defined. Two definitions are provided: one is "fixed_distance", the other is "fixed_number".
`platform`	a string, specify the platform of the provided data, default as "Others". The available platforms are "Visium", "ST" and "Others" ("Others" represents the SRT platforms other than 'Visium' and 'ST'). The platform helps to calculate the adjacency matrix by defining the neighborhoods when type="fixed_distance" is chosen.
`neighbors`	an optional positive integer, specify how many neighbors are used in the calculation, default as 6.
`...`	Other arguments passed to `getAdj_auto`.

Details

When type = "fixed_distance", the spots within a Euclidean distance cutoff of a given spot are regarded as the neighbors of this spot. When type = "fixed_number", the K nearest spots are regarded as the neighbors of each spot.
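
The two neighbor definitions can be illustrated with a minimal base-R sketch. It is not the implementation used by `AddAdj`; the toy grid, the cutoff value and the variable names (`adj_dist`, `adj_knn`) are assumptions made only for illustration.

```r
# Toy 2-D coordinates: a 3 x 3 unit grid (hypothetical data).
pos <- as.matrix(expand.grid(x = 1:3, y = 1:3))
D <- as.matrix(dist(pos))            # pairwise Euclidean distances
n <- nrow(pos)

# "fixed_distance": spots within a cutoff are neighbors.
cutoff <- 1.1                        # illustrative value, not a ProFAST default
adj_dist <- (D <= cutoff) & (D > 0)  # exclude self-pairs

# "fixed_number": the K nearest spots of each spot are its neighbors.
K <- 2
adj_knn <- matrix(FALSE, n, n)
for (i in seq_len(n)) {
  nn <- order(D[i, ])[2:(K + 1)]     # position 1 is the spot itself
  adj_knn[i, nn] <- TRUE
}

rowSums(adj_dist)  # neighbor counts depend on the position in the grid
rowSums(adj_knn)   # every spot has exactly K neighbors
```

Note that the fixed-distance adjacency is symmetric by construction, whereas a K-nearest-neighbor adjacency built this way need not be.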

Value

return a sparse matrix, representing the adjacency matrix.

References

None

Examples

data(CosMx_subset)
pos <- as.matrix(CosMx_subset@meta.data[,c("x", "y")])
Adj_sp <- AddAdj(pos)

Add FAST model settings for a PRECASTObj object

Description

Add FAST model settings for a PRECASTObj object

Usage

AddParSettingFAST(PRECASTObj, ...)

Arguments

`PRECASTObj`	a PRECASTObj object created by `CreatePRECASTObject`.
`...`	other arguments to be passed to `model_set_FAST` function.

Value

Return a revised PRECASTObj object with slot parameterList changed.

References

None

Coembedding dimensional reduction plot

Description

Graph output of a dimensional reduction technique on a 2D scatter plot where each point is a cell or feature, positioned according to the coembeddings determined by the reduction technique. By default, cells and their signature features are colored by their identity class (can be changed with the cell_label parameter).

Usage

coembed_plot(
  seu,
  reduction,
  gene_txtdata = NULL,
  cell_label = NULL,
  xy_name = reduction,
  dims = c(1, 2),
  cols = NULL,
  shape_cg = c(1, 5),
  pt_size = 1,
  pt_text_size = 5,
  base_size = 16,
  base_family = "serif",
  legend.point.size = 5,
  legend.key.size = 1.5,
  alpha = 0.3
)

Arguments

`seu`	a Seurat object with coembedding in the reductions slot with component name reduction.
`reduction`	a string, specify the reduction component that denotes coembedding.
`gene_txtdata`	a data.frame object with columns including 'gene' and 'label', specify the cell type/spatial domain and signature genes. Default as NULL, in which case all features are used in the coembeddings.
`cell_label`	an optional character in columns of metadata, specify the group of cells/spots. Default as NULL, use Idents as the group.
`xy_name`	an optional character, specify the names of x and y-axis, default as the same as reduction.
`dims`	a positive integer vector with length 2, specify the two components for visualization.
`cols`	an optional string vector, specify the colors for cell group in visualization.
`shape_cg`	a positive integer vector with length 2, specify the shapes of cell/spot and feature in the plot.
`pt_size`	an optional integer, specify the point size, default as 1.
`pt_text_size`	an optional integer, specify the point size of text, default as 5.
`base_size`	an optional integer, specify the basic size.
`base_family`	an optional character, specify the font.
`legend.point.size`	an optional integer, specify the point size of legend.
`legend.key.size`	an optional integer, specify the size of legend key.
`alpha`	an optional positive real, ranging from 0 to 1, specify the transparency of points.

Details

None

Value

return a ggplot object

References

None

Examples

data(pbmc3k_subset)
data(top5_signatures)
coembed_plot(pbmc3k_subset, reduction = "UMAPsig",
 gene_txtdata = top5_signatures,  pt_text_size = 3, alpha=0.3)

Calculate UMAP projections for coembedding of cells and features

Description

Calculate UMAP projections for coembedding of cells and features

Usage

coembedding_umap(
  seu,
  reduction,
  reduction.name,
  gene.set = NULL,
  slot = "data",
  assay = "RNA",
  seed = 1
)

Arguments

`seu`	a Seurat object with coembedding in the reductions slot with component name reduction.
`reduction`	a string, specify the reduction component that denotes coembedding.
`reduction.name`	a string, specify the reduction name for the obtained UMAP projection.
`gene.set`	a string vector, specify the features (genes) in calculating the UMAP projection, default as all features.
`slot`	an optional string, specify the slot in the assay, default as 'data'.
`assay`	an optional string, specify the assay name in the Seurat object when adding the UMAP projection.
`seed`	an optional integer, specify the random seed for reproducibility.

Details

None

Value

return a revised Seurat object by adding a new reduction component named 'reduction.name'.

References

None

Examples

data(pbmc3k_subset)
data(top5_signatures)

pbmc3k_subset <- coembedding_umap(
  pbmc3k_subset, reduction = "ncfm", reduction.name = "UMAPsig",
  gene.set = top5_signatures$gene
)

A Seurat object including spatial transcriptomics dataset from CosMx platform

Description

This data is a subset of SCLC CosMx spatial transcriptomics dataset.

Usage

data(CosMx_subset)

Format

A Seurat object, including count matrix, spatial coordinates, and manual annotation.

Source

The data is from the CosMx SRT sequencing platform.

References

None

Examples

# Show some examples of how to use the dataset.
data(CosMx_subset)
library(Seurat)
CosMx_subset

Determine the dimension of low dimensional embedding

Description

This function estimates the dimension of the low-dimensional embedding for a given cell-by-gene expression matrix. For more details, see Franklin et al. (1995) and Crawford et al. (2010).

Usage

diagnostic.cor.eigs(object, ...)

## Default S3 method:
diagnostic.cor.eigs(
  object,
  q_max = 50,
  plot = TRUE,
  n.sims = 10,
  parallel = TRUE,
  ncores = 10,
  seed = 1,
  ...
)

## S3 method for class 'Seurat'
diagnostic.cor.eigs(
  object,
  assay = NULL,
  slot = "data",
  nfeatures = 2000,
  q_max = 50,
  seed = 1,
  ...
)

Arguments

`object`	A Seurat or matrix object
`...`	Other arguments passed to `diagnostic.cor.eigs.default`.
`q_max`	the upper bound of low dimensional embedding. Default is 50.
`plot`	an indicator of whether to plot the eigenvalues.
`n.sims`	the number of simulations. Default is 10.
`parallel`	an indicator of whether to use parallel computation.
`ncores`	the number of cores used in parallel computation. Default is 10.
`seed`	a positive integer, specify the random seed for reproducibility.
`assay`	an optional string, specify the name of assay in the Seurat object to be used.
`slot`	an optional string, specify the name of slot.
`nfeatures`	an optional integer, specify the number of features to select as top variable features. Default is 2000.

Value

A data.frame with attributes 'q_est' and 'plot', where 'q_est' is the estimated dimension of the low-dimensional embedding. In addition, this data.frame contains the following components:

q - The index of eigen values.
eig_value - The eigen values on observed data.
eig_sim - The mean value of eigen values of n.sims simulated data.
q_est - The selected dimension in attr(obj, 'q_est').
plot - The plot saved in attr(obj, 'plot').
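
The idea of parallel analysis described in the cited references can be sketched in base R as follows. This is only a rough illustration under stated assumptions, not the code behind `diagnostic.cor.eigs`: the simulated data, the use of the correlation matrix, and the retention rule (keep the leading eigenvalues that exceed the mean eigenvalues of pure-noise data) are assumptions. The column names mirror the components listed above.

```r
set.seed(1)
n <- 100; p <- 50; d <- 5
# Hypothetical data: a rank-d signal plus unit-variance noise.
X <- matrix(rnorm(n * d), n, d) %*% matrix(rnorm(d * p), d, p) +
  matrix(rnorm(n * p), n, p)

q_max <- 20
n_sims <- 10

# Eigenvalues of the observed correlation matrix.
eig_value <- eigen(cor(X), symmetric = TRUE, only.values = TRUE)$values[1:q_max]

# Mean eigenvalues over n_sims simulated noise matrices of the same size.
eig_sim <- rowMeans(replicate(n_sims, {
  Z <- matrix(rnorm(n * p), n, p)
  eigen(cor(Z), symmetric = TRUE, only.values = TRUE)$values[1:q_max]
}))

res <- data.frame(q = 1:q_max, eig_value = eig_value, eig_sim = eig_sim)

# Assumed rule: count the leading eigenvalues that exceed the simulated mean.
q_est <- which(res$eig_value <= res$eig_sim)[1] - 1
q_est
```

Plotting `eig_value` and `eig_sim` against `q` gives the kind of diagnostic curve that the `plot` attribute is described as storing.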

References

1. Franklin, S. B., Gibson, D. J., Robertson, P. A., Pohlmann, J. T., & Fralish, J. S. (1995). Parallel analysis: a method for determining significant principal components. Journal of Vegetation Science, 6(1), 99-106.

2. Crawford, A. V., Green, S. B., Levy, R., Lo, W. J., Scott, L., Svetina, D., & Thompson, M. S. (2010). Evaluation of parallel analysis methods for determining the number of factors. Educational and Psychological Measurement, 70(6), 885-901.

Examples

n <- 100
p <- 50
d <- 15
object <- matrix(rnorm(n*d), n, d) %*% matrix(rnorm(d*p), d, p)
diagnostic.cor.eigs(object, n.sims=2)

Run FAST model for a PRECASTObj object

Description

Run FAST model for a PRECASTObj object

Usage

FAST(PRECASTObj, q = 15, fit.model = c("poisson", "gaussian"))

Arguments

`PRECASTObj`	a PRECASTObj object created by `CreatePRECASTObject`.
`q`	an optional integer, specify the number of low-dimensional embeddings to extract in FAST.
`fit.model`	an optional string, specify the version of FAST to be fitted. The Gaussian version models the log-count matrices while the Poisson version models the count matrices; default as poisson.

Value

Return a revised PRECASTObj object with a FAST component added to the slot PRECASTObj@resList.

References

None

(Variational) ICM-EM algorithm for implementing FAST model

Description

(Variational) ICM-EM algorithm for implementing FAST model

Usage

FAST_run(
  XList,
  AdjList,
  q = 15,
  fit.model = c("gaussian", "poisson"),
  AList = NULL,
  maxIter = 25,
  epsLogLik = 1e-05,
  verbose = TRUE,
  seed = 1,
  error_heter = TRUE,
  Psi_diag = FALSE,
  Vint_zero = FALSE
)

Arguments

`XList`	an M-length list consisting of multiple matrices with class `dgCMatrix` or `matrix` that specify the count/log-count gene expression matrix for each data batch used for FAST model.
`AdjList`	an M-length list of sparse matrices with class `dgCMatrix`, specify the adjacency matrix used for the intrinsic CAR model in FAST. We provide this interface for those users who would like to define the adjacency matrix by themselves.
`q`	an optional integer, specify the number of low-dimensional embeddings to extract in FAST. Larger q means more information extracted.
`fit.model`	an optional string, specify the version of FAST to be fitted. The Gaussian version models the log-count matrices while the Poisson version models the count matrices; default as `gaussian` due to faster computation.
`AList`	an optional list with each component being a vector whose length is equal to the number of rows of the corresponding component in `XList`, specify the normalization factor in FAST. The default is `NULL`, which means the normalization factors are equal to 1.
`maxIter`	the maximum iteration of ICM-EM algorithm. The default is 25.
`epsLogLik`	an optional positive value, tolerance of relative variation rate of the observed pseudo log-likelihood value, default as '1e-5'.
`verbose`	a logical value, whether to output the information in each iteration.
`seed`	a positive integer, the random seed to be set in initialization.
`error_heter`	a logical value, whether to use the heterogeneous error for FAST model, default as `TRUE`. If `error_heter=FALSE`, then the homogeneous error is used.
`Psi_diag`	a logical value, whether to set the conditional covariance matrix of the intrinsic CAR to diagonal, default as `FALSE`.
`Vint_zero`	an optional logical value, specify whether the initial value of the intrinsic CAR component is set to zero; default as `FALSE`.
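
As a rough guide to the input shapes described above, the following sketch builds toy `XList`, `AdjList` and `AList` objects for two hypothetical sections. The section sizes, the chain-graph adjacency and the Poisson counts are assumptions for illustration only, and the sketch stops short of calling `FAST_run`.

```r
library(Matrix)
set.seed(1)

n_spots <- c(6, 8)  # number of spots in two hypothetical sections
p <- 5              # number of genes (columns)

# One spots-by-genes count matrix per section.
XList <- lapply(n_spots, function(n) matrix(rpois(n * p, lambda = 3), n, p))

# One symmetric sparse adjacency matrix per section; here a simple chain
# graph in which spot i is adjacent to spot i + 1 (an assumed toy layout).
AdjList <- lapply(n_spots, function(n) {
  sparseMatrix(i = c(1:(n - 1), 2:n), j = c(2:n, 1:(n - 1)),
               x = 1, dims = c(n, n))
})

# One normalization vector per section, with one entry per spot.
AList <- lapply(n_spots, function(n) rep(1, n))

sapply(AdjList, class)
```

With real data, the adjacency matrices would more naturally come from `AddAdj` applied to each section's coordinates.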

Details

None

Value

return a list including the following components: (1) hV: an M-length list consisting of spatial embeddings in FAST; (2) nu: the estimated intercept vector; (3) Psi: the estimated covariance matrix; (4) W: the estimated shared loading matrix; (5) Lam: the estimated covariance matrix of error term; (6) ELBO: the ELBO value when the algorithm converges; (7) ELBO_seq: the ELBO values for all iterations.

References

None

Fit FAST model for single-section SRT data

Description

Fit FAST model for single-section SRT data.

Usage

FAST_single(
  seu,
  Adj_sp,
  q = 15,
  fit.model = c("poisson", "gaussian"),
  slot = "data",
  assay = NULL,
  reduction.name = "fast",
  verbose = TRUE,
  ...
)

Arguments

`seu`	a Seurat object.
`Adj_sp`	a sparse matrix, specify the adjacency matrix among spots.
`q`	an optional integer, specify the number of low-dimensional embeddings to extract in FAST. Larger q means more information extracted.
`fit.model`	an optional string, specify the version of FAST to be fitted. The Gaussian version models the log-count matrices while the Poisson version models the count matrices; default as the Poisson model.
`slot`	an optional string, specify the slot in Seurat object as the input of FAST model, default as 'data'.
`assay`	an optional string, specify the assay in Seurat object, default as 'NULL' that means the default assay in Seurat object.
`reduction.name`	an optional string, specify the reduction name for the fast embedding, default as 'fast'.
`verbose`	a logical value, whether output the information in iteration.
`...`	other arguments passed to `FAST_run`.

Value

return a Seurat object with new reduction (named reduction.name) added to the 'reductions' slot.

(Variational) ICM-EM algorithm for implementing FAST model with structured parameters

Description

(Variational) ICM-EM algorithm for implementing FAST model with structured parameters

Usage

FAST_structure(
  XList,
  AdjList,
  q = 15,
  fit.model = c("poisson", "gaussian"),
  parameterList = NULL
)

Arguments

`XList`	an M-length list consisting of multiple matrices with class dgCMatrix or matrix that specify the count/log-count gene expression matrix for each data batch used for FAST model.
`AdjList`	an M-length list of sparse matrices with class dgCMatrix, specify the adjacency matrix used for the intrinsic CAR model in FAST. We provide this interface for those users who would like to define the adjacency matrix by themselves.
`q`	an optional integer, specify the number of low-dimensional embeddings to extract in FAST.
`fit.model`	an optional string, specify the version of FAST to be fitted. The Gaussian version models the log-count matrices while the Poisson version models the count matrices; default as gaussian due to faster computation.
`parameterList`	an optional list, specify other parameters in FAST model; see `model_set_FAST` for other parameters. The default is `NULL`, which means the default parameters produced by `model_set_FAST` are used.

Details

None

Value

References

None

Find the signature genes for each group of cell/spots

Description

Find the signature genes for each group of cell/spots based on coembedding distance and expression ratio.

Usage

find.signature.genes(
  seu,
  distce.assay = "distce",
  ident = NULL,
  expr.prop.cutoff = 0.1,
  assay = NULL,
  genes.use = NULL
)

Arguments

`seu`	a Seurat object with coembedding in the reductions slot with component name reduction.
`distce.assay`	an optional character, specify the assay name that contains the distance matrix between cells/spots and features, default as 'distce' (distance of coembeddings).
`ident`	an optional character in columns of metadata, specify the group of cells/spots. Default as NULL, use Idents as the group.
`expr.prop.cutoff`	an optional positive real ranging from 0 to 1, specify cutoff of expression proportion of features, default as 0.1.
`assay`	an optional character, specify the assay in seu, default as NULL, representing the default assay in seu.
`genes.use`	an optional string vector, specify genes as the signature candidates.

Details

In each data.frame object of the returned value, the row.names are gene names, and these genes are sorted by decreasing order of 'distance'. Users can define the signature genes as the top n genes in distance whose 'expr.prop' is larger than a cutoff. We set the cutoff as 0.1.

Value

return a list with each component a data.frame object having two columns: 'distance' and 'expr.prop'.

References

None

Examples

library(Seurat)
data(pbmc3k_subset)
pbmc3k_subset <- pdistance(pbmc3k_subset, reduction='ncfm')
df_list_rna <- find.signature.genes(pbmc3k_subset)

Calculate the adjusted McFadden's pseudo R-square

Description

Calculate the adjusted McFadden's pseudo R-square between the embeddings and the labels

Usage

get_r2_mcfadden(embeds, y)

Arguments

`embeds`	an n-by-q matrix, specify the embedding matrix.
`y`	an n-length vector, specify the labels.

Details

None

Value

return the adjusted McFadden's pseudo R-square.
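
For orientation, the adjusted McFadden pseudo R-square is commonly written as 1 - (logLik_full - K) / logLik_null, where K counts the predictors in the full model. The sketch below computes it for a simulated two-class label with base-R `glm`. It is an assumption-laden illustration and not the code of `get_r2_mcfadden`: the simulated data and the choice K = q are assumptions, and multi-class labels would need a multinomial logit model instead.

```r
set.seed(1)
n <- 200; q <- 3
embeds <- matrix(rnorm(n * q), n, q)                         # toy embeddings
y <- rbinom(n, 1, plogis(1.5 * embeds[, 1] - embeds[, 2]))   # toy binary labels

fit_full <- glm(y ~ embeds, family = binomial())  # labels regressed on embeddings
fit_null <- glm(y ~ 1, family = binomial())       # intercept-only model

ll_full <- as.numeric(logLik(fit_full))
ll_null <- as.numeric(logLik(fit_null))
K <- q  # penalty term; conventions for counting parameters differ

r2 <- 1 - ll_full / ll_null              # McFadden's pseudo R-square
r2_adj <- 1 - (ll_full - K) / ll_null    # adjusted version
c(r2 = r2, r2_adj = r2_adj)
```

The adjustment penalizes embeddings with more dimensions, so the adjusted value is always below the unadjusted one.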

References

McFadden, D. (1987). Regression-based specification tests for the multinomial logit model. Journal of econometrics, 34(1-2), 63-82.

Obtain the top signature genes and related information

Description

Obtain the top signature genes and related information.

Usage

get.top.signature.dat(df.list, ntop = 5, expr.prop.cutoff = 0.1)

Arguments

`df.list`	a list that is obtained by the function `find.signature.genes`.
`ntop`	an optional positive integer, specify how many top signature genes are extracted, default as 5.
`expr.prop.cutoff`	an optional positive real ranging from 0 to 1, specify cutoff of expression proportion of features, default as 0.1.

Details

Using this function, we obtain the top signature genes and organize them into a data.frame. The 'row.names' are gene names. The column 'distance' is the distance between a gene (e.g., VPREB3) and the cells of a specific cell type (e.g., B cell), calculated from the coembedding of genes and cells in the coembedding space. The smaller the distance, the stronger the association between the gene and the cell type. The column 'expr.prop' is the expression proportion of the gene (e.g., VPREB3) within the cell type (e.g., B cell). The column 'label' gives the cell type and the column 'gene' gives the gene name. From the data.frame object, we know that 'VPREB3' is one of the top signature genes of B cell.
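
The selection logic described above can be sketched on a hand-made list. This is an illustrative approximation rather than the implementation of `get.top.signature.dat`: the gene names, values and the helper object `top_list` are hypothetical, and ranking by smallest distance is an assumption based on the statement that a smaller distance means a stronger association.

```r
# Hypothetical stand-in for the list returned by find.signature.genes().
df_list <- list(
  "B cell" = data.frame(distance  = c(0.2, 0.5, 0.9, 1.4),
                        expr.prop = c(0.60, 0.05, 0.30, 0.80),
                        row.names = c("geneA", "geneB", "geneC", "geneD")),
  "T cell" = data.frame(distance  = c(0.1, 0.3, 0.7),
                        expr.prop = c(0.40, 0.20, 0.02),
                        row.names = c("geneE", "geneF", "geneG"))
)

ntop <- 2
expr.prop.cutoff <- 0.1

top_list <- lapply(names(df_list), function(lab) {
  df <- df_list[[lab]]
  df <- df[df$expr.prop > expr.prop.cutoff, , drop = FALSE]  # expression filter
  df <- df[order(df$distance), , drop = FALSE]               # closest genes first
  df <- head(df, ntop)                                       # keep the top ntop
  df$label <- lab
  df$gene <- rownames(df)
  df
})
dat_sig <- do.call(rbind, top_list)
dat_sig
```

The result has the four columns named in the Value section below, with one block of rows per cell type.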

Value

return a 'data.frame' object with four columns: 'distance','expr.prop', 'label' and 'gene'.

References

None

Examples

library(Seurat)
data(pbmc3k_subset)
pbmc3k_subset <- pdistance(pbmc3k_subset, reduction='ncfm')
df_list_rna <- find.signature.genes(pbmc3k_subset)
dat.sig <- get.top.signature.dat(df_list_rna, ntop=5)
head(dat.sig)

Integrate multiple SRT data into a Seurat object

Description

Integrate multiple SRT data based on the PRECASTObj object by FAST and other model fitting.

Usage

IntegrateSRTData(
  PRECASTObj,
  seulist_HK,
  Method = c("iSC-MEB", "HarmonyLouvain"),
  seuList_raw = NULL,
  covariates_use = NULL,
  Tm = NULL,
  subsample_rate = 1,
  verbose = TRUE
)

Arguments

`PRECASTObj`	a PRECASTObj object created by `CreatePRECASTObject`.
`seulist_HK`	a list of Seurat objects, each including only the housekeeping genes.
`Method`	a string, specify the method to be used; two methods are supported: `iSC-MEB` and `HarmonyLouvain`. The default is `iSC-MEB`.
`seuList_raw`	an optional list of Seurat objects, the raw data.
`covariates_use`	a string vector, the colnames in `PRECASTObj@seulist[[1]]@meta.data`, representing other biological covariates to be considered when removing batch effects. This is achieved by adding additional covariates for biological conditions in the regression, such as case or control. Default as 'NULL', denoting no other covariates to be considered.
`Tm`	an optional numeric vector with the length equal to `PRECASTObj@seulist`, the time point information if the data include the temporal information. Default as `NULL`, which means there is no temporal information.
`subsample_rate`	a real ranging in (0,1], specify the rate of spot drawing for speeding up the computation when the number of spots is very large. Default is 1, meaning that all spots are used.
`verbose`	an optional logical value, default as `TRUE`.

Details

If seuList_raw is not NULL or PRECASTObj@seuList is not NULL, this function will remove the unwanted variations for all genes in the seuList_raw object. Otherwise, only the unwanted variation of genes in PRECASTObj@seulist will be removed. The former requires a large amount of memory to run, while the latter does not. To speed up the computation when the number of spots is very large, we also provide a subsampling scheme controlled by the argument subsample_rate. When the total number of spots is larger than 80,000, this function will automatically draw 50,000 spots to calculate the parameters in the spatial linear model for removing unwanted variations.

Value

Return a Seurat object by integrating all SRT data batches into a SRT data, where the column "batch" in the meta.data represents the batch ID, and the column "cluster" represents the clusters. The embeddings are put in seu@reductions slot and Idents(seu) is set to cluster label. Note that only the normalized expression is valid in the data slot while count is invalid.

Fit an iSC-MEB model using specified multi-section embeddings

Description

Integrate multiple SRT data based on the PRECASTObj by FAST and iSC-MEB model fitting.

Usage

iscmeb_run(
  VList,
  AdjList,
  K,
  beta_grid = seq(0, 5, by = 0.2),
  maxIter = 25,
  epsLogLik = 1e-05,
  verbose = TRUE,
  int.model = "EEE",
  init.start = 1,
  Sigma_equal = FALSE,
  Sigma_diag = TRUE,
  seed = 1
)

Arguments

`VList`	an M-length list of embeddings. The i-th element is an ni * q matrix, where ni is the number of spots of sample i, and q is the number of embeddings. We provide this interface for those users who would like to define the embeddings by themselves.
`AdjList`	an M-length list of sparse matrices with class `dgCMatrix`, specify the adjacency matrix used for the intrinsic CAR model in FAST. We provide this interface for those users who would like to define the adjacency matrix by themselves.
`K`	an integer, specify the number of clusters.
`beta_grid`	an optional vector of positive values, the candidate set of the smoothing parameter to be searched by the grid-search optimization approach, default as a sequence that starts from 0, ends with 5, and increases by 0.2.
`maxIter`	the maximum iteration of ICM-EM algorithm. The default is 25.
`epsLogLik`	an optional positive value, tolerance of relative variation rate of the observed pseudo log-likelihood value, default as '1e-5'.
`verbose`	a logical value, whether to output the information in each iteration.
`int.model`	an optional string, specify which Gaussian mixture model is used in evaluating the initial values for iSC.MEB, default as "EEE"; see `Mclust` for the names of other models.
`init.start`	an optional number of times to calculate the initial value (1 by default). When init.start is larger than 1, the initial value will be determined by the log-likelihood of the mclust results.
`Sigma_equal`	an optional logical value, specify whether Sigmaks are equal, default as `FALSE`.
`Sigma_diag`	an optional logical value, specify whether Sigmaks are diagonal matrices, default as `TRUE`.
`seed`	an optional integer, the random seed in fitting iSC-MEB model.

Value

returns an iSCMEBResObj object which contains all model results.

Set parameters for FAST model

Description

Prepare parameters setup for FAST model fitting.

Usage

model_set_FAST(
  maxIter = 30,
  epsLogLik = 1e-05,
  error_heter = TRUE,
  Psi_diag = FALSE,
  verbose = TRUE,
  seed = 1
)

Arguments

`maxIter`	the maximum iteration of ICM-EM algorithm. The default is 30.
`epsLogLik`	an optional positive value, tolerance of relative variation rate of the observed pseudo log-likelihood value, default as '1e-5'.
`error_heter`	a logical value, whether to use the heterogeneous error for FAST model, default as `TRUE`. If `error_heter=FALSE`, then the homogeneous error is used.
`Psi_diag`	a logical value, whether to set the conditional covariance matrices of the intrinsic CAR to diagonal, default as `FALSE`.
`verbose`	a logical value, whether to output the information in each iteration.
`seed`	a positive integer, the random seed to be set in initialization.

Value

return a list including the parameters set in the arguments.

Examples

model_set_FAST(maxIter = 30, epsLogLik = 1e-5,
  error_heter=TRUE, Psi_diag=FALSE, verbose=TRUE, seed=2023)

Cell-feature coembedding for scRNA-seq data

Description

Cell-feature coembedding for scRNA-seq data based on FAST model.

Usage

NCFM(
  object,
  assay = NULL,
  slot = "data",
  nfeatures = 2000,
  q = 10,
  reduction.name = "ncfm",
  weighted = FALSE,
  var.features = NULL
)

Arguments

`object`	a Seurat object.
`assay`	an optional string, specify the name of assay in the Seurat object to be used, 'NULL' means default assay in seu.
`slot`	an optional string, specify the name of slot.
`nfeatures`	an optional integer, specify the number of features to select as top variable features. Default is 2000.
`q`	an optional positive integer, specify the dimension of low dimensional embeddings to compute and store. Default is 10.
`reduction.name`	an optional string, specify the dimensional reduction name, 'ncfm' by default.
`weighted`	an optional logical value, specify whether to use the weighted method.
`var.features`	an optional string vector, specify the variable features used to calculate cell embedding.

Examples

data(pbmc3k_subset)
pbmc3k_subset <- NCFM(pbmc3k_subset)

Cell-feature coembedding for SRT data

Description

Run cell-feature coembedding for SRT data based on FAST model.

Usage

NCFM_fast(
  object,
  Adj_sp,
  assay = NULL,
  slot = "data",
  nfeatures = 2000,
  q = 10,
  reduction.name = "fast",
  var.features = NULL,
  ...
)

Arguments

`object`	a Seurat object.
`Adj_sp`	a sparse matrix, specify the adjacency matrix among spots.
`assay`	an optional string, the name of assay used.
`slot`	an optional string, the name of slot used.
`nfeatures`	an optional positive integer, the number of features to select as top variable features. Default is 2000.
`q`	an optional positive integer, specify the dimension of low dimensional embeddings to compute and store. Default is 10.
`reduction.name`	an optional string, dimensional reduction name, 'fast' by default.
`var.features`	an optional string vector, specify the variable features, used to calculate cell embedding.
`...`	Other arguments passed to `FAST_run`.

Examples

data(CosMx_subset)
pos <- as.matrix(CosMx_subset@meta.data[,c("x", "y")])
Adj_sp <- AddAdj(pos)
# Here, we set maxIter = 3 for fast computation and demonstration.
CosMx_subset <- NCFM_fast(CosMx_subset, Adj_sp = Adj_sp, maxIter=3)

A Seurat object including scRNA-seq PBMC dataset

Description

This data is a subset of PBMC3k scRNA-seq data in SeuratData package.

Usage

data(pbmc3k_subset)

Format

A Seurat object, including count matrix, and manual annotation.

Source

The data is from the scRNA-seq sequencing platform.

References

None

Examples

# Show examples of how to use the dataset.
data(pbmc3k_subset)
library(Seurat)
pbmc3k_subset

Calculate the cell-feature distance matrix

Description

Calculate the cell-feature distance matrix based on coembeddings.

Usage

pdistance(object, reduction = "fast", assay.name = "distce", eta = 1e-10)

Arguments

`object`	a Seurat object.
`reduction`	an optional string, dimensional reduction name, 'fast' by default.
`assay.name`	an optional string, specify the name of the newly generated assay, 'distce' by default.
`eta`	an optional positive real, a quantity to avoid numerical errors. 1e-10 by default.

Details

This function calculates the distance matrix between cells/spots and features, and then puts the distance matrix in a newly generated assay. This distance matrix will be used in the signature gene identification.
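
To make the idea of a cell-feature distance matrix concrete, the sketch below computes Euclidean distances between toy feature and cell coembeddings in base R. The choice of the Euclidean metric, the toy embeddings and the object names (`cell_embed`, `feat_embed`, `dist_fc`) are assumptions; the manual does not state which metric `pdistance` uses or exactly how `eta` enters the calculation.

```r
set.seed(1)
q <- 3
# Hypothetical coembeddings: 5 cells and 4 features in the same q-dim space.
cell_embed <- matrix(rnorm(5 * q), 5, q,
                     dimnames = list(paste0("cell", 1:5), NULL))
feat_embed <- matrix(rnorm(4 * q), 4, q,
                     dimnames = list(paste0("gene", 1:4), NULL))

# Squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 * a'b.
sq <- outer(rowSums(feat_embed^2), rowSums(cell_embed^2), "+") -
  2 * feat_embed %*% t(cell_embed)

# Clamp tiny negative values caused by rounding before taking the root.
dist_fc <- sqrt(pmax(sq, 0))
dim(dist_fc)  # features in rows, cells in columns
```

Averaging such distances over the cells of one group gives a per-gene summary of how close each feature lies to that group.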

Examples

data(pbmc3k_subset)
pbmc3k_subset <- NCFM(pbmc3k_subset)
pbmc3k_subset <- pdistance(pbmc3k_subset, "ncfm")

Embedding alignment and clustering based on the embeddings from FAST

Description

Embedding alignment and clustering using Harmony and Louvain based on the embeddings from FAST, as well as determining the number of clusters.

Usage

RunHarmonyLouvain(PRECASTObj, resolution = 0.5)

Arguments

`PRECASTObj`	a PRECASTObj object created by `CreatePRECASTObject`.
`resolution`	an optional real, the value of the resolution parameter; use a value above (below) 1.0 to obtain a larger (smaller) number of communities. 0.5 by default.

Value

Return a revised PRECASTObj object whose slot PRECASTObj@resList gains a Harmony component (including the aligned embeddings and embeddings of batch effects) and a Louvain component (including the clusters).

Fit an iSC-MEB model using the embeddings from FAST

Description

Fit an iSC-MEB model using the embeddings from FAST and the number of clusters obtained by Louvain.

Usage

RuniSCMEB(PRECASTObj, ...)

Arguments

`PRECASTObj`	a PRECASTObj object created by `CreatePRECASTObject`.
`...`	other arguments passed to `iscmeb_run`.

Value

Return a revised PRECASTObj object with an added component iSCMEB in the slot PRECASTObj@resList (including the aligned embeddings, clusters and posterior probability matrix of clusters).

Select housekeeping genes

Description

Select housekeeping genes in preparation for removing unwanted variation from expression matrices.

Usage

SelectHKgenes(seuList, species = c("Human", "Mouse"), HK.number = 200)

Arguments

`seuList`	an M-length list of Seurat objects, each including the expression matrix and the spatial coordinates (named `row` and `col`) in the slot `meta.data`.
`species`	a string, the species, one of 'Human' and 'Mouse'.
`HK.number`	an optional integer, specify the number of housekeeping genes to be selected.

Value

Return a string vector of the selected gene names.

A data.frame object including top five signature genes in scRNA-seq PBMC dataset

Description

This dataset is a data.frame object that includes the top five signature genes in the scRNA-seq PBMC dataset.

Usage

data(top5_signatures)

Format

A data.frame object, including signature genes, distance, and manual annotation.

Source

None

References

None

Examples

# Show examples of how to use the dataset.
data(top5_signatures)
head(top5_signatures)

Transfer gene names from one format to another

Description

Transfer gene names from one format to another for two species: human and mouse.

Usage

transferGeneNames(
  genelist,
  now_name = "ensembl",
  to_name = "symbol",
  species = c("Human", "Mouse"),
  Method = c("eg.db", "biomart")
)

Arguments

`genelist`	a string vector, the gene list to be transferred.
`now_name`	a string, the current format of gene names, one of 'ensembl', 'symbol'.
`to_name`	a string, the format of gene names to transfer, one of 'ensembl', 'symbol'.
`species`	a string, the species, one of 'Human' and 'Mouse'.
`Method`	a string, the method to use, one of 'eg.db' and 'biomart', default as 'eg.db'.

Value

Return a string vector of transferred gene names. Gene names with no match in the database are returned unchanged.
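The pass-through behaviour for unmatched names can be pictured with a small base-R sketch. It does not call `transferGeneNames` or any annotation database; the lookup table, the identifiers, and the helper name `map_gene_names` are hypothetical and serve only to show the idea of "translate what matches, keep the rest".

```r
# Hypothetical lookup table: names are the current IDs, values the target IDs.
lookup <- c(ID_A = "SYM_A", ID_B = "SYM_B")

# Hypothetical helper (not part of ProFAST): translate matched names and
# leave unmatched names as they are.
map_gene_names <- function(genes, lookup) {
  mapped <- unname(lookup[genes])        # NA where a gene has no entry
  ifelse(is.na(mapped), genes, mapped)   # fall back to the original name
}

out <- map_gene_names(c("ID_A", "ID_X", "ID_B"), lookup)
out
# "SYM_A" "ID_X" "SYM_B"
```

Because unmatched entries keep their original form, the output can mix both naming formats, so it may be worth checking which names actually changed before using the result downstream.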

Examples

geneNames <- c("ENSG00000171885", "ENSG00000115756")
transferGeneNames(geneNames, now_name = "ensembl", to_name="symbol",species="Human", Method='eg.db')
