This vignette introduces the FAST
workflow for the analysis of multiple simulated spatial transcriptomics
dataset. FAST workflow is based on the PRECASTObj object created in the
PRECAST
R package and the workflow of FAST is similar to
that of PRECAST; see (https://feiyoung.github.io/PRECAST/articles/PRECAST.BreastCancer.html)
for the workflow of PRECAST. The workflow of FAST consists of three
steps
We demonstrate the use of FAST to three simulated ST data that are here, which can be downloaded to the current working path by the following command:
githubURL <- "https://github.com/feiyoung/ProFAST/blob/main/vignettes_data/simu3.rds?raw=true"
download.file(githubURL,"simu3.rds",mode='wb')
Then load to R
The package can be loaded with the command:
First, we view the the three simulated spatial transcriptomics data with ST platform. There are 200 genes for each data batch and ~2000 spots in total
Check the content in simu3
.
We show how to create a PRECASTObject object step by step. First, we
create a Seurat list object using the count matrix and meta data of each
data batch. Although simu3
is a prepared Seurat list
object, we re-create a same objcet seuList to show the details.
row
and col
, which benefits the
identification of spaital coordinates by FAST## Get the gene-by-spot read count matrices
countList <- lapply(simu3, function(x) x[["RNA"]]@counts)
## Check the spatial coordinates: Yes, they are named as "row" and "col"!
head(simu3[[1]]@meta.data)
## Get the meta data of each spot for each data batch
metadataList <- lapply(simu3, function(x) x@meta.data)
## ensure the row.names of metadata in metaList are the same as that of colnames count matrix in countList
M <- length(countList)
for(r in 1:M){
row.names(metadataList[[r]]) <- colnames(countList[[r]])
}
## Create the Seurat list object
seuList <- list()
for(r in 1:M){
seuList[[r]] <- CreateSeuratObject(counts = countList[[r]], meta.data=metadataList[[r]], project = "FASTsimu")
}
Next, we use CreatePRECASTObject()
to create a
PRECASTObject object based on the Seurat list object
seuList
. Users are able to see https://feiyoung.github.io/PRECAST/articles/PRECAST.BreastCancer.html
for what is done in this function. Since this is a simulated dataset, we
used all the 200 genes by using a custom gene list
customGenelist=custom_genelist)
. We observe that there are
only 197 genes passing the filtering step.
## Create PRECASTObject
custom_genelist <- row.names(seuList[[1]])
set.seed(2023)
PRECASTObj <- CreatePRECASTObject(seuList, customGenelist=custom_genelist)
## User can retain the raw seuList by the following commond.
## PRECASTObj <- CreatePRECASTObject(seuList, customGenelist=row.names(seuList[[1]]), rawData.preserve = TRUE)
## check the number of genes/features after filtering step
PRECASTObj@seulist
Add adjacency matrix list and parameter setting of FAST. More model
setting parameters can be found in model_set_FAST()
.
## seuList is null since the default value `rawData.preserve` is FALSE.
PRECASTObj@seuList
## Add adjacency matrix list for a PRECASTObj object to prepare for FAST model fitting.
PRECASTObj <- AddAdjList(PRECASTObj, platform = "ST")
## Add a model setting in advance for a PRECASTObj object: verbose =TRUE helps outputing the information in the algorithm;
PRECASTObj <- AddParSettingFAST(PRECASTObj, verbose=TRUE)
## Check the parameters
PRECASTObj@parameterList
For function FAST
, users can specify the number of
factors q
and the fitted model fit.model
. The
q
sets the number of spatial factors to be extracted, and a
lareger one means more information to be extracted but higher
computaional cost. The fit.model
specifies the version of
FAST to be fitted. The Gaussian version (gaussian
) models
the log-count matrices while the Poisson verion (poisson
)
models the count matrices; default as poisson
.
Next, we investigate the performance of dimension reduction by
calculating the adjusted McFadden’s pseudo R-square for each data batch.
The simulated true labels is in the meta.data
of each
component of PRECASTObj@seulist
.
Based on the embeddings from FAST, we use Harmony
to
align the embeddings then followed by Louvain clustering to obtain the
cluster labels. In this downstream analysis, other methods for embedding
alignment and clustering can be also used. In the vignette of two
sections of DLPFC Visium data, we will show another method
(iSC-MEB
) to jointly perform embedding alignment and
spatial clustering.
In the following, we remove the unwanted variations in the log-normalized expression matrices to obtain a combined log-normalized expression matrix in a Seurat object. In the context of the simulated data used in this study, housekeeping genes are not present, thus, we turn to another method to remove the unwanted variations. Specifically, we leverage the batch effect embeddings estimated through Harmony to capture and mitigate unwanted variations. Additionally, we utilize the cluster labels obtained via Louvain clustering to retain the desired biological effects.
The estimated embeddings of batch effects (batchEmbed) are in the
slot PRECASTObj@resList$Harmony
and cluster labels
(cluster) are in the slot PRECASTObj@resList$Louvain
.
Then, we integrate the three sections by removing the unwanted
variations and setting seulist_HK=NULL
and
Method = "HarmonyLouvain"
in the function
IntegrateSRTData()
. After obtaining seuInt
, we
will see there are three embeddings: FAST
,
harmony
and position
, in the slot
seuInt@reductions
. FAST
are the embeddings
obtained by FAST model fitting and are uncorrected embeddings that may
includes the unwanted effects (i.e., batch effects);
harmony
are the embeddings obtained by Harmony fitting and
are aligned embeddings; and position
are the spatial
coordinates.
First, user can choose a beautiful color schema using
chooseColors()
in the R package PRECAST
.
Then, we plot the spatial scatter plot for clusters using the
function SpaPlot()
in the R package
PRECAST
.
p12 <- SpaPlot(seuInt, item = "cluster", batch = NULL, point_size = 1, cols = cols_cluster, combine = TRUE)
p12
Users can re-plot the above figures for specific need by returning a
ggplot list object. For example, we plot the spatial heatmap using a
common legend by using the function drawFigs()
in the R
package PRECAST
.
pList <- SpaPlot(seuInt, item = "cluster", title_name= 'Section',batch = NULL, point_size = 1, cols = cols_cluster, combine = FALSE)
drawFigs(pList, layout.dim = c(1, 3), common.legend = TRUE, legend.position = "right", align = "hv")
We use the function AddUMAP()
in the R package
PRECAST
to obtain the three-dimensional components of UMAP
using the aligned embeddings in the reduction harmony
.
We plot the spatial tNSE RGB plot to illustrate the performance in extracting features.
p13 <- SpaPlot(seuInt, batch = NULL, item = "RGB_UMAP", point_size = 1, combine = FALSE, text_size = 15)
drawFigs(p13, layout.dim = c(1, 3), common.legend = TRUE, legend.position = "right", align = "hv")
We use the function AddUTSNE()
in the R package
PRECAST
to obtain the two-dimensional components of UMAP
using the aligned embeddings in the reduction harmony
, and
plot the tSNE plot based on the extracted features to check the
performance of integration.
seuInt <- AddTSNE(seuInt, n_comp = 2, reduction = 'harmony', assay = 'RNA')
p1 <- dimPlot(seuInt, item = "cluster", point_size = 0.5, font_family = "serif", cols = cols_cluster,
border_col = "gray10", legend_pos = "right") # Times New Roman
p2 <- dimPlot(seuInt, item = "batch", point_size = 0.5, font_family = "serif", legend_pos = "right")
drawFigs(list(p1, p2), layout.dim = c(1, 2), legend.position = "right")
Finally, we condut the combined differential expression analysis
using the integrated log-normalized expression matrix saved in the
seuInt
object. The function FindAllMarkers()
in the Seurat
R package is ued to achieve this
analysis.
dat_deg <- FindAllMarkers(seuInt)
library(dplyr)
n <- 5
dat_deg %>%
group_by(cluster) %>%
top_n(n = n, wt = avg_log2FC) -> top10
seuInt <- ScaleData(seuInt)
Plot dot plot of normalized expressions for each spatial domain identified by using the FAST embeddings.
col_here <- c("#F2E6AB", "#9C0141")
library(ggplot2)
p1 <- DotPlot(seuInt, features=unname(top10$gene), cols=col_here, # idents = ident_here,
col.min = -1, col.max = 1) + coord_flip()+ theme(axis.text.y = element_text(face = "italic"))+
ylab("Domain") + xlab(NULL) + theme(axis.text.x = element_text(size=12, angle = 25, hjust = 1, family='serif'),
axis.text.y = element_text(size=12, face= "italic", family='serif'))
p1
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] rmarkdown_2.29
#>
#> loaded via a namespace (and not attached):
#> [1] digest_0.6.37 R6_2.5.1 fastmap_1.2.0 xfun_0.49
#> [5] maketools_1.3.1 cachem_1.1.0 knitr_1.49 htmltools_0.5.8.1
#> [9] buildtools_1.0.0 lifecycle_1.0.4 cli_3.6.3 sass_0.4.9
#> [13] jquerylib_0.1.4 compiler_4.4.2 sys_3.4.3 tools_4.4.2
#> [17] evaluate_1.0.1 bslib_0.8.0 yaml_2.3.10 jsonlite_1.8.9
#> [21] rlang_1.1.4