---
title: 'FAST: simulation'
author: "Wei Liu"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{FAST: simulation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

This vignette introduces the FAST workflow for the analysis of multiple simulated spatial transcriptomics datasets. The FAST workflow is based on the PRECASTObj object created in the `PRECAST` R package and is similar to the PRECAST workflow; see (https://feiyoung.github.io/PRECAST/articles/PRECAST.BreastCancer.html) for the workflow of PRECAST.

The workflow of FAST consists of three steps:

* Independent preprocessing and model setting
* Spatial dimension reduction using FAST
* Downstream analysis (i.e., embedding alignment using Harmony and clustering using Louvain, visualization of clusters and embeddings, removal of unwanted variation, combined differential expression analysis)

In addition to the downstream analyses above, cell-cell interaction analysis and trajectory inference can also be performed on the basis of the embeddings obtained by FAST.

We demonstrate the use of FAST on three simulated ST datasets, available [here](https://github.com/feiyoung/ProFAST/tree/main/vignettes_data), which can be downloaded to the current working directory with the following command:

```{r eval=FALSE}
githubURL <- "https://github.com/feiyoung/ProFAST/blob/main/vignettes_data/simu3.rds?raw=true"
download.file(githubURL, "simu3.rds", mode = 'wb')
```

Then load the data into R:

```{r eval = FALSE}
load("simu3.rds")
## If load() fails because the file was written with saveRDS(), use instead:
## simu3 <- readRDS("simu3.rds")
```

The packages can be loaded with the following commands:

```{r eval = FALSE}
library(ProFAST) # load the package of FAST method
library(PRECAST)
library(Seurat)
```

## View the simulated data

First, we view the three simulated spatial transcriptomics datasets from the ST platform.
There are 200 genes in each data batch and ~2000 spots in total.

```{r eval = FALSE}
simu3 ## a list of three Seurat objects with default assay: RNA
```

Check the content of `simu3`.

```{r eval= FALSE}
head(simu3[[1]])
```

```{r eval= FALSE}
row.names(simu3[[1]])[1:10]
```

## Create a PRECASTObject object

We show how to create a PRECASTObject object step by step. First, we create a Seurat list object using the count matrix and meta data of each data batch. Although `simu3` is already a prepared Seurat list object, we re-create an identical object, `seuList`, to show the details.

* Note: the spatial coordinates must be stored in the meta data and named `row` and `col`, so that FAST can identify them.

```{r eval = FALSE}
## Get the gene-by-spot read count matrices
## (with Seurat v5 Assay5 objects, the counts may instead need
##  GetAssayData(x, assay = "RNA", layer = "counts"))
countList <- lapply(simu3, function(x) x[["RNA"]]@counts)

## Check the spatial coordinates: yes, they are named "row" and "col"!
head(simu3[[1]]@meta.data)

## Get the meta data of each spot for each data batch
metadataList <- lapply(simu3, function(x) x@meta.data)

## Ensure the row.names of each element of metadataList match
## the colnames of the corresponding count matrix in countList
M <- length(countList)
for(r in seq_len(M)){
  row.names(metadataList[[r]]) <- colnames(countList[[r]])
}

## Create the Seurat list object
seuList <- list()
for(r in seq_len(M)){
  seuList[[r]] <- CreateSeuratObject(counts = countList[[r]], meta.data = metadataList[[r]], project = "FASTsimu")
}
```

### Prepare the PRECASTObject with a preprocessing step

Next, we use `CreatePRECASTObject()` to create a PRECASTObject object based on the Seurat list object `seuList`. See https://feiyoung.github.io/PRECAST/articles/PRECAST.BreastCancer.html for what this function does. Since this is a simulated dataset, we use all 200 genes by passing a custom gene list via `customGenelist=custom_genelist`. We observe that only 197 genes pass the filtering step.
```{r eval = FALSE}
## Create PRECASTObject
custom_genelist <- row.names(seuList[[1]])
set.seed(2023)
PRECASTObj <- CreatePRECASTObject(seuList, customGenelist=custom_genelist)

## Users can retain the raw seuList with the following command.
## PRECASTObj <- CreatePRECASTObject(seuList, customGenelist=row.names(seuList[[1]]), rawData.preserve = TRUE)

## Check the number of genes/features after the filtering step
PRECASTObj@seulist
```

## Fit FAST using simulated data

### Add the model setting

Add the adjacency matrix list and the parameter setting of FAST. More model setting parameters can be found in `model_set_FAST()`.

```{r eval = FALSE}
## seuList is NULL since the default value of `rawData.preserve` is FALSE.
PRECASTObj@seuList

## Add the adjacency matrix list to the PRECASTObj object to prepare for FAST model fitting.
PRECASTObj <- AddAdjList(PRECASTObj, platform = "ST")

## Add a model setting in advance for the PRECASTObj object:
## verbose = TRUE prints progress information from the algorithm
PRECASTObj <- AddParSettingFAST(PRECASTObj, verbose=TRUE)

## Check the parameters
PRECASTObj@parameterList
```

### Fit FAST

For the function `FAST`, users can specify the number of factors `q` and the fitted model `fit.model`. The argument `q` sets the number of spatial factors to be extracted; a larger value extracts more information at a higher computational cost. The argument `fit.model` specifies the version of FAST to be fitted. The Gaussian version (`gaussian`) models the log-count matrices, while the Poisson version (`poisson`) models the count matrices; the default is `poisson`.

```{r eval = FALSE}
### set q = 20 here
set.seed(2023)
PRECASTObj <- FAST(PRECASTObj, q=20)

### Check the results
str(PRECASTObj@resList)
```
**Run Gaussian version**

Users can also use the Gaussian version with the following command:

```{r eval =FALSE}
set.seed(2023)
PRECASTObj <- FAST(PRECASTObj, q=20, fit.model='gaussian')

### Check the results
str(PRECASTObj@resList)
```
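To make the difference between the two inputs concrete, the following base R sketch builds a small toy count matrix and derives a log-count matrix from it. The toy data, the object names, and the scaling factor of 10,000 are illustrative assumptions; this is not necessarily the normalization that FAST applies internally.

```{r eval = FALSE}
## Toy example (hypothetical data): 5 genes x 4 spots of Poisson counts
set.seed(1)
counts <- matrix(rpois(20, lambda = 3), nrow = 5,
                 dimnames = list(paste0("gene", 1:5), paste0("spot", 1:4)))

## A Poisson-type model takes the raw integer counts as input
counts

## A Gaussian-type model takes a log-transformed matrix as input;
## here: scale each spot by its library size (assumed factor 1e4), then log1p
lib_size <- colSums(counts)
log_counts <- log1p(sweep(counts, 2, lib_size, "/") * 1e4)
round(log_counts, 2)
```

Both matrices have the same gene-by-spot layout; only the scale of the entries differs, which is what the choice of `fit.model` has to match.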
### Evaluate the adjusted McFadden's pseudo R-square

Next, we investigate the performance of dimension reduction by calculating the adjusted McFadden's pseudo R-square for each data batch. The simulated true labels are stored in the `meta.data` of each component of `PRECASTObj@seulist`.

```{r eval = FALSE}
## Obtain the true labels
yList <- lapply(PRECASTObj@seulist, function(x) x$Group)

### Evaluate the MacR2
MacVec <- sapply(seq_along(PRECASTObj@seulist), function(r) get_r2_mcfadden(PRECASTObj@resList$FAST$hV[[r]], yList[[r]]))

### Output them
print(MacVec)
```

## Embedding alignment and clustering using Harmony and Louvain

Based on the embeddings from FAST, we use `Harmony` to align the embeddings and then apply Louvain clustering to obtain the cluster labels. Other methods for embedding alignment and clustering can also be used in this downstream analysis. In the vignette on two sections of DLPFC Visium data, we will show another method (`iSC-MEB`) that jointly performs embedding alignment and spatial clustering.

```{r eval = FALSE}
PRECASTObj <- RunHarmonyLouvain(PRECASTObj, resolution = 0.4)
ARI_vec <- sapply(seq_len(M), function(r) mclust::adjustedRandIndex(PRECASTObj@resList$Louvain$cluster[[r]], yList[[r]]))
print(ARI_vec)
```

## Remove unwanted variations in the log-normalized expression matrices

In the following, we remove the unwanted variations in the log-normalized expression matrices to obtain a combined log-normalized expression matrix in a Seurat object. Because the simulated data contain no housekeeping genes, we turn to another method to remove the unwanted variations. Specifically, we use the batch effect embeddings estimated by Harmony to capture and mitigate unwanted variations. Additionally, we use the cluster labels obtained by Louvain clustering to retain the desired biological effects.
The estimated embeddings of batch effects (batchEmbed) are in the slot `PRECASTObj@resList$Harmony`, and the cluster labels (cluster) are in the slot `PRECASTObj@resList$Louvain`.

```{r eval = FALSE}
str(PRECASTObj@resList$Harmony)
str(PRECASTObj@resList$Louvain)
```

Then, we integrate the three sections by removing the unwanted variations, setting `seulist_HK=NULL` and `Method = "HarmonyLouvain"` in the function `IntegrateSRTData()`. After obtaining `seuInt`, we will see three embeddings, `FAST`, `harmony` and `position`, in the slot `seuInt@reductions`. `FAST` contains the embeddings obtained by FAST model fitting; these are uncorrected and may include unwanted effects (i.e., batch effects). `harmony` contains the aligned embeddings obtained by Harmony fitting, and `position` contains the spatial coordinates.

```{r eval = FALSE}
seulist_HK <- NULL
seuInt <- IntegrateSRTData(PRECASTObj, seulist_HK=seulist_HK, Method = "HarmonyLouvain",
                           seuList_raw=NULL, covariates_use=NULL, verbose=TRUE)
seuInt
```

## Visualization

First, users can choose a color scheme using `chooseColors()` in the R package `PRECAST`.

```{r eval = FALSE}
cols_cluster <- chooseColors(palettes_name = "Nature 10", n_colors = 7, plot_colors = TRUE)
```

Then, we draw the spatial scatter plot of the clusters using the function `SpaPlot()` in the R package `PRECAST`.

```{r eval = FALSE, fig.width=8, fig.height=6.3}
p12 <- SpaPlot(seuInt, item = "cluster", batch = NULL, point_size = 1, cols = cols_cluster, combine = TRUE)
p12
```

Users can re-draw the above figures for specific needs by returning a list of ggplot objects. For example, we plot the spatial heatmap with a common legend using the function `drawFigs()` in the R package `PRECAST`.
```{r eval = FALSE, fig.width=9, fig.height=3.3}
pList <- SpaPlot(seuInt, item = "cluster", title_name = 'Section', batch = NULL, point_size = 1, cols = cols_cluster, combine = FALSE)
drawFigs(pList, layout.dim = c(1, 3), common.legend = TRUE, legend.position = "right", align = "hv")
```

We use the function `AddUMAP()` in the R package `PRECAST` to obtain the three-dimensional UMAP components from the aligned embeddings in the reduction `harmony`.

```{r eval = FALSE}
seuInt <- AddUMAP(seuInt, n_comp=3, reduction = 'harmony', assay = 'RNA')
seuInt
```

We draw the spatial UMAP RGB plot to illustrate the performance in extracting features.

```{r eval = FALSE, fig.width=9, fig.height=3.3}
p13 <- SpaPlot(seuInt, batch = NULL, item = "RGB_UMAP", point_size = 1, combine = FALSE, text_size = 15)
drawFigs(p13, layout.dim = c(1, 3), common.legend = TRUE, legend.position = "right", align = "hv")
```

We use the function `AddTSNE()` in the R package `PRECAST` to obtain the two-dimensional tSNE components from the aligned embeddings in the reduction `harmony`, and draw the tSNE plot based on the extracted features to check the performance of integration.

```{r eval = FALSE, fig.width=9, fig.height=3.3}
seuInt <- AddTSNE(seuInt, n_comp = 2, reduction = 'harmony', assay = 'RNA')
p1 <- dimPlot(seuInt, item = "cluster", point_size = 0.5, font_family = "serif", cols = cols_cluster,
              border_col = "gray10", legend_pos = "right") # Times New Roman
p2 <- dimPlot(seuInt, item = "batch", point_size = 0.5, font_family = "serif", legend_pos = "right")

drawFigs(list(p1, p2), layout.dim = c(1, 2), legend.position = "right")
```

## Combined differential expression analysis

Finally, we conduct the combined differential expression analysis using the integrated log-normalized expression matrix saved in the `seuInt` object. The function `FindAllMarkers()` in the `Seurat` R package is used to perform this analysis.
```{r eval = FALSE}
dat_deg <- FindAllMarkers(seuInt)
library(dplyr)
## Keep the top n genes per cluster, ranked by average log2 fold change
n <- 5
top_markers <- dat_deg %>%
  group_by(cluster) %>%
  top_n(n = n, wt = avg_log2FC)
seuInt <- ScaleData(seuInt)
```

We draw a dot plot of the normalized expression of these genes for each spatial domain identified using the FAST embeddings.

```{r eval = FALSE, fig.width=9, fig.height=5.4}
col_here <- c("#F2E6AB", "#9C0141")
library(ggplot2)
## unique() guards against a gene selected as a marker for more than one cluster
p1 <- DotPlot(seuInt, features = unique(top_markers$gene), cols = col_here, # idents = ident_here,
              col.min = -1, col.max = 1) + coord_flip() +
  theme(axis.text.y = element_text(face = "italic")) +
  ylab("Domain") + xlab(NULL) +
  theme(axis.text.x = element_text(size=12, angle = 25, hjust = 1, family='serif'),
        axis.text.y = element_text(size=12, face= "italic", family='serif'))
p1
```
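The per-cluster selection of top genes used in this section can be illustrated without any package dependencies. The sketch below applies the same idea in base R to a small toy marker table; the table, its values, and the names `toy_deg`, `n_top` and `toy_top` are hypothetical and only mimic the `gene`, `cluster` and `avg_log2FC` columns used above.

```{r eval = FALSE}
## Toy marker table (hypothetical values)
toy_deg <- data.frame(
  gene = paste0("gene", 1:6),
  cluster = rep(c("1", "2"), each = 3),
  avg_log2FC = c(0.5, 2.0, 1.2, 0.3, 0.9, 1.7)
)
n_top <- 2

## Split by cluster, sort each piece by decreasing fold change,
## and keep the first n_top rows of each piece
top_list <- lapply(split(toy_deg, toy_deg$cluster), function(d) {
  head(d[order(d$avg_log2FC, decreasing = TRUE), ], n_top)
})
toy_top <- do.call(rbind, top_list)
toy_top
```

Unlike `top_n()`, this base R version does not keep extra rows when fold changes are tied. In recent versions of `dplyr`, `top_n()` is documented as superseded by `slice_max()`, which may be preferred in new code.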
**Session Info**

```{r}
sessionInfo()
```