DR-SC: simulation

Generate the simulated data

First, we generate the spatial transcriptomics data with lattice neighborhood, i.e. ST platform by using the function gendata_RNAExp in DR.SC package, which is a Seurat object format. It is noted that the meta.data must include spatial coordinates in columns named “row” (x coordinates) and “col” (y coordinates)!

library(DR.SC)
seu <- gendata_RNAExp(height=30, width=30,p=500, K=4)
head(seu@meta.data)

Fit DR-SC using simulated data

Data preprocessing

This preprocessing includes Log-normalization and feature selection. Here we select highly variable genes for example first. The selected genes’ names are saved in “$

### Given K
library(Seurat)
seu <- NormalizeData(seu)
# choose highly variable features using Seurat
seu <- FindVariableFeatures(seu, nfeatures = 400)

Fit DR-SC based on highly variable genes(HVGs)

For function DR.SC, users can specify the number of clusters K or set K to be an integer vector by using modified BIC(MBIC) to determine K. First, we try using user-specified number of clusters. Then we show the version chosen by MBIC.

### Given K
seu2 <- DR.SC(seu, q=30, K=4, platform = 'ST',  verbose=F, approxPCA=T)

After finishing model fitting, we use ajusted rand index (ARI) to check the performance of clustering

mclust::adjustedRandIndex(seu2$spatial.drsc.cluster, seu$true_clusters)

Next, we show the application of DR-SC in visualization. First, we can visualize the clusters from DR-SC on the spatial coordinates.

spatialPlotClusters(seu2)

We can also visualize the clusters from DR-SC on the two-dimensional tSNE based on the extracted features from DR-SC.

drscPlot(seu2)

Show the UMAP plot based on the extracted features from DR-SC.

drscPlot(seu2, visu.method = 'UMAP')

Use MBIC to choose number of clusters:

seu2 <- DR.SC(seu, q=10, K=2:6, platform = 'ST', verbose=F,approxPCA=T)
mbicPlot(seu2)

Fit DR-SC based on spatially variable genes(SVGs)

First, we select the spatilly variable genes using funciton FindSVGs.

### Given K
seu <- NormalizeData(seu, verbose=F)
# choose 400 spatially variable features using FindSVGs
seus <- FindSVGs(seu, nfeatures = 400, verbose = F)
seu2 <- DR.SC(seus, q=4, K=4, platform = 'ST', verbose=F)

Using ARI to check the performance of clustering

mclust::adjustedRandIndex(seu2$spatial.drsc.cluster, seu$true_clusters)

DR-SC can enhance visualization

Show the spatial scatter plot for clusters

spatialPlotClusters(seu2)

Show the tSNE plot based on the extracted features from DR-SC.

drscPlot(seu2)

Show the UMAP plot based on the extracted features from DR-SC.

drscPlot(seu2, visu.method = 'UMAP')

DR-SC can automatically determine the number of clusters

Use MBIC to choose number of clusters:

seu2 <- DR.SC(seus, q=4, K=2:6, platform = 'ST',  verbose=F)
mbicPlot(seu2)
# or plot BIC or AIC
# mbicPlot(seu2, criteria = 'BIC')
# mbicPlot(seu2, criteria = 'AIC')
# tune pen.const
seu2 <- selectModel(seu2, pen.const = 0.7)
mbicPlot(seu2)

DR-SC can help differentially expression analysis

Conduct visualization of marker gene expression. ### Ridge plots Visualize single cell expression distributions in each cluster from Seruat.

dat <- FindAllMarkers(seu2)
suppressPackageStartupMessages(library(dplyr) )
# Find the top 1 marker genes, user can change n to access more marker genes
dat %>%group_by(cluster) %>%
    top_n(n = 1, wt = avg_log2FC) -> top
genes <- top$gene
RidgePlot(seu2, features = genes, ncol = 2)

Violin plot

Visualize single cell expression distributions in each cluster

VlnPlot(seu2, features = genes, ncol=2)

Feature plot

We extract tSNE based on the features from DR-SC and then visualize feature expression in the low-dimensional space

seu2 <- RunTSNE(seu2, reduction="dr-sc", reduction.key='drsctSNE_')
FeaturePlot(seu2, features = genes, reduction = 'tsne' ,ncol=2)

Dot plots

The size of the dot corresponds to the percentage of cells expressing the feature in each cluster. The color represents the average expression level

DotPlot(seu2, features = genes)

Heatmap plot

Single cell heatmap of feature expression

# standard scaling (no regression)
dat %>%group_by(cluster) %>%
    top_n(n = 30, wt = avg_log2FC) -> top
### select the marker genes that are also the variable genes.

genes <- intersect(top$gene, seu2[['RNA']]@var.features)
## Change the HVGs to SVGs
#  <- topSVGs(seu2, 400)
seu2 <- ScaleData(seu2, verbose = F)
DoHeatmap(subset(seu2, downsample = 500),features = genes, size = 5)

Session information

sessionInfo()