library(knitr)
opts_chunk$set(cache = TRUE, out.width='100%', fig.align = 'center')


library(Seurat)
library(SeuratWrappers)
library(DropletUtils)
library(ggplot2)
library(RColorBrewer)
library(knitr)
library(dplyr)
library(harmony)
library(rliger)
library(glmGamPoi)
library(patchwork)
library(cowplot)

Warning message:
“package ‘Seurat’ was built under R version 4.2.2”
Attaching SeuratObject

Warning message:
“package ‘DropletUtils’ was built under R version 4.2.2”
Loading required package: SingleCellExperiment

Warning message:
“package ‘SingleCellExperiment’ was built under R version 4.2.2”
Loading required package: SummarizedExperiment

Warning message:
“package ‘SummarizedExperiment’ was built under R version 4.2.2”
Loading required package: MatrixGenerics

Loading required package: matrixStats

Warning message:
“package ‘matrixStats’ was built under R version 4.2.2”

Attaching package: ‘MatrixGenerics’


The following objects are masked from ‘package:matrixStats’:

    colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
    colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
    colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
    colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
    colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
    colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
    colWeightedMeans, colWeightedMedians, colWeightedSds,
    colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
    rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
    rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
    rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
    rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
    rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
    rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
    rowWeightedSds, rowWeightedVars


Loading required package: GenomicRanges

Warning message:
“package ‘GenomicRanges’ was built under R version 4.2.2”
Loading required package: stats4

Loading required package: BiocGenerics


Attaching package: ‘BiocGenerics’


The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs


The following objects are masked from ‘package:base’:

    anyDuplicated, aperm, append, as.data.frame, basename, cbind,
    colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
    get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
    Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
    table, tapply, union, unique, unsplit, which.max, which.min


Loading required package: S4Vectors


Attaching package: ‘S4Vectors’


The following objects are masked from ‘package:base’:

    expand.grid, I, unname


Loading required package: IRanges

Loading required package: GenomeInfoDb

Warning message:
“package ‘GenomeInfoDb’ was built under R version 4.2.2”
Loading required package: Biobase

Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.


Attaching package: ‘Biobase’


The following object is masked from ‘package:MatrixGenerics’:

    rowMedians


The following objects are masked from ‘package:matrixStats’:

    anyMissing, rowMedians


Attaching package: ‘SummarizedExperiment’


The following object is masked from ‘package:SeuratObject’:

    Assays


The following object is masked from ‘package:Seurat’:

    Assays


Warning message:
“package ‘ggplot2’ was built under R version 4.2.2”
Warning message:
“package ‘knitr’ was built under R version 4.2.2”
Warning message:
“package ‘dplyr’ was built under R version 4.2.2”

Attaching package: ‘dplyr’


The following object is masked from ‘package:Biobase’:

    combine


The following objects are masked from ‘package:GenomicRanges’:

    intersect, setdiff, union


The following object is masked from ‘package:GenomeInfoDb’:

    intersect


The following objects are masked from ‘package:IRanges’:

    collapse, desc, intersect, setdiff, slice, union


The following objects are masked from ‘package:S4Vectors’:

    first, intersect, rename, setdiff, setequal, union


The following objects are masked from ‘package:BiocGenerics’:

    combine, intersect, setdiff, union


The following object is masked from ‘package:matrixStats’:

    count


The following objects are masked from ‘package:stats’:

    filter, lag


The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union


Loading required package: Rcpp

Warning message:
“package ‘Rcpp’ was built under R version 4.2.2”
Loading required package: cowplot

Loading required package: Matrix

Warning message:
“package ‘Matrix’ was built under R version 4.2.2”

Attaching package: ‘Matrix’


The following object is masked from ‘package:S4Vectors’:

    expand


Loading required package: patchwork


Attaching package: ‘patchwork’


The following object is masked from ‘package:cowplot’:

    align_plots


Attaching package: ‘rliger’


The following object is masked from ‘package:BiocGenerics’:

    normalize


Warning message:
“package ‘glmGamPoi’ was built under R version 4.2.2”

Attaching package: ‘glmGamPoi’


The following object is masked from ‘package:dplyr’:

    vars


The following object is masked from ‘package:ggplot2’:

    vars


source("./data/cellgeni/custom_seurat_functions.R")


umi_gz <- gzfile("./data/cellgeni/GSE149938_umi_matrix.csv.gz",'rt')
umi <- read.csv(umi_gz,check.names = F,quote = "")
matrix_3p    <- Read10X("./data/cellgeni/soupX_pbmc10k_filt")


srat_wb <- CreateSeuratObject(t(umi),project = "whole_blood")
srat_3p <- CreateSeuratObject(matrix_3p,project = "pbmc10k_3p")
rm(umi_gz)
rm(umi)
rm(matrix_3p)

Warning message:
“Non-unique features (rownames) present in the input matrix, making unique”


colnames(srat_wb@meta.data)[1] <- "cell_type"
srat_wb@meta.data$orig.ident <- "whole_blood"
srat_wb@meta.data$orig.ident <- as.factor(srat_wb@meta.data$orig.ident)
head(srat_wb[[]])


srat_wb <- SetIdent(srat_wb,value = "orig.ident")

srat_wb[["percent.mt"]] <- PercentageFeatureSet(srat_wb, pattern = "^MT-")
srat_wb[["percent.rbp"]] <- PercentageFeatureSet(srat_wb, pattern = "^RP[SL]")
srat_3p[["percent.mt"]] <- PercentageFeatureSet(srat_3p, pattern = "^MT-")
srat_3p[["percent.rbp"]] <- PercentageFeatureSet(srat_3p, pattern = "^RP[SL]")

VlnPlot(srat_wb, features = c("nFeature_RNA","nCount_RNA","percent.mt","percent.rbp"), ncol = 4)
VlnPlot(srat_3p, features = c("nFeature_RNA","nCount_RNA","percent.mt","percent.rbp"), ncol = 4)

Warning message in SingleExIPlot(type = type, data = data[, x, drop = FALSE], idents = idents, :
“All cells have the same value of percent.mt.”


table(rownames(srat_3p) %in% rownames(srat_wb))
common_genes <- rownames(srat_3p)[rownames(srat_3p) %in% rownames(srat_wb)]

FALSE  TRUE 
18050 18551


srat_3p <- subset(srat_3p, subset = nFeature_RNA > 1000 & nFeature_RNA < 6000 & percent.mt < 15)
srat_wb <- subset(srat_wb, subset = nFeature_RNA > 1000)

srat_3p <- srat_3p[rownames(srat_3p) %in% common_genes,]
srat_wb <- srat_wb[rownames(srat_wb) %in% common_genes,]


wb_list <- list()
wb_list[["pbmc10k_3p"]]   <- srat_3p
wb_list[["whole_blood"]]  <- srat_wb

for (i in 1:length(wb_list)) {
  wb_list[[i]] <- NormalizeData(wb_list[[i]], verbose = F)
  wb_list[[i]] <- FindVariableFeatures(wb_list[[i]], selection.method = "vst", nfeatures = 2000, verbose = F)
}


wb_anchors <- FindIntegrationAnchors(object.list = wb_list, dims = 1:30)
wb_seurat  <- IntegrateData(anchorset = wb_anchors, dims = 1:30)

Computing 2000 integration features

Scaling features for provided objects

Finding all pairwise anchors

Running CCA

Merging objects

Finding neighborhoods

Finding anchors

	Found 11508 anchors

Filtering anchors

	Retained 3175 anchors

Merging dataset 2 into 1

Extracting anchors for merged samples

Finding integration vectors

Finding integration vector weights

Integrating data


rm(wb_list)
rm(wb_anchors)


DefaultAssay(wb_seurat) <- "RNA"
wb_seurat <- NormalizeData(wb_seurat, verbose = F)
wb_seurat <- FindVariableFeatures(wb_seurat, selection.method = "vst", nfeatures = 2000, verbose = F)
wb_seurat <- ScaleData(wb_seurat, verbose = F)
wb_seurat <- RunPCA(wb_seurat, npcs = 30, verbose = F)
wb_seurat <- RunUMAP(wb_seurat, reduction = "pca", dims = 1:30, verbose = F)

DimPlot(wb_seurat, reduction = "umap") + plot_annotation(title = "10k 3' PBMC and whole blood, before integration")
DimPlot(wb_seurat, reduction = "umap", group.by = "orig.ident", pt.size = .1, split.by = 'orig.ident') + NoLegend()

Warning message:
“The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
This message will be shown once per session”


DefaultAssay(wb_seurat) <- "integrated"
wb_seurat <- ScaleData(wb_seurat, verbose = F)
wb_seurat <- RunPCA(wb_seurat, npcs = 30, verbose = F)
wb_seurat <- RunUMAP(wb_seurat, reduction = "pca", dims = 1:30, verbose = F)

DimPlot(wb_seurat, reduction = "umap") + plot_annotation(title = "10k 3' PBMC and white blood cells, after integration (Seurat 3)")


FeaturePlot(wb_seurat,c("MS4A1","LYZ","NKG7","PPBP","LTF","HBA1","FCER1A","IL7R","FCGR3B")) & scale_colour_gradientn(colours = rev(brewer.pal(n = 11, name = "Spectral")))

Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.


wb_seurat <- FindNeighbors(wb_seurat, dims = 1:30, k.param = 10, verbose = F)
wb_seurat <- FindClusters(wb_seurat, verbose = F)
DimPlot(wb_seurat,label = T) + NoLegend()


count_table <- table(wb_seurat@meta.data$seurat_clusters, wb_seurat@meta.data$orig.ident)
count_table

    
     pbmc10k_3p whole_blood
  0        1564          90
  1        1387         211
  2        1222         113
  3        1217         111
  4        1066         107
  5         342         442
  6           9         772
  7         448         232
  8         185         410
  9           1         560
  10        362         193
  11          0         521
  12          0         490
  13        324         158
  14        362          83
  15          0         445
  16        290         124
  17         12         367
  18          0         371
  19          8         355
  20          0         334
  21         11         313
  22        312          11
  23          0         209
  24         19         179
  25          0         188
  26         99          19
  27          0         107
  28          0          70
  29          0          58


plot_integrated_clusters(wb_seurat)

Using cluster as id variables


meta <- wb_seurat[[]]
table(meta[meta$seurat_clusters == '5',]$cell_type) ## erythrocytes
table(meta[meta$seurat_clusters == '20',]$cell_type) ## neutrophils
table(meta[meta$seurat_clusters == '24',]$cell_type) ## plasma B cells

 ery immB memB naiB preB regB 
   2   14    2  315    1  108

    CLP    immB  kineNK matureN   metaN     MLP    myeN  plasma    proN  toxiNK 
      1       1       5      98     101       1     104       2      16       5

   ery   naiB plasma 
    12      1    166


rm(wb_seurat)


wb_harmony    <- merge(srat_3p,srat_wb)

wb_harmony <- NormalizeData(wb_harmony, verbose = F)
wb_harmony <- FindVariableFeatures(wb_harmony, selection.method = "vst", nfeatures = 2000, verbose = F)
wb_harmony <- ScaleData(wb_harmony, verbose = F)
wb_harmony <- RunPCA(wb_harmony, npcs = 30, verbose = F)
wb_harmony <- RunUMAP(wb_harmony, reduction = "pca", dims = 1:30, verbose = F)


p1 <- DimPlot(object = wb_harmony, reduction = "pca", pt.size = .1, group.by = "orig.ident") + NoLegend()
p2 <- VlnPlot(object = wb_harmony, features = "PC_1", group.by = "orig.ident", pt.size = .1) + NoLegend()
plot_grid(p1,p2)


DimPlot(wb_harmony,reduction = "umap") + plot_annotation(title = "10k 3' PBMC and whole blood, before integration")


wb_harmony <- wb_harmony %>% RunHarmony("orig.ident", plot_convergence = T)

Harmony 1/10

Harmony 2/10

Harmony 3/10

Harmony converged after 3 iterations

Warning message:
“Invalid name supplied, making object name syntactically valid. New object name is Seurat..ProjectDim.RNA.harmony; see ?make.names for more details on syntax validity”


harmony_embeddings <- Embeddings(wb_harmony, 'harmony')
harmony_embeddings[1:5, 1:5]


p1 <- DimPlot(object = wb_harmony, reduction = "harmony", pt.size = .1, group.by = "orig.ident") + NoLegend()
p2 <- VlnPlot(object = wb_harmony, features = "harmony_1", group.by = "orig.ident", pt.size = .1) + NoLegend()
plot_grid(p1,p2)


wb_harmony <- wb_harmony %>%
  RunUMAP(reduction = "harmony", dims = 1:30, verbose = F) %>%
  FindNeighbors(reduction = "harmony", k.param = 10, dims = 1:30) %>%
  FindClusters() %>%
  identity()

Computing nearest neighbor graph

Computing SNN

Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 16883
Number of edges: 275342

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9231
Number of communities: 29
Elapsed time: 1 seconds


wb_harmony <- SetIdent(wb_harmony,value = "orig.ident")
DimPlot(wb_harmony,reduction = "umap") + plot_annotation(title = "10k 3' PBMC and whole blood, after integration (Harmony)")
DimPlot(wb_harmony, reduction = "umap", group.by = "orig.ident", pt.size = .1, split.by = 'orig.ident') + NoLegend()


wb_harmony <- SetIdent(wb_harmony,value = "seurat_clusters")
DimPlot(wb_harmony,label = T) + NoLegend()


plot_integrated_clusters(wb_harmony)

Using cluster as id variables


rm(wb_harmony)


wb_liger    <- merge(srat_3p,srat_wb)

wb_liger    <- NormalizeData(wb_liger)
wb_liger    <- FindVariableFeatures(wb_liger)
wb_liger    <- ScaleData(wb_liger, split.by = "orig.ident", do.center = F)
wb_liger    <- RunOptimizeALS(wb_liger, k = 30, lambda = 5, split.by = "orig.ident")
wb_liger    <- RunQuantileNorm(wb_liger, split.by = "orig.ident")

Scaling data matrix

Scaling data from split pbmc10k_3p

Scaling data from split whole_blood

  |======================================================================| 100%
Finished in 3.81193 mins, 30 iterations.
Max iterations set: 30.
Final objective delta: 4.18028e-05.
Best results with seed 1.

Warning message:
“No columnames present in cell embeddings, setting to 'riNMF_1:30'”


wb_liger    <- FindNeighbors(wb_liger,reduction = "iNMF",k.param = 10,dims = 1:30)
wb_liger    <- FindClusters(wb_liger)

Computing nearest neighbor graph

Computing SNN

Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 16883
Number of edges: 276592

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9233
Number of communities: 38
Elapsed time: 0 seconds


wb_liger    <- RunUMAP(wb_liger, dims = 1:ncol(wb_liger[["iNMF"]]), reduction = "iNMF",verbose = F)
wb_liger <- SetIdent(wb_liger,value = "orig.ident")
DimPlot(wb_liger,reduction = "umap") + plot_annotation(title = "10k 3' PBMC and 10k 5' PBMC cells, after integration (LIGER)")
DimPlot(wb_liger, reduction = "umap", group.by = "orig.ident", pt.size = .1, split.by = 'orig.ident') + NoLegend()


plot_integrated_clusters(wb_liger)

Using cluster as id variables


rm(wb_liger)
rm(srat_wb)
rm(srat_3p)


sessionInfo()

	cell_type	nCount_RNA	nFeature_RNA	orig.ident
	<fct>	<dbl>	<int>	<fct>
BNK_spBM1_L1_bar25	BNK	24494	1869	whole_blood
BNK_spBM1_L1_bar26	BNK	61980	2051	whole_blood
BNK_spBM1_L1_bar27	BNK	124382	3872	whole_blood
BNK_spBM1_L1_bar28	BNK	8144	1475	whole_blood
BNK_spBM1_L1_bar29	BNK	53612	2086	whole_blood
BNK_spBM1_L1_bar30	BNK	33582	2038	whole_blood

	harmony_1	harmony_2	harmony_3	harmony_4	harmony_5
AAACCCACATAACTCG-1	3.2402720	0.5517086	-4.014592	-0.2490492	-0.4555420
AAACCCACATGTAACC-1	4.3992278	-0.2653914	8.905441	5.7085193	-2.6650973
AAACCCAGTGAGTCAG-1	3.6924188	1.4199533	-2.323807	-1.7352863	1.2648235
AAACGAACAGTCAGTT-1	-0.8957237	-3.2318868	-2.398822	-0.1342351	-1.9238917
AAACGAACATTCGGGC-1	4.6218704	1.6066620	1.914114	-1.7354947	0.8168312

scRNA-seq Dataset Integration using R-based tools¶

Introduction¶

Harmony, 3' 10k PBMC cells and whole blood STRT-Seq¶

LIGER, 3' 10k PBMC cells and whole blood STRT-Seq¶

sessionInfo()¶