Mass cytometry makes it possible to count a large number of proteins simultaneously on individual cells (Bandura et al. 2009; Bendall et al. 2011). Mass cytometry has less spillover (measurements from one channel overlapping with those of another) than flow cytometry (Bagwell and Adams 1993; Novo, Grégori, and Rajwa 2013), but spillover is still a problem and affects downstream analyses such as differential testing (Weber et al. 2019; Seiler et al. 2021) or dimensionality reduction (McCarthy et al. 2017). Reducing spillover by careful experimental design is possible (Takahashi et al. 2017), but a purely experimental approach may be neither sufficient nor efficient (Lun et al. 2017).
Chevrier et al. (2018) propose a method for addressing spillover by conducting an experiment on beads. This experiment measures spillover by staining each bead with a single antibody. Their solution relies on an estimate of the spillover matrix obtained by non-negative matrix factorization. The spillover matrix encodes the pairwise spillover proportion between channels. We avoid this step and directly describe the spillover channels and the channel with the true signal using a mixture of nonparametric distributions. Our main new assumption is that the spillover distribution, not just the spillover proportion, from the beads experiment carries over to the biological experiment. Here, we illustrate the spillR R package for spillover compensation in mass cytometry.
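To make the notion of a spillover matrix concrete, here is a small toy sketch in base R. It is not part of spillR or CATALYST: the channel names are borrowed from the example further down, but every number is invented, and the convention that rows are emitting channels and columns are receiving channels is an assumption of this sketch.

```r
# Toy spillover matrix: rows are emitting channels, columns are receiving
# channels. All proportions are invented for illustration.
channels <- c("Yb171Di", "Yb172Di", "Yb173Di")
sm <- matrix(
    c(1.00, 0.03, 0.00,
      0.01, 1.00, 0.02,
      0.00, 0.04, 1.00),
    nrow = 3, byrow = TRUE,
    dimnames = list(channels, channels)
)

# Hypothetical true signal of a single cell in each channel.
true_signal <- c(Yb171Di = 100, Yb172Di = 0, Yb173Di = 50)

# Observed signal: each channel records its own signal plus what spills in.
observed <- drop(true_signal %*% sm)
print(observed)

# Undoing the mixing by matrix inversion recovers the truth exactly here,
# because the toy data contain no noise.
recovered <- drop(observed %*% solve(sm))
print(round(recovered, 6))
```

In this toy cell the empty channel Yb172Di shows a nonzero observed signal purely because of its two neighbours, which is the kind of artifact that compensation is meant to remove.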
Our motivation to submit to Bioconductor is to make our package available to a large user base and to ensure its compatibility with other packages that address preprocessing and analysis of mass cytometry data.
Install the package from Bioconductor.
if (!require("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("spillR")
We test our method on one of the example datasets in the CATALYST package. The dataset has an experiment with real cells and a corresponding bead experiment. The experiment on real cells has 5,000 peripheral blood mononuclear cells from healthy donors measured on 39 channels. The experiment on beads has 10,000 events measured on 36 channels and consists of single-stained beads. The number of beads per metal label ranges from 112 to 241.
We compare the two methods on the same marker as in the original CATALYST paper (Chevrier et al. 2018) in their Figure 3B. In the original experiment, they conjugated three proteins (CD3, CD8, and HLA-DR) with two different metal labels. Here is the CATALYST code to load the data into a SingleCellExperiment (SCE) object.
library(spillR)
library(CATALYST)
library(dplyr)
library(ggplot2)
library(cowplot)
# --------- experiment with beads ---------
data(ss_exp, package = "CATALYST")
bc_key <- c(139, 141:156, 158:176)
sce_bead <- prepData(ss_exp)
sce_bead <- assignPrelim(sce_bead, bc_key, verbose = FALSE)
sce_bead <- applyCutoffs(estCutoffs(sce_bead))
sce_bead <- computeSpillmat(sce_bead)
# --------- experiment with real cells ---------
data(mp_cells, package = "CATALYST")
sce <- prepData(mp_cells)
The function compCytof takes as inputs two SingleCellExperiment (SCE) objects. One contains the real cells experiment and the other the beads experiment. It also requires a table marker_to_barc that maps the channels to the barcodes used in the beads experiment. The output is the same SCE for the real cells experiment with the addition of the compensated counts and the asinh-transformed compensated counts.
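As a rough illustration of the shape of such a mapping table, the sketch below writes one out by hand in base R. The column names marker and barcode match the table described above; the three rows are chosen for the example and are not extracted from the SCE, and the object name marker_to_barc_toy is hypothetical.

```r
# Hand-written stand-in for the mapping table: one row per bead channel.
# The rows are illustrative, not taken from the bead experiment.
marker_to_barc_toy <- data.frame(
    marker  = c("Yb171Di", "Yb172Di", "Yb173Di"),
    barcode = c(171, 172, 173)
)

# In this toy table the barcode equals the mass embedded in the channel
# name, which gives a quick consistency check.
mass_from_name <- as.numeric(gsub("\\D", "", marker_to_barc_toy$marker))
stopifnot(all(mass_from_name == marker_to_barc_toy$barcode))

print(marker_to_barc_toy)
```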
# --------- table for mapping markers and barcode ---------
marker_to_barc <-
    rowData(sce_bead)[, c("channel_name", "is_bc")] |>
    as_tibble() |>
    dplyr::filter(is_bc == TRUE) |>
    mutate(barcode = bc_key) |>
    select(marker = channel_name, barcode)
# --------- compensate function from spillR package ---------
sce_spillr <-
    spillR::compCytof(sce, sce_bead, marker_to_barc, impute_value = NA,
                      overwrite = FALSE)
# --------- 2d histogram from CATALYST package -------
as <- c("counts", "exprs", "compcounts", "compexprs")
chs <- c("Yb171Di", "Yb173Di")
ps <- lapply(as, function(a) plotScatter(sce_spillr, chs, assay = a))
plot_grid(plotlist = ps, nrow = 2)
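The exprs and compexprs assays plotted above are on an asinh scale. The following base R sketch shows what such a transform does to raw counts. The helper name asinh_transform is hypothetical, and the cofactor of 5 is the default of CATALYST::prepData(); that the compensated counts are transformed with the same cofactor is an assumption here, as is the reading that impute_value = NA leaves counts flagged as spillover missing.

```r
# Arcsinh transform with a cofactor, as commonly used for mass cytometry
# counts. cofactor = 5 mirrors the CATALYST::prepData() default; applying it
# to compensated counts is an assumption of this sketch.
asinh_transform <- function(counts, cofactor = 5) asinh(counts / cofactor)

# Toy counts; the NA stands in for a count treated as missing.
counts <- c(0, 1, 5, 50, 500, NA)
exprs <- asinh_transform(counts)
print(round(exprs, 3))

# The transform is invertible, so counts can be recovered from exprs,
# and missing values stay missing.
back <- sinh(exprs) * 5
stopifnot(isTRUE(all.equal(back, counts)))
```

The transform is roughly linear near zero and logarithmic for large counts, which is why low, spillover-dominated counts and high, signal-dominated counts can be shown in the same panel.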
spillR offers the possibility to visualize the compensation results and the internal spillover estimates. The function plotDiagnostics presents two plots: the frequency polygons before and after spillover compensation, and the density plot of spillover markers with our estimation of the spillover probability function as a black dashed curve. This plot allows us to check the compensation performed by our method. If the black curve captures all the spillover markers, then that indicates a reliable spillover estimation. If the target marker in the beads experiment overlaps with the real cells, then that indicates a high-quality bead experiment.
ps <- spillR::plotDiagnostics(sce_spillr, "Yb173Di")
x_lim <- c(0, 7)
plot_grid(
    ps[[1]] + xlim(x_lim),
    ps[[2]] + xlim(x_lim),
    ncol = 1, align = "v"
)
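To build intuition for the spillover probability curve in the diagnostic plot, the toy sketch below computes a comparable quantity with Bayes rule from two empirical distributions. It is not spillR's estimation procedure: the data are simulated, the mixing weight pi_spill is fixed by hand rather than estimated, and all object names are hypothetical.

```r
set.seed(1)

# Simulated toy counts in the target channel (all values invented).
# Beads stained for a neighbouring channel only show spillover here.
bead_spill <- rpois(500, lambda = 2)
# Real cells: a mix of spillover-like low counts and true signal.
cells <- c(rpois(300, lambda = 2), rpois(700, lambda = 40))

# Empirical probability mass functions on a common support.
support <- 0:max(cells, bead_spill)
pmf <- function(x) {
    as.numeric(table(factor(x, levels = support))) / length(x)
}
p_spill <- pmf(bead_spill)
p_obs <- pmf(cells)

# Assumed mixing weight of the spillover component (fixed by hand).
pi_spill <- 0.3

# Bayes rule: probability that a count of value y is spillover, capped at 1.
post_spill <- ifelse(p_obs > 0, pmin(1, pi_spill * p_spill / p_obs), 0)

# Flag each cell with that probability and replace flagged counts by NA.
flag <- rbinom(length(cells), size = 1, prob = post_spill[cells + 1]) == 1
compensated <- ifelse(flag, NA, cells)

# Fraction of cells flagged as spillover in this toy run.
print(mean(is.na(compensated)))
```

Plotting post_spill against support gives a curve that is close to one where the bead distribution dominates and drops to zero where only real cells contribute, which is the qualitative behaviour one would hope to see in the diagnostic plot.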
sessionInfo()
## R Under development (unstable) (2024-10-21 r87258)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] cowplot_1.1.3 ggplot2_3.5.1
## [3] dplyr_1.1.4 spillR_1.3.0
## [5] CATALYST_1.31.0 SingleCellExperiment_1.29.0
## [7] SummarizedExperiment_1.37.0 Biobase_2.67.0
## [9] GenomicRanges_1.59.0 GenomeInfoDb_1.43.0
## [11] IRanges_2.41.0 S4Vectors_0.45.0
## [13] BiocGenerics_0.53.0 MatrixGenerics_1.19.0
## [15] matrixStats_1.4.1 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] RColorBrewer_1.1-3 jsonlite_1.8.9
## [3] shape_1.4.6.1 magrittr_2.0.3
## [5] magick_2.8.5 spatstat.utils_3.1-0
## [7] ggbeeswarm_0.7.2 TH.data_1.1-2
## [9] farver_2.1.2 rmarkdown_2.28
## [11] GlobalOptions_0.1.2 zlibbioc_1.53.0
## [13] vctrs_0.6.5 tinytex_0.53
## [15] rstatix_0.7.2 htmltools_0.5.8.1
## [17] S4Arrays_1.7.0 plotrix_3.8-4
## [19] BiocNeighbors_2.1.0 broom_1.0.7
## [21] SparseArray_1.7.0 Formula_1.2-5
## [23] sass_0.4.9 bslib_0.8.0
## [25] plyr_1.8.9 sandwich_3.1-1
## [27] zoo_1.8-12 cachem_1.1.0
## [29] igraph_2.1.1 lifecycle_1.0.4
## [31] iterators_1.0.14 pkgconfig_2.0.3
## [33] rsvd_1.0.5 Matrix_1.7-1
## [35] R6_2.5.1 fastmap_1.2.0
## [37] GenomeInfoDbData_1.2.13 clue_0.3-65
## [39] digest_0.6.37 colorspace_2.1-1
## [41] ggnewscale_0.5.0 scater_1.35.0
## [43] irlba_2.3.5.1 ggpubr_0.6.0
## [45] beachmat_2.23.0 labeling_0.4.3
## [47] cytolib_2.19.0 fansi_1.0.6
## [49] colorRamps_2.3.4 nnls_1.6
## [51] httr_1.4.7 polyclip_1.10-7
## [53] abind_1.4-8 compiler_4.5.0
## [55] withr_3.0.2 doParallel_1.0.17
## [57] ConsensusClusterPlus_1.71.0 backports_1.5.0
## [59] BiocParallel_1.41.0 viridis_0.6.5
## [61] carData_3.0-5 highr_0.11
## [63] hexbin_1.28.4 ggforce_0.4.2
## [65] ggsignif_0.6.4 MASS_7.3-61
## [67] drc_3.0-1 DelayedArray_0.33.0
## [69] rjson_0.2.23 FlowSOM_2.15.0
## [71] gtools_3.9.5 tools_4.5.0
## [73] vipor_0.4.7 beeswarm_0.4.0
## [75] glue_1.8.0 grid_4.5.0
## [77] Rtsne_0.17 cluster_2.1.6
## [79] reshape2_1.4.4 generics_0.1.3
## [81] gtable_0.3.6 tidyr_1.3.1
## [83] data.table_1.16.2 ScaledMatrix_1.15.0
## [85] BiocSingular_1.23.0 car_3.1-3
## [87] utf8_1.2.4 XVector_0.47.0
## [89] ggrepel_0.9.6 foreach_1.5.2
## [91] pillar_1.9.0 stringr_1.5.1
## [93] circlize_0.4.16 splines_4.5.0
## [95] flowCore_2.19.0 tweenr_2.0.3
## [97] lattice_0.22-6 survival_3.7-0
## [99] RProtoBufLib_2.19.0 tidyselect_1.2.1
## [101] ComplexHeatmap_2.23.0 scuttle_1.17.0
## [103] knitr_1.48 gridExtra_2.3
## [105] bookdown_0.41 xfun_0.48
## [107] stringi_1.8.4 UCSC.utils_1.3.0
## [109] yaml_2.3.10 evaluate_1.0.1
## [111] codetools_0.2-20 tibble_3.2.1
## [113] BiocManager_1.30.25 cli_3.6.3
## [115] munsell_0.5.1 jquerylib_0.1.4
## [117] Rcpp_1.0.13 png_0.1-8
## [119] spatstat.univar_3.0-1 XML_3.99-0.17
## [121] parallel_4.5.0 viridisLite_0.4.2
## [123] mvtnorm_1.3-1 scales_1.3.0
## [125] ggridges_0.5.6 purrr_1.0.2
## [127] crayon_1.5.3 GetoptLong_1.0.5
## [129] rlang_1.1.4 multcomp_1.4-26