The imcdatasets package provides access to publicly available datasets generated using imaging mass cytometry (IMC) (???).
IMC is a technology that enables measurement of up to 50 markers from tissue sections at a resolution of 1 μm (???). In classical processing pipelines, such as the ImcSegmentationPipeline or steinbock, the multichannel images are segmented to generate cell masks. These masks are then used to extract single-cell features from the multichannel images.
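The mask-based feature extraction step can be sketched in a few lines of base R. This is a simplified illustration, not the code used by the ImcSegmentationPipeline or steinbock: it assumes the per-cell feature is the mean channel intensity, and it uses a toy image and mask with hypothetical values in place of real data. The helper `mean_intensity_per_cell()` is hypothetical.

```r
# Toy 4 x 4 image with two channels (hypothetical intensities)
img <- array(0, dim = c(4, 4, 2),
             dimnames = list(NULL, NULL, c("H3", "CD99")))
img[, , "H3"]   <- matrix(1:16, nrow = 4)
img[, , "CD99"] <- matrix(16:1, nrow = 4)

# Toy segmentation mask: 0 = background, 1 and 2 = cell IDs
mask <- matrix(c(1, 1, 0, 2,
                 1, 1, 0, 2,
                 0, 0, 0, 2,
                 0, 0, 0, 2), nrow = 4, byrow = TRUE)

# Hypothetical helper: mean intensity of each channel over the pixels of
# each cell; background pixels (mask == 0) are ignored
mean_intensity_per_cell <- function(img, mask) {
  in_cell <- mask > 0
  ids <- mask[in_cell]
  sapply(dimnames(img)[[3]], function(ch) {
    tapply(img[, , ch][in_cell], ids, mean)
  })
}

# One row per cell, one column per channel
# cell 1: H3 = 3.5, CD99 = 13.5; cell 2: H3 = 14.5, CD99 = 2.5
features <- mean_intensity_per_cell(img, mask)
features
```

A SingleCellExperiment stores markers as rows and cells as columns, so `t(features)` gives the matching layout.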
Each dataset in imcdatasets is composed of three elements that can be retrieved separately:
1. Single-cell data in the form of a SingleCellExperiment or SpatialExperiment class object (named sce.rds).
2. Multichannel images in the form of a CytoImageList class object (named images.rds).
3. Cell segmentation masks in the form of a CytoImageList class object (named masks.rds).
The listDatasets() function returns all available datasets in imcdatasets, along with associated information. The FunctionCall column gives the name of the R function that loads each dataset.
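library(imcdatasets)
# Retrieve the table of available datasets
datasets <- listDatasets()
datasets <- as.data.frame(datasets)
# Wrap function names in backticks so that they render as code in the table
datasets$FunctionCall <- sprintf("`%s`", datasets$FunctionCall)
knitr::kable(datasets)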
| FunctionCall | Species | Tissue | NumberOfCells | NumberOfImages | NumberOfChannels | Reference |
|---|---|---|---|---|---|---|
| `Damond_2019_Pancreas()` | Human | Pancreas | 252059 | 100 | 38 | (???) |
| `JacksonFischer_2020_BreastCancer()` | Human | Primary breast tumour | 285851 | 100 | 42 | (???) |
| `Zanotelli_2020_Spheroids()` | Human | Cell line spheroids | 229047 | 517 | 51 | (???) |
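Because the table returned by listDatasets() can be converted to a data frame, ordinary data-frame operations can help choose a dataset. The sketch below uses a hand-typed stand-in built from the values in the table above (the Reference column is omitted), so it runs without downloading anything; the derived column `CellsPerImage` is hypothetical and not part of the imcdatasets output.

```r
# Stand-in for as.data.frame(listDatasets()), typed in from the table above
datasets <- data.frame(
  FunctionCall = c("Damond_2019_Pancreas()",
                   "JacksonFischer_2020_BreastCancer()",
                   "Zanotelli_2020_Spheroids()"),
  Species = "Human",
  Tissue = c("Pancreas", "Primary breast tumour", "Cell line spheroids"),
  NumberOfCells = c(252059, 285851, 229047),
  NumberOfImages = c(100, 100, 517),
  NumberOfChannels = c(38, 42, 51)
)

# Average number of segmented cells per image, rounded to whole cells
datasets$CellsPerImage <- round(datasets$NumberOfCells / datasets$NumberOfImages)

# Keep the datasets measuring at least 40 channels
many_channels <- subset(datasets, NumberOfChannels >= 40,
                        select = c(FunctionCall, Tissue, CellsPerImage))
many_channels
```

With these numbers, the spheroid dataset has about 443 cells per image, compared with about 2521 and 2859 for the pancreas and breast cancer datasets.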
Users can import a dataset by calling a single function and specifying the type of data to retrieve. The following examples show how to access the dataset published by Damond, N. et al., A Map of Human Type 1 Diabetes Progression by Imaging Mass Cytometry (???).
Importing single-cell expression data and metadata
sce <- Damond_2019_Pancreas("sce")
sce
## class: SingleCellExperiment
## dim: 38 252059
## metadata(0):
## assays(3): counts exprs quant_norm
## rownames(38): H3 SMA ... DNA1 DNA2
## rowData names(6): channel metal ... antibody_clone full_name
## colnames(252059): 138_1 138_2 ... 319_1149 319_1150
## colData names(28): cell_id image_name ... patient_ethnicity patient_BMI
## reducedDimNames(0):
## mainExpName: Damond_2019_Pancreas
## altExpNames(0):
Importing multichannel images
images <- Damond_2019_Pancreas("images")
images
## CytoImageList containing 100 image(s)
## names(100): E02 E03 E04 E05 E06 E07 E08 E09 E10 E11 E12 E13 E14 E15 E16 E17 E18 E19 E20 E21 E22 E23 E24 E25 E26 E27 E28 E29 E30 E31 E32 E33 E34 G01 G02 G03 G04 G05 G06 G07 G08 G09 G10 G11 G12 G13 G14 G15 G16 G17 G18 G19 G20 G21 G22 G23 G24 G25 G26 G27 G28 G29 G30 G31 G32 G33 J01 J02 J03 J04 J05 J06 J07 J08 J09 J10 J11 J12 J13 J14 J15 J16 J17 J18 J19 J20 J21 J22 J23 J24 J25 J26 J27 J28 J29 J30 J31 J32 J33 J34
## Each image contains 38 channel(s)
## channelNames(38): H3 SMA INS CD38 CD44 PCSK2 CD99 CD68 MPO SLC2A1 CD20 AMY2A CD3e PPY PIN PD_1 GCG PDX1 SST SYP KRT19 CD45 FOXP3 CD45RA CD8a CA9 IAPP Ki67 NKX6_1 p_HH3 CD4 CD31 CDH1 PTPRN p_Rb cPARP_cCASP3 DNA1 DNA2
Importing cell segmentation masks
masks <- Damond_2019_Pancreas("masks")
masks
## CytoImageList containing 100 image(s)
## names(100): E02 E03 E04 E05 E06 E07 E08 E09 E10 E11 E12 E13 E14 E15 E16 E17 E18 E19 E20 E21 E22 E23 E24 E25 E26 E27 E28 E29 E30 E31 E32 E33 E34 G01 G02 G03 G04 G05 G06 G07 G08 G09 G10 G11 G12 G13 G14 G15 G16 G17 G18 G19 G20 G21 G22 G23 G24 G25 G26 G27 G28 G29 G30 G31 G32 G33 J01 J02 J03 J04 J05 J06 J07 J08 J09 J10 J11 J12 J13 J14 J15 J16 J17 J18 J19 J20 J21 J22 J23 J24 J25 J26 J27 J28 J29 J30 J31 J32 J33 J34
## Each image contains 1 channel
On-disk storage
Objects containing multichannel images and segmentation masks can also be stored on disk rather than in memory. However, they must be loaded into memory once before being written to disk. Writing them to disk takes longer than keeping them in memory, but it reduces memory requirements during downstream analysis.
To write images or masks to disk, set on_disk = TRUE and specify a path (h5FilesPath) where the images or masks will be stored as .h5 files:
# Create a temporary location for the .h5 files
cur_path <- tempdir()
# Load the masks and write them to disk as .h5 files in cur_path
masks <- Damond_2019_Pancreas(data_type = "masks", on_disk = TRUE,
                              h5FilesPath = cur_path)
masks
## CytoImageList containing 100 image(s)
## names(100): E02 E03 E04 E05 E06 E07 E08 E09 E10 E11 E12 E13 E14 E15 E16 E17 E18 E19 E20 E21 E22 E23 E24 E25 E26 E27 E28 E29 E30 E31 E32 E33 E34 G01 G02 G03 G04 G05 G06 G07 G08 G09 G10 G11 G12 G13 G14 G15 G16 G17 G18 G19 G20 G21 G22 G23 G24 G25 G26 G27 G28 G29 G30 G31 G32 G33 J01 J02 J03 J04 J05 J06 J07 J08 J09 J10 J11 J12 J13 J14 J15 J16 J17 J18 J19 J20 J21 J22 J23 J24 J25 J26 J27 J28 J29 J30 J31 J32 J33 J34
## Each image contains 1 channel
Additional information about each dataset is available in the help page:
?Damond_2019_Pancreas
The metadata associated with a specific data object can be displayed as follows:
Damond_2019_Pancreas(data_type = "sce", metadata = TRUE)
Damond_2019_Pancreas(data_type = "images", metadata = TRUE)
Damond_2019_Pancreas(data_type = "masks", metadata = TRUE)
The SingleCellExperiment class objects can be used for data analysis. For more information, please refer to the SingleCellExperiment package and to the Orchestrating Single-Cell Analysis with Bioconductor workflow.
The CytoImageList class objects can be used for plotting cell and pixel information. Some typical use cases are given below. For more information, please see the cytomapper package and the associated vignette.
Subsetting the images and masks
library(cytomapper)
# Keep the first five images and the matching masks
cur_images <- images[1:5]
cur_masks <- masks[1:5]
Plotting pixel information
The images object can be used to display pixel-level data.
plotPixels(
cur_images,
colour_by = c("CDH1", "CD99", "H3"),
bcg = list(
CD99 = c(0,2,1),
CDH1 = c(0,8,1),
H3 = c(0,5,1)
)
)
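Each bcg entry above is a triplet of brightness, contrast and gamma values for one channel. The sketch below assumes, as we read the cytomapper documentation, that brightness is added to the intensities, contrast multiplies them and gamma is used as an exponent, i.e. ((x + brightness) * contrast)^gamma; check ?plotPixels for the exact definition. The helper `apply_bcg()` and the example intensities are hypothetical.

```r
# Hypothetical helper: adjust pixel intensities with a brightness/contrast/gamma
# triplet, ASSUMING the form ((x + brightness) * contrast)^gamma
apply_bcg <- function(x, bcg) {
  brightness  <- bcg[1]
  contrast    <- bcg[2]
  gamma_value <- bcg[3]
  ((x + brightness) * contrast)^gamma_value
}

# The settings used in the plotPixels() call above
bcg <- list(CD99 = c(0, 2, 1), CDH1 = c(0, 8, 1), H3 = c(0, 5, 1))

# A few example intensities (hypothetical values)
x <- c(0, 0.05, 0.1)

# With brightness 0 and gamma 1, each setting only rescales intensities:
# CD99 -> 0, 0.1, 0.2; CDH1 -> 0, 0.4, 0.8; H3 -> 0, 0.25, 0.5
adjusted <- sapply(bcg, function(p) apply_bcg(x, p))
adjusted
```

Under this assumption, the call above boosts CDH1 eightfold, H3 fivefold and CD99 twofold before colours are assigned.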
Plotting cell information
The masks and sce objects can be combined to display cell-level data.
plotCells(
cur_masks, object = sce,
img_id = "image_number", cell_id = "cell_number",
colour_by = c("CD8a", "PIN"),
exprs_values = "exprs"
)
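To colour cells, plotCells() has to link every mask pixel to one cell: the img_id column identifies the image, and the cell_id column holds the integer IDs that match the pixel values of the mask. The base-R sketch below illustrates that lookup; it is not cytomapper's implementation. The toy mask, the data frame `cells` (standing in for the cell metadata and expression values of sce) and the helper `colour_mask_by()` are all hypothetical.

```r
# Toy 3 x 3 mask for image 1: pixel values are cell numbers (0 = background)
mask <- matrix(c(1, 1, 0,
                 1, 2, 2,
                 0, 2, 2), nrow = 3, byrow = TRUE)

# Toy cell-level data (hypothetical values); cell numbers repeat across images
cells <- data.frame(image_number = c(1, 1, 2),
                    cell_number  = c(1, 2, 1),
                    CD8a         = c(0.3, 1.2, 5.0))

# Hypothetical helper: paint each mask pixel with the value of its cell
colour_mask_by <- function(mask, cells, image, feature) {
  cur <- cells[cells$image_number == image, ]   # restrict to one image
  idx <- match(mask, cur$cell_number)           # NA for background pixels
  matrix(cur[[feature]][idx], nrow = nrow(mask))
}

painted <- colour_mask_by(mask, cells, image = 1, feature = "CD8a")
painted
```

Filtering on the image first matters: cell 1 of image 2 (CD8a = 5.0) would otherwise be confused with cell 1 of image 1, which is why both img_id and cell_id are required.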
Outlining cells on images
Cell information can be displayed on top of images by combining the images, masks and sce objects.
plotPixels(
cur_images, mask = cur_masks, object = sce,
img_id = "image_number", cell_id = "cell_number",
outline_by = "cell_type",
colour_by = c("H3", "CD99", "CDH1"),
bcg = list(
CD99 = c(0,2,1),
CDH1 = c(0,8,1),
H3 = c(0,5,1)
)
)
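Drawing outlines requires finding the pixels on each cell's border. One simple approach, shown here in base R as an illustration and not necessarily how cytomapper does it, flags every cell pixel that has a 4-connected neighbour with a different label; pixels outside the image count as background. The outline colour would then be taken from the cell's cell_type entry, looked up through the cell ID. The helper `outline_pixels()` and the toy mask are hypothetical.

```r
# Toy mask: 0 = background, 1 and 2 = cell IDs
mask <- matrix(c(1, 1, 1, 0,
                 1, 1, 1, 0,
                 1, 1, 1, 2,
                 0, 0, 2, 2), nrow = 4, byrow = TRUE)

# Hypothetical helper: TRUE for cell pixels with at least one 4-connected
# neighbour carrying a different label; the image border is padded with 0
outline_pixels <- function(mask) {
  nr <- nrow(mask)
  nc <- ncol(mask)
  padded <- matrix(0, nr + 2, nc + 2)
  rows <- 2:(nr + 1)
  cols <- 2:(nc + 1)
  padded[rows, cols] <- mask
  differs <- padded[rows - 1, cols] != mask |   # neighbour above
             padded[rows + 1, cols] != mask |   # neighbour below
             padded[rows, cols - 1] != mask |   # neighbour to the left
             padded[rows, cols + 1] != mask     # neighbour to the right
  differs & mask > 0
}

# Every cell pixel is on an outline except the interior pixel at row 2, column 2
outlines <- outline_pixels(mask)
outlines
```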
Session info
sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] imcdatasets_1.6.0 SpatialExperiment_1.8.0
## [3] cytomapper_1.10.0 EBImage_4.40.0
## [5] SingleCellExperiment_1.20.0 SummarizedExperiment_1.28.0
## [7] Biobase_2.58.0 GenomicRanges_1.50.0
## [9] GenomeInfoDb_1.34.0 IRanges_2.32.0
## [11] S4Vectors_0.36.0 BiocGenerics_0.44.0
## [13] MatrixGenerics_1.10.0 matrixStats_0.62.0
## [15] BiocStyle_2.26.0
##
## loaded via a namespace (and not attached):
## [1] AnnotationHub_3.6.0 BiocFileCache_2.6.0
## [3] systemfonts_1.0.4 sp_1.5-0
## [5] shinydashboard_0.7.2 BiocParallel_1.32.0
## [7] ggplot2_3.3.6 digest_0.6.30
## [9] htmltools_0.5.3 viridis_0.6.2
## [11] magick_2.7.3 tiff_0.1-11
## [13] fansi_1.0.3 magrittr_2.0.3
## [15] memoise_2.0.1 limma_3.54.0
## [17] Biostrings_2.66.0 svgPanZoom_0.3.4
## [19] R.utils_2.12.1 svglite_2.1.0
## [21] jpeg_0.1-9 colorspace_2.0-3
## [23] blob_1.2.3 rappdirs_0.3.3
## [25] xfun_0.34 dplyr_1.0.10
## [27] crayon_1.5.2 RCurl_1.98-1.9
## [29] jsonlite_1.8.3 glue_1.6.2
## [31] gtable_0.3.1 nnls_1.4
## [33] zlibbioc_1.44.0 XVector_0.38.0
## [35] DelayedArray_0.24.0 DropletUtils_1.18.0
## [37] Rhdf5lib_1.20.0 HDF5Array_1.26.0
## [39] abind_1.4-5 scales_1.2.1
## [41] DBI_1.1.3 edgeR_3.40.0
## [43] Rcpp_1.0.9 viridisLite_0.4.1
## [45] xtable_1.8-4 dqrng_0.3.0
## [47] bit_4.0.4 htmlwidgets_1.5.4
## [49] httr_1.4.4 RColorBrewer_1.1-3
## [51] ellipsis_0.3.2 pkgconfig_2.0.3
## [53] R.methodsS3_1.8.2 scuttle_1.8.0
## [55] sass_0.4.2 dbplyr_2.2.1
## [57] locfit_1.5-9.6 utf8_1.2.2
## [59] tidyselect_1.2.0 rlang_1.0.6
## [61] later_1.3.0 AnnotationDbi_1.60.0
## [63] munsell_0.5.0 BiocVersion_3.16.0
## [65] tools_4.2.1 cachem_1.0.6
## [67] cli_3.4.1 generics_0.1.3
## [69] RSQLite_2.2.18 ExperimentHub_2.6.0
## [71] evaluate_0.17 stringr_1.4.1
## [73] fastmap_1.1.0 fftwtools_0.9-11
## [75] yaml_2.3.6 knitr_1.40
## [77] bit64_4.0.5 purrr_0.3.5
## [79] KEGGREST_1.38.0 sparseMatrixStats_1.10.0
## [81] mime_0.12 R.oo_1.25.0
## [83] compiler_4.2.1 beeswarm_0.4.0
## [85] filelock_1.0.2 curl_4.3.3
## [87] png_0.1-7 interactiveDisplayBase_1.36.0
## [89] tibble_3.1.8 bslib_0.4.0
## [91] stringi_1.7.8 highr_0.9
## [93] lattice_0.20-45 Matrix_1.5-1
## [95] vctrs_0.5.0 pillar_1.8.1
## [97] lifecycle_1.0.3 rhdf5filters_1.10.0
## [99] BiocManager_1.30.19 jquerylib_0.1.4
## [101] bitops_1.0-7 raster_3.6-3
## [103] httpuv_1.6.6 R6_2.5.1
## [105] bookdown_0.29 promises_1.2.0.1
## [107] gridExtra_2.3 vipor_0.4.5
## [109] codetools_0.2-18 assertthat_0.2.1
## [111] rhdf5_2.42.0 rjson_0.2.21
## [113] withr_2.5.0 GenomeInfoDbData_1.2.9
## [115] parallel_4.2.1 terra_1.6-17
## [117] grid_4.2.1 beachmat_2.14.0
## [119] rmarkdown_2.17 DelayedMatrixStats_1.20.0
## [121] shiny_1.7.3 ggbeeswarm_0.6.0