scp 1.8.0
The scpdata package
scpdata disseminates mass spectrometry (MS)-based single-cell proteomics (SCP) data sets formatted using the scp data structure. The data structure is described in the scp vignette.
In this vignette, we describe how to access the SCP data sets. To start, we load the scpdata package.
library("scpdata")
ExperimentHub
The data is stored using the ExperimentHub infrastructure. We first create a connection with ExperimentHub.
eh <- ExperimentHub()
You can list all data sets available in scpdata using the query function.
query(eh, "scpdata")
#> ExperimentHub with 18 records
#> # snapshotDate(): 2022-10-24
#> # $dataprovider: PRIDE, MassIVE, SlavovLab website
#> # $species: Homo sapiens, Mus musculus, Rattus norvegicus, Gallus gallus
#> # $rdataclass: QFeatures
#> # additional mcols(): taxonomyid, genome, description,
#> # coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#> # rdatapath, sourceurl, sourcetype
#> # retrieve records with, e.g., 'object[["EH3899"]]'
#>
#> title
#> EH3899 | specht2019v2
#> EH3900 | specht2019v3
#> EH3901 | dou2019_lysates
#> EH3902 | dou2019_mouse
#> EH3903 | dou2019_boosting
#> ... ...
#> EH7295 | williams2020_lfq
#> EH7296 | williams2020_tmt
#> EH7711 | leduc2022
#> EH7712 | derks2022
#> EH7713 | brunner2022
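The listing above can be narrowed by passing several search terms to query. The following is a minimal sketch, assuming that query keeps only the records matching every pattern (its default behaviour in AnnotationHub, to our knowledge) and that network access to ExperimentHub is available. The object name mouse is only illustrative.

```r
library("ExperimentHub")

# Connect to ExperimentHub (uses the local cache when available)
eh <- ExperimentHub()

# Keep only scpdata records whose metadata also mention mouse;
# every pattern in the vector must match for a record to be returned
mouse <- query(eh, c("scpdata", "Mus musculus"))

# Titles of the matching data sets
mouse$title
```

The same approach should work with any other term that appears in the record metadata, such as a data provider or a data set name.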
Another way to get information about the available data sets is to call scpdata(). This will retrieve all the available metadata. For example, we can retrieve the data set titles along with their descriptions to make an informed choice about which data set to use.
info <- scpdata()
#> snapshotDate(): 2022-10-24
knitr::kable(info[, c("title", "description")])
ID | title | description |
---|---|---|
EH3899 | specht2019v2 | SCP expression data for monocytes (U-937) and macrophages at PSM, peptide and protein level |
EH3900 | specht2019v3 | SCP expression data for more monocytes (U-937) and macrophages at PSM, peptide and protein level |
EH3901 | dou2019_lysates | SCP expression data for Hela digests (0.2 or 10 ng) at PSM and protein level |
EH3902 | dou2019_mouse | SCP expression data for C10, SVEC or Raw cells at PSM and protein level |
EH3903 | dou2019_boosting | SCP expression data for C10, SVEC or Raw cells and 3 boosters (0, 5 or 50 ng) at PSM and protein level |
EH3904 | zhu2018MCP | Near SCP expression data for micro-dissection rat brain samples (50, 100, or 200 µm width) at PSM level |
EH3905 | zhu2018NC_hela | Near SCP expression data for HeLa samples (approximately 12, 40, or 140 cells) at PSM level |
EH3906 | zhu2018NC_lysates | Near SCP expression data for HeLa lysates (10, 40 and 140 cell equivalent) at PSM level |
EH3907 | zhu2018NC_islets | Near SCP expression data for micro-dissected human pancreas samples (control patients or type 1 diabetes) at PSM level |
EH3908 | cong2020AC | SCP expression data for Hela cells at PSM, peptide and protein level |
EH3909 | zhu2019EL | SCP expression data for chicken utricle samples (1, 3, 5 or 20 cells) at PSM, peptide and protein level |
EH6011 | liang2020_hela | Expression data for HeLa cells (0, 1, 10, 150, 500 cells) at PSM, peptide and protein level |
EH7085 | schoof2021 | Single-cell proteomics data from OCI-AML8227 cell culture to reconstruct the cellular hierarchy. |
EH7295 | williams2020_lfq | Single-cell label free proteomics data from a MCF10A cell line culture. |
EH7296 | williams2020_tmt | Single-cell proteomics data from three acute myeloid leukemia cell line culture (MOLM-14, K562, CMK). |
EH7711 | leduc2022 | Single-cell proteomics data of 878 melanoma cells and 877 monocytes. |
EH7712 | derks2022 | Single-cell and bulk (100-cell) proteomics data of PDAC, melanoma cells and monocytes. |
EH7713 | brunner2022 | Single-cell proteomics data of cell cycle stages in HeLa. |
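Because the metadata come back as a table, they can also be filtered programmatically instead of being read by eye. The sketch below assumes that the object returned by scpdata() supports data.frame-style subsetting with $ and [ (as the call to knitr::kable above suggests); the names is_hela and hela are illustrative only.

```r
library("scpdata")

# Retrieve the metadata for all scpdata data sets
info <- scpdata()

# Flag the data sets whose description mentions HeLa, in any capitalisation
is_hela <- grepl("hela", info$description, ignore.case = TRUE)

# Keep the title and description of the matching data sets
hela <- info[is_hela, c("title", "description")]
hela$title
```

Matching on the description is a convenience rather than a curated annotation, so it is worth reading the matching descriptions before settling on a data set.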
To get one of the data sets (e.g. dou2019_lysates), you can either retrieve it from ExperimentHub using its record identifier
scp <- eh[["EH3901"]]
#> see ?scpdata and browseVignettes('scpdata') for documentation
#> loading from cache
scp
#> An instance of class QFeatures containing 4 assays:
#> [1] Hela_run_1: SingleCellExperiment with 24562 rows and 10 columns
#> [2] Hela_run_2: SingleCellExperiment with 24310 rows and 10 columns
#> [3] peptides: SingleCellExperiment with 13934 rows and 20 columns
#> [4] proteins: SingleCellExperiment with 1641 rows and 20 columns
or you can use the built-in functions from scpdata
scp <- dou2019_lysates()
#> see ?scpdata and browseVignettes('scpdata') for documentation
#> loading from cache
scp
#> An instance of class QFeatures containing 4 assays:
#> [1] Hela_run_1: SingleCellExperiment with 24562 rows and 10 columns
#> [2] Hela_run_2: SingleCellExperiment with 24310 rows and 10 columns
#> [3] peptides: SingleCellExperiment with 13934 rows and 20 columns
#> [4] proteins: SingleCellExperiment with 1641 rows and 20 columns
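Once loaded, the object can be inspected with the usual QFeatures and SummarizedExperiment accessors. The following is a minimal sketch, assuming the assay names shown in the output above and that SummarizedExperiment is installed (it is a dependency of QFeatures); prot is an illustrative name.

```r
library("scpdata")
library("SummarizedExperiment")

# Load the data set (downloaded on first use, then read from the cache)
scp <- dou2019_lysates()

# Names of the assays stored in the QFeatures object
names(scp)

# Extract the protein-level assay as a SingleCellExperiment
prot <- scp[["proteins"]]
dim(prot)

# Peek at the quantification matrix for the first three samples
head(assay(prot)[, 1:3])

# Sample annotations shared across all assays
colData(scp)
```

Extracting a single assay with [[ is convenient for a quick look; for a full workflow it is usually preferable to keep working on the QFeatures object so that the links between PSMs, peptides and proteins are preserved.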
Each data set is extensively documented in a separate man page (e.g. ?dou2019_lysates). There you can find information about the data content, the acquisition protocol, the data collection procedure, as well as the data sources and the reference.
For more information about manipulating the data sets, check the scp package. The scp vignette will guide you through a typical SCP data processing workflow. Once your data is loaded from scpdata, you can skip section 2, Read in SCP data, of the scp vignette.
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS
Matrix products: default
BLAS: /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas.so
LAPACK: /home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB LC_COLLATE=C
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] scpdata_1.6.0 ExperimentHub_2.6.0
[3] AnnotationHub_3.6.0 BiocFileCache_2.6.0
[5] dbplyr_2.2.1 QFeatures_1.8.0
[7] MultiAssayExperiment_1.24.0 SummarizedExperiment_1.28.0
[9] Biobase_2.58.0 GenomicRanges_1.50.0
[11] GenomeInfoDb_1.34.0 IRanges_2.32.0
[13] S4Vectors_0.36.0 BiocGenerics_0.44.0
[15] MatrixGenerics_1.10.0 matrixStats_0.62.0
[17] BiocStyle_2.26.0
loaded via a namespace (and not attached):
[1] ProtGenerics_1.30.0 bitops_1.0-7
[3] bit64_4.0.5 filelock_1.0.2
[5] httr_1.4.4 tools_4.2.1
[7] bslib_0.4.0 utf8_1.2.2
[9] R6_2.5.1 DBI_1.1.3
[11] lazyeval_0.2.2 withr_2.5.0
[13] tidyselect_1.2.0 bit_4.0.4
[15] curl_4.3.3 compiler_4.2.1
[17] cli_3.4.1 DelayedArray_0.24.0
[19] bookdown_0.29 sass_0.4.2
[21] rappdirs_0.3.3 stringr_1.4.1
[23] digest_0.6.30 rmarkdown_2.17
[25] XVector_0.38.0 pkgconfig_2.0.3
[27] htmltools_0.5.3 highr_0.9
[29] fastmap_1.1.0 rlang_1.0.6
[31] RSQLite_2.2.18 shiny_1.7.3
[33] jquerylib_0.1.4 generics_0.1.3
[35] jsonlite_1.8.3 dplyr_1.0.10
[37] RCurl_1.98-1.9 magrittr_2.0.3
[39] GenomeInfoDbData_1.2.9 Matrix_1.5-1
[41] Rcpp_1.0.9 fansi_1.0.3
[43] MsCoreUtils_1.10.0 lifecycle_1.0.3
[45] stringi_1.7.8 yaml_2.3.6
[47] MASS_7.3-58.1 zlibbioc_1.44.0
[49] grid_4.2.1 blob_1.2.3
[51] promises_1.2.0.1 crayon_1.5.2
[53] lattice_0.20-45 Biostrings_2.66.0
[55] KEGGREST_1.38.0 knitr_1.40
[57] pillar_1.8.1 igraph_1.3.5
[59] glue_1.6.2 BiocVersion_3.16.0
[61] evaluate_0.17 BiocManager_1.30.19
[63] vctrs_0.5.0 png_0.1-7
[65] httpuv_1.6.6 purrr_0.3.5
[67] clue_0.3-62 assertthat_0.2.1
[69] cachem_1.0.6 BiocBaseUtils_1.0.0
[71] xfun_0.34 mime_0.12
[73] xtable_1.8-4 AnnotationFilter_1.22.0
[75] later_1.3.0 SingleCellExperiment_1.20.0
[77] tibble_3.1.8 AnnotationDbi_1.60.0
[79] memoise_2.0.1 cluster_2.1.4
[81] ellipsis_0.3.2 interactiveDisplayBase_1.36.0
This vignette is distributed under a CC BY-SA license.