1 Introduction

The TransOmicsData package contains datasets spanning several biological contexts, including in vitro embryonic and tissue-specific development in mouse and human. It covers multiple omics technologies, such as RNA-seq, mass spectrometry and ChIP-seq. The package was developed to provide convenient access to raw or pre-processed data for comparative trans-omics analysis.

2 The TransOmicsData package

2.1 Accessing the data

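The data in this package are hosted on ExperimentHub and can be retrieved with the ExperimentHub package. The code below refreshes the local hub metadata, then queries the hub for records that belong to TransOmicsData.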

# if (!requireNamespace("BiocManager", quietly = TRUE))
#    install.packages("BiocManager")

# BiocManager::install("ExperimentHub")
library(ExperimentHub)
refreshHub(hubClass = "ExperimentHub")
## ExperimentHub with 8333 records
## # snapshotDate(): 2024-10-24
## # $dataprovider: Eli and Edythe L. Broad Institute of Harvard and MIT, NCBI,...
## # $species: Homo sapiens, Mus musculus, Saccharomyces cerevisiae, Drosophila...
## # $rdataclass: SummarizedExperiment, data.frame, ExpressionSet, matrix, char...
## # additional mcols(): taxonomyid, genome, description,
## #   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## #   rdatapath, sourceurl, sourcetype 
## # retrieve records with, e.g., 'object[["EH1"]]' 
## 
##            title                                                              
##   EH1    | RNA-Sequencing and clinical data for 7706 tumor samples from The...
##   EH166  | ERR188297                                                          
##   EH167  | ERR188088                                                          
##   EH168  | ERR188204                                                          
##   EH169  | ERR188317                                                          
##   ...      ...                                                                
##   EH9604 | Resistance of TEAD inhibitor to drug                               
##   EH9605 | spe                                                                
##   EH9606 | sce                                                                
##   EH9607 | ProteinGym metadata for 217 DMS substitution assays                
##   EH9608 | roadmap_wgbs_hg38
ehub <- ExperimentHub()
myfiles <- query(ehub, "TransOmicsData")
myfiles
## ExperimentHub with 12 records
## # snapshotDate(): 2024-10-24
## # $dataprovider: PRIDE, NCBI
## # $species: Mus musculus, Homo sapiens
## # $rdataclass: SummarizedExperiment
## # additional mcols(): taxonomyid, genome, description,
## #   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## #   rdatapath, sourceurl, sourcetype 
## # retrieve records with, e.g., 'object[["EH8536"]]' 
## 
##            title                                         
##   EH8536 | Chen organoid phosphoproteome                 
##   EH8537 | Chen organoid proteome                        
##   EH8538 | Chen organoid transcriptome                   
##   EH8539 | Xiao myogenesis differentation phosphoproteome
##   EH8540 | Xiao myogenesis differentiation proteome      
##   ...      ...                                           
##   EH8543 | Yang ESC epigenome                            
##   EH8544 | Yang ESC phosphoproteome                      
##   EH8545 | Yang ESC proteome                             
##   EH8546 | Yang ESC transcriptome                        
##   EH9515 | Chen organoid sctranscriptome
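
As the query output notes, an individual record can be retrieved with `[[` and its ExperimentHub ID. The following is a minimal sketch of that step, not the package's own interface: `fetch_record()` is a hypothetical helper introduced here for illustration, and it only checks that the ID is present before delegating to `[[`. The download is guarded with `interactive()` because it needs network access and writes to the local ExperimentHub cache.

```r
# Hypothetical helper (not part of TransOmicsData or ExperimentHub):
# check that an ID exists among the hub's names before retrieving it.
fetch_record <- function(hub, id) {
  stopifnot(is.character(id), length(id) == 1L)
  if (!id %in% names(hub)) {
    stop("Record not found in hub: ", id)
  }
  hub[[id]]  # for a hub object, this downloads (or loads from cache) the resource
}

# Usage sketch; only runs in an interactive session with ExperimentHub installed.
if (interactive() && requireNamespace("ExperimentHub", quietly = TRUE)) {
  library(ExperimentHub)
  ehub <- ExperimentHub()
  myfiles <- query(ehub, "TransOmicsData")

  # "EH8536" is the ID listed above for "Chen organoid phosphoproteome".
  phospho <- fetch_record(myfiles, "EH8536")
  print(phospho)
}
```

According to the `$rdataclass` field in the query output, the retrieved object should be a SummarizedExperiment. The search can also be narrowed by passing several patterns to `query()`, for example `query(ehub, c("TransOmicsData", "Mus musculus"))`.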

2.2 Package installation

# BiocManager::install("TransOmicsData")

Once the package is installed, load it and call listDatasets() to list summary metadata for every dataset it provides:

library(TransOmicsData)
listDatasets()
## DataFrame with 3 rows and 5 columns
##             Title            Description                  Omics     Species
##       <character>            <character>            <character> <character>
## 1   chen-organoid neural organoid diff.. phosphoproteome, pro..       human
## 2 xiao-myogenesis C2C12 myogenesis dif.. phosphoproteome, pro..       mouse
## 3        yang-esc ESC to epiLC differe.. epigenome, phosphopr..       human
##                RDataPath
##              <character>
## 1 TransOmicsData/0.99...
## 2 TransOmicsData/0.99...
## 3 TransOmicsData/0.99...
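
The metadata table can be subset like any other data frame, for instance to find the datasets for one species or one omics layer. The sketch below assumes only the column names shown in the output above (`Species`, `Omics`); `filter_datasets()` is a hypothetical helper written for this illustration and is not exported by the package.

```r
# Hypothetical helper (not part of TransOmicsData): subset a metadata table
# by exact species label and/or by a substring of the Omics column.
filter_datasets <- function(datasets, species = NULL, omics = NULL) {
  keep <- rep(TRUE, nrow(datasets))
  if (!is.null(species)) {
    keep <- keep & datasets$Species == species
  }
  if (!is.null(omics)) {
    # fixed = TRUE treats `omics` as a literal string, not a regular expression
    keep <- keep & grepl(omics, datasets$Omics, fixed = TRUE)
  }
  datasets[keep, , drop = FALSE]
}

# Usage sketch; skipped when TransOmicsData is not installed.
if (requireNamespace("TransOmicsData", quietly = TRUE)) {
  datasets <- TransOmicsData::listDatasets()
  print(filter_datasets(datasets, species = "mouse"))
  print(filter_datasets(datasets, omics = "phosphoproteome"))
}
```

Matching on a substring of `Omics` is a deliberate choice here, because each row lists several omics layers in a single comma-separated string.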

2.3 Citing TransOmicsData

We hope that TransOmicsData will be useful for your research. Please use the following information to cite the package. Thank you!

## Citation info
citation("TransOmicsData")
## To cite TransOmicsData in publications use:
## 
##   Chen C, Xiao D, Yang P (2024). _TransOmicsData: a collection of
##   trans-omics data covering a wide range of biological systems._.
##   University of Sydney, Sydney, Australia.
##   doi:10.18129/B9.bioc.TransOmicsData
##   <https://doi.org/10.18129/B9.bioc.TransOmicsData>,
##   <https://github.com/PYangLab/TransOmicsData>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {TransOmicsData: a collection of trans-omics data covering a wide range of biological systems.},
##     author = {Carissa Chen and Di Xiao and Pengyi Yang},
##     organization = {University of Sydney},
##     address = {Sydney, Australia},
##     year = {2024},
##     url = {https://github.com/PYangLab/TransOmicsData},
##     doi = {10.18129/B9.bioc.TransOmicsData},
##   }

Session info

## R Under development (unstable) (2024-10-21 r87258)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] TransOmicsData_1.1.0 ExperimentHub_2.13.1 AnnotationHub_3.13.3 BiocFileCache_2.13.2 dbplyr_2.5.0        
## [6] BiocGenerics_0.51.3  BiocStyle_2.33.1    
## 
## loaded via a namespace (and not attached):
##  [1] KEGGREST_1.45.1         xfun_0.48               bslib_0.8.0             Biobase_2.65.1         
##  [5] vctrs_0.6.5             tools_4.5.0             generics_0.1.3          stats4_4.5.0           
##  [9] curl_5.2.3              tibble_3.2.1            fansi_1.0.6             AnnotationDbi_1.67.0   
## [13] RSQLite_2.3.7           blob_1.2.4              pkgconfig_2.0.3         S4Vectors_0.43.2       
## [17] lifecycle_1.0.4         GenomeInfoDbData_1.2.13 compiler_4.5.0          Biostrings_2.73.2      
## [21] GenomeInfoDb_1.41.2     htmltools_0.5.8.1       sass_0.4.9              yaml_2.3.10            
## [25] pillar_1.9.0            crayon_1.5.3            jquerylib_0.1.4         cachem_1.1.0           
## [29] mime_0.12               tidyselect_1.2.1        digest_0.6.37           dplyr_1.1.4            
## [33] purrr_1.0.2             bookdown_0.41           BiocVersion_3.20.0      fastmap_1.2.0          
## [37] cli_3.6.3               magrittr_2.0.3          utf8_1.2.4              withr_3.0.1            
## [41] filelock_1.0.3          UCSC.utils_1.1.0        rappdirs_0.3.3          bit64_4.5.2            
## [45] rmarkdown_2.28          XVector_0.45.0          httr_1.4.7              bit_4.5.0              
## [49] png_0.1-8               memoise_2.0.1           evaluate_1.0.1          knitr_1.48             
## [53] IRanges_2.39.2          rlang_1.1.4             glue_1.8.0              DBI_1.2.3              
## [57] BiocManager_1.30.25     jsonlite_1.8.9          R6_2.5.1                zlibbioc_1.51.2