Geneplast is designed for evolutionary and plasticity analyses based on the distribution of orthologous groups (OGs) in a given species tree. This supporting package provides datasets obtained and processed from different orthology databases for use in geneplast evolutionary analyses.
Currently, data from the following sources are available: STRING (v11.0), OMA Browser (All.Jan2020), and OrthoDB (v10.1).
Each dataset consists of four objects: cogids, sspids, cogdata, and phyloTree. The phyloTree object is of class phylo, representing a phylogenetic tree for the species in sspids.
The general procedure for creating these objects starts by selecting only the eukaryotic species from the orthology database, with the aid of the NCBI taxonomy classification.
We build a graph from the taxonomy nodes and locate the root of the eukaryotes. We then traverse this sub-graph from the root to the leaves that correspond to the taxonomy identifiers of the species in the database. Selecting the leaves of the resulting sub-graph yields the sspids object.
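The traversal described above can be sketched in base R on a toy taxonomy. This is only an illustration, not the code used to build the package data: the edge list is a heavily simplified stand-in for the NCBI taxonomy nodes (intermediate ranks are omitted), `db_species` is a hypothetical list of species in an orthology database, and the `ssp_id` column name is an assumption.

```r
# Toy taxonomy as a child -> parent edge list (intermediate ranks omitted).
# 2759 = Eukaryota, 2 = Bacteria, 33208 = Metazoa, 4751 = Fungi
nodes <- data.frame(
  id     = c(131567, 2759, 2, 33208, 4751, 9606, 10090, 4932, 562),
  parent = c(1, 131567, 131567, 2759, 2759, 33208, 33208, 4751, 2)
)

# Hypothetical taxonomy identifiers of the species in the orthology database
db_species <- c(9606, 10090, 4932, 562)

# Breadth-first traversal: collect the root and all of its descendants
descendants <- function(nodes, root) {
  visited <- c()
  frontier <- root
  while (length(frontier) > 0) {
    visited <- c(visited, frontier)
    frontier <- nodes$id[nodes$parent %in% frontier]
  }
  visited
}

# Sub-graph rooted at Eukaryota
euk <- descendants(nodes, 2759)

# Leaves are nodes that are never a parent; keep those present in the database
leaves <- setdiff(euk, nodes$parent)
sspids <- data.frame(ssp_id = as.character(intersect(db_species, leaves)),
                     stringsAsFactors = FALSE)
sspids
```

In this toy example the bacterial identifier 562 is excluded because it does not descend from the eukaryote root.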
Once the species of interest are selected, the orthology information of the corresponding proteins is filtered to obtain the cogdata object. The cogids object consists of the unique ortholog identifiers from cogdata.
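As a minimal sketch of this filtering step, the following uses a made-up orthology table. The column names (`protein_id`, `ssp_id`, `cog_id`) and all values are assumptions chosen for illustration and are not taken from any of the databases.

```r
# Species of interest (hypothetical subset)
sspids <- data.frame(ssp_id = c("9606", "10090"), stringsAsFactors = FALSE)

# Made-up protein-to-orthologous-group mapping for three species
orthology <- data.frame(
  protein_id = c("P1", "P2", "P3", "P4", "P5"),
  ssp_id     = c("9606", "10090", "562", "9606", "10090"),
  cog_id     = c("OG1", "OG1", "OG1", "OG2", "OG2"),
  stringsAsFactors = FALSE
)

# Keep only the rows belonging to the selected species
cogdata <- orthology[orthology$ssp_id %in% sspids$ssp_id, ]

# Unique orthologous group identifiers found in the filtered table
cogids <- data.frame(cog_id = unique(cogdata$cog_id), stringsAsFactors = FALSE)
```

Here the row for species 562 is dropped from `cogdata`, and `cogids` lists each remaining group once.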
Finally, the phyloTree object is built from the full TimeTree eukaryote phylogenetic tree, which is pruned to keep only our species of interest. Species missing from TimeTree are filled in by matching genera and by using the closest species inferred from the previously built NCBI tree.
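The pruning step can be illustrated with the ape package, assuming it is installed. The Newick string below is a toy tree with arbitrary branch lengths rather than the TimeTree phylogeny, and the step that fills in missing species is not shown.

```r
library(ape)

# Toy tree with taxonomy identifiers as tip labels (branch lengths are arbitrary)
full_tree <- read.tree(text = "(((9606:1,10090:1):1,9031:2):1,4932:3);")

# Hypothetical species of interest; 9031 is deliberately left out
species_of_interest <- c("9606", "10090", "4932")

# Prune the tree so that only the species of interest remain as tips
phyloTree <- keep.tip(full_tree, species_of_interest)

class(phyloTree)
phyloTree$tip.label
```

`keep.tip` is the complement of `drop.tip`: it retains the listed tips and removes everything else, returning an object of class phylo.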
If you don’t already have AnnotationHub installed on your system, use BiocManager::install to install the package.
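The installation step typically looks like the following. This is the standard Bioconductor idiom rather than anything specific to this package; the check for BiocManager is included on the assumption that it may not be installed yet.

```r
# Install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install AnnotationHub from Bioconductor
BiocManager::install("AnnotationHub")
```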
To begin, let’s create a new AnnotationHub connection and use it to query AnnotationHub for all Geneplast resources.
library('AnnotationHub')
#> Loading required package: BiocGenerics
#> Loading required package: parallel
#>
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:parallel':
#>
#> clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
#> clusterExport, clusterMap, parApply, parCapply, parLapply,
#> parLapplyLB, parRapply, parSapply, parSapplyLB
#> The following objects are masked from 'package:stats':
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#>
#> Filter, Find, Map, Position, Reduce, anyDuplicated, append,
#> as.data.frame, basename, cbind, colnames, dirname, do.call,
#> duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
#> lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
#> pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
#> tapply, union, unique, unsplit, which.max, which.min
#> Loading required package: BiocFileCache
#> Loading required package: dbplyr
# create an AnnotationHub connection
ah <- AnnotationHub()
#> snapshotDate(): 2020-10-26
# search for all Geneplast resources
meta <- query(ah, "geneplast")
length(meta)
#> [1] 3
head(meta)
#> AnnotationHub with 3 records
#> # snapshotDate(): 2020-10-26
#> # $dataprovider: STRING, OrthoDB, OMA
#> # $species: NA
#> # $rdataclass: Rda
#> # additional mcols(): taxonomyid, genome, description,
#> # coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#> # rdatapath, sourceurl, sourcetype
#> # retrieve records with, e.g., 'object[["AH83116"]]'
#>
#> title
#> AH83116 | STRING database v11.0
#> AH83117 | OMA Browser All.Jan2020
#> AH83118 | OrthoDB v10.1
# types of Geneplast data available
table(meta$rdataclass)
#>
#> Rda
#> 3
# distribution of resources by specific databases
table(meta$dataprovider)
#>
#> OMA OrthoDB STRING
#> 1 1 1
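Once the records are listed, a dataset can be retrieved by its identifier, as hinted at in the output above. The sketch below assumes that, for these Rda resources, `[[` downloads the file and returns its local path, so the result is passed to `load()`. It also assumes that the loaded file provides the four objects described earlier. It requires a network connection on first use.

```r
library(AnnotationHub)

# Connect to AnnotationHub and find the Geneplast resources
ah <- AnnotationHub()
meta <- query(ah, "geneplast")

# Assumption: `[[` returns the path to the cached .rda file for this record
load(meta[["AH83116"]])

# Check which of the expected objects are now in the workspace
sapply(c("cogids", "sspids", "cogdata", "phyloTree"), exists)
```

The identifier AH83116 corresponds to the STRING record in the listing above; the other two records can be retrieved the same way.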
Please refer to the geneplast vignette for more details.
sessionInfo()
#> R version 4.0.2 Patched (2020-09-15 r79213)
#> Platform: x86_64-apple-darwin17.7.0 (64-bit)
#> Running under: macOS High Sierra 10.13.6
#>
#> Matrix products: default
#> BLAS: /Users/ka36530_ca/R-stuff/bin/R-4-0/lib/libRblas.dylib
#> LAPACK: /Users/ka36530_ca/R-stuff/bin/R-4-0/lib/libRlapack.dylib
#>
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] parallel stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] AnnotationHub_2.21.5 BiocFileCache_1.13.1 dbplyr_1.4.4
#> [4] BiocGenerics_0.35.4
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.5 later_1.1.0.1
#> [3] BiocManager_1.30.10 pillar_1.4.6
#> [5] compiler_4.0.2 tools_4.0.2
#> [7] digest_0.6.26 bit_4.0.4
#> [9] evaluate_0.14 RSQLite_2.2.1
#> [11] memoise_1.1.0 lifecycle_0.2.0
#> [13] tibble_3.0.4 pkgconfig_2.0.3
#> [15] rlang_0.4.8 shiny_1.5.0
#> [17] DBI_1.1.0 curl_4.3
#> [19] yaml_2.2.1 xfun_0.18
#> [21] fastmap_1.0.1 dplyr_1.0.2
#> [23] stringr_1.4.0 httr_1.4.2
#> [25] knitr_1.30 IRanges_2.23.10
#> [27] generics_0.0.2 vctrs_0.3.4
#> [29] S4Vectors_0.27.14 rappdirs_0.3.1
#> [31] stats4_4.0.2 bit64_4.0.5
#> [33] tidyselect_1.1.0 Biobase_2.49.1
#> [35] glue_1.4.2 R6_2.4.1
#> [37] AnnotationDbi_1.51.3 rmarkdown_2.4
#> [39] purrr_0.3.4 blob_1.2.1
#> [41] magrittr_1.5 promises_1.1.1
#> [43] ellipsis_0.3.1 htmltools_0.5.0
#> [45] assertthat_0.2.1 xtable_1.8-4
#> [47] mime_0.9 interactiveDisplayBase_1.27.5
#> [49] httpuv_1.5.4 stringi_1.5.3
#> [51] BiocVersion_3.12.0 crayon_1.3.4