The geneplast.data package provides datasets from different sources via AnnotationHub for use in geneplast pipelines. The datasets contain species, phylogenetic trees, and orthology relationships among eukaryotes from different orthology databases.
geneplast.data 0.99.6
Geneplast is designed for large-scale evolutionary plasticity and rooting analysis based on the distribution of orthologous groups (OGs) in a given species tree. This supporting package provides datasets obtained and processed from different orthology databases for use in geneplast evolutionary analyses.
Currently, data from the following sources are available: STRING, OMA, and OrthoDB.
Each dataset consists of four objects:
cogids: a data.frame containing OG identifiers.
sspids: a data.frame with species identifiers.
cogdata: a data.frame with OG to protein mapping.
phyloTree: a phylo object representing a phylogenetic tree for the species in sspids.
The general procedure for creating these objects starts by selecting only eukaryotic species from the orthology database with the aid of the NCBI taxonomy classification.
We build a graph from the taxonomy nodes and locate the root of eukaryotes. Then, we traverse this sub-graph from the root to the leaves corresponding to the taxonomy identifiers of the species in the database. By selecting the leaves of the resulting sub-graph, we obtain the sspids object.
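The traversal described above can be sketched in base R. This is a minimal illustration, not the code used to build the package datasets: the nodes table is a toy, heavily collapsed stand-in for the NCBI taxonomy, and descendants() and db_species are hypothetical names. The value 2759 is the NCBI taxonomy identifier of Eukaryota.

```r
# Toy taxonomy: one row per node, with the identifier of its parent node.
# The parent links are simplified for illustration (many ranks are skipped).
nodes <- data.frame(
  id     = c(1, 131567, 2759, 2, 33208, 4751, 9606, 10090, 4932, 562),
  parent = c(1, 1, 131567, 131567, 2759, 2759, 33208, 33208, 4751, 2)
)

# Breadth-first traversal: collect the root and every node below it.
descendants <- function(nodes, root) {
  found <- c()
  frontier <- root
  while (length(frontier) > 0) {
    children <- nodes$id[nodes$parent %in% frontier &
                           !(nodes$id %in% c(frontier, found))]
    found <- c(found, frontier)
    frontier <- children
  }
  found
}

# Hypothetical species list of an orthology database (562 is a bacterium)
db_species <- c(9606, 10090, 4932, 562)

# Keep only the database species that sit below the eukaryote root
eukaryote_nodes <- descendants(nodes, root = 2759)
selected_species <- intersect(db_species, eukaryote_nodes)
print(selected_species)
```

In this toy case the bacterial identifier is dropped and the three eukaryotic species remain, which is the role the sspids object plays in the real datasets.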
Once the species of interest are selected, the orthology information of the corresponding proteins is filtered to obtain the cogdata object.
The cogids object consists of the unique OG identifiers from cogdata.
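The two filtering steps above can be illustrated with toy data frames. The column names protein_id and cog_id appear later in this document; ssp_id, the orthologs table, and all values are assumptions made for this sketch, which may differ from how the package datasets were actually assembled.

```r
# Toy orthology table: one row per protein, with its species and OG
orthologs <- data.frame(
  protein_id = c("9606.P1", "10090.P2", "562.P3", "9606.P4"),
  ssp_id     = c("9606", "10090", "562", "9606"),
  cog_id     = c("OG1", "OG1", "OG1", "OG2"),
  stringsAsFactors = FALSE
)

# Toy species selection (eukaryotes only)
sspids_toy <- data.frame(ssp_id = c("9606", "10090"), stringsAsFactors = FALSE)

# cogdata: orthology rows restricted to the selected species
cogdata_toy <- orthologs[orthologs$ssp_id %in% sspids_toy$ssp_id, ]

# cogids: unique OG identifiers found in cogdata
cogids_toy <- data.frame(cog_id = unique(cogdata_toy$cog_id),
                         stringsAsFactors = FALSE)

print(cogdata_toy)
print(cogids_toy)
```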
Finally, the phyloTree object is built from the full TimeTree phylogenetic tree of eukaryotes, which is pruned to keep only our species of interest. Species missing from TimeTree are placed using two strategies: matching genera, and the closest species inferred from the previously built NCBI taxonomy tree.
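The pruning step can be sketched with the ape package, which provides the phylo class. The tree below is a toy Newick string with made-up branch lengths, not the TimeTree data, and the gap-filling strategies are not implemented here; the sketch only identifies which species would need them.

```r
library(ape)

# Toy tree in Newick format (hypothetical topology and branch lengths)
full_tree <- read.tree(
  text = "(((Homo_sapiens:1,Mus_musculus:1):1,Danio_rerio:2):1,Saccharomyces_cerevisiae:3);"
)

# Hypothetical species of interest
species_of_interest <- c("Homo_sapiens", "Mus_musculus", "Saccharomyces_cerevisiae")

# Species requested but absent from the tree would need the gap-filling step
missing_species <- setdiff(species_of_interest, full_tree$tip.label)

# Keep only the tips that correspond to species of interest
pruned_tree <- keep.tip(full_tree,
                        intersect(species_of_interest, full_tree$tip.label))
print(pruned_tree$tip.label)
```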
1 - Create a new AnnotationHub connection and query for all geneplast resources.
library('AnnotationHub')
# create an AnnotationHub connection
ah <- AnnotationHub()
# search for all geneplast resources
meta <- query(ah, "geneplast")
head(meta)
2 - Load the objects into the session using the ID of the chosen dataset.
# load the objects from STRING database v11.0
load(meta[["AH83116"]])
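Because every dataset uses the same four object names, loading a second dataset into the same workspace overwrites the first. One way around this, sketched below with base R only, is to load each file into its own environment. The temporary file and the toy objects are stand-ins so the example runs without a network connection; string_env and rdata_path are hypothetical names.

```r
# Toy stand-ins for two of the objects stored in a dataset file
cogdata <- data.frame(protein_id = "9606.P1", ssp_id = "9606", cog_id = "OG1")
cogids <- data.frame(cog_id = "OG1")

# Write them to a temporary .RData file, then remove them from the workspace
rdata_path <- tempfile(fileext = ".RData")
save(cogdata, cogids, file = rdata_path)
rm(cogdata, cogids)

# Load into a dedicated environment instead of the global workspace;
# load() returns the names of the objects it restored
string_env <- new.env()
loaded_names <- load(rdata_path, envir = string_env)

print(loaded_names)
str(string_env$cogdata)
```

Since the envir argument belongs to base load(), the same pattern should apply to the call shown above, for example load(meta[["AH83116"]], envir = string_env), although that variant is not tested here.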
This section reproduces a case study using annotated datasets from STRING, OMA, and OrthoDB.
The following steps show how to run geneplast rooting analysis and transfer its results to a graph model. For detailed step-by-step instructions, please check the geneplast vignette.
1 - Create an object of class ‘OGR’ for a reference ‘spid’.
library(geneplast)
ogr <- groot.preprocess(cogdata=cogdata, phyloTree=phyloTree, spid="9606")
2 - Run the groot function and infer the evolutionary roots.
Note: this step can take a long time because of the large number of OGs in the input data (the nPermutations argument is set to 100 for demonstration purposes only).
ogr <- groot(ogr, nPermutations=100, verbose=TRUE)
1 - Load a PPI network and the required packages. The igraph object called ‘ppi.gs’ provides PPI information for apoptosis and genome-stability genes [@Castro2008].
library(RedeR)
library(igraph)
library(RColorBrewer)
data(ppi.gs)
2 - Map rooting information onto the igraph object.
g <- ogr2igraph(ogr, cogdata, ppi.gs, idkey = "ENTREZ")
3 - Adjust colors for rooting information.
pal <- brewer.pal(9, "RdYlBu")
color_col <- colorRampPalette(pal)(37) #set a color for each root!
g <- att.setv(g=g, from="Root", to="nodeColor", cols=color_col, na.col = "grey80", breaks = seq(1,37))
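Conceptually, the colour step assigns one colour per root and a fallback colour to genes without a root. The base R sketch below illustrates that idea with hypothetical root values and a three-colour ramp; it is not RedeR's implementation of att.setv, and the variable names are made up for the example.

```r
# Hypothetical root assignments for five genes; NA means no root was inferred
root <- c(1, 5, 37, NA, 20)
n_roots <- 37

# Interpolate a three-colour ramp to one colour per root (grDevices only)
root_palette <- grDevices::colorRampPalette(
  c("#D73027", "#FFFFBF", "#4575B4")
)(n_roots)

# Roots are integers from 1 to 37, so each root indexes its own colour
node_color <- root_palette[root]

# Genes without a root receive the fallback colour
node_color[is.na(root)] <- "grey80"

print(node_color)
```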
4 - Adjust the aesthetics of some graph attributes.
g <- att.setv(g = g, from = "SYMBOL", to = "nodeAlias")
E(g)$edgeColor <- "grey80"
V(g)$nodeLineColor <- "grey80"
5 - Send the igraph object to the RedeR interface.
rdp <- RedPort()
calld(rdp)
resetd(rdp)
addGraph(rdp, g)
addLegend.color(rdp, colvec=g$legNodeColor$scale, size=15, labvec=g$legNodeColor$legend, title="Roots represented in Fig1")
6 - Get apoptosis and genome-stability sub-networks.
g1 <- induced_subgraph(g=g, V(g)$name[V(g)$Apoptosis==1])
g2 <- induced_subgraph(g=g, V(g)$name[V(g)$GenomeStability==1])
7 - Group apoptosis and genome-stability genes into containers.
myTheme <- list(nestFontSize=25, zoom=80, isNest=TRUE, gscale=65, theme=2)
addGraph(rdp, g1, gcoord=c(25, 50), theme = c(myTheme, nestAlias="Apoptosis"))
addGraph(rdp, g2, gcoord=c(75, 50), theme = c(myTheme, nestAlias="Genome Stability"))
relax(rdp, p1=50, p2=50, p3=50, p4=50, p5= 50, ps = TRUE)
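# load the objects from the OMA dataset
load(meta[["AH83117"]])
# readr provides read_delim(); dplyr provides %>% and left_join()
library(readr)
library(dplyr)
# prefix the OG identifiers with the database name
cogdata$cog_id <- paste0("OMA", cogdata$cog_id)
cogids$cog_id <- paste0("OMA", cogids$cog_id)
# read the human protein-to-Entrez mapping from the working directory
human_entrez_2_oma_Aug2020 <- read_delim("processed_human.entrez_2_OMA.Aug2020.tsv",
    delim = "\t", escape_double = FALSE,
    col_names = FALSE, trim_ws = TRUE)
names(human_entrez_2_oma_Aug2020) <- c("protein_id", "gene_id")
# add Entrez gene identifiers to cogdata
cogdata <- cogdata %>% left_join(human_entrez_2_oma_Aug2020)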
ogr <- groot.preprocess(cogdata=cogdata, phyloTree=phyloTree, spid="9606")
ogr <- groot(ogr, nPermutations=100, verbose=TRUE)
g <- ogr2igraph(ogr, cogdata, ppi.gs, idkey = "ENTREZ")
pal <- brewer.pal(9, "RdYlBu")
color_col <- colorRampPalette(pal)(37) #set a color for each root!
g <- att.setv(g=g, from="Root", to="nodeColor", cols=color_col, na.col = "grey80", breaks = seq(1,37))
g <- att.setv(g = g, from = "SYMBOL", to = "nodeAlias")
E(g)$edgeColor <- "grey80"
V(g)$nodeLineColor <- "grey80"
# rdp <- RedPort()
# calld(rdp)
resetd(rdp)
addGraph(rdp, g)
addLegend.color(rdp, colvec=g$legNodeColor$scale, size=15, labvec=g$legNodeColor$legend, title="Roots represented in Fig2")
g1 <- induced_subgraph(g=g, V(g)$name[V(g)$Apoptosis==1])
g2 <- induced_subgraph(g=g, V(g)$name[V(g)$GenomeStability==1])
myTheme <- list(nestFontSize=25, zoom=80, isNest=TRUE, gscale=65, theme=2)
addGraph(rdp, g1, gcoord=c(25, 50), theme = c(myTheme, nestAlias="Apoptosis"))
addGraph(rdp, g2, gcoord=c(75, 50), theme = c(myTheme, nestAlias="Genome Stability"))
relax(rdp, p1=50, p2=50, p3=50, p4=50, p5= 50, ps = TRUE)
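The identifier-mapping step in the OMA workflow above attaches Entrez gene identifiers to cogdata with a left join, so proteins without a match end up with a missing gene_id. The sketch below uses base merge() in place of dplyr's left_join() and toy data in place of the real mapping file, to show how the mapped fraction could be checked before running the rooting analysis. All values and the names ending in _toy are hypothetical.

```r
# Toy cogdata: three proteins in two OGs
cogdata_toy <- data.frame(
  protein_id = c("P1", "P2", "P3"),
  cog_id     = c("OMA1", "OMA1", "OMA2"),
  stringsAsFactors = FALSE
)

# Toy protein-to-Entrez mapping; P2 has no entry
mapping_toy <- data.frame(
  protein_id = c("P1", "P3"),
  gene_id    = c("7157", "672"),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every cogdata row, like a left join
joined <- merge(cogdata_toy, mapping_toy, by = "protein_id", all.x = TRUE)

# Fraction of proteins that received an Entrez identifier
mapped_fraction <- mean(!is.na(joined$gene_id))
print(joined)
print(mapped_fraction)
```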
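# load the objects from the OrthoDB dataset
load(meta[["AH83118"]])
# readr provides read_delim(); dplyr provides %>% and left_join()
library(readr)
library(dplyr)
# prefix the OG identifiers with the database name
cogdata$cog_id <- paste0("ODB", cogdata$cog_id)
cogids$cog_id <- paste0("ODB", cogids$cog_id)
# read the human protein-to-Entrez mapping from the working directory
human_entrez_2_odb <- read_delim("odb10v1_genes-human-entrez.tsv",
    delim = "\t", escape_double = FALSE,
    col_names = FALSE, trim_ws = TRUE)
names(human_entrez_2_odb) <- c("protein_id", "gene_id")
# add Entrez gene identifiers to cogdata
cogdata <- cogdata %>% left_join(human_entrez_2_odb)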
ogr <- groot.preprocess(cogdata=cogdata, phyloTree=phyloTree, spid="9606")
ogr <- groot(ogr, nPermutations=100, verbose=TRUE)
g <- ogr2igraph(ogr, cogdata, ppi.gs, idkey = "ENTREZ")
pal <- brewer.pal(9, "RdYlBu")
color_col <- colorRampPalette(pal)(37) #set a color for each root!
g <- att.setv(g=g, from="Root", to="nodeColor", cols=color_col, na.col = "grey80", breaks = seq(1,37))
g <- att.setv(g = g, from = "SYMBOL", to = "nodeAlias")
E(g)$edgeColor <- "grey80"
V(g)$nodeLineColor <- "grey80"
rdp <- RedPort()
calld(rdp)
resetd(rdp)
addGraph(rdp, g)
addLegend.color(rdp, colvec=g$legNodeColor$scale, size=15, labvec=g$legNodeColor$legend, title="Roots represented in Fig3")
g1 <- induced_subgraph(g=g, V(g)$name[V(g)$Apoptosis==1])
g2 <- induced_subgraph(g=g, V(g)$name[V(g)$GenomeStability==1])
myTheme <- list(nestFontSize=25, zoom=80, isNest=TRUE, gscale=65, theme=2)
addGraph(rdp, g1, gcoord=c(25, 50), theme = c(myTheme, nestAlias="Apoptosis"))
addGraph(rdp, g2, gcoord=c(75, 50), theme = c(myTheme, nestAlias="Genome Stability"))
relax(rdp, p1=50, p2=50, p3=50, p4=50, p5= 50, ps = TRUE)
sessionInfo()
#> R Under development (unstable) (2022-03-17 r81925)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.15-bioc/R/lib/libRblas.so
#> LAPACK: /home/biocbuild/bbs-3.15-bioc/R/lib/libRlapack.so
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] BiocStyle_2.23.1
#>
#> loaded via a namespace (and not attached):
#> [1] bookdown_0.25 digest_0.6.29 R6_2.5.1
#> [4] jsonlite_1.8.0 magrittr_2.0.3 evaluate_0.15
#> [7] stringi_1.7.6 rlang_1.0.2 cli_3.2.0
#> [10] jquerylib_0.1.4 bslib_0.3.1 rmarkdown_2.13
#> [13] tools_4.2.0 stringr_1.4.0 xfun_0.30
#> [16] yaml_2.3.5 fastmap_1.1.0 compiler_4.2.0
#> [19] BiocManager_1.30.16 htmltools_0.5.2 knitr_1.38
#> [22] sass_0.4.1