This R package implements inferential methods to compare gene lists (proving equivalence) in terms of their biological significance as expressed in the Gene Ontology GO. The compared gene lists are characterized by cross-tabulation frequency tables of enriched GO terms. Dissimilarity between gene lists is evaluated using the Sorensen-Dice dissimilarity. The fundamental guiding principle is that two gene lists are taken as similar if they share a significant proportion of common enriched GO terms.
This inferential method, is developed and explained in the paper An equivalence test between features lists, based on the Sorensen-Dice index and the joint frequencies of GO term enrichment, available online in https://rdcu.be/cOISz
goSorensen package has to be installed with a working R version (>=4.3.3). Installation could take a few minutes on a regular desktop or laptop. Package can be installed from Bioconductor as follows:
if (!requireNamespace("goSorensen", quietly = TRUE)) {
BiocManager::install("goSorensen")
}
library(goSorensen)
Before using goSorensen
, the user must have adequate
knowledge of the species they intend to focus their analysis. The
genomic annotation packages available in Bioconductor provide all the
essential information about these species. For instance, if the research
involves mice, the user must install and activate the annotation package
org.Mm.eg.db
in the following manner:
if (!requireNamespace("org.Mm.eg.db", quietly = TRUE)) {
BiocManager::install("org.Mm.eg.db")
}
Other species may include:
org.Hs.eg.db
: Genome wide annotation for Humans.org.At.tair.db
: Genome wide annotation for
Arabidopsisorg.Ag.eg.db
: Genome wide annotation for Anophelesorg.Bt.eg.db
: Genome wide annotation for Bovineorg.Ce.eg.db
: Genome wide annotation for Wormorg.Cf.eg.db
: Genome wide annotation for Canineorg.Dm.eg.db
: Genome wide annotation for Flyorg.EcSakai.eg.db
: Genome wide annotation for E coli
strain Sakaiorg.EcK12.eg.db
: Genome wide annotation for E coli
strain K12org.Dr.eg.db
: Genome wide annotation for Zebrafishorg.Gg.eg.db
: Genome wide annotation for Chickenorg.Mm.eg.db
: Genome wide annotation for Mouseorg.Mmu.eg.db
: Genome wide annotation for RhesusDue to the extensive research conducted on the human species and the
examples documented in goSorensen about this species, the installation
of the goSorensen package automatically includes the annotation package
org.Hs.eg.db
as a dependency. Therefore, there is no need
to install it previously. The user must install the appropriate package
to use genomic annotation for other species.
The goSorensen functions require genome annotation via the
orgPackg
argument to annotate genes in GO terms for
enrichment analysis. For example, the enrichment contingency table can
be computed using the buildEnrichTable
function.
buildEnrichTable(..., orgPackg = "org.Mm.eg.db")
goSorensen
functions, such as
buidEnrichTable
, that entail identifying annotations in a
GO term, which are essential for determining enrichment, require the
user to provide, in the argument geneUniverse
, a vector
containing the identifiers of the universe of genes (wide genome) of the
species being analysed. These gene identifiers can be readily obtained
from the above-mentioned genomic annotation packages through the
function keys
. For instance, one can acquire the ENTREZ
identifiers for the universe of genes in mice as follows:
# Load the package:
library(org.Mm.eg.db)
# Obtain the IDs:
mouseEntrezIDs <- keys(org.Mm.eg.db, keytype = "ENTREZID")
Or, for the human species, as follows:
# Load the package:
library(org.Hs.eg.db)
# Obtain the IDs:
humanEntrezIDs <- keys(org.Hs.eg.db, keytype = "ENTREZID")
And in this way, for any other species
Then, one can use this object in the geneUniverse
argument in whatever goSorensen function (e.g.,
buidEnrichTable
) is needed:
buildEnrichTable(..., orgPackg = "org.Mm.eg.db", geneUniverse = mouseEntrezIDs)
Contributions are welcome, if you wish to contribute or give ideas to
improve the package, please you can contact with maintainer (Pablo
Flores) to the address p_flores@espoch.edu.ec
, and we can
discuss your suggestion.
Flores, P., Salicru, M., Sanchez-Pla, A., & Ocana, J. (2022). An equivalence test between features lists, based on the Sorensen-Dice index and the joint frequencies of GO term enrichment. BMC bioinformatics, 23(1), 1-21.