OmnipathR 3.5.25
1 Institute for Computational Biomedicine, Heidelberg University
The integration between NicheNet and OmniPath is a fresh development, as of May 2021 we have already tested the basic functionality. To use the features described here you will need the latest git version of OmnipathR, which we update often as we test and improve the elements of the pipeline. Bug reports, questions and suggestions are welcome, please open an issue on github.
The OmnipathR package contains methods to build NichNet[1] format networks from OmniPath[2] data and from the resources used in the original NicheNet study. With a few exceptions, almost all resources used originally by NicheNet are available by OmnipathR both in their raw format or after minimal preprocessing, and in the format suitable for NicheNet. However, the networks built by OmnipathR are not completely identical with the ones used by NicheNet due to the complexity of processing such a large amount of data. The advantage of building these networks with OmnipathR is the transparent and reproducible process, as the origin of any data record can be easily traced back to its original source. The data used in the NicheNet publication is deposited at Zenodo. The other part of the prior knowledge used by NicheNet is a collection of expression data from ligand perturbation experiments. In OmnipathR this is taken directly from the Zenodo repository.
Apart from building the prior knowledge, OmnipathR provides a number of glue methods which connect elements of the NicheNet workflow, ultimately making it possible to run everything with a single function call, starting from the processed transcriptomics data and going until the predicted ligand activities. To implement such a workflow we followed the analysis from the case study in the OmniPath paper and the vignettes from NicheNet. The workflow consists of thin wrappers around some of the NicheNet methods. Parameters for these methods can be provided from the top level call.
The transcriptomics data is the main input for the pipeline which is specific for the study. The transcriptomics data can be processed by standard tools, such as DESeq[3] for bulk and Seurat[4] for single-cell transcriptomics. For the latter more guidance is available in the NicheNet vignettes. The pipeline presented here only requires the list of genes expressed in the transmitter and receiver cells (in case of autocrine signaling these are the same); and a list with genes of interest. Genes of interest might come from the investigation subject or gene set enrichment analysis (see an example here).
The nichenet_main
function executes all the workflow elements, hence it can
be used to run the complete workflow. Here we show it as an example, and in
the next sections we take a closer look at the individual steps. The first
four arguments comes from the study data and objective, as discussed above.
All parameters of the workflow can be overridden here, for example, for the
mlrbo_optimization
we override the ncores
argument, and exclude CPDB
from the signaling network and set the confidence threshold for EVEX.
library(OmnipathR)
library(nichenetr)
nichenet_results <- nichenet_main(
expressed_genes_transmitter = expressed_genes_transmitter,
expressed_genes_receiver = expressed_genes_receiver,
genes_of_interest = genes_of_interest,
background_genes = background_genes,
signaling_network = list(
cpdb = NULL,
evex = list(min_confidence = 1.0)
),
gr_network = list(only_omnipath = TRUE),
n_top_ligands = 20,
mlrmbo_optimization_param = list(ncores = 4)
)
Without transcriptomics data, it is possible to build a NicheNet model which
later can be used for the ligand activity prediction. Hence if the first four
arguments above are NULL
the pipeline will run only the model building.
Still, in this tutorial we wrap all blocks into eval = FALSE
because the
model optimization is computationally intensive, it might take hours to run.
To create the NicheNet model it’s enough to call nichenet_main
without
arguments, all further arguments are optional override of package defaults.
nichenet_model <- nichenet_main()
Before running the pipeline it’s necessary to install nichenetr
and its
dependencies. The nichenetr
package is available only from github, while
the dependencies are distributed in standard repositories like CRAN.
require(devtools)
install_github('saeyslab/nichenetr')
This procedure should install the dependencies as well. Some more specific
dependencies are mlrMBO
, parallelMap
, ParamHelpers
and smoof
.
The nichenetr
package is suggested by OmnipathR
but not loaded and
attached by default. This might cause issues, for example NicheNet requires
access to some of its lazy loaded external data. To address this, you can
either attach nichenetr
:
library(nichenetr)
Or call a function which loads the datasets into the global environment, so NicheNet can access them:
nichenet_workarounds()
With a fresh setup, as a first step it is recommended to test the pipeline,
to make sure it is able to run in your particular environment, all packages
are installed and loaded properly. The optimization process takes hours to
run, while the test included in OmnipathR
takes only around 10 minutes.
This way you can avoid some errors to come up after running the analysis
already for many hours. Although it’s never guaranteed that you won’t have
errors in the pipeline as the outcome depends on the input data and certain
conditions are not handled in nichenetr
, mlr
and mlrMBO
. If you get
an error during the optimization process, start it over a few times to see
if it’s consistent. Sometimes it helps also to restart your R session. I
would recommend to try it around 10 times before suspecting a bug. See more
details in the docs of nichenet_test
.
The nichenet_test
function builds a tiny network and runs the pipeline
with a highly reduced number of parameters and iteration steps in the
optimization. Also it inputs dummy gene sets as genes of interest and
background genes.
require(OmnipathR)
nichenet_workarounds()
nichenet_test()
NicheNet requires three types of interactions: signaling, ligand-receptor (lr)
and gene regulatory (gr). All these are collected from various resources. In
OmnipathR a top level function, nichenet_networks
manages the build of all
the three networks. It returns a list with three data frames:
networks <- nichenet_networks()
Its only_omnipath
argument is a shortcut to build all networks by using
only OmniPath data:
networks <- nichenet_networks(only_omnipath = TRUE)
To the network building functions of OmniPath many further arguments can
be provided, at the end these are passed to
import_post_translational_interactions
, import_intercell_network
and
import_transcriptional_interactions
, so we recommend to read the manual of
these functions. As an example, one can restrict the signaling network to
the resources SIGNOR and PhosphoSite, while for ligand-receptor use only
ligands and enzymes on the transmitter side and receptors and transporters
on the receiver side, while for gene regulatory network, use only the A
confidence level of DoRothEA:
networks <- nichenet_networks(
only_omnipath = TRUE,
signaling_network = list(
omnipath = list(
resources = c('SIGNOR', 'PhosphoSite')
)
),
lr_network = list(
omnipath = list(
transmitter_param = list(parent = c('ligand', 'secreted_enzyme')),
receiver_param = list(parent = c('receptor', 'transporter'))
)
),
gr_network = list(
omnipath = list(
resources = 'DoRothEA',
dorothea_levels = 'A'
)
)
)
The function nichenet_signaling_network
builds the signaling network from
all resources with the default settings. Each argument of this function is a
list of arguments for a single resource. If an argument is set to NULL
, the
resource will be omitted. In the example below we set custom confidence score
thresholds for ConsensusPathDB and EVEX, while use no data from Pathway
Commons. In this case, the rest of the resources will be loaded with their
default parameters:
signaling_network <- nichenet_signaling_network(
cpdb = list(
complex_max_size = 1,
min_score = .98
),
evex = list(
min_confidence = 2
),
pathwaycommons = NULL
)
Currently the following signaling network resources are available by
OmnipathR: OmniPath, Pathway Commons, Harmonizome (PhosphoSite, KEA and
DEPOD), Vinayagam et al. 2011, ConsensusPathDB, EVEX and InWeb InBioMap.
Let’s take a closer look on one of the resource specific methods. The name
of these functions all follow the same scheme, e.g. for EVEX it is:
nichenet_signaling_network_evex
. From most of the resources we just load
and format the interactions, but a few of them accepts parameters, such as
confidence score thresholds or switches to include or exclude certain kind
of records:
evex_signaling <- nichenet_signaling_network_evex(top_confidence = .9)
The network format of NicheNet is very minimalistic, it contains only the
interacting partners and the resource. To investigate the data more in depth,
we recommend to look at the original data. The methods retrieving data from
resources all end by _download
for foreign resources and start by import_
for OmniPath. For example, to access the EVEX data as it is provided by the
resource:
evex <- evex_download(remove_negatives = FALSE)
These networks work exactly the same way as the signaling network: they are
built by the nichenet_lr_network
and nichenet_gr_network
functions.
Currently the following ligand-receptor network resources are available in
OmnipathR: OmniPath, Ramilowski et al. 2015, Guide to Pharmacology
(IUPHAR/BPS); and in the gene regulatory network: OmniPath, Harmonizome,
RegNetwork, HTRIdb, ReMap, EVEX, PathwayCommons and TRRUST.
The ligand-receptor interactions from OmniPath are special: these cover
broader categories than only ligands and receptors. It includes also
secreted enzymes, transporters, etc. Also the import_intercell_network
function, responsible for creating the OmniPath LR network, by default
returns the most complete network. However, this contains a considerable
number of interactions which might be false positives in the context of
intercellular communication. It is highly recommended to apply quality
filtering on this network, using the quality_filter_param
argument of
the functions in the NicheNet pipeline. The default settings already
ensure a certain level of filtering, read more about the options in the
docs of filter_intercell_network
. Passing the high_confidence = TRUE
argument by lr_network = list(omnipath = list(high_confidence = TRUE))
results a quite stringent filtering, it’s better to find your preferred
options and set them by quality_filter_param
. An example how to pass
quality options from the top of the pipeline:
nichenet_results <- nichenet_main(
quality_filter_param = list(
min_curation_effort = 1,
consensus_percentile = 30
)
)
Initially for testing purposes we included an option to create a small network (few thousands interactions instead of few millions), using only a high quality subset of OmniPath data. It might be interesting to try the analysis with this lower coverage but higher confidence network:
nichenet_results <- nichenet_main(small = TRUE)
The small network is not customizable, and it uses the high_confidence
option in import_intercell_network
. You can build similar smaller networks
by using the general options instead. Another option tiny
generates an
even smaller network, however this involves a random subsetting, the outcome
is not deterministic, this option is only for testing purposes.
The expression data from the collection of over a hundred of ligand
perturbation experiments is available in the data deposited by NicheNet
authors in Zenodo, in the
expression_settings.rds
file. The function below downloads and imports
this data:
expression <- nichenet_expression_data()
If there is any ligand which is missing from the ligand-receptor network, probably we want to remove it:
lr_network <- nichenet_lr_network()
expression <- nichenet_remove_orphan_ligands(
expression = expression,
lr_network = lr_network
)
In this step we optimize the parameters of the model and assign weights to
the network resources, especially based on the ligand perturbation data. The
nichenet_optimization
function is a wrapper around
nichenetr::mlrmbl_optimization
. The arguments for this function, and the
objective function can be overriden, for example it’s a good idea to set
the cores according to our machine’s CPU count:
optimization_results <- nichenet_optimization(
networks = networks,
expression = expression,
mlrmbo_optimization_param = list(ncores = 4)
)
This function takes very long, even hours to run. Before it returns the
results, it saves them into an RDS file. This and the further RDS files are
saved into the nichenet_results
directory by default, it can be changed
by the option omnipath.nichenet_results_dir
.
options(omnipath.nichenet_results_dir = 'my/nichenet/dir')
nichenet_results_dir()
# [1] "my/nichenet/dir"
The next major part is to build a model which connects ligands to targets,
i.e. which genes are affected in their expression by which of the ligands.
Again, we use a wrapper around NicheNet functions: the nichenet_build_model
calls nichenetr::onstruct_weighted_networks
and
nichenetr::apply_hub_corrections
. If the argument weights
is FALSE
,
the optimized resource weights are not used, all resources considered with
the same weight. The results are saved automatically into an RDS.
nichenet_model <- nichenet_build_model(
optimization_results = optimization_results,
networks = networks,
)
Finally, we produce a ligand-target matrix, a sparse matrix with weighted
connections from ligands to targets. The function
nichenet_ligand_target_matrix
is a wrapper around
nichenetr::construct_ligand_target_matrix
. As usual the results are
exported to an RDS. This is the last step of the NicheNet model building
process.
lt_matrix <- nichenet_ligand_target_matrix(
nichenet_model$weighted_networks,
networks$lr_network,
nichenet_model$optimized_parameters
)
In this step we ues the NicheNet model to infer ligand activities from our transcriptomics data. The last three arguments are all character vectors with gene symbols. For genes of interest, here we select 50 random genes from the targets in the ligand-target matrix. Also just for testing purposes, we consider all genes in the network to be expressed in both the transmitter and receiver cells.
genes_of_interest <- sample(rownames(ligand_target_matrix), 50)
background_genes <- setdiff(
rownames(ligand_target_matrix),
genes_of_interest
)
expressed_genes_transmitter <- union(
unlist(purrr::map(networks, 'from')),
unlist(purrr::map(networks, 'to'))
)
expressed_genes_receiver <- expressed_genes_transmitter
ligand_activities <- nichenet_ligand_activities(
ligand_target_matrix = lt_matrix,
lr_network = networks$lr_network,
expressed_genes_transmitter = expressed_genes_transmitter,
expressed_genes_receiver = expressed_genes_receiver,
genes_of_interest = genes_of_interest
)
Once the ligand activities inferred, we can obtain a list of top ligand-target links. It’s up to us how many of the top ranking ligands, and for each of these ligands, how many of their top targets we include in this table:
lt_links <- nichenet_ligand_target_links(
ligand_activities = ligand_activities,
ligand_target_matrix = lt_matrix,
genes_of_interest = genes_of_interest,
n_top_ligands = 20,
n_top_targets = 100
)
The NicheNet workflow in OmnipathR is implemented until this point. The
results can be visualized by the methods included in nichenetr
, can be
analysed for enrichment of functional properties, or the signaling network
connecting the ligands to targets can be reconstructed and analysed,
as it’s shown in the OmniPath
case study.
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_GB
## [4] LC_COLLATE=C LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] magrittr_2.0.3 ggraph_2.1.0 igraph_1.3.5 ggplot2_3.3.6 dplyr_1.0.10 OmnipathR_3.5.25
## [7] BiocStyle_2.25.0
##
## loaded via a namespace (and not attached):
## [1] viridis_0.6.2 httr_1.4.4 sass_0.4.2 tidyr_1.2.1 tidygraph_1.2.2
## [6] bit64_4.0.5 vroom_1.6.0 jsonlite_1.8.3 viridisLite_0.4.1 bslib_0.4.0
## [11] assertthat_0.2.1 highr_0.9 BiocManager_1.30.18 selectr_0.4-2 cellranger_1.1.0
## [16] yaml_2.3.6 progress_1.2.2 ggrepel_0.9.1 pillar_1.8.1 backports_1.4.1
## [21] glue_1.6.2 digest_0.6.30 polyclip_1.10-4 checkmate_2.1.0 rvest_1.0.3
## [26] colorspace_2.0-3 htmltools_0.5.3 pkgconfig_2.0.3 logger_0.2.2 magick_2.7.3
## [31] bookdown_0.29 purrr_0.3.5 scales_1.2.1 tweenr_2.0.2 later_1.3.0
## [36] tzdb_0.3.0 ggforce_0.4.1 tibble_3.1.8 generics_0.1.3 farver_2.1.1
## [41] ellipsis_0.3.2 cachem_1.0.6 withr_2.5.0 cli_3.4.1 crayon_1.5.2
## [46] readxl_1.4.1 evaluate_0.17 fansi_1.0.3 MASS_7.3-58.1 xml2_1.3.3
## [51] tools_4.2.1 prettyunits_1.1.1 hms_1.1.2 lifecycle_1.0.3 stringr_1.4.1
## [56] munsell_0.5.0 compiler_4.2.1 jquerylib_0.1.4 rlang_1.0.6 grid_4.2.1
## [61] rappdirs_0.3.3 labeling_0.4.2 rmarkdown_2.17 gtable_0.3.1 DBI_1.1.3
## [66] curl_4.3.3 graphlayouts_0.8.3 R6_2.5.1 gridExtra_2.3 knitr_1.40
## [71] fastmap_1.1.0 bit_4.0.4 utf8_1.2.2 readr_2.1.3 stringi_1.7.8
## [76] parallel_4.2.1 Rcpp_1.0.9 vctrs_0.5.0 tidyselect_1.2.0 xfun_0.34
[1] R Browaeys, W Saelens and Y Saeys (2020) NicheNet: modeling intercellular communication by linking ligands to target genes. Nat Methods 17, 159–162
[2] D Turei, A Valdeolivas, L Gul, N Palacio-Escat, M Klein, O Ivanova, M Olbei, A Gabor, F Theis, D Modos, T Korcsmaros and J Saez-Rodriguez (2021) Integrated intra- and intercellular signaling knowledge for multicellular omics analysis. Molecular Systems Biology 17:e9923
[3] S Anders, W Huber (2010) Differential expression analysis for sequence count data. Genome Biol 11, R106
[4] Y Hao, S Hao, E Andersen-Nissen, WM Mauck, S Zheng, A Butler, MJ Lee, AJ Wilk, C Darby, M Zagar, P Hoffman, M Stoeckius, E Papalexi, EP Mimitou, J Jain, A Srivastava, T Stuart, LB Fleming, B Yeung, AJ Rogers, JM McElrath, CA Blish, R Gottardo, P Smibert and R Satija (2020) Integrated analysis of multimodal single-cell data. bioRxiv 2020.10.12.335331