Introduction
In this vignette, we explore the CCTE Bioactivity APIs.
NOTE: Please see the introductory vignette for an overview of the ccdR package and initial setup instructions, including API key storage.
Data for the Bioactivity APIs comes from ToxCast’s invitrodb.
The US EPA’s Toxicity Forecaster (ToxCast) program makes in vitro medium- and high-throughput screening assay data publicly available for the prioritization and hazard characterization of thousands of chemicals.
The ToxCast pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores ToxCast data to populate its linked MySQL database, invitrodb. The ToxCast assays make up Tiers 2 and 3 of the Computational Toxicology Blueprint and employ automated chemical screening technologies to evaluate the effects of chemical exposure on living cells and biological macromolecules, such as proteins (Thomas et al., 2019). More information on the ToxCast program can be found at https://www.epa.gov/comptox-tools/toxicity-forecasting-toxcast.
tcpl is a flexible analysis pipeline that can efficiently process and store large volumes of data. The diverse data, received in heterogeneous formats from numerous vendors, are transformed into a standard computable format and loaded into the tcpl database by vendor-specific R scripts. Once the data are loaded into the database, ToxCast uses the generalized processing functions provided in tcpl to process, normalize, model, qualify, and visualize the data.
Functions
Several ccdR functions are used to access the CCTE Bioactivity API data.
Bioactivity Assay Resource
Annotations can be retrieved either for a specific assay or for all available assays that have data.
Get annotation by aeid
get_annotation_by_aeid() retrieves the annotation for a specific assay endpoint ID (aeid).
res_dt <- get_annotation_by_aeid(AEID = "891", API_key = apikey, Server = url)
res_dt

# Optionally unnest the list columns; names_repair = "unique" gives each new column a unique name.
# Note: the gene column may hold an array of several genes rather than just one, in which case this step may not work.
#res_dt <- res_dt |> tidyr::unnest_wider(col = c("citation", "gene", "assayList"), names_repair = "unique")
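When an assay maps to several genes, a row-wise expansion may be more robust than a wide unnest, because each gene simply becomes its own row. The sketch below uses tidyr::unnest() on a small hypothetical tibble; the assumption that the gene column is a list of per-row tibbles, the geneSymbol column name, and all values are illustrative only and not real API output, so inspect your own res_dt before adapting it.

```r
library(tibble)
library(tidyr)

# Hypothetical stand-in for an annotation result: one row per assay endpoint,
# with a list column "gene" holding a tibble of one or more genes per row.
# Column names and values are illustrative, not real API output.
toy <- tibble(
  aeid = c(1L, 2L),
  gene = list(
    tibble(geneSymbol = "GENE_A"),
    tibble(geneSymbol = c("GENE_B", "GENE_C"))
  )
)

# unnest() gives each gene its own row and repeats the other columns,
# so rows with several genes do not need to fit into a fixed set of columns.
long <- unnest(toy, cols = gene)
long
```

The trade-off is that one assay can now span several rows, so any per-assay counting afterwards should group by aeid.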
Get all assay annotations
get_all_assays() retrieves annotations for all available assays. Optionally, the “citation”, “gene”, and “assayList” columns can be unnested wider so that each element has its own column.
res_dt <- get_all_assays(API_key = apikey, Server = url)
res_dt

# Optionally perform the following unnest; names_repair = "unique" gives each new column a unique name.
# Note: the gene column may hold an array of several genes rather than just one, in which case this step may not work.
#res_dt <- res_dt |> tidyr::unnest_wider(col = c("citation", "gene", "assayList"), names_repair = "unique")
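Before choosing columns for an unnest, it can help to check which columns of the result are actually list columns. The helper below, list_columns(), is a hypothetical base R sketch (not part of ccdR), demonstrated on a made-up data frame rather than real API output.

```r
# Hypothetical helper (not part of ccdR): names of the list columns in a data
# frame, i.e. the candidates for tidyr::unnest_wider() or tidyr::unnest().
list_columns <- function(df) {
  names(df)[vapply(df, is.list, logical(1))]
}

# Made-up example data: two ordinary columns plus one list column.
toy_dt <- data.frame(aeid = 1:2, assayName = c("x", "y"))
toy_dt$citation <- list("a", "b")

list_columns(toy_dt)
```

With a real result, list_columns(res_dt) shows which of “citation”, “gene”, and “assayList” are present as list columns in your output.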
Bioactivity Data Resource
Several resources are available for retrieving bioactivity data by a variety of identifier types (e.g., DTXSID, aeid).
Get summary data
get_bioactivity_summary() retrieves, for a given aeid, a summary of the number of active hits compared with the total number tested, for both multiple-concentration and single-concentration screening.
res_dt <- get_bioactivity_summary(AEID = "891", API_key = apikey, Server = url)
res_dt
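Since the summary reports active hits alongside the total tested, a natural follow-up is a hit rate (active divided by tested). The sketch below is illustrative only: hit_rate() is a hypothetical helper, the column names activeMc, totalMc, activeSc, and totalSc are assumptions about the summary output, and the counts are made up. Check names(res_dt) against your own result before adapting it.

```r
# Hypothetical helper (not part of ccdR): fraction of tested samples that were
# active, returning NA when nothing was tested instead of dividing by zero.
hit_rate <- function(active, total) {
  ifelse(total > 0, active / total, NA_real_)
}

# Illustrative stand-in for a summary result. Column names are assumptions
# about the API output and the counts are invented for this example.
summary_dt <- data.frame(aeid = 1L, activeMc = 30L, totalMc = 120L,
                         activeSc = 0L, totalSc = 0L)

summary_dt$mc_hit_rate <- hit_rate(summary_dt$activeMc, summary_dt$totalMc)
summary_dt$sc_hit_rate <- hit_rate(summary_dt$activeSc, summary_dt$totalSc)
summary_dt
```

Returning NA rather than 0 when nothing was tested keeps "not screened" distinguishable from "screened with no actives".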
Get data
get_bioactivity_details() can retrieve all available multiple-concentration data by assay endpoint ID (aeid), sample ID (spid), Level 4 ID (m4id), or chemical DTXSID. Examples for each are provided below:
By spid
res_dt <- get_bioactivity_details(SPID = "TP0001055F12", API_key = apikey, Server = paste0(url, "/data"))
res_dt
By m4id
res_dt <- get_bioactivity_details(m4id = 739695, API_key = apikey, Server = paste0(url, "/data"))
res_dt
By DTXSID
res_dt <- get_bioactivity_details(DTXSID = "DTXSID7020182", API_key = apikey, Server = paste0(url, "/data"))
res_dt
By aeid
res_dt <- get_bioactivity_details(AEID = "891", API_key = apikey, Server = paste0(url, "/data"))
res_dt
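To pull details for several chemicals at once, one option is to call get_bioactivity_details() once per DTXSID and stack the results. The helper get_details_for_dtxsids() below is a hypothetical sketch, not part of ccdR; it assumes the per-call results share the same columns so that rbind() can combine them, and its fetch argument (also an assumption of this sketch) lets a stub stand in for the API when no key is available.

```r
# Hypothetical helper (not part of ccdR): query bioactivity details for each
# DTXSID in turn and stack the results into one table. 'fetch' defaults to
# ccdR::get_bioactivity_details (evaluated lazily, only when used) and can be
# replaced by a stub for offline testing. Assumes all results share columns.
get_details_for_dtxsids <- function(dtxsids, API_key, Server,
                                    fetch = ccdR::get_bioactivity_details) {
  results <- lapply(dtxsids, function(id) {
    fetch(DTXSID = id, API_key = API_key, Server = Server)
  })
  do.call(rbind, results)
}
```

With an API key set up as in the introductory vignette, a call might look like get_details_for_dtxsids(c("DTXSID7020182"), API_key = apikey, Server = paste0(url, "/data")), adding further DTXSIDs to the vector as needed. Each identifier is a separate request, so long lists will take correspondingly longer.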
Conclusion
This vignette listed a variety of functions that access the different types of data found in the Bioactivity endpoints of the CCTE APIs. We encourage readers to explore the data accessible through these endpoints and work with it to better understand what is available. Additional endpoints and corresponding functions also exist, and we encourage users to explore them as well.