systemPipeR 2.11.7
This is the Generic workflow template for building new workflows. It is provided
by systemPipeRdata, a companion package to systemPipeR (H Backman and Girke 2016).
Similar to other systemPipeR workflow templates, a single command generates the
necessary working environment. This includes the expected directory structure
for executing systemPipeR workflows and parameter files for running
command-line (CL) software utilized in specific analysis steps.
In-depth information can be found in the main vignette of systemPipeRdata.
The Generic template presented here is special in that it provides a workflow
skeleton intended to be used as a starting point for building new workflows.
Basic workflow steps are included to illustrate how to design command-line (CL)
and R-based workflow steps, as well as R Markdown code chunks that are not part
of a workflow. For more comprehensive information on designing and executing
workflows, users may want to refer to the main vignettes of systemPipeR and
systemPipeRdata.
Details about constructing workflow steps are explained in the Detailed Tutorial
section of systemPipeR's main vignette, which uses the same workflow steps as
the Generic workflow template.
The Rmd file (new.Rmd) associated with this vignette serves a dual purpose.
It acts both as a template for executing the workflow and as a template for
generating a reproducible scientific analysis report. Thus, users will want to
customize the text (and/or code) of this or other systemPipeR workflow vignettes
to describe their experimental design and analysis results. This typically
involves deleting the instructions on how to work with this workflow, and
customizing the text describing experimental designs, other metadata and
analysis results.
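The Generic workflow template includes the following four data processing steps:
1. R step: export the iris data set as one tabular (CSV) file per species.
2. CL step: compress the exported files with gzip.
3. CL step: uncompress the compressed files with gunzip.
4. R step: import the files back into R, compute summary statistics and plot
   them as a bar diagram.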
The topology graph of this workflow template is shown in Figure 1.
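The dependencies that define this topology can be sketched in base R from the
step_name and dependency values used in the appendStep calls of this template.
The deps list and the topo_order helper are hypothetical illustrations (a
Kahn-style topological ordering), not systemPipeR's internal implementation.

```r
# Hypothetical sketch: each workflow step mapped to the step(s) it depends on,
# mirroring the step_name/dependency arguments used in this template
deps <- list(
  load_library = character(0),
  export_iris  = "load_library",
  gzip         = "export_iris",
  gunzip       = "gzip",
  stats        = "gunzip",
  sessionInfo  = "stats"
)

# Order steps so that every step comes after all of its dependencies
topo_order <- function(deps) {
  done <- character(0)
  remaining <- deps
  while (length(remaining) > 0) {
    # steps whose dependencies have all been scheduled already
    ready <- names(remaining)[vapply(remaining,
                                     function(d) all(d %in% done),
                                     logical(1))]
    if (length(ready) == 0) stop("cyclic dependency between workflow steps")
    done <- c(done, ready)
    remaining <- remaining[setdiff(names(remaining), ready)]
  }
  done
}

topo_order(deps)
```

For this template the graph is a simple chain, so the order matches the order in
which the steps are appended; a cycle (e.g. two steps depending on each other)
makes the helper stop with an error.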
The environment of the chosen workflow is generated with the genWorkenvir
function. After this, the user’s R session needs to be directed into the
resulting directory (here new).
systemPipeRdata::genWorkenvir(workflow = "new", mydirname = "new")
setwd("new")
The SPRproject function initializes a new workflow project instance. This
function call creates an empty SAL workflow container and, at the same time, a
linked project log directory (default name .SPRproject) that acts as a flat-file
database of a workflow. For additional details, please visit this section in
systemPipeR's main vignette.
library(systemPipeR)
sal <- SPRproject()
sal
This section illustrates how to load the following workflow steps into a SAL
workflow container (SYSargsList), either one-by-one in interactive mode
(see here) or automatically with the importWF command (see here), and then
run the workflow with the runWF command.
First, the systemPipeR package needs to be loaded within the workflow.
appendStep(sal) <- LineWise(code = {
library(systemPipeR)
}, step_name = "load_library")
After adding the R code, sal now contains one workflow step.
sal
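This is the first data processing step. In this case it is an R step that uses
the LineWise function to define the workflow step and appends it to the SAL
workflow container. The step splits the iris data set by species and writes each
subset to a CSV file in the results directory.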
appendStep(sal) <- LineWise(code = {
# write one CSV file per species into the "results" directory
iris_by_species <- split(iris, factor(iris$Species))
mapply(FUN = function(x, y) write.csv(x, y), x = iris_by_species,
y = file.path("results", paste0(names(iris_by_species), ".csv")))
}, step_name = "export_iris", dependency = "load_library")
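To see what this step writes without a SAL container, its core logic can be run
on its own in base R. This sketch assumes a temporary directory stands in for
the workflow's results directory; out_dir, iris_by_species and csv_files are
illustrative names only.

```r
# Stand-alone sketch of the "export_iris" step, writing into a temporary
# directory instead of the workflow's "results" directory (assumption)
out_dir <- file.path(tempdir(), "results")
dir.create(out_dir, showWarnings = FALSE)

# Split iris into one data frame per species (setosa, versicolor, virginica)
iris_by_species <- split(iris, factor(iris$Species))

# Write one CSV file per species; write.csv() also stores the row names as the
# first column, which is why the later "stats" step drops column 1
csv_files <- file.path(out_dir, paste0(names(iris_by_species), ".csv"))
invisible(mapply(function(x, y) write.csv(x, y), iris_by_species, csv_files))

basename(csv_files)
```

Each file holds the 50 rows of one species plus the stored row-name column.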
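The following adds a CL step that uses the gzip software to compress the files
generated in the previous step. The input files (FileName column) and sample
names (SampleName column) are taken from the targets file targets_gunzip.txt
that ships with systemPipeR.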
targetspath <- system.file("extdata/cwl/gunzip", "targets_gunzip.txt",
package = "systemPipeR")
appendStep(sal) <- SYSargsList(targets = targetspath, dir = TRUE,
wf_file = "gunzip/workflow_gzip.cwl", input_file = "gunzip/gzip.yml",
dir_path = "param/cwl", inputvars = c(FileName = "_FILE_PATH_",
SampleName = "_SampleName_"), step_name = "gzip", dependency = "export_iris")
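The gzip step itself runs the external gzip tool through the CWL definitions in
param/cwl. As a rough, hypothetical stand-in that needs no CL software, the same
kind of output (a .gz copy of a CSV file) can be produced with base R's gzfile()
connection. gzip_file is an illustrative helper, not part of systemPipeR, and
unlike the gzip tool's default behavior it keeps the original file.

```r
# Hypothetical helper: compress a file to <path>.gz with base R, as a rough
# stand-in for the gzip CL tool (the original file is kept)
gzip_file <- function(path) {
  gz_path <- paste0(path, ".gz")
  con <- gzfile(gz_path, open = "wb")
  on.exit(close(con))
  # copy the raw bytes of the input file into the gzip-compressed connection
  writeBin(readBin(path, what = "raw", n = file.size(path)), con)
  gz_path
}

# Example input: the setosa table, written as in the "export_iris" step
csv <- file.path(tempdir(), "setosa.csv")
write.csv(iris[iris$Species == "setosa", ], csv)
gz <- gzip_file(csv)
basename(gz)  # "setosa.csv.gz"
```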
Next, the output files (here compressed gz files) generated by the previous gzip
step are uncompressed in the current step with the gunzip software.
appendStep(sal) <- SYSargsList(targets = "gzip", dir = TRUE,
wf_file = "gunzip/workflow_gunzip.cwl", input_file = "gunzip/gunzip.yml",
dir_path = "param/cwl", inputvars = c(gzip_file = "_FILE_PATH_",
SampleName = "_SampleName_"), rm_targets_col = "FileName",
step_name = "gunzip", dependency = "gzip")
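Likewise, the effect of the gunzip step can be sketched in base R. The
hypothetical gunzip_file helper below streams a .gz file through gzfile() in
binary chunks and writes the decompressed bytes next to it, dropping the .gz
suffix. It only mimics the CL tool for illustration and is not what the
workflow step executes.

```r
# Hypothetical helper: decompress <name>.gz into <name> with base R, as a
# rough stand-in for the gunzip CL tool
gunzip_file <- function(gz_path, dest = sub("\\.gz$", "", gz_path)) {
  inp <- gzfile(gz_path, open = "rb")
  on.exit(close(inp))
  out <- file(dest, open = "wb")
  on.exit(close(out), add = TRUE)
  # copy decompressed bytes in 64 KiB chunks until the input is exhausted
  repeat {
    chunk <- readBin(inp, what = "raw", n = 65536L)
    if (length(chunk) == 0L) break
    writeBin(chunk, out)
  }
  dest
}

# Example: compress a species table, delete the original, then restore it
src <- file.path(tempdir(), "versicolor.csv")
write.csv(iris[iris$Species == "versicolor", ], src)
gz <- paste0(src, ".gz")
writeLines(readLines(src), gzfile(gz))
invisible(file.remove(src))
restored <- gunzip_file(gz)
```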
This R step imports the tabular files from the previous step back into R,
computes summary statistics and plots the results as a bar diagram. It is marked
as an optional step (run_step = "optional").
appendStep(sal) <- LineWise(code = {
# combine all files into one data frame ([-1] drops the row-name column)
df <- lapply(getColumn(sal, step = "gunzip", "outfiles"),
function(x) read.delim(x, sep = ",")[-1])
df <- do.call(rbind, df)
# calculate mean and sd of each of the four measurement columns
# (pooled over all species)
stats <- data.frame(cbind(mean = apply(df[, 1:4], 2, mean),
sd = apply(df[, 1:4], 2, sd)))
stats$measurement <- rownames(stats)
# plot the means as bars with +/- 1 sd error bars
bar_plot <- ggplot2::ggplot(stats, ggplot2::aes(x = measurement,
y = mean, fill = measurement)) + ggplot2::geom_bar(stat = "identity",
color = "black", position = ggplot2::position_dodge()) +
ggplot2::geom_errorbar(ggplot2::aes(ymin = mean - sd,
ymax = mean + sd), width = 0.2, position = ggplot2::position_dodge(0.9))
bar_plot
}, step_name = "stats", dependency = "gunzip", run_step = "optional")
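Because the summary in this step is computed with apply() over the first four
columns, it yields one mean and standard deviation per measurement column,
pooled over all three species. The base-R sketch below reproduces that
calculation directly from iris (skipping the file import) and, as an
alternative that is not used by the template, shows how per-species means could
be obtained with aggregate().

```r
# Stand-in for the combined data frame built from the gunzipped CSV files
df <- do.call(rbind, split(iris, iris$Species))

# Same calculation as the "stats" step: one row per measurement column,
# with mean and sd pooled over all species
stats <- data.frame(cbind(mean = apply(df[, 1:4], 2, mean),
                          sd = apply(df[, 1:4], 2, sd)))
stats$measurement <- rownames(stats)
stats

# Alternative, not part of the template: mean of each measurement per species
per_species <- aggregate(. ~ Species, data = df, FUN = mean)
per_species
```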
appendStep(sal) <- LineWise(code = {
sessionInfo()
}, step_name = "sessionInfo", dependency = "stats")
Once the above steps have been loaded into sal, the workflow can be executed
from start to finish (or partially) with the runWF command. Subsequently,
scientific and technical workflow reports can be generated with the
renderReport and renderLogs functions, respectively.
The following code section also demonstrates how the above workflow steps can be
imported with the importWF function from the associated Rmd workflow script
(here new.Rmd). Constructing workflow instances with this automated approach is
usually preferred, since it is much more convenient and reliable than the manual
approach described earlier.
Note: To demonstrate systemPipeR's automation routines without regenerating a
new workflow environment from scratch, the first line below uses the
overwrite=TRUE option of the SPRproject function. This option is generally
discouraged, as it erases the existing workflow project and sal container.
For information on resuming and restarting workflow runs, users should consult
the relevant section of the main vignette (see here).
sal <- SPRproject(overwrite = TRUE) # Avoid 'overwrite=TRUE' in real runs.
sal <- importWF(sal, file_path = "new.Rmd") # Imports above steps from new.Rmd.
sal <- runWF(sal) # Runs workflow.
plotWF(sal) # Plots workflow topology graph
sal <- renderReport(sal) # Renders scientific report.
sal <- renderLogs(sal) # Renders technical report from log files.
The listCmdTools (and listCmdModules) functions return the CL tools (and
modules) that are used by a workflow. To include a CL tool list in a workflow
report, one can use the following code. Additional details on this topic can be
found in the main vignette here.
if (file.exists(file.path(".SPRproject", "SYSargsList.yml"))) {
local({
sal <- systemPipeR::SPRproject(resume = TRUE)
systemPipeR::listCmdTools(sal)
systemPipeR::listCmdModules(sal)
})
} else {
cat(crayon::blue$bold("Tools and modules required by this workflow are:\n"))
cat(c("gzip", "gunzip"), sep = "\n")
}
## Tools and modules required by this workflow are:
## gzip
## gunzip
This is the session information that will be included when rendering this report.
sessionInfo()
## R Under development (unstable) (2024-10-21 r87258)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets
## [6] methods base
##
## other attached packages:
## [1] BiocStyle_2.33.1
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.37 R6_2.5.1
## [3] codetools_0.2-20 bookdown_0.41
## [5] fastmap_1.2.0 xfun_0.48
## [7] cachem_1.1.0 knitr_1.48
## [9] htmltools_0.5.8.1 rmarkdown_2.28
## [11] lifecycle_1.0.4 cli_3.6.3
## [13] sass_0.4.9 jquerylib_0.1.4
## [15] compiler_4.5.0 highr_0.11
## [17] tools_4.5.0 evaluate_1.0.1
## [19] bslib_0.8.0 yaml_2.3.10
## [21] formatR_1.14 BiocManager_1.30.25
## [23] crayon_1.5.3 jsonlite_1.8.9
## [25] rlang_1.1.4