Optimizing liquid chromatography coupled to mass spectrometry (LC–MS) methods presents a significant challenge. The ‘rawDiag’ package (Trachsel et al. 2018), available as rawDiag on Bioconductor, streamlines method optimization by generating MS operator-specific diagnostic plots based on scan-level metadata. Tailored for use on the R shell or as a shiny application on the Orbitrap instrument PC, ‘rawDiag’ leverages rawrr (Kockmann and Panse 2021) for reading vendor proprietary instrument data. Developed, rigorously tested, and actively employed at the Functional Genomics Center Zurich ETHZ | UZH, ‘rawDiag’ stands as a robust solution in advancing LC–MS Orbitrap method optimization.
rawDiag 1.3.0
BiocStyle::markdown()
knitr::opts_chunk$set(fig.wide = TRUE, fig.retina = 3, error=FALSE, eval=TRUE)
knitr::include_graphics("octopussy.png")
Over the past two decades, liquid chromatography coupled to mass spectrometry (LC–MS) has evolved into the method of choice in the field of proteomics (Cox and Mann 2011; Mallick and Kuster 2010). During a typical LC–MS measurement, a complex mixture of analytes is separated by a liquid chromatography system coupled to a mass spectrometer (MS) through an ion source interface. This interface converts the analytes that elute from the chromatography system over time into a beam of ions. From this ion beam, the MS records a series of mass spectra containing detailed information on the analyzed sample (Savaryn, Toby, and Kelleher 2016). The resulting raw data consist of the mass spectra and their metadata, typically recorded in a vendor-specific binary format. During a measurement, the mass spectrometer applies internal heuristics, which enable the instrument to adapt to sample properties, for example, sample complexity and amount of ions, in near real time. Still, method parameters controlling these heuristics need to be set prior to the measurement. Optimal measurement results require a careful balancing of instrument parameters, but their complex interactions with each other make LC–MS method optimization a challenging task.
Here we present rawDiag, a platform-independent software tool implemented in the R language (Becker, Chambers, and Wilks 1988) that supports LC–MS operators during the process of empirical method optimization. Our work builds on the ideas of the discontinued software rawMeat (VAST Scientific). Our application is currently tailored toward spectral data acquired on Thermo Fisher Scientific instruments (raw format), with a particular focus on Orbitrap (Zubarev and Makarov 2013) mass analyzers (Exactive or Fusion instruments). These instruments are heavily used in the field of bottom-up proteomics (Aebersold and Mann 2003) to analyze complex peptide mixtures derived from enzymatic digests of proteomes.
rawDiag is meant to run after MS acquisition, optimally as an interactive R shiny application, and produces a series of diagnostic plots visualizing the impact of method parameter choices on the acquired data across injections. If static reports are required, PDF files can be generated using rmarkdown. In this vignette, we present the usage of our tool.
rawDiag gains advantages from being part of the Bioconductor ecosystem, such as its ability to utilize the rawrr package and potentially extend its functionality through interaction with the Spectra infrastructure, particularly with the MsBackendRawFileReader.
rawDiag provides a wrapper function, readRaw, which uses the rawrr methods rawrr::readIndex, rawrr::readTrailer, and rawrr::readChromatogram to read proprietary mass spectrometer generated data by invoking third-party managed methods through a system2 text connection.
The rawrr package provides the entire stack below, which rawDiag utilizes.
R> |
---|
text connection |
system2 |
Mono Runtime |
Managed Assembly (CIL/.NET code): rawrr.exe |
ThermoFisher.CommonCore.*.dll |
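The upper layers of this stack can be illustrated without rawrr installed. The following is a minimal sketch of the general mechanism, not rawrr's actual implementation: it starts an external process with system2, captures the printed lines, and parses them through a text connection. The Rscript child process and the semicolon-separated columns `scan` and `rt` are stand-ins chosen purely for illustration.

```r
# Stand-in for an external executable such as rawrr.exe: a second R process
# that prints a small semicolon-separated table to standard output.
rscript <- file.path(R.home("bin"), "Rscript")
expr <- "cat('scan;rt\\n1;0.5\\n2;1.1\\n')"

# system2() runs the command; stdout = TRUE returns the printed lines
# as a character vector, one element per line.
out <- system2(rscript, args = c("--vanilla", "-e", shQuote(expr)),
               stdout = TRUE)

# A text connection lets read.table() parse those lines like a file.
con <- textConnection(out)
scans <- read.table(con, header = TRUE, sep = ";")
close(con)

scans
```

The same pattern applies to any command-line tool that writes tabular text: only the command and the parsing step change.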
In case you prefer to compile rawrr.exe from C# source code, please install the mono compiler and xbuild by installing the following Linux packages:
sudo apt-get install mono-mcs mono-xbuild
Otherwise, to execute the precompiled code, the following Linux packages are sufficient:
sudo apt-get install mono-runtime libmono-system-data4.0-cil -y
To check the Mono installation, query its version. The output should look like:
if (Sys.info()["sysname"] %in% c("Darwin", "Linux")) {
  system2("mono", args = '--version', stdout = TRUE)
}
## [1] "Mono JIT compiler version 6.8.0.105 (Debian 6.8.0.105+dfsg-3.6ubuntu2 Sun Mar 31 02:55:28 UTC 2024)"
## [2] "Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com"
## [3] "\tTLS: __thread"
## [4] "\tSIGSEGV: altstack"
## [5] "\tNotifications: epoll"
## [6] "\tArchitecture: amd64"
## [7] "\tDisabled: none"
## [8] "\tMisc: softdebug "
## [9] "\tInterpreter: yes"
## [10] "\tLLVM: supported, not enabled."
## [11] "\tSuspend: hybrid"
## [12] "\tGC: sgen (concurrent by default)"
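Before calling any rawrr function, it can be convenient to test whether the runtime is present at all. The sketch below uses only base R; `monoAvailable` is a hypothetical helper name, and it assumes, as the platform check above suggests, that Mono is only required on Linux and macOS.

```r
# Hypothetical helper: TRUE if no Mono runtime is needed on this platform
# (assumption: anything other than Linux/macOS) or if 'mono' is on the PATH.
monoAvailable <- function() {
  if (!Sys.info()[["sysname"]] %in% c("Darwin", "Linux")) {
    return(TRUE)
  }
  # Sys.which() returns "" when the executable cannot be found.
  nzchar(Sys.which("mono")[[1]])
}

if (!monoAvailable()) {
  message("mono was not found on the PATH; please install the Mono runtime.")
}
```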
rawrr.exe will run out of the box.
If the native C# compiler is not available install mono from:
To install this package, start R (version “>=4.4”) and enter:
if (!require("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("rawrr")
The rawrr assemblies, also known as Common Intermediate Language bytecode, can be downloaded and installed on all platforms using the command:
rawDiag::checkRawrr
## function ()
## {
## if (isFALSE(requireNamespace("BiocManager", quietly = TRUE)))
## stop("exec", "install.packages('BiocManager')")
## if (isFALSE(requireNamespace("rawrr", quietly = TRUE)))
## stop("exec", "BiocManager::install('rawrr')")
## if (isFALSE(rawrr:::.checkRawFileReaderDLLs()))
## rawrr::installRawFileReaderDLLs()
## if (isFALSE(file.exists(rawrr:::.rawrrAssembly())))
## rawrr::installRawrrExe()
## if (isFALSE(rawrr:::.isAssemblyWorking()))
## stop("rawrr assembly is not working. check if mono if available.")
## TRUE
## }
## <bytecode: 0x5ce7356092d8>
## <environment: namespace:rawDiag>
rawDiag::checkRawrr()
## [1] TRUE
if (isFALSE(rawrr::.checkDllInMonoPath())){
rawrr::installRawFileReaderDLLs()
}
## removing DLL files in directory '/home/biocbuild/.cache/R/rawrr/rawrrassembly'
## ThermoFisher.CommonCore.BackgroundSubtraction.dll
## 0
## ThermoFisher.CommonCore.Data.dll
## 0
## ThermoFisher.CommonCore.MassPrecisionEstimator.dll
## 0
## ThermoFisher.CommonCore.RawFileReader.dll
## 0
rawrr::installRawrrExe()
## MD5 96e3a4cc1b7caaf92890d85ed4c72f77 /home/biocbuild/.cache/R/rawrr/rawrrassembly/rawrr.exe
## [1] 0
rawrr::rawrrAssemblyPath()
## [1] "/home/biocbuild/.cache/R/rawrr/rawrrassembly"
rawrr::rawrrAssemblyPath() |> list.files()
## [1] "ThermoFisher.CommonCore.BackgroundSubtraction.dll"
## [2] "ThermoFisher.CommonCore.Data.dll"
## [3] "ThermoFisher.CommonCore.MassPrecisionEstimator.dll"
## [4] "ThermoFisher.CommonCore.RawFileReader.dll"
## [5] "rawrr.exe"
For more information, please read the INSTALL file in the rawrr package.
Fetch example Orbitrap raw files from ExperimentHub’s tartare package.
library(ExperimentHub)
## Loading required package: BiocGenerics
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
## as.data.frame, basename, cbind, colnames, dirname, do.call,
## duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
## lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
## pmin.int, rank, rbind, rownames, sapply, saveRDS, setdiff, table,
## tapply, union, unique, unsplit, which.max, which.min
## Loading required package: AnnotationHub
## Loading required package: BiocFileCache
## Loading required package: dbplyr
ExperimentHub::ExperimentHub() -> eh
normalizePath(eh[["EH3222"]]) -> EH3222
## see ?tartare and browseVignettes('tartare') for documentation
## loading from cache
normalizePath(eh[["EH4547"]]) -> EH4547
## see ?tartare and browseVignettes('tartare') for documentation
## loading from cache
(rawfileEH3222 <- paste0(EH3222, ".raw"))
## [1] "/home/biocbuild/.cache/R/ExperimentHub/18273831991816_3238.raw"
if (!file.exists(rawfileEH3222)){
file.copy(EH3222, rawfileEH3222)
}
(rawfileEH4547 <- paste0(EH4547, ".raw"))
## [1] "/home/biocbuild/.cache/R/ExperimentHub/1826043b0ae2f1_4590.raw"
if (!file.exists(rawfileEH4547)){
file.copy(EH4547, rawfileEH4547)
}
c(rawfileEH3222, rawfileEH4547) -> rawfile
Of note, the proprietary .NET assemblies (Shofstahl 2018) require a file extension of .raw. Therefore, we copy the ExperimentHub files to new file names carrying the .raw suffix.
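The copy step can be wrapped in a small function so that it is applied uniformly to any number of cached files. This is a sketch only; `withRawSuffix` is a hypothetical helper name that is not part of rawDiag or rawrr, and the temporary file in the usage lines is a placeholder, not instrument data.

```r
# Hypothetical helper: return a path ending in '.raw', copying the file
# to '<path>.raw' first if the suffix is missing.
withRawSuffix <- function(path) {
  if (grepl("\\.raw$", path)) {
    return(path)
  }
  target <- paste0(path, ".raw")
  if (!file.exists(target)) {
    ok <- file.copy(path, target)
    if (!ok) stop("could not copy ", path, " to ", target)
  }
  target
}

# Usage with a temporary stand-in file.
tmp <- tempfile()
writeLines("placeholder", tmp)
rawpath <- withRawSuffix(tmp)
file.exists(rawpath)
```

Calling the helper on a path that already ends in .raw returns it unchanged, so it is safe to apply repeatedly.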
List the metadata of the raw files.
(rawfile |>
lapply(FUN = rawrr::readFileHeader) -> rawFileHeader)
## [[1]]
## [[1]]$`RAW file`
## [1] "18273831991816_3238.raw"
##
## [[1]]$`RAW file version`
## [1] "66"
##
## [[1]]$`Creation date`
## [1] "7/16/2019 5:56:24 PM"
##
## [[1]]$Operator
## [1] "lumos"
##
## [[1]]$`Number of instruments`
## [1] 2
##
## [[1]]$Description
## [1] ""
##
## [[1]]$`Instrument model`
## [1] "Orbitrap Fusion Lumos"
##
## [[1]]$`Instrument name`
## [1] "Orbitrap Fusion Lumos"
##
## [[1]]$`Instrument method`
## [1] "C:/Xcalibur/methods/p3181/methMelt_GT20_RT35.meth"
##
## [[1]]$`Serial number`
## [1] "FSN20583"
##
## [[1]]$`Software version`
## [1] "3.1.2412.25"
##
## [[1]]$`Firmware version`
## [1] ""
##
## [[1]]$Units
## [1] "None"
##
## [[1]]$`Mass resolution`
## [1] "0.500"
##
## [[1]]$`Number of scans`
## [1] 8742
##
## [[1]]$`Number of ms2 scans`
## [1] 2346
##
## [[1]]$`Scan range`
## [1] 1 8742
##
## [[1]]$`Time range`
## [1] 0.00 35.01
##
## [[1]]$`Mass range`
## [1] 91 2000
##
## [[1]]$`Scan filter (first scan)`
## [1] "FTMS + c ESI Full ms [350.0000-2000.0000]"
##
## [[1]]$`Scan filter (last scan)`
## [1] "FTMS + c ESI Full ms [350.0000-2000.0000]"
##
## [[1]]$`Total number of filters`
## [1] "2205"
##
## [[1]]$`Sample name`
## [1] ""
##
## [[1]]$`Sample id`
## [1] "1:A,5"
##
## [[1]]$`Sample type`
## [1] "Unknown"
##
## [[1]]$`Sample comment`
## [1] ""
##
## [[1]]$`Sample vial`
## [1] "1:B,1"
##
## [[1]]$`Sample volume`
## [1] "0"
##
## [[1]]$`Sample injection volume`
## [1] "2"
##
## [[1]]$`Sample row number`
## [1] "0"
##
## [[1]]$`Sample dilution factor`
## [1] "1"
##
## [[1]]$`Sample barcode`
## [1] ""
##
## [[1]]$`User text 0`
## [1] ""
##
## [[1]]$`User text 1`
## [1] ""
##
## [[1]]$`User text 2`
## [1] ""
##
## [[1]]$`User text 3`
## [1] ""
##
## [[1]]$`User text 4`
## [1] ""
##
##
## [[2]]
## [[2]]$`RAW file`
## [1] "1826043b0ae2f1_4590.raw"
##
## [[2]]$`RAW file version`
## [1] "66"
##
## [[2]]$`Creation date`
## [1] "11/16/2018 9:58:53 AM"
##
## [[2]]$Operator
## [1] "QexactiveHF"
##
## [[2]]$`Number of instruments`
## [1] 2
##
## [[2]]$Description
## [1] ""
##
## [[2]]$`Instrument model`
## [1] "Q Exactive HF Orbitrap"
##
## [[2]]$`Instrument name`
## [1] "Q Exactive HF Orbitrap"
##
## [[2]]$`Instrument method`
## [1] "C:/Xcalibur/methods/__QCloud/current_method/forward trap elute/autoQC01_TRAP_GT20min_RT35min.meth"
##
## [[2]]$`Serial number`
## [1] "Exactive Series slot #2496"
##
## [[2]]$`Software version`
## [1] "2.9-290204/2.9.2.2947"
##
## [[2]]$`Firmware version`
## [1] "rev. 1"
##
## [[2]]$Units
## [1] "None"
##
## [[2]]$`Mass resolution`
## [1] "0.500"
##
## [[2]]$`Number of scans`
## [1] 21881
##
## [[2]]$`Number of ms2 scans`
## [1] 20885
##
## [[2]]$`Scan range`
## [1] 1 21881
##
## [[2]]$`Time range`
## [1] 0 35
##
## [[2]]$`Mass range`
## [1] 100 1805
##
## [[2]]$`Scan filter (first scan)`
## [1] "FTMS + c NSI Full ms [350.0000-1800.0000]"
##
## [[2]]$`Scan filter (last scan)`
## [1] "FTMS + c NSI Full ms2 582.3190@hcd27.00 [100.0000-1205.0000]"
##
## [[2]]$`Total number of filters`
## [1] "22"
##
## [[2]]$`Sample name`
## [1] "autoQC01"
##
## [[2]]$`Sample id`
## [1] "NA"
##
## [[2]]$`Sample type`
## [1] "Unknown"
##
## [[2]]$`Sample comment`
## [1] ""
##
## [[2]]$`Sample vial`
## [1] "1:F,8"
##
## [[2]]$`Sample volume`
## [1] "0"
##
## [[2]]$`Sample injection volume`
## [1] "2"
##
## [[2]]$`Sample row number`
## [1] "0"
##
## [[2]]$`Sample dilution factor`
## [1] "0"
##
## [[2]]$`Sample barcode`
## [1] ""
##
## [[2]]$`User text 0`
## [1] "1000"
##
## [[2]]$`User text 1`
## [1] ""
##
## [[2]]$`User text 2`
## [1] "FGCZ"
##
## [[2]]$`User text 3`
## [1] ""
##
## [[2]]$`User text 4`
## [1] ""
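The nested list printed above is easier to compare across files once it is condensed into a table. The following sketch rebuilds two small header lists from values shown in the output so that it runs on its own; with the real data, the `rawFileHeader` object would be used in place of `headers`. The column names `file`, `model`, `scans`, `ms2`, and `ms2Fraction` are chosen here for illustration.

```r
# A few header fields copied from the output above.
headers <- list(
  list(`RAW file` = "18273831991816_3238.raw",
       `Instrument model` = "Orbitrap Fusion Lumos",
       `Number of scans` = 8742, `Number of ms2 scans` = 2346),
  list(`RAW file` = "1826043b0ae2f1_4590.raw",
       `Instrument model` = "Q Exactive HF Orbitrap",
       `Number of scans` = 21881, `Number of ms2 scans` = 20885)
)

# Build one row per file from the selected fields.
headerTable <- do.call(rbind, lapply(headers, function(h) {
  data.frame(file = h[["RAW file"]],
             model = h[["Instrument model"]],
             scans = h[["Number of scans"]],
             ms2 = h[["Number of ms2 scans"]])
}))

# Fraction of MS2 scans per file.
headerTable$ms2Fraction <- headerTable$ms2 / headerTable$scans
headerTable
```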
readRaw - read Orbitrap raw file
Read the two instrument raw files by using the rawDiag package.
rawfile |>
lapply(FUN = rawDiag::readRaw) |>
Reduce(f = rbind) -> x
## reading index for 18273831991816_3238.raw...
## determining ElapsedScanTimesec ...
## reading trailer AGC ...
## reading trailer Orbitrap Resolution ...
## reading trailer Ion Injection Time (ms) ...
## reading TIC ...
## reading BasePeakIntensity ...
## reading took 8.295 seconds.
## reading index for 1826043b0ae2f1_4590.raw...
## determining ElapsedScanTimesec ...
## reading trailer LM m/z-Correction (ppm) ...
## reading trailer AGC ...
## reading trailer AGC PS Mode ...
## reading trailer FT Resolution ...
## reading trailer Ion Injection Time (ms) ...
## reading TIC ...
## reading BasePeakIntensity ...
## reading took 21.974 seconds.
#BiocParallel::bplapply(FUN = rawDiag::readRaw) |>
This package provides several plot functions tailored toward MS data. The following list shows all available plot methods.
library(rawDiag)
ls("package:rawDiag") |>
grep(pattern = '^plot', value = TRUE) -> pm
pm |>
knitr::kable(col.names = "package:rawDiag plot functions")
package:rawDiag plot functions |
---|
plotChargeState |
plotCycleLoad |
plotCycleTime |
plotInjectionTime |
plotLockMassCorrection |
plotMassDistribution |
plotMzDistribution |
plotPrecursorHeatmap |
plotScanTime |
plotTicBasepeak |
An inherent problem of visualizing data is the fact that depending on the data at hand, specific visualizations lose their usefulness, e.g., overplotting in a scatter plot if too many data points are present. To address this problem, we implemented most of the plot functions in different versions inspired by the work of Cleveland (1993), Sarkar (2008), and Wickham (2009). The data can be displayed in trellis plot manner using the faceting functionality of ggplot2. Alternatively, overplotting using color coding or violin plots based on descriptive statistics values can be chosen, which allows the user to interactively change the appearance of the plots based on the situation at hand. For instance, a large number of files are best visualized by violin plots, giving the user an idea about the distribution of the data points. On the basis of this, a smaller subset of files can be selected and visualized with another technique.
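The three display styles can be sketched directly with ggplot2. The example below is not taken from the rawDiag plot functions; it uses synthetic data with made-up column names (`rawfile`, `time`, `value`) and only shows how faceting, color coding, and violin plots differ for the same data frame. The plotting part is skipped when ggplot2 is not installed.

```r
# Synthetic stand-in data: two files with random values over a 35 min run.
set.seed(1)
d <- data.frame(
  rawfile = rep(c("fileA.raw", "fileB.raw"), each = 200),
  time = rep(seq(0, 35, length.out = 200), times = 2),
  value = c(rexp(200, rate = 1 / 20), rexp(200, rate = 1 / 40))
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  base <- ggplot2::ggplot(d)

  # Trellis: one panel per file via faceting.
  pTrellis <- base +
    ggplot2::geom_point(ggplot2::aes(x = time, y = value), size = 0.5) +
    ggplot2::facet_wrap(~ rawfile)

  # Overlay: a single panel, files distinguished by color.
  pOverlay <- base +
    ggplot2::geom_point(ggplot2::aes(x = time, y = value, colour = rawfile),
                        size = 0.5)

  # Violin: the distribution of values per file.
  pViolin <- base +
    ggplot2::geom_violin(ggplot2::aes(x = rawfile, y = value))
}
```

Printing `pTrellis`, `pOverlay`, or `pViolin` renders the corresponding figure. With many files, the violin variant stays readable where the other two become crowded.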
The code snippet below applies all plot methods on the example data.
# Call every plot function by name with the 'trellis' display method.
pm |>
  lapply(FUN = function(plotFUN) {
    lapply(c('trellis'), function(method) {
      message("plotting ", plotFUN, " using method ", method, " ...")
      do.call(plotFUN, list(x, method))
    })
  })
## plotting plotChargeState using method trellis ...
## plotting plotCycleLoad using method trellis ...
## plotting plotCycleTime using method trellis ...
## plotting plotInjectionTime using method trellis ...
## plotting plotLockMassCorrection using method trellis ...
## plotting plotMassDistribution using method trellis ...
## plotting plotMzDistribution using method trellis ...
## plotting plotPrecursorHeatmap using method trellis ...
## plotting plotScanTime using method trellis ...
## plotting plotTicBasepeak using method trellis ...
## [[1]]
## [[1]][[1]]
##
##
## [[2]]
## [[2]][[1]]
## `geom_smooth()` using formula = 'y ~ x'
##
##
## [[3]]
## [[3]][[1]]
##
##
## [[4]]
## [[4]][[1]]
##
##
## [[5]]
## [[5]][[1]]
## Warning: Removed 6164 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 6164 rows containing missing values or values outside the scale range
## (`geom_line()`).
##
##
## [[6]]
## [[6]][[1]]
##
##
## [[7]]
## [[7]][[1]]
##
##
## [[8]]
## [[8]][[1]]
##
##
## [[9]]
## [[9]][[1]]
##
##
## [[10]]
## [[10]][[1]]
The appearance of each plot depends on the instrument, sample, and method used to acquire the data. Therefore, it is hard to say what each ideal plot should look like. In particular, in the example above, we use data generated on an Orbitrap Fusion Lumos (18273831991816_3238.raw) and a Q Exactive HF Orbitrap (1826043b0ae2f1_4590.raw) instrument, acquired using data-independent acquisition (DIA) (Bruderer et al. 2017) and data-dependent acquisition (DDA) methods. For more information on the plot methods and their application, please read the package man pages and the application examples in the manuscript (Trachsel et al. 2018).
The package provides a simple interactive shiny-based graphical user interface for exploring Thermo Fisher Scientific raw data.
If you have a directory containing raw files, you can create a shiny application as follows:
rawfile |>
dirname() |>
rawDiag::buildRawDiagShinyApp() -> app
The shiny runApp function launches the app in your browser.
shiny::runApp(app)
By default, the application lets you choose the raw files in the provided directory and provides the visualizations of the raw data as output. The user can interactively change the plot functions and arguments provided by the rawDiag package.
Additionally, the application provides PDF generation and download buttons. Optionally, the height and width can be changed in the user interface.
Of note, the rawDiag::rawDiagServer module can be integrated into an existing shinydashboard application, e.g., https://shiny-ms.fgcz.uzh.ch/fgczmsqc-dashboard/.
Consider all raw files in your working directory, e.g., ~/Downloads, and load them.
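# Work in the directory that holds the raw files, e.g., ~/Downloads.
file.path(Sys.getenv("HOME"), "Downloads") |>
  setwd()

# Select files ending in '.raw', read each one, and row-bind the results.
list.files() |>
  grep(pattern = '\\.raw$', value = TRUE) |>
  lapply(FUN = rawDiag::readRaw) |>
  Reduce(f = rbind) -> x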
As an alternative to lapply, you can utilize the bplapply function of the BiocParallel package.
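A minimal sketch of that substitution follows. To keep it runnable without instrument files, `readOne` is a hypothetical stand-in for rawDiag::readRaw and the file names are placeholders. The sketch passes `SerialParam()` so that it behaves identically everywhere; choosing a parallel back-end such as `MulticoreParam()` is an assumption left to the reader's platform.

```r
# Hypothetical stand-in for rawDiag::readRaw(): returns a one-row data frame.
readOne <- function(f) {
  data.frame(rawfile = basename(f), nameLength = nchar(basename(f)))
}

files <- c("a.raw", "b.raw", "c.raw")  # placeholder file names

# Use bplapply() when BiocParallel is installed; otherwise fall back to lapply().
if (requireNamespace("BiocParallel", quietly = TRUE)) {
  # SerialParam() runs sequentially; swap in a parallel back-end as needed.
  param <- BiocParallel::SerialParam()
  res <- BiocParallel::bplapply(files, FUN = readOne, BPPARAM = param)
} else {
  res <- lapply(files, FUN = readOne)
}

# Row-bind the per-file results, as in the lapply version.
combined <- Reduce(f = rbind, res)
combined
```

Because bplapply returns a list in input order, the subsequent `Reduce(f = rbind)` step stays unchanged.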
sessionInfo()
## R Under development (unstable) (2024-10-21 r87258)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] rawDiag_1.3.0 tartare_1.19.0 ExperimentHub_2.15.0
## [4] AnnotationHub_3.15.0 BiocFileCache_2.15.0 dbplyr_2.5.0
## [7] BiocGenerics_0.53.0 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.1 dplyr_1.1.4 farver_2.1.2
## [4] blob_1.2.4 filelock_1.0.3 Biostrings_2.75.0
## [7] fastmap_1.2.0 promises_1.3.0 digest_0.6.37
## [10] mime_0.12 lifecycle_1.0.4 KEGGREST_1.47.0
## [13] RSQLite_2.3.7 magrittr_2.0.3 compiler_4.5.0
## [16] rlang_1.1.4 sass_0.4.9 tools_4.5.0
## [19] utf8_1.2.4 yaml_2.3.10 knitr_1.48
## [22] labeling_0.4.3 bit_4.5.0 curl_5.2.3
## [25] plyr_1.8.9 BiocParallel_1.41.0 withr_3.0.2
## [28] purrr_1.0.2 grid_4.5.0 stats4_4.5.0
## [31] fansi_1.0.6 xtable_1.8-4 colorspace_2.1-1
## [34] ggplot2_3.5.1 scales_1.3.0 tinytex_0.53
## [37] cli_3.6.3 rmarkdown_2.28 crayon_1.5.3
## [40] generics_0.1.3 httr_1.4.7 reshape2_1.4.4
## [43] rawrr_1.15.0 DBI_1.2.3 cachem_1.1.0
## [46] stringr_1.5.1 zlibbioc_1.53.0 splines_4.5.0
## [49] parallel_4.5.0 AnnotationDbi_1.69.0 BiocManager_1.30.25
## [52] XVector_0.47.0 vctrs_0.6.5 Matrix_1.7-1
## [55] jsonlite_1.8.9 bookdown_0.41 IRanges_2.41.0
## [58] S4Vectors_0.45.0 bit64_4.5.2 magick_2.8.5
## [61] fontawesome_0.5.2 jquerylib_0.1.4 hexbin_1.28.4
## [64] glue_1.8.0 codetools_0.2-20 stringi_1.8.4
## [67] gtable_0.3.6 BiocVersion_3.21.1 later_1.3.2
## [70] GenomeInfoDb_1.43.0 UCSC.utils_1.3.0 munsell_0.5.1
## [73] tibble_3.2.1 pillar_1.9.0 rappdirs_0.3.3
## [76] htmltools_0.5.8.1 GenomeInfoDbData_1.2.13 R6_2.5.1
## [79] evaluate_1.0.1 shiny_1.9.1 lattice_0.22-6
## [82] Biobase_2.67.0 highr_0.11 png_0.1-8
## [85] memoise_2.0.1 httpuv_1.6.15 bslib_0.8.0
## [88] Rcpp_1.0.13 nlme_3.1-166 mgcv_1.9-1
## [91] xfun_0.48 pkgconfig_2.0.3