AnnotationForge 1.49.0
This document describes how to create a Bioconductor probe package from the reporter sequence information of a particular chip. Probe packages are a convenient way for distributing and storing the probe sequences and related information.
First, let us load the AnnotationForge package.
library("AnnotationForge")
In this section we see how a probe package can be created for Affymetrix
genechips from the tabulator-separated sequence files that can be obtained from
the vendor (at https://www.thermofisher.com/us/en/home/support.html). As an
example, the file HG-U95Av2_probe_tab.gz
is provided in the extdata
subdirectory of the AnnotationForge package.
filename <- system.file("extdata", "HG-U95Av2_probe_tab.gz",
package="AnnotationForge")
outdir <- tempdir()
me <- "Wolfgang Huber <w.huber@dkfz.de>"
species <- "Homo_sapiens"
makeProbePackage("HG-U95Av2",
datafile = gzfile(filename, open="r"),
outdir = outdir,
maintainer = me,
species = species,
version = "0.0.1")
## Importing the data.
## Creating package in /tmp/RtmpjinqI8/hgu95av2probe
## Writing the data.
## Checking the package.
## *** WARNINGS ***
## * checking data for ASCII and uncompressed saves ... WARNING Status: 1 WARNING, 1 NOTEBuilding the package.
## [1] "hgu95av2probe"
dir(outdir)
## [1] "BiocStyle" "hgu95av2probe"
To deal with different file formats and additional types of probe annotation
data from public or in-house databases, the function makeProbePackage
offers a
great deal of flexibility. The user can specify her own import function through
the importfun
argument. By default, its value is getProbeDataAffy
, a
function that reads tabular Affymetrix genechip sequence files. Import functions
for other types of arrays can be adapted from this prototype.
The help pages and R code contained in the produced packages are derived from a
template directory that obeys the usual R package conventions
(1999). The input parameters of an import function are A prototype
for such a directory is provided within the package
AnnotationForge. To facilitate the automated production of large
numbers of similar packages, we provide a text substitution mechanism similar to
the one used in the GNU configure
system.
a character string naming the array type
the input data files
a character string with the directory name of a package template directory,
containing at least a file DESCRIPTION
, a directory man
with a file
package.Rd
, a directory data
, and possible other directories and files,
conforming to the usual R package conventions.
… any sort of further parameters, as necessary.
The output of an import function is a named list with elements
pkgname
: a character string with the package name.
dataEnv
: an environment containing an arbitrary number of data objects;
these make the core of the package. Among these, there should be a data frame
whose name is the value of pkgname
with a column sequence
. This data frame
may have other columns such as the \(x\)- and \(y\)-position of the probes on the
array, identifiers linking a probe to genomic databases, or its length and
relative position within the gene it represents. The objects in the environment
will be saved as individual .rda
files of the same name into the data
directory of the produced package. Documentation needs to be provided for the
columns of the data frame, as well as for the other objects.
symVals
: a named list, containing the symbol-value substitutions that are
used in the text processing. It must at least contain the elements
`ARRAYTYPE`: name of the array
DATASOURCE
: a textual description of how the data were
obtained. It should contain the URL, or the name of the company
/ person.
For more details, please refer to the help files for the functions
makeProbePackage
and getProbeDataAffy
. For an example, refer to the source
code of getProbeDataAffy
.