A Curated ChIP-Seq Dataset of MDI-induced Differentiated Adipocytes (3T3-L1)
In this document, we introduce the purpose of
curatedAdipoChIP
package, its contents and its potential
use cases. This package is a curated dataset of ChIP-Seq samples. The
samples are MDI-induced pre-adipocytes (3T3-L1) at different time
points/stage of differentiation. The ChIP-antibodies used in this
dataset are of transcription factors, chromatin remodelers and histone
modifications. The package document the data collection, pre-processing
and processing. In addition to the documentation the package contains
the scripts that were used to generate the data in
inst/scripts
and access to the final
RangedSummarizedExperiment
object through
ExperimentHub
.
curatedAdipoChIP
?It is an R package for documenting and distributing a curated dataset. The package doesn’t contain any R functions.
curatedAdipoChIP
?The package contains two different things:
inst/scripts
RangedSummarizedExperiment
through
ExperimentHub
.curatedAdipoChIP
for?The RangedSummarizedExperiment
object contains the
peak_counts
, colData
, rowRanges
and metadata
which can be used for the purposes of
conducting differential peak binding or gene set enrichment analysis on
the cell line model.
The curatedAdipoChIP
package can be installed from
Bioconductor using BiocManager
.
The pre-processing and processing of the data setup environment is
available as a docker
image. This image is also suitable
for reproducing this document. The docker
image can be
obtained using the docker
CLI client.
$ docker pull bcmslab/adiporeg_chip:latest
curatedAdipoChIP
The term “3T3-L1” was used to search the NCBI SRA repository. The results were sent to the run selector. 1,176 runs were viewed. The runs were faceted by Assay Type and the “chip-seq” which resulted in 739 runs. Only 235 samples from 20 different studies were included after being manually reviewed to fit the following criteria: * The raw data is available from GEO and has a GEO identifier (GSM#) * The raw data is linked to a published publicly available article * The protocols for generating the data sufficiently describe the origin of the cell line, the differentiation medium and the time points when the samples were collected. * In case the experimental designs included treatment other than the differentiation medias, the control (non-treated) samples were included.
Note: The data quality and the platform discrepancies are not included in these criteria
The scripts to download and process the raw data are located in
inst/scripts/
and are glued together to run sequentially by
the GNU make file Makefile
. The following is basically a
description of the recipes in the Makefile
with emphasis on
the software versions, options, inputs and outputs.
download_fastq
wget
(1.18)run.csv
, the URLs column*.fastq.gz
-N
make_index
bowtie2-build
(2.3.0)*.bt2
bowtie2 index for the mouse genomeget_annotation
wget
(1.18)annotation.gtf
-N
align_reads
bowtie2
(2.3.0)*.fastq.gz
and mm10/
bowtie2 index
for the mouse genome*.sam
--no-unal
peak_calling
macs2
(2.1.2)*.bam
andpeaks.bed
-B --nomodel --SPMR
get_peakset
bedtools
(2.26.0)*.bed
peakset.bed
-c 4,4 -o count_distinct,distinct
count_features
bedtools multicov
(2.26.0)*.bam
and peakset.bed
peakset.txt
fastqc
fastqc
(0.11.5)*.fastq.gz
and *.sam
*_fastqc.zip
The aim of this step is to construct a self-contained object with minimal manipulations of the pre-processed data followed by a simple exploration of the data in the next section.
peak_object
The required steps to make this object from the pre-processed data
are documented in the script and are supposed to be fully reproducible
when run through this package. The output is a
RangedSummarizedExperiment
object containing the peak
counts and the phenotype and features data and metadata.
The RangedSummarizedExperiment
contains * The gene
counts matrix peak_counts
* The phenotype data
colData
. The column name
links each peak to
one or more samples * The feature data rowRanges
* The
metadata metadata
which contain a data.frame
of the studies from which the samples were collected.
peak_counts
objectIn this section, we conduct a simple exploration of the data objects to show the content of the package and how they can be loaded and used.
# loading required libraries
library(ExperimentHub)
library(SummarizedExperiment)
library(S4Vectors)
library(fastqcr)
library(DESeq2)
library(dplyr)
library(tidyr)
library(ggplot2)
# query package resources on ExperimentHub
eh <- ExperimentHub()
query(eh, "curatedAdipoChIP")
#> ExperimentHub with 5 records
#> # snapshotDate(): 2024-10-24
#> # $dataprovider: GEO, SRA
#> # $species: Mus musculus
#> # $rdataclass: SummarizedExperiment, RangedSummarizedExperiment
#> # additional mcols(): taxonomyid, genome, description,
#> # coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#> # rdatapath, sourceurl, sourcetype
#> # retrieve records with, e.g., 'object[["EH2258"]]'
#>
#> title
#> EH2258 | A Curated ChIP-Seq Dataset of MDI-induced Differentiated Adipocy...
#> EH3286 | A Curated Microarrays Dataset of MDI-induced Differentiated Adi...
#> EH3287 | A Curated Microarrays Dataset (processed) of MDI-induced Differe...
#> EH3288 | A Curated Microarrays Dataset of MDI-induced Differentiated Adi...
#> EH3289 | A Curated Microarrays Dataset (processed) of MDI-induced Differe...
# load data from ExperimentHub
peak_counts <- query(eh, "curatedAdipoChIP")[[1]]
#> see ?curatedAdipoChIP and browseVignettes('curatedAdipoChIP') for documentation
#> loading from cache
# print object
peak_counts
#> class: RangedSummarizedExperiment
#> dim: 837030 207
#> metadata(1): studies
#> assays(1): peak_counts
#> rownames(837030): peak_1 peak_2 ... peak_837855 peak_837856
#> rowData names(10): peak name ... geneId distanceToTSS
#> colnames(207): GSM1017630 GSM1017631 ... GSM838021 GSM838022
#> colData names(11): id runs ... control_type qc
The count matrix can be accessed using assay
. Here we
show the first five entries of the first five samples.
# print count matrix
assay(peak_counts)[1:5, 1:5]
#> GSM1017630 GSM1017631 GSM1017632 GSM1017633 GSM1017634
#> peak_1 0 0 7 2 3
#> peak_2 0 10 28 24 5
#> peak_3 0 1 9 13 2
#> peak_4 4 9 9 20 6
#> peak_5 1 1 14 10 2
The phenotype/samples data is a data.frame
, It can be
accessed using colData
. The time
and
stage
columns encode the time point in hours and stage of
differentiation respectively. The column factor
records the
ChIP antibody used in the sample.
# names of the coldata object
names(colData(peak_counts))
#> [1] "id" "runs" "study" "pmid" "factor"
#> [6] "time" "stage" "bibtexkey" "control_id" "control_type"
#> [11] "qc"
# table of times column
table(colData(peak_counts)$time)
#>
#> -48 0 2 4 6 24 48 72 96 144 168
#> 2 46 1 43 1 4 21 3 9 7 22
# table of stage column
table(colData(peak_counts)$stage)
#>
#> 0 1 2 3
#> 67 73 9 40
# table of factor column
table(colData(peak_counts)$factor)
#>
#> ATF2 ATF7 CEBPA CEBPB CEBPD CREB1 CTCF E2F4
#> 2 1 2 9 2 3 10 1
#> EP300 FOSL2 GPS2 H3K27ac H3K27me3 H3K36me3 H3K4me1 H3K4me2
#> 6 2 2 6 6 3 7 5
#> H3K4me3 H3K79me2 H3K79me3 H3K9me2 H3K9me3 H4K20me1 HDAC2 HDAC3
#> 12 1 1 1 4 1 4 4
#> JUN JUND KDM1A KDM5A KDM5C KLF4 KLF5 KMT2B
#> 2 1 2 6 2 1 1 1
#> MBD1 MED1 NCOR1 NR3C1 NRF1 PBX1 POLR2A PPARG
#> 2 14 5 1 1 1 12 13
#> PSMB1 RXRG SETDB1 SMARCA4 SMC1A STAT1 STAT5A TCF7
#> 1 8 3 1 10 1 1 1
#> Ubiquitin VDR
#> 2 1
Other columns in colData
are selected information about
the samples/runs or identifiers to different databases. The following
table provides the description of each of these columns.
col_name | description |
---|---|
id | The GEO sample identifier. |
runs | A list of tibbles with information about individual run files. |
study | The SRA study identifier. |
pmid | The PubMed ID of the article where the data were published originally. |
factor | The name of the ChIP antibody that was used in the sample. |
time | The time point of the sample when collected in hours. The time is recorded from the beginning of the protocol as 0 hours. |
stage | The stage of differentiation of the sample when collected. Possible values are 0 to 3; 0 for non-differentiated; 1 for differentiating; and 2/3 for maturating samples. |
bibtexkey | The key of the study where the data were published originally. This maps to the studies object of the metadata which records the study information in bibtex format. |
control_id | The GEO sample identifier of the control (input) sample. |
control_type | The method for assigning the control samples. Possible values are “provided” when the same study has control sample/s or “other” when control samples were selected from other studies. |
library_layout | The type of RNA library. Possible values are SINGLE for single-end and PAIRED for paired-end runs. |
instrument_model | The name of the sequencing machine that was used to obtain the sequence reads. |
qc | The quality control output of fastqc on the separate files/runs. |
The features data are a GRanges
object and can be
accessed using rowRanges
.
# print GRanges object
rowRanges(peak_counts)
#> GRanges object with 837030 ranges and 10 metadata columns:
#> seqnames ranges strand |
#> <Rle> <IRanges> <Rle> |
#> peak_1 chr1 3029191-3029410 * |
#> peak_2 chr1 3031394-3031660 * |
#> peak_3 chr1 3037771-3037971 * |
#> peak_4 chr1 3049860-3050379 * |
#> peak_5 chr1 3051655-3051881 * |
#> ... ... ... ... .
#> peak_837852 chrY 90830107-90830335 * |
#> peak_837853 chrY 90830368-90830710 * |
#> peak_837854 chrY 90834938-90835190 * |
#> peak_837855 chrY 90835545-90836986 * |
#> peak_837856 chrY 90839338-90840071 * |
#> peak name
#> <CharacterList> <character>
#> peak_1 GSM2515964,GSM2515966 peak_1
#> peak_2 GSM1370450,GSM1893625,GSM1893626,... peak_2
#> peak_3 GSM2515948 peak_3
#> peak_4 GSM1412513,GSM2515948,GSM2515949,... peak_4
#> peak_5 GSM2515966 peak_5
#> ... ... ...
#> peak_837852 GSM340808 peak_837852
#> peak_837853 GSM340808 peak_837853
#> peak_837854 GSM686970 peak_837854
#> peak_837855 GSM1368009,GSM1368013,GSM1368014,... peak_837855
#> peak_837856 GSM1370470,GSM1412512,GSM2233369,... peak_837856
#> annotation geneChr geneStart geneEnd geneLength
#> <character> <integer> <integer> <integer> <integer>
#> peak_1 Distal Intergenic 1 3214482 3671498 457017
#> peak_2 Distal Intergenic 1 3214482 3671498 457017
#> peak_3 Distal Intergenic 1 3214482 3671498 457017
#> peak_4 Distal Intergenic 1 3214482 3671498 457017
#> peak_5 Distal Intergenic 1 3214482 3671498 457017
#> ... ... ... ... ... ...
#> peak_837852 Distal Intergenic 21 90785442 90816465 31024
#> peak_837853 Distal Intergenic 21 90785442 90816465 31024
#> peak_837854 Distal Intergenic 21 90785442 90816465 31024
#> peak_837855 Distal Intergenic 21 90785442 90816465 31024
#> peak_837856 Distal Intergenic 21 90785442 90816465 31024
#> geneStrand geneId distanceToTSS
#> <integer> <character> <numeric>
#> peak_1 2 Xkr4 642088
#> peak_2 2 Xkr4 639838
#> peak_3 2 Xkr4 633527
#> peak_4 2 Xkr4 621119
#> peak_5 2 Xkr4 619617
#> ... ... ... ...
#> peak_837852 1 Erdr1 44665
#> peak_837853 1 Erdr1 44926
#> peak_837854 1 Erdr1 49496
#> peak_837855 1 Erdr1 50103
#> peak_837856 1 Erdr1 53896
#> -------
#> seqinfo: 64 sequences from an unspecified genome; no seqlengths
Notice there are two types of data in this object. The first is the
coordinates of the identified peaks ranges(peak_counts)
.
The second is the annotation of the these regions
mcols(peak_counts)
. The following table show the
description of the second annotation item. Except for the first three
columns, all annotations were obtained using
ChIPSeeker::annotatePeak
as described in the
inst/scripts
.
col_name | description |
---|---|
peak | The GEO sample identifier of the samples containing this peak in a CharacterList. |
name | The name of the peak in the set of unique merged peaks. |
annotation | genomic feature of the peak, for instance if the peak is located in 5’UTR, it will annotated by 5’UTR. Possible annotation is Promoter-TSS, Exon, 5’ UTR, 3’ UTR, Intron, and Intergenic. |
geneChr | Chromosome of the nearest gene |
geneStart | Gene start |
geneEnd | Gene end |
geneLength | Gene length |
geneStrand | Gene Strand |
geneId | Gene symbol |
distanceToTSS | Distance from peak to gene TSS |
The metadata is a list of one object. studies
is a
data.frame
containing the bibliography information of the
studies from which the data were collected. Here we show the first entry
in the studies
object.
# print data of first study
metadata(peak_counts)$studies[1,]
#> # A tibble: 1 × 34
#> CATEGORY BIBTEXKEY ADDRESS ANNOTE AUTHOR BOOKTITLE CHAPTER CROSSREF EDITION
#> <chr> <chr> <chr> <chr> <list> <chr> <chr> <chr> <chr>
#> 1 ARTICLE Brier2017 <NA> <NA> <chr> <NA> <NA> <NA> <NA>
#> # ℹ 25 more variables: EDITOR <list>, HOWPUBLISHED <chr>, INSTITUTION <chr>,
#> # JOURNAL <chr>, KEY <chr>, MONTH <chr>, NOTE <chr>, NUMBER <chr>,
#> # ORGANIZATION <chr>, PAGES <chr>, PUBLISHER <chr>, SCHOOL <chr>,
#> # SERIES <chr>, TITLE <chr>, TYPE <chr>, VOLUME <chr>, YEAR <dbl>,
#> # ARCHIVEPREFIX <chr>, ARXIVID <chr>, DOI <chr>, EPRINT <chr>, ISBN <chr>,
#> # ISSN <chr>, PMID <chr>, URL <chr>
GEO series ID | PubMed ID | Num. of Samples | Time (hr) | Differentiation Stage | Factor |
---|---|---|---|---|---|
SRP000630 | 18981474 | 18 | 0/ 24/ 48/ 72/ 96/ 144 | 0/1/2/3 | PPARG/ RXRG/ POLR2A |
SRP002283 | 20442865 | 1 | NA | 3 | E2F4 |
SRP002337 | 20887899 | 15 | -48/ 0/ 48/ 168 | 0/1/3 | H3K4me3/ H3K27me3/ H3K36me3/ H3K4me2/ H3K4me1/ H3K27ac/ PPARG |
SRP002507 | 20478996 | 2 | 0/ 6 | 0/1 | CEBPB |
SRP006001 | 21427703 | 13 | 0/ 2/ 4/ 48/ 144 | 0/1/3 | CEBPB/ CEBPD/ NR3C1/ STAT5A/ RXRG/ PPARG/ POLR2A |
SRP008061 | 21914845 | 1 | 24 | 1 | TCF7 |
SRP009613 | 24315104 | 6 | NA | 0 | PPARG/ JUN/ CREB1/ PSMB1/ Ubiquitin |
SRP016054 | 23178591 | 4 | NA/ 168 | 0/3 | H3K4me3/ H3K27me3/ H3K9me2/ H3K9me3 |
SRP028367 | 23885096 | 7 | 168 | 3 | PPARG/ MED1/ CEBPA/ POLR2A/ CREB1 |
SRP029985 | 24912735 | 3 | 0/ 168 | 0/3 | KDM1A/ NRF1 |
SRP041129 | 24788520 | 10 | NA | 3 | MED1/ CREB1/ EP300/ NCOR1/ CEBPA/ CEBPB/ ATF2/ JUND/ FOSL2 |
SRP041249 | 24857652 | 19 | 4 | 1 | ATF2/ ATF7/ JUN/ FOSL2/ KLF4/ KLF5/ PBX1/ STAT1/ VDR/ RXRG/ MED1/ EP300/ SMARCA4/ H3K27ac/ H3K4me1/ H3K4me2 |
SRP042079 | 24953653 | 2 | 0 | 0 | GPS2 |
SRP043216 | 25503565 | 8 | NA | 0 | H3K27ac/ H3K27me3/ H3K36me3/ H3K4me1/ H3K4me3/ H3K79me2/ H3K79me3/ H4K20me1 |
SRP064188 | 26590716 | 11 | 0/ 144 | 0/3 | H3K27me3/ H3K9me3/ SETDB1/ MBD1/ POLR2A |
SRP065028 | 28398509 | 1 | 168 | 3 | KMT2B |
SRP078506 | 27899593 | 14 | 0/ 4/ 48 | 0/1 | H3K4me3/ KDM5A/ KDM5C |
SRP080809 | 28107648 | 2 | NA | 0 | CEBPB |
SRP100871 | 28475875 | 52 | 4/ 0/ 48/ 96/ 168 | 1/0/2/3 | CTCF/ H3K27ac/ H3K4me1/ H3K4me2/ HDAC2/ HDAC3/ MED1/ NCOR1/ EP300/ SMC1A |
Some of the samples included in this object are control samples.
# show the number of control samples
table(is.na(peak_counts$control_id))
#>
#> FALSE TRUE
#> 189 18
Also, some entries are recorded as NA
such as in the
column time
when the the data are missing or couldn’t be
determined from the metadata.
curatedAdipoChIP
All the samples in this dataset come from the 3T3-L1 cell line. The MDI
induction media, were used to induce adipocyte differentiation. The two
important variables in the dataset are time
and
stage
, which correspond to the time point and stage of
differentiation when the sample were captured. Ideally, this dataset
should be treated as a time course. However, for the purposes of this
example, we only used samples from two differentiation stages 0 and 1
for the transcription factor (CEBPB) and treated them
as independent groups. The goal of this example is to show how a typical
differential binding analysis can be applied in the dataset. The main
focus is to explain how the the data and metadata in
peak_counts
fit in each main piece of the analysis. We
started by sub-setting the object to the samples of interest, filtering
the low quality samples and low count genes. Then we applied the
DESeq2
method with the default values.
First, we subset the peak_counts
object to all samples
in the differentiation stage 0 or 1 for the transcription factor CEBPB.
Using the samples IDs we get chose only the features/peaks that was
found in at least one of samples. Notice that other criteria for
choosing peaks can be applied depending on the purpose of the analysis.
The total number of samples is 8 with 4 samples in each stage. The total
number of peaks is 49028.
# subset peaks_count object
# select samples for the factor CEBPB and stage 0 and 1
sample_ind <- (peak_counts$factor == 'CEBPB') & (peak_counts$stage %in% c(0, 1))
sample_ids <- colnames(peak_counts)[sample_ind]
# select peaks from the selected samples
peak_ind <- lapply(mcols(peak_counts)$peak, function(x) sum(sample_ids %in% x))
peak_ind <- unlist(peak_ind) > 2
# subset the object
se <- peak_counts[peak_ind, sample_ind]
# show the number of samples in each group
table(se$stage)
#>
#> 0 1
#> 4 4
# show the number of peak in each sample
table(unlist(mcols(se)$peak))[sample_ids]
#>
#> GSM2257705 GSM2257706 GSM544720 GSM544721 GSM686970 GSM686971 GSM686972
#> 749 677 1806 14251 14074 15381 7653
#> GSM686973
#> 3367
Since the quality metrics are reported per run file, we need to get
the SSR* id for each of the samples. Notice that, some samples would
have more than one file. In this case because some of the samples are
paired-end, so each of them would have two files SRR\*_1
and SRR\*_2
. This kind of information is available in
colData
.
# check the number of files in qc
qc <- se$qc
table(lengths(qc))
#>
#> 1 2
#> 6 2
# flattening qc list
qc <- unlist(qc, recursive = FALSE)
length(qc)
#> [1] 10
The qc
object of the metadata contains the output of fastqc
in a qc_read
class. More information on this object can be
accessed by calling ?fastqcr::qc_read
. Here, we only use
the per_base_sequence_quality
to filter out low quality
samples. This is by no means enough quality control but it should drive
the point home.
# extracting per_base_sequence_quality
per_base <- lapply(qc, function(x) {
df <- x[['per_base_sequence_quality']]
df %>%
dplyr::select(Base, Mean) %>%
transform(Base = strsplit(as.character(Base), '-')) %>%
unnest(Base) %>%
mutate(Base = as.numeric(Base))
}) %>%
bind_rows(.id = 'run')
After tidying the data, we get a data.frame
with three
columns; run
, Mean
and Base
for
the run ID, the mean quality score and the base number in each read. fastqc
provide thorough documentation of this quality control module and
others.
# a quick look at quality scores
summary(per_base)
#> run Base Mean
#> Length:362 Min. : 1.0 Min. :28.24
#> Class :character 1st Qu.:10.0 1st Qu.:31.90
#> Mode :character Median :19.0 Median :34.80
#> Mean :18.6 Mean :33.96
#> 3rd Qu.:28.0 3rd Qu.:35.57
#> Max. :37.0 Max. :36.50
To identify the low quality samples, we categorize the runs
run_average
which correspond to the average of the per base
mean scores. The following figure should make it easier to see why this
cutoff were used in this case.
# find low quality runs
per_base <- per_base %>%
group_by(run) %>%
mutate(run_average = mean(Mean) > 34)
# plot average per base quality
per_base %>%
ggplot(aes(x = Base, y = Mean, group = run, color = run_average)) +
geom_line() +
lims(y = c(0,40))
The run IDs of the “bad” samples is then used to remove them from the dataset.
# get run ids of low quality samples
bad_runs <- unique(per_base$run[per_base$run_average == FALSE])
bad_samples <- lapply(se$runs, function(x) sum(x$run %in% bad_runs))
# subset the counts object
se2 <- se[, unlist(bad_samples) == 0]
# show remaining samples by stage
table(se2$stage)
#>
#> 0 1
#> 3 3
To identify the low count feature/peaks, we keep only the features with at least 10 reads in 2 or more samples. Then we subset the object to exclude the rest.
One expects that samples from the same group should reflect similar biology. In fact, this is also expected for the differential binding analysis to be reliable. In the context of ChIP-Seq, several measures were proposed to quantify the similarities and discrepancies between the samples, check this article for more. Here, we use scatter plots of the pairs of samples in each group to show that there is a general agreement in terms of the counts in the peakset.
# plot scatters of samples from each group
par(mfrow = c(2,3))
lapply(split(colnames(se3), se3$stage), function(x) {
# get counts of three samples
y1 <- assay(se3)[, x[1]]
y2 <- assay(se3)[, x[2]]
y3 <- assay(se3)[, x[3]]
# plot scatters of pairs of samples
plot(log10(y1 + 1), log10(y2 + 1), xlab = x[1], ylab = x[2])
plot(log10(y1 + 1), log10(y3 + 1), xlab = x[1], ylab = x[3])
plot(log10(y2 + 1), log10(y3 + 1), xlab = x[2], ylab = x[3])
})
#> $`0`
#> NULL
#>
#> $`1`
#> NULL
DESeq2
DESeq2
is a well documented and widely used R package for the differential
binding analysis of ChIP-Seq data. Here we use the default values of
DESeq
to find the peaks which are deferentially bound in
stage 1 compared to 0.
# differential binding analysis
colData(se3) <- colData(se3)[, -2]
se3$stage <- factor(se3$stage)
dds <- DESeqDataSet(se3, ~stage)
#> renaming the first element in assays to 'counts'
#> converting counts to integer mode
dds <- DESeq(dds)
#> estimating size factors
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing
res <- results(dds)
table(res$padj < .1)
#>
#> FALSE TRUE
#> 13297 1240
In this example, we didn’t attempt to correct for the between study
factors that might confound the results. To show how is this possible,
we use the PCA
plots with a few of these factors in the following graphs. The first
uses the stage
factor which is the factor of interest in
this case. We see that the DESeq
transformation did a good
job separating the samples to their expected groups. However, it also
seems that the stage
is not the only factor in play. For
example, we show in the second and the third graphs two other factors
library_layout
and instrument_model
which
might explain some of the variance between the samples. This is expected
because the data were collected from different studies using slightly
different protocols and different sequencing machines. Therefore, it is
necessary to account for these differences to obtain reliable results.
There are multiple methods to do that such as Removing Unwanted
Variation (RUV) and Surrogate
Variable Analysis (SVA).
Speaking of studies, as mentioned earlier the studies
object contains full information of the references of the original
studies in which the data were published. Please cite them when using
this dataset.
curatedAdipoChIP
For citing the package use:
# citing the package
citation("curatedAdipoChIP")
#> To cite package 'curatedAdipoChIP' in publications use:
#>
#> Ahmed M (2024). _curatedAdipoChIP: A Curated ChIP-Seq Dataset of
#> MDI-induced Differentiated Adipocytes (3T3-L1)_.
#> doi:10.18129/B9.bioc.curatedAdipoChIP
#> <https://doi.org/10.18129/B9.bioc.curatedAdipoChIP>, R package
#> version 1.21.0, <https://bioconductor.org/packages/curatedAdipoChIP>.
#>
#> A BibTeX entry for LaTeX users is
#>
#> @Manual{,
#> title = {curatedAdipoChIP: A Curated ChIP-Seq Dataset of MDI-induced Differentiated Adipocytes
#> (3T3-L1)},
#> author = {Mahmoud Ahmed},
#> year = {2024},
#> note = {R package version 1.21.0},
#> url = {https://bioconductor.org/packages/curatedAdipoChIP},
#> doi = {10.18129/B9.bioc.curatedAdipoChIP},
#> }
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R Under development (unstable) (2024-10-21 r87258)
#> os Ubuntu 24.04.1 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate C
#> ctype en_US.UTF-8
#> tz America/New_York
#> date 2024-10-27
#> pandoc 3.1.3 @ /usr/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> abind 1.4-8 2024-09-12 [2] CRAN (R 4.5.0)
#> AnnotationDbi 1.67.0 2024-10-25 [2] Bioconductor 3.21 (R 4.5.0)
#> AnnotationHub * 3.13.3 2024-10-25 [2] Bioconductor 3.21 (R 4.5.0)
#> Biobase * 2.65.1 2024-10-25 [2] Bioconductor 3.21 (R 4.5.0)
#> BiocFileCache * 2.13.2 2024-10-25 [2] Bioconductor 3.21 (R 4.5.0)
#> BiocGenerics * 0.51.3 2024-10-25 [2] Bioconductor 3.21 (R 4.5.0)
#> BiocManager 1.30.25 2024-08-28 [2] CRAN (R 4.5.0)
#> BiocParallel 1.39.0 2024-10-25 [2] Bioconductor 3.21 (R 4.5.0)
#> BiocVersion 3.20.0 2024-10-25 [2] Bioconductor 3.21 (R 4.5.0)
#> Biostrings 2.73.2 2024-10-25 [2] Bioconductor 3.21 (R 4.5.0)
#> bit 4.5.0 2024-09-20 [2] CRAN (R 4.5.0)
#> bit64 4.5.2 2024-09-22 [2] CRAN (R 4.5.0)
#> blob 1.2.4 2023-03-17 [2] CRAN (R 4.5.0)
#> bslib 0.8.0 2024-07-29 [2] CRAN (R 4.5.0)
#> cachem 1.1.0 2024-05-16 [2] CRAN (R 4.5.0)
#> cli 3.6.3 2024-06-21 [2] CRAN (R 4.5.0)
#> codetools 0.2-20 2024-03-31 [3] CRAN (R 4.5.0)
#> colorspace 2.1-1 2024-07-26 [2] CRAN (R 4.5.0)
#> crayon 1.5.3 2024-06-20 [2] CRAN (R 4.5.0)
#> curatedAdipoChIP * 1.21.0 2024-10-27 [1] Bioconductor 3.21 (R 4.5.0)
#> curl 5.2.3 2024-09-20 [2] CRAN (R 4.5.0)
#> DBI 1.2.3 2024-06-02 [2] CRAN (R 4.5.0)
#> dbplyr * 2.5.0 2024-03-19 [2] CRAN (R 4.5.0)
#> DelayedArray 0.31.14 2024-10-25 [2] Bioconductor 3.21 (R 4.5.0)
#> DESeq2 * 1.45.3 2024-10-25 [2] Bioconductor 3.21 (R 4.5.0)
#> devtools 2.4.5 2022-10-11 [2] CRAN (R 4.5.0)
#> digest 0.6.37 2024-08-19 [2] CRAN (R 4.5.0)
#> dplyr * 1.1.4 2023-11-17 [2] CRAN (R 4.5.0)
#> ellipsis 0.3.2 2021-04-29 [2] CRAN (R 4.5.0)
#> evaluate 1.0.1 2024-10-10 [2] CRAN (R 4.5.0)
#> ExperimentHub * 2.13.1 2024-10-25 [2] Bioconductor 3.21 (R 4.5.0)
#> fansi 1.0.6 2023-12-08 [2] CRAN (R 4.5.0)
#> farver 2.1.2 2024-05-13 [2] CRAN (R 4.5.0)
#> fastmap 1.2.0 2024-05-15 [2] CRAN (R 4.5.0)
#> fastqcr * 0.1.3 2023-02-18 [2] CRAN (R 4.5.0)
#> filelock 1.0.3 2023-12-11 [2] CRAN (R 4.5.0)
#> fs 1.6.4 2024-04-25 [2] CRAN (R 4.5.0)
#> generics 0.1.3 2022-07-05 [2] CRAN (R 4.5.0)
#> GenomeInfoDb * 1.41.2 2024-10-25 [2] Bioconductor 3.21 (R 4.5.0)
#> GenomeInfoDbData 1.2.13 2024-10-23 [2] Bioconductor
#> GenomicRanges * 1.57.2 2024-10-25 [2] Bioconductor 3.21 (R 4.5.0)
#> ggplot2 * 3.5.1 2024-04-23 [2] CRAN (R 4.5.0)
#> glue 1.8.0 2024-09-30 [2] CRAN (R 4.5.0)
#> gtable 0.3.6 2024-10-25 [2] CRAN (R 4.5.0)
#> highr 0.11 2024-05-26 [2] CRAN (R 4.5.0)
#> htmltools 0.5.8.1 2024-04-04 [2] CRAN (R 4.5.0)
#> htmlwidgets 1.6.4 2023-12-06 [2] CRAN (R 4.5.0)
#> httpuv 1.6.15 2024-03-26 [2] CRAN (R 4.5.0)
#> httr 1.4.7 2023-08-15 [2] CRAN (R 4.5.0)
#> IRanges * 2.39.2 2024-10-25 [2] Bioconductor 3.21 (R 4.5.0)
#> jquerylib 0.1.4 2021-04-26 [2] CRAN (R 4.5.0)
#> jsonlite 1.8.9 2024-09-20 [2] CRAN (R 4.5.0)
#> KEGGREST 1.45.1 2024-10-25 [2] Bioconductor 3.21 (R 4.5.0)
#> knitr 1.48 2024-07-07 [2] CRAN (R 4.5.0)
#> labeling 0.4.3 2023-08-29 [2] CRAN (R 4.5.0)
#> later 1.3.2 2023-12-06 [2] CRAN (R 4.5.0)
#> lattice 0.22-6 2024-03-20 [3] CRAN (R 4.5.0)
#> lifecycle 1.0.4 2023-11-07 [2] CRAN (R 4.5.0)
#> locfit 1.5-9.10 2024-06-24 [2] CRAN (R 4.5.0)
#> magrittr 2.0.3 2022-03-30 [2] CRAN (R 4.5.0)
#> Matrix 1.7-1 2024-10-18 [3] CRAN (R 4.5.0)
#> MatrixGenerics * 1.17.1 2024-10-25 [2] Bioconductor 3.21 (R 4.5.0)
#> matrixStats * 1.4.1 2024-09-08 [2] CRAN (R 4.5.0)
#> memoise 2.0.1 2021-11-26 [2] CRAN (R 4.5.0)
#> mime 0.12 2021-09-28 [2] CRAN (R 4.5.0)
#> miniUI 0.1.1.1 2018-05-18 [2] CRAN (R 4.5.0)
#> munsell 0.5.1 2024-04-01 [2] CRAN (R 4.5.0)
#> pillar 1.9.0 2023-03-22 [2] CRAN (R 4.5.0)
#> pkgbuild 1.4.4 2024-03-17 [2] CRAN (R 4.5.0)
#> pkgconfig 2.0.3 2019-09-22 [2] CRAN (R 4.5.0)
#> pkgload 1.4.0 2024-06-28 [2] CRAN (R 4.5.0)
#> png 0.1-8 2022-11-29 [2] CRAN (R 4.5.0)
#> profvis 0.4.0 2024-09-20 [2] CRAN (R 4.5.0)
#> promises 1.3.0 2024-04-05 [2] CRAN (R 4.5.0)
#> purrr 1.0.2 2023-08-10 [2] CRAN (R 4.5.0)
#> R6 2.5.1 2021-08-19 [2] CRAN (R 4.5.0)
#> rappdirs 0.3.3 2021-01-31 [2] CRAN (R 4.5.0)
#> Rcpp 1.0.13 2024-07-17 [2] CRAN (R 4.5.0)
#> remotes 2.5.0 2024-03-17 [2] CRAN (R 4.5.0)
#> rlang 1.1.4 2024-06-04 [2] CRAN (R 4.5.0)
#> rmarkdown 2.28 2024-08-17 [2] CRAN (R 4.5.0)
#> RSQLite 2.3.7 2024-05-27 [2] CRAN (R 4.5.0)
#> S4Arrays 1.5.11 2024-10-25 [2] Bioconductor 3.21 (R 4.5.0)
#> S4Vectors * 0.43.2 2024-10-25 [2] Bioconductor 3.21 (R 4.5.0)
#> sass 0.4.9 2024-03-15 [2] CRAN (R 4.5.0)
#> scales 1.3.0 2023-11-28 [2] CRAN (R 4.5.0)
#> sessioninfo 1.2.2 2021-12-06 [2] CRAN (R 4.5.0)
#> shiny 1.9.1 2024-08-01 [2] CRAN (R 4.5.0)
#> SparseArray 1.5.45 2024-10-25 [2] Bioconductor 3.21 (R 4.5.0)
#> SummarizedExperiment * 1.35.5 2024-10-25 [2] Bioconductor 3.21 (R 4.5.0)
#> tibble 3.2.1 2023-03-20 [2] CRAN (R 4.5.0)
#> tidyr * 1.3.1 2024-01-24 [2] CRAN (R 4.5.0)
#> tidyselect 1.2.1 2024-03-11 [2] CRAN (R 4.5.0)
#> UCSC.utils 1.1.0 2024-10-25 [2] Bioconductor 3.21 (R 4.5.0)
#> urlchecker 1.0.1 2021-11-30 [2] CRAN (R 4.5.0)
#> usethis 3.0.0 2024-07-29 [2] CRAN (R 4.5.0)
#> utf8 1.2.4 2023-10-22 [2] CRAN (R 4.5.0)
#> vctrs 0.6.5 2023-12-01 [2] CRAN (R 4.5.0)
#> withr 3.0.1 2024-07-31 [2] CRAN (R 4.5.0)
#> xfun 0.48 2024-10-03 [2] CRAN (R 4.5.0)
#> xtable 1.8-4 2019-04-21 [2] CRAN (R 4.5.0)
#> XVector 0.45.0 2024-10-25 [2] Bioconductor 3.21 (R 4.5.0)
#> yaml 2.3.10 2024-07-26 [2] CRAN (R 4.5.0)
#> zlibbioc 1.51.2 2024-10-25 [2] Bioconductor 3.21 (R 4.5.0)
#>
#> [1] /tmp/RtmptBYSdY/Rinst1eba505f565371
#> [2] /home/biocbuild/bbs-3.21-bioc/R/site-library
#> [3] /home/biocbuild/bbs-3.21-bioc/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────