`TCGA.pipe` is a function for easily downloading TCGA data from GDC using the TCGAbiolinks package (Colaprico et al. 2016) and performing all the analyses in ELMER. For illustration purposes, we skip the downloading step. The user can use the `getTCGA` function to download TCGA data, or use `TCGA.pipe` and include "download" in the `analysis` option.
The following command runs the distal DNA methylation analysis, predicts putative target genes, performs the motif analysis, and identifies regulatory transcription factors.
TCGA.pipe("LUSC",
          wd = "./ELMER.example",
          cores = parallel::detectCores()/2,
          mode = "unsupervised",
          permu.size = 300,
          Pe = 0.01,
          analysis = c("distal.probes","diffMeth","pair","motif","TF.search"),
          diff.dir = "hypo",
          rm.chr = paste0("chr",c("X","Y")))
In this new version we added the argument `mode` to the `TCGA.pipe` function. It automatically sets `minSubgroupFrac` to the following values.

Modes available:

- `unsupervised`:
  - `minSubgroupFrac` = 0.2 in `get.diff.meth`
  - `minSubgroupFrac` = 0.4 in the `get.pair` and `get.TFs` functions
- `supervised`: all samples are used

The `unsupervised` mode should be used when you want to be able to detect a specific (possibly unknown) molecular subtype among the tumor samples; these subtypes often make up only a minority of samples, and 20% was chosen as a lower bound for the purposes of statistical power. If you are using pre-defined group labels, such as treated replicates vs. untreated replicates, use the `supervised` mode (all samples).

For more information, please read the analysis section of the vignette.
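As a rough illustration of the values listed above, the base R sketch below maps a mode to its `minSubgroupFrac` settings and then picks the lowest-methylated fraction of samples for one toy probe. The helper `mode_to_min_subgroup_frac` is hypothetical and not part of ELMER, the value 1 for the supervised mode is an assumption derived from "all samples", and the beta values are simulated.

```r
# Hypothetical helper (not an ELMER function): returns the minSubgroupFrac
# value per analysis step for a given mode.
mode_to_min_subgroup_frac <- function(mode = c("unsupervised", "supervised")) {
  mode <- match.arg(mode)
  if (mode == "unsupervised") {
    list(get.diff.meth = 0.2, get.pair = 0.4, get.TFs = 0.4)
  } else {
    # Assumption: "all samples" is expressed as a fraction of 1
    list(get.diff.meth = 1, get.pair = 1, get.TFs = 1)
  }
}

fracs <- mode_to_min_subgroup_frac("unsupervised")

# Toy example: simulated beta values for one probe across 20 samples
set.seed(1)
beta <- runif(20)

# Keep only the lowest-methylated 20% of samples for this probe
n_extreme <- round(length(beta) * fracs$get.diff.meth)
lowest <- sort(beta)[seq_len(n_extreme)]
print(lowest)
```

With a fraction of 0.2, only a small subset of samples drives the comparison, which is why a subtype present in a minority of tumors can still be detected.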
We added to the `TCGA.pipe` function (download step) the option to identify mutant samples in order to perform a WT vs. Mutant analysis. It downloads the open-access MAF file from the GDC database (Grossman et al. 2016), selects a gene, and identifies which samples are mutant based on the following classification (it can be changed using the argument `mutant_variant_classification`).
Variant classification | Group |
---|---|
Frame_Shift_Del | Mutant |
Frame_Shift_Ins | Mutant |
Missense_Mutation | Mutant |
Nonsense_Mutation | Mutant |
Splice_Site | Mutant |
In_Frame_Del | Mutant |
In_Frame_Ins | Mutant |
Translation_Start_Site | Mutant |
Nonstop_Mutation | Mutant |
Silent | WT |
3'UTR | WT |
5'UTR | WT |
3'Flank | WT |
5'Flank | WT |
IGR (intergenic region) | WT |
Intron | WT |
RNA | WT |
Target_region | WT |
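The classification in the table above can be sketched in base R as follows. This is a minimal illustration, not ELMER's implementation: the function `classify_samples`, the sample names, and the toy MAF rows are invented, while the column names `Hugo_Symbol`, `Tumor_Sample_Barcode`, and `Variant_Classification` follow the MAF format.

```r
# Variant classifications treated as "Mutant" in the table above
mutant_classes <- c("Frame_Shift_Del", "Frame_Shift_Ins", "Missense_Mutation",
                    "Nonsense_Mutation", "Splice_Site", "In_Frame_Del",
                    "In_Frame_Ins", "Translation_Start_Site", "Nonstop_Mutation")

# Toy MAF-like table; rows and sample names are invented for illustration
maf <- data.frame(
  Hugo_Symbol            = c("TP53", "TP53", "TP53", "KRAS"),
  Tumor_Sample_Barcode   = c("S1", "S2", "S3", "S2"),
  Variant_Classification = c("Missense_Mutation", "Silent",
                             "Intron", "Missense_Mutation"),
  stringsAsFactors = FALSE
)
tumor_samples <- c("S1", "S2", "S3", "S4")

# Hypothetical helper: a sample is "Mutant" if it carries at least one
# variant of the selected gene whose classification is in mutant_classes
classify_samples <- function(maf, gene, samples, mutant_classes) {
  hits <- maf[maf$Hugo_Symbol == gene &
                maf$Variant_Classification %in% mutant_classes, ]
  status <- ifelse(samples %in% hits$Tumor_Sample_Barcode, "Mutant", "WT")
  setNames(status, samples)
}

tp53_status <- classify_samples(maf, "TP53", tumor_samples, mutant_classes)
print(tp53_status)
```

Here sample S2 stays WT for TP53 because its only TP53 variant is `Silent`; its KRAS mutation is ignored since only the selected gene is considered.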
The mutation-related arguments of `TCGA.pipe` are listed below:
Argument | Description |
---|---|
genes | List of genes for which mutations will be verified. For each gene, a column with the name of the gene is created in the MAE with two groups, WT (tumor samples without mutation) and MUT (tumor samples with mutation); non-tumor samples are set to NA. |
mutant_variant_classification | List of GDC variant classifications from MAF files used to consider a sample mutant. Only used when the argument genes is set. |
group.col | A column defining the groups of the samples. You can view the available columns using: colnames(MultiAssayExperiment::colData(data)). |
group1 | A group from group.col. ELMER will run group1 vs group2. That means, if diff.dir is "hyper", it gets probes hypermethylated in group1 compared to group2. |
group2 | A group from group.col. ELMER will run group1 vs group2. That means, if diff.dir is "hyper", it gets probes hypermethylated in group1 compared to group2. |
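To make the direction of the group1 vs group2 comparison concrete, here is a simplified base R sketch for a single probe. It assumes a one-sided Welch t-test as a stand-in for the differential methylation step; the function `compare_groups` and the simulated beta values are hypothetical and only illustrate how the direction relates to the order of the groups.

```r
set.seed(42)

# Toy data: one probe, 10 samples per group, lower methylation in "Mutant"
group <- rep(c("Mutant", "WT"), each = 10)
beta  <- c(rnorm(10, mean = 0.3, sd = 0.05),
           rnorm(10, mean = 0.7, sd = 0.05))

# Hypothetical helper: one-sided test of group1 against group2.
# "hypo"  asks whether group1 is less methylated than group2,
# "hyper" asks whether group1 is more methylated than group2.
compare_groups <- function(beta, group, group1, group2,
                           diff.dir = c("hypo", "hyper")) {
  diff.dir <- match.arg(diff.dir)
  alternative <- if (diff.dir == "hypo") "less" else "greater"
  t.test(beta[group == group1], beta[group == group2],
         alternative = alternative)$p.value
}

p_hypo  <- compare_groups(beta, group, "Mutant", "WT", "hypo")
p_hyper <- compare_groups(beta, group, "Mutant", "WT", "hyper")
print(c(hypo = p_hypo, hyper = p_hyper))
```

In this toy setting the probe is significant only in the "hypo" direction; swapping `group1` and `group2` would make it significant in the "hyper" direction instead.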
In the example below, TCGA-LUSC data is downloaded and TP53 Mutant samples are compared with TP53 WT samples.
TCGA.pipe("LUSC",
          wd = "./ELMER.example",
          cores = parallel::detectCores()/2,
          mode = "supervised",
          genes = "TP53",
          group.col = "TP53",
          group1 = "Mutant",
          group2 = "WT",
          permu.size = 300,
          Pe = 0.01,
          analysis = c("download","diffMeth","pair","motif","TF.search"),
          diff.dir = "hypo",
          rm.chr = paste0("chr",c("X","Y")))
## R Under development (unstable) (2024-10-21 r87258)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] MultiAssayExperiment_1.33.0 SummarizedExperiment_1.37.0
## [3] Biobase_2.67.0 MatrixGenerics_1.19.0
## [5] matrixStats_1.4.1 GenomicRanges_1.59.0
## [7] GenomeInfoDb_1.43.0 IRanges_2.41.0
## [9] S4Vectors_0.45.0 sesameData_1.23.0
## [11] ExperimentHub_2.15.0 AnnotationHub_3.15.0
## [13] BiocFileCache_2.15.0 dbplyr_2.5.0
## [15] BiocGenerics_0.53.0 BiocStyle_2.35.0
## [17] dplyr_1.1.4 DT_0.33
## [19] ELMER_2.31.0 ELMER.data_2.29.0
##
## loaded via a namespace (and not attached):
## [1] later_1.3.2 BiocIO_1.17.0
## [3] bitops_1.0-9 filelock_1.0.3
## [5] tibble_3.2.1 XML_3.99-0.17
## [7] rpart_4.1.23 lifecycle_1.0.4
## [9] httr2_1.0.5 rstatix_0.7.2
## [11] doParallel_1.0.17 vroom_1.6.5
## [13] processx_3.8.4 lattice_0.22-6
## [15] ensembldb_2.31.0 crosstalk_1.2.1
## [17] backports_1.5.0 magrittr_2.0.3
## [19] plotly_4.10.4 Hmisc_5.2-0
## [21] sass_0.4.9 rmarkdown_2.28
## [23] jquerylib_0.1.4 yaml_2.3.10
## [25] Gviz_1.51.0 chromote_0.3.1
## [27] DBI_1.2.3 RColorBrewer_1.1-3
## [29] abind_1.4-8 zlibbioc_1.53.0
## [31] rvest_1.0.4 purrr_1.0.2
## [33] AnnotationFilter_1.31.0 biovizBase_1.55.0
## [35] RCurl_1.98-1.16 nnet_7.3-19
## [37] VariantAnnotation_1.53.0 rappdirs_0.3.3
## [39] circlize_0.4.16 GenomeInfoDbData_1.2.13
## [41] ggrepel_0.9.6 codetools_0.2-20
## [43] DelayedArray_0.33.0 xml2_1.3.6
## [45] tidyselect_1.2.1 shape_1.4.6.1
## [47] farver_2.1.2 UCSC.utils_1.3.0
## [49] TCGAbiolinksGUI.data_1.25.0 base64enc_0.1-3
## [51] GenomicAlignments_1.43.0 jsonlite_1.8.9
## [53] GetoptLong_1.0.5 Formula_1.2-5
## [55] iterators_1.0.14 systemfonts_1.1.0
## [57] foreach_1.5.2 tools_4.5.0
## [59] progress_1.2.3 ragg_1.3.3
## [61] Rcpp_1.0.13 glue_1.8.0
## [63] BiocBaseUtils_1.9.0 gridExtra_2.3
## [65] SparseArray_1.7.0 xfun_0.48
## [67] websocket_1.4.2 withr_3.0.2
## [69] BiocManager_1.30.25 fastmap_1.2.0
## [71] latticeExtra_0.6-30 fansi_1.0.6
## [73] digest_0.6.37 mime_0.12
## [75] R6_2.5.1 textshaping_0.4.0
## [77] colorspace_2.1-1 jpeg_0.1-10
## [79] dichromat_2.0-0.1 biomaRt_2.63.0
## [81] RSQLite_2.3.7 utf8_1.2.4
## [83] tidyr_1.3.1 generics_0.1.3
## [85] data.table_1.16.2 rtracklayer_1.67.0
## [87] prettyunits_1.2.0 httr_1.4.7
## [89] htmlwidgets_1.6.4 S4Arrays_1.7.0
## [91] pkgconfig_2.0.3 gtable_0.3.6
## [93] blob_1.2.4 ComplexHeatmap_2.23.0
## [95] XVector_0.47.0 htmltools_0.5.8.1
## [97] carData_3.0-5 ProtGenerics_1.39.0
## [99] clue_0.3-65 scales_1.3.0
## [101] png_0.1-8 knitr_1.48
## [103] rstudioapi_0.17.1 reshape2_1.4.4
## [105] tzdb_0.4.0 rjson_0.2.23
## [107] checkmate_2.3.2 curl_5.2.3
## [109] cachem_1.1.0 GlobalOptions_0.1.2
## [111] stringr_1.5.1 BiocVersion_3.21.1
## [113] parallel_4.5.0 foreign_0.8-87
## [115] AnnotationDbi_1.69.0 restfulr_0.0.15
## [117] reshape_0.8.9 pillar_1.9.0
## [119] grid_4.5.0 vctrs_0.6.5
## [121] promises_1.3.0 ggpubr_0.6.0
## [123] car_3.1-3 cluster_2.1.6
## [125] archive_1.1.9 htmlTable_2.4.3
## [127] evaluate_1.0.1 TCGAbiolinks_2.35.0
## [129] readr_2.1.5 GenomicFeatures_1.59.0
## [131] cli_3.6.3 compiler_4.5.0
## [133] Rsamtools_2.23.0 rlang_1.1.4
## [135] crayon_1.5.3 ggsignif_0.6.4
## [137] labeling_0.4.3 interp_1.1-6
## [139] ps_1.8.1 plyr_1.8.9
## [141] stringi_1.8.4 viridisLite_0.4.2
## [143] deldir_2.0-4 BiocParallel_1.41.0
## [145] munsell_0.5.1 Biostrings_2.75.0
## [147] lazyeval_0.2.2 Matrix_1.7-1
## [149] BSgenome_1.75.0 hms_1.1.3
## [151] bit64_4.5.2 ggplot2_3.5.1
## [153] KEGGREST_1.47.0 highr_0.11
## [155] fontawesome_0.5.2 broom_1.0.7
## [157] memoise_2.0.1 bslib_0.8.0
## [159] bit_4.5.0 downloader_0.4