tradeSeq
is an R
package that allows
analysis of gene expression along trajectories. While it has been
developed and applied to single-cell RNA-sequencing (scRNA-seq) data,
its applicability extends beyond that, and also allows the analysis of,
e.g., single-cell ATAC-seq data along trajectories or bulk RNA-seq time
series datasets. For every gene in the dataset, tradeSeq
fits a generalized additive model (GAM) by building on the
mgcv
R package. It then allows statistical inference on the
GAM by assessing contrasts of the parameters of the fitted GAM model,
aiding in interpreting complex datasets. All details about the
tradeSeq
model and statistical tests are described in our
preprint (Van den Berge et al. 2020).
In this vignette, we analyze a subset of the data from (Paul et al. 2015). A
SingleCellExperiment
object of the data has been provided
with the tradeSeq
package and can be retrieved as shown below. The data and UMAP reduced
dimensions were derived from following the Monocle
3 vignette.
The main
vignette focuses on using tradeSeq
downstream of
slingshot
. Here, we present how to use
tradeSeq
downstream of monocle
(Qiu et al. 2017).
library(tradeSeq)
library(RColorBrewer)
library(SingleCellExperiment)
# For reproducibility
palette(brewer.pal(8, "Dark2"))
data(countMatrix, package = "tradeSeq")
counts <- as.matrix(countMatrix)
rm(countMatrix)
data(celltype, package = "tradeSeq")
As of now (06/2020), monocle3(Cao et al.
2019), is still in its beta version. Therefore, we have no plan
yet to include a S4 method for monocle3 while it is not on CRAN or
Bioconductor and the format is still moving. However, we present below a
way to use tradeSeq
downstream of monocle3
as
of version ‘0.2’, for a fully connected graph. We follow the tutorial
from the monocle3
website.
You will need to install monocle3 from here before running the code below.
set.seed(22)
library(monocle3)
# Create a cell_data_set object
cds <- new_cell_data_set(counts, cell_metadata = pd,
gene_metadata = data.frame(gene_short_name = rownames(counts),
row.names = rownames(counts)))
# Run PCA then UMAP on the data
cds <- preprocess_cds(cds, method = "PCA")
cds <- reduce_dimension(cds, preprocess_method = "PCA",
reduction_method = "UMAP")
# First display, coloring by the cell types from Paul et al
plot_cells(cds, label_groups_by_cluster = FALSE, cell_size = 1,
color_cells_by = "cellType")
# Running the clustering method. This is necessary to the construct the graph
cds <- cluster_cells(cds, reduction_method = "UMAP")
# Visualize the clusters
plot_cells(cds, color_cells_by = "cluster", cell_size = 1)
# Construct the graph
# Note that, for the rest of the code to run, the graph should be fully connected
cds <- learn_graph(cds, use_partition = FALSE)
# We find all the cells that are close to the starting point
cell_ids <- colnames(cds)[pd$cellType == "Multipotent progenitors"]
closest_vertex <- cds@principal_graph_aux[["UMAP"]]$pr_graph_cell_proj_closest_vertex
closest_vertex <- as.matrix(closest_vertex[colnames(cds), ])
closest_vertex <- closest_vertex[cell_ids, ]
closest_vertex <- as.numeric(names(which.max(table(closest_vertex))))
mst <- principal_graph(cds)$UMAP
root_pr_nodes <- igraph::V(mst)$name[closest_vertex]
# We compute the trajectory
cds <- order_cells(cds, root_pr_nodes = root_pr_nodes)
plot_cells(cds, color_cells_by = "pseudotime")
library(magrittr)
# Get the closest vertice for every cell
y_to_cells <- principal_graph_aux(cds)$UMAP$pr_graph_cell_proj_closest_vertex %>%
as.data.frame()
y_to_cells$cells <- rownames(y_to_cells)
y_to_cells$Y <- y_to_cells$V1
# Get the root vertices
# It is the same node as above
root <- cds@principal_graph_aux$UMAP$root_pr_nodes
# Get the other endpoints
endpoints <- names(which(igraph::degree(mst) == 1))
endpoints <- endpoints[!endpoints %in% root]
# For each endpoint
cellWeights <- lapply(endpoints, function(endpoint) {
# We find the path between the endpoint and the root
path <- igraph::shortest_paths(mst, root, endpoint)$vpath[[1]]
path <- as.character(path)
# We find the cells that map along that path
df <- y_to_cells[y_to_cells$Y %in% path, ]
df <- data.frame(weights = as.numeric(colnames(cds) %in% df$cells))
colnames(df) <- endpoint
return(df)
}) %>% do.call(what = 'cbind', args = .) %>%
as.matrix()
rownames(cellWeights) <- colnames(cds)
pseudotime <- matrix(pseudotime(cds), ncol = ncol(cellWeights),
nrow = ncol(cds), byrow = FALSE)
sce <- fitGAM(counts = counts,
pseudotime = pseudotime,
cellWeights = cellWeights)
Then, the sce
object can be analyzed following the main
vignette.
sessionInfo()
## R Under development (unstable) (2024-10-21 r87258)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] SingleCellExperiment_1.29.0 SummarizedExperiment_1.37.0
## [3] Biobase_2.67.0 GenomicRanges_1.59.0
## [5] GenomeInfoDb_1.43.0 IRanges_2.41.0
## [7] S4Vectors_0.45.0 BiocGenerics_0.53.0
## [9] MatrixGenerics_1.19.0 matrixStats_1.4.1
## [11] RColorBrewer_1.1-3 tradeSeq_1.21.0
## [13] knitr_1.48
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.6 xfun_0.48 bslib_0.8.0
## [4] ggplot2_3.5.1 lattice_0.22-6 vctrs_0.6.5
## [7] tools_4.5.0 generics_0.1.3 parallel_4.5.0
## [10] slingshot_2.15.0 tibble_3.2.1 fansi_1.0.6
## [13] pkgconfig_2.0.3 Matrix_1.7-1 TrajectoryUtils_1.15.0
## [16] lifecycle_1.0.4 GenomeInfoDbData_1.2.13 compiler_4.5.0
## [19] statmod_1.5.0 munsell_0.5.1 codetools_0.2-20
## [22] htmltools_0.5.8.1 sass_0.4.9 yaml_2.3.10
## [25] pillar_1.9.0 crayon_1.5.3 jquerylib_0.1.4
## [28] BiocParallel_1.41.0 limma_3.63.0 DelayedArray_0.33.0
## [31] cachem_1.1.0 viridis_0.6.5 abind_1.4-8
## [34] nlme_3.1-166 princurve_2.1.6 locfit_1.5-9.10
## [37] tidyselect_1.2.1 digest_0.6.37 dplyr_1.1.4
## [40] splines_4.5.0 fastmap_1.2.0 grid_4.5.0
## [43] colorspace_2.1-1 cli_3.6.3 SparseArray_1.7.0
## [46] magrittr_2.0.3 S4Arrays_1.7.0 utf8_1.2.4
## [49] edgeR_4.5.0 scales_1.3.0 UCSC.utils_1.3.0
## [52] rmarkdown_2.28 XVector_0.47.0 httr_1.4.7
## [55] igraph_2.1.1 gridExtra_2.3 pbapply_1.7-2
## [58] evaluate_1.0.1 viridisLite_0.4.2 mgcv_1.9-1
## [61] rlang_1.1.4 Rcpp_1.0.13 glue_1.8.0
## [64] jsonlite_1.8.9 R6_2.5.1 zlibbioc_1.53.0