The UPDhmm’s User Guide

list(name = “Marta Sevilla Porras”, affiliation = c(“Universitat Pompeu Fabra (UPF)”, “Centro de Investigación Biomédica en Red (CIBERER)”), email = “marta.sevilla@upf.edu”) list(name = “Carlos Ruiz Arenas”, affiliation = “Universidad de Navarra (UNAV)”, email = “cruizarenas@unav.es”)

2024-10-29

Introduction

Background

Uniparental disomy (UPD) is a rare genetic condition where an individual inherits both copies of a chromosome from one parent, as opposed to the typical inheritance of one copy from each parent. The extent of its contribution as a causative factor in rare genetic diseases remains largely unexplored. UPDs can lead to disease either by inheriting a carrier pathogenic mutation as homozygous from a carrier parent or by causing errors in genetic imprinting. Nevertheless, there are currently no standardized methods available for the detection and characterization of these events.

We have developed UPDhmm R/Bioconductor package. The package provides a tool method to detect, classify and stablish the location of uniparental disomy events.

Method overview

UPDhmm relies on a Hidden Markov Model (HMM) to identify regions with UPD.
In our HMM model, observations are the combination of the genotypes from the father, mother and proband for every genomic variant in the input data. The hidden states of the model represent five inheritance patterns: normal (mendelian inheritance), maternal isodisomy, paternal isodisomy, maternal heterodisomy and paternal heterodisomy. Emission probabilities were defined based on the inheritance patterns. Viterbi algorithm was employed to infer the most likely combination of hidden states underlying a sequence of observations, thus defining the most likely inheritance pattern for each genetic variant. UPDhmm reports segments in the genome having an inheritance pattern different than normal, and thus, likely harbouring a UPD.”

Set-up the packages

Loading libraries

library(UPDhmm)
library(dplyr)

Quick start

The input of the package is a multisample vcf file with the following requirements:

Datasets

The UPDhmm package includes one example dataset, adapted from GIB This dataset serves as a practical illustration and can be utilized for testing and familiarizing users with the functionality of the package.

Preprocessing

After reading the VCF file, the vcfCheck() function is employed for preprocessing the input data. This function facilitates reading the VCF in the suitable format for the UPDhmm package.

library(UPDhmm)
library(VariantAnnotation)
file <- system.file(package = "UPDhmm", "extdata", "test_het_mat.vcf.gz")
vcf <- readVcf(file)
processedVcf <- vcfCheck(
    vcf,
    proband = "NA19675", mother = "NA19678", father = "NA19679"
)

Uniparental disomy detection

The principal function of UPDhmm package, calculateEvents(), is the central function for identifying Uniparental Disomy (UPD) events. It takes the output from the previous vcfCheck() function and systematically analyzes genomic data, splitting the VCF into chromosomes and applying the Viterbi algorithm.

calculateEvents() function encompasses a serie of subprocesses for identifying Uniparental disomies (UPDs):

(1) Split vcf into chromosomes (2) Apply Viterbi algorithm to every chromosome (3) Transform the inferred hidden states to coordinates data.frame (4) Create blocks of contiguous variants with the same state (5) Calculate the statistics (log-likelihood and p-value).

updEvents <- calculateEvents(largeCollapsedVcf = processedVcf)
head(updEvents)
  seqnames     start       end   group n_snps log_likelihood      p_value
1        3 100374740 197574936 het_mat     47       75.21933 4.212224e-18
2        6  32489853  32490000 iso_mat      6       38.63453 5.110683e-10
3        6  32609207  32609341 iso_mat     11       95.23543 1.690378e-22
4       11  55322539  55587117 iso_mat      7       40.02082 2.512703e-10
5       14 105408955 105945891 het_fat     30       20.84986 4.967274e-06
6       15  22382655  23000272 iso_mat     12       48.33859 3.586255e-12
  n_mendelian_error
1                 2
2                 5
3                 7
4                 3
5                 2
6                 3

Results description

The calculateEvents function returns a data.frame containing all detected events in the provided trio. If no events are found, the function will return an empty data.frame.

Column name Description
seqnames Chromosome
start Start position of the block
end End position of the block
n_snps Number of variants within the event
group Predicted state
log_likelihood log likelihood ratio
p_value p-value
n_mendelian_error Count of Mendelian errors (genotypic combinations that are inconsistent with Mendelian inheritance principles)

Results Visualization

To visualize the results, the karyoploteR package can be used. Here, a custom function is provided to facilitate easy implementation with the output format results. This function enables the visualization of the detected blocks by the calculateEvents function.

In this example, we can observe how the package detects the simulated event on chromosome 3, as well as the specific type of event. In the plot, the autosomes chromosomes are represented, and we can see that the entire q arm of chromosome 3 is colored. With the legend, we can identify that the event is a maternal heterodisomy.

library(karyoploteR)
library(regioneR)
plotUPDKp <- function(updEvents) {
    updEvents$seqnames <- paste0("chr", updEvents$seqnames)
    het_fat <- toGRanges(subset(
        updEvents,
        group == "het_fat"
    )[, c("seqnames", "start", "end")])
    het_mat <- toGRanges(subset(
        updEvents,
        group == "het_mat"
    )[, c("seqnames", "start", "end")])
    iso_fat <- toGRanges(subset(
        updEvents,
        group == "iso_fat"
    )[, c("seqnames", "start", "end")])
    iso_mat <- toGRanges(subset(
        updEvents,
        group == "iso_mat"
    )[, c("seqnames", "start", "end")])

    kp <- plotKaryotype(genome = "hg19")
    kpPlotRegions(kp, het_fat, col = "#AAF593")
    kpPlotRegions(kp, het_mat, col = "#FFB6C1")
    kpPlotRegions(kp, iso_fat, col = "#A6E5FC")
    kpPlotRegions(kp, iso_mat, col = "#E9B864")

    colors <- c("#AAF593", "#FFB6C1", "#A6E5FC", "#E9B864")
    legend("topright",
        legend = c("Het_Fat", "Het_Mat", "Iso_Fat", "Iso_Mat"),
        fill = colors
    )
}

plotUPDKp(updEvents)

plot of chunk unnamed-chunk-1

Session Info

sessionInfo()
R Under development (unstable) (2024-10-21 r87258)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.1 LTS

Matrix products: default
BLAS:   /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB              LC_COLLATE=C              
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] karyoploteR_1.33.0          regioneR_1.39.0            
 [3] VariantAnnotation_1.53.0    Rsamtools_2.23.0           
 [5] Biostrings_2.75.0           XVector_0.47.0             
 [7] SummarizedExperiment_1.37.0 Biobase_2.67.0             
 [9] GenomicRanges_1.59.0        GenomeInfoDb_1.43.0        
[11] IRanges_2.41.0              S4Vectors_0.45.0           
[13] MatrixGenerics_1.19.0       matrixStats_1.4.1          
[15] BiocGenerics_0.53.0         dplyr_1.1.4                
[17] UPDhmm_1.3.0               

loaded via a namespace (and not attached):
 [1] DBI_1.2.3                bitops_1.0-9             gridExtra_2.3           
 [4] rlang_1.1.4              magrittr_2.0.3           biovizBase_1.55.0       
 [7] compiler_4.5.0           RSQLite_2.3.7            GenomicFeatures_1.59.0  
[10] png_0.1-8                vctrs_0.6.5              ProtGenerics_1.39.0     
[13] stringr_1.5.1            pkgconfig_2.0.3          crayon_1.5.3            
[16] fastmap_1.2.0            backports_1.5.0          utf8_1.2.4              
[19] rmarkdown_2.28           UCSC.utils_1.3.0         bit_4.5.0               
[22] xfun_0.48                zlibbioc_1.53.0          cachem_1.1.0            
[25] jsonlite_1.8.9           blob_1.2.4               highr_0.11              
[28] DelayedArray_0.33.0      BiocParallel_1.41.0      parallel_4.5.0          
[31] cluster_2.1.6            R6_2.5.1                 stringi_1.8.4           
[34] RColorBrewer_1.1-3       bezier_1.1.2             rtracklayer_1.67.0      
[37] rpart_4.1.23             Rcpp_1.0.13              knitr_1.48              
[40] base64enc_0.1-3          Matrix_1.7-1             nnet_7.3-19             
[43] tidyselect_1.2.1         rstudioapi_0.17.1        dichromat_2.0-0.1       
[46] abind_1.4-8              yaml_2.3.10              codetools_0.2-20        
[49] curl_5.2.3               lattice_0.22-6           tibble_3.2.1            
[52] KEGGREST_1.47.0          evaluate_1.0.1           foreign_0.8-87          
[55] pillar_1.9.0             BiocManager_1.30.25      checkmate_2.3.2         
[58] generics_0.1.3           RCurl_1.98-1.16          ensembldb_2.31.0        
[61] ggplot2_3.5.1            munsell_0.5.1            scales_1.3.0            
[64] BiocStyle_2.35.0         glue_1.8.0               lazyeval_0.2.2          
[67] Hmisc_5.2-0              tools_4.5.0              BiocIO_1.17.0           
[70] data.table_1.16.2        BSgenome_1.75.0          GenomicAlignments_1.43.0
[73] XML_3.99-0.17            grid_4.5.0               AnnotationDbi_1.69.0    
[76] colorspace_2.1-1         GenomeInfoDbData_1.2.13  htmlTable_2.4.3         
[79] restfulr_0.0.15          Formula_1.2-5            cli_3.6.3               
[82] HMM_1.0.1                fansi_1.0.6              S4Arrays_1.7.0          
[85] AnnotationFilter_1.31.0  gtable_0.3.6             digest_0.6.37           
[88] SparseArray_1.7.0        rjson_0.2.23             htmlwidgets_1.6.4       
[91] memoise_2.0.1            htmltools_0.5.8.1        lifecycle_1.0.4         
[94] httr_1.4.7               bit64_4.5.2              bamsignals_1.39.0       

References