1 Introduction

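The epimutacionsData package is a data repository for the epimutacions package. It provides two example datasets (a reference panel and a set of control and case samples), together with a set of candidate epimutation regions and raw IDAT files.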

2 Installation

The following code shows how to access the data through ExperimentHub:

library(ExperimentHub)
eh <- ExperimentHub()
query(eh, c("epimutacionsData"))
## ExperimentHub with 3 records
## # snapshotDate(): 2024-04-29
## # $dataprovider: GEO, Illumina 450k array
## # $species: Homo sapiens
## # $rdataclass: RGChannelSet, GenomicRatioSet, GRanges
## # additional mcols(): taxonomyid, genome, description,
## #   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## #   rdatapath, sourceurl, sourcetype 
## # retrieve records with, e.g., 'object[["EH6690"]]' 
## 
##            title                   
##   EH6690 | Control and case samples
##   EH6691 | Reference panel         
##   EH6692 | Candidate epimutations

2.1 Candidate epimutations

In the Illumina 450K array (Reproducibility 2012), probes are unequally distributed along the genome, which limits the number of regions that can fulfil the requirements to be considered an epimutation. We have therefore computed a dataset containing the regions that are candidates to become an epimutation.

To define the candidate epimutations, we relied on the clustering from bumphunter (Jaffe et al. 2012). We defined a primary dataset with all the CpGs from the Illumina 450K array, ran bumphunter, and selected the regions with at least 3 CpGs. This yielded 40408 candidate epimutations, which are available in the candRegsGR dataset.

candRegsGR <- eh[["EH6692"]]
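The selection rule described above (group neighbouring CpGs into clusters, then keep clusters with at least 3 CpGs) can be illustrated with a simplified base-R sketch. This is not the bumphunter implementation: the helper `cluster_by_gap()`, the toy positions, and the `max_gap` value of 1000 bp are assumptions made for illustration only, since the gap actually used to build candRegsGR is not stated here.

```r
# Toy CpG positions on two chromosomes (hypothetical values).
cpgs <- data.frame(
  chr = c("chr1", "chr1", "chr1", "chr1", "chr1", "chr2", "chr2"),
  pos = c(100, 350, 900, 5000, 5200, 100, 150)
)

# Assign a cluster id to each CpG: a new cluster starts when the
# chromosome changes or when the distance to the previous CpG
# exceeds max_gap (an assumed threshold, not the one used upstream).
cluster_by_gap <- function(chr, pos, max_gap = 1000) {
  ord <- order(chr, pos)
  chr_o <- chr[ord]
  pos_o <- pos[ord]
  new_cluster <- c(
    TRUE,
    chr_o[-1] != chr_o[-length(chr_o)] | diff(pos_o) > max_gap
  )
  ids <- integer(length(pos))
  ids[ord] <- cumsum(new_cluster)
  ids
}

cpgs$cluster <- cluster_by_gap(cpgs$chr, cpgs$pos)

# Keep only the clusters that contain at least 3 CpGs.
n_cpgs <- table(cpgs$cluster)
keep <- names(n_cpgs)[n_cpgs >= 3]
candidates <- cpgs[cpgs$cluster %in% keep, ]

# Summarise each retained cluster as one region (chr, start, end, n CpGs).
regions <- do.call(rbind, lapply(
  split(candidates, candidates$cluster),
  function(x) {
    data.frame(chr = x$chr[1], start = min(x$pos),
               end = max(x$pos), n = nrow(x))
  }
))
regions
```

In this toy input only the first three chr1 probes lie close enough together and are numerous enough to form a candidate region; the two remaining clusters contain 2 CpGs each and are dropped.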

2.2 Example datasets

2.2.1 Reference panel

The package includes a reference panel (reference_panel) of class RGChannelSet, which contains 22 whole cord blood samples from healthy children born via caesarean section, taken from the GSE127824 cohort (Gervin et al. 2019).

The reference panel is stored in the EH6691 record of the eh object:

reference_panel <- eh[["EH6691"]]

2.2.2 Control and case samples

The methy dataset contains DNA methylation profiles of 51 whole blood samples: 48 controls from the GSE104812 cohort (Shi et al. 2018) and 3 cases from the GSE97362 cohort (Butcher et al. 2017). It is a GenomicRatioSet class object.

methy <- eh[["EH6690"]]

2.3 IDAT files

The IDAT files contain the raw microarray intensities of 4 case samples from the GSE131350 cohort. The files are located in the external data (extdata) directory of the epimutacionsData package:

library(minfi)
baseDir <- system.file("extdata", package = "epimutacionsData")
targets <- read.metharray.sheet(baseDir)
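Once the sample sheet is loaded, the intensities themselves would typically be imported with minfi's `read.metharray.exp()`. Before doing so, it can be useful to confirm that every sample has both channel files. The helper below is a hypothetical convenience function, not part of minfi or epimutacionsData; it assumes the sample sheet yields a `Basename` column, as `read.metharray.sheet()` normally produces, and that files follow the usual `_Grn.idat` / `_Red.idat` naming (optionally gzipped).

```r
# Hypothetical helper: for each sample basename, check that both the
# green and the red channel IDAT files exist (plain or .gz).
idat_pairs_present <- function(basenames) {
  has_file <- function(path) {
    file.exists(path) | file.exists(paste0(path, ".gz"))
  }
  grn <- has_file(paste0(basenames, "_Grn.idat"))
  red <- has_file(paste0(basenames, "_Red.idat"))
  stats::setNames(grn & red, basename(basenames))
}

# With minfi and epimutacionsData installed, the check and the import
# could then be run as follows (left commented out in this sketch):
# stopifnot(all(idat_pairs_present(targets$Basename)))
# RGset <- read.metharray.exp(targets = targets)
```

The result of `read.metharray.exp()` is an RGChannelSet, the same class as the reference panel described above.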

3 sessionInfo()

sessionInfo()
## R version 4.4.0 beta (2024-04-15 r86425)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] minfi_1.50.0                bumphunter_1.46.0          
##  [3] locfit_1.5-9.9              iterators_1.0.14           
##  [5] foreach_1.5.2               Biostrings_2.72.0          
##  [7] XVector_0.44.0              SummarizedExperiment_1.34.0
##  [9] Biobase_2.64.0              MatrixGenerics_1.16.0      
## [11] matrixStats_1.3.0           GenomicRanges_1.56.0       
## [13] GenomeInfoDb_1.40.0         IRanges_2.38.0             
## [15] S4Vectors_0.42.0            epimutacionsData_1.8.0     
## [17] ExperimentHub_2.12.0        AnnotationHub_3.12.0       
## [19] BiocFileCache_2.12.0        dbplyr_2.5.0               
## [21] BiocGenerics_0.50.0         BiocStyle_2.32.0           
## 
## loaded via a namespace (and not attached):
##   [1] RColorBrewer_1.1-3        jsonlite_1.8.8           
##   [3] magrittr_2.0.3            GenomicFeatures_1.56.0   
##   [5] rmarkdown_2.26            BiocIO_1.14.0            
##   [7] zlibbioc_1.50.0           vctrs_0.6.5              
##   [9] multtest_2.60.0           memoise_2.0.1            
##  [11] Rsamtools_2.20.0          DelayedMatrixStats_1.26.0
##  [13] RCurl_1.98-1.14           askpass_1.2.0            
##  [15] htmltools_0.5.8.1         S4Arrays_1.4.0           
##  [17] curl_5.2.1                Rhdf5lib_1.26.0          
##  [19] SparseArray_1.4.0         rhdf5_2.48.0             
##  [21] sass_0.4.9                nor1mix_1.3-3            
##  [23] bslib_0.7.0               plyr_1.8.9               
##  [25] cachem_1.0.8              GenomicAlignments_1.40.0 
##  [27] mime_0.12                 lifecycle_1.0.4          
##  [29] pkgconfig_2.0.3           Matrix_1.7-0             
##  [31] R6_2.5.1                  fastmap_1.1.1            
##  [33] GenomeInfoDbData_1.2.12   digest_0.6.35            
##  [35] siggenes_1.78.0           reshape_0.8.9            
##  [37] AnnotationDbi_1.66.0      RSQLite_2.3.6            
##  [39] base64_2.0.1              filelock_1.0.3           
##  [41] fansi_1.0.6               httr_1.4.7               
##  [43] abind_1.4-5               compiler_4.4.0           
##  [45] beanplot_1.3.1            rngtools_1.5.2           
##  [47] bit64_4.0.5               withr_3.0.0              
##  [49] BiocParallel_1.38.0       DBI_1.2.2                
##  [51] HDF5Array_1.32.0          MASS_7.3-60.2            
##  [53] openssl_2.1.2             rappdirs_0.3.3           
##  [55] DelayedArray_0.30.0       rjson_0.2.21             
##  [57] tools_4.4.0               quadprog_1.5-8           
##  [59] glue_1.7.0                restfulr_0.0.15          
##  [61] nlme_3.1-164              rhdf5filters_1.16.0      
##  [63] grid_4.4.0                generics_0.1.3           
##  [65] tzdb_0.4.0                preprocessCore_1.66.0    
##  [67] tidyr_1.3.1               hms_1.1.3                
##  [69] data.table_1.15.4         xml2_1.3.6               
##  [71] utf8_1.2.4                BiocVersion_3.19.1       
##  [73] pillar_1.9.0              limma_3.60.0             
##  [75] genefilter_1.86.0         splines_4.4.0            
##  [77] dplyr_1.1.4               lattice_0.22-6           
##  [79] survival_3.6-4            rtracklayer_1.64.0       
##  [81] bit_4.0.5                 GEOquery_2.72.0          
##  [83] annotate_1.82.0           tidyselect_1.2.1         
##  [85] knitr_1.46                bookdown_0.39            
##  [87] xfun_0.43                 scrime_1.3.5             
##  [89] statmod_1.5.0             UCSC.utils_1.0.0         
##  [91] yaml_2.3.8                evaluate_0.23            
##  [93] codetools_0.2-20          tibble_3.2.1             
##  [95] BiocManager_1.30.22       cli_3.6.2                
##  [97] xtable_1.8-4              jquerylib_0.1.4          
##  [99] Rcpp_1.0.12               png_0.1-8                
## [101] XML_3.99-0.16.1           readr_2.1.5              
## [103] blob_1.2.4                mclust_6.1.1             
## [105] doRNG_1.8.6               sparseMatrixStats_1.16.0 
## [107] bitops_1.0-7              illuminaio_0.46.0        
## [109] purrr_1.0.2               crayon_1.5.2             
## [111] rlang_1.1.3               KEGGREST_1.44.0

References

Butcher, Darci T, Cheryl Cytrynbaum, Andrei L Turinsky, Michelle T Siu, Michal Inbar-Feigenberg, Roberto Mendoza-Londono, David Chitayat, et al. 2017. “CHARGE and Kabuki Syndromes: Gene-Specific DNA Methylation Signatures Identify Epigenetic Mechanisms Linking These Clinically Overlapping Conditions.” The American Journal of Human Genetics 100 (5): 773–88.

Gervin, Kristina, Lucas A Salas, Kelly M Bakulski, Menno C Van Zelm, Devin C Koestler, John K Wiencke, Liesbeth Duijts, et al. 2019. “Systematic Evaluation and Validation of Reference and Library Selection Methods for Deconvolution of Cord Blood DNA Methylation Data.” Clinical Epigenetics 11 (1): 1–15.

Jaffe, Andrew E, Peter Murakami, Hwajin Lee, Jeffrey T Leek, M Daniele Fallin, Andrew P Feinberg, and Rafael A Irizarry. 2012. “Bump Hunting to Identify Differentially Methylated Regions in Epigenetic Epidemiology Studies.” International Journal of Epidemiology 41 (1): 200–209.

Reproducibility, Unrivaled Assay. 2012. “Infinium HumanMethylation450 BeadChip.”

Shi, Lei, Fan Jiang, Fengxiu Ouyang, Jun Zhang, Zhimin Wang, and Xiaoming Shen. 2018. “DNA Methylation Markers in Combination with Skeletal and Dental Ages to Improve Age Estimation in Children.” Forensic Science International: Genetics 33: 1–9.