There are four testing scenarios, depending on whether the query set and the database sets are continuous or discrete. The table below shows which test applies in each scenario. `testEnrichment` and `testEnrichmentSEA` perform Fisher’s exact test and Set Enrichment Analysis, respectively.
| | Continuous Database Set | Discrete Database Set |
|---|---|---|
| Continuous Query | Correlation-based | Set Enrichment Analysis |
| Discrete Query | Set Enrichment Analysis | Fisher’s Exact Test |
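The mapping in the table can be sketched as a small lookup in base R. This is only an illustration: `chooseScenario` is a hypothetical helper, not a knowYourCG function, and it assumes that a continuous input is a numeric vector while a discrete input is a character vector of probe IDs.

```r
# Hypothetical helper (not part of knowYourCG): returns the table's test name
# for a given query/database pair.
# Assumption: continuous inputs are numeric vectors, discrete inputs are
# character vectors of probe IDs.
chooseScenario <- function(query, database) {
    q_cont <- is.numeric(query)
    d_cont <- is.numeric(database)
    if (q_cont && d_cont) {
        "Correlation-based"
    } else if (q_cont || d_cont) {
        "Set Enrichment Analysis"
    } else {
        "Fisher's Exact Test"
    }
}

# Toy inputs; the probe IDs are made up for illustration.
discrete_set <- c("cg01", "cg02", "cg03")
continuous_set <- c(cg01 = 0.2, cg02 = 0.7, cg03 = 0.9)

scenario_cc <- chooseScenario(continuous_set, continuous_set)
scenario_cd <- chooseScenario(continuous_set, discrete_set)
scenario_dc <- chooseScenario(discrete_set, continuous_set)
scenario_dd <- chooseScenario(discrete_set, discrete_set)
```

Both mixed cases map to the same test, which is why the table is symmetric around its diagonal.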
The query may be a named continuous vector. In that case, either a gene enrichment score will be calculated (if the database is discrete) or a Spearman correlation will be calculated (if the database is continuous as well). The three other cases are shown below using biologically relevant examples.
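For the continuous-versus-continuous case, the idea can be illustrated in base R by aligning two named vectors on their shared names and computing a Spearman correlation. This is a minimal sketch under stated assumptions, not the package's internal implementation; the probe IDs and values are toy data made up for illustration.

```r
# Toy named vectors; probe IDs and values are made up for illustration.
query <- c(cg01 = 0.10, cg02 = 0.35, cg03 = 0.50, cg04 = 0.80, cg05 = 0.95)
database <- c(cg02 = 12, cg03 = 9, cg04 = 4, cg05 = 1, cg06 = 7)

# Only probes present in both vectors can be compared.
shared <- intersect(names(query), names(database))

# Spearman correlation is rank-based, so it is insensitive to the scale
# of either vector.
fit <- cor.test(query[shared], database[shared], method = "spearman")
rho <- unname(fit$estimate)
```

In this toy example the query increases while the database values decrease over the shared probes, so `rho` is negative.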
To display this functionality, let’s load two numeric database sets individually. One is a database set for CpG density and the other is a database set corresponding to the distance of the nearest transcriptional start site (TSS) to each probe.
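library(knowYourCG)
library(sesameData)
# Assumption: the query is the set of designed TSS probes from the designGroup database
query <- getDBs("KYCG.MM285.designGroup")[["TSS"]]
sesameDataCache(data_titles = c("KYCG.MM285.seqContextN.20210630"))
res <- testEnrichmentSEA(query, "MM285.seqContextN")
main_stats <- c("dbname", "test", "estimate", "FDR", "nQ", "nD", "overlap")
res[, main_stats]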
The estimate here is the enrichment score.
NOTE: A negative enrichment score suggests enrichment of the categorical database with the higher values (in the numerical database). A positive enrichment score represents enrichment with the smaller values. As expected, the designed TSS CpGs are significantly enriched in smaller TSS distance and higher CpG density.
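The sign convention in the note can be illustrated with a simplified, unweighted running-sum (Kolmogorov-Smirnov style) score in base R. This is a sketch under stated assumptions and not the computation performed by `testEnrichmentSEA`: `enrichmentScoreSketch` is a hypothetical helper, the data are toy values, and the sign is chosen so that it matches the note (positive for sets concentrated at small values, negative for sets concentrated at large values).

```r
# Hypothetical helper (illustration only, not the package's algorithm).
# values:  named numeric vector (the numerical database)
# members: character vector of names belonging to the categorical set
enrichmentScoreSketch <- function(values, members) {
    ord <- names(sort(values))          # names ordered by increasing value
    in_set <- ord %in% members
    n_in <- sum(in_set)
    n_out <- length(ord) - n_in
    stopifnot(n_in > 0, n_out > 0)
    # Step up at set members and down at non-members, so the walk ends near 0.
    steps <- ifelse(in_set, 1 / n_in, -1 / n_out)
    running <- cumsum(steps)
    # Signed maximum deviation of the running sum from zero.
    running[which.max(abs(running))]
}

# Toy data: 20 made-up probes with values 1..20.
values <- setNames(seq_len(20), paste0("cg", seq_len(20)))
low_set <- paste0("cg", 1:5)       # members have the smallest values
high_set <- paste0("cg", 16:20)    # members have the largest values

es_low <- enrichmentScoreSketch(values, low_set)
es_high <- enrichmentScoreSketch(values, high_set)
```

With this convention, `es_low` is positive because the running sum rises early, and `es_high` is negative because the running sum falls before any set member is reached.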
Alternatively, one can test the enrichment of a continuous query against discrete databases. Here we use the methylation levels from a sample as the query and test them against the chromHMM chromatin states.
library(sesame)
library(knowYourCG)
sesameDataCache(data_titles = c("MM285.1.SigDF"))
# Named vector of beta values (DNA methylation levels) for one sample
beta_values <- getBetas(sesameDataGet("MM285.1.SigDF"))
res <- testEnrichmentSEA(beta_values, "MM285.chromHMM")
main_stats <- c("dbname", "test", "estimate", "FDR", "nQ", "nD", "overlap")
res[, main_stats]
As expected, the chromatin states `Tss` and `Enh` have negative enrichment scores, meaning these databases are associated with small values of the query (DNA methylation level). In contrast, the `Het` and `Quies` states are associated with high methylation levels.
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] sesame_1.24.0 knitr_1.49
## [3] gprofiler2_0.2.3 SummarizedExperiment_1.36.0
## [5] Biobase_2.66.0 GenomicRanges_1.58.0
## [7] GenomeInfoDb_1.42.1 IRanges_2.40.1
## [9] S4Vectors_0.44.0 MatrixGenerics_1.18.0
## [11] matrixStats_1.4.1 sesameData_1.24.0
## [13] ExperimentHub_2.14.0 AnnotationHub_3.14.0
## [15] BiocFileCache_2.14.0 dbplyr_2.5.0
## [17] BiocGenerics_0.52.0 knowYourCG_1.2.5
##
## loaded via a namespace (and not attached):
## [1] DBI_1.2.3 bitops_1.0-9 rlang_1.1.4
## [4] magrittr_2.0.3 compiler_4.4.2 RSQLite_2.3.9
## [7] png_0.1-8 vctrs_0.6.5 reshape2_1.4.4
## [10] stringr_1.5.1 pkgconfig_2.0.3 crayon_1.5.3
## [13] fastmap_1.2.0 XVector_0.46.0 fontawesome_0.5.3
## [16] rmarkdown_2.29 tzdb_0.4.0 UCSC.utils_1.2.0
## [19] preprocessCore_1.68.0 purrr_1.0.2 bit_4.5.0.1
## [22] xfun_0.49 zlibbioc_1.52.0 cachem_1.1.0
## [25] jsonlite_1.8.9 blob_1.2.4 DelayedArray_0.32.0
## [28] BiocParallel_1.40.0 parallel_4.4.2 R6_2.5.1
## [31] bslib_0.8.0 stringi_1.8.4 RColorBrewer_1.1-3
## [34] jquerylib_0.1.4 Rcpp_1.0.13-1 wheatmap_0.2.0
## [37] readr_2.1.5 Matrix_1.7-1 tidyselect_1.2.1
## [40] abind_1.4-8 yaml_2.3.10 codetools_0.2-20
## [43] curl_6.0.1 lattice_0.22-6 tibble_3.2.1
## [46] plyr_1.8.9 withr_3.0.2 KEGGREST_1.46.0
## [49] evaluate_1.0.1 Biostrings_2.74.1 pillar_1.10.0
## [52] BiocManager_1.30.25 filelock_1.0.3 plotly_4.10.4
## [55] generics_0.1.3 RCurl_1.98-1.16 BiocVersion_3.20.0
## [58] hms_1.1.3 ggplot2_3.5.1 munsell_0.5.1
## [61] scales_1.3.0 glue_1.8.0 lazyeval_0.2.2
## [64] tools_4.4.2 data.table_1.16.4 grid_4.4.2
## [67] tidyr_1.3.1 AnnotationDbi_1.68.0 colorspace_2.1-1
## [70] GenomeInfoDbData_1.2.13 cli_3.6.3 rappdirs_0.3.3
## [73] S4Arrays_1.6.0 viridisLite_0.4.2 dplyr_1.1.4
## [76] gtable_0.3.6 sass_0.4.9 digest_0.6.37
## [79] SparseArray_1.6.0 ggrepel_0.9.6 htmlwidgets_1.6.4
## [82] memoise_2.0.1 htmltools_0.5.8.1 lifecycle_1.0.4
## [85] httr_1.4.7 bit64_4.5.2