1 Background

Several tools display gene expression data in a click-and-plot system, most notably iSEE; more packages are listed in Related Tools. Geyser differs from iSEE and the others in that it deliberately does little beyond showing expression data. The advantage is that the number of dials and options is substantially reduced and it is, I hope, very simple to operate.

2 Install

The following installs geyser from Bioconductor (at the time of writing the package was not yet available there):

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("geyser")

3 Development Version

The latest development version can be installed from GitHub:

if (!requireNamespace("remotes", quietly=TRUE))
    install.packages("remotes")
remotes::install_github("davemcg/geyser")

4 Load Test Data

library(geyser)
load(system.file('extdata/tiny_rse.Rdata', package = 'geyser'))

5 Run

Running geyser is as simple as passing the SummarizedExperiment object to the geyser function.

if (interactive()){
  geyser(tiny_rse)
}

6 Screenshots of some core views

The Shiny-based GUI first shows you the metadata (colData slot) of the SummarizedExperiment (SE) object in a reactive DT data table.

This helps you identify the relevant fields to plot against (here, tissue and disease).
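
The same screening can be roughed out in plain R before opening the app. This is a minimal sketch, assuming a hypothetical metadata table `meta`; for a real object the table would come from `as.data.frame(colData(se))`. Columns with more than one distinct value, but fewer distinct values than samples, are usually the ones worth grouping by.

```r
# Hypothetical sample metadata standing in for the colData of an SE object.
meta <- data.frame(
  sample  = paste0("S", 1:6),
  tissue  = rep(c("A", "B"), each = 3),
  disease = rep(c("AMD", "Normal"), times = 3)
)

# Count the distinct values in each metadata column.
n_levels <- vapply(meta, function(col) length(unique(col)), integer(1))

# Keep columns that actually split the samples into groups:
# more than one value, but not a unique value per sample.
candidates <- names(n_levels)[n_levels > 1 & n_levels < nrow(meta)]
print(candidates)
```

Here `sample` is dropped because every value is unique, which would put each sample in its own group.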

You can then click over to the Plotting section of the app and start typing in those fields (tissue and disease) in the “Sample Grouping(s)” box. The text will auto-complete as you type.

After that you can type in the genes you are interested in. Again, the genes will auto-complete as you type.

When you click the orange “Draw Box Plot” button, the plot is drawn.

You can custom filter which samples are shown by clicking on the “triangle” next to “Sample Filtering” and then selecting the samples you want to display. Here we select only the normal samples and then use the Heatmap visualization (you can swap between Box Plot and Heatmap by clicking between the tabs).

If you want to reset the custom sample filtering, just click the “Clear Rows” button.

The plots can be “outputted” by either right-clicking or by

7 How to use recount3 to display a pre-processed dataset

We also lightly tweak the metadata to create human-usable sample groupings.

# If needed: BiocManager::install("recount3")
if (interactive()){
  library(recount3)
  library(geyser)
  human_projects <- available_projects()
  proj_info <- subset( 
    human_projects,
    project == "SRP107937" & project_type == "data_sources" 
  )
  rse_SRP107937 <- create_rse(proj_info)
  assay(rse_SRP107937, "counts") <- transform_counts(rse_SRP107937)
  # first tweak that glues the gene name onto the gene id in the row names
  rownames(rse_SRP107937) <- paste0(rowData(rse_SRP107937)$gene_name, ' (', row.names(rse_SRP107937), ')')
  # creates two new metadata fields by extracting labels from the sample titles
  # (stringr is called directly, so the %>% pipe does not need to be attached)
  colData(rse_SRP107937)$tissue <- stringr::str_extract(colData(rse_SRP107937)$sra.sample_title, 'PRC|PR')
  colData(rse_SRP107937)$disease <- stringr::str_extract(colData(rse_SRP107937)$sra.sample_title, 'AMD|Normal')
  
  geyser(rse_SRP107937, " geyser: SRP107937")
}
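
Pattern extraction of this kind silently yields `NA` for any sample title that does not contain one of the expected labels, so it is worth checking the result before plotting. The sketch below uses base R only; the sample titles and the `extract_first()` helper are hypothetical, written for illustration, and are not part of geyser or recount3.

```r
# Hypothetical sample titles; real ones come from colData(rse)$sra.sample_title.
sample_titles <- c("AMD_PRC_1", "Normal_PR_2", "Control_3")

# Hypothetical helper: return the first match of `pattern` in each string,
# or NA when there is no match.
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x)
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)
  out
}

tissue  <- extract_first(sample_titles, "PRC|PR")
disease <- extract_first(sample_titles, "AMD|Normal")

# Samples with NA in either field did not match and need a closer look.
unmatched <- sample_titles[is.na(tissue) | is.na(disease)]
print(unmatched)
```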

8 How to turn a count matrix into a SummarizedExperiment

The key step is to build metadata that matches the counts: each row of the metadata must correspond to one column of the count matrix.

library(SummarizedExperiment)
counts <- matrix(runif(10 * 6, 1, 1e4), 10)
row.names(counts) <- paste0('gene', seq(1,10))
colnames(counts) <- LETTERS[1:6]

sample_info <- data.frame(Condition = c(rep("Unicorn", 3),
                                        rep("Horse", 3)),
                          row.names = LETTERS[1:6])

se_object <- SummarizedExperiment(assays=list(counts = counts), 
                                  colData = sample_info)

if (interactive()){
  geyser(se_object, "Magical Creatures")
}
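
With real data the metadata often arrives in a different order than the matrix columns. The matching requirement described above can be checked, and the rows reordered, before the object is built. This is a minimal base R sketch; the `toy_counts` and `toy_info` objects are hypothetical and exist only to illustrate the check.

```r
# Hypothetical inputs: the metadata rows are in reverse order
# relative to the columns of the count matrix.
toy_counts <- matrix(seq_len(12), nrow = 2,
                     dimnames = list(c("gene1", "gene2"), LETTERS[1:6]))
toy_info <- data.frame(
  Condition = rep(c("Horse", "Unicorn"), each = 3),
  row.names = rev(LETTERS[1:6])
)

# Every sample must be present on both sides before reordering.
stopifnot(setequal(colnames(toy_counts), rownames(toy_info)))

# Reorder the metadata rows so they follow the matrix columns.
toy_info <- toy_info[colnames(toy_counts), , drop = FALSE]

# After reordering, row i of the metadata describes column i of the counts.
stopifnot(identical(rownames(toy_info), colnames(toy_counts)))
```

`drop = FALSE` keeps `toy_info` a data frame even though it has a single column.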

9 How to turn a DESeqDataSet into a SummarizedExperiment

This example is taken from the DESeq2 guide. The DESeqDataSet class extends RangedSummarizedExperiment, so a DESeqDataSet can be passed to geyser directly without conversion.

library(DESeq2)
library(airway)
data(airway)
ddsSE <- DESeqDataSet(airway, design = ~ cell + dex)

if (interactive()){
  geyser(ddsSE, "DESeq Airway Example")
}

11 Session Info

sessionInfo()
#> R Under development (unstable) (2024-10-21 r87258)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] airway_1.27.0               DESeq2_1.47.1              
#>  [3] SummarizedExperiment_1.37.0 Biobase_2.67.0             
#>  [5] GenomicRanges_1.59.1        GenomeInfoDb_1.43.2        
#>  [7] IRanges_2.41.2              S4Vectors_0.45.2           
#>  [9] BiocGenerics_0.53.3         generics_0.1.3             
#> [11] MatrixGenerics_1.19.1       matrixStats_1.5.0          
#> [13] geyser_0.99.8               BiocStyle_2.35.0           
#> 
#> loaded via a namespace (and not attached):
#>  [1] beeswarm_0.4.0          shape_1.4.6.1           circlize_0.4.16        
#>  [4] gtable_0.3.6            rjson_0.2.23            xfun_0.50              
#>  [7] bslib_0.8.0             ggplot2_3.5.1           GlobalOptions_0.1.2    
#> [10] lattice_0.22-6          vctrs_0.6.5             tools_4.5.0            
#> [13] parallel_4.5.0          tibble_3.2.1            cluster_2.1.8          
#> [16] pkgconfig_2.0.3         Matrix_1.7-1            RColorBrewer_1.1-3     
#> [19] lifecycle_1.0.4         GenomeInfoDbData_1.2.13 compiler_4.5.0         
#> [22] munsell_0.5.1           codetools_0.2-20        ComplexHeatmap_2.23.0  
#> [25] clue_0.3-66             vipor_0.4.7             httpuv_1.6.15          
#> [28] htmltools_0.5.8.1       sass_0.4.9              yaml_2.3.10            
#> [31] tidyr_1.3.1             pillar_1.10.1           later_1.4.1            
#> [34] crayon_1.5.3            jquerylib_0.1.4         BiocParallel_1.41.0    
#> [37] DelayedArray_0.33.3     cachem_1.1.0            iterators_1.0.14       
#> [40] abind_1.4-8             foreach_1.5.2           mime_0.12              
#> [43] locfit_1.5-9.10         tidyselect_1.2.1        digest_0.6.37          
#> [46] purrr_1.0.2             dplyr_1.1.4             bookdown_0.42          
#> [49] fastmap_1.2.0           grid_4.5.0              colorspace_2.1-1       
#> [52] cli_3.6.3               SparseArray_1.7.2       magrittr_2.0.3         
#> [55] S4Arrays_1.7.1          scales_1.3.0            UCSC.utils_1.3.0       
#> [58] promises_1.3.2          ggbeeswarm_0.7.2        rmarkdown_2.29         
#> [61] XVector_0.47.2          httr_1.4.7              GetoptLong_1.0.5       
#> [64] png_0.1-8               shiny_1.10.0            evaluate_1.0.3         
#> [67] knitr_1.49              doParallel_1.0.17       rlang_1.1.4            
#> [70] Rcpp_1.0.14             xtable_1.8-4            glue_1.8.0             
#> [73] BiocManager_1.30.25     jsonlite_1.8.9          R6_2.5.1