1Institute for Molecular Life Sciences, University of Zurich, Zurich, Switzerland 2Department of Health Sciences and Technology, ETH Zurich, Zurich, Switzerland
This package provides multiple approaches for comparing two partitions of the same dataset (a partition is a way of organizing the data points of a dataset into distinct, non-overlapping, and non-empty subsets; a clustering, for example, is a partition), and for evaluating the alignment between a dataset’s embedding/graph representations and its partition.
In addition, the package offers methods for comparing two fuzzy partitions, as well as for comparing a hard partition with a fuzzy partition (in “hard” partitions, each data point belongs to one and only one subset; clustering can, however, also generate fuzzy partitions, in which data points belong to multiple subsets with varying degrees, or probabilities, of membership). This allows fuzzy partitioning results to be evaluated by assessing their agreement with a fuzzy or a hard ground-truth partition.
Finally, the package implements visualization and evaluation metrics tailored for domain detection in spatially-resolved -omics data.
These are mainly external evaluation metrics (i.e. based on a comparison to ground-truth labels), but internal metrics are also included. For a detailed description of how to work with SpatialExperiment objects, we refer to another vignette of poem.
1.2 Main functions
The package poem includes many metrics for performing different kinds of evaluations, and these metrics can be retrieved via 6 main wrapper functions. Unless specified otherwise, “partition” means “hard” partition. They are:
getEmbeddingMetrics(): Metrics to compare an embedding of data points to a partition of these data points.
getGraphMetrics(): Metrics to compare a graph (e.g. kNN/sNN) to a partition, where nodes in the graph are data points in the partition.
getPartitionMetrics(): Metrics to compare two partitions of the same dataset.
getFuzzyPartitionMetrics(): Metrics to compare two fuzzy partitions, or to compare a fuzzy and a hard partition of the same dataset.
getSpatialExternalMetrics(): External metrics for evaluating spatial clustering results in a spatial-aware fashion. For non-spatial-aware evaluation, one can directly use getPartitionMetrics().
getSpatialInternalMetrics(): Internal metrics for evaluating spatial clustering results in a spatial-aware fashion.
There are 3 different levels at which one can perform the above-mentioned evaluations: element-level, class-level, and dataset-level. Element-level evaluation reports metric values for each data point; class-level evaluation reports metrics for each class (in this vignette, classes refer to groups in the ground-truth partition) or cluster (clusters refer to groups in the predicted partition); and dataset-level evaluation returns a single metric value for the whole dataset.
The following table lists the available metrics at each evaluation level, along with the main functions used to retrieve them.
data(metric_info)
DT::datatable(metric_info)
2 Getting started
2.1 Example data
To showcase the main functions, we will use some simulated datasets as examples in this vignette.
The two datasets, g1 and g2, each contain 80 data points with x and y coordinates, belonging to 4 different classes.
Let’s assume g1 and g2 contain two different embeddings of the same set of objects. A “good” embedding should place objects of the same class close together, and objects of different classes apart.
Since we know the ground-truth class of each object, one can evaluate this “goodness” of an embedding by calculating embedding evaluation metrics.
One can calculate such metrics element-wise, for each class/cluster, or for the whole dataset.
3.1 Element-level evaluation
For example, at the element level, one can calculate the Silhouette Width by specifying level="element" and metrics=c("SW"):
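A call of this kind might look like the following sketch, applied to the g1 embedding described above (the argument names `x` and `labels` are assumptions about the wrapper’s interface; see `?getEmbeddingMetrics` for the actual one):

```r
# Sketch: element-level Silhouette Widths for the g1 embedding.
# Each data point receives its own SW value, which can be appended
# to the data frame, e.g. for plotting.
sw1 <- getEmbeddingMetrics(x=g1[,c("x","y")], labels=g1$class,
                           level="element", metrics=c("SW"))
head(sw1)
```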
One can also evaluate at the class level by specifying level="class". Check ?getEmbeddingMetrics to see which metrics are allowed at the class level. For example:
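A class-level call might be sketched as follows (argument and metric names here are illustrative assumptions; consult `?getEmbeddingMetrics` for the supported set):

```r
# Sketch: class-level evaluation of the g1 embedding, returning one
# row of metric values per ground-truth class.
getEmbeddingMetrics(x=g1[,c("x","y")], labels=g1$class,
                    level="class", metrics=c("meanSW"))
```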
Instead of directly using the distances or densities in the embedding space for evaluation, one may want to evaluate from a connectivity standpoint by looking at the graph structure constructed from the above datasets. getGraphMetrics() can construct a k-nearest neighbor (kNN) or shared nearest neighbor (sNN) graph from an embedding and then apply graph-based evaluation metrics.
# Some functions for plotting
plotGraphs <- function(d, k=7){
  gn <- dplyr::bind_rows(lapply(split(d[,-1], d$graph), FUN=function(d1){
    nn <- emb2knn(as.matrix(d1[,c("x","y")]), k=k)
    g <- poem:::.nn2graph(nn, labels=d1$class)
    ggnetwork(g, layout=as.matrix(d1[,seq_len(2)]), scale=FALSE)
  }), .id="graph")
  ggplot(gn, aes(x = x, y = y, xend = xend, yend = yend)) + theme_blank() +
    theme(legend.position = "right") + geom_edges(alpha=0.5, colour="grey") +
    geom_nodes(aes(colour=class, shape=class), size=2) +
    facet_wrap(~graph, nrow=1)
}
For our examples g1 and g2, the constructed graphs will look like:
plotGraphs(bind_rows(list(g1,g2), .id="graph"))
See ?getGraphMetrics for optional arguments controlling kNN/sNN graph construction.
Similarly, level can be "element", "class" or "dataset".
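As a sketch (the argument names follow the embedding example above and are assumptions; see `?getGraphMetrics` for the actual interface):

```r
# Sketch: class-level graph metrics computed from the g1 embedding,
# using the same neighborhood size as the plotted graphs (k is assumed
# to be an accepted argument controlling graph construction).
getGraphMetrics(x=g1[,c("x","y")], labels=g1$class,
                level="class", k=7)
```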
Alternatively, getGraphMetrics() can take an igraph object as x, which enables applying the evaluation metrics to a general graph, or a list of nearest neighbors as x, to speed up the computation for large datasets.
5 Partition evaluation
We construct sNN graphs from the g1 and g2 embeddings, and then apply the Louvain algorithm to obtain partitions from them.
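This step might be sketched as follows, assuming an `emb2snn()` helper analogous to the `emb2knn()` used above (an assumption; check the package exports) and using igraph’s Louvain implementation:

```r
library(igraph)
# Sketch: build an sNN graph from the g1 embedding and cluster it with
# Louvain. emb2snn() is an assumed counterpart of emb2knn(); if it
# returns an igraph object, the memberships can be stored directly.
snn1 <- emb2snn(as.matrix(g1[,c("x","y")]), k=7)
g1$cluster <- factor(membership(cluster_louvain(snn1)))
```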
We then compare the predictions with the known labels using the partition metrics:
# for g1
getPartitionMetrics(true=g1$class, pred=g1$cluster, level="dataset",
metrics = c("RI", "WC", "WH", "ARI", "AWC", "AWH",
"FM", "AMI"))
## RI WC WH ARI AWC AWH FM AMI
## 1 0.9753165 0.95 0.9475066 0.9324949 0.9341118 0.9308836 0.9749844 0.9242305
# for g2
getPartitionMetrics(true=g2$class, pred=g2$cluster, level="dataset",
metrics = c("RI", "WC", "WH", "ARI", "AWC", "AWH",
"FM", "AMI"))
## RI WC WH ARI AWC AWH FM AMI
## 1 0.721519 0.95 0.4616368 0.4400954 0.9010025 0.2911552 0.6501669 0.4193846
Note that for class-level metrics, some are reported per class, while others (specifically, "WH" and "AWH") are reported per cluster.
getPartitionMetrics(true=g1$class, pred=g2$cluster, level="class")
## WC AWC FM class WH AWH cluster
## 1 0.9 0.802005 0.6551724 class1 NA NA <NA>
## 2 0.9 0.802005 0.6551724 class2 NA NA <NA>
## 3 1.0 1.000000 0.6451613 class3 NA NA <NA>
## 4 1.0 1.000000 0.6451613 class4 NA NA <NA>
## 5 NA NA NA <NA> 0.4864865 0.3238739 1
## 6 NA NA NA <NA> 0.4413473 0.2644406 2
6 Fuzzy partition evaluation
For comparing two fuzzy partitions, or comparing a fuzzy partition to a hard partition, one can use getFuzzyPartitionMetrics().
The fuzzy representation of a partition should look like the following, where each row is a data point and the values are its membership degrees to each class. Each row sums to 1.
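For instance, a fuzzy truth for nine data points across three classes could be encoded as a row-stochastic matrix like the following (the values are illustrative, not the vignette’s actual `fuzzyTrue`):

```r
# Illustrative fuzzy memberships: each row is a data point, each column
# a class, and each row sums to 1.
fuzzyTrue <- matrix(c(
  0.95, 0.025, 0.025,
  0.98, 0.01,  0.01,
  0.96, 0.02,  0.02,
  0.95, 0.04,  0.01,
  0.95, 0.01,  0.04,
  0.99, 0.005, 0.005,
  0.025, 0.95, 0.025,
  0.97, 0.02,  0.01,
  0.025, 0.025, 0.95
), ncol=3, byrow=TRUE)
stopifnot(all(abs(rowSums(fuzzyTrue) - 1) < 1e-8))
```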
# a hard truth:
hardTrue <- apply(fuzzyTrue,1,FUN=which.max)
# some predicted labels:
hardPred <- c(1,1,1,1,1,1,2,2,2)
getFuzzyPartitionMetrics(hardPred=hardPred, hardTrue=hardTrue,
fuzzyTrue=fuzzyTrue, nperms=3, level="class")
## fuzzyWC fuzzyAWC class fuzzyWH fuzzyAWH cluster
## 1 0.7195238 0.4332906 1 NA NA NA
## 2 1.0000000 NaN 2 NA NA NA
## 3 1.0000000 NaN 3 NA NA NA
## 4 NA NA NA 1.00000000 1.000000 1
## 5 NA NA NA 0.06166667 -1.219448 2
By using the inputs hardPred, hardTrue, fuzzyPred, and fuzzyTrue, one can control whether the fuzzy or hard version of each partition is used in the comparison. For example, when fuzzyTrue and fuzzyPred are not NULL, metrics for comparing two fuzzy partitions will be used.
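For instance, comparing two fuzzy partitions might look like this (`fuzzyPred` here is a hypothetical row-stochastic matrix of predicted memberships with the same dimensions as `fuzzyTrue`; in practice it would come from a fuzzy clustering algorithm):

```r
# Sketch: comparing a hypothetical fuzzy prediction to the fuzzy truth
# at the dataset level.
getFuzzyPartitionMetrics(fuzzyPred=fuzzyPred, fuzzyTrue=fuzzyTrue,
                         nperms=10, level="dataset")
```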
7 Spatial clustering evaluation
7.1 Example data
We use another toy example dataset in the package, sp_toys, to illustrate spatial clustering evaluation.
Here in C, the spots are colored by the ground-truth class. In P1 and P2, the fill of each spot corresponds to the ground-truth class, while the border color corresponds to the clustering prediction. P1 and P2 misclassify the same number of red spots into the blue cluster.
7.2 External metrics
Let’s quantify this by calculating external spatial metrics:
By specifying fuzzy_true and fuzzy_pred, one can control whether the fuzzy or hard version of true and pred is used in the comparison. If fuzzy_true or fuzzy_pred is TRUE, the spatial neighborhood information will be used to construct a fuzzy representation of the class/cluster memberships.
getSpatialExternalMetrics(true=sp_toys$label, pred=sp_toys$p1,
location=sp_toys[,c("x","y")], level="class")
## SpatialWH SpatialAWH SpatialWC SpatialAWC class cluster
## 1 NA NA 0.8078698 0.5926787 1 NA
## 2 NA NA 1.0000000 1.0000000 2 NA
## 3 1.0000000 1.0000000 NA NA NA 1
## 4 0.8323893 0.6499484 NA NA NA 2
When the evaluation is not spatial-aware, P1 and P2 receive the same ARI score. However, with spatial-aware metrics like SpatialARI and SpatialAccuracy, P2 gets a higher score than P1.
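This comparison might be sketched as follows for both predictions (the metric names follow the text above; check `?getSpatialExternalMetrics` for the exact supported set):

```r
# Sketch: dataset-level spatial-aware metrics for both predictions,
# allowing P1 and P2 to be compared with a single value each.
getSpatialExternalMetrics(true=sp_toys$label, pred=sp_toys$p1,
                          location=sp_toys[,c("x","y")], level="dataset",
                          metrics=c("SpatialARI", "SpatialAccuracy"))
getSpatialExternalMetrics(true=sp_toys$label, pred=sp_toys$p2,
                          location=sp_toys[,c("x","y")], level="dataset",
                          metrics=c("SpatialARI", "SpatialAccuracy"))
```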
7.3 Internal metrics
Last but not least, there are internal metrics for spatial clustering evaluation:
sp_toys$c_elsa <- getSpatialInternalMetrics(label=sp_toys$label,
location=sp_toys[,c("x","y")], level="element",
metrics=c("ELSA"))$ELSA
## the specified variable is considered as categorical...
sp_toys$p1_elsa <- getSpatialInternalMetrics(label=sp_toys$p1,
location=sp_toys[,c("x","y")], level="element",
metrics=c("ELSA"))$ELSA
## the specified variable is considered as categorical...
sp_toys$p2_elsa <- getSpatialInternalMetrics(label=sp_toys$p2,
location=sp_toys[,c("x","y")], level="element",
metrics=c("ELSA"))$ELSA
## the specified variable is considered as categorical...