Associativity-Peakiness Metric for Contingency Tables
Naomi E. Zirkind, William J. Diehl
TLDR
Introduces the Associativity Peakiness (AP) metric, an efficient, high-dynamic-range performance metric for comparing clustering algorithms via contingency tables.
Key contributions
- Proposes the Associativity Peakiness (AP) metric for evaluating clustering algorithms via contingency tables.
- AP metric characterizes critical performance aspects not captured by existing vector-pair metrics.
- Demonstrates higher dynamic range and greater computational efficiency than publicly available metrics.
- Validated through simulations on 500 generated contingency tables across multiple scenarios.
Why it matters
This paper fills a gap by providing a dedicated single-number metric for comparing clustering algorithms via contingency tables. The AP metric offers a more detailed, computationally efficient evaluation with higher dynamic range, which is useful for predicting real-world deployment performance.
Original Abstract
For the use case of comparing the performance of clustering algorithms whose output is a contingency table, a single performance metric for contingency tables is needed. Such a metric is vital for comparative performance analysis of clustering algorithms. A survey of publicly available literature did not show the presence of such a metric. Metrics do exist for vector pairs of truth values and predicted values, which are an alternative form of output of clustering algorithms. However, the metrics for vector pairs do not reveal the presence of detailed features that are apparent in contingency tables. This paper presents the Associativity Peakiness (AP) metric, which characterizes aspects of clustering algorithm performance that are critical for predicting a clustering algorithm's performance when deployed. The AP metric is analogous to measures of quality for confusion matrices that are outputs of supervised learning algorithms. This paper presents results from simulations in which 500 contingency tables were generated for multiple test scenarios. The results show that for the use case of evaluating clustering algorithms, the AP metric characterizes performance of contingency tables with higher dynamic range than publicly available metrics, and that it is computationally more efficient than comparable publicly available metrics.
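The abstract contrasts two output forms for clustering evaluation: vector pairs of truth and predicted labels, and contingency tables. The AP metric's formula is not given in this summary, but as background, a minimal sketch of how a truth/prediction vector pair is tabulated into a contingency table (a standard construction, not the paper's method):

```python
from collections import Counter

def contingency_table(truth, pred):
    """Build a contingency table from a vector pair.

    Rows index truth clusters, columns index predicted clusters;
    entry [i][j] counts items in truth cluster i assigned to
    predicted cluster j.
    """
    rows = sorted(set(truth))
    cols = sorted(set(pred))
    counts = Counter(zip(truth, pred))
    return [[counts[(r, c)] for c in cols] for r in rows]

# Hypothetical labelings for six items, three truth clusters.
truth = [0, 0, 0, 1, 1, 2]
pred  = [0, 0, 1, 1, 1, 2]
table = contingency_table(truth, pred)
# A strong clustering concentrates counts into a few large
# ("peaky") entries per row; a weak one spreads them out —
# structure that a single vector-pair score can obscure.
```

A scalar metric over such a table, as proposed here, summarizes this row-wise concentration structure that vector-pair metrics do not directly expose.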