An effective variant of the Hartigan $k$-means algorithm

April 23, 2026 | arXiv:2604.21798

cs.LG

TLDR

A minor variant of Hartigan's k-means algorithm achieves an additional 2-5% performance improvement over the original method.

Key contributions

Hartigan's algorithm (1975) typically improves on Lloyd's algorithm (1957) by 5-10%, as noted by Telgarsky-Vattani.
The paper introduces a very minor variation of Hartigan's method.
This variant yields a further 2-5% improvement over Hartigan's original method.
The gain tends to grow as either the dimension or the number of clusters (k) increases.

Why it matters

This paper offers a simple enhancement to Hartigan's k-means algorithm, which already tends to beat the more commonly used Lloyd's algorithm. Because the reported gain grows with the dimension and with k, the variant is most relevant for high-dimensional or large-k clustering tasks.

Original Abstract

The k-means problem is perhaps the classical clustering problem and often synonymous with Lloyd's algorithm (1957). It has become clear that Hartigan's algorithm (1975) gives better results in almost all cases, Telgarsky-Vattani note a typical improvement of $5\%$ -- $10\%$. We point out that a very minor variation of Hartigan's method leads to another $2\%$ -- $5\%$ improvement; the improvement tends to become larger when either dimension or $k$ increase.
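
For readers unfamiliar with the two baselines named in the abstract, the following is a minimal sketch of Lloyd's batch update and Hartigan's single-point move rule in their standard textbook forms. It does not implement the paper's variant, which this summary does not describe. All function names (`lloyd`, `hartigan`, `kmeans_cost`, `cluster_stats`) are illustrative assumptions, not taken from the paper.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def cluster_stats(X, labels, k):
    """Per-cluster sizes and coordinate sums."""
    dim = len(X[0])
    sizes = [0] * k
    sums = [[0.0] * dim for _ in range(k)]
    for x, j in zip(X, labels):
        sizes[j] += 1
        for t in range(dim):
            sums[j][t] += x[t]
    return sizes, sums

def kmeans_cost(X, labels, k):
    """k-means objective: squared distance of each point to its cluster mean."""
    sizes, sums = cluster_stats(X, labels, k)
    centers = [[s / n for s in row] if n else None for row, n in zip(sums, sizes)]
    return sum(sq_dist(x, centers[j]) for x, j in zip(X, labels))

def lloyd(X, labels, k, max_iter=100):
    """Lloyd: recompute all means, then reassign every point to its nearest mean."""
    labels = list(labels)
    for _ in range(max_iter):
        sizes, sums = cluster_stats(X, labels, k)
        # Empty clusters are simply skipped in this sketch.
        centers = {j: [s / sizes[j] for s in sums[j]] for j in range(k) if sizes[j]}
        new = [min(centers, key=lambda j: sq_dist(x, centers[j])) for x in X]
        if new == labels:
            break
        labels = new
    return labels

def hartigan(X, labels, k, max_sweeps=100):
    """Hartigan: move one point at a time whenever the move lowers the objective."""
    labels = list(labels)
    sizes, sums = cluster_stats(X, labels, k)
    for _ in range(max_sweeps):
        moved = False
        for i, x in enumerate(X):
            a = labels[i]
            if sizes[a] <= 1:
                continue  # leave singleton clusters alone

            def delta(j):
                # Cost saved by removing x from its own cluster a,
                # or cost added by inserting x into another cluster j.
                n = sizes[j]
                if n == 0:
                    return 0.0  # a one-point cluster contributes zero cost
                d2 = sq_dist(x, [s / n for s in sums[j]])
                return n / (n - 1) * d2 if j == a else n / (n + 1) * d2

            b = min(range(k), key=delta)
            if b != a and delta(b) < delta(a) - 1e-12:
                for t, v in enumerate(x):
                    sums[a][t] -= v
                    sums[b][t] += v
                sizes[a] -= 1
                sizes[b] += 1
                labels[i] = b
                moved = True
        if not moved:
            break
    return labels
```

The key difference is the size-dependent weights `n / (n - 1)` and `n / (n + 1)`: Hartigan's rule accounts for how the means shift when a point moves, so it can keep improving a partition at which Lloyd's algorithm has already stopped. Running `hartigan` on the output of `lloyd` therefore never increases `kmeans_cost`.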

