An effective variant of the Hartigan $k$-means algorithm
François Clément, Stefan Steinerberger
TLDR
A minor variant of Hartigan's k-means algorithm yields a further 2-5% improvement over the original method, with larger gains as the dimension or k grows.
Key contributions
- Recalls that Hartigan's k-means typically improves on Lloyd's algorithm by 5-10%, as noted by Telgarsky-Vattani.
- Introduces a minor variation to Hartigan's k-means algorithm.
- This variant yields a further 2-5% improvement over Hartigan's original method.
- The improvement tends to grow as the dimension or the number of clusters (k) increases.
Why it matters
This paper offers a simple yet effective enhancement to Hartigan's k-means algorithm, which itself usually gives better results than the more widely known Lloyd's algorithm. Because the gains tend to grow with the dimension and with k, the variant is most relevant for high-dimensional or many-cluster problems.
Original Abstract
The k-means problem is perhaps the classical clustering problem and often synonymous with Lloyd's algorithm (1957). It has become clear that Hartigan's algorithm (1975) gives better results in almost all cases, Telgarsky-Vattani note a typical improvement of $5\%$ -- $10\%$. We point out that a very minor variation of Hartigan's method leads to another $2\%$ -- $5\%$ improvement; the improvement tends to become larger when either dimension or $k$ increase.
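For readers unfamiliar with the baseline, here is a minimal NumPy sketch of the classical Hartigan update that the abstract refers to. It is not the authors' variant, which this summary does not describe. The function names (`kmeans_cost`, `hartigan_pass`, `hartigan_kmeans`), the round-robin initialization, and the sweep limit are all illustrative assumptions. The rule itself is the standard one: moving a point x out of a cluster A with n_A points and mean c_A saves n_A/(n_A - 1) · ||x - c_A||², and inserting it into a cluster B adds n_B/(n_B + 1) · ||x - c_B||², so the point is moved whenever the second quantity is smaller than the first.

```python
import numpy as np

def kmeans_cost(X, labels, k):
    """Sum of squared distances from each point to its cluster mean."""
    cost = 0.0
    for j in range(k):
        pts = X[labels == j]
        if len(pts):
            cost += ((pts - pts.mean(axis=0)) ** 2).sum()
    return cost

def hartigan_pass(X, labels, k):
    """One sweep of the classical Hartigan rule (labels updated in place).

    Returns the number of points that changed cluster.
    """
    counts = np.bincount(labels, minlength=k).astype(float)
    sums = np.zeros((k, X.shape[1]))
    np.add.at(sums, labels, X)  # per-cluster coordinate sums
    moves = 0
    for i, x in enumerate(X):
        a = labels[i]
        if counts[a] <= 1:
            continue  # assumption: never empty a cluster
        centers = sums / np.maximum(counts, 1)[:, None]
        d2 = ((centers - x) ** 2).sum(axis=1)
        gain = counts[a] / (counts[a] - 1) * d2[a]  # cost saved by removing x from a
        penalty = counts / (counts + 1) * d2        # cost added by inserting x elsewhere
        penalty[a] = np.inf
        b = int(np.argmin(penalty))
        if penalty[b] < gain:  # strict decrease of the k-means cost
            labels[i] = b
            counts[a] -= 1
            counts[b] += 1
            sums[a] -= x
            sums[b] += x
            moves += 1
    return moves

def hartigan_kmeans(X, k, seed=0, max_sweeps=100):
    """Repeat sweeps until no point moves (or max_sweeps is reached)."""
    rng = np.random.default_rng(seed)
    # Illustrative initialization: shuffled round-robin, so no cluster starts empty.
    labels = rng.permutation(len(X)) % k
    for _ in range(max_sweeps):
        if hartigan_pass(X, labels, k) == 0:
            break
    return labels
```

Unlike Lloyd's algorithm, which reassigns every point to its nearest center and only then recomputes the centers, this rule updates the cluster statistics after each individual move and weights the distances by cluster size, which is why it can leave configurations where Lloyd's algorithm is already stuck.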