Pack only the essentials: Adaptive dictionary learning for kernel ridge regression
Daniele Calandriello, Alessandro Lazaric, Michal Valko
TLDR
SQUEAK is a new algorithm for kernel ridge regression that builds Nystrom approximations through adaptive dictionary learning, with space complexity only a constant factor worse than exact ridge leverage score sampling.
Key contributions
- Introduces SQUEAK, an algorithm for efficient kernel ridge regression.
- Uses unnormalized ridge leverage scores (RLS) for Nystrom approximation (see the sketch after this list).
- Achieves space complexity only a constant factor worse than exact RLS sampling.
- Simplifies the prior INK-Estimate approach by removing the need to estimate the effective dimension for normalization.
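To make the RLS sampling step concrete, here is a minimal sketch of the exact (non-sequential) baseline the abstract compares against. The function names and the `oversample` parameter are our own illustration, not the paper's code; note that computing exact RLS materializes the full n x n kernel matrix, which is precisely the O(n^2) cost SQUEAK is designed to avoid.

```python
import numpy as np

def ridge_leverage_scores(K, lam):
    """Exact RLS: tau_i = [K (K + lam * I)^{-1}]_{ii}.

    Requires the full n x n kernel matrix, i.e. the O(n^2) space
    that SQUEAK's sequential estimates sidestep.
    """
    n = K.shape[0]
    return np.diag(K @ np.linalg.inv(K + lam * np.eye(n)))

def sample_dictionary(K, lam, oversample=2.0, seed=0):
    """Keep column i independently with probability min(1, oversample * tau_i)."""
    rng = np.random.default_rng(seed)
    tau = ridge_leverage_scores(K, lam)
    p = np.minimum(1.0, oversample * tau)
    return np.flatnonzero(rng.random(K.shape[0]) < p)

# The dictionary size concentrates around the effective dimension
# d_eff(lam) = sum_i tau_i, rather than around n.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5))
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq_dists)  # Gaussian kernel matrix
idx = sample_dictionary(K, lam=1.0)
print(f"kept {len(idx)} of {K.shape[0]} columns")
```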
Why it matters
Kernel Ridge Regression (KRR) is powerful but limited by the O(n^2) memory needed to store the kernel matrix for large datasets. SQUEAK offers a simpler, more scalable Nystrom approximation whose space cost tracks the data's effective dimension, making KRR practical for larger applications.
Original Abstract
One of the major limits of kernel ridge regression (KRR) is that storing and manipulating the kernel matrix K_n for n samples requires O(n^2) space, which rapidly becomes unfeasible for large n. Nystrom approximations reduce the space complexity to O(nm) by sampling m columns from K_n. Uniform sampling preserves KRR accuracy (up to epsilon) only when m is proportional to the maximum degree of freedom of K_n, which may require O(n) columns for datasets with high coherence. Sampling columns according to their ridge leverage scores (RLS) gives accurate Nystrom approximations with m proportional to the effective dimension, but computing exact RLS also requires O(n^2) space. Calandriello et al. (2016) propose INK-Estimate, an algorithm that processes the dataset incrementally and updates RLS, effective dimension, and Nystrom approximations on-the-fly. Its space complexity scales with the effective dimension but introduces a dependency on the largest eigenvalue of K_n, which in the worst case is O(n). In this paper we introduce SQUEAK, a new algorithm that builds on INK-Estimate but uses unnormalized RLS. As a consequence, the algorithm is simpler, does not need to estimate the effective dimension for normalization, and achieves a space complexity that is only a constant factor worse than exact RLS sampling.
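For context on why sampling m columns helps, here is a minimal sketch (our notation and helper names, not the paper's implementation) of Nystrom-approximated KRR: once a dictionary of m columns is fixed, fitting and prediction touch only an n x m and an m x m matrix, giving the O(nm) space the abstract refers to. The uniform dictionary below is a placeholder; SQUEAK would build it via RLS sampling.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    """Pairwise Gaussian kernel between rows of A and rows of B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def nystrom_krr_fit(X, y, dict_idx, lam):
    """Solve (K_nm^T K_nm + lam * K_mm) alpha = K_nm^T y in the m-dim subspace."""
    K_nm = gaussian_kernel(X, X[dict_idx])            # n x m, O(nm) space
    K_mm = gaussian_kernel(X[dict_idx], X[dict_idx])  # m x m
    A = K_nm.T @ K_nm + lam * K_mm
    # Small jitter keeps the solve stable if K_mm is near-singular.
    return np.linalg.solve(A + 1e-10 * np.eye(len(dict_idx)), K_nm.T @ y)

def nystrom_krr_predict(X_train, dict_idx, alpha, X_new):
    return gaussian_kernel(X_new, X_train[dict_idx]) @ alpha

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(1000, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(1000)
dict_idx = rng.choice(1000, size=50, replace=False)   # uniform stand-in for RLS
alpha = nystrom_krr_fit(X, y, dict_idx, lam=1e-2)
print(nystrom_krr_predict(X, dict_idx, alpha, X[:3]))
```

The design point the abstract makes is visible here: the full n x n kernel matrix never appears, so the only way dictionary quality enters is through `dict_idx`, and RLS sampling guarantees a small m suffices where uniform sampling may not.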