AdaCubic: An Adaptive Cubic Regularization Optimizer for Deep Learning
Ioannis Tsingalis, Constantine Kotropoulos, Corentin Briat
TLDR
AdaCubic is a novel adaptive cubic regularization optimizer for deep learning that dynamically adjusts the weight of its cubic regularization term and matches or outperforms widely used optimizers without hyperparameter tuning.
Key contributions
- Dynamically adjusts the cubic term weight using an auxiliary optimization problem for adaptive regularization.
- Approximates the Hessian matrix using Hutchinson's method, significantly reducing computational cost.
- Inherits local convergence guarantees from cubically regularized Newton methods.
- Achieves strong performance across CV, NLP, and Signal Processing tasks without hyperparameter fine-tuning.
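Hutchinson's method, mentioned above as the Hessian approximation, estimates the diagonal of a Hessian from Hessian-vector products with random sign vectors, since E[z ⊙ Hz] = diag(H) for Rademacher z. A minimal NumPy sketch on a toy quadratic (where the Hessian is known exactly, so the estimate can be checked; in deep learning the Hessian-vector product would come from automatic differentiation, and function names here are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy objective f(x) = 0.5 * x^T A x, whose exact Hessian is A.
A = np.array([[3.0, 1.0, 0.0],
              [1.0, 2.0, 0.5],
              [0.0, 0.5, 4.0]])

def hessian_vector_product(z):
    # For this quadratic, Hz = Az; in a neural network this would be
    # one Hessian-vector product obtained via automatic differentiation.
    return A @ z

def hutchinson_diag(hvp, dim, num_samples=2000, rng=rng):
    """Estimate diag(H) as the sample mean of z * (H z) over Rademacher z."""
    est = np.zeros(dim)
    for _ in range(num_samples):
        z = rng.choice([-1.0, 1.0], size=dim)  # Rademacher probe vector
        est += z * hvp(z)                      # E[z * Hz] = diag(H)
    return est / num_samples

diag_est = hutchinson_diag(hessian_vector_product, dim=3)
print(diag_est)  # should be close to diag(A) = [3, 2, 4]
```

Each sample costs only one Hessian-vector product, which is why this estimator is cheap enough for deep learning: no Hessian is ever formed explicitly.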
Why it matters
To the authors' knowledge, AdaCubic is the first scalable cubic regularization optimizer for deep learning. Because it adapts its regularization weight and performs well with a fixed set of hyperparameters, it is straightforward to deploy and offers robust optimization across diverse applications.
Original Abstract
A novel regularization technique, AdaCubic, is proposed that adapts the weight of the cubic term. The heart of AdaCubic is an auxiliary optimization problem with cubic constraints that dynamically adjusts the weight of the cubic term in Newton's cubic regularized method. We use Hutchinson's method to approximate the Hessian matrix, thereby reducing computational cost. We demonstrate that AdaCubic inherits the cubically regularized Newton method's local convergence guarantees. Our experiments in Computer Vision, Natural Language Processing, and Signal Processing tasks demonstrate that AdaCubic outperforms or competes with several widely used optimizers. Unlike other adaptive algorithms that require hyperparameter fine-tuning, AdaCubic is evaluated with a fixed set of hyperparameters, rendering it a highly attractive optimizer in settings where fine-tuning is infeasible. This makes AdaCubic an attractive option for researchers and practitioners alike. To our knowledge, AdaCubic is the first optimizer to leverage cubic regularization in scalable deep learning applications.
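For context on the cubic regularized Newton method the abstract builds on: each step minimizes a cubic model m(s) = gᵀs + ½ sᵀHs + (σ/3)‖s‖³ of the objective. In one dimension this subproblem has a closed-form minimizer, sketched below (σ here is a hypothetical fixed weight; AdaCubic's contribution is precisely adapting this weight, which is not shown):

```python
import math

def cubic_newton_step_1d(g, h, sigma):
    """Global minimizer of m(s) = g*s + 0.5*h*s**2 + (sigma/3)*|s|**3.

    Valid even under negative curvature h < 0, one advantage cubic
    regularization has over the plain Newton step -g/h.
    """
    if g == 0.0:
        return 0.0
    # Stationarity condition: g + h*s + sigma*s*|s| = 0, with s
    # opposite in sign to g; solving the resulting quadratic gives:
    return -2.0 * g / (h + math.sqrt(h * h + 4.0 * sigma * abs(g)))

# Example: gradient 2 with curvature -1 (a concave direction).
# Newton's step -g/h = 2 would move uphill; the cubic step descends.
s = cubic_newton_step_1d(g=2.0, h=-1.0, sigma=0.5)
print(s)  # negative: moves against the gradient
```

Pairing such a cubic step with the Hutchinson diagonal Hessian estimate is one plausible way to read the paper's overall design; the exact multi-dimensional subproblem solver used by AdaCubic is not specified in this summary.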