Refining Covariance Matrix Estimation in Stochastic Gradient Descent Through Bias Reduction
Ziyang Wei, Wanrong Zhu, Jingyang Lyu, Wei Biao Wu
TLDR
A new, fully online de-biased covariance estimator for SGD significantly improves estimation accuracy and convergence rate without needing second-order (Hessian) derivatives.
Key contributions
- Proposes a novel, fully online de-biased covariance estimator for Stochastic Gradient Descent (SGD).
- Eliminates the need for second-order (Hessian) information, which is often inaccessible in practice.
- Achieves significantly improved estimation accuracy and a faster convergence rate than existing Hessian-free alternatives.
Why it matters
Estimating the asymptotic covariance of SGD iterates is crucial for online statistical inference, but existing methods either converge slowly or require Hessian information that is often inaccessible. This paper offers a practical, accurate, and faster alternative, making SGD-based inference more reliable for large-scale machine learning applications.
Original Abstract
We study online inference and asymptotic covariance estimation for the stochastic gradient descent (SGD) algorithm. While classical methods (such as plug-in and batch-means estimators) are available, they either require inaccessible second-order (Hessian) information or suffer from slow convergence. To address these challenges, we propose a novel, fully online de-biased covariance estimator that eliminates the need for second-order derivatives while significantly improving estimation accuracy. Our method employs a bias-reduction technique to achieve a convergence rate of $n^{(\alpha-1)/2} \sqrt{\log n}$, outperforming existing Hessian-free alternatives.
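For context, the sketch below shows the kind of Hessian-free pipeline the paper builds on: averaged SGD with a non-overlapping batch-means covariance estimator, the classical baseline that the proposed de-biased estimator is designed to outperform. This is a minimal illustration under stated assumptions, not the paper's method; the function name, step-size schedule, and batch-size choice are all illustrative.

```python
import numpy as np

def sgd_with_batch_means(grad, x0, n_steps, eta0=0.5, alpha=0.6, batch_size=None):
    """Averaged SGD with step size eta0 * t^{-alpha}, plus a simple
    non-overlapping batch-means estimate of the asymptotic covariance of
    the averaged iterate. A Hessian-free baseline sketch, NOT the paper's
    de-biased estimator."""
    d = x0.size
    x = x0.copy()
    iterates = np.empty((n_steps, d))
    for t in range(1, n_steps + 1):
        eta = eta0 * t ** (-alpha)      # polynomially decaying step size
        x = x - eta * grad(x, t)        # one stochastic gradient step
        iterates[t - 1] = x
    x_bar = iterates.mean(axis=0)       # Polyak-Ruppert average

    # Split the iterate trajectory into M non-overlapping batches of size B
    # and take each batch's mean.
    B = batch_size or int(np.sqrt(n_steps))
    M = n_steps // B
    batch_means = iterates[: M * B].reshape(M, B, d).mean(axis=1)

    # Batch-means covariance estimate for sqrt(n) * (x_bar - x*).
    centered = batch_means - x_bar
    sigma_hat = (B / (M - 1)) * centered.T @ centered
    return x_bar, sigma_hat

# Toy usage: noisy gradients of the strongly convex quadratic 0.5 * ||x - 1||^2.
rng = np.random.default_rng(0)
grad = lambda x, t: (x - 1.0) + rng.normal(scale=1.0, size=x.size)
x_bar, sigma_hat = sgd_with_batch_means(grad, x0=np.zeros(2), n_steps=50_000)
print(x_bar)      # close to the minimizer (1, 1)
print(sigma_hat)  # feeds a normal approximation for confidence intervals
```

On a toy quadratic like this, `sigma_hat` can be plugged into a normal approximation to form confidence intervals around the averaged iterate; the paper's contribution is a de-biased, fully online replacement for this estimator with a provably faster convergence rate.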