A Sufficient-Statistic Reduction of the Information Bottleneck to a Low-Dimensional Problem
TLDR
This paper shows that the Information Bottleneck problem can be reduced, without loss, to a lower-dimensional problem over a sufficient statistic, making it computationally tractable.
Key contributions
- Shows that the Information Bottleneck (IB) problem can be reduced, without loss, to the IB problem for a sufficient statistic φ(T).
- The computational complexity of IB is then governed by the statistic's dimension rather than the source's.
- Provides an exact structural condition under which IB becomes tractable, bridging the discrete and linear-Gaussian regimes.
- Derives the classical Gaussian IB solution as a corollary and states a nonlinear-Gaussian generalisation.
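As a toy illustration of the structural condition (the distribution below is hypothetical, not from the paper): in the discrete case, p(C | T) factors through a statistic whenever its rows repeat, and the row index then serves as φ(T). A minimal sketch:

```python
import numpy as np

# Hypothetical toy channel: 4 source symbols, 2 class labels, but p(c|t)
# takes only 2 distinct rows, so p(C|T) factors through a statistic phi.
p_c_given_t = np.array([[0.9, 0.1],
                        [0.9, 0.1],
                        [0.2, 0.8],
                        [0.2, 0.8]])

# phi(t) = index of the distinct row of p(c|t); rows[phi] reconstructs p(c|t),
# so phi is sufficient and its range (size 2) is smaller than |T| = 4.
rows, phi = np.unique(p_c_given_t, axis=0, return_inverse=True)
phi = phi.ravel()
print(len(rows), phi.tolist())
```

Here the IB problem over the 4-symbol source collapses to one over the 2-valued statistic, which is the dimension that governs the cost of solving it.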
Why it matters
This paper makes the Information Bottleneck (IB) problem tractable by reducing it, exactly, to an equivalent problem over a low-dimensional sufficient statistic. The reduction drastically cuts computational cost, enables exact IB curve computation for high-dimensional sources, and connects the discrete and Gaussian solutions within one framework.
Original Abstract
We show that if the conditional distribution p(C | T) factors through a sufficient statistic φ(T), then the Information Bottleneck (IB) problem for (T, C) is exactly equivalent to the IB problem for (φ(T), C). The reduction is loss-free: it preserves the full IB curve, the Lagrangian optimum at every trade-off parameter β, and the optimal representations up to pullback through φ. As a result, the computational complexity of solving the IB problem is governed by the dimension of the sufficient statistic rather than the ambient dimension of the source. This identifies an exact structural condition under which the generic IB problem becomes tractable, and gives a formal bridge between the discrete and linear-Gaussian regimes. We then show that the classical Gaussian IB solution of Chechik, Globerson, Tishby and Weiss is an immediate corollary of this reduction, and we state a nonlinear-Gaussian generalisation. A small numerical example illustrates the practical consequence: when a low-dimensional sufficient statistic is available, the exact IB curve can be computed on the reduced problem at a cost determined by the statistic rather than by the ambient source dimension.
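A minimal numerical sketch of the reduction, in the spirit of the abstract's example (the toy distribution, β value, and solver are illustrative choices, not the paper's own code; the solver is a standard Blahut-Arimoto-style IB iteration): solve IB on the reduced pair (φ(T), C), pull the encoder back through φ via p(z | t) = p(z | φ(t)), and check that the information quantities of the full and reduced problems agree.

```python
import numpy as np

def mi(p_xy):
    """Mutual information (in nats) of a 2-D joint distribution."""
    p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
    mask = p_xy > 0
    return float((p_xy[mask] * np.log(p_xy[mask] / np.outer(p_x, p_y)[mask])).sum())

def ib_encoder(p_tc, beta, n_z, iters=500, seed=0):
    """Blahut-Arimoto-style IB iterations; returns the encoder p(z|t)."""
    rng = np.random.default_rng(seed)
    p_t = p_tc.sum(axis=1)
    p_c_t = p_tc / p_t[:, None]                          # p(c|t)
    enc = rng.random((p_tc.shape[0], n_z))
    enc /= enc.sum(axis=1, keepdims=True)                # random initial p(z|t)
    for _ in range(iters):
        p_z = p_t @ enc
        p_c_z = ((enc * p_t[:, None]).T @ p_c_t) / (p_z[:, None] + 1e-30)
        kl = (p_c_t[:, None, :] *                        # KL(p(c|t) || p(c|z))
              (np.log(p_c_t[:, None, :] + 1e-300) -
               np.log(p_c_z[None, :, :] + 1e-300))).sum(axis=2)
        enc = p_z[None, :] * np.exp(-beta * kl)          # IB self-consistent update
        enc /= enc.sum(axis=1, keepdims=True)
    return enc

# Toy source: |T| = 6, but p(c|t) takes only 2 distinct rows, so phi(T) in {0, 1}
phi = np.array([0, 0, 0, 1, 1, 1])
rows = np.array([[0.9, 0.1], [0.2, 0.8]])
p_tc = rows[phi] / 6.0                                   # uniform p(t), full joint
p_sc = rows / 2.0                                        # reduced joint p(phi, c)

beta = 5.0
enc_red = ib_encoder(p_sc, beta, n_z=2)                  # solve the 2-state problem
enc_full = enc_red[phi]                                  # pullback: p(z|t) = p(z|phi(t))

p_t, p_s = p_tc.sum(axis=1), p_sc.sum(axis=1)
izt = mi(enc_full * p_t[:, None])                        # I(Z;T) on the full problem
izs = mi(enc_red * p_s[:, None])                         # I(Z;phi(T)) on the reduced one
izc_full = mi((enc_full * p_t[:, None]).T @ (p_tc / p_t[:, None]))
izc_red = mi((enc_red * p_s[:, None]).T @ (p_sc / p_s[:, None]))
print(izt - izs, izc_full - izc_red)                     # both differences ~ 0
```

The pulled-back encoder achieves the same (I(Z;T), I(Z;C)) point as the reduced solution, so the curve can be traced over β on the 2-state problem at a cost independent of the 6-state ambient source, which is the practical consequence the abstract describes.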