ArXiv TLDR

Networked Information Aggregation for Binary Classification

2605.01082

MohammadHossein Bateni, Zahra Hadizadeh, MohammadTaghi Hajiaghayi, Mahdi JafariRaviz, Shayan Taherijam

cs.LG, cs.GT, econ.TH

TLDR

This paper analyzes networked binary classification on DAGs, showing network depth is a bottleneck for information aggregation in sequential logistic regression.

Key contributions

  • Studies networked binary classification on DAGs where agents sequentially train logistic predictors.
  • Extends information aggregation analysis from linear regression to the more complex logistic regression.
  • Proves an O(M/√D) upper bound on excess loss for depth-D paths, assuming every M consecutive agents on the path collectively observe all features.
  • Exhibits instances with excess loss at least Ω(k/D), where k is the feature dimension, identifying network depth as a fundamental bottleneck.
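
The excess loss that both bounds refer to can be written out explicitly. The following is a sketch in notation assumed here rather than taken from the paper: X is the n-by-k matrix of all feature columns, y_i in {0,1} are the labels, σ is the logistic function, and p̂_v is the prediction column of agent v.

```latex
\mathcal{L}(p) = -\frac{1}{n}\sum_{i=1}^{n}\bigl[\,y_i \log p_i + (1-y_i)\log(1-p_i)\,\bigr],
\qquad
\mathrm{excess}(v) = \mathcal{L}(\hat{p}_v) \;-\; \min_{w \in \mathbb{R}^{k}} \mathcal{L}\bigl(\sigma(Xw)\bigr).
```

Read this way, information aggregation means that some agent v has small excess(v). The upper bound says O(M/√D) is achievable on a depth-D path, and the lower bound says there are instances where the excess loss is at least Ω(k/D).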

Why it matters

Earlier guarantees for this kind of sequential information aggregation covered linear regression under squared loss. This paper extends them to binary classification with logistic regression, where the quadratic structure used in the earlier analysis is not available. It also shows that network depth limits how well agents can aggregate information, which bears directly on the design of distributed learning systems.

Original Abstract

We study networked binary classification on a directed acyclic graph (DAG) where each agent observes only a subset of the feature columns of a shared dataset. Agents act sequentially along the DAG: each receives prediction columns from its parents (if any), augments its local features with these columns, fits a logistic predictor by minimizing binary cross-entropy (BCE), and forwards its prediction column to its outgoing neighbors. We ask whether this sequential distributed training procedure achieves information aggregation, meaning that some agent attains small excess loss compared to the best logistic predictor trained with access to all feature columns. This question was studied for linear regression under squared loss by Kearns, Roth, and Ryu (SODA 2026). Extending their guarantees to classification is nontrivial because their analysis relies on quadratic structure that does not directly transfer to BCE with a logistic link. We analyze the resulting sequential logit-passing protocol and prove: (i) an excess loss upper bound of $O(M/\sqrt{D})$ on depth-$D$ paths under the condition that every contiguous subsequence of $M$ agents collectively observes all features, and (ii) a close lower bound showing instances with excess loss of at least $\Omega(k/D)$ where $k$ is the dimension of the feature space. Together, these results identify network depth as a fundamental bottleneck for information aggregation in networked logistic regression.
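
The sequential logit-passing protocol described in the abstract can be illustrated with a small simulation on a path. This is a minimal sketch under stated assumptions, not the authors' implementation. The synthetic data, the round-robin feature assignment (one feature per agent, so every M = k consecutive agents cover all features), the plain gradient-descent solver, and the warm start that copies the parent's predictor are all choices made here for illustration. Function names such as `run_path` and `fit_logistic` are hypothetical.

```python
import math
import random

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def bce(w, X, y):
    """Mean binary cross-entropy of the predictor x -> sigmoid(w . x)."""
    total = 0.0
    for xi, yi in zip(X, y):
        z = dot(w, xi)
        # BCE equals softplus(z) - y*z; softplus is computed stably.
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - yi * z
    return total / len(y)

def fit_logistic(X, y, w0, steps=300):
    """Gradient descent on mean BCE, started at w0 (an assumed solver)."""
    n = len(y)
    # Step size 1/L', where L' = (sum of squared entries)/(4n) upper-bounds
    # the smoothness constant of mean BCE, so each step cannot increase it.
    lr = 4.0 * n / sum(v * v for row in X for v in row)
    w = list(w0)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(dot(w, xi)) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def run_path(X, y, feature_sets):
    """Agent t sees columns feature_sets[t] plus its parent's logit column."""
    parent_logits, losses = None, []
    for cols in feature_sets:
        local = [[1.0] + [row[c] for c in cols] for row in X]  # bias + own features
        w0 = [0.0] * len(local[0])
        if parent_logits is not None:
            local = [r + [z] for r, z in zip(local, parent_logits)]
            w0.append(1.0)  # warm start: reproduce the parent's predictor
        w = fit_logistic(local, y, w0)
        losses.append(bce(w, local, y))
        parent_logits = [dot(w, r) for r in local]  # column forwarded downstream
    return losses

# Synthetic instance (illustrative only): n rows, k features, path of depth D.
rng = random.Random(0)
n, k, D = 200, 4, 8
X = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(n)]
w_true = [1.5, -1.0, 0.5, 2.0]
y = [1.0 if rng.random() < sigmoid(dot(w_true, x)) else 0.0 for x in X]

feature_sets = [[t % k] for t in range(D)]  # agent t observes feature t mod k
losses = run_path(X, y, feature_sets)

# Approximate centralized baseline: the same solver with all feature columns.
full = [[1.0] + row for row in X]
full_loss = bce(fit_logistic(full, y, [0.0] * (k + 1)), full, y)

for t, loss in enumerate(losses, start=1):
    print(f"agent {t}: BCE={loss:.4f}  gap to centralized fit={loss - full_loss:.4f}")
```

Because each agent starts from its parent's predictor and takes only descent steps, the training loss cannot increase along the path in this sketch. The printed gap to the centralized fit is the quantity that the paper's bounds control as a function of depth. The baseline here is itself only an approximate minimizer, so the gap is indicative rather than exact.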
