ArXiv TLDR

Budget-Constrained Causal Bandits: Bridging Uplift Modeling and Sequential Decision-Making

arXiv:2604.26169

Abhirami Pillai

cs.LG · econ.EM · stat.ML

TLDR

BCCB is an online framework for budget-constrained ad allocation that learns user responses while pacing spend, outperforming offline methods in cold-start settings.

Key contributions

  • Introduces Budget-Constrained Causal Bandits (BCCB) for online, sequential ad allocation under budget limits.
  • Unifies learning ad effectiveness, user exploration, and budget pacing into one sequential decision process.
  • Demonstrates high data efficiency, working effectively from the first user in cold-start scenarios.
  • Outperforms offline methods (which require roughly 10,000 historical observations) and other online baselines, with 3-5x lower variance across runs.

Why it matters

This paper addresses a key challenge in digital advertising: allocating ad budgets effectively in cold-start scenarios. BCCB offers a data-efficient online solution, significantly improving performance and reliability over traditional offline methods. This makes campaign planning more practical.

Original Abstract

Treatment allocation under budget constraints is a central challenge in digital advertising: advertisers must decide which users to show ads to while spending a limited budget wisely. The standard approach follows a two-stage offline pipeline: first collect historical data to estimate heterogeneous treatment effects (HTE), then solve a constrained optimization to allocate the budget. This works well with abundant data, but fails in cold-start settings such as new campaigns, new markets, or new customer segments where little historical data exists. We propose Budget-Constrained Causal Bandits (BCCB), an online framework that learns which users respond to ads while simultaneously spending the budget, making treatment decisions one user at a time. BCCB unifies three components into a single sequential process: learning individual-level ad effectiveness, exploring users whose response is uncertain, and pacing the budget over time. We evaluate BCCB on the Criteo Uplift dataset, a large-scale advertising dataset from a real randomized controlled trial. Our key finding is a data-efficiency crossover: offline methods require approximately 10,000 historical observations to produce reliable results, while BCCB operates effectively from the very first user. Furthermore, BCCB exhibits 3-5x lower performance variance between runs, making it more practical for real campaign planning. Among purely online methods, BCCB consistently outperforms standard Thompson Sampling, budgeted Thompson Sampling, and greedy HTE estimation across all budget levels tested.
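The abstract describes three components fused into one sequential loop: posterior learning of ad effectiveness, exploration over uncertain users, and budget pacing. The sketch below illustrates that general idea with a budget-paced Thompson-sampling rule; it is not the paper's implementation, and the `bccb_sketch` function name, Beta-posterior parameterization, and proportional pacing rule are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def bccb_sketch(users, budget, cost_per_ad=1.0):
    """Illustrative budget-paced Thompson-sampling causal bandit.

    `users` is a list of (outcome_if_untreated, outcome_if_treated)
    tuples; only the outcome of the chosen action is observed.
    """
    # Beta(1, 1) priors as [successes, failures] for each arm.
    treated = np.ones(2)
    control = np.ones(2)
    spent = 0.0
    decisions = []
    n = len(users)
    for t, outcomes in enumerate(users):
        # Pacing: keep cumulative spend near the proportional budget line,
        # with one ad's worth of slack.
        pace_ok = spent + cost_per_ad <= budget * (t + 1) / n + cost_per_ad
        # Thompson draw of each arm's conversion rate; the difference is
        # a posterior sample of the uplift.
        p_treat = rng.beta(treated[0], treated[1])
        p_ctrl = rng.beta(control[0], control[1])
        treat = (pace_ok
                 and spent + cost_per_ad <= budget
                 and p_treat - p_ctrl > 0)
        observed = outcomes[1] if treat else outcomes[0]
        if treat:
            spent += cost_per_ad
            treated[0 if observed else 1] += 1
        else:
            control[0 if observed else 1] += 1
        decisions.append(treat)
    return decisions, spent
```

Sampling the uplift (rather than acting on its point estimate) is what drives exploration: users whose effect is uncertain occasionally win the draw, while the pacing check spreads the budget across the horizon instead of exhausting it on early arrivals.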
