ArXiv TLDR

BatchBench: Toward a Workload-Aware Benchmark for Autoscaling Policies in Big Data Batch Processing -- A Proposed Framework

arXiv:2605.12272

Venkata Krishna Prasanth Budigi, Siri Chandana Sirigiri

cs.IR cs.DB

TLDR

The paper proposes BatchBench, an open framework for benchmarking rule-based, learned, and LLM-agent autoscaling policies for big data batch processing on equal experimental footing.

Key contributions

  • A taxonomy of six batch processing workload classes derived from existing benchmarks and public cluster traces.
  • A parameterized workload generator with a validation methodology based on two-sample Kolmogorov-Smirnov (K-S) and earth-mover distance tests (see the sketch after this list).
  • A five-axis evaluation harness covering cost, SLA attainment, scaling responsiveness, scaling thrash, and decision interpretability, with first-class accounting for LLM inference cost (a cost-accounting sketch also follows).
  • A standardized agent interface that lets rule-based, LLM, and reinforcement-learning autoscalers be evaluated through a single API (sketched below).
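
The following is a minimal sketch of the trace-validation idea, not the paper's implementation: compare a generated metric series (for example, job inter-arrival times) against a reference cluster trace using a two-sample K-S test and earth-mover (Wasserstein) distance. The function name, thresholds, and toy data are assumptions for illustration.

```python
# A minimal sketch of the trace-validation idea, not the paper's implementation.
# Field names, thresholds, and the toy reference data are assumptions.
import numpy as np
from scipy.stats import ks_2samp, wasserstein_distance

def validate_trace(generated: np.ndarray, reference: np.ndarray,
                   ks_alpha: float = 0.05, emd_max: float = 5.0) -> dict:
    """Compare a generated metric series (e.g. job inter-arrival times,
    in seconds) against a reference cluster trace."""
    ks_stat, p_value = ks_2samp(generated, reference)
    emd = wasserstein_distance(generated, reference)
    return {
        "ks_statistic": float(ks_stat),
        "ks_p_value": float(p_value),
        "earth_mover_distance": float(emd),
        # Accept if the K-S test does not reject equality and the EMD is small.
        "passes": p_value >= ks_alpha and emd <= emd_max,
    }

# Toy usage: synthetic inter-arrival times vs. a stand-in reference trace.
rng = np.random.default_rng(0)
reference = rng.exponential(scale=30.0, size=2_000)
generated = rng.exponential(scale=31.0, size=2_000)
print(validate_trace(generated, reference))
```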
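For the cost axis, a hedged sketch of how LLM inference cost could be folded into total policy cost. The prices and parameter names are illustrative assumptions, not the paper's cost model.

```python
# Illustrative cost accounting: cluster compute cost plus the cost of any
# LLM calls the autoscaling policy made. Prices and names are assumptions.
def total_policy_cost(node_hours: float,
                      price_per_node_hour: float,
                      prompt_tokens: int = 0,
                      completion_tokens: int = 0,
                      price_per_1k_prompt: float = 0.0,
                      price_per_1k_completion: float = 0.0) -> float:
    compute = node_hours * price_per_node_hour
    inference = (prompt_tokens / 1000.0) * price_per_1k_prompt \
              + (completion_tokens / 1000.0) * price_per_1k_completion
    return compute + inference
```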
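Finally, a sketch of what a single-API agent interface might look like. The class and field names here are our own assumptions, since the paper describes the interface only at a design level: every policy, whether rule-based, RL-trained, or LLM-backed, maps a cluster observation to a scaling decision, and the free-text rationale feeds the interpretability axis.

```python
# Hypothetical single-API autoscaler interface (names are assumptions).
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Observation:
    pending_tasks: int           # queued work awaiting execution
    running_tasks: int
    current_workers: int
    avg_cpu_utilization: float   # 0.0 - 1.0 across the cluster

@dataclass
class ScalingDecision:
    target_workers: int          # desired cluster size after this step
    rationale: str               # free-text explanation, scored for interpretability

class AutoscalingPolicy(Protocol):
    def decide(self, obs: Observation) -> ScalingDecision: ...

class ThresholdPolicy:
    """Rule-based example: scale up/down on CPU utilization thresholds."""
    def decide(self, obs: Observation) -> ScalingDecision:
        if obs.avg_cpu_utilization > 0.8:
            target, why = obs.current_workers + 1, "CPU above 80%, adding one worker"
        elif obs.avg_cpu_utilization < 0.3 and obs.current_workers > 1:
            target, why = obs.current_workers - 1, "CPU below 30%, removing one worker"
        else:
            target, why = obs.current_workers, "utilization within band, holding"
        return ScalingDecision(target_workers=target, rationale=why)
```

An LLM-backed or RL-trained policy would implement the same decide() method, so the harness can score all policy types with one evaluation loop.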

Why it matters

Without a shared benchmark, each new autoscaling policy for big data batch processing is evaluated against a different baseline, workload, and cost model, making cross-paper comparison effectively impossible. BatchBench is designed to standardize evaluation and place rule-based, learned, and LLM-agent policies on equal experimental footing, enabling fair comparison and accelerating research.

Original Abstract

Autoscaling has become a baseline expectation for cloud-native big data processing, and the design space has expanded beyond rule-based heuristics to include learned controllers and, most recently, large language model (LLM) agents. Yet despite a growing body of work spanning these paradigms, the community lacks a shared benchmark for comparing them. Existing evaluations rely on synthetic TPC-style queries, vendor blog posts with proprietary baselines, or narrow trace replays. Each new policy reports favorable numbers against a different baseline, on a different workload, with a different cost model, making cross-paper comparison effectively impossible. This is a position paper. We propose BatchBench, an open benchmarking framework designed to place rule-based, learned, and agentic autoscaling policies on equal experimental footing. The contribution is the design of the framework, not empirical results. We contribute: (1) a workload taxonomy of six batch processing classes synthesized from published autoscaling benchmarks and publicly released cluster traces; (2) the design of a parameterized workload generator with a validation methodology based on two-sample Kolmogorov-Smirnov and earth-mover distance; (3) a five-axis evaluation harness specification covering cost, SLA attainment, scaling responsiveness, scaling thrash, and decision interpretability, with first-class accounting for LLM inference cost; and (4) a standardized agent interface that lets LLM-based and reinforcement-learning autoscalers be evaluated alongside rule-based controllers with a single API. We discuss the expected evaluation surface, identify open research questions the framework is designed to answer, and outline a roadmap for the empirical paper that will follow. BatchBench's reference implementation is in active development and will be released as open source.
