Deepbullwhip: An Open-Source Simulation and Benchmarking for Multi-Echelon Bullwhip Analyses
TLDR
Deepbullwhip is an open-source Python package for simulating and benchmarking the bullwhip effect in multi-echelon supply chains. It pairs a vectorized Monte Carlo engine (a reported 50 to 90 times speedup) with a standardized, registry-based benchmarking framework.
Key contributions
- Introduces deepbullwhip, an open-source Python package for multi-echelon supply chain simulation.
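- Features a vectorized Monte Carlo engine with a reported 50 to 90 times speedup, scaling to 20.8 million simulation cells in under 7 seconds.
- Provides a registry-based benchmarking framework with a curated catalog of ordering policies, forecasting methods, six bullwhip metrics, and demand datasets, including WSTS semiconductor billings.
- Demonstrates 427x cumulative amplification on a four-echelon semiconductor chain (Monte Carlo mean across 1,000 paths) and a 155x disparity in bullwhip severity between synthetic AR(1) and real WSTS demand under the Order-Up-To policy.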
Why it matters
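Deepbullwhip targets two gaps the authors identify: the absence of modular open-source simulation tools for multi-echelon inventory dynamics, and the lack of a standardized protocol for benchmarking mitigation strategies. It lets researchers compare strategies on shared metrics and datasets, including real WSTS semiconductor billings, and its results suggest that synthetic demand alone can badly misstate bullwhip severity.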
Original Abstract
The bullwhip effect remains operationally persistent despite decades of analytical research. Two computational deficiencies hinder progress: the absence of modular open-source simulation tools for multi-echelon inventory dynamics with asymmetric costs, and the lack of a standardized benchmarking protocol for comparing mitigation strategies across shared metrics and datasets. This paper introduces deepbullwhip, an open-source Python package that integrates a simulation engine for serial supply chains (with pluggable demand generators, ordering policies, and cost functions via abstract base classes, and a vectorized Monte Carlo engine achieving 50 to 90 times speedup) with a registry-based benchmarking framework shipping a curated catalog of ordering policies, forecasting methods, six bullwhip metrics, and demand datasets including WSTS semiconductor billings. Five sets of experiments on a four-echelon semiconductor chain demonstrate cumulative amplification of 427x (Monte Carlo mean across 1,000 paths), a stochastic filtering phenomenon at upstream tiers (CV = 0.01), super-exponential lead time sensitivity, and scalability to 20.8 million simulation cells in under 7 seconds. Benchmark experiments reveal a 155x disparity between synthetic AR(1) and real WSTS bullwhip severity under the Order-Up-To policy, and quantify the BWR-NSAmp tradeoff across ordering policies, demonstrating that no single metric captures policy quality.
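To make the quantities in the abstract concrete, the following is a minimal sketch of cumulative bullwhip amplification in a serial chain. It is not the deepbullwhip API: the function names, the moving-average forecast, the lead time, the window length, and the AR(1) parameters are all assumptions chosen for illustration, and the order-up-to rule here is a textbook simplification that may differ from the paper's Order-Up-To policy. Each echelon treats the orders of the echelon below it as its demand, and the ratio reported is the variance of an echelon's orders divided by the variance of end-customer demand.

```python
import random
import statistics


def ar1_demand(n, mu=100.0, rho=0.7, sigma=10.0, seed=0):
    """Synthetic AR(1) demand: d_t = mu + rho * (d_{t-1} - mu) + noise."""
    rng = random.Random(seed)
    demand, prev = [], mu
    for _ in range(n):
        prev = mu + rho * (prev - mu) + rng.gauss(0.0, sigma)
        demand.append(prev)
    return demand


def order_up_to(demand, lead_time=2, window=5):
    """Simplified order-up-to rule with a moving-average forecast (assumed).

    The target level is (lead_time + 1) * forecast, so each order equals the
    observed demand plus the change in the target level. Orders are not
    clipped at zero, which keeps the rule linear for this illustration.
    """
    orders, prev_forecast = [], None
    for t in range(len(demand)):
        history = demand[max(0, t - window + 1): t + 1]
        forecast = sum(history) / len(history)
        if prev_forecast is None:
            prev_forecast = forecast
        orders.append(demand[t] + (lead_time + 1) * (forecast - prev_forecast))
        prev_forecast = forecast
    return orders


def bullwhip_ratios(demand, n_echelons=4, lead_time=2, window=5):
    """Cumulative variance ratio Var(orders_k) / Var(customer demand) per echelon."""
    base_var = statistics.pvariance(demand)
    ratios, stream = [], demand
    for _ in range(n_echelons):
        # Orders placed by this echelon become demand seen by the next one.
        stream = order_up_to(stream, lead_time=lead_time, window=window)
        ratios.append(statistics.pvariance(stream) / base_var)
    return ratios


if __name__ == "__main__":
    for k, ratio in enumerate(bullwhip_ratios(ar1_demand(2000)), start=1):
        print(f"echelon {k}: cumulative variance ratio = {ratio:.2f}")
```

Under these assumptions the ratio grows at every echelon, which is the amplification pattern the abstract describes. The magnitudes from this toy setup are not comparable to the paper's 427x figure, which comes from its own chain configuration and a Monte Carlo mean across 1,000 paths.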