A Comparative Study of Dynamic Programming and Reinforcement Learning in Finite Horizon Dynamic Pricing
Lev Razumovskiy, Nikolay Karenin
TLDR
This paper systematically compares Fitted Dynamic Programming and Reinforcement Learning for finite-horizon dynamic pricing across environments of increasing structural complexity, analyzing their trade-offs in revenue, stability, and computational cost.
Key contributions
- Compares Fitted Dynamic Programming (DP) and Reinforcement Learning (RL) in finite-horizon dynamic pricing.
- Evaluates both methods across diverse environments, from a single-typology benchmark to multi-typology settings with heterogeneous demand and inter-temporal revenue constraints.
- Applies DP to complex, multi-dimensional settings, unlike prior simplified comparisons.
- Assesses revenue performance, stability, constraint satisfaction, and computational scaling.
Why it matters
This study provides a comprehensive comparison of DP and RL in dynamic pricing, addressing limitations of prior work by applying DP to complex, multi-dimensional problems. The resulting insights into their trade-offs are valuable for practitioners and researchers designing pricing strategies.
Original Abstract
This paper provides a systematic comparison between Fitted Dynamic Programming (DP), where demand is estimated from data, and Reinforcement Learning (RL) methods in finite-horizon dynamic pricing problems. We analyze their performance across environments of increasing structural complexity, ranging from a single typology benchmark to multi-typology settings with heterogeneous demand and inter-temporal revenue constraints. Unlike simplified comparisons that restrict DP to low-dimensional settings, we apply dynamic programming in richer, multi-dimensional environments with multiple product types and constraints. We evaluate revenue performance, stability, constraint satisfaction behavior, and computational scaling, highlighting the trade-offs between explicit expectation-based optimization and trajectory-based learning.
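To make the "explicit expectation-based optimization" side of the comparison concrete, below is a minimal sketch of finite-horizon backward induction for a single-typology pricing problem. It is illustrative only, not the paper's implementation: the horizon, inventory level, price grid, and the exponential demand model `purchase_prob` are all assumptions standing in for the demand model that Fitted DP would estimate from data.

```python
import numpy as np

# --- Hypothetical problem setup (assumptions, not from the paper) ---
T = 20                              # number of selling periods (finite horizon)
max_inventory = 10                  # units available at t = 0
prices = np.linspace(1.0, 5.0, 9)   # discrete price grid

def purchase_prob(p):
    """Stand-in fitted demand model: probability one unit sells at price p.
    In Fitted DP this function would be estimated from data."""
    return np.exp(-0.5 * p)

# V[t, s] = optimal expected revenue-to-go with s units left at period t
V = np.zeros((T + 1, max_inventory + 1))
policy = np.zeros((T, max_inventory + 1))

# Backward induction: sweep the horizon from the last period to the first
for t in reversed(range(T)):
    for s in range(1, max_inventory + 1):
        # Expected value of each price: sell one unit w.p. q, else keep it
        q = purchase_prob(prices)
        values = q * (prices + V[t + 1, s - 1]) + (1 - q) * V[t + 1, s]
        best = int(np.argmax(values))
        V[t, s] = values[best]
        policy[t, s] = prices[best]

print("DP expected revenue from full stock:", V[0, max_inventory])
```

The nested loop over periods and inventory levels is what makes DP's cost grow with state dimensionality, which is why prior comparisons restricted it to low-dimensional settings.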
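For the trajectory-based side, a correspondingly minimal tabular Q-learning sketch on the same assumed environment might look as follows. Again hypothetical: the learning rate, exploration rate, and episode count are arbitrary choices, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
T, max_inventory = 20, 10
prices = np.linspace(1.0, 5.0, 9)

def purchase_prob(p):
    return np.exp(-0.5 * p)   # same stand-in demand model as above

# Q[t, s, a]: estimated revenue-to-go for charging prices[a] with s units at t
Q = np.zeros((T, max_inventory + 1, len(prices)))
alpha, eps = 0.1, 0.1         # learning rate and exploration rate (assumed)

for episode in range(50_000):
    s = max_inventory
    for t in range(T):
        if s == 0:
            break
        # epsilon-greedy action selection over the price grid
        a = rng.integers(len(prices)) if rng.random() < eps else int(np.argmax(Q[t, s]))
        # Sample one transition instead of computing the expectation
        sold = rng.random() < purchase_prob(prices[a])
        reward = prices[a] if sold else 0.0
        s_next = s - 1 if sold else s
        target = reward + (np.max(Q[t + 1, s_next]) if t + 1 < T and s_next > 0 else 0.0)
        Q[t, s, a] += alpha * (target - Q[t, s, a])
        s = s_next

print("Q-learning estimate of expected revenue:", Q[0, max_inventory].max())
```

Where the DP sweep computes expectations explicitly at every state, the Q-learning loop only ever sees sampled trajectories: this is precisely the expectation-based versus trajectory-based trade-off the paper evaluates for revenue, stability, and scaling.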