Optimal Semiparametric Dynamic Pricing with Feature Diversity
Jinhang Chai, Yaqi Duan, Jianqing Fan, Kaizheng Wang
TLDR
This paper introduces a semiparametric dynamic pricing algorithm that exploits feature diversity to attain provably optimal regret, backed by a matching lower bound.
Key contributions
- Proposes a stagewise greedy pricing algorithm for semiparametric demand models.
- Iteratively refines the unknown market-noise distribution F using local polynomial regression.
- Exploits feature diversity to reuse endogenous samples, avoiding costly global random exploration.
- Achieves optimal regret rates (with a matching lower bound), improving on the best known bounds and reaching the parametric √T rate when the smoothness β of F is at least 5/2.
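The greedy pricing step in the contributions above can be illustrated with a minimal sketch. Given a current utility estimate for a context and a current estimate of the noise CDF, it picks the price that maximizes the estimated revenue $p\,(1-\hat F(p-\hat m(\mathbf{x})))$. The grid search, the function names `greedy_price` and `logistic_cdf`, and the logistic stand-in for $\hat F$ are all assumptions made for illustration; the paper estimates $F$ by local polynomial regression, which is not reproduced here.

```python
import numpy as np

def greedy_price(m_x, F_hat, grid):
    """Return the grid price maximizing estimated revenue p * (1 - F_hat(p - m_x)).

    m_x   : estimated mean utility for the current context (a float)
    F_hat : vectorized estimate of the noise CDF (hypothetical stand-in here)
    grid  : 1-D array of candidate prices
    """
    revenue = grid * (1.0 - F_hat(grid - m_x))
    return float(grid[np.argmax(revenue)])

def logistic_cdf(z):
    """Illustrative stand-in for an estimated noise CDF (standard logistic)."""
    return 1.0 / (1.0 + np.exp(-z))

# Candidate prices and one greedy decision for a context with m_hat(x) = 2.
grid = np.linspace(0.0, 10.0, 1001)
price = greedy_price(m_x=2.0, F_hat=logistic_cdf, grid=grid)
print(price)
```

In a stagewise scheme, a call like this would be repeated each round with the latest estimates, and the resulting price and purchase outcome would be kept as a sample for the next refinement of $\hat F$.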
Why it matters
Existing dynamic pricing methods either incur suboptimal regret or rely on restrictive structural assumptions. This work proposes an algorithm that avoids both limitations: it reuses samples collected while pricing greedily instead of relying on global random exploration. It establishes optimal regret rates, with a matching lower bound, and reaches the parametric √T rate when the noise distribution F is smooth enough (β ≥ 5/2).
Original Abstract
We study contextual dynamic pricing under a semiparametric demand model in which the purchase probability is $1-F(p-m(\mathbf{x}))$, where $m(\mathbf{x})$ captures mean utility as a function of product features and buyer covariates, and $F$ is an unknown market-noise distribution. Existing methods either incur suboptimal regret or rely on restrictive structural assumptions. We propose a stagewise greedy pricing algorithm that iteratively refines the estimate of $F$ via local polynomial regression while pricing greedily with current estimates. By exploiting feature diversity, the algorithm reuses endogenous samples collected during exploitation for nonparametric estimation, avoiding costly global random exploration used in prior work. We establish a general regret bound that applies to any estimator $\hat m$ of the utility function, and derive explicit rates for linear, nonparametric additive, and sparse linear classes of $m$. For the linear class, our regret scales as $T^{\max\{1/2,\,3/(2β+1)\}}$, where $β$ is the smoothness of $F$ and $T$ is the time horizon. This improves the best known rates for semiparametric contextual pricing and achieves the parametric $\sqrt{T}$ rate when $β\ge 5/2$. We further prove a matching lower bound, showing the optimality of our rate, and present numerical experiments that corroborate the theory and demonstrate the practical advantages of iterative refinement.
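As a quick check of the rate stated in the abstract for the linear class, the regret exponent $\max\{1/2,\,3/(2β+1)\}$ can be evaluated directly. The two terms are equal when $3/(2β+1)=1/2$, that is, when $β=5/2$, which is where the $\sqrt{T}$ rate begins. The helper name `regret_exponent` and the sample values of β are illustrative assumptions.

```python
def regret_exponent(beta):
    """Exponent of T in the regret rate T^max{1/2, 3/(2*beta + 1)} (linear class)."""
    return max(0.5, 3.0 / (2.0 * beta + 1.0))

# Evaluate the exponent for a few illustrative smoothness levels.
for beta in (2.0, 2.5, 4.0):
    print(beta, regret_exponent(beta))
```

For any β of at least 5/2 the maximum is attained by the 1/2 term, so additional smoothness beyond that threshold does not change the exponent.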