Replay-buffer engineering for noise-robust quantum circuit optimization
TLDR
This paper introduces replay-buffer engineering techniques to significantly improve sample efficiency, speed, and noise robustness in deep RL for quantum circuit optimization.
Key contributions
- Introduces ReaPER$^+$, an annealed replay rule, boosting sample efficiency by 4-32x for compact quantum circuits.
- Proposes OptCRLQAS, which amortizes costly quantum-classical evaluations over multiple architectural edits, reducing wall-clock time by up to 67.5%.
- Develops a replay-buffer transfer scheme, cutting steps to chemical accuracy by 85-90% in noisy environments.
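The first contribution, the annealed replay rule, can be illustrated with a small sketch. This is not the paper's implementation: the function name, the multiplicative blending rule, and the linear anneal coefficient are all assumptions for illustration; the paper's actual reliability estimator and schedule are not specified here. The idea shown is the transition from TD-error-driven prioritization early in training toward reliability-aware sampling as value estimates mature.

```python
def annealed_priorities(td_errors, reliabilities, progress, alpha=0.6):
    """Hypothetical sketch of an annealed replay-priority rule.

    td_errors     : absolute TD error per stored transition
    reliabilities : score in [0, 1] per transition (higher = more trustworthy
                    TD target); how it is estimated is paper-specific
    progress      : training progress in [0, 1], driving the anneal from
                    pure TD-error prioritization (0) to reliability-aware
                    sampling (1)
    alpha         : standard PER priority exponent
    """
    base = [abs(d) ** alpha for d in td_errors]          # classic PER priority
    blended = [(1.0 - progress) * b + progress * r * b   # reliability reweighting
               for b, r in zip(base, reliabilities)]
    total = sum(blended)
    return [p / total for p in blended]                  # sampling distribution
```

Early in training (`progress≈0`) this reduces to fixed PER; late in training, low-reliability transitions are down-weighted even when their TD error is large.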
Why it matters
This work tackles key bottlenecks in applying deep RL to quantum circuit optimization, making the process significantly more efficient, faster, and robust to hardware noise. By treating the replay buffer as a primary algorithmic lever, it enables more scalable and practical quantum algorithm development.
Original Abstract
Deep reinforcement learning (RL) for quantum circuit optimization faces three fundamental bottlenecks: replay buffers that ignore the reliability of temporal-difference (TD) targets, curriculum-based architecture search that triggers a full quantum-classical evaluation at every environment step, and the routine discard of noiseless trajectories when retraining under hardware noise. We address all three by treating the replay buffer as a primary algorithmic lever for quantum optimization. We introduce ReaPER$^+$, an annealed replay rule that transitions from TD error-driven prioritization early in training to reliability-aware sampling as value estimates mature, achieving $4-32\times$ gains in sample efficiency over fixed PER, ReaPER, and uniform replay while consistently discovering more compact circuits across quantum compilation and QAS benchmarks; validation on LunarLander-v3 confirms the principle is domain-agnostic. Furthermore, we eliminate the quantum-classical evaluation bottleneck in curriculum RL by introducing OptCRLQAS, which amortizes expensive evaluations over multiple architectural edits, cutting wall-clock time per episode by up to $67.5\%$ on a 12-qubit optimization problem without degrading solution quality. Finally, we introduce a lightweight replay-buffer transfer scheme that warm-starts noisy-setting learning by reusing noiseless trajectories, without network-weight transfer or $\epsilon$-greedy pretraining. This reduces steps to chemical accuracy by up to $85-90\%$ and final energy error by up to $90\%$ over from-scratch baselines on 6-, 8-, and 12-qubit molecular tasks. Together, these results establish that experience storage, sampling, and transfer are decisive levers for scalable, noise-robust quantum circuit optimization.
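The replay-buffer transfer scheme described at the end of the abstract amounts to seeding the noisy-setting agent's buffer with previously collected noiseless trajectories, with no network-weight transfer. A minimal sketch, assuming a standard ring-buffer replay store and `(state, action, reward, next_state, done)` transition tuples (the function names and buffer layout are illustrative, not the paper's code):

```python
import random
from collections import deque

def warm_start_buffer(noiseless_trajectories, capacity=100_000):
    """Seed a fresh replay buffer for the noisy setting with transitions
    gathered in the noiseless simulator. Only experience is transferred;
    network weights start from scratch."""
    buffer = deque(maxlen=capacity)  # ring buffer: oldest entries evicted first
    for trajectory in noiseless_trajectories:
        # each trajectory is a list of (s, a, r, s_next, done) tuples
        buffer.extend(trajectory)
    return buffer

def sample_batch(buffer, batch_size):
    """Uniform minibatch sampling from the warm-started buffer."""
    return random.sample(buffer, min(batch_size, len(buffer)))
```

Training in the noisy environment then proceeds as usual, with new noisy transitions appended to the same buffer and gradually displacing the noiseless seed as the `maxlen` ring buffer fills.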