When to ASK: Uncertainty-Gated Language Assistance for Reinforcement Learning
Juarez Monteiro, Nathan Gavenski, Gianlucca Zuin, Adriano Veloso
TLDR
ASK enhances RL agents' out-of-distribution generalization by selectively querying language models for action suggestions only when uncertainty is high.
Key contributions
- Introduces ASK, a method combining RL policies with smaller LMs for OOD generalization.
- Employs Monte Carlo Dropout to assess uncertainty, querying the LM for an action suggestion only when uncertainty is high (a minimal sketch of this gating follows the list).
- Enhances out-of-distribution generalization for RL agents without requiring policy retraining.
- Achieves robust navigation in transfer tasks, reaching a reward of 0.95 in the FrozenLake environment.
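To make the gating idea concrete, here is a minimal sketch, assuming a discrete-action policy network with dropout layers: several stochastic forward passes approximate predictive uncertainty, and the LM is consulted only when that uncertainty crosses a threshold. The names (`PolicyNet`, `query_language_model`), the variance-based uncertainty measure, and the threshold value are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of ASK-style uncertainty gating (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class PolicyNet(nn.Module):
    """Small policy network with dropout so MC Dropout can be applied at inference."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64, p_drop: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)  # action logits


def mc_dropout_action_probs(policy: PolicyNet, obs: torch.Tensor, n_samples: int = 20) -> torch.Tensor:
    """Run several stochastic forward passes with dropout active and return
    the stacked action distributions, shape (n_samples, n_actions)."""
    policy.train()  # keep dropout active at inference time (Monte Carlo Dropout)
    with torch.no_grad():
        probs = torch.stack([F.softmax(policy(obs), dim=-1) for _ in range(n_samples)])
    policy.eval()
    return probs


def query_language_model(obs_description: str) -> int:
    """Placeholder for the LM call; in ASK a smaller LM suggests an action
    from a textual description of the state. Hard-coded here for the sketch."""
    return 0


def ask_select_action(policy: PolicyNet, obs: torch.Tensor, obs_description: str,
                      uncertainty_threshold: float = 0.1) -> int:
    """Gate the LM query on predictive uncertainty: act with the RL policy when
    confident, defer to the LM suggestion only when uncertainty is high."""
    probs = mc_dropout_action_probs(policy, obs)
    mean_probs = probs.mean(dim=0)
    # Predictive variance averaged over actions as a simple uncertainty proxy
    # (an assumption; other dispersion measures would also fit the description).
    uncertainty = probs.var(dim=0).mean().item()
    if uncertainty > uncertainty_threshold:
        return query_language_model(obs_description)
    return int(mean_probs.argmax().item())
```

Note the design point this illustrates: the trained policy is used unchanged (no retraining), and the LM is consulted only when the stochastic passes disagree, which preserves the efficiency of the base policy.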
Why it matters
RL agents struggle with new, unseen scenarios. This paper offers ASK, a method that selectively leverages language models' knowledge to improve OOD generalization. By querying LMs only when uncertain, it maintains efficiency while boosting robustness, which is crucial for real-world RL deployments.
Original Abstract
Reinforcement learning (RL) agents often struggle with out-of-distribution (OOD) scenarios, leading to high uncertainty and random behavior. While language models (LMs) contain valuable world knowledge, larger ones incur high computational costs, hindering real-time use, and exhibit limitations in autonomous planning. We introduce Adaptive Safety through Knowledge (ASK), which combines smaller LMs with trained RL policies to enhance OOD generalization without retraining. ASK employs Monte Carlo Dropout to assess uncertainty and queries the LM for action suggestions only when uncertainty exceeds a set threshold. This selective use preserves the efficiency of existing policies while leveraging the language model's reasoning in uncertain situations. In experiments on the FrozenLake environment, ASK shows no improvement in-domain, but demonstrates robust navigation in transfer tasks, achieving a reward of 0.95. Our findings indicate that effective neuro-symbolic integration requires careful orchestration rather than simple combination, highlighting the need for sufficient model scale and effective hybridization mechanisms for successful OOD generalization.