ArXiv TLDR

SAVGO: Learning State-Action Value Geometry with Cosine Similarity for Continuous Control

2605.00787

Stavros Orfanoudakis, Pedro P. Vergara

cs.LG

TLDR

SAVGO learns a state-action embedding space in which cosine similarity tracks action-value similarity. It then uses that geometry to guide policy updates in continuous control, aiming for better RL sample efficiency.

Key contributions

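  • Learns a joint state-action embedding space in which pairs with similar action-value estimates have high cosine similarity and dissimilar pairs map to distinct directions.
  • Builds a similarity kernel over candidate actions sampled at each update, steering policy improvement toward higher-value regions beyond local gradient-based updates.
  • Unifies representation learning, value estimation, and policy optimization in a single geometry-consistent objective while keeping off-policy actor-critic training scalable.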
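
The value-geometry idea in the first bullet can be illustrated with a small loss function. This is a minimal sketch under assumptions not taken from the paper. The abstract does not give SAVGO's exact loss. The target similarity `2 * exp(-|Q_i - Q_j| / scale) - 1` and the helper names `value_similarity_target` and `geometry_loss` are hypothetical. They were chosen only so that equal Q-values map to a cosine of +1 and very different Q-values map toward -1.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float], eps: float = 1e-8) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def value_similarity_target(q_i: float, q_j: float, scale: float) -> float:
    """Hypothetical target in [-1, 1]: +1 for equal Q-values, tending to -1 as they diverge."""
    return 2.0 * math.exp(-abs(q_i - q_j) / scale) - 1.0


def geometry_loss(
    embeddings: List[List[float]], q_values: List[float], scale: float = 1.0
) -> float:
    """Mean squared gap between pairwise cosine similarity and the assumed value-based target."""
    n = len(embeddings)
    total, count = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            gap = cosine(embeddings[i], embeddings[j]) - value_similarity_target(
                q_values[i], q_values[j], scale
            )
            total += gap * gap
            count += 1
    return total / count if count else 0.0


# Usage: two (s, a) embeddings with equal Q-values but orthogonal directions are penalized.
print(geometry_loss([[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0]))  # 1.0
```

In a real agent the embeddings would come from a learned encoder over (s, a) pairs. An autodiff framework would minimize this loss through that encoder. The pure-Python version only shows which quantity is being matched.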

Why it matters

SAVGO bridges the gap between similarity learning and policy updates made directly in the action space. By unifying representation learning, value estimation, and policy optimization, it targets more efficient policy improvement. It reports improvements over strong baselines on challenging high-dimensional MuJoCo continuous-control tasks.

Original Abstract

While representation and similarity learning have improved the sample efficiency of Reinforcement Learning (RL), they are rarely used to shape policy updates directly in the action space. To bridge this gap, a geometry-aware RL algorithm that explicitly incorporates value-based similarity into the policy update, State-Action Value Geometry Optimization (SAVGO), is proposed. In detail, SAVGO learns a joint state-action embedding space in which pairs with similar action-value estimates exhibit high cosine similarity, while dissimilar pairs are mapped to distinct directions. This learned geometry enables the generation of a similarity kernel over candidate actions sampled at each update, allowing policy improvement to be guided directly toward higher-value regions beyond local gradient-based updates. As a result, representation learning, value estimation, and policy optimization are unified within a single geometry-consistent objective, while preserving the scalability of off-policy actor-critic training. The proposed method is evaluated on standard MuJoCo continuous-control benchmarks, demonstrating improvements over strong baselines on challenging high-dimensional tasks. Ablation studies are done to analyze the contributions of value-geometry learning and similarity-based policy updates.
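
The abstract does not spell out how the similarity kernel enters the policy update. The sketch below shows one plausible reading and is purely an assumption:

1. Sample K candidate actions.
2. Build a positive kernel `exp(cos(z_k, z_l) / kernel_temp)` over their embeddings.
3. Smooth each candidate's Q-estimate with that kernel.
4. Form a softmax-weighted target action that the actor could regress toward.

All names (`kernel_matrix`, `kernel_guided_target`, `kernel_temp`, `value_temp`) are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float], eps: float = 1e-8) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def kernel_matrix(embeddings: List[List[float]], kernel_temp: float) -> List[List[float]]:
    """Hypothetical positive kernel K[k][l] = exp(cos(z_k, z_l) / kernel_temp)."""
    return [[math.exp(cosine(zk, zl) / kernel_temp) for zl in embeddings] for zk in embeddings]


def kernel_guided_target(
    actions: List[List[float]],
    embeddings: List[List[float]],
    q_values: List[float],
    kernel_temp: float = 0.1,
    value_temp: float = 1.0,
) -> List[float]:
    """Kernel-weighted target action (an illustrative assumption, not SAVGO's exact rule)."""
    K = kernel_matrix(embeddings, kernel_temp)
    # Smooth each candidate's Q-estimate over geometrically similar candidates.
    smoothed = [sum(k * q for k, q in zip(row, q_values)) / sum(row) for row in K]
    # Softmax over smoothed values, max-subtracted for numerical stability.
    m = max(smoothed)
    weights = [math.exp((v - m) / value_temp) for v in smoothed]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted mean of candidate actions: a target the actor could regress toward.
    dim = len(actions[0])
    return [sum(w * a[d] for w, a in zip(weights, actions)) for d in range(dim)]


# Usage: two 1-D candidate actions; the second has a much higher Q-estimate.
target = kernel_guided_target(
    actions=[[-1.0], [1.0]],
    embeddings=[[1.0, 0.0], [0.0, 1.0]],
    q_values=[0.0, 10.0],
)
print(target)  # a value just below 1.0
```

The kernel smoothing reflects the abstract's point that the learned geometry, not only the local gradient, decides where the policy moves. Candidates that embed close to high-value pairs inherit part of their value before weighting.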
