Robotics
Research on robot control, manipulation, navigation, and human-robot interaction.
cs.RO · 524 papers
HarmoWAM: Harmonizing Generalizable and Precise Manipulation via Adaptive World Action Models
HarmoWAM unifies predictive and reactive control in robot manipulation, achieving both generalizable transit and precise interaction through adaptive expert coordination.
Variational Inference for Lévy Process-Driven SDEs via Neural Tilting
This paper introduces a neural exponential tilting framework for variational inference in Lévy-driven SDEs, addressing challenges in modeling extreme events.
PriorVLA: Prior-Preserving Adaptation for Vision-Language-Action Models
PriorVLA adapts Vision-Language-Action models by preserving broad pretrained priors, using a frozen expert and an adaptation expert for superior performance.
RoboMemArena: A Comprehensive and Challenging Robotic Memory Benchmark
RoboMemArena is a new, large-scale robotic memory benchmark with 26 tasks, real-world evaluation, and VLM-generated annotations, alongside the PrediMem VLA.
MDrive: Benchmarking Closed-Loop Cooperative Driving for End-to-End Multi-agent Systems
MDrive is a new closed-loop cooperative driving benchmark with 225 diverse scenarios, revealing challenges and benefits of multi-agent systems.
CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models
CapVector learns transferable capability vectors in parametric space for VLA models, enhancing performance and reducing adaptation costs during finetuning.
Safe Aerial 3D Path Planning for Autonomous UAVs using Magnetic Potential Fields
This paper introduces 3DMaxConvNet, a magnetic potential field planner for safe, real-time 3D UAV navigation in urban environments.
Is Your Driving World Model an All-Around Player?
WorldLens is a new benchmark, dataset, and agent for evaluating driving world models beyond visual realism, focusing on physical and behavioral fidelity.
Unified Noise Steering for Efficient Human-Guided VLA Adaptation
UniSteer efficiently adapts VLA models for robotics by unifying human action-space guidance with noise-space RL, boosting success rates quickly.
ALAM: Algebraically Consistent Latent Transitions for Vision-Language-Action Models
ALAM learns algebraically consistent latent transitions from action-free videos, significantly boosting VLA policy performance on complex robot manipulation tasks.
MAGS-SLAM: Monocular Multi-Agent Gaussian Splatting SLAM for Geometrically and Photometrically Consistent Reconstruction
MAGS-SLAM is the first RGB-only multi-agent 3D Gaussian Splatting SLAM for collaborative, photorealistic 3D reconstruction without depth sensors.
C-CoT: Counterfactual Chain-of-Thought with Vision-Language Models for Safe Autonomous Driving
C-CoT uses VLMs and counterfactual chain-of-thought to improve safe autonomous driving decisions, especially in complex, high-risk scenarios.
Decentralized Contingency MPC based on Safe Sets for Nonlinear Multi-agent Collision Avoidance
This paper introduces a decentralized contingency MPC for nonlinear multi-agent collision avoidance, ensuring safety and feasibility without inter-agent communication.
ObjView-Bench: Rethinking Difficulty and Deployment for Object-Centric View Planning
ObjView-Bench is a new framework for evaluating object-centric view planning, disentangling difficulty factors and considering real-world deployment constraints.
xApp Empowered Resource Management for Non-Terrestrial Users in 5G O-RAN Networks
This paper proposes a DDQN xApp for proactive UAV mobility management in 5G O-RAN, reducing handovers and outages.
Embodied AI in Action: Insights from SAE World Congress 2026 on Safety, Trust, Robotics, and Real-World Deployment
This paper summarizes key insights from the SAE World Congress 2026 on safely and trustworthily deploying embodied AI in real-world systems.
DeepSight: Long-Horizon World Modeling via Latent States Prediction for End-to-End Autonomous Driving
DeepSight improves end-to-end autonomous driving with a world model predicting long-horizon latent states and adaptive text reasoning.
VEGA: Visual Encoder Grounding Alignment for Spatially-Aware Vision-Language-Action Models
VEGA enhances VLA models' spatial reasoning by directly aligning their visual encoder outputs with 3D-aware features, improving robotic manipulation.
VISOR: A Vision-Language Model-based Test Oracle for Testing Robots
VISOR is a VLM-based test oracle that automates robot task assessment, replacing manual evaluation and quantifying task correctness and quality.
Neuromorphic Reinforcement Learning for Quadruped Locomotion Control on Uneven Terrain
This paper introduces a neuromorphic reinforcement learning framework using equilibrium propagation for quadruped locomotion on uneven terrain, enabling on-robot adaptation.