SpaceDex: Generalizable Dexterous Grasping in Tiered Workspaces
Wensheng Wang, Chuanjun Guo, Wei Wei, Tong Wu, Ning Tan
TLDR
SpaceDex is a hierarchical framework that improves dexterous grasping in constrained, occluded tiered workspaces by decoupling arm and hand control.
Key contributions
- Introduces SpaceDex, a hierarchical framework for dexterous grasping in complex, tiered workspaces.
- High-level VLM planner provides structured spatial guidance for target objects in occluded scenes.
- Low-level Feature Separation Network decouples arm navigation from hand articulation for robust control.
- Achieves a 63.0% success rate across 100 real-world trials on over 30 unseen objects in tiered environments, outperforming a strong tabletop baseline (39.0%).
Why it matters
Dexterous grasping in complex, real-world environments with occlusions and tight spaces is a major challenge for robotics. SpaceDex offers a novel hierarchical approach that significantly improves success rates in these difficult scenarios. This advancement brings us closer to deploying robots in unstructured human environments.
Original Abstract
Generalizable grasping with high-degree-of-freedom (DoF) dexterous hands remains challenging in tiered workspaces, where occlusion, narrow clearances, and height-dependent constraints are substantially stronger than in open tabletop scenes. Most existing methods are evaluated in relatively unoccluded settings and typically do not explicitly model the distinct control requirements of arm navigation and hand articulation under spatial constraints. We present SpaceDex, a hierarchical framework for dexterous manipulation in constrained 3D environments. At the high level, a Vision-Language Model (VLM) planner parses user intent, reasons about occlusion and height relations across multiple camera views, and generates target bounding boxes for zero-shot segmentation and mask tracking. This stage provides structured spatial guidance for downstream control instead of relying on single-view target selection. At the low level, we introduce an arm-hand Feature Separation Network that decouples global trajectory control for the arm from geometry-aware grasp mode selection for the hand, reducing feature interference between reaching and grasping objectives. The controller further integrates multi-view perception, fingertip tactile sensing, and a small set of recovery demonstrations to improve robustness to partial observability and off-nominal contacts. In 100 real-world trials involving over 30 unseen objects across four categories, SpaceDex achieves a 63.0% success rate, compared with 39.0% for a strong tabletop baseline. These results indicate that combining hierarchical spatial planning with arm-hand representation decoupling improves dexterous grasping performance in spatially constrained environments.
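To make the hierarchical split concrete, here is a minimal sketch of how the two levels could interface. Everything here is an assumption for illustration: the paper does not publish an API, and all class names (`SpatialPlan`), function names (`plan_high_level`, `control_low_level`), fields, and the occlusion-based grasp-mode rule are hypothetical stand-ins for the VLM planner and the Feature Separation Network described in the abstract.

```python
from dataclasses import dataclass

@dataclass
class SpatialPlan:
    """Hypothetical high-level output: a target box plus tier context."""
    bbox: tuple          # (x_min, y_min, x_max, y_max) in image coordinates
    tier_height: float   # estimated shelf height of the target, in meters
    occluded: bool       # whether the planner judged the target partially hidden

def plan_high_level(views: list, instruction: str) -> SpatialPlan:
    # Stand-in for the VLM planner: parse user intent, reason about
    # occlusion and height relations across camera views, and emit a
    # bounding box for zero-shot segmentation and mask tracking.
    # Returned values are dummies for illustration only.
    return SpatialPlan(bbox=(0.2, 0.3, 0.5, 0.6), tier_height=0.45, occluded=True)

def control_low_level(plan: SpatialPlan) -> dict:
    # Stand-in for the Feature Separation Network: one branch handles the
    # arm's global trajectory, a separate branch selects the hand's grasp
    # mode, so reaching and grasping features do not interfere.
    arm_action = ("approach", plan.tier_height)            # trajectory branch
    hand_mode = "precision" if plan.occluded else "power"  # grasp-mode branch
    return {"arm": arm_action, "hand": hand_mode}

action = control_low_level(plan_high_level(views=[], instruction="grab the mug"))
```

The point of the decoupling is that the arm branch consumes only global spatial context (where to go in the tiered workspace), while the hand branch consumes object geometry (how to close the fingers), mirroring the separation of reaching and grasping objectives the paper reports.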