ArXiv TLDR

MAPRPose: Mask-Aware Proposal and Amodal Refinement for Multi-Object 6D Pose Estimation

arXiv:2604.20650

Yang Luo, Yan Gong, Yongsheng Gao, Xiaoying Sun, Jie Zhao

cs.CV

TLDR

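MAPRPose improves 6D object pose estimation in cluttered, heavily occluded scenes by combining mask-aware pose proposals with amodal refinement. It reaches state-of-the-art accuracy on the BOP benchmark and runs multi-object inference 43x faster than FoundationPose.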

Key contributions

  • Proposes MAPRPose, a two-stage framework for 6D pose estimation using mask-aware proposals and amodal refinement.
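  • The Mask-Aware Pose Proposal (MAPP) stage lifts 2D correspondences into 3D keypoint matches, scores pose hypotheses at the correspondence level, and keeps the top-K candidates.
  • The Amodal Mask Prediction and ROI Re-Alignment (AMPR) module, built into a tensorized render-and-compare refiner, reconstructs the complete object extent and re-aligns the ROI to reduce localization errors under heavy occlusion.
  • Achieves a state-of-the-art Average Recall (AR) of 76.5% on the BOP benchmark, outperforming FoundationPose by 3.1% AR with a 43x speedup in multi-object inference.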
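The proposal stage can be illustrated with a small sketch. The summary does not say exactly how MAPP builds or scores hypotheses, so this is a stand-in under stated assumptions. The 2D matches are assumed to be already lifted to 3D (model point, scene point) pairs using depth. Each hypothesis is fitted with the Kabsch algorithm on a random 3-pair subset, a RANSAC-style scheme. "Correspondence-level scoring" is approximated by counting pairs whose residual falls under a threshold. The names `kabsch` and `propose_poses` and the parameters `num_hyp`, `top_k` and `inlier_thresh` are hypothetical.

```python
import numpy as np


def kabsch(src, dst):
    """Least-squares rotation R and translation t such that dst ≈ src @ R.T + t."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # flip sign to avoid a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def propose_poses(model_pts, scene_pts, num_hyp=200, top_k=5,
                  inlier_thresh=0.01, seed=0):
    """Hypothetical MAPP-style proposal: sample, fit, score, keep the top-K.

    model_pts, scene_pts: (M, 3) arrays; row i of each forms one 3D-3D
    correspondence (ASSUMED already lifted from 2D matches using depth).
    Scoring by inlier count is an ASSUMED stand-in for the paper's scoring.
    """
    rng = np.random.default_rng(seed)
    hyps = []
    for _ in range(num_hyp):
        # Minimal subset: three correspondences determine a rigid pose.
        idx = rng.choice(len(model_pts), size=3, replace=False)
        R, t = kabsch(model_pts[idx], scene_pts[idx])
        residual = np.linalg.norm(model_pts @ R.T + t - scene_pts, axis=1)
        score = int((residual < inlier_thresh).sum())  # inlier count as score
        hyps.append((score, R, t))
    # Sort on the score only (arrays are never compared); Python's sort is stable.
    hyps.sort(key=lambda h: h[0], reverse=True)
    return hyps[:top_k]
```

Keeping K candidates instead of a single winner mirrors the top-K selection described in the abstract: later refinement can still recover if the highest-scoring hypothesis is wrong.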

Why it matters

This paper advances 6D object pose estimation, a core task in robotics and augmented reality, by explicitly handling severe occlusion. Its two-stage approach improves accuracy on the BOP benchmark and speeds up multi-object inference 43x over FoundationPose, which makes it more practical for real-world applications.

Original Abstract

6D object pose estimation in cluttered scenes remains challenging due to severe occlusion and sensor noise. We propose MAPRPose, a two-stage framework that leverages mask-aware correspondences for pose proposal and amodal-driven Region-of-Interest (ROI) prediction for robust refinement. In the Mask-Aware Pose Proposal (MAPP) stage, we lift 2D correspondences into 3D space to establish reliable keypoint matches and generate geometrically consistent pose hypotheses based on correspondence-level scoring, from which the top-$K$ candidates are selected. In the refinement stage, we introduce a tensorized render-and-compare pipeline integrated with an Amodal Mask Prediction and ROI Re-Alignment (AMPR) module. By reconstructing complete object geometry and dynamically adjusting the ROI, AMPR mitigates localization errors and spatial misalignment under heavy occlusion. Furthermore, our GPU-accelerated RGB-XYZ reprojection enables simultaneous refinement of all $N \times B$ pose hypotheses in a single forward pass. Evaluated on the BOP benchmark, MAPRPose achieves a state-of-the-art Average Recall (AR) of 76.5%, outperforming FoundationPose by 3.1% AR while delivering a 43x speedup in multi-object inference.
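The batched refinement idea can be sketched in a few lines of NumPy. This is not the paper's RGB-XYZ renderer. It only shows how stacking N objects × B hypotheses into one (N, B, 4, 4) pose tensor lets a single broadcast call project every model point under every hypothesis. The helper `rois_from_projection` derives a padded box from the projected footprint of the complete model, which also covers occluded parts. AMPR instead predicts an amodal mask, so treat this only as an intuition for amodal ROIs. All function names, the shared point count P and the 10% padding are assumptions.

```python
import numpy as np


def project_all_hypotheses(poses, model_pts, K):
    """Project every model point under every pose hypothesis in one batched call.

    poses:     (N, B, 4, 4) object-to-camera transforms, N objects x B hypotheses
    model_pts: (N, P, 3) model points per object (same P for all: an ASSUMPTION)
    K:         (3, 3) pinhole intrinsics
    returns:   (N, B, P, 2) pixel coordinates
    """
    R = poses[..., :3, :3]                                   # (N, B, 3, 3)
    t = poses[..., :3, 3]                                    # (N, B, 3)
    # Rotate each object's points by all B rotations at once, then translate.
    cam = np.einsum("nbij,npj->nbpi", R, model_pts) + t[:, :, None, :]
    uvw = cam @ K.T                                          # (N, B, P, 3)
    return uvw[..., :2] / uvw[..., 2:3]                      # perspective divide


def rois_from_projection(uv, pad=0.1):
    """Hypothetical ROI: padded box (x0, y0, x1, y1) around each projected footprint.

    Projecting the full model includes occluded parts, a crude stand-in for an
    amodal extent; it is NOT the paper's AMPR prediction.
    """
    lo = uv.min(axis=2)                                      # (N, B, 2)
    hi = uv.max(axis=2)
    margin = pad * (hi - lo)
    return np.concatenate([lo - margin, hi + margin], axis=-1)  # (N, B, 4)
```

For example, with N = 2 objects and B = 3 hypotheses each, `project_all_hypotheses` returns an array of shape (2, 3, P, 2) and `rois_from_projection` a (2, 3, 4) array of boxes, with no Python loop over hypotheses.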
