ArXiv TLDR

PASR: Pose-Aware 3D Shape Retrieval from Occluded Single Views

arXiv: 2604.22658

Jiaxin Shi, Guofeng Zhang, Wufei Ma, Naifu Liang, Adam Kortylewski + 1 more

cs.CV

TLDR

PASR introduces a pose-aware analysis-by-synthesis framework for robust 3D shape retrieval from occluded single views, outperforming prior methods.

Key contributions

  • Proposes PASR, a novel analysis-by-synthesis framework for 3D shape retrieval from single views.
  • Distills knowledge from DINOv3 into a 3D encoder, aligning pose-conditioned 3D projections with 2D features.
  • Employs test-time optimization to jointly search for shape and pose, making retrieval robust to partial occlusion and sensitive to fine-grained geometric detail.
  • Achieves state-of-the-art performance on clean and occluded datasets, with multi-task capabilities.

Why it matters

PASR significantly improves 3D shape retrieval from occluded single views. Its analysis-by-synthesis approach, leveraging 2D foundation models, offers superior robustness, interpretability, and multi-task capabilities for real-world applications.

Original Abstract

Single-view 3D shape retrieval is a fundamental yet challenging task that is increasingly important with the growth of available 3D data. Existing approaches largely fall into two categories: those using contrastive learning to map point cloud features into existing vision-language spaces and those that learn a common embedding space for 2D images and 3D shapes. However, these feed-forward, holistic alignments are often difficult to interpret, which in turn limits their robustness and generalization to real-world applications. To address this problem, we propose Pose-Aware 3D Shape Retrieval (PASR), a framework that formulates retrieval as a feature-level analysis-by-synthesis problem by distilling knowledge from a 2D foundation model (DINOv3) into a 3D encoder. By aligning pose-conditioned 3D projections with 2D feature maps, our method bridges the gap between real-world images and synthetic meshes. During inference, PASR performs a test-time optimization via analysis-by-synthesis, jointly searching for the shape and pose that best reconstruct the patch-level feature map of the input image. This synthesis-based optimization is inherently robust to partial occlusion and sensitive to fine-grained geometric details. PASR substantially outperforms existing methods on both clean and occluded 3D shape retrieval datasets by a wide margin. Additionally, PASR demonstrates strong multi-task capabilities, achieving robust shape retrieval, competitive pose estimation, and accurate category classification within a single framework.
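The test-time analysis-by-synthesis loop described in the abstract can be sketched in a few lines: for each candidate shape, score pose-conditioned feature projections against the query image's patch-level features and keep the best (shape, pose) pair, ignoring occluded patches. This is an illustrative toy, not the paper's implementation: the function name, the precomputed `shape_feat_banks` (standing in for rendering pose-conditioned 3D features), and the discrete pose grid are all assumptions made for clarity.

```python
import numpy as np

def retrieve_shape_and_pose(img_feats, shape_feat_banks, visible_mask):
    """Toy feature-level analysis-by-synthesis retrieval.

    img_feats:        (P, D) patch-level features of the query image.
    shape_feat_banks: dict shape_id -> (n_poses, P, D) pose-conditioned
                      projected 3D features (precomputed here; a stand-in
                      for rendering the 3D encoder's features per pose).
    visible_mask:     (P,) bool, False for occluded patches.
    Returns (best_shape_id, best_pose_idx).
    """
    best = (None, None, -np.inf)
    # L2-normalize so the patch-wise dot product is a cosine similarity.
    q = img_feats / np.linalg.norm(img_feats, axis=-1, keepdims=True)
    for sid, bank in shape_feat_banks.items():
        b = bank / np.linalg.norm(bank, axis=-1, keepdims=True)
        # Mean cosine similarity per pose, computed over visible patches
        # only -- occluded patches simply drop out of the objective.
        sims = (b * q).sum(-1)[:, visible_mask].mean(-1)
        pose = int(sims.argmax())
        if sims[pose] > best[2]:
            best = (sid, pose, sims[pose])
    return best[0], best[1]
```

Masking occluded patches out of the objective is what makes this kind of synthesis-based scoring naturally robust to partial occlusion; the paper's actual method optimizes pose continuously rather than over a fixed grid.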
