AnchorD: Metric Grounding of Monocular Depth Using Factor Graphs
Simon Dorer, Martin Büchner, Nick Heppert, Abhinav Valada
TLDR
AnchorD grounds monocular depth predictions in metric units using factor graphs, improving accuracy for robotics without training.
Key contributions
- Proposes AnchorD, a training-free framework for metric grounding of monocular depth.
- Utilizes factor graph optimization for patch-wise affine alignment of depth priors.
- Preserves fine-grained geometric structure and discontinuities in depth maps.
- Introduces a new benchmark dataset with dense ground truth for non-Lambertian objects.
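The patch-wise affine alignment at the core of these contributions can be illustrated with a minimal sketch: split the image into patches and, for each patch with enough valid sensor measurements, fit a scale and shift mapping the monocular prediction to metric depth by least squares. This is a simplified stand-in, not the paper's implementation — the actual method couples patches through a factor graph, and the function name, patch size, and validity threshold here are assumptions.

```python
import numpy as np

def ground_depth_patchwise(mono, sensor, valid, patch=32):
    """Patch-wise affine grounding of a relative monocular depth map in
    sparse metric sensor depth (simplified sketch: each patch is fit
    independently, unlike the paper's jointly optimized factor graph)."""
    H, W = mono.shape
    out = mono.copy()
    for y in range(0, H, patch):
        for x in range(0, W, patch):
            m = mono[y:y + patch, x:x + patch]
            s = sensor[y:y + patch, x:x + patch]
            v = valid[y:y + patch, x:x + patch]
            if v.sum() < 2:
                continue  # too few metric anchors; a factor graph would borrow from neighbors
            # solve min_{a,b} || a * m + b - s ||^2 over valid pixels
            A = np.stack([m[v], np.ones(v.sum())], axis=1)
            (a, b), *_ = np.linalg.lstsq(A, s[v], rcond=None)
            out[y:y + patch, x:x + patch] = a * m + b
    return out
```

Because the alignment is affine rather than a single global scale, it can absorb both the scale ambiguity and an offset of the monocular prior, while leaving the prediction's fine structure within each patch untouched.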
Why it matters
This paper addresses a critical challenge in robotics: accurately scaling monocular depth predictions to metric units. By anchoring these predictions in raw sensor depth without any training, the method yields consistent depth improvements, especially on transparent, specular, and other non-Lambertian surfaces where conventional depth sensors fail. This benefits robotic manipulation, grasping, and navigation without sensor-specific retraining.

Original Abstract
Dense and accurate depth estimation is essential for robotic manipulation, grasping, and navigation, yet currently available depth sensors are prone to errors on transparent, specular, and general non-Lambertian surfaces. To mitigate these errors, large-scale monocular depth estimation approaches provide strong structural priors, but their predictions can be potentially skewed or mis-scaled in metric units, limiting their direct use in robotics. Thus, in this work, we propose a training-free depth grounding framework that anchors monocular depth estimation priors from a depth foundation model in raw sensor depth through factor graph optimization. Our method performs a patch-wise affine alignment, locally grounding monocular predictions in metric real-world depth while preserving fine-grained geometric structure and discontinuities. To facilitate evaluation in challenging real-world conditions, we introduce a benchmark dataset with dense scene-wide ground truth depth in the presence of non-Lambertian objects. Ground truth is obtained via matte reflection spray and multi-camera fusion, overcoming the reliance on object-only CAD-based annotations used in prior datasets. Extensive evaluations across diverse sensors and domains demonstrate consistent improvements in depth performance without any (re-)training. We make our implementation publicly available at https://anchord.cs.uni-freiburg.de.
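The factor-graph aspect described in the abstract can be sketched as a joint linear least-squares problem: per-patch affine variables (scale, shift) are tied to valid sensor depth by data factors and to neighboring patches by smoothness factors, so patches with few metric anchors inherit plausible parameters from their neighbors. The sketch below is a dense toy version under assumed quadratic factors and an assumed smoothness weight; the paper's solver and factor design may differ.

```python
import numpy as np

def factor_graph_affine(mono_patches, sensor_patches, valid_patches, grid, lam=1.0):
    """Jointly fit per-patch (scale, shift) with data factors
    || a_i * m + b_i - s ||^2 and smoothness factors
    lam * ((a_i - a_j)^2 + (b_i - b_j)^2) between grid neighbors.
    Dense toy stand-in for a sparse factor-graph solver."""
    P = len(mono_patches)
    H = np.zeros((2 * P, 2 * P))   # normal-equation matrix
    g = np.zeros(2 * P)            # right-hand side
    # data factors over the valid (metric) pixels of each patch
    for i in range(P):
        v = valid_patches[i]
        A = np.stack([mono_patches[i][v], np.ones(v.sum())], axis=1)
        H[2 * i:2 * i + 2, 2 * i:2 * i + 2] += A.T @ A
        g[2 * i:2 * i + 2] += A.T @ sensor_patches[i][v]
    # smoothness factors between horizontally/vertically adjacent patches
    rows, cols = grid
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            neighbors = ([i + 1] if c + 1 < cols else []) + ([i + cols] if r + 1 < rows else [])
            for j in neighbors:
                for k in (0, 1):  # k=0: scale, k=1: shift
                    H[2 * i + k, 2 * i + k] += lam
                    H[2 * j + k, 2 * j + k] += lam
                    H[2 * i + k, 2 * j + k] -= lam
                    H[2 * j + k, 2 * i + k] -= lam
    ab = np.linalg.solve(H, g)
    return ab.reshape(P, 2)        # per-patch (scale, shift)
```

The smoothness factors regularize patches toward their neighbors without forcing a single global affine fit, which is how local grounding can coexist with preserved depth discontinuities between patches.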