ArXiv TLDR

Asset Harvester: Extracting 3D Assets from Autonomous Driving Logs for Simulation

arXiv:2604.18468

Tianshi Cao, Jiawei Ren, Yuxuan Zhang, Jaewoo Seo, Jiahui Huang + 10 more

cs.CV, cs.AI, cs.GR, cs.LG

TLDR

Asset Harvester extracts complete 3D object assets from sparse autonomous driving logs, enabling scalable simulation for AV development.

Key contributions

  • Converts sparse, in-the-wild object observations from AV logs into complete, simulation-ready 3D assets.
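  • System-level design combines large-scale curation of object-centric training tuples, geometry-aware preprocessing across heterogeneous sensors, and a training recipe that couples sparse-view-conditioned multiview generation with 3D Gaussian lifting.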
  • Introduces SparseViewDiT, designed to handle limited-angle views and real-world AV data challenges.
  • Uses hybrid data curation, augmentation, and self-distillation to make the conversion into reusable 3D assets scale.

Why it matters

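Neural scene reconstruction turns driving logs into interactive 3D environments, but it does not produce the complete 3D object assets that simulation requires. Asset Harvester fills this gap by building reusable, simulation-ready assets from sparse object observations in real driving logs. Complete assets are needed for agent manipulation and large-viewpoint novel-view synthesis, both of which matter for scalable closed-loop AV testing and development.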

Original Abstract

Closed-loop simulation is a core component of autonomous vehicle (AV) development, enabling scalable testing, training, and safety validation before real-world deployment. Neural scene reconstruction converts driving logs into interactive 3D environments for simulation, but it does not produce complete 3D object assets required for agent manipulation and large-viewpoint novel-view synthesis. To address this challenge, we present Asset Harvester, an image-to-3D model and end-to-end pipeline that converts sparse, in-the-wild object observations from real driving logs into complete, simulation-ready assets. Rather than relying on a single model component, we developed a system-level design for real-world AV data that combines large-scale curation of object-centric training tuples, geometry-aware preprocessing across heterogeneous sensors, and a robust training recipe that couples sparse-view-conditioned multiview generation with 3D Gaussian lifting. Within this system, SparseViewDiT is explicitly designed to address limited-angle views and other real-world data challenges. Together with hybrid data curation, augmentation, and self-distillation, this system enables scalable conversion of sparse AV object observations into reusable 3D assets.
