ArXiv TLDR

neuralCAD-Edit: An Expert Benchmark for Multimodal-Instructed 3D CAD Model Editing

arXiv: 2604.16170

Toby Perrett, Matthew Bouchard, William McCarthy

cs.CV, cs.CE

TLDR

neuralCAD-Edit is a new benchmark for 3D CAD model editing, using multimodal expert instructions to evaluate foundation models against human performance.

Key contributions

  • Introduces neuralCAD-Edit, the first expert-collected benchmark for 3D CAD model editing.
  • Collects realistic multimodal editing requests (video, speech, gestures) from professional designers.
  • Benchmarks leading foundation models against human experts, revealing a large performance gap.
  • Shows that even the best model tested (GPT 5.2) scores 53% lower (absolute) than human experts in acceptance trials.

Why it matters

This paper fills a gap in realistic, multimodal evaluation of 3D CAD editing and exposes how far current foundation models lag behind human experts, providing a solid basis for developing AI approaches to complex CAD design.

Original Abstract

We introduce neuralCAD-Edit, the first benchmark for editing 3D CAD models collected from expert CAD engineers. Instead of text conditioning as in prior works, we collect realistic CAD editing requests by capturing videos of professional designers interacting directly with CAD models in CAD software while talking, pointing and drawing. We recruited ten consenting designers to contribute to this contained study. We benchmark leading foundation models against human CAD experts carrying out edits, and find a large performance gap in both automatic metrics and human evaluations. Even the best foundation model (GPT 5.2) scores 53% lower (absolute) than CAD experts in human acceptance trials, demonstrating the challenge of neuralCAD-Edit. We hope neuralCAD-Edit will provide a solid foundation against which 3D CAD editing approaches and foundation models can be developed. Code/data: https://autodeskailab.github.io/neuralCAD-Edit
