Towards Highly-Constrained Human Motion Generation with Retrieval-Guided Diffusion Noise Optimization
Hanchao Liu, Fang-Lue Zhang, Shining Zhang, Tai-Jiang Mu, Shi-Min Hu
TLDR
This paper introduces a retrieval-guided diffusion noise optimization method to generate human motion under highly challenging spatiotemporal constraints.
Key contributions
- Solves highly constrained human motion generation tasks, such as navigating around severe spatial obstacles or walking a specified number of steps.
- Uses a retrieval-guided, training-free diffusion noise optimization framework.
- Employs relational task parsing to identify and handle difficult constraints with retrieved references.
- Improves diffusion noise initialization using a reward-guided mask combining random and retrieved noise.
Why it matters
This work targets controllable character animation and behavior synthesis for virtual agents. Current approaches handle many unseen constraints but fail under very challenging spatiotemporal restrictions. Retrieving guidance from large motion datasets lets a training-free motion generator satisfy those restrictions. With an LLM performing the relational task parsing, the framework can also reason automatically about what to retrieve.
Original Abstract
Generating human motion that satisfies customized zero-shot goal functions, enabling applications such as controllable character animation and behavior synthesis for virtual agents, is a critical capability. While current approaches handle many unseen constraints, they fail on tasks with very challenging spatiotemporal restrictions, such as severe spatial obstacles or specified numbers of walking steps. To equip motion generators for these highly constrained tasks, we present a retrieval-guided method built on the training-free diffusion noise optimization framework. The key idea is to search within large motion datasets for guidance that can potentially satisfy difficult constraints. We introduce relational task parsing to group target constraints and identify the difficult ones to be handled by retrieved reference. A better initialization for diffusion noise is then obtained via a reward-guided mask that combines random noise with retrieved noise. By optimizing diffusion noise from this improved initialization, we successfully solve highly constrained generation tasks. By leveraging LLM for relational task parsing, the whole framework is further enabled to automatically reason for what to retrieve, improving the intelligence of moving agents under a training-free optimization scheme.
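The abstract describes two steps: build an initial noise by masking between random and retrieved noise, then optimize that noise against the goal function. The toy sketch below illustrates that flow only and is not the authors' implementation. Everything in it is an assumption made for illustration: `decode` is a `tanh` stand-in for a frozen diffusion sampler, the per-frame squared error stands in for the goal function, and the mask rule (take the retrieved noise on frames where it scores better) is a hypothetical reading of "reward-guided mask".

```python
import numpy as np

def decode(z):
    """Toy stand-in for a frozen diffusion sampler (assumption): noise -> motion."""
    return np.tanh(z)

def frame_cost(z, target):
    """Per-frame constraint violation of the decoded motion, shape (T,)."""
    return np.sum((decode(z) - target) ** 2, axis=1)

def blend_init(z_rand, z_ret, target):
    """Hypothetical mask rule: use retrieved noise on frames where it costs less."""
    cost_rand = frame_cost(z_rand, target)
    cost_ret = frame_cost(z_ret, target)
    mask = (cost_ret < cost_rand).astype(z_rand.dtype)[:, None]  # shape (T, 1)
    return mask * z_ret + (1.0 - mask) * z_rand, mask

def optimize_noise(z_init, target, steps=200, lr=0.1):
    """Plain gradient descent on the noise; the decoder itself is never trained."""
    z = z_init.copy()
    for _ in range(steps):
        x = decode(z)
        # Chain rule: d/dz (tanh(z) - target)^2 = 2 (x - target) (1 - x^2)
        grad = 2.0 * (x - target) * (1.0 - x ** 2)
        z -= lr * grad
    return z

# Toy data: T frames, D features per frame (all values are made up).
rng = np.random.default_rng(0)
T, D = 8, 3
target = rng.uniform(-0.8, 0.8, size=(T, D))
z_rand = rng.standard_normal((T, D))   # fresh random noise
z_ret = rng.standard_normal((T, D))    # placeholder for noise of a retrieved motion
z_init, mask = blend_init(z_rand, z_ret, target)
z_opt = optimize_noise(z_init, target)
```

Because this toy decoder treats frames independently, the blended initialization can never cost more than either source on its own. A real sampler couples frames, so a per-frame mask would not carry that guarantee.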