Towards Unconstrained Human-Object Interaction

April 15, 2026 · arXiv:2604.14069

Francesco Tonini, Alessandro Conti, Lorenzo Vaquero, Cigdem Beyan, Elisa Ricci

cs.CV

TLDR

This paper introduces Unconstrained Human-Object Interaction (U-HOI) detection, a task in which Multimodal Large Language Models (MLLMs) detect human-object interactions without relying on a predefined interaction vocabulary.

Key contributions

Defines Unconstrained HOI (U-HOI), removing the need for predefined interaction vocabularies.
Applies Multimodal Large Language Models (MLLMs) to enable in-the-wild HOI detection.
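Introduces a U-HOI pipeline that combines test-time inference with language-to-graph conversion to extract structured interactions from free-form text.
Evaluates a range of MLLMs on U-HOI, highlighting both the limitations of current HOI detectors and the value of MLLMs for this setting.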
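
The language-to-graph step can be pictured with a minimal sketch. This is not the authors' implementation. It assumes, hypothetically, that the MLLM has been prompted to write one interaction per line in the form "subject | interaction | object", and the names Triplet, text_to_triplets and triplets_to_graph are illustrative only.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    subject: str
    predicate: str
    obj: str


# Hypothetical output format assumed for illustration: "<subject> | <interaction> | <object>".
LINE = re.compile(r"^\s*(?P<s>[^|]+?)\s*\|\s*(?P<p>[^|]+?)\s*\|\s*(?P<o>[^|]+?)\s*$")


def text_to_triplets(text: str) -> list[Triplet]:
    """Extract (subject, interaction, object) triplets from free-form MLLM text."""
    triplets = []
    for line in text.splitlines():
        match = LINE.match(line)
        if match is None:
            continue  # skip surrounding prose that does not follow the assumed format
        # Interaction labels stay free-form strings: no mapping to a fixed class list.
        triplets.append(Triplet(match["s"].lower(), match["p"].lower(), match["o"].lower()))
    return triplets


def triplets_to_graph(triplets: list[Triplet]) -> dict[str, list[tuple[str, str]]]:
    """Build an adjacency list: subject -> [(interaction, object), ...]."""
    graph: dict[str, list[tuple[str, str]]] = {}
    for t in triplets:
        graph.setdefault(t.subject, []).append((t.predicate, t.obj))
    return graph


if __name__ == "__main__":
    reply = "Here is what I see:\nperson | rides | bicycle\nperson | wears | helmet\n"
    print(triplets_to_graph(text_to_triplets(reply)))
```

Because interaction labels are kept as raw strings rather than matched against a class list, a verb such as "rides" survives even if no training vocabulary contains it, which is the property an unconstrained setting depends on.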

Why it matters

Current HOI models depend on a fixed vocabulary of interactions at both training and inference time, which confines them to static environments. By removing that requirement, U-HOI lets models recognize interactions outside any predefined list, a step toward more flexible and robust computer vision systems for dynamic, real-world settings.

Original Abstract

Human-Object Interaction (HOI) detection is a longstanding computer vision problem concerned with predicting the interaction between humans and objects. Current HOI models rely on a vocabulary of interactions at training and inference time, limiting their applicability to static environments. With the advent of Multimodal Large Language Models (MLLMs), it has become feasible to explore more flexible paradigms for interaction recognition. In this work, we revisit HOI detection through the lens of MLLMs and apply them to in-the-wild HOI detection. We define the Unconstrained HOI (U-HOI) task, a novel HOI domain that removes the requirement for a predefined list of interactions at both training and inference. We evaluate a range of MLLMs on this setting and introduce a pipeline that includes test-time inference and language-to-graph conversion to extract structured interactions from free-form text. Our findings highlight the limitations of current HOI detectors and the value of MLLMs for U-HOI. Code will be available at https://github.com/francescotonini/anyhoi
