GAViD: A Large-Scale Multimodal Dataset for Context-Aware Group Affect Recognition from Videos
Deepak Kumar, Abhishek Pratap Singh, Puneet Kumar, Xiaobai Li, Balasubramanian Raman
TLDR
GAViD is a new large-scale multimodal dataset of 5091 video clips for group affect recognition, paired with CAGNet, a context-aware baseline model.
Key contributions
- Introduces GAViD, a large-scale multimodal dataset with 5091 video clips for group affect recognition.
- GAViD includes video, audio, and context, annotated with ternary valence, discrete emotion labels, and human-annotated action cues (see the record sketch after this list).
- Presents CAGNet, a novel network for context-aware group affect recognition, achieving 63.20% test accuracy on GAViD (a minimal fusion sketch follows the abstract below).
- Dataset and code are publicly available to foster research in context-aware group affect.
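To make the annotation layout concrete, here is a minimal sketch of what one GAViD record could look like in Python. The field names, label encoding, and example values are illustrative assumptions rather than the paper's published schema; the released dataset on GitHub defines the actual format.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical layout for one GAViD clip record. Field names and label
# vocabularies are assumptions for illustration, not the official schema.
@dataclass
class GAViDSample:
    clip_path: str              # path to the video clip (frames + audio track)
    valence: int                # ternary valence: -1 negative, 0 neutral, +1 positive (assumed encoding)
    emotion: str                # discrete group emotion label
    context: str                # VideoGPT-generated contextual metadata (free text)
    action_cues: List[str] = field(default_factory=list)  # human-annotated action cues

sample = GAViDSample(
    clip_path="clips/00042.mp4",
    valence=1,
    emotion="joy",
    context="A group of friends celebrating at an outdoor gathering.",
    action_cues=["clapping", "smiling"],
)
print(sample.valence, sample.emotion)
```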
Why it matters
This paper addresses the critical lack of large-scale, context-rich datasets for group affect recognition. GAViD and CAGNet provide essential resources and a strong baseline for research on complex human-human interactions, a capability needed for building socially aware AI systems.
Original Abstract
Understanding affective dynamics in real-world social systems is fundamental to modeling and analyzing human-human interactions in complex environments. Group affect emerges from intertwined human-human interactions, contextual influences, and behavioral cues, making its quantitative modeling a challenging computational social systems problem. However, computational modeling of group affect in in-the-wild scenarios remains challenging due to limited large-scale annotated datasets and the inherent complexity of multimodal social interactions shaped by contextual and behavioral variability. The lack of comprehensive datasets annotated with multimodal and contextual information further limits advances in the field. To address this, we introduce the Group Affect from ViDeos (GAViD) dataset, comprising 5091 video clips with multimodal data (video, audio and context), annotated with ternary valence and discrete emotion labels and enriched with VideoGPT-generated contextual metadata and human-annotated action cues. We also present Context-Aware Group Affect Recognition Network (CAGNet) for multimodal context-aware group affect recognition. CAGNet achieves 63.20% test accuracy on GAViD, comparable to state-of-the-art performance. The dataset and code are available at github.com/deepakkumar-iitr/GAViD.
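For a concrete picture of what multimodal, context-aware group affect recognition involves, below is a minimal late-fusion sketch in PyTorch. The encoder dimensions, projection layers, and concatenation-based fusion are assumptions for illustration only and do not reproduce CAGNet's actual architecture, which is specified in the paper and the linked repository.

```python
import torch
import torch.nn as nn

class MultimodalGroupAffectNet(nn.Module):
    """Minimal late-fusion sketch: per-modality encoders -> concat -> classifier.
    Dimensions and fusion scheme are illustrative assumptions, not CAGNet's spec."""
    def __init__(self, d_video=512, d_audio=128, d_text=384, d_hidden=256, n_classes=3):
        super().__init__()
        # Each projection stands in for a pretrained per-modality encoder's output head.
        self.video_proj = nn.Linear(d_video, d_hidden)
        self.audio_proj = nn.Linear(d_audio, d_hidden)
        self.text_proj = nn.Linear(d_text, d_hidden)   # encodes contextual metadata
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(3 * d_hidden, n_classes),        # 3 classes ~ ternary valence
        )

    def forward(self, video_feat, audio_feat, text_feat):
        # Concatenate the projected modality features, then classify.
        fused = torch.cat([
            self.video_proj(video_feat),
            self.audio_proj(audio_feat),
            self.text_proj(text_feat),
        ], dim=-1)
        return self.classifier(fused)

# Toy usage with random features standing in for real encoder outputs.
model = MultimodalGroupAffectNet()
logits = model(torch.randn(2, 512), torch.randn(2, 128), torch.randn(2, 384))
print(logits.shape)  # torch.Size([2, 3])
```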