GAViD: A Large-Scale Multimodal Dataset for Context-Aware Group Affect Recognition from Videos
Deepak Kumar, Abhishek Pratap Singh, Puneet Kumar, Xiaobai Li, Balasubramanian Raman
TLDR
GAViD is a new large-scale multimodal dataset of 5091 video clips for group affect recognition, paired with CAGNet, a context-aware baseline model.
Key contributions
- Introduces GAViD, a large-scale multimodal dataset with 5091 video clips for group affect recognition.
- GAViD includes video, audio, and context, annotated with ternary valence, discrete emotion labels, and human-annotated action cues (see the record sketch after this list).
- Presents CAGNet, a novel network for context-aware group affect recognition, achieving 63.20% test accuracy on GAViD (a minimal fusion sketch follows the abstract below).
- Dataset and code are publicly available to foster research in context-aware group affect.
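To make the annotation layout concrete, here is a minimal sketch of what one GAViD record could look like in Python. The field names, label encoding, and example values are illustrative assumptions rather than the paper's published schema; the released dataset on GitHub defines the actual format.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical layout for one GAViD clip record. Field names and label
# vocabularies are assumptions for illustration, not the official schema.
@dataclass
class GAViDSample:
    clip_path: str              # path to the video clip (frames + audio track)
    valence: int                # ternary valence: -1 negative, 0 neutral, +1 positive (assumed encoding)
    emotion: str                # discrete group emotion label
    context: str                # VideoGPT-generated contextual metadata (free text)
    action_cues: List[str] = field(default_factory=list)  # human-annotated action cues

sample = GAViDSample(
    clip_path="clips/00042.mp4",
    valence=1,
    emotion="joy",
    context="A group of friends celebrating at an outdoor gathering.",
    action_cues=["clapping", "smiling"],
)
print(sample.valence, sample.emotion)
```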
Why it matters
This paper addresses the critical lack of large-scale, context-rich datasets for group affect recognition. GAViD and CAGNet provide essential resources and a strong baseline for research on complex human-human interactions, a capability needed for building socially aware AI systems.
Original Abstract
Understanding affective dynamics in real-world social systems is fundamental to modeling and analyzing human-human interactions in complex environments. Group affect emerges from intertwined human-human interactions, contextual influences, and behavioral cues, making its quantitative modeling a challenging computational social systems problem. However, computational modeling of group affect in in-the-wild scenarios remains challenging due to limited large-scale annotated datasets and the inherent complexity of multimodal social interactions shaped by contextual and behavioral variability. The lack of comprehensive datasets annotated with multimodal and contextual information further limits advances in the field. To address this, we introduce the Group Affect from ViDeos (GAViD) dataset, comprising 5091 video clips with multimodal data (video, audio and context), annotated with ternary valence and discrete emotion labels and enriched with VideoGPT-generated contextual metadata and human-annotated action cues. We also present Context-Aware Group Affect Recognition Network (CAGNet) for multimodal context-aware group affect recognition. CAGNet achieves 63.20% test accuracy on GAViD, comparable to state-of-the-art performance. The dataset and code are available at github.com/deepakkumar-iitr/GAViD.
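For a concrete picture of what multimodal, context-aware group affect recognition involves, below is a minimal late-fusion sketch in PyTorch. The encoder dimensions, projection layers, and concatenation-based fusion are assumptions for illustration only and do not reproduce CAGNet's actual architecture, which is specified in the paper and the linked repository.

```python
import torch
import torch.nn as nn

class MultimodalGroupAffectNet(nn.Module):
    """Minimal late-fusion sketch: per-modality encoders -> concat -> classifier.
    Dimensions and fusion scheme are illustrative assumptions, not CAGNet's spec."""
    def __init__(self, d_video=512, d_audio=128, d_text=384, d_hidden=256, n_classes=3):
        super().__init__()
        # Each projection stands in for a pretrained per-modality encoder's output head.
        self.video_proj = nn.Linear(d_video, d_hidden)
        self.audio_proj = nn.Linear(d_audio, d_hidden)
        self.text_proj = nn.Linear(d_text, d_hidden)   # encodes contextual metadata
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(3 * d_hidden, n_classes),        # 3 classes ~ ternary valence
        )

    def forward(self, video_feat, audio_feat, text_feat):
        # Concatenate the projected modality features, then classify.
        fused = torch.cat([
            self.video_proj(video_feat),
            self.audio_proj(audio_feat),
            self.text_proj(text_feat),
        ], dim=-1)
        return self.classifier(fused)

# Toy usage with random features standing in for real encoder outputs.
model = MultimodalGroupAffectNet()
logits = model(torch.randn(2, 512), torch.randn(2, 128), torch.randn(2, 384))
print(logits.shape)  # torch.Size([2, 3])
```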