Zeitgeist-Aware Multimodal (ZAM) Datasets of Pro-Eating Disorder Short-Form Videos: An Idea Worth Researching
Eden Shaveet, Zefan Sramek, Yumi Hamamoto, Jing Du, Scott Griffiths + 6 more
TLDR
This paper proposes Zeitgeist-Aware Multimodal (ZAM) datasets to improve real-time detection of evolving pro-eating disorder content in short-form videos.
Key contributions
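- Identifies two limitations of current pro-ED content detection: methods rely predominantly on text-based signals, and they struggle to keep pace with rapidly evolving references, memes, and terminology.
- Proposes "Zeitgeist-Aware Multimodal (ZAM) datasets": continuously curated, expert-annotated collections of multimodal pro-ED content intended to support real-time research and detection model training.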
- ZAM datasets' inclusion criteria continuously evolve with the "memetic zeitgeist" of online culture.
- Outlines the rationale, characteristics, and curation approaches for these dynamic multimodal datasets.
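To make the idea of evolving inclusion criteria concrete, the following is a minimal sketch of one way a record in such a dataset might tie each expert annotation to a dated version of the criteria. It is not the authors' schema: every class, field, and function name here (`CriteriaVersion`, `VideoRecord`, `needs_review`), the cue labels, and the dates are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class CriteriaVersion:
    """One dated snapshot of the inclusion criteria (hypothetical structure)."""
    version: int
    effective: date
    cues: frozenset  # placeholder cue labels; real criteria would come from experts


@dataclass
class Annotation:
    annotator_id: str
    criteria_version: int  # which criteria snapshot the annotator applied
    meets_criteria: bool


@dataclass
class VideoRecord:
    video_id: str
    posted: date
    # One entry per modality, so text is not the only signal that gets stored.
    modalities: dict
    annotations: list = field(default_factory=list)


def needs_review(record, latest):
    """Return True if no annotation was made under the latest criteria version."""
    return all(a.criteria_version < latest.version for a in record.annotations)


# Illustrative usage with placeholder values only.
v1 = CriteriaVersion(1, date(2025, 1, 1), frozenset({"cue_a"}))
v2 = CriteriaVersion(2, date(2025, 6, 1), frozenset({"cue_a", "cue_b"}))

rec = VideoRecord(
    video_id="vid-001",
    posted=date(2025, 2, 1),
    modalities={"transcript": "...", "audio": None, "frames": []},
)
rec.annotations.append(Annotation("expert-1", criteria_version=1, meets_criteria=True))

print(needs_review(rec, v1))
print(needs_review(rec, v2))
```

Storing the criteria version alongside each label means that when the criteria change, older records can be flagged for re-annotation instead of being silently treated as current.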
Why it matters
This paper addresses a gap in detecting harmful pro-eating disorder content online: the lack of an expert-annotated, multimodal reference standard that keeps pace with changing online culture. It outlines a dataset and pipeline design that researchers could use to study how pro-ED sentiment is encoded and transmitted in short-form video over time, and that may support more responsive moderation efforts.
Original Abstract
Objective: Reliable identification of pro-eating disorder (pro-ED) content online suffers from two pervasive problems: 1) existing methods predominantly rely on text-based signals, failing to capture the inherently multimodal nature of multimedia content; and 2) these methods struggle to keep pace with the rapid evolution of references, memes, terminology, and contextual cues that underlie this content. Together, these limitations point to a gap: the absence of an expert-annotated reference standard capable of supporting real-time research and robust multimodal detection model training for pro-ED content on short-form video platforms. Method: To address this, we propose "zeitgeist-aware" multimodal (ZAM) datasets: continuously curated collections of annotated multimodal pro-ED content with inclusion criteria that evolve alongside the memetic zeitgeist: the variable essence of what is considered pro-ED as new media and references come into the cultural zeitgeist and are absorbed and interpreted in online spaces. Results: We present a rationale for such datasets, define their core characteristics, outline approaches for their curation, and describe our progress toward that end. Discussion: This dataset and pipeline architecture may benefit researchers across several fields who are interested in how pro-ED sentiment is encoded and transmitted through short-form video content across time, including for the purpose of responsive moderation efforts.