ArXiv TLDR

SCENE: Recognizing Social Norms and Sanctioning in Group Chats

2605.07823

Mateusz Jacniacki, Maksymilian Bilski

cs.CL

TLDR

SCENE is a new benchmark for evaluating LLMs' ability to recognize and adapt to implicit social norms and sanctions in group chats.

Key contributions

  • Introduces SCENE, a benchmark for LLM social norm recognition and adaptation in group chats.
  • Generates non-roleplay scenarios in which scripted personas follow a hidden norm, give the subject agent opportunities to violate it, and sanction breaches when they occur.
  • Proposes behavioral metrics for two adaptation abilities: responsiveness to negative sanctioning and adopting norms from peers' behavior.
  • Finds that, among six evaluated models, Claude Opus 4.7 and Gemini 3.1 Pro adapt to implicit norms significantly more than the evaluated open-weight models.

Why it matters

This paper addresses a gap in LLM evaluation: whether LLM-based agents recognize and adapt to implicit social norms. Its benchmark and behavioral metrics assess how models respond to group dynamics such as social sanctioning, which is essential for developing more socially intelligent AI agents in human-centric online spaces.

Original Abstract

Online group chats are social spaces with implicit behavior patterns that, when broken, are often met with social sanctioning from the group. The ability and willingness of LLM-based agents to recognize and adapt to these norms remains mostly unexplored. We introduce SCENE, a social-interaction benchmark focused on implicit norms and social sanctioning in multi-party chat. SCENE generates plausible non-roleplay scenarios with scripted personas that follow a hidden norm, create opportunities for the subject agent to violate it, and sanction breaches when they occur. We further propose behavioral evaluation metrics for two functional adaptation abilities: responsiveness to negative sanctioning, and adapting norm from peers behavior. We evaluate six frontier and open-weight models on SCENE. Our results show that Claude Opus 4.7 and Gemini 3.1 Pro adapt to implicit norms significantly more than the evaluated open-weight models. SCENE contributes one benchmark in the direction of recent calls for dynamic, interactional evaluation of LLM social capabilities.
