
SemEval-2026 Task 4: Narrative Story Similarity and Narrative Representation Learning

arXiv: 2604.21782

Hans Ole Hatzel, Ekaterina Artemova, Haimo Paul Stiemer, Evelyn Gius, Chris Biemann

cs.CL

TLDR

SemEval-2026 Task 4 introduces NSNRL, a shared task on narrative story similarity and narrative representation learning, built around a novel, theory-compatible definition of narrative similarity.

Key contributions

  • Introduces SemEval-2026 Task 4 (NSNRL) for narrative story similarity and representation learning.
  • Operationalizes narrative similarity as a binary classification problem: deciding which of two stories is more similar to an anchor story, under a novel definition compatible with both narrative theory and intuitive judgment (see the sketch after this list).
  • Built a dataset of more than 1,000 story summary triples with at least two annotations each, every annotation backed by at least two annotators in agreement.
  • Analyzed 71 submissions from 46 teams: LLM ensembles dominate the top of the classification track, while pre- and post-processing on pretrained embedding models performs roughly on par with custom fine-tuned solutions in the embedding track.
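For intuition, here is a minimal sketch of the triple-based classification setup using an off-the-shelf embedding model. The model name and the toy stories are illustrative assumptions, not the task's official baseline:

```python
# Minimal sketch of the anchor-vs-two-candidates setup described above.
# NOT the organizers' baseline; model choice and stories are assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # any pretrained embedding model

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def more_similar(anchor: str, story_a: str, story_b: str) -> str:
    """Return which candidate summary is closer to the anchor,
    judged purely by embedding cosine similarity."""
    emb_anchor, emb_a, emb_b = model.encode([anchor, story_a, story_b])
    return "A" if cosine(emb_anchor, emb_a) >= cosine(emb_anchor, emb_b) else "B"

# Illustrative toy triple (not from the task dataset):
print(more_similar(
    "An orphan discovers hidden powers and defeats a tyrant.",
    "A farm boy learns magic and overthrows an evil emperor.",
    "Two detectives investigate a string of art thefts.",
))
```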

Why it matters

This shared task provides a concrete benchmark for advancing narrative understanding in AI: a new annotated dataset and a theory-grounded definition of narrative similarity for automated story analysis and representation learning. The results map the strengths of current systems and identify headroom for improvement in both tracks.

Original Abstract

We present the shared task on narrative similarity and narrative representation learning - NSNRL (pronounced "nass-na-rel"). The task operationalizes narrative similarity as a binary classification problem: determining which of two stories is more similar to an anchor story. We introduce a novel definition of narrative similarity, compatible with both narrative theory and intuitive judgment. Based on the similarity judgments collected under this concept, we also evaluate narrative embedding representations. We collected at least two annotations each for more than 1,000 story summary triples, with each annotation being backed by at least two annotators in agreement. This paper describes the sampling and annotation process for the dataset; further, we give an overview of the submitted systems and the techniques they employ. We received a total of 71 final submissions from 46 teams across our two tracks. In our triple-based classification setup, LLM ensembles make up many of the top-scoring systems, while in the embedding setup, systems with pre- and post-processing on pretrained embedding models perform about on par with custom fine-tuned solutions. Our analysis identifies potential headroom for improvement of automated systems in both tracks. The task website includes visualizations of embeddings alongside instance-level classification results for all teams.
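For the embedding track the abstract describes evaluating narrative embedding representations against the collected triple judgments. A hedged sketch of what such an evaluation could look like, scoring precomputed embeddings by triplet accuracy (data layout and field names are assumptions, not the official evaluation script):

```python
# Sketch: how often is the anchor closer (by cosine) to the
# annotator-preferred story than to the other candidate?
import numpy as np

def triplet_accuracy(anchors, positives, negatives):
    """All inputs: (n, d) arrays of L2-normalized embeddings.
    Returns the fraction of triples ranked as the annotators did."""
    sim_pos = np.sum(anchors * positives, axis=1)  # cosine, since normalized
    sim_neg = np.sum(anchors * negatives, axis=1)
    return float(np.mean(sim_pos > sim_neg))

# Sanity check on random embeddings, which should score near chance (0.5):
rng = np.random.default_rng(0)
def norm(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)
a, p, n = (norm(rng.normal(size=(1000, 384))) for _ in range(3))
print(f"chance-level accuracy: {triplet_accuracy(a, p, n):.2f}")
```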
