ArXiv TLDR

Grounded Satirical Generation with RAG

2605.10853

Oona Itkonen, Yuxin Su, Linyao Du, Ona De Gibert

cs.CL

TLDR

This paper introduces a RAG-based pipeline for grounded satire generation, finding it improves political relevance but not humor.

Key contributions

  • Presents a novel RAG pipeline for generating grounded satirical dictionary definitions from news.
  • Introduces a task-specific evaluation framework and a dataset of 100 generated definitions rated by six human annotators.
  • Finds RAG and topic selection improve political relevance, but not humor, in generated satire.
  • Shows that five LLM judges correlate well with human judgments of political relevance but perform poorly on humor.
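
To make the retrieve-then-generate idea in the contributions above concrete, here is a minimal sketch of the retrieval and prompt-assembly steps. It is not the authors' pipeline: the bag-of-words cosine scoring, the function names (`tokenize`, `retrieve`, `build_prompt`), the prompt wording, and the sample headlines are all assumptions made for illustration, and the call to an actual LLM is left out.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(text):
    # Lowercase word tokens; the character class also covers Finnish ä, ö, å.
    return re.findall(r"[a-zäöå]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(word, articles, k=2):
    # Hypothetical retriever: rank news snippets by similarity to the source word
    # and keep the top k that share at least one token with it.
    query = Counter(tokenize(word))
    scored = [(cosine(query, Counter(tokenize(a))), a) for a in articles]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [article for score, article in scored[:k] if score > 0]


def build_prompt(word, snippets):
    # Hypothetical prompt template; the paper's actual prompt is not given here.
    context = "\n".join(f"- {s}" for s in snippets) or "- (no news context retrieved)"
    return (
        f"News context:\n{context}\n\n"
        f"Write a satirical dictionary definition of the word '{word}'."
    )


# Made-up headlines, used only to exercise the functions.
articles = [
    "Parliament debates the budget cuts again",
    "Weather is sunny in Helsinki",
    "New budget proposal sparks protests",
]
prompt = build_prompt("budget", retrieve("budget", articles))
```

The resulting `prompt` string would then be passed to a language model. Running the same template with an empty snippet list gives a rough stand-in for the no-RAG condition the paper compares against.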

Why it matters

This paper tackles context-aware satire generation with LLMs and shows that RAG makes the output more politically relevant without making it funnier. It releases code, an annotated dataset, and an evaluation framework, and it documents where LLMs fall short on humor, both when generating it and when judging it.

Original Abstract

Humor generation remains a challenging task for Large Language Models (LLMs) due to its subjective nature. We focus on satire, a form of humor strongly shaped by context. In this work, we present a novel pipeline for grounded satire generation that uses Retrieval-Augmented Generation (RAG) over current news to produce satirical dictionary definitions in the Finnish context. We also introduce a new task-specific evaluation framework and annotate 100 generated definitions with six human annotators, enabling analysis across multiple experimental conditions, including cultural background, source-word type, and the presence or absence of RAG. Our results show that the generated definitions are perceived as more political than humorous. Both topic-based word selection and RAG improve the political relevance of the outputs, but neither yields clear gains in humor generation. In addition, our LLM-as-a-judge evaluation of five state-of-the-art models indicates that LLMs correlate well with human judgments on political relevance, but perform poorly on humor. We release our code and annotated dataset to support further research on grounded satire generation and evaluation.
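
The LLM-as-a-judge comparison in the abstract comes down to measuring how closely a model's scores track human ratings on the same items. The sketch below shows one common way to do that, Spearman rank correlation, in plain Python. The abstract does not say which correlation measure the authors used, so the choice of Spearman is an assumption, as are the function names and the toy scores.

```python
from math import sqrt


def average_ranks(values):
    # Assign 1-based ranks, giving tied values the average of their positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    # Pearson correlation; undefined when either input is constant.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(x, y):
    # Spearman correlation is the Pearson correlation of the rank vectors.
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    return pearson(average_ranks(x), average_ranks(y))


# Toy scores invented for illustration; these are not values from the paper.
human_relevance = [4, 2, 5, 1, 3]
judge_relevance = [5, 2, 4, 1, 3]
rho = spearman(human_relevance, judge_relevance)
```

Computing `rho` separately for the political-relevance scores and the humor scores would give the kind of per-dimension comparison the abstract describes.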
