ArXiv TLDR

mdok-style at SemEval-2026 Task 9: Finetuning LLMs for Multilingual Polarization Detection

2605.02695

Dominik Macko, Alok Debnath, Jakub Simko

cs.CL, cs.AI

TLDR

This paper describes finetuning LLMs with QLoRA and augmented data for robust multilingual polarization detection in SemEval-2026 Task 9.

Key contributions

  • Finetunes mid-size LLMs for multilingual polarization detection.
  • Applies QLoRA for parameter-efficient finetuning.
  • Augments training data with anonymized, cased, and homoglyphied versions.
  • Addresses SemEval-2026 Task 9 across 22 languages.

Why it matters

Online polarization is often followed by hate speech, offensive discourse, and social fragmentation, so detecting it early is crucial for safer and more inclusive online spaces. This work provides a robust LLM-based method for that detection task.

Original Abstract

SemEval-2026 Task 9 is focused on multilingual polarization detection. Specifically, it covers the identification of multilingual, multicultural, and multievent polarization along three axes (as subtasks), namely detection, type, and manifestation. Online polarization presents a concern because it is often followed by hate speech, offensive discourse, and social fragmentation. Therefore, detecting it before it escalates is crucial for a safer and more inclusive online space. We approached this SemEval task by finetuning mid-size LLMs for sequence classification using the QLoRA parameter-efficient finetuning technique. The multilingual (22 languages) training sets were augmented with anonymized, lower-cased, upper-cased, and homoglyphied counterparts, making the detection more robust.
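The augmentation strategy described above (anonymized, lower-cased, upper-cased, and homoglyphied counterparts) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the homoglyph mapping, the placeholder tokens, and the helper names are all hypothetical assumptions.

```python
import re

# Hypothetical Latin -> Cyrillic homoglyph map (illustrative only;
# the paper does not specify which substitutions it uses).
HOMOGLYPHS = {"a": "а", "e": "е", "o": "о", "c": "с", "p": "р", "x": "х"}


def anonymize(text: str) -> str:
    """Replace @mentions and URLs with placeholder tokens (assumed scheme)."""
    text = re.sub(r"@\w+", "[USER]", text)
    return re.sub(r"https?://\S+", "[URL]", text)


def homoglyphify(text: str) -> str:
    """Swap selected Latin letters for visually similar Cyrillic ones."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)


def augment(text: str) -> list[str]:
    """Return the original example plus its four augmented counterparts."""
    return [text, anonymize(text), text.lower(), text.upper(), homoglyphify(text)]
```

Each training example would thus expand into five variants, which plausibly hardens the classifier against casing tricks and homoglyph-based obfuscation.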
