mdok-style at SemEval-2026 Task 9: Finetuning LLMs for Multilingual Polarization Detection
Dominik Macko, Alok Debnath, Jakub Simko
TLDR
This paper describes finetuning LLMs with QLoRA and augmented data for robust multilingual polarization detection in SemEval-2026 Task 9.
Key contributions
- Finetunes mid-size LLMs for multilingual polarization detection.
- Applies QLoRA for parameter-efficient finetuning (a minimal setup is sketched after this list).
- Augments training data with anonymized, lower-cased, upper-cased, and homoglyphied counterparts.
- Addresses SemEval-2026 Task 9 across 22 languages.
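The QLoRA setup referenced above could look roughly like the sketch below, assuming the Hugging Face transformers, peft, and bitsandbytes libraries; the base model and hyperparameters are illustrative placeholders, not the authors' exact configuration.

```python
# Minimal QLoRA sketch for sequence classification (illustrative, not the authors' setup).
import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # 4-bit NF4 quantization: the "Q" in QLoRA
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForSequenceClassification.from_pretrained(
    "mistralai/Mistral-7B-v0.1",             # hypothetical mid-size base model
    num_labels=2,                            # e.g., binary polarization detection
    quantization_config=bnb_config,
    device_map="auto",
)
model.config.pad_token_id = model.config.eos_token_id  # Mistral has no pad token by default
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    task_type="SEQ_CLS",                     # keeps the classification head trainable
    r=16, lora_alpha=32, lora_dropout=0.05,  # illustrative LoRA hyperparameters
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()           # only adapters + head are updated during training
```

The quantized base model stays frozen; only the low-rank adapters and the classification head receive gradients, which is what makes finetuning mid-size LLMs feasible on modest hardware.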
Why it matters
Online polarization is often followed by hate speech, offensive discourse, and social fragmentation, so detecting it before it escalates is crucial for safer and more inclusive online spaces. This work offers a robust, LLM-based method for that detection.
Original Abstract
SemEval-2026 Task 9 focuses on multilingual polarization detection. Specifically, it covers the identification of multilingual, multicultural, and multievent polarization along three axes (subtasks): detection, type, and manifestation. Online polarization is a concern because it is often followed by hate speech, offensive discourse, and social fragmentation; detecting it before it escalates is therefore crucial for a safer and more inclusive online space. We approached this SemEval task by finetuning mid-size LLMs for sequence classification using the QLoRA parameter-efficient finetuning technique. We augmented the multilingual (22 languages) training sets with anonymized, lower-cased, upper-cased, and homoglyphied counterparts, making the detection more robust.
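The augmentation types named in the abstract could be realized along the lines of the sketch below; the anonymization patterns and the homoglyph map are illustrative assumptions, not the authors' exact rules.

```python
# Sketch of the four augmentation variants (anonymized, lower-/upper-cased, homoglyphied).
import re

# Small illustrative map of Latin characters to visually identical Cyrillic look-alikes.
HOMOGLYPHS = {"a": "а", "c": "с", "e": "е", "i": "і", "o": "о"}

def anonymize(text: str) -> str:
    """Replace user handles and URLs with placeholder tokens (assumed patterns)."""
    text = re.sub(r"@\w+", "[USER]", text)
    return re.sub(r"https?://\S+", "[URL]", text)

def homoglyphify(text: str) -> str:
    """Swap selected Latin characters for Cyrillic homoglyphs."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)

def augment(text: str) -> list[str]:
    """Return the original sample plus its four augmented counterparts."""
    return [text, anonymize(text), text.lower(), text.upper(), homoglyphify(text)]

print(augment("Check this out @user123 https://example.com Totally biased!"))
```

Training on such perturbed counterparts pushes the classifier to rely on content rather than surface cues like casing, handles, or character encoding, which is what makes the detection more robust.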