mdok-style at SemEval-2026 Task 9: Finetuning LLMs for Multilingual Polarization Detection
Dominik Macko, Alok Debnath, Jakub Simko
TLDR
This paper describes finetuning LLMs with QLoRA and augmented data for robust multilingual polarization detection in SemEval-2026 Task 9.
Key contributions
- Finetunes mid-size LLMs for multilingual polarization detection.
- Applies QLoRA for parameter-efficient finetuning (a minimal setup is sketched after this list).
- Augments training data with anonymized, lower-cased, upper-cased, and homoglyphied counterparts.
- Addresses SemEval-2026 Task 9 across 22 languages.
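The QLoRA setup referenced above could look roughly like the sketch below, assuming the Hugging Face transformers, peft, and bitsandbytes libraries; the base model and hyperparameters are illustrative placeholders, not the authors' exact configuration.

```python
# Minimal QLoRA sketch for sequence classification (illustrative, not the authors' setup).
import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # 4-bit NF4 quantization: the "Q" in QLoRA
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForSequenceClassification.from_pretrained(
    "mistralai/Mistral-7B-v0.1",             # hypothetical mid-size base model
    num_labels=2,                            # e.g., binary polarization detection
    quantization_config=bnb_config,
    device_map="auto",
)
model.config.pad_token_id = model.config.eos_token_id  # Mistral has no pad token by default
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    task_type="SEQ_CLS",                     # keeps the classification head trainable
    r=16, lora_alpha=32, lora_dropout=0.05,  # illustrative LoRA hyperparameters
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()           # only adapters + head are updated during training
```

The quantized base model stays frozen; only the low-rank adapters and the classification head receive gradients, which is what makes finetuning mid-size LLMs feasible on modest hardware.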
Why it matters
Online polarization is often followed by hate speech, offensive discourse, and social fragmentation, so detecting it before it escalates is crucial for safer and more inclusive online spaces. This work offers a robust, LLM-based method for that detection.
Original Abstract
SemEval-2026 Task 9 focuses on multilingual polarization detection. Specifically, it covers the identification of multilingual, multicultural, and multievent polarization along three axes (subtasks): detection, type, and manifestation. Online polarization is a concern because it is often followed by hate speech, offensive discourse, and social fragmentation; detecting it before it escalates is therefore crucial for a safer and more inclusive online space. We approached this SemEval task by finetuning mid-size LLMs for sequence classification using the QLoRA parameter-efficient finetuning technique. We augmented the multilingual (22 languages) training sets with anonymized, lower-cased, upper-cased, and homoglyphied counterparts, making the detection more robust.
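The augmentation types named in the abstract could be realized along the lines of the sketch below; the anonymization patterns and the homoglyph map are illustrative assumptions, not the authors' exact rules.

```python
# Sketch of the four augmentation variants (anonymized, lower-/upper-cased, homoglyphied).
import re

# Small illustrative map of Latin characters to visually identical Cyrillic look-alikes.
HOMOGLYPHS = {"a": "а", "c": "с", "e": "е", "i": "і", "o": "о"}

def anonymize(text: str) -> str:
    """Replace user handles and URLs with placeholder tokens (assumed patterns)."""
    text = re.sub(r"@\w+", "[USER]", text)
    return re.sub(r"https?://\S+", "[URL]", text)

def homoglyphify(text: str) -> str:
    """Swap selected Latin characters for Cyrillic homoglyphs."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)

def augment(text: str) -> list[str]:
    """Return the original sample plus its four augmented counterparts."""
    return [text, anonymize(text), text.lower(), text.upper(), homoglyphify(text)]

print(augment("Check this out @user123 https://example.com Totally biased!"))
```

Training on such perturbed counterparts pushes the classifier to rely on content rather than surface cues like casing, handles, or character encoding, which is what makes the detection more robust.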