Fine-tuning with Hierarchical Prompting for Robust Propaganda Classification Across Annotation Schemas

May 13, 2026 · arXiv:2605.13663

Lukas Stähelin, Veronika Solopova, Max Upravitelev, David Kaplan, Ariana Sahitaj + 4 more

cs.CL, cs.CY

TLDR

The paper introduces an intent-focused propaganda taxonomy and a hierarchical prompting method (HiPP) that is especially beneficial after fine-tuning and on the more ambiguous, low-agreement schema.

Key contributions

Introduces a new intent-focused taxonomy of propaganda techniques and HQP, a dataset annotated with its labels.
Proposes Hierarchical Prompting (HiPP), which predicts fine-grained techniques before aggregating them and helps most after fine-tuning.
Shows that fine-tuning is essential: it turns weak zero-shot baselines into competitive classification systems.
Evaluates four language models across both schemas; the Qwen models perform best overall, and Phi-4 14B consistently outperforms GPT-4.1-nano.

Why it matters

Propaganda on social media is hard to detect because the texts are short and noisy and annotators often disagree on labels. The paper contributes a dataset (HQP) and a prompting method (HiPP) that helps most on the ambiguous, low-agreement taxonomy while remaining competitive on the simpler schema. Its results on fine-tuning and on model choice are relevant to building detection systems for real-world use.

Original Abstract

Propaganda detection in social media is challenging due to noisy, short texts and low annotation agreements. We introduce a new intent-focused taxonomy of propaganda techniques and compare it against an established, higher-agreement schema. Along three dimensions (model portfolio, schema effects, and prompting strategy) we evaluate the taxonomies as a classification task with the help of four language models (GPT-4.1-nano, Phi-4 14B, Qwen2.5-14B, Qwen3-14B). Our results show that fine-tuning is essential, since it transforms weak zero-shot baselines into competitive systems and reveals methodological differences that are hidden using base models. Across schemas, the Qwen models achieve the strongest overall performance, and Phi-4 14B consistently outperforms GPT-4.1-nano. Our hierarchical prompting method (HiPP), which predicts fine-grained techniques before aggregating them, is especially beneficial after fine-tuning and on the more ambiguous, low-agreement taxonomy, while remaining competitive on the simpler schema. The HQP dataset, annotated with the new intent-based labels, provides a richer lens on propaganda's strategic goals and a challenging benchmark for future work on robust, real-world detection.
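The abstract describes HiPP as predicting fine-grained techniques first and aggregating them afterwards. A minimal sketch of that second step follows. The technique names, the mapping, and the union-based aggregation rule are all hypothetical placeholders chosen for illustration; the abstract does not specify the paper's label set or how aggregation is performed.

```python
# Hypothetical mapping from fine-grained techniques to coarse labels.
# These names are placeholders, not the paper's taxonomy.
FINE_TO_COARSE = {
    "loaded_language": "emotional_appeal",
    "fear_appeal": "emotional_appeal",
    "name_calling": "discrediting",
    "doubt": "discrediting",
}


def aggregate(fine_predictions):
    """Collapse fine-grained predictions into a sorted list of coarse labels.

    Assumption: aggregation is a simple union over the parents of the
    predicted techniques. Predictions outside the mapping are ignored.
    """
    coarse = {
        FINE_TO_COARSE[label]
        for label in fine_predictions
        if label in FINE_TO_COARSE
    }
    return sorted(coarse)


# In a full pipeline, the fine-grained predictions would come from a
# (fine-tuned) language model prompted with the fine-grained label set.
example = aggregate(["fear_appeal", "doubt", "loaded_language"])
```

Here `example` holds the two coarse labels that the three fine-grained predictions map to. Other aggregation rules, such as a majority vote, would fit the same description.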

