DGPO: Beyond Pairwise Preferences with Directional Consistent Groupwise Optimization
Mengyi Deng, Zhiwei Li, Xin Li, Tingyu Zhu, Yulan Yuan, et al.
TLDR
DGPO is a new preference optimization method for LLMs that improves directional consistency and reasoning diversity using group-wise, multi-candidate comparisons.
Key contributions
- Introduces DGPO, a lightweight framework for LLM preference optimization using group-level supervision.
- Explicitly models direction-aware alignment through multi-candidate comparisons, enhancing reasoning consistency.
- Optimizes a margin-based likelihood objective on structured forward and reverse question-answer sets.
- Delivers up to 3.6% average accuracy improvement across diverse datasets and LLM families.
Why it matters
Existing preference optimization methods for LLMs often fail to maintain directional consistency while preserving reasoning diversity. DGPO addresses this with a group-wise formulation that captures richer relative information than pairwise objectives, offering a more robust way to align models across datasets and model families.
Original Abstract
Although Large Language Models (LLMs) have made remarkable progress, current preference optimization methods still struggle to align directional consistency while preserving reasoning diversity. To address this limitation, we propose Directional-Groupwise Preference Optimization (DGPO), a lightweight framework that aggregates supervision signals at the group level and explicitly models direction-aware alignment through multi-candidate comparisons. DGPO organizes forward and reverse question-answer instances into structured sets and optimizes a margin-based likelihood objective that separates coherent reasoning paths from inconsistent alternatives. This group-wise formulation captures richer relative information than pairwise objectives and reinforces consistency across diverse reasoning pathways. Empirical results show that our constructed reverse data yields a 3.2% average improvement across five benchmarks, while DGPO further delivers consistent gains across multiple datasets and model families, achieving average accuracy improvements of up to 3.6%.
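The abstract describes a margin-based likelihood objective that separates coherent reasoning paths from inconsistent alternatives via group-wise comparisons. The paper's exact loss is not given here, so the following is only a hypothetical hinge-style sketch of what such an objective could look like: each candidate in the "coherent" group is compared against each candidate in the "inconsistent" group, and a penalty is incurred whenever a coherent path's log-likelihood does not exceed an inconsistent one's by at least a margin. The function name, margin value, and hinge form are all illustrative assumptions, not the authors' formulation.

```python
def groupwise_margin_loss(coherent_logps, inconsistent_logps, margin=1.0):
    """Hypothetical group-wise margin objective (not the paper's exact loss).

    coherent_logps:     sequence-level log-likelihoods of coherent reasoning paths
    inconsistent_logps: log-likelihoods of inconsistent alternatives
    Penalizes every (coherent, inconsistent) pair whose log-likelihood gap
    falls below `margin`; unlike a pairwise objective, every candidate in one
    group is compared against every candidate in the other.
    """
    losses = [
        max(0.0, margin - (lp_pos - lp_neg))
        for lp_pos in coherent_logps
        for lp_neg in inconsistent_logps
    ]
    return sum(losses) / len(losses)


# A coherent path that clears the margin contributes zero loss;
# a narrow gap contributes the shortfall.
loss = groupwise_margin_loss([-1.0, -2.0], [-3.5], margin=1.0)
```

In a real training loop, the log-likelihoods would come from the policy model's scores over the structured forward and reverse question-answer sets, and the loss would be minimized by gradient descent; this sketch only illustrates the group-wise comparison structure.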