ArXiv TLDR

EvoPref: Multi-Objective Evolutionary Optimization Discovers Diverse LLM Alignments Beyond Gradient Descent

arXiv:2605.09777

Dongxin Guo, Jikun Wu, Siu Ming Yiu

cs.NE cs.AI cs.CL cs.LG

TLDR

EvoPref, a multi-objective evolutionary algorithm, discovers diverse LLM alignments, avoiding the preference collapse that gradient-based methods suffer from.

Key contributions

  • EvoPref optimizes LLM alignments for helpfulness, harmlessness, and honesty using a multi-objective evolutionary algorithm (see the selection sketch after this list).
  • Discovers significantly more diverse LLM alignments than gradient descent methods, avoiding preference collapse.
  • Improves preference coverage by 18% and reduces collapse rates by 47% relative to ORPO, while maintaining competitive alignment quality.
  • Provides theoretical motivation for archive-based methods' effectiveness in escaping preference collapse.
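
For intuition, here is a minimal sketch of the Pareto-dominance selection at the heart of NSGA-II, applied to candidates scored on the three objectives. This is an illustrative toy, not the authors' implementation; the score tuples and function names are assumptions.

```python
# Minimal sketch of Pareto-dominance filtering over the three alignment
# objectives (helpfulness, harmlessness, honesty). Illustrative only; the
# scoring layout is an assumption, not the paper's code.
from typing import List, Tuple

Scores = Tuple[float, float, float]  # (helpfulness, harmlessness, honesty)

def dominates(a: Scores, b: Scores) -> bool:
    """a dominates b if it is no worse on every objective and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(population: List[Scores]) -> List[Scores]:
    """Keep only non-dominated candidates (NSGA-II's first front)."""
    return [p for p in population
            if not any(dominates(q, p) for q in population if q is not p)]

candidates = [(0.9, 0.6, 0.7), (0.8, 0.8, 0.8), (0.7, 0.7, 0.6)]
print(pareto_front(candidates))  # (0.7, 0.7, 0.6) is dominated by (0.8, 0.8, 0.8)
```

Candidates on this front represent genuinely different trade-offs among the three objectives, which is exactly the diversity that single-trajectory gradient descent tends to collapse.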

Why it matters

This paper introduces EvoPref, an evolutionary approach to LLM alignment that directly addresses the preference collapse seen in gradient-based methods. By demonstrating greater alignment diversity and lower collapse rates at competitive quality, it establishes population-based optimization as a principled paradigm for building more robust, versatile, and safer LLMs.

Original Abstract

Gradient-based preference optimization methods for large language model (LLM) alignment suffer from preference collapse, converging to narrow behavioral modes while neglecting preference diversity. We introduce EvoPref, a multi-objective evolutionary algorithm that maintains populations of Low-Rank Adaptation (LoRA) adapters optimized across helpfulness, harmlessness, and honesty objectives using Non-dominated Sorting Genetic Algorithm II (NSGA-II) selection with archive-based diversity preservation. Our primary contribution is demonstrating that population-based methods discover substantially more diverse alignments than gradient descent. On standard benchmarks, EvoPref improves preference coverage by 18% (median 82.5% vs. 70.0% for ORPO, $p<0.001$, Wilcoxon, $n=30$) and reduces collapse rates by 47% (11.0% vs. 20.6%, $p<0.001$), while achieving competitive alignment quality (median 75.5% RewardBench vs. 75.0% for ORPO, $p<0.05$). We provide theoretical motivation extending recent multi-objective evolutionary algorithm (MOEA) runtime analysis (Dang et al., 2025) suggesting why archive-based methods escape collapse more effectively than single-trajectory optimization. Comprehensive comparisons against MOEA/D, SMS-EMOA, CMA-ES, and gradient baselines (DPO, IPO, KTO, ORPO) with rigorous statistical testing (Friedman with Holm correction, Vargha-Delaney effect sizes, median with IQR) confirm that multi-objective selection with diversity preservation is essential. This work establishes evolutionary optimization as a principled paradigm for diverse LLM alignment.
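
The loop the abstract describes (mutate a population of adapter parameter vectors, score each on helpfulness, harmlessness, and honesty, and keep a diversity-preserving archive of non-dominated solutions) can be sketched as follows. The toy objectives, eviction rule, and hyperparameters below are assumptions for illustration; the paper's actual method applies full NSGA-II selection to LoRA adapters.

```python
# Hedged sketch of the population-plus-archive loop the abstract describes:
# mutate LoRA-style parameter vectors, score them on three objectives, and
# preserve diversity in a bounded archive of non-dominated solutions.
# Objectives, eviction rule, and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
DIM, POP, GENS, SIGMA, ARCHIVE_CAP = 16, 12, 20, 0.1, 24

def evaluate(theta: np.ndarray) -> np.ndarray:
    """Toy stand-in for scoring one adapter on (helpfulness, harmlessness,
    honesty); the real objectives would come from reward models/benchmarks.
    The three targets conflict, so a genuine Pareto front exists."""
    return np.array([-np.sum((theta - c) ** 2) for c in (0.0, 0.5, 1.0)])

def dominates(a: np.ndarray, b: np.ndarray) -> bool:
    return bool(np.all(a >= b) and np.any(a > b))

def update_archive(archive, candidate):
    """Insert (params, scores), dropping anything it dominates. On overflow,
    evict the entry closest to its nearest neighbor in objective space,
    a crude crowding-style diversity criterion."""
    _, s = candidate
    if any(dominates(sa, s) for _, sa in archive):
        return archive  # candidate is dominated; archive unchanged
    archive = [(p, sa) for p, sa in archive if not dominates(s, sa)]
    archive.append(candidate)
    if len(archive) > ARCHIVE_CAP:
        scores = np.stack([sa for _, sa in archive])
        dist = np.linalg.norm(scores[:, None] - scores[None, :], axis=-1)
        np.fill_diagonal(dist, np.inf)
        archive.pop(int(np.argmin(dist.min(axis=1))))  # drop most crowded
    return archive

population = [rng.normal(size=DIM) for _ in range(POP)]
archive = []
for _ in range(GENS):
    children = [p + SIGMA * rng.normal(size=DIM) for p in population]
    scored = [(c, evaluate(c)) for c in population + children]
    for cand in scored:
        archive = update_archive(archive, cand)
    # Truncation selection by dominance count: a crude stand-in for NSGA-II's
    # non-dominated sorting with crowding distance.
    wins = [sum(dominates(s, t) for _, t in scored) for _, s in scored]
    order = sorted(range(len(scored)), key=lambda i: wins[i], reverse=True)
    population = [scored[i][0] for i in order[:POP]]

print(f"archive holds {len(archive)} non-dominated candidate alignments")
```

The archive is what resists collapse in this sketch: even if the whole population drifts toward one behavioral mode, earlier trade-offs survive unless a strictly dominating candidate replaces them, loosely mirroring the archive-based diversity preservation the paper analyzes.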
