Evolutionary Token-Level Prompt Optimization for Diffusion Models
Domício Pereira Neto, João Correia, Penousal Machado
TLDR
This paper introduces a Genetic Algorithm that evolves token vectors for prompt optimization in diffusion models, outperforming Promptist and random-search baselines on a fitness combining aesthetic quality and prompt-image alignment.
Key contributions
- Proposes a Genetic Algorithm (GA) to directly evolve token vectors for diffusion model prompt optimization.
- Optimizes a fitness function combining aesthetic quality (LAION Aesthetic Predictor) and prompt-image alignment (CLIPScore).
- Achieves up to 23.93% higher fitness than Promptist and random search on 36 prompts from the Parti Prompts (P2) dataset.
- Offers a modular and adaptable framework for other image generation models with tokenized text encoders.
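The core loop described above can be sketched as a standard GA over fixed-length token-id vectors. This is a minimal illustration, not the paper's implementation: the vocabulary size, prompt length, objective weighting, and the selection/crossover/mutation operators are all assumptions, and the two scoring functions are cheap placeholders standing in for the LAION Aesthetic Predictor V2 and CLIPScore, which in the real pipeline require generating an image per candidate.

```python
import random

VOCAB_SIZE = 49408   # CLIP BPE tokenizer vocabulary size
PROMPT_LEN = 8       # tokens per individual (illustrative choice)
W_AESTHETIC = 0.5    # objective weighting (assumption, not from the paper)

def aesthetic_score(tokens):
    # Placeholder for the LAION Aesthetic Predictor V2, which would
    # score an image generated from these conditioning tokens.
    return sum(tokens) / (len(tokens) * VOCAB_SIZE)

def clip_score(tokens):
    # Placeholder for CLIPScore prompt-image alignment.
    return len(set(tokens)) / len(tokens)

def fitness(tokens):
    # Weighted combination of the two objectives, as in the paper's
    # fitness function (exact weighting is an assumption here).
    return W_AESTHETIC * aesthetic_score(tokens) + (1 - W_AESTHETIC) * clip_score(tokens)

def evolve(pop_size=20, generations=30, mut_rate=0.1, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randrange(VOCAB_SIZE) for _ in range(PROMPT_LEN)]
           for _ in range(pop_size)]
    for _ in range(generations):
        def select():
            # Binary tournament selection (assumed operator).
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        children = []
        while len(children) < pop_size:
            p1, p2 = select(), select()
            cut = rng.randrange(1, PROMPT_LEN)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Per-token mutation: resample a token id with prob. mut_rate.
            child = [rng.randrange(VOCAB_SIZE) if rng.random() < mut_rate else t
                     for t in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)

best = evolve()
```

Because individuals are raw token-id vectors rather than natural-language strings, the search explores the conditioning space directly, which is what distinguishes this approach from text-rewriting methods like Promptist.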
Why it matters
Manual prompt engineering for diffusion models is inefficient. This paper offers an automated, systematic approach using a Genetic Algorithm to optimize token vectors. This significantly improves image quality and prompt alignment, making diffusion models more accessible and practical.
Original Abstract
Text-to-image diffusion models exhibit strong generative performance but remain highly sensitive to prompt formulation, often requiring extensive manual trial and error to obtain satisfactory results. This motivates the development of automated, model-agnostic prompt optimization methods that can systematically explore the conditioning space beyond conventional text rewriting. This work investigates the use of a Genetic Algorithm (GA) for prompt optimization by directly evolving the token vectors employed by CLIP-based diffusion models. The GA optimizes a fitness function that combines aesthetic quality, measured by the LAION Aesthetic Predictor V2, with prompt-image alignment, assessed via CLIPScore. Experiments on 36 prompts from the Parti Prompts (P2) dataset show that the proposed approach outperforms the baseline methods, including Promptist and random search, achieving up to a 23.93% improvement in fitness. Overall, the method is adaptable to image generation models with tokenized text encoders and provides a modular framework for future extensions, the limitations and prospects of which are discussed.