Dynamic Context Evolution for Scalable Synthetic Data Generation
TLDR
Dynamic Context Evolution (DCE) prevents mode collapse, the loss of output diversity when an LLM is prompted repeatedly, by evolving the prompt each batch and filtering obvious or near-duplicate ideas.
Key contributions
- Introduces Dynamic Context Evolution (DCE), a principled framework for preventing cross-batch mode collapse in LLMs.
- Uses verbalized tail sampling (VTS): the model rates how obvious each of its own ideas is, and obvious ideas are discarded (see the sketch after this list).
- Maintains a semantic memory of past generations and adaptively evolves the prompt each batch to preserve diversity.
- Achieves 0.0% mode collapse and reliably richer conceptual structure at roughly $0.50 per 1,000 candidates, with no fine-tuning required.
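The first two mechanisms are easy to prototype. Below is a minimal Python sketch of a VTS filter that drops candidates the model itself rated as obvious (threshold τ) and a semantic memory that rejects near-duplicates by cosine similarity (threshold δ), using the same all-MiniLM-L6-v2 embedder the paper uses for validation. The function and class names, the 1-10 obviousness scale, and the threshold values are illustrative assumptions, not the paper's reference implementation.

```python
# Illustrative sketch of DCE's two filtering stages; names, scales, and
# thresholds are assumptions, not the paper's reference implementation.
import numpy as np
from sentence_transformers import SentenceTransformer

TAU = 6      # hypothetical VTS cutoff: discard ideas rated >= 6/10 obvious
DELTA = 0.85 # hypothetical dedup cutoff: cosine similarity above this is a near-duplicate

def verbalized_tail_filter(candidates, tau=TAU):
    """Verbalized tail sampling: keep only ideas the model itself rated
    as non-obvious. Each candidate is (text, obviousness 1-10), where the
    score comes from asking the generator to label its own idea."""
    return [text for text, obviousness in candidates if obviousness < tau]

class SemanticMemory:
    """Persistent embedding index that rejects near-duplicates across batches."""
    def __init__(self, delta=DELTA):
        self.encoder = SentenceTransformer("all-MiniLM-L6-v2")
        self.delta = delta
        self.vectors = []  # unit-normalized embeddings of accepted ideas
        self.texts = []

    def accept(self, text):
        """Return True (and store the idea) only if it is not a near-duplicate."""
        vec = self.encoder.encode(text, normalize_embeddings=True)
        if self.vectors and np.max(np.stack(self.vectors) @ vec) >= self.delta:
            return False  # too close to a previously accepted idea
        self.vectors.append(vec)
        self.texts.append(text)
        return True

# Usage: candidates carry the model's own obviousness labels.
memory = SemanticMemory()
batch = [("compostable mushroom-foam inserts", 3),
         ("recyclable cardboard box", 9),                      # obvious
         ("compostable mushroom foam packaging inserts", 2)]   # paraphrase
kept = [t for t in verbalized_tail_filter(batch) if memory.accept(t)]
print(kept)  # expected: VTS drops the obvious idea, memory drops the paraphrase
```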
Why it matters
This paper addresses the critical problem of "mode collapse" in LLM-generated synthetic data, where models produce repetitive outputs. Dynamic Context Evolution (DCE) offers a principled, cost-effective solution to maintain output diversity. It significantly improves the richness and novelty of generated content without requiring fine-tuning.
Original Abstract
Large language models produce repetitive output when prompted independently across many batches, a phenomenon we term cross-batch mode collapse: the progressive loss of output diversity when a language model is prompted repeatedly without access to its prior generations. Practitioners have long mitigated this with ad hoc deduplication and seed rotation, but no principled framework exists. We introduce Dynamic Context Evolution (DCE), comprising three mechanisms: (1) verbalized tail sampling (the model labels each idea with a guess about how obvious it is, and obvious ideas are discarded), which filters high-probability candidates via model self-assessment; (2) semantic memory, which maintains a persistent embedding index to reject near-duplicates across batches; and (3) adaptive prompt evolution, which reconstructs the generation prompt each batch using memory state and rotating diversity strategies. In experiments across three domains (sustainable packaging concepts, educational exam questions, and creative writing prompts) and two model families (gpt-5-mini and claude-haiku-4-5), a component ablation across 2-3 random seeds per method shows that DCE achieves 0.0 ± 0.0% collapse versus 5.6 ± 2.0% for naive prompting, while producing 17-18 HDBSCAN clusters per seed versus naive's volatile 2-17, indicating reliably richer conceptual structure. These results are validated with an independent embedding model (all-MiniLM-L6-v2) and hold across sensitivity sweeps of the VTS threshold τ and dedup threshold δ. Deduplication and prompt evolution are individually insufficient but jointly effective, at approximately $0.50 per 1,000 candidates using only standard API calls, with no fine-tuning or custom architectures required.
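The third mechanism, adaptive prompt evolution, reconstructs the generation prompt each batch from memory state and a rotating diversity strategy. A minimal sketch follows, assuming a fixed strategy rotation and a summary of recently accepted ideas; the strategy wording and prompt template are invented for illustration, since the paper's exact prompts are not reproduced here.

```python
from itertools import cycle

# Hypothetical diversity strategies; DCE rotates strategies each batch,
# but the paper's exact wording is not given here.
STRATEGIES = cycle([
    "Explore an underrepresented subdomain you have not touched yet.",
    "Combine two unrelated constraints into one idea.",
    "Take the perspective of an unusual stakeholder.",
])

def evolve_prompt(task, accepted_ideas, n_recent=10):
    """Adaptive prompt evolution: rebuild the prompt from memory state
    (recently accepted ideas to steer away from) and a rotating strategy."""
    recent = "\n".join(f"- {idea}" for idea in accepted_ideas[-n_recent:])
    return (
        f"Task: {task}\n"
        f"Ideas already generated (avoid anything similar):\n{recent}\n"
        f"Diversity strategy for this batch: {next(STRATEGIES)}\n"
        "For each new idea, also rate how obvious it is on a 1-10 scale."
    )

print(evolve_prompt("Generate sustainable packaging concepts",
                    ["compostable mushroom-foam inserts"]))
```

The final instruction line is what makes verbalized tail sampling possible downstream: the model's own obviousness labels feed the VTS filter shown earlier.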