ArXiv TLDR

Decomposing Evolutionary Mixture-of-LoRA Architectures: The Routing Lever, the Lifecycle Penalty, and a Substrate-Conditional Boundary

2605.11153

Ramchand Kumaresan

cs.CL, cs.LG, cs.NE

TLDR

This paper decomposes an evolutionary Mixture-of-LoRA system, finding that router improvements, not the evolutionary lifecycle, drive performance gains.

Key contributions

  • Decomposes an evolutionary Mixture-of-LoRA system into router, evaluation scope, and lifecycle factors.
  • Shows that the router rewrite, not the evolutionary lifecycle, accounts for the entire +0.0426 nat balanced log-PPL improvement attributed to the full system over the static B3 baseline.
  • Identifies the evolutionary lifecycle as a net performance drag (approx. -0.028 nats) in the primary chain.
  • Uses a synthetic sandbox to show that evolutionary search on the routing channel helps only when adapters are pre-aligned to the task.
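The abstract describes the router rewrite as a parallel sigmoid gate with a learnable per-adapter floor and a bounded temperature anneal. A minimal sketch of what such a gate could look like follows. The floor parameterization (a sigmoid of a raw parameter), the linear anneal, the temperature bounds, and all function names are assumptions made for illustration; the paper's actual formulation may differ.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def temperature(step: int, total_steps: int,
                t_start: float = 2.0, t_end: float = 0.5) -> float:
    """Linear anneal from t_start to t_end, clamped to that range.

    The schedule shape and both bounds are hypothetical.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return t_start + frac * (t_end - t_start)

def gate_weights(logits, floor_params, temp):
    """One independent sigmoid gate per adapter (no softmax competition).

    Each gate is squeezed into [floor, 1), so no adapter's weight can
    fall below its own floor. Floors are kept in (0, 1) via a sigmoid
    of a raw parameter (an assumed parameterization).
    """
    weights = []
    for z, f in zip(logits, floor_params):
        floor = sigmoid(f)
        weights.append(floor + (1.0 - floor) * sigmoid(z / temp))
    return weights

# Toy router logits for three adapters; in the paper the router is fed
# post-stack hidden states, which this sketch does not model.
logits = [2.0, -1.0, 0.0]
floor_params = [-3.0, -3.0, -3.0]
for step in (0, 12500, 25000):
    t = temperature(step, 25000)
    print(step, round(t, 3), [round(w, 3) for w in gate_weights(logits, floor_params, t)])
```

Because the gates are independent, several adapters can be active at once, and lowering the temperature sharpens each gate without forcing the weights to sum to one.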

Why it matters

The paper attributes the gains of an evolutionary Mixture-of-LoRA system to router design rather than to the evolutionary lifecycle, which on this substrate costs performance. For practitioners, that suggests putting design effort into routing first, with the caveat that the results come from a single ~150M-parameter substrate at n=3 seeds.

Original Abstract

We decompose an evolutionary mixture-of-LoRA system on a from-scratch ~150M-parameter widened-D substrate (D=1536, V=32000; D/V approx 0.048; the "widened-1536" substrate) into three factors -- a router rewrite (parallel sigmoid gate with learnable per-adapter floor and bounded temperature anneal, fed post-stack hidden states rather than token-embedding means), a per-domain leave-one-out evaluation scope, and a lifecycle of death plus alpha-blend inheritance plus SVD mutation plus slot reallocation -- and report a 5-of-8 partial 2^3 factorial run at n=3 seeds and 25000 adaptation steps per cell. The attribution chain is sharp on this substrate: the router rewrite carries the entire +0.0426 nat balanced log-PPL improvement (Delta = log PPL_ref - log PPL_test, positive = improvement; t=12.86, p=0.006) attributed to "the full evolutionary system vs the static B3 baseline"; the headline full-system-vs-B3 balanced contrast itself is +0.015 nats, t=1.94, p=0.19 at n=3 and does not clear alpha=0.05. The per-domain evaluation scope is null at seed-resolution, and the lifecycle is a net drag of approx -0.028 nats (t=-4.46,p=0.047 in the primary chain). An auxiliary alpha=0 inheritance counterfactual at n=3 seeds is sign-inconsistent at the headline metric and underpowered for either an equivalence or load-bearing conclusion (corrected from an earlier arithmetic-mean aggregator that erroneously cleared inheritance; see Appendix B.11). A base-perturbation probe directionally refutes a "genomic-context" reframe of the lifecycle role. A controllable synthetic sandbox locates a substrate-conditional regime boundary: evolutionary search on the routing channel is load-bearing only when adapters are pre-aligned to the task; in every other regime tested it underperforms, ties, or actively degrades the gradient solution.
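The reported p-values can be sanity-checked from the t-statistics alone. The sketch below assumes each contrast is a two-sided t-test with n-1 = 2 degrees of freedom (the abstract gives n=3 seeds but does not name the test), and uses the closed-form Student's t CDF for 2 degrees of freedom. The function name is illustrative.

```python
import math

def two_sided_p_df2(t: float) -> float:
    """Two-sided p-value for Student's t with 2 degrees of freedom.

    For df=2 the CDF has the closed form F(t) = 1/2 + t / (2*sqrt(t^2 + 2)),
    so the two-sided tail mass is 1 - |t| / sqrt(t^2 + 2).
    """
    return 1.0 - abs(t) / math.sqrt(t * t + 2.0)

# (t statistic, p-value) pairs as quoted in the abstract.
reported = {
    "router rewrite": (12.86, 0.006),
    "full system vs B3": (1.94, 0.19),
    "lifecycle": (-4.46, 0.047),
}

for name, (t, p_reported) in reported.items():
    p = two_sided_p_df2(t)
    print(f"{name}: t={t:+.2f}, p(df=2)={p:.3f}, reported p={p_reported}")
```

Under that assumption the closed form agrees with all three reported p-values to rounding, which also shows how little room n=3 leaves: a t of 1.94 is far from clearing alpha=0.05 with only 2 degrees of freedom.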
