ArXiv TLDR

Good Agentic Friends Do Not Just Give Verbal Advice: They Can Update Your Weights

2605.13839

Wenrui Bao, Huan Wang, Jian Wang, Zhangyang Wang, Kai Wang + 1 more

cs.CL

TLDR

TFlow lets multi-agent LLMs communicate through transient low-rank weight perturbations instead of text, improving accuracy over a standalone model while sharply cutting token and latency costs relative to text-based exchange.

Key contributions

  • Introduces TFlow, a framework for multi-agent LLM collaboration via weight-space communication.
  • Sender agents' activations are mapped to transient, low-rank LoRA perturbations for the receiver.
  • Improves accuracy by up to 8.5 points over a standalone receiver across five benchmarks, while reducing processed tokens by up to 32.69%.
  • Versus a text-based three-agent baseline, cuts total processed tokens by up to 83.27% and wall-clock inference time by up to 4.6x, with competitive accuracy on four of five benchmarks.

Why it matters

This paper introduces a novel communication paradigm for multi-agent LLMs, moving beyond traditional text-based exchanges. By using weight perturbations, it addresses significant efficiency bottlenecks like token cost and memory. This approach opens new avenues for more scalable and performant collaborative AI systems.

Original Abstract

Multi-agent LLM systems usually collaborate by exchanging natural-language messages. This interface is simple and interpretable, but it forces each sender's intermediate computation to be serialized into tokens and then reprocessed by the receiver, thereby increasing the generated-token cost, prefill overhead, and KV-cache memory. We study an alternative communication interface: instead of appending a sender's message to the receiver's context, compile the sender's hidden states into a transient, receiver-specific weight perturbation. We introduce TFlow (Thought Flow), a weight-space communication framework for a known and fixed receiver architecture. For each query, frozen role-prompted sender agents process the input, and a learned parameter generator maps their internal activations into low-rank LoRA perturbations targeting the receiver's modules. These perturbations are fused and applied only during the receiver's generation, enabling instance-level adaptation without permanently changing the model or enlarging the receiver's text context. With three Qwen3-4B agents, TFlow improves over a standalone receiver by up to 8.5 accuracy points across five benchmarks while reducing processed tokens by up to 32.69%. Compared with a text-based three-agent baseline, it reduces total processed tokens by up to 83.27% and the wall-clock inference time by up to 4.6$\times$, while maintaining competitive accuracy on four of five benchmarks. These results suggest that transient low-rank weight perturbations can serve as an executable communication medium for efficient multi-agent LLM collaboration.
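The core mechanism in the abstract — a learned generator compiling sender activations into low-rank LoRA factors that are fused and applied to the receiver's weights only for the current query — can be sketched in a few lines. This is a minimal NumPy illustration under strong simplifying assumptions (a single linear module, mean-pooled activations, random stand-in matrices); every name here (`G_A`, `G_B`, `compile_perturbation`) is hypothetical, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_out, rank = 64, 64, 4

# Frozen receiver weight for one target module (stand-in values).
W = rng.standard_normal((d_out, d_model)) * 0.02

# Hypothetical learned parameter generator: maps a pooled sender
# activation vector to the flattened LoRA factors A and B.
G_A = rng.standard_normal((d_model, rank * d_model)) * 0.02
G_B = rng.standard_normal((d_model, d_out * rank)) * 0.02

def compile_perturbation(sender_acts):
    """Compile a sender's hidden states into low-rank (B, A) factors."""
    h = sender_acts.mean(axis=0)              # mean-pool over positions
    A = (h @ G_A).reshape(rank, d_model)
    B = (h @ G_B).reshape(d_out, rank)
    return B, A

def receiver_forward(x, perturbations=(), alpha=1.0):
    """Apply W plus the fused transient deltas; W itself is never changed."""
    delta = sum(B @ A for B, A in perturbations)
    W_eff = W + alpha * delta if perturbations else W
    return W_eff @ x

# One query: each sender contributes a perturbation, fused at generation time.
x = rng.standard_normal(d_model)
perts = [compile_perturbation(rng.standard_normal((10, d_model)))
         for _ in range(3)]                   # three sender agents
y = receiver_forward(x, perturbations=perts)
```

Because the deltas live only in the forward pass for this query, the receiver's context window carries no sender messages and its base weights stay frozen, which is where the token and KV-cache savings come from.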
