Plasticity-Enhanced Multi-Agent Mixture of Experts for Dynamic Objective Adaptation in UAVs-Assisted Emergency Communication Networks
Wen Qiu, Zhiqiang He, Wei Zhao, Hiroshi Masui
TLDR
PE-MAMoE is a plasticity-enhanced multi-agent mixture-of-experts framework that helps UAV-assisted emergency communication networks adapt to dynamic user demands.
Key contributions
- Introduces PE-MAMoE, a multi-agent MoE framework for UAV-assisted emergency communication.
- Each UAV uses a sparsely gated MoE actor with a router selecting a single specialist per step.
- A non-parametric Phase Controller re-plasticizes policies via expert-only perturbations and parameter resets.
- Achieves 26.3% higher return, 12.8% increased capacity, and 75% fewer collisions in simulations.
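The top-1 routing in the second bullet can be sketched as below. This is a minimal illustrative toy, not the paper's architecture: the class name, the linear router/experts, and all dimensions are assumptions, and a real actor would use neural networks and output a full action distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

class Top1MoEActor:
    """Sketch of a sparsely gated MoE actor: a linear router scores the
    experts and only the single highest-scoring specialist runs per step."""

    def __init__(self, obs_dim, act_dim, n_experts, temperature=1.0):
        self.router_w = rng.normal(0.0, 0.1, (obs_dim, n_experts))
        self.experts = [rng.normal(0.0, 0.1, (obs_dim, act_dim))
                        for _ in range(n_experts)]
        self.temperature = temperature  # schedulable, as in the paper

    def act(self, obs):
        # Temperature-scaled router logits -> softmax -> hard top-1 pick.
        logits = obs @ self.router_w / self.temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        k = int(np.argmax(probs))          # single specialist per step
        action_mean = obs @ self.experts[k]  # only expert k is evaluated
        return k, action_mean

actor = Top1MoEActor(obs_dim=4, act_dim=2, n_experts=3)
expert_id, mean = actor.act(np.ones(4))
```

A higher router temperature flattens the gate's softmax, which is presumably why the paper schedules it after phase switches to encourage re-exploration of specialists.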
Why it matters
Deep reinforcement learning policies struggle with plasticity loss in dynamic UAV communication networks, which degrades their ability to adapt. PE-MAMoE addresses this by re-plasticizing policies at regime switches, significantly improving performance and reliability so that UAVs can dependably restore connectivity in emergency scenarios.
Original Abstract
Unmanned aerial vehicles serving as aerial base stations can rapidly restore connectivity after disasters, yet abrupt changes in user mobility and traffic demands shift the quality of service trade-offs and induce strong non-stationarity. Deep reinforcement learning policies suffer from plasticity loss under such shifts, as representation collapse and neuron dormancy impair adaptation. We propose plasticity enhanced multi-agent mixture of experts (PE-MAMoE), a centralized training with decentralized execution framework built on multi-agent proximal policy optimization. PE-MAMoE equips each UAV with a sparsely gated mixture of experts actor whose router selects a single specialist per step. A non-parametric Phase Controller injects brief, expert-only stochastic perturbations after phase switches, resets the action log-standard-deviation, anneals entropy and learning rate, and schedules the router temperature, all to re-plasticize the policy without destabilizing safe behaviors. We derive a dynamic regret bound showing the tracking error scales with both environment variation and cumulative noise energy. In a phase-driven simulator with mobile users and 3GPP-style channels, PE-MAMoE improves normalized interquartile mean return by 26.3% over the best baseline, increases served-user capacity by 12.8%, and reduces collisions by approximately 75%. Diagnostics confirm persistently higher expert feature rank and periodic dormant-neuron recovery at regime switches.
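The Phase Controller's interventions described in the abstract can be sketched as two hooks: one fired at a detected phase switch, one applied per step afterwards. Every function name, constant, and schedule shape here is a hypothetical illustration under the assumption of linear annealing; the paper specifies only the ingredients (expert-only noise, log-std reset, entropy/learning-rate annealing, router-temperature scheduling), not these values.

```python
import numpy as np

rng = np.random.default_rng(1)

def on_phase_switch(expert_params, log_std, state,
                    noise_scale=0.05, log_std_reset=0.0,
                    entropy_hi=0.02, lr_hi=3e-4, temp_hi=2.0):
    """Hypothetical phase-switch hook: perturb only the expert weights
    (the router is left untouched), reset the action log-std to restore
    exploration, and re-arm the annealed hyperparameter schedules."""
    for w in expert_params:                # expert-only perturbation
        w += rng.normal(0.0, noise_scale, w.shape)
    log_std[:] = log_std_reset             # reset exploration noise scale
    state.update(entropy_coef=entropy_hi, lr=lr_hi, router_temp=temp_hi)
    return state

def anneal(state, step, horizon=1000,
           entropy_lo=0.0, lr_lo=1e-4, temp_lo=1.0):
    """Linearly anneal entropy coefficient, learning rate, and router
    temperature back to their base values over `horizon` steps."""
    t = min(step / horizon, 1.0)
    return {
        "entropy_coef": (1 - t) * state["entropy_coef"] + t * entropy_lo,
        "lr": (1 - t) * state["lr"] + t * lr_lo,
        "router_temp": (1 - t) * state["router_temp"] + t * temp_lo,
    }

# Usage: fire the hook at a regime switch, then anneal each step after.
experts = [np.zeros((4, 2)), np.zeros((4, 2))]
log_std = np.full(2, -1.0)
state = on_phase_switch(experts, log_std, {})
sched = anneal(state, step=500)
```

Keeping the controller non-parametric (fixed schedules rather than learned ones) matches the abstract's framing: the goal is a brief, bounded injection of noise energy, consistent with the dynamic regret bound scaling with cumulative noise energy.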