On the Role of DAG Topology in Energy-Aware Cloud Scheduling: A GNN-Based Deep Reinforcement Learning Approach
Anas Hattay, Fred Ngole Mboula, Eric Gascard, Zakaria Yahoun
TLDR
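This paper shows that GNN-based deep reinforcement learning schedulers for cloud workflows fail under specific out-of-distribution conditions: structural mismatches between training and deployment environments disrupt message passing and undermine policy generalization.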
Key contributions
- Investigates GNN-based DRL for energy-aware cloud scheduling of workflow DAGs.
- Pinpoints specific out-of-distribution (OOD) conditions causing GNN scheduler failures.
- Explains that these failures stem from structural mismatches between training and deployment environments, which disrupt GNN message passing.
- Emphasizes the need for more robust GNN representations for reliable scheduling under distribution shifts.
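To make the message-passing point concrete, here is a minimal sketch, not the authors' model. It assumes a toy mean-aggregation rule (the function `mean_message_passing`, the 0.5 mixing weight, and the scalar node features are all hypothetical) and shows that the same node features yield different embeddings on a chain DAG and on a fan-out DAG. A policy that has only seen one of these shapes during training would receive unfamiliar inputs on the other.

```python
from collections import defaultdict

def mean_message_passing(num_nodes, edges, features, rounds=2):
    """Toy mean-aggregation message passing over a DAG (illustrative assumption).

    In each round, a node with predecessors replaces its value with the
    average of its own value and the mean of its predecessors' values.
    Nodes without predecessors keep their value.
    """
    preds = defaultdict(list)
    for src, dst in edges:
        preds[dst].append(src)

    h = list(features)
    for _ in range(rounds):
        new_h = []
        for v in range(num_nodes):
            if preds[v]:
                incoming = sum(h[u] for u in preds[v]) / len(preds[v])
                new_h.append(0.5 * h[v] + 0.5 * incoming)
            else:
                new_h.append(h[v])
        h = new_h
    return h

# Same four tasks and features, two different topologies.
features = [1.0, 2.0, 3.0, 4.0]
chain = [(0, 1), (1, 2), (2, 3)]    # 0 -> 1 -> 2 -> 3
fan_out = [(0, 1), (0, 2), (0, 3)]  # 0 feeds 1, 2, and 3 directly

chain_emb = mean_message_passing(4, chain, features)
fan_emb = mean_message_passing(4, fan_out, features)
print(chain_emb)
print(fan_emb)
```

The embeddings for tasks 2 and 3 differ between the two graphs even though their raw features are identical, because the information they aggregate depends on the graph structure.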
Why it matters
The paper clarifies where GNN-based schedulers break down when deployment conditions differ from those seen in training. It argues that more robust representations are needed for scheduling performance to remain reliable under distribution shifts.
Original Abstract
Cloud providers must assign heterogeneous compute resources to workflow DAGs while balancing competing objectives such as completion time, cost, and energy consumption. In this work, we study a single-workflow, queue-free scheduling setting and consider a graph neural network (GNN)-based deep reinforcement learning scheduler designed to minimize workflow completion time and energy usage. We identify specific out-of-distribution (OOD) conditions under which GNN-based deep reinforcement learning schedulers fail and provide a principled explanation of why these failures occur. Through controlled OOD evaluations, we demonstrate that performance degradation stems from structural mismatches between training and deployment environments, which disrupt message passing and undermine policy generalization. Our analysis exposes fundamental limitations of current GNN-based schedulers and highlights the need for more robust representations to ensure reliable scheduling performance under distribution shifts.
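The two objectives named in the abstract, completion time and energy, can be illustrated with a small sketch. It is not the paper's cost model. It assumes a hypothetical function `evaluate_schedule`, a fixed task-to-machine assignment, tasks listed in topological order, no data-transfer delays, and energy counted only while a machine is busy. All machine names, speeds, and power values are made up for the example.

```python
def evaluate_schedule(tasks, deps, assignment, speed, power):
    """Return (makespan, energy) for a fixed assignment (illustrative assumption).

    tasks:      dict task -> work units, listed in topological order
    deps:       dict task -> list of predecessor tasks
    assignment: dict task -> machine name
    speed:      dict machine -> work units per second
    power:      dict machine -> watts drawn while busy
    """
    finish = {}
    machine_free = {m: 0.0 for m in speed}
    energy = 0.0
    for task, work in tasks.items():
        m = assignment[task]
        # A task starts once its predecessors are done and its machine is free.
        ready = max((finish[p] for p in deps.get(task, [])), default=0.0)
        start = max(ready, machine_free[m])
        duration = work / speed[m]
        finish[task] = start + duration
        machine_free[m] = finish[task]
        energy += duration * power[m]
    return max(finish.values()), energy

# Diamond-shaped workflow: a -> {b, c} -> d
tasks = {"a": 4.0, "b": 2.0, "c": 2.0, "d": 4.0}
deps = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
speed = {"fast": 2.0, "slow": 1.0}
power = {"fast": 100.0, "slow": 30.0}

all_fast = {"a": "fast", "b": "fast", "c": "fast", "d": "fast"}
mixed = {"a": "fast", "b": "fast", "c": "slow", "d": "fast"}

result_all_fast = evaluate_schedule(tasks, deps, all_fast, speed, power)
result_mixed = evaluate_schedule(tasks, deps, mixed, speed, power)
print(result_all_fast, result_mixed)
```

In this toy case, moving task `c` to the slower, lower-power machine lets it run in parallel with `b`, so the completion time stays the same while energy drops. Which placement is best depends on the DAG's shape, which is why topology matters to the scheduler.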