SAGE: Training-Free Semantic Evidence Composition for Edge-Cloud Inference under Hard Uplink Budgets
TLDR
SAGE is a training-free method for edge-cloud inference that combines importance filtering with embedding-diversity sampling to choose what to transmit under a hard per-request uplink budget.
Key contributions
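- Shows that selecting transmitted content solely by attention-based importance is inherently limited under hard uplink budgets.
- Finds that replacing high-importance units with low-importance but complementary ones improves server accuracy, and that spatially uniform selection is competitive at moderate budgets.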
- SAGE combines importance filtering with embedding-diversity sampling for efficient offloading.
- SAGE reaches 93% of the server ceiling in offloaded accuracy on ImageNet-1K while transmitting fewer than half of the available evidence units.
Why it matters
This paper addresses a practical constraint in edge-cloud inference: the uplink limits how many bits each request can send. It shows that importance-only selection is inherently limited under such budgets and proposes SAGE as a principled, training-free alternative. On ImageNet-1K, SAGE reaches 93% of the server ceiling in offloaded accuracy while transmitting fewer than half of the available evidence units, substantially outperforming importance-only composition.
Original Abstract
Edge-cloud hybrid inference offloads difficult inputs to a powerful remote model, but the uplink channel imposes hard per-request constraints on the number of bits that can be transmitted. We show that selecting transmitted content based solely on attention-based importance, the standard approach in collaborative inference, is inherently limited under hard budgets. Two findings support this claim. First, replacing high-importance units with low-importance but complementary ones improves server accuracy. This shows that what matters is not individual importance but how well the transmitted set covers diverse aspects of the input. Second, spatially uniform selection without any content information achieves competitive accuracy at moderate budgets. This confirms that spatial coverage alone carries independent value. Based on this analysis, we propose SAGE (Semantic Attention-Guided Evidence), a principled, training-free method that combines importance filtering with embedding-diversity sampling. SAGE achieves 93% of the server ceiling in offloaded accuracy while transmitting fewer than half of the available evidence units on ImageNet-1K, substantially outperforming importance-only composition.
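Illustrative sketch
The abstract describes SAGE only as importance filtering combined with embedding-diversity sampling and does not give the exact procedure. The sketch below is one plausible reading, not the authors' implementation. It assumes the filter keeps a top-importance candidate pool and that diversity is obtained by greedy farthest-point sampling under cosine distance. The names `select_evidence`, `cosine_distance`, and `pool_factor` are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; zero-norm vectors get distance 1.0."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def select_evidence(importance, embeddings, budget, pool_factor=2):
    """Pick at most `budget` unit indices to transmit.

    Assumed two-stage scheme (not confirmed by the paper):
      1. Importance filter: keep the top `budget * pool_factor` units.
      2. Diversity sampling: greedy farthest-point selection in
         embedding space, seeded with the most important unit.
    """
    n = len(importance)
    if n != len(embeddings):
        raise ValueError("importance and embeddings must have equal length")
    if pool_factor < 1:
        raise ValueError("pool_factor must be at least 1")
    budget = min(budget, n)
    if budget <= 0:
        return []

    # Stage 1: candidate pool of the highest-importance units.
    pool_size = min(n, budget * pool_factor)
    ranked = sorted(range(n), key=lambda i: importance[i], reverse=True)
    pool = ranked[:pool_size]

    # Stage 2: repeatedly add the candidate farthest from the selected set.
    selected = [pool[0]]
    min_dist = {
        i: cosine_distance(embeddings[i], embeddings[pool[0]]) for i in pool[1:]
    }
    while len(selected) < budget:
        nxt = max(min_dist, key=min_dist.get)
        selected.append(nxt)
        del min_dist[nxt]
        for i in min_dist:
            d = cosine_distance(embeddings[i], embeddings[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return sorted(selected)


# Toy input: units 0 and 1 are both important but nearly identical.
importance = [0.9, 0.85, 0.5, 0.4, 0.05, 0.01]
embeddings = [[1, 0], [0.99, 0.1], [0, 1], [0.7, 0.7], [-1, 0], [0, -1]]
chosen = select_evidence(importance, embeddings, budget=2)
```

On this toy input, importance-only top-2 selection would send units 0 and 1, which carry nearly the same embedding. The sketch instead returns units 0 and 2, trading a redundant high-importance unit for a complementary one, which is the effect the abstract's first finding points to.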