MATCHA: Efficient Deployment of Deep Neural Networks on Multi-Accelerator Heterogeneous Edge SoCs
Enrico Russo, Mohamed Amine Hamdi, Alessandro Ottaviano, Francesco Conti, Angelo Garofalo, et al.
TLDR
MATCHA is a new framework for efficiently deploying deep neural networks on multi-accelerator heterogeneous edge SoCs, reducing inference latency by up to 35% on MLPerf Tiny.
Key contributions
- MATCHA is a unified framework for deploying DNNs on heterogeneous multi-accelerator SoCs.
- Generates highly concurrent schedules and optimizes L3/L2 memory allocation via constraint programming (see the sketch after this list).
- Employs pattern matching, tiling, and mapping across individual HW units for parallel execution and high accelerator utilization.
- Reduces DNN inference latency by up to 35% on MLPerf Tiny compared to the state-of-the-art MATCH compiler.
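
To make the constraint-programming contribution concrete, here is a minimal sketch of how layer-to-accelerator assignment and concurrent scheduling can be modeled with a CP solver. It uses Google OR-Tools CP-SAT; the layer names, per-accelerator cycle costs, and the two accelerators ("acc0", "acc1") are hypothetical and not taken from the paper, and the model omits MATCHA's L3/L2 memory-allocation constraints.

```python
# Sketch: assign each DNN layer to one of two accelerators and schedule them
# concurrently, minimizing end-to-end latency. Not MATCHA's actual model.
from ortools.sat.python import cp_model

# Hypothetical per-layer execution times (cycles) on each accelerator.
layers = {
    "conv1":  {"acc0": 120, "acc1": 200},
    "conv2a": {"acc0": 150, "acc1": 90},
    "conv2b": {"acc0": 70,  "acc1": 110},
    "fc":     {"acc0": 80,  "acc1": 60},
}
# Diamond dependency: conv2a and conv2b may run in parallel after conv1.
deps = [("conv1", "conv2a"), ("conv1", "conv2b"),
        ("conv2a", "fc"), ("conv2b", "fc")]

model = cp_model.CpModel()
horizon = sum(max(d.values()) for d in layers.values())

start, end, assign = {}, {}, {}
intervals = {"acc0": [], "acc1": []}
for name, costs in layers.items():
    start[name] = model.NewIntVar(0, horizon, f"s_{name}")
    end[name] = model.NewIntVar(0, horizon, f"e_{name}")
    dur = model.NewIntVar(0, horizon, f"d_{name}")
    lits = []
    for acc, cost in costs.items():
        lit = model.NewBoolVar(f"{name}_on_{acc}")
        lits.append(lit)
        assign[name, acc] = lit
        # Duration depends on which accelerator runs the layer.
        model.Add(dur == cost).OnlyEnforceIf(lit)
        # Optional interval: occupies this accelerator only if assigned to it.
        intervals[acc].append(model.NewOptionalIntervalVar(
            start[name], dur, end[name], lit, f"i_{name}_{acc}"))
    model.AddExactlyOne(lits)  # each layer runs on exactly one accelerator

# Each accelerator runs one layer at a time; dependencies order the layers.
for acc in intervals:
    model.AddNoOverlap(intervals[acc])
for a, b in deps:
    model.Add(start[b] >= end[a])

# Minimize makespan (end-to-end inference latency).
makespan = model.NewIntVar(0, horizon, "makespan")
model.AddMaxEquality(makespan, [end[n] for n in layers])
model.Minimize(makespan)

solver = cp_model.CpSolver()
if solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    for name in layers:
        acc = next(a for a in ("acc0", "acc1")
                   if solver.Value(assign[name, a]))
        print(f"{name}: {acc}, start={solver.Value(start[name])}")
```

The optional intervals plus `AddNoOverlap` are what let the solver discover concurrency: independent layers assigned to different accelerators are free to overlap in time, while layers sharing an accelerator are serialized.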
Why it matters
Existing DNN deployment frameworks struggle to fully utilize heterogeneous edge SoCs. MATCHA offers a novel approach to optimize scheduling and memory, leading to substantial performance gains. This enables more efficient and faster AI inference at the edge.
Original Abstract
Deploying DNNs on Systems-on-Chip (SoCs) with multiple heterogeneous acceleration engines is challenging, and the majority of deployment frameworks cannot fully exploit heterogeneity. We present MATCHA, a unified DNN deployment framework that generates highly concurrent schedules for parallel, heterogeneous accelerators and uses constraint programming to optimize L3/L2 memory allocation and scheduling. Using pattern matching, tiling, and mapping across individual HW units enables parallel execution and high accelerator utilization. On the MLPerf Tiny benchmark, using an SoC with two heterogeneous accelerators, MATCHA improves accelerator utilization and reduces inference latency by up to 35% with respect to the state-of-the-art MATCH compiler.
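
As a companion to the abstract's mention of tiling under on-chip memory limits, here is a toy sketch (not MATCHA's actual tiling pass) of picking the largest square convolution output tile whose working set, input tile with halo, weights, and output tile, fits in an assumed 512 KiB L2 buffer. All layer dimensions and the budget are illustrative assumptions.

```python
# Toy illustration of L2-constrained tiling for a stride-1 KxK convolution.
# Not MATCHA's algorithm; sizes and the 512 KiB budget are assumptions.

def fits_in_l2(tile_h, tile_w, c_in, c_out, k, l2_bytes, elem=1):
    """Check whether one tile's working set fits in L2 (int8 by default)."""
    in_tile = (tile_h + k - 1) * (tile_w + k - 1) * c_in  # input halo included
    weights = k * k * c_in * c_out
    out_tile = tile_h * tile_w * c_out
    return (in_tile + weights + out_tile) * elem <= l2_bytes

def largest_square_tile(out_h, out_w, c_in, c_out, k, l2_bytes):
    """Largest square output tile (simple linear scan) that fits in L2."""
    best = 0
    for t in range(1, min(out_h, out_w) + 1):
        if fits_in_l2(t, t, c_in, c_out, k, l2_bytes):
            best = t
    return best

# Example: 3x3 conv, 64 -> 64 channels, 64x64 output map, 512 KiB of L2.
print(largest_square_tile(64, 64, 64, 64, 3, 512 * 1024))
```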