arXiv TLDR

MATCHA: Efficient Deployment of Deep Neural Networks on Multi-Accelerator Heterogeneous Edge SoCs

arXiv:2604.09124

Enrico Russo, Mohamed Amine Hamdi, Alessandro Ottaviano, Francesco Conti, Angelo Garofalo, et al.

cs.DC · cs.AR · cs.LG

TLDR

MATCHA is a new framework for efficiently deploying deep neural networks on multi-accelerator heterogeneous edge SoCs, reducing inference latency by up to 35% over the state-of-the-art MATCH compiler.

Key contributions

  • MATCHA is a unified framework for deploying DNNs on heterogeneous multi-accelerator SoCs.
  • Generates highly concurrent schedules and optimizes L3/L2 memory allocation via constraint programming (see the sketch after this list).
  • Employs pattern matching, tiling, and mapping across individual hardware units to enable parallel execution and high accelerator utilization.
  • Reduces DNN inference latency by up to 35% on MLPerf Tiny compared to the state-of-the-art MATCH compiler.
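
To make the constraint-programming idea concrete, here is a minimal sketch of how layer-to-accelerator assignment and scheduling could be modeled with Google OR-Tools CP-SAT. The layer graph, the two engines ("npu", "dsp"), and every latency number are invented for illustration; the solver choice is also an assumption, and this is not MATCHA's actual formulation, which additionally models L3/L2 allocation.

from ortools.sat.python import cp_model

# Hypothetical layer graph: name -> (dependencies, per-accelerator latency in us).
layers = {
    "conv1":  ([],                  {"npu": 120, "dsp": 300}),
    "conv2":  (["conv1"],           {"npu": 200, "dsp": 450}),
    "dwconv": (["conv1"],           {"npu": 500, "dsp": 180}),
    "add":    (["conv2", "dwconv"], {"npu": 40,  "dsp": 60}),
}
accelerators = ["npu", "dsp"]
horizon = sum(max(lat.values()) for _, lat in layers.values())

model = cp_model.CpModel()
start, end, assign = {}, {}, {}
intervals = {acc: [] for acc in accelerators}
for name, (_, lat) in layers.items():
    start[name] = model.NewIntVar(0, horizon, f"start_{name}")
    end[name] = model.NewIntVar(0, horizon, f"end_{name}")
    for acc in accelerators:
        assign[name, acc] = model.NewBoolVar(f"{name}_on_{acc}")
        # Optional interval: only enforced when the layer runs on this engine.
        intervals[acc].append(model.NewOptionalIntervalVar(
            start[name], lat[acc], end[name], assign[name, acc], f"iv_{name}_{acc}"))
    # Each layer is mapped to exactly one accelerator.
    model.AddExactlyOne([assign[name, acc] for acc in accelerators])

# Data dependencies: a layer starts only after all its producers finish.
for name, (deps, _) in layers.items():
    for dep in deps:
        model.Add(start[name] >= end[dep])

# Each accelerator executes one layer at a time; independent branches
# (here conv2 and dwconv) may run concurrently on different engines.
for acc in accelerators:
    model.AddNoOverlap(intervals[acc])

# Minimize end-to-end inference latency (the schedule makespan).
makespan = model.NewIntVar(0, horizon, "makespan")
model.AddMaxEquality(makespan, [end[name] for name in layers])
model.Minimize(makespan)

solver = cp_model.CpSolver()
if solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    for name in layers:
        acc = next(a for a in accelerators if solver.Value(assign[name, a]))
        print(f"{name}: {acc} @ t={solver.Value(start[name])}")
    print(f"latency: {solver.Value(makespan)} us")

With the toy numbers above, the solver maps the depthwise branch to the DSP and the rest to the NPU, overlapping the two branches instead of serializing them on one engine, which is the effect the paper's concurrent schedules aim for.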

Why it matters

Existing DNN deployment frameworks struggle to fully utilize heterogeneous edge SoCs. MATCHA offers a novel approach to optimize scheduling and memory, leading to substantial performance gains. This enables more efficient and faster AI inference at the edge.

Original Abstract

Deploying DNNs on Systems-on-Chip (SoCs) with multiple heterogeneous acceleration engines is challenging, and the majority of deployment frameworks cannot fully exploit heterogeneity. We present MATCHA, a unified DNN deployment framework that generates highly concurrent schedules for parallel, heterogeneous accelerators and uses constraint programming to optimize L3/L2 memory allocation and scheduling. Using pattern matching, tiling, and mapping across individual HW units enables parallel execution and high accelerator utilization. On the MLPerf Tiny benchmark, using a SoC with two heterogeneous accelerators, MATCHA improves accelerator utilization and reduces inference latency by up to 35% with respect to the state-of-the-art MATCH compiler.
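
To ground the abstract's mention of tiling across the L3/L2 hierarchy, here is a back-of-the-envelope sketch of a tile-size search for one convolution layer; in a real deployment flow, a choice like this would feed a scheduler such as the one sketched above. The 512 KiB L2 budget, int8 tensors, and all function names are assumptions for illustration, not MATCHA's actual mapper.

# Illustrative tile-size search: pick the largest output-tile height for a
# stride-1 convolution such that the input tile, the weights, and the output
# tile fit together in the accelerator's L2 memory.

def l2_footprint(tile_h, W, C_in, C_out, K, elem=1):
    in_tile = (tile_h + K - 1) * W * C_in * elem   # input rows the tile reads
    weights = K * K * C_in * C_out * elem
    out_tile = tile_h * W * C_out * elem
    return in_tile + weights + out_tile

def largest_tile_h(H, W, C_in, C_out, K, l2_bytes=512 * 1024):
    for tile_h in range(H, 0, -1):                 # prefer the largest tile
        if l2_footprint(tile_h, W, C_in, C_out, K) <= l2_bytes:
            return tile_h
    raise ValueError("layer cannot be tiled to fit in L2")

# Example: 3x3 conv, 64 -> 64 channels, 56x56 int8 feature map.
print(largest_tile_h(H=56, W=56, C_in=64, C_out=64, K=3))   # -> 56 (fits whole)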
