ArXiv TLDR

Arax: A Runtime Framework for Decoupling Applications from Heterogeneous Accelerators

2305.01291

Manos Pavlidakis, Stelios Mavridis, Antony Chazapis, Giorgos Vasiliadis, Angelos Bilas

eess.SY

TLDR

Arax is a runtime framework that decouples applications from heterogeneous accelerators, enabling dynamic resource sharing, elastic allocation, and simplified programming across diverse accelerator types.

Key contributions

  • Dynamically maps application tasks to available heterogeneous accelerators, managing task state, memory, and dependencies.
  • Enables elastic resource allocation, allowing applications to adjust accelerator usage based on load fluctuations.
  • Provides a simple API and automatic stub generation (Autotalk) to minimize programming effort and support legacy accelerator-specific applications.
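The summary does not show Arax's actual API, but the core decoupling idea in the bullets above can be sketched in a few lines of Python: applications submit device-agnostic tasks, and a runtime picks whichever accelerator is least loaded. All names here (`Accelerator`, `Runtime`, `submit`) are hypothetical illustrations, not Arax's real interface.

```python
# Hypothetical sketch of the decoupling idea: the application never
# names a device; a runtime chooses one based on current load.
# Names (Accelerator, Runtime, submit) are illustrative only.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Accelerator:
    name: str        # e.g. "gpu0", "fpga0" -- type is hidden from the app
    queued: int = 0  # outstanding tasks, used as a simple load metric

class Runtime:
    def __init__(self, accelerators):
        self.accelerators = accelerators

    def submit(self, task: Callable, *args):
        # Load-aware placement: pick the least-loaded accelerator,
        # so sharing and elasticity happen transparently to the app.
        dev = min(self.accelerators, key=lambda a: a.queued)
        dev.queued += 1
        try:
            return dev.name, task(*args)  # run "on" the chosen device
        finally:
            dev.queued -= 1

rt = Runtime([Accelerator("gpu0"), Accelerator("gpu1"), Accelerator("fpga0")])
device, result = rt.submit(lambda x, y: x + y, 2, 3)
print(device, result)
```

In the real system, the stub libraries generated by Autotalk would play the role of `submit` here, intercepting accelerator-specific calls (e.g. CUDA) so existing binaries get this placement without source changes.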

Why it matters

This paper addresses critical challenges in efficiently using multiple heterogeneous accelerators by introducing Arax, which abstracts hardware details and enables flexible, efficient sharing and scaling of accelerator resources. This matters because it simplifies development, improves performance over existing solutions such as NVIDIA MPS, and supports elasticity, enhancing the usability and efficiency of heterogeneous computing environments in modern AI and HPC workloads.

Original Abstract

Today, using multiple heterogeneous accelerators efficiently from applications and high-level frameworks, such as TensorFlow and Caffe, poses significant challenges in three respects: (a) sharing accelerators, (b) allocating available resources elastically during application execution, and (c) reducing the required programming effort. In this paper, we present Arax, a runtime system that decouples applications from heterogeneous accelerators within a server. First, Arax maps application tasks dynamically to available resources, managing all required task state, memory allocations, and task dependencies. As a result, Arax can share accelerators across applications in a server and adjust the resources used by each application as load fluctuates over time. Additionally, Arax offers a simple API and includes Autotalk, a stub generator that automatically generates stub libraries for applications already written for specific accelerator types, such as NVIDIA GPUs. Consequently, Arax applications are written once without considering physical details, including the number and type of accelerators. Our results show that applications, such as Caffe, TensorFlow, and Rodinia, can run using Arax with minimum effort and low overhead compared to native execution, about 12% (geometric mean). Arax supports efficient accelerator sharing, by offering up to 20% improved execution times compared to NVIDIA MPS, which supports NVIDIA GPUs only. Arax can transparently provide elasticity, decreasing total application turn-around time by up to 2x compared to native execution without elasticity support.
