AIPC: Agent-Based Automation for AI Model Deployment with Qualcomm AI Runtime

April 16, 2026 | arXiv:2604.14661

Jianhao Su, Zhanwei Wu, ShengTing Huang, Weidong Feng

cs.SE, cs.AI, cs.LG

TLDR

AIPC uses AI agents to automate edge AI model deployment, reducing the time and expertise needed to target hardware-specific inference runtimes.

Key contributions

Introduces AIPC, an AI agent-driven system for automating edge AI model deployment.
Decomposes deployment into verifiable stages, injecting domain knowledge via Agent Skills and validation.
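Completes deployment from PyTorch to runnable QNN/SNPE inference in 7-20 minutes for structurally regular vision models on Qualcomm AI Runtime (QAIRT), with indicative API costs of roughly USD 0.7-10.
Lowers the expertise barrier and engineering time for hardware-specific AI deployment, and supports execution, failure localization, and bounded repair for more complex models that cannot yet be deployed fully automatically.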

Why it matters

Deploying AI models to edge hardware is a long, failure-prone process that depends heavily on deployment expertise. AIPC automates much of it with agents that work through standardized, verifiable stages, which could lower the technical barrier to running AI on specialized hardware.

Original Abstract

Edge AI model deployment is a multi-stage engineering process involving model conversion, operator compatibility handling, quantization calibration, runtime integration, and accuracy validation. In practice, this workflow is long, failure-prone, and heavily dependent on deployment expertise, particularly when targeting hardware-specific inference runtimes. This technical report presents AIPC (AI Porting Conversion), an AI agent-driven approach for constrained automation of AI model deployment. AIPC decomposes deployment into standardized, verifiable stages and injects deployment-domain knowledge into agent execution through Agent Skills, helper scripts, and a stage-wise validation loop. This design reduces both the expertise barrier and the engineering time required for hardware deployment. Using Qualcomm AI Runtime (QAIRT) as the primary scenario, this report examines automated deployment across representative vision, multimodal, and speech models. In the cases covered here, AIPC can complete deployment from PyTorch to runnable QNN/SNPE inference within 7-20 minutes for structurally regular vision models, with indicative API costs roughly in the range of USD 0.7-10. For more complex models involving less-supported operators, dynamic shapes, or autoregressive decoding structures, fully automated deployment may still require further advances, but AIPC already provides practical support for execution, failure localization, and bounded repair.
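The stage-wise validation loop with bounded repair described in the abstract can be illustrated with a minimal Python sketch. This is not AIPC's implementation: the `Stage`, `Report`, and `run_pipeline` names, the toy `convert`/`quantize` stages, and the `max_repairs` budget are all hypothetical, and no real QAIRT, QNN, or SNPE tooling is called. The sketch only shows the control flow: run a stage, validate its output, attempt a limited number of repairs, and report the failing stage if the budget is exhausted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Artifacts = Dict[str, object]  # hypothetical bag of per-stage outputs


@dataclass
class Stage:
    name: str
    run: Callable[[Artifacts], Artifacts]        # does the stage's work
    validate: Callable[[Artifacts], bool]        # stage-wise check
    repair: Optional[Callable[[Artifacts], Artifacts]] = None  # optional fix-up


@dataclass
class Report:
    completed: List[str] = field(default_factory=list)
    failed_stage: Optional[str] = None           # failure localization
    attempts: Dict[str, int] = field(default_factory=dict)


def run_pipeline(stages: List[Stage], artifacts: Artifacts,
                 max_repairs: int = 2) -> Tuple[Artifacts, Report]:
    """Run stages in order; stop at the first stage that cannot be validated."""
    report = Report()
    for stage in stages:
        candidate = stage.run(artifacts)
        attempts = 1
        while not stage.validate(candidate):
            # Bounded repair: give up once the repair budget is spent.
            if stage.repair is None or attempts > max_repairs:
                report.failed_stage = stage.name
                report.attempts[stage.name] = attempts
                return artifacts, report
            candidate = stage.run(stage.repair(candidate))
            attempts += 1
        report.attempts[stage.name] = attempts
        report.completed.append(stage.name)
        artifacts = candidate  # only validated output moves forward
    return artifacts, report


# Toy stages (assumptions for illustration only).
SUPPORTED_OPS = {"conv", "relu", "matmul"}


def convert(a: Artifacts) -> Artifacts:
    out = dict(a)
    out["unsupported"] = [op for op in a["ops"] if op not in SUPPORTED_OPS]
    return out


def rewrite_ops(a: Artifacts) -> Artifacts:
    # Toy repair: swap one known-problematic op for a supported stand-in.
    out = dict(a)
    out["ops"] = ["relu" if op == "gelu" else op for op in a["ops"]]
    return out


def quantize(a: Artifacts) -> Artifacts:
    out = dict(a)
    out["quantized"] = True
    return out


stages = [
    Stage("convert", convert, lambda a: not a["unsupported"], rewrite_ops),
    Stage("quantize", quantize, lambda a: bool(a.get("quantized"))),
]

final, report = run_pipeline(stages, {"ops": ["conv", "gelu", "matmul"]})
print(report.completed, report.failed_stage, report.attempts)
```

In this toy run the `convert` stage fails validation once, is repaired, and passes on the second attempt. A model containing an op that the repair cannot handle exhausts the budget instead, and the report names `convert` as the failing stage without advancing to later stages.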

Related papers