FlowCompile: An Optimizing Compiler for Structured LLM Workflows
Junyan Li, Zhang-Wei Hong, Maohao Shen, Yang Zhang, Chuang Gan
TLDR
FlowCompile is an optimizing compiler for structured LLM workflows. It explores the workflow design space at compile time to find a reusable set of configurations that span different accuracy-latency trade-offs.
Key contributions
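- Addresses the challenge of optimizing structured LLM workflows, whose design space is combinatorial over model choices, reasoning budgets, and workflow structures.
- Introduces FlowCompile, a compiler that explores the workflow design space at compile time rather than routing each query at inference time.
- Profiles each sub-agent under diverse configurations and composes the measurements through a structure-aware proxy to estimate workflow-level accuracy and latency.
- Reports up to 6.4x speedup while outperforming heuristically optimized configurations and routing-based baselines, and produces a reusable configuration set.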
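The profile-then-compose idea in the list above can be illustrated with a minimal sketch. It is not the authors' implementation: the sub-agent names, the profile numbers, and the proxy itself (a sequential chain where accuracies multiply and latencies add) are assumptions made for illustration, since this summary does not specify the paper's actual structure-aware proxy.

```python
from itertools import product

# Hypothetical per-sub-agent profiles: config name -> (accuracy, latency in seconds).
# These numbers are invented for illustration and do not come from the paper.
PROFILES = {
    "planner": {"small": (0.80, 1.0), "large": (0.85, 5.0)},
    "solver": {"small": (0.70, 2.0), "large": (0.90, 6.0)},
}


def estimate(assignment):
    """Assumed proxy for a sequential chain: accuracies multiply, latencies add."""
    acc, lat = 1.0, 0.0
    for agent, config in assignment.items():
        agent_acc, agent_lat = PROFILES[agent][config]
        acc *= agent_acc
        lat += agent_lat
    return acc, lat


def pareto_front(points):
    """Keep candidates that no other candidate dominates, sorted by latency."""
    front = []
    for name, acc, lat in points:
        dominated = any(
            (a2 >= acc and l2 <= lat) and (a2 > acc or l2 < lat)
            for _, a2, l2 in points
        )
        if not dominated:
            front.append((name, acc, lat))
    return sorted(front, key=lambda p: p[2])


def compile_workflow():
    """Enumerate every configuration combination and return the trade-off set."""
    agents = list(PROFILES)
    candidates = []
    for combo in product(*(PROFILES[a] for a in agents)):
        acc, lat = estimate(dict(zip(agents, combo)))
        candidates.append((combo, acc, lat))
    return pareto_front(candidates)


if __name__ == "__main__":
    for combo, acc, lat in compile_workflow():
        print(combo, round(acc, 3), lat)
```

Exhaustive enumeration is only feasible for a toy space like this one. The point of the sketch is that each sub-agent is measured once and workflow-level estimates are derived from those measurements, so no combination has to be executed end to end.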
Why it matters
This paper reframes LLM workflow optimization as a compilation problem rather than a per-query routing problem. The authors report up to 6.4x speedup, and the trade-off set is identified in a single compile-time pass, without retraining or online adaptation. Because the compiled configuration set is reusable, the same artifact can serve deployments with different accuracy-latency preferences and can feed downstream selection or routing.
Original Abstract
Structured LLM workflows, where specialized LLM sub-agents execute according to a predefined graph, have become a powerful abstraction for solving complex tasks. Optimizing such workflows, i.e., selecting configurations for each sub-agent to balance accuracy and latency, is challenging due to the combinatorial design space over model choices, reasoning budgets, and workflow structures. Existing cost-aware methods largely treat workflow optimization as a routing problem, selecting a configuration at inference time for each query according to the accuracy-latency objective used during training. We argue that structured LLM workflows can also be optimized from a compilation perspective: before deployment, the system can globally explore the workflow design space and construct a reusable set of workflow-level configurations spanning diverse accuracy-latency trade-offs. Drawing inspiration from machine learning compilers, we introduce FlowCompile, a structured LLM workflow compiler that performs compile-time design space exploration to identify a high-quality, reusable trade-off set. FlowCompile decomposes a workflow into sub-agents, profiles each sub-agent under diverse configurations, and composes these measurements through a structure-aware proxy to estimate workflow-level accuracy and latency. It then identifies diverse high-quality configurations in a single compile-time pass, without retraining or online adaptation. Experiments across diverse workflows and challenging benchmarks show that FlowCompile consistently outperforms heuristically optimized workflow configurations and routing-based baselines, delivering up to 6.4x speedup. The compiled configuration set further serves as a reusable optimization artifact, enabling flexible deployment under varying runtime preferences and supporting downstream selection or routing.
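To make the "reusable optimization artifact" idea concrete, here is a minimal sketch of how a deployment might pick from an already compiled configuration set under a runtime latency budget. The `CompiledConfig` type, the `select_config` helper, and the three example entries are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CompiledConfig:
    """One entry of a compiled trade-off set (hypothetical structure)."""
    name: str
    est_accuracy: float
    est_latency_s: float


# Illustrative compiled set; the values are invented, not reported results.
COMPILED_SET = [
    CompiledConfig("fast", 0.56, 3.0),
    CompiledConfig("balanced", 0.72, 7.0),
    CompiledConfig("accurate", 0.77, 11.0),
]


def select_config(
    configs: Sequence[CompiledConfig], latency_budget_s: float
) -> Optional[CompiledConfig]:
    """Return the most accurate configuration that fits the latency budget."""
    feasible = [c for c in configs if c.est_latency_s <= latency_budget_s]
    if not feasible:
        return None  # No compiled configuration meets this budget.
    return max(feasible, key=lambda c: c.est_accuracy)


if __name__ == "__main__":
    choice = select_config(COMPILED_SET, latency_budget_s=8.0)
    print(choice.name if choice else "no feasible configuration")
```

Changing the runtime preference here only changes the lookup, not the compiled set, which is the sense in which the artifact is reusable.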