ArXiv TLDR

Figma2Code: Automating Multimodal Design to Code in the Wild

arXiv: 2604.13648

Yi Gui, Jiawan Zhang, Yina Wang, Tianran Ma, Yao Wan + 7 more

cs.SE

TLDR

Figma2Code recasts design-to-code as a multimodal task over rich Figma data (design images plus metadata and assets), with a curated dataset used to benchmark MLLMs.

Key contributions

  • Introduces Figma2Code, a new task that moves design-to-code into a multimodal setting by pairing design images with rich Figma metadata.
  • Curates a balanced dataset of 213 high-quality Figma design-to-code pairs, selected from 3,055 processed samples.
  • Benchmarks ten state-of-the-art open-source and proprietary MLLMs, revealing limitations in layout responsiveness and code maintainability.
  • Finds that MLLMs tend to directly map primitive visual attributes from the Figma metadata, degrading code quality (see the sketch below).
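
To make the last point concrete, the following TypeScript sketch (a hypothetical illustration, not code from the paper or its benchmark) shows what directly mapping primitive visual attributes can look like: Figma-style node metadata copied verbatim into absolutely positioned, pixel-valued HTML. The FigmaNode shape and the naiveNodeToHtml helper are simplified assumptions that only loosely mirror Figma's node model.

```typescript
// Hypothetical illustration (not from the paper): naively copying Figma-style
// node metadata into hard-coded, absolutely positioned HTML. The FigmaNode
// shape below is a simplified assumption, not the full Figma node model.
interface FigmaNode {
  name: string;
  absoluteBoundingBox: { x: number; y: number; width: number; height: number };
  fills?: { color: { r: number; g: number; b: number } }[];
}

// Emits a <div> whose position, size, and color are literal copies of the
// node's primitive visual attributes. The result is pixel-accurate but does
// not reflow with the viewport and is hard to maintain.
function naiveNodeToHtml(node: FigmaNode): string {
  const { x, y, width, height } = node.absoluteBoundingBox;
  const fill = node.fills?.[0]?.color;
  const bg = fill
    ? `background:rgb(${Math.round(fill.r * 255)},${Math.round(fill.g * 255)},${Math.round(fill.b * 255)});`
    : "";
  return (
    `<div style="position:absolute;left:${x}px;top:${y}px;` +
    `width:${width}px;height:${height}px;${bg}"><!-- ${node.name} --></div>`
  );
}

// Example: a 1440x480 hero frame becomes a fixed-size, absolutely placed box.
console.log(naiveNodeToHtml({
  name: "Hero banner",
  absoluteBoundingBox: { x: 0, y: 0, width: 1440, height: 480 },
  fills: [{ color: { r: 0.1, g: 0.4, b: 0.9 } }],
}));
```

Output like this can match the mockup pixel-for-pixel, which is consistent with the paper's observation that strong visual fidelity can coexist with poor layout responsiveness and maintainability.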

Why it matters

Front-end development is costly, and automating design-to-code is crucial. This paper addresses a key limitation of current image-only approaches by incorporating the rich metadata that Figma files already carry, moving design-to-code closer to real-world workflows. Its new task and dataset provide a vital benchmark for advancing more practical and robust UI code generation.

Original Abstract

Front-end development constitutes a substantial portion of software engineering, yet converting design mockups into production-ready User Interface (UI) code remains tedious and costly. While recent work has explored automating this process with Multimodal Large Language Models (MLLMs), existing approaches typically rely solely on design images. As a result, they must infer complex UI details from images alone, often leading to degraded results. In real-world development workflows, however, design mockups are usually delivered as Figma files, a widely used tool for front-end design, that embed rich multimodal information (e.g., metadata and assets) essential for generating high-quality UI. To bridge this gap, we introduce Figma2Code, a new task that advances design-to-code into a multimodal setting and aims to automate design-to-code in the wild. Specifically, we collect paired design images and their corresponding metadata files from the Figma community. We then apply a series of processing operations, including rule-based filtering, human- and MLLM-based annotation and screening, and metadata refinement. This process yields 3,055 samples, from which designers curate a balanced dataset of 213 high-quality cases. Using this dataset, we benchmark ten state-of-the-art open-source and proprietary MLLMs. Our results show that while proprietary models achieve superior visual fidelity, they remain limited in layout responsiveness and code maintainability. Further experiments across modalities and ablation studies corroborate this limitation, partly due to models' tendency to directly map primitive visual attributes from Figma metadata.
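
The abstract names rule-based filtering as the first step of the data pipeline. Since the concrete rules are not spelled out here, the sketch below is only an assumed illustration of what such a filter over collected Figma community samples might check; the DesignSample fields and every threshold are hypothetical.

```typescript
// Hypothetical sketch of a rule-based filter over collected Figma samples.
// The fields and thresholds are illustrative assumptions, not the authors'.
interface DesignSample {
  imagePath: string;
  nodeCount: number;       // number of layers in the Figma file
  textLayerCount: number;  // text nodes, a rough proxy for real UI content
  frameWidth: number;      // top-level frame width in px
  hasMissingAssets: boolean;
}

function passesRuleFilter(sample: DesignSample): boolean {
  return (
    sample.nodeCount >= 10 &&      // drop near-empty mockups
    sample.textLayerCount >= 1 &&  // require some textual content
    sample.frameWidth >= 320 &&    // plausible device width
    !sample.hasMissingAssets       // all referenced assets must resolve
  );
}

// Only the first sample survives; the second is near-empty and incomplete.
const kept = [
  { imagePath: "a.png", nodeCount: 42, textLayerCount: 6, frameWidth: 1440, hasMissingAssets: false },
  { imagePath: "b.png", nodeCount: 3, textLayerCount: 0, frameWidth: 200, hasMissingAssets: true },
].filter(passesRuleFilter);

console.log(kept.length); // 1
```

Samples that pass such rules would then move on to the human- and MLLM-based annotation, screening, and metadata refinement stages described in the abstract.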
