ArXiv TLDR

Aligned Multi-View Scripts for Universal Chart-to-Code Generation

arXiv: 2604.24559

Zhihan Zhang, Lizi Liao

cs.CL, cs.AI

TLDR

Introduces the Chart2NCode dataset and the CharLuMA model for universal chart-to-code generation across Python, R, and LaTeX, improving both executability and visual fidelity.

Key contributions

  • Introduces Chart2NCode, a 176K-chart dataset with aligned Python, R, and LaTeX scripts that render visually equivalent outputs.
  • Proposes CharLuMA, a parameter-efficient adaptation module built on a LLaVA-style architecture.
  • Routes code generation through a language-conditioned mixture of low-rank subspaces, sharing core chart understanding while specializing to each target language (see the sketch after this list).
  • Outperforms strong open-source baselines in executability and visual fidelity across all target languages.
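
Below is a minimal PyTorch sketch of this style of adapter: a shared projector plus per-language low-rank subspaces mixed by a lightweight language-conditioned router. It is an illustration, not the authors' implementation; the class name LangMixProjector and all dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class LangMixProjector(nn.Module):
    """Shared linear projector plus a language-conditioned mixture of
    low-rank subspaces (one LoRA-style expert per target language)."""

    def __init__(self, d_in: int, d_out: int, num_langs: int = 3, rank: int = 8):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)  # shared core for chart understanding
        # Low-rank factors per expert; B starts at zero so the update is initially a no-op.
        self.A = nn.Parameter(torch.randn(num_langs, d_in, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_langs, rank, d_out))
        # Lightweight router conditioned on the target-language id (Python / R / LaTeX).
        self.lang_emb = nn.Embedding(num_langs, d_in)
        self.router = nn.Linear(d_in, num_langs)

    def forward(self, x: torch.Tensor, lang_id: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_in) visual tokens; lang_id: (batch,) target-language ids.
        weights = torch.softmax(self.router(self.lang_emb(lang_id)), dim=-1)
        # Mixture of low-rank updates: sum_k w_k * (x @ A_k @ B_k).
        delta = torch.einsum("bsd,kdr,kro,bk->bso", x, self.A, self.B, weights)
        return self.base(x) + delta

# Example: project two 16-token chart sequences, routed to Python (0) and LaTeX (2).
proj = LangMixProjector(d_in=1024, d_out=4096)
out = proj(torch.randn(2, 16, 1024), torch.tensor([0, 2]))  # shape (2, 16, 4096)
```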

Why it matters

This paper fills a gap in universal chart-to-code generation by supporting multiple plotting languages, not just Python. It enables faithful reproduction and editable visualizations across different plotting environments and demonstrates the value of multi-language supervision.

Original Abstract

Chart-to-code generation converts a chart image into an executable plotting script, enabling faithful reproduction and editable visualizations. Existing methods are largely Python-centric, limiting practical use and overlooking a critical source of supervision: the same chart can be expressed by semantically equivalent scripts in different plotting languages. To fill this gap, we introduce Chart2NCode, a dataset of 176K charts paired with aligned scripts in Python, R, and LaTeX that render visually equivalent outputs, constructed via a metadata-to-template pipeline with rendering verification and human quality checks. Building on a LLaVA-style architecture, we further propose CharLuMA, a parameter-efficient adaptation module that augments the multimodal projector with a language-conditioned mixture of low-rank subspaces, allowing the model to share core chart understanding while specializing code generation to the target language through lightweight routing. Extensive experiments show consistent gains in executability and visual fidelity across all languages, outperforming strong open-source baselines and remaining competitive with proprietary systems. Further analyses reveal that balanced multi-language supervision benefits all languages and that the adapter allocates a compact shared core plus language-specific capacity. Codes and data are available at https://github.com/Zhihan72/CharLuMA.
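
The pipeline's rendering verification step can be illustrated with a minimal executability check: run each generated script headlessly and confirm it completes (the paper additionally verifies visual equivalence across the three renderings). This sketch assumes matplotlib scripts; the function executes_ok is hypothetical, not from the paper's release.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def executes_ok(python_script: str, timeout_s: int = 30) -> bool:
    """Return True if the plotting script runs to completion headlessly."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "plot.py"
        # Force a non-interactive backend so no display is required.
        script.write_text("import matplotlib\nmatplotlib.use('Agg')\n" + python_script)
        try:
            result = subprocess.run(
                [sys.executable, script.name],
                cwd=tmp,
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False
        return result.returncode == 0
```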
