ArXiv TLDR

Spreadsheet Modeling Experiments Using GPTs on Small Problem Statements and the Wall Task

2604.25689

Thomas A. Grossman, Yuan Chen, Sopiko Datuashvili

cs.SE, cs.AI

TLDR

This paper evaluates GPT-based tools for spreadsheet modeling. It finds them inconsistent, but promising for producing draft models, which skilled users must still oversee and verify.

Key contributions

  • Evaluated five GPT extensions for spreadsheet modeling, selecting Excel AI for detailed testing.
  • Assessed Excel AI's performance against the ERFR criteria on small problem statements.
  • Identified inconsistency and non-reproducibility in GPT-generated models, framed as "the problem of confidence" and "the problem of workflow".
  • Concluded GPTs show promise for draft models but are unreliable for professional use, requiring skilled user verification.

Why it matters

This research critically assesses current GPT capabilities in spreadsheet modeling, a common business task. It highlights significant limitations, notably inconsistent and often non-reproducible output that skilled users must verify, and so tempers expectations for immediate professional use of AI. This helps users judge how ready AI tools are for practical analytical work.

Original Abstract

This paper investigates how GPT-based tools can assist in building reusable analytical spreadsheet models. After a screening, we evaluate five GPT extensions and select Excel AI by pulsrai.com for detailed testing. Through structured experiments on simple problem statements, we assess Excel AI's performance against the ERFR criteria (each input in a cell; cell formulas; no hardwired numbers; labels; accurate). Results show that while Excel AI can produce well-structured models, it is inconsistent and often non-reproducible. We identify two central challenges - "the problem of confidence" and "the problem of workflow" - which highlight the need for skilled users to verify and adapt GPT-generated spreadsheets. Though GPTs show promise for generating draft models that may reduce development time or lower skill requirements, current tools remain unreliable for professional use. We conclude with recommendations for future research into prompt engineering, reproducibility, and larger-scale modeling tasks.
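The abstract states the ERFR criteria only in prose. To illustrate what two of them mean in practice ("no hardwired numbers" and "labels"), here is a minimal Python sketch that audits a toy model automatically. Everything in it is a hypothetical illustration, not the authors' procedure or Excel AI's output format: the dictionary-of-cells representation, the assumed layout (labels in column A, inputs and formulas in column B), and the function names. The remaining criteria, and above all "accurate", still need reference values or a skilled reviewer, which matches the paper's point about user verification.

```python
import re

# Hypothetical representation of a small spreadsheet model: cell address -> raw
# content. Content starting with "=" is a formula; anything else is a constant
# (an input) or a text label.
Model = dict[str, str]

# Numeric literals not preceded by a letter, "$", digit, or "." so that the
# "12" in a reference such as B12 or $B$12 is not mistaken for a hardwired number.
NUMERIC_LITERAL = re.compile(r"(?<![A-Za-z$\d.])\d+(?:\.\d+)?")
ADDRESS = re.compile(r"([A-Z]+)(\d+)")


def is_number(text: str) -> bool:
    """Return True if the text parses as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def erfr_issues(model: Model) -> list[str]:
    """Flag two mechanically checkable ERFR problems, assuming a hypothetical
    layout with labels in column A and inputs/formulas in column B of the same row."""
    issues = []
    for addr, content in sorted(model.items()):
        match = ADDRESS.fullmatch(addr)
        if match is None or match.group(1) != "B":
            continue  # only audit the value column in this toy layout
        row = match.group(2)
        # "No hardwired numbers": formulas should reference input cells instead.
        if content.startswith("="):
            literals = NUMERIC_LITERAL.findall(content[1:])
            if literals:
                issues.append(f"{addr}: hardwired number(s) {literals} in formula")
        # "Labels": every value or formula needs a text label beside it.
        label = model.get(f"A{row}", "").strip()
        if not label or is_number(label):
            issues.append(f"{addr}: missing label in A{row}")
    return issues


if __name__ == "__main__":
    model = {
        "A1": "Price", "B1": "20",
        "A2": "Units", "B2": "150",
        "B3": "=B1*B2*1.08",  # tax rate typed into the formula, and no label
        "A4": "Revenue", "B4": "=B1*B2",
    }
    for issue in erfr_issues(model):
        print(issue)
```

For the sample model, the check flags cell B3 twice: once for the literal 1.08 inside its formula and once for the empty label cell A3. The check is deliberately strict. It would also flag a constant such as the 2 in `ROUND(x, 2)`, so a real audit would need an allow-list or human judgment.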
