CIAO - Code In Architecture Out - Automated Software Architecture Documentation with Large Language Models
Marco De Luca, Tiziano Santilli, Domenico Amalfitano, Anna Rita Fasolino, Patrizio Pelliccione
TLDR
CIAO leverages LLMs to automate system-level architectural documentation from GitHub repositories, providing valuable, comprehensible, and cost-effective output.
Key contributions
- Introduces CIAO, an LLM-based workflow for generating system-level architectural documentation from GitHub repos.
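- Produces structured documentation following a template derived from ISO/IEC/IEEE 42010, SEI Views & Beyond, and the C4 model.
- Evaluated with 22 developers, each reviewing the documentation generated for a repository they had contributed to. They generally found it valuable, comprehensible, and broadly accurate, while noting limitations in diagram quality, high-level context modeling, and deployment views.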
- The process is operationally efficient, generating complete documents in minutes at a low cost.
Why it matters
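Software architecture documentation is essential for understanding a system, yet it is often unavailable or incomplete. Existing LLM-based techniques mostly document local artifacts; this paper instead offers a structured, automated process that produces a system-level document directly from a repository's code. Because a complete document takes only a few minutes and little money to generate, it can be added to the repository and regenerated as the code changes, which helps narrow the gap between the code and its architectural documentation.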
Original Abstract
Software architecture documentation is essential for system comprehension, yet it is often unavailable or incomplete. While recent LLM-based techniques can generate documentation from code, they typically address local artifacts rather than producing coherent, system-level architectural descriptions. This paper presents a structured process for automatically generating system-level architectural documentation directly from GitHub repositories using Large Language Models. The process, called CIAO (Code In Architecture Out), defines an LLM-based workflow that takes a repository as input and produces system-level architectural documentation following a template derived from ISO/IEC/IEEE 42010, SEI Views & Beyond, and the C4 model. The resulting documentation can be directly added to the target repository. We evaluated the process through a study with 22 developers, each reviewing the documentation generated for a repository they had contributed to. The evaluation shows that developers generally perceive the produced documentation as valuable, comprehensible, and broadly accurate with respect to the source code, while also highlighting limitations in diagram quality, high-level context modeling, and deployment views. We also assessed the operational cost of the process, finding that generating a complete architectural document requires only a few minutes and is inexpensive to run. Overall, the results indicate that a structured, standards-oriented approach can effectively guide LLMs in producing system-level architectural documentation that is both usable and cost-effective.
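As a rough illustration of the "repository in, templated document out" idea described in the abstract, the following is a minimal sketch and not the authors' implementation. The section names in `TEMPLATE_SECTIONS`, the helpers `summarize_repository` and `generate_architecture_doc`, and the `llm` callable are all hypothetical; the paper's actual template, prompts, and model interface are not reproduced here.

```python
from pathlib import Path
from typing import Callable

# Hypothetical section names, loosely inspired by view-based templates;
# these are assumptions, not the template used in the paper.
TEMPLATE_SECTIONS = [
    "System Context",
    "Containers and Components",
    "Deployment View",
]


def summarize_repository(root: Path, max_files: int = 200) -> str:
    """Return a newline-separated listing of file paths relative to root."""
    paths = sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        # Skip version-control metadata; only regular files are listed.
        if p.is_file() and ".git" not in p.relative_to(root).parts
    )
    return "\n".join(paths[:max_files])


def generate_architecture_doc(root: Path, llm: Callable[[str], str]) -> str:
    """Fill each template section by prompting the supplied llm callable."""
    listing = summarize_repository(root)
    parts = ["# Architecture Documentation"]
    for section in TEMPLATE_SECTIONS:
        prompt = (
            f"Write the '{section}' section of an architecture document "
            f"for a repository with these files:\n{listing}"
        )
        # llm is any function mapping a prompt string to generated text.
        parts.append(f"## {section}\n\n{llm(prompt).strip()}")
    return "\n\n".join(parts) + "\n"
```

Here `llm` stands in for whatever model client is available; passing a stub such as `lambda prompt: "TODO"` exercises the template-filling loop without calling a model. A real pipeline would presumably feed the model far more than a file listing, for example source excerpts and dependency information.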