ArXiv TLDR

The Grand Software Supply Chain of AI Systems

2604.27781

Carmine Cesarano, Martin Monperrus

cs.SE

TLDR

This paper decomposes the AI software supply chain into four architectural layers, identifies four structural gaps that traditional supply chain mechanisms do not address, and measures the scale of a reference stack of 48 open-source projects.

Key contributions

  • Decomposes the AI software supply chain into four architectural layers: data, training, inference, and substrate.
  • Identifies four critical structural gaps: verifiability, versioning, observability, and traceability.
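  • Argues that current AI systems fall short on all four gaps: they carry undeclared behavioral couplings, cannot be reverted to known working assemblies, degrade silently, and have lineage that can hardly be approximated.
  • Measures a reference stack of 48 production-grade open-source projects: 4,664 direct dependencies, 11,508 transitive packages, and roughly 392M lines of code.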
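
To make the gap between direct and transitive dependency counts concrete, here is a minimal Python sketch that resolves a dependency graph by breadth-first traversal. It is an illustration only, not the paper's measurement method: the graph, the package names, and the function resolve_transitive are all hypothetical, and the sketch counts every reachable package (direct ones included) as part of the resolved set.

```python
from collections import deque


def resolve_transitive(graph, roots):
    """Return every package reachable from `roots`, excluding the roots.

    `graph` maps a package name to the list of packages it declares.
    The traversal is breadth-first and tolerates cycles.
    """
    roots = set(roots)
    seen = set()
    queue = deque(roots)
    while queue:
        pkg = queue.popleft()
        for dep in graph.get(pkg, ()):
            if dep not in seen and dep not in roots:
                seen.add(dep)
                queue.append(dep)
    return seen


# Hypothetical toy graph; none of these names come from the paper.
TOY_GRAPH = {
    "app": ["trainer", "server"],
    "trainer": ["tensorlib", "dataloader"],
    "server": ["tensorlib", "httpkit"],
    "tensorlib": ["blas"],
    "dataloader": ["codec"],
    "httpkit": [],
    "blas": [],
    "codec": [],
}

direct = set(TOY_GRAPH["app"])
resolved = resolve_transitive(TOY_GRAPH, ["app"])
print(f"direct: {len(direct)}, resolved: {len(resolved)}")
```

Even in this toy graph, two declared dependencies resolve to seven packages; the reference stack in the paper shows the same effect at a much larger scale.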

Why it matters

The paper argues that AI systems rest on software with weak integrity mechanisms, which leaves them exposed at every stage from data acquisition to final inference. It offers a layered framework for reasoning about that exposure, together with measurements of a reference stack that indicate the scale of the problem.

Original Abstract

AI systems rest on software with low integrity mechanisms, leaving AI systems exposed across every stage from data acquisition to final inference. This paper makes the AI supply chain a first-class object of analysis, decomposing it across four architectural layers: data acquisition, model training, model inference, and a cross-cutting substrate. Within these layers, we identify four structural gaps that traditional supply chain mechanisms do not address: verifiability, versioning, observability, and traceability. Current AI systems fall short on all of them: they carry undeclared behavioral couplings that no resolver enforces; they cannot be reverted back to known working assemblies; they degrade silently rather than surfacing breaking changes; and their lineage can hardly be approximated. To illustrate the scale of the software supply chain of AI, we measure a reference stack of 48 production-grade open-source projects, which declares 4,664 direct dependencies, resolves to 11,508 transitive packages, and totals roughly 392M lines of code.
