ArXiv TLDR

Less Is More: Engineering Challenges of On-Device Small Language Model Integration in a Mobile Application

arXiv: 2604.24636

William Oliveira

cs.SE · cs.AI · cs.CL

TLDR

Integrating on-device SLMs into mobile apps is viable but challenging, requiring pragmatic design and reduced LLM responsibility for reliable performance.

Key contributions

  • Documented engineering challenges of integrating SLMs (Gemma 4 E2B, Qwen3 0.6B) into a mobile game.
  • Identified five failure categories specific to on-device SLMs, including output format violations and latency incompatibility.
  • Developed and documented mitigation strategies like defensive parsing and progressive prompt hardening.
  • Concluded that on-device SLMs are viable when their responsibilities are significantly reduced.
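The "defensive parsing" and deterministic-fallback strategies listed above can be illustrated with a short sketch. This is not the paper's code: `parse_hints`, its layering, and the regex salvage step are assumptions about what a multi-layer defensive parser for SLM output might look like.

```python
import json
import re

def parse_hints(raw: str, fallback: list[str]) -> list[str]:
    """Multi-layer defensive parsing (hypothetical sketch):
    try strict JSON, then salvage a JSON array embedded in chatty
    model output, then fall back deterministically."""
    # Layer 1: strict JSON parse of the whole response.
    try:
        hints = json.loads(raw)
        if isinstance(hints, list) and len(hints) >= 3:
            return [str(h) for h in hints[:3]]
    except json.JSONDecodeError:
        pass
    # Layer 2: salvage the first bracketed array from surrounding prose
    # (SLMs often wrap JSON in explanations despite instructions).
    match = re.search(r"\[.*?\]", raw, re.DOTALL)
    if match:
        try:
            hints = json.loads(match.group(0))
            if isinstance(hints, list) and len(hints) >= 3:
                return [str(h) for h in hints[:3]]
        except json.JSONDecodeError:
            pass
    # Layer 3: deterministic fallback so the game never blocks on the model.
    return fallback
```

The key design point mirrors the paper's conclusion: every layer has a non-model escape hatch, so a malformed response degrades the feature instead of breaking it.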

Why it matters

This paper provides practical insights into the real-world challenges and solutions for deploying SLMs on mobile devices. It offers a crucial guide for developers aiming to leverage private, offline AI by demonstrating that 'less is more' for reliable on-device performance.

Original Abstract

On-device Small Language Models (SLMs) promise fully offline, private AI experiences for mobile users (no cloud dependency, no data leaving the device). But is this promise achievable in practice? This paper presents a longitudinal practitioner case study documenting the engineering challenges of integrating SLMs (Gemma 4 E2B, 2.6B parameters; Qwen3 0.6B, 600M parameters) into Palabrita, a production Android word-guessing game. Over a 5-day development sprint comprising 204 commits (~90 directly AI-related), the system underwent a radical transformation: from an ambitious design where the LLM generated complete structured puzzles (word, category, difficulty, and five hints as JSON) to a pragmatic architecture where curated word lists provide the words and the LLM generates only three short hints, with a deterministic fallback if it fails. We identify five categories of failures specific to on-device SLM integration: output format violations, constraint violations, context quality degradation, latency incompatibility, and model selection instability. For each failure category, we document the observed symptoms, root causes, and the prompt engineering and architectural strategies that effectively mitigated them, including multi-layer defensive parsing, contextual retry with failure feedback, session rotation, progressive prompt hardening, and systematic responsibility reduction. Our findings demonstrate that on-device SLMs are viable for production mobile applications, but only when the developer accepts a fundamental constraint: the most reliable on-device LLM feature is one where the LLM does the least. We distill our experience into eight actionable design heuristics for practitioners integrating SLMs into mobile apps.
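The abstract's "contextual retry with failure feedback" can be sketched generically. The function names, the prompt wording, and the three-attempt cap below are illustrative assumptions, not the paper's implementation.

```python
def generate_with_retry(generate, validate, base_prompt, max_attempts=3):
    """Contextual retry (hypothetical sketch): when validation fails,
    re-prompt the model with feedback describing what was wrong,
    rather than retrying the identical prompt."""
    prompt = base_prompt
    for _ in range(max_attempts):
        output = generate(prompt)
        error = validate(output)  # returns None on success, else a message
        if error is None:
            return output
        # Feed the failure back so the next attempt can self-correct.
        prompt = (base_prompt
                  + f"\nYour previous answer was rejected: {error}"
                  + "\nReturn ONLY a JSON array of three short hints.")
    return None  # caller falls through to the deterministic hint source
```

Returning `None` instead of raising keeps the deterministic fallback on the caller's side, consistent with the architecture the abstract describes.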
