ArXiv TLDR

FOCAL: Filtered On-device Continuous Activity Logging for Efficient Personal Desktop Summarization

arXiv: 2604.19541

Haoran Yin, Zhiyuan Wen, Jiannong Cao, Bo Yuan, Ruosong Yang

cs.MA, cs.HC

TLDR

FOCAL is an on-device multi-agent system that summarizes desktop activity efficiently by filtering out noise and invoking VLMs only selectively, improving key-information recall.

Key contributions

  • FOCAL is a multi-agent system for efficient, privacy-first on-device desktop activity summarization.
  • Employs a filter-plan-log architecture with agents for noise filtering, task attribution, and selective visual reasoning.
  • Reduces VLM token consumption by 60.4% and VLM calls by 72.3% versus a baseline.
  • Boosts Key Information Recall (KIR) from 0.38 to 0.61, and stays robust under task interruptions.

Why it matters

On-device desktop activity summarization is hard for two reasons: exhaustive VLM processing strains local resources, and processing the stream globally causes cross-task context pollution. FOCAL addresses both with an efficient, privacy-first multi-agent system, enabling practical, continuous personal logging without overloading local hardware or sending data off-device.

Original Abstract

Desktop interaction streams provide a continuous, privacy-sensitive record of interleaved user tasks. Transforming these streams into task-organized personal logs on-device faces two main challenges: exhaustive Vision-Language Model (VLM) processing strains local resources, and global stream processing causes cross-task context pollution. We present FOCAL (Filtered On-device Continuous Activity Logging), a privacy-first multi-agent system utilizing a unified filter-plan-log architecture. It cascades a lightweight Filter Agent for noise suppression, a text-only Brain Agent for task attribution, a Record Agent for selective visual reasoning, and a task-isolated Memory Agent for context-coherent summarization. Experiments on DesktopBench (comprising 2,572 screenshots across 420 complex sessions) show FOCAL reduces total token consumption by 60.4% and VLM call count by 72.3% versus a baseline, while boosting Key Information Recall (KIR) from 0.38 to 0.61. Crucially, under $A{\to}B{\to}A$ task interruptions, FOCAL maintains Task Acc 0.81 and KIR 0.80, whereas the baseline collapses to Task Acc 0.03. FOCAL pioneers the efficient, on-device summarization of instruction-free desktop streams into multi-perspective personal logs.
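As a rough illustration of the filter-plan-log cascade described above, the sketch below shows how a lightweight filter, a text-only task attributor, a selective VLM caller, and task-isolated memory could compose. This is not the paper's implementation; all class, method, and field names here are invented for the sketch, and the selectivity rule (call the VLM only on first sight of a task) is a simplifying assumption.

```python
from dataclasses import dataclass

# Hypothetical event: a captured desktop frame plus cheap text metadata.
@dataclass
class Event:
    window_title: str
    app: str
    screenshot: bytes = b""

class FocalSketch:
    """Minimal filter-plan-log cascade (assumed structure, not the paper's code)."""

    def __init__(self, vlm_call):
        self.vlm_call = vlm_call                # expensive visual model, invoked selectively
        self.memory: dict[str, list[str]] = {}  # task-isolated logs
        self.vlm_calls = 0

    def filter_agent(self, ev: Event) -> bool:
        # Lightweight noise suppression: drop idle/system frames before
        # anything expensive runs.
        return ev.app not in {"screensaver", "lockscreen"}

    def brain_agent(self, ev: Event) -> str:
        # Text-only task attribution from cheap metadata (no pixels touched).
        return f"{ev.app}:{ev.window_title.split(' - ')[0]}"

    def record_agent(self, ev: Event, task: str) -> str:
        # Selective visual reasoning: in this toy rule, the VLM is called
        # only the first time a task appears; later frames of the same task
        # are logged from text metadata alone.
        if task not in self.memory:
            self.vlm_calls += 1
            return self.vlm_call(ev.screenshot)
        return ev.window_title

    def memory_agent(self, task: str, note: str) -> None:
        # Task-isolated summarization: each task keeps its own context,
        # so an A->B->A interruption cannot pollute task A's log.
        self.memory.setdefault(task, []).append(note)

    def process(self, ev: Event) -> None:
        if not self.filter_agent(ev):
            return
        task = self.brain_agent(ev)
        self.memory_agent(task, self.record_agent(ev, task))
```

Feeding an A→B→A sequence through `process` leaves separate logs for each task and triggers only two VLM calls, which mirrors (in miniature) why task isolation and selective invocation cut both context pollution and VLM cost.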
