Task-Adaptive Retrieval over Agentic Multi-Modal Web Histories via Learned Graph Memory
Saman Forouzandeh, Kamal Berahmand, Mahdi Jalili
TLDR
ACGM is a learned graph-memory retriever that adapts relevance to the current task when recalling observations from multi-modal web histories, outperforming 19 strong baselines.
Key contributions
- Proposes ACGM, a learned graph-memory retriever for task-adaptive retrieval from multi-modal web histories.
- Optimizes relevance graphs using policy gradients, adapting to evolving task contexts and modalities.
- Captures heterogeneous temporal dynamics with modality-specific decay and learns sparse, efficient connectivity (see the sketch after this list).
- Achieves 82.7 nDCG@10 and 89.2% Precision@10, significantly outperforming 19 baselines.
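A minimal sketch of the modality-specific decay idea, using the decay rates the paper reports (λ_v = 0.47 for visual, λ_x = 0.11 for text). The function name and the exact scoring form (exponential decay multiplied into a similarity score) are illustrative assumptions, not the authors' implementation:

```python
import math

# Decay rates reported in the paper: visual memories decay ~4.3x faster
# than text (lambda_v = 0.47 vs. lambda_x = 0.11).
DECAY = {"visual": 0.47, "text": 0.11}

def decayed_relevance(similarity: float, modality: str, dt_steps: int) -> float:
    """Down-weight an observation's similarity by its age, with a
    modality-specific exponential decay rate (illustrative form)."""
    lam = DECAY[modality]
    return similarity * math.exp(-lam * dt_steps)

# An equally similar screenshot and HTML snippet from 5 steps ago:
print(decayed_relevance(0.9, "visual", 5))  # ~0.086 -- screenshots fade fast
print(decayed_relevance(0.9, "text", 5))    # ~0.519 -- text stays relevant
```

Under this reading, an old screenshot must be far more similar than an old HTML snippet to win the same retrieval slot, which matches the paper's claim that visual relevance decays roughly 4.3× faster.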
Why it matters
This paper tackles the challenge of retrieving relevant information from complex, multi-modal web histories for AI agents. ACGM, a learned graph-memory system, significantly improves adaptive recall of past observations, which is crucial for building more intelligent and effective web-based AI agents.
Original Abstract
Retrieving relevant observations from long multi-modal web interaction histories is challenging because relevance depends on the evolving task state, modality (screenshots, HTML text, structured signals), and temporal distance. Prior approaches typically rely on static similarity thresholds or fixed-capacity buffers, which fail to adapt relevance to the current task context. We propose ACGM, a learned graph-memory retriever that constructs task-adaptive relevance graphs over agent histories using policy-gradient optimization from downstream task success. ACGM captures heterogeneous temporal dynamics with modality-specific decay (visual decays 4.3× faster than text: λ_v = 0.47 vs. λ_x = 0.11) and learns sparse connectivity (3.2 edges/node), enabling efficient O(log T) retrieval. Across WebShop, VisualWebArena, and Mind2Web, ACGM improves retrieval quality to 82.7 nDCG@10 (+9.3 over GPT-4o, p < 0.001) and 89.2% Precision@10 (+7.7), outperforming 19 strong dense, re-ranking, multi-modal, and graph-based baselines. Code to reproduce our results is available at https://github.com/S-Forouzandeh/ACGM-Agentic-Web.
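To make the policy-gradient idea concrete, here is a minimal REINFORCE-style sketch over independent Bernoulli edge gates: sample a sparse connectivity mask, then nudge edge probabilities toward choices that preceded above-baseline task success. Everything here (the function `reinforce_edge_update`, the Bernoulli parameterization, the scalar baseline) is a hypothetical illustration of the general technique; the paper's actual objective and parameterization may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def reinforce_edge_update(logits: np.ndarray, reward: float, baseline: float,
                          lr: float = 0.1) -> np.ndarray:
    """One REINFORCE step over independent Bernoulli edge gates (sketch)."""
    probs = 1.0 / (1.0 + np.exp(-logits))      # edge-inclusion probabilities
    mask = rng.random(logits.shape) < probs    # sampled relevance-graph edges
    # Gradient of log-prob of a Bernoulli sample w.r.t. logits: mask - probs.
    grad_logp = mask.astype(float) - probs
    return logits + lr * (reward - baseline) * grad_logp

# Toy usage: 6 candidate edges, one successful episode (reward 1, baseline 0.5).
logits = np.zeros(6)
logits = reinforce_edge_update(logits, reward=1.0, baseline=0.5)
print(logits)  # sampled edges get pushed up, unsampled edges pushed down
```

Repeated over episodes, updates like this can drive most edge probabilities toward zero, consistent with the sparse connectivity (3.2 edges/node) the paper reports.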