
A Case-Driven Multi-Agent Framework for E-Commerce Search Relevance

2605.05991

Global E-Commerce Search Relevance Team

cs.IR

TLDR

A multi-agent framework automates e-commerce search relevance optimization by replacing human roles with specialized AI agents for case identification and resolution.

Key contributions

  • Automates e-commerce search relevance optimization using a case-driven multi-agent framework.
  • Features specialized agents (Annotator, Optimizer, User) for autonomous bad-case identification and resolution.
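  • Incorporates production-oriented components: a unified retrieval-and-ranking relevance model, an instruction-following relevance model, Global Memory, a Deep Search Agent, and an agent-based chatbot.
  • Reports, based on human evaluation, improved annotation accuracy and more timely, more generalizable bad-case resolution.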

Why it matters

This paper proposes an autonomous multi-agent system that covers the full e-commerce relevance-optimization pipeline, from bad-case identification to resolution. According to the authors' human evaluation, it improves annotation accuracy and resolves bad cases in a more timely and generalizable way, which suggests a practical way to reduce manual effort in industrial search on large-scale e-commerce platforms.

Original Abstract

Relevance is a foundation of user experience in e-commerce search. We view relevance optimization as a closed-loop ecosystem involving multiple human roles: users who provide feedback, product managers who define standards, annotators who label data, algorithm engineers who optimize models, and evaluators who assess performance. Because improving relevance in practice means systematically resolving user-perceived bad cases, we ask a system-level question: can this ecosystem be reimagined by replacing its human roles with autonomous agents? To answer this question, we propose a case-driven multi-agent framework that automates the pipeline from bad-case identification to resolution. The framework instantiates an Annotator Agent for multi-turn annotation, an Optimizer Agent for autonomous bad-case analysis and resolution, and a User Agent that identifies bad cases through conversational interaction, together forming an autonomous and continually evolving system. To make the framework practical in production, we further adopt a harness-engineering paradigm and build a unified retrieval-and-ranking relevance model for efficient training, an instruction-following relevance model for real-time case resolution, Global Memory to reduce information asymmetry across agents, a Deep Search Agent to target underestimation failures, and an agent-based chatbot for human-agent collaboration. Extensive human evaluation shows that the framework performs relevance-related tasks effectively, improves annotation accuracy, and enables more timely and generalizable bad-case resolution, indicating a practical paradigm for industrial search relevance optimization.
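
The closed loop described in the abstract can be pictured with a toy sketch. This is not the authors' implementation: the class and function names, the word-overlap labeling rule, and the per-case override stored in memory are all hypothetical stand-ins, chosen only to show how a User Agent, an Annotator Agent, and an Optimizer Agent might hand a case to one another through a shared Global Memory.

```python
from dataclasses import dataclass, field


@dataclass
class Case:
    query: str
    item: str
    predicted_relevant: bool  # output of a (hypothetical) relevance model


@dataclass
class GlobalMemory:
    """Shared store so every agent sees the same labels and fixes."""
    labels: dict = field(default_factory=dict)
    rules: dict = field(default_factory=dict)


def user_agent(cases):
    """Stand-in for conversational bad-case discovery: send every case to review."""
    return list(cases)


def annotator_agent(case, memory):
    """Toy labeling rule (an assumption): relevant if a query word is in the item title."""
    item_words = case.item.lower().split()
    label = any(word in item_words for word in case.query.lower().split())
    memory.labels[(case.query, case.item)] = label
    return label


def optimizer_agent(case, label, memory):
    """Record a per-case override when the model disagrees with the annotation."""
    if case.predicted_relevant != label:
        memory.rules[(case.query, case.item)] = label
        return True
    return False


def run_loop(cases):
    """One pass of the loop: identify, annotate, then resolve each case."""
    memory = GlobalMemory()
    resolved = 0
    for case in user_agent(cases):
        label = annotator_agent(case, memory)
        resolved += optimizer_agent(case, label, memory)
    return memory, resolved


cases = [
    Case("red shoes", "Red running shoes", True),
    Case("red shoes", "Blue phone case", True),
]
memory, resolved = run_loop(cases)
```

In this toy run only the second case is treated as a bad case, because the hypothetical model marks it relevant while the annotation rule does not. In the paper the corresponding steps are carried out by LLM-based agents and trained relevance models rather than by fixed rules.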
