No More, No Less: Task Alignment in Terminal Agents
Sina Mavali, David Pape, Jonathan Evertz, Samira Abedini, Devansh Srivastav, et al.
TLDR
A new benchmark, TAB, shows that terminal agents struggle to selectively follow relevant instructions while ignoring distractors, exposing a systematic gap in task alignment.
Key contributions
- Introduces TAB (Task Alignment Benchmark) with 89 underspecified terminal tasks.
- Tasks require agents to selectively use necessary cues while ignoring plausible distractors.
- Evaluates ten frontier agents, showing a systematic gap between capability and task alignment.
- Demonstrates that prompt-injection defenses suppress both distractors and the cues essential for task completion.
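To make the cue/distractor setup concrete, here is a minimal illustrative sketch of how a single task-alignment check could be scored. All names and task contents here are hypothetical, not taken from TAB itself: a task is underspecified, a necessary cue in an environmental artifact supplies the missing detail, and a plausible distractor must be ignored.

```python
# Hypothetical sketch of a TAB-style task-alignment check (not the benchmark's actual code).
# Completion requires acting on the cue; alignment additionally requires
# ignoring the distractor.

from dataclasses import dataclass


@dataclass
class Task:
    prompt: str              # underspecified user instruction
    cue_action: str          # action the environmental cue makes necessary
    distractor_action: str   # plausible but irrelevant action to be ignored


def score(task: Task, agent_actions: list[str]) -> dict[str, bool]:
    """Score one episode: did the agent use the cue, and only the cue?"""
    completed = task.cue_action in agent_actions
    distracted = task.distractor_action in agent_actions
    return {"completed": completed, "aligned": completed and not distracted}


task = Task(
    prompt="Build the project.",
    cue_action="make TARGET=release",        # hypothetical cue found in a README
    distractor_action="rm -rf build_cache",  # hypothetical distractor in a comment
)

print(score(task, ["make TARGET=release"]))
print(score(task, ["make TARGET=release", "rm -rf build_cache"]))
```

The second call illustrates the gap the paper measures: an agent can appear fully capable (the task completes) while failing alignment because it also executed the distractor.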
Why it matters
This paper highlights a critical flaw in current terminal agents: their inability to discern relevant instructions from irrelevant ones. The new TAB benchmark provides a crucial tool to evaluate and improve this "task alignment," pushing towards more robust and intelligent autonomous agents.
Original Abstract
Terminal agents are increasingly capable of executing complex, long-horizon tasks autonomously from a single user prompt. To do so, they must interpret instructions encountered in the environment (e.g., README files, code comments, stack traces) and determine their relevance to the task. This creates a fundamental challenge: relevant cues must be followed to complete a task, whereas irrelevant or misleading ones must be ignored. Existing benchmarks do not capture this ability. An agent may appear capable by blindly following all instructions, or appear robust by ignoring them altogether. We introduce TAB (Task Alignment Benchmark), a suite of 89 terminal tasks derived from Terminal-Bench 2.1. Each task is intentionally underspecified, with missing information provided as a necessary cue embedded in a natural environmental artifact, alongside a plausible but irrelevant distractor. Solving these tasks requires selectively using the cue while ignoring the distractor. Applying TAB to ten frontier agents reveals a systematic gap between task capability and task alignment. The strongest Terminal-Bench agent achieves high task completion but low task alignment on TAB. Evaluating six prompt-injection defenses further shows that suppressing distractor execution also suppresses the cues required for task completion. These results demonstrate that task-aligned agents require selective use of environmental instructions rather than blanket acceptance or rejection.