Eric Wong

2 papers · Latest: April 13, 2026

Detecting Safety Violations Across Many Agent Traces

Meerkat uses clustering and agentic search to detect rare, complex safety violations across many agent traces, outperforming existing methods.

This paper measures and corrects citation hallucinations in LLMs and research agents, finding 3-13% of URLs are fabricated.
