Xiaofei Xie

3 papers · Latest: May 7, 2026

Beyond Accuracy: Policy Invariance as a Reliability Test for LLM Safety Judges

LLM safety judges are unreliable; their verdicts depend on policy wording, not just agent behavior, leading to flawed safety evaluations.

2605.06161 · May 7, 2026

Cryptography & Security

ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection

ARGUS defends LLM agents against context-aware prompt injection by auditing decisions based on provenance, significantly reducing attack success.

2605.03378 · May 5, 2026

Software Engineering

From Exploration to Specification: LLM-Based Property Generation for Mobile App Testing

PropGen automates property generation for Android app testing, effectively finding functional bugs by exploring functionalities and refining properties.

2604.13463 · Apr 15, 2026
