Xiaofei Xie
3 papers · Latest:
Artificial Intelligence
Beyond Accuracy: Policy Invariance as a Reliability Test for LLM Safety Judges
LLM safety judges are unreliable; their verdicts depend on policy wording, not just agent behavior, leading to flawed safety evaluations.
2605.06161
Cryptography & Security
ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection
ARGUS defends LLM agents against context-aware prompt injection by auditing decisions based on provenance, significantly reducing attack success.
2605.03378
Software Engineering
From Exploration to Specification: LLM-Based Property Generation for Mobile App Testing
PropGen automates property generation for Android app testing, effectively finding functional bugs by exploring functionalities and refining properties.
2604.13463