Shihao Weng
2 papers · Latest:
Artificial Intelligence
Beyond Accuracy: Policy Invariance as a Reliability Test for LLM Safety Judges
LLM safety judges are unreliable; their verdicts depend on policy wording, not just agent behavior, leading to flawed safety evaluations.
2605.06161
Cryptography & Security
ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection
ARGUS defends LLM agents against context-aware prompt injection by auditing decisions based on provenance, significantly reducing attack success.
2605.03378