Zeming Wei
2 papers · Latest:
Cryptography & Security
SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces
SkillSafetyBench evaluates how reusable skills in LLM agents create new attack surfaces, revealing vulnerabilities beyond model-level alignment.
2605.12015
Cryptography & Security
The Salami Slicing Threat: Exploiting Cumulative Risks in LLM Systems
This paper introduces Salami Slicing, a novel multi-turn jailbreak attack that exploits cumulative low-risk inputs to bypass LLM safety, achieving high success rates.
2604.11309