Jun Sun
4 papers · Latest:
Layerwise Convergence Fingerprints for Runtime Misbehavior Detection in Large Language Models
LCF is a tuning-free runtime monitor that detects LLM misbehavior like backdoors, jailbreaks, and prompt injections by analyzing hidden-state trajectories.
Train in Vain: Functionality-Preserving Poisoning to Prevent Unauthorized Use of Code Datasets
FunPoison introduces a functionality-preserving poisoning method to prevent unauthorized use of code datasets for training CodeLLMs, maintaining compilability.
ClawGuard: A Runtime Security Framework for Tool-Augmented LLM Agents Against Indirect Prompt Injection
ClawGuard is a runtime security framework protecting tool-augmented LLM agents from indirect prompt injection by enforcing rules at tool-call boundaries.
The Salami Slicing Threat: Exploiting Cumulative Risks in LLM Systems
This paper introduces Salami Slicing, a multi-turn jailbreak attack that accumulates individually low-risk inputs to bypass LLM safety guardrails, achieving high attack success rates.