Zheng Lin
4 papers · Latest:
Cryptography & Security
Re-Triggering Safeguards within LLMs for Jailbreak Detection
This paper introduces an embedding disruption method to re-trigger LLM safeguards, effectively detecting and defending against jailbreak attacks.
2605.10611
Cryptography & Security
Guaranteed Jailbreaking Defense via Disrupt-and-Rectify Smoothing
DR-Smoothing offers a guaranteed defense against LLM jailbreaking attacks by disrupting and rectifying prompts, balancing safety and helpfulness.
2605.10582
Machine Learning
Near-Future Policy Optimization
NPO and AutoNPO enhance Reinforcement Learning with Verifiable Rewards (RLVR) by leveraging near-future policy checkpoints for improved off-policy learning.
2604.20733
Machine Learning
Self-Distilled RLVR
RLSD combines RLVR with self-distillation to provide fine-grained updates and reliable directions, improving LLM training stability and convergence.
2604.03128