Chao Yang
2 papers · Latest:
Cryptography & Security
SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces
SkillSafetyBench evaluates how reusable skills in LLM agents create new attack surfaces, revealing vulnerabilities beyond model-level alignment.
2605.12015
Machine Learning
Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization
MODPO is a novel, RL-free method for aligning language models to multiple human preferences simultaneously, achieving stable and efficient optimization across diverse objectives.
2310.03708