Zhen Xiang
2 papers · Latest:
Machine Learning
Crafting Reversible SFT Behaviors in Large Language Models
This paper introduces LCDD to create sparse, controllable "carriers" for SFT behaviors in LLMs, enabling their selective reversal with SFT-Eraser.
2605.06632
Natural Language Processing
Green Shielding: A User-Centric Approach Towards Trustworthy AI
Green Shielding proposes a user-centric approach to build trustworthy AI by characterizing how benign input variations affect LLM behavior.
2604.24700