Beyond "I cannot fulfill this request": Alleviating Rigid Rejection in LLMs via Label Enhancement

May 8, 2026

arXiv:2605.07883

Ying Zhang, Congyu Qiao, Xin Geng, Ning Xu

cs.CL

TLDR

LANCE introduces a label enhancement method using variational inference to enable LLMs to provide safe yet flexible and natural responses, avoiding rigid rejections.

Key contributions

Addresses "rigid rejection," where an LLM refuses with a generic template (e.g., "I cannot fulfill this request") and the interaction feels unnatural.
Introduces LANCE, a label enhancement method that uses variational inference to predict a continuous distribution across multiple rejection categories.
Uses these fine-grained distributions as multi-way textual gradients for a refinement model that neutralizes the hazardous elements in the prompt.
Reports significantly better helpfulness and naturalness than existing baseline models while maintaining high security standards.
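
As a rough illustration of the idea behind the contributions above, the toy sketch below turns per-category scores into a continuous distribution over rejection categories and derives one rewrite hint per dominant category. It is not the paper's implementation: the category names, the scores, the softmax step (used here in place of the variational inference model), the threshold, and the hint wording are all assumptions made for this example.

```python
import math

# Hypothetical rejection categories; the paper's actual taxonomy is not given here.
CATEGORIES = ["violence", "illegal_activity", "privacy", "self_harm"]


def label_distribution(scores):
    """Turn raw per-category scores into a probability distribution (softmax)."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def refinement_hints(distribution, threshold=0.2):
    """Return one textual hint per category whose probability reaches the threshold."""
    hints = []
    for name, prob in zip(CATEGORIES, distribution):
        if prob >= threshold:
            hints.append(
                f"Neutralize the '{name}' aspect of the prompt (weight {prob:.2f})."
            )
    return hints


# Toy scores standing in for the output of a learned inference model (assumed values).
scores = [2.0, 1.5, -1.0, -2.0]
dist = label_distribution(scores)
hints = refinement_hints(dist)
for hint in hints:
    print(hint)
```

In this sketch, a single binary "refuse" label is replaced by several weighted signals, which is what would let a downstream refinement step address each hazardous aspect separately instead of falling back to one generic refusal.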

Why it matters

The paper tackles a key limitation of LLM safety alignment: template-based refusals that are triggered indiscriminately. By neutralizing the hazardous elements of a prompt so the model can answer safely instead of rejecting rigidly, LANCE aims to improve the naturalness and helpfulness of interactions without compromising safety.

Original Abstract

Large Language Models (LLMs) rely on safety alignment to obey safe requests while refusing harmful ones. However, traditional refusal mechanisms often lead to "rigid rejection," where a general template (e.g., "I cannot fulfill this request") indiscriminately triggers refusals and severely undermines the naturalness of interactions between humans and LLMs. To address this issue, LANCE is proposed in this paper to ensure safe yet flexible and natural responses via label enhancement. Specifically, LANCE employs variational inference to perform label enhancement, predicting a continuous distribution across multiple rejection categories. These fine-grained rejection distributions provide multi-way textual gradients for a refinement model to neutralize the hazardous elements in the prompt, so that the LLMs could generate safe responses that avoid rigid rejections while preserving the naturalness of interactions. Experiments demonstrate that LANCE significantly alleviates the rigid rejection problem while maintaining high security standards, significantly outperforming existing baseline models in terms of helpfulness and naturalness of responses.
