ArXiv TLDR

FinSafetyBench: Evaluating LLM Safety in Real-World Financial Scenarios

arXiv: 2605.00706

Yutao Hou, Yihan Jiang, Yuhan Xie, Jian Yang, Liwen Zhang + 3 more

cs.CL

TLDR

FinSafetyBench is a new bilingual (English-Chinese) red-teaming benchmark for evaluating LLM safety and compliance in real-world financial scenarios; experiments with it reveal critical vulnerabilities to adversarial prompts.

Key contributions

  • Introduces FinSafetyBench, a bilingual (English-Chinese) red-teaming benchmark for LLM safety in finance.
  • Tests whether LLMs refuse non-compliant requests across 14 subcategories of financial crimes and ethical violations.
  • Identifies critical vulnerabilities that allow adversarial prompts to bypass compliance safeguards.
  • Shows stronger susceptibility in Chinese-language contexts and the limitations of current prompt-level defenses.

Why it matters

LLMs deployed in finance pose serious compliance risks because they can produce harmful outputs, such as facilitating illegal or unethical activity. This paper provides a benchmark for systematically evaluating LLM safety and exposing such vulnerabilities, underscoring the need for defenses more robust than prompt-level safeguards in financial AI applications.

Original Abstract

Large language models (LLMs) are increasingly applied in financial scenarios. However, they may produce harmful outputs, including facilitating illegal activities or unethical behavior, posing serious compliance risks. To systematically evaluate LLM safety in finance, we propose FinSafetyBench, a bilingual (English-Chinese) red-teaming benchmark designed to test an LLM's refusal of requests that violate financial compliance. Grounded in real-world financial crime cases and ethics standards, the benchmark comprises 14 subcategories spanning financial crimes and ethical violations. Through extensive experiments on general-purpose and finance-specialized LLMs under three representative attack settings, we identify critical vulnerabilities that allow adversarial prompts to bypass compliance safeguards. Further analysis reveals stronger susceptibility in Chinese contexts and highlights the limitations of prompt-level defenses against sophisticated or implicit manipulation strategies.
