ArXiv TLDR

GLiGuard: Schema-Conditioned Classification for LLM Safeguard

2605.07982

Urchade Zaratiana, Mary Newhauser, George Hurn-Maloney, Ash Lewis

cs.CL, cs.CR

TLDR

GLiGuard is a compact 0.3B-parameter model that uses schema-conditioned classification to safeguard LLMs, matching the accuracy of much larger guard models at far higher speed.

Key contributions

  • Introduces GLiGuard, a 0.3B-parameter schema-conditioned bidirectional encoder for LLM content moderation.
  • Re-frames content moderation as classification, encoding task definitions and label semantics into input schemas.
  • Simultaneously evaluates prompt/response safety, refusal detection, 14 harm categories, and 11 jailbreak strategies in a single non-autoregressive forward pass.
  • Matches larger models' accuracy while achieving up to 16x higher throughput and 17x lower latency.
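The schema-conditioned idea above can be illustrated with a minimal sketch. This is not the actual GLiGuard API; the `[TASK]`/`[LABELS]` markers, function names, and label sets below are hypothetical stand-ins showing how task definitions and label semantics could be serialized into one input sequence so that a single encoder pass scores every aspect at once.

```python
# Hypothetical sketch of schema-conditioned input composition.
# Marker tokens and helper names are illustrative assumptions,
# not the real GLiGuard implementation.

def build_schema(tasks: dict[str, list[str]]) -> str:
    """Serialize each task and its candidate labels into one token schema.
    New tasks can be composed at inference time just by adding blocks."""
    blocks = [f"[TASK] {name} [LABELS] " + " ".join(labels)
              for name, labels in tasks.items()]
    return " ".join(blocks)

def compose_input(schema: str, prompt: str, response: str) -> str:
    """Concatenate schema and conversation into the single sequence a
    bidirectional encoder would classify in one forward pass."""
    return f"{schema} [PROMPT] {prompt} [RESPONSE] {response}"

tasks = {
    "prompt_safety": ["safe", "unsafe"],
    "response_safety": ["safe", "unsafe"],
    "refusal": ["refusal", "compliance"],
}
seq = compose_input(build_schema(tasks),
                    "How do I pick a lock?",
                    "I can't help with that.")
print(seq)
```

Because all task blocks travel in the same sequence, adding an aspect (say, a new harm category) is an input-schema change rather than a second model call, which is what makes multi-aspect moderation cheap in this design.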

Why it matters

Current LLM guardrails are large and slow, hindering real-time, multi-aspect content moderation. GLiGuard offers a compact, efficient solution that matches larger models' accuracy while drastically reducing inference costs. This makes scalable and practical LLM safeguarding more accessible.

Original Abstract

Ensuring safe, policy-compliant outputs from large language models requires real-time content moderation that can scale across multiple safety dimensions. However, state-of-the-art guardrail models rely on autoregressive decoders with 7B--27B parameters, reformulating what is fundamentally a classification problem as sequential text generation, a design choice that incurs high latency and scales poorly to multi-aspect evaluation. In this work, we introduce \textbf{GLiGuard}, a 0.3B-parameter schema-conditioned bidirectional encoder adapted from GLiNER2 for LLM content moderation. The key idea is to encode task definitions and label semantics directly into the input sequence as structured token schemas, enabling simultaneous evaluation of prompt safety, response safety, refusal detection, 14 fine-grained harm categories, and 11 jailbreak strategies in a single non-autoregressive forward pass. This schema-conditioned design lets supported task and label blocks be composed directly in the input schema at inference time. Across nine established safety benchmarks, GLiGuard achieves F1 scores competitive with 7B--27B decoder-based guards despite being 23--90$\times$ smaller, while delivering up to 16$\times$ higher throughput and 17$\times$ lower latency. These results suggest that compact bidirectional encoders can approach the accuracy of much larger guard models while drastically reducing inference cost. Code and models are available at https://github.com/fastino-ai/GLiGuard.
