Enhancing Judgment Document Generation via Agentic Legal Information Collection and Rubric-Guided Optimization

May 3, 2026 · arXiv:2605.02011

Weihang Su, Xuanyi Chen, Yueyue Wu, Qingyao Ai, Yiqun Liu

cs.CL, cs.AI, cs.IR

TLDR

Judge-R1 improves LLM-based judgment document generation by combining agentic legal information collection with rubric-guided reinforcement learning, raising both legal accuracy and generation quality on the JuDGE benchmark.

Key contributions

Introduces Agentic Legal Information Collection for precise statute and precedent retrieval.
Implements Rubric-Guided Optimization, a reinforcement learning phase that uses Group Relative Policy Optimization (GRPO) with a comprehensive legal reward function.
Enforces adherence to judicial standards and logical reasoning in generated documents.
Achieves significant improvements in legal accuracy and generation quality on the JuDGE benchmark.

Why it matters

Automated judgment drafting is held back by insufficient evidence recall, hallucinated statutory references, and logically flawed legal reasoning. Judge-R1 addresses retrieval and reasoning together in one framework and reports gains over state-of-the-art baselines in legal accuracy and generation quality on the JuDGE benchmark. More reliable drafting of this kind supports judicial efficiency.

Original Abstract

Automating the drafting of judgment documents is pivotal to judicial efficiency, yet it remains challenging due to the dual requirements of comprehensive retrieval of legal information and rigorous logical reasoning. Existing approaches, typically relying on standard Retrieval-Augmented Generation and Supervised Fine-Tuning, often suffer from insufficient evidence recall, hallucinated statutory references, and logically flawed legal reasoning. To bridge this gap, we propose Judge-R1, a unified framework designed to enhance LLM-based judgment document generation by jointly improving legal information collection and judgment document generation. First, we introduce Agentic Legal Information Collection, which employs a dynamic planning agent to retrieve precise statutes and precedents from multiple sources. Second, we implement Rubric-Guided Optimization, a reinforcement learning phase utilizing Group Relative Policy Optimization (GRPO) with a comprehensive legal reward function to enforce adherence to judicial standards and reasoning logic. Extensive experiments on the JuDGE benchmark demonstrate that Judge-R1 significantly outperforms state-of-the-art baselines in both legal accuracy and generation quality.
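To make the Rubric-Guided Optimization step more concrete, the following is a minimal sketch of how a rubric-based reward could feed the group-relative advantage used in GRPO. It is an illustration under stated assumptions, not the authors' implementation: the rubric criteria, their weights, and the function names `rubric_reward` and `group_relative_advantages` are all hypothetical, and the abstract does not specify the actual reward components. Only the within-group normalization (reward minus group mean, divided by group standard deviation) reflects the standard GRPO formulation.

```python
from statistics import mean, pstdev

# Hypothetical rubric criteria and weights (assumptions for illustration only;
# the paper's actual legal reward function is not described in the abstract).
RUBRIC_WEIGHTS = {
    "statute_accuracy": 0.4,
    "reasoning_logic": 0.4,
    "format_compliance": 0.2,
}


def rubric_reward(scores):
    """Weighted sum of per-criterion scores, each assumed to lie in [0, 1]."""
    return sum(RUBRIC_WEIGHTS[name] * scores[name] for name in RUBRIC_WEIGHTS)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of drafts sampled for the same case.

    Each advantage is (reward - group mean) / (group std + eps), so drafts are
    scored relative to their siblings rather than on an absolute scale.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Usage: three hypothetical drafts generated for the same case description.
drafts = [
    {"statute_accuracy": 1.0, "reasoning_logic": 0.5, "format_compliance": 1.0},
    {"statute_accuracy": 0.0, "reasoning_logic": 0.5, "format_compliance": 1.0},
    {"statute_accuracy": 0.5, "reasoning_logic": 1.0, "format_compliance": 0.5},
]
rewards = [rubric_reward(d) for d in drafts]
advantages = group_relative_advantages(rewards)
```

In this sketch, a draft that cites the wrong statute receives a below-average reward and therefore a negative advantage, which is the signal a policy-gradient update would use to make such drafts less likely.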
