SCPRM: A Schema-aware Cumulative Process Reward Model for Knowledge Graph Question Answering
Jiujiu Chen, Yazheng Liu, Sihong Xie, Hui Xiong
TLDR
SCPRM is a new reward model for Knowledge Graph Question Answering that uses schema-aware cumulative rewards to improve multi-hop reasoning accuracy.
Key contributions
- Mitigates "risk compensation" in process reward models for KG reasoning.
- Evaluates reasoning paths using schema-aware cumulative and future rewards.
- Integrates into MCTS (SCPRM-MCTS) for enhanced multi-hop KGQA.
- Improves Hits@k by an average of 1.18% over strong baselines on medical, legal, and CWQ KGQA benchmarks.
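The core idea behind the cumulative reward can be sketched in a few lines. This is an illustrative reconstruction, not the paper's implementation: `path_score`, its arguments, and the multiplicative/discount choices are all assumptions made for the example.

```python
# Hypothetical sketch of a schema-aware cumulative path score.
# All names and the exact combination rule are illustrative.

def path_score(step_rewards, schema_distance, gamma=0.9):
    """Combine prefix-conditioned step rewards (cumulative term)
    with a schema-distance term (future reward).

    step_rewards: per-step rewards r_1..r_T in [0, 1], each assumed
                  to be conditioned on the reasoning prefix.
    schema_distance: hops in the KG schema from the current step's
                     entity type to the target type implied by the query.
    """
    # Cumulative reward: multiplying means a single low step reward
    # cannot be offset by later high rewards ("risk compensation").
    cumulative = 1.0
    for r in step_rewards:
        cumulative *= r
    # Future reward: paths closer to the target schema type score higher.
    future = gamma ** schema_distance
    return cumulative * future
```

Under this multiplicative rule, a path with one risky step (e.g., rewards `[0.9, 0.1, 0.9]`) scores far below a clean path (`[0.9, 0.9, 0.9]`), whereas an averaging reward model would rate them much closer.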
Why it matters
Current process reward models struggle to evaluate intermediate reasoning steps, so flawed reasoning paths can still receive high rewards, a problem that is especially severe in complex KG reasoning. SCPRM offers a more robust evaluation method that guides LLMs toward accurate multi-hop reasoning paths, which is crucial in risk-sensitive domains such as medicine and law.
Original Abstract
Large language models excel at complex reasoning, yet evaluating their intermediate steps remains challenging. Although process reward models provide step-wise supervision, they often suffer from a risk compensation effect, where incorrect steps are offset by later correct ones, assigning high rewards to flawed reasoning paths. This issue is further exacerbated in knowledge graph (KG) reasoning, as there may exist multiple paths between the start and end entities in the KGs, and a risky step can make the reasoning path flawed. Those limitations are problematic in risk-sensitive tasks such as medical and legal KG reasoning. To address the issues, we propose a Schema-aware Cumulative Process Reward Model (SCPRM) that evaluates reasoning paths by conditioning on the reasoning prefix, and incorporating schema distance between current reasoning step and the implicit target parsed from the query, which provides cumulative and future rewards to guide the path explorations. We further integrate SCPRM into Monte Carlo Tree Search (MCTS) as SCPRM-MCTS to conduct multi-hop reasoning on KGs for question answering (QA) tasks. Across medical and legal KGQA and CWQ, SCPRM-MCTS improves the performance of Hits@k by an average of 1.18% over strong baselines, demonstrating more accurate and risk-sensitive reasoning evaluation.
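To make the MCTS integration concrete, here is a minimal sketch of how a path reward model can drive tree search: the reward model's path score is backpropagated as the node value, and standard UCT trades it off against exploration. The `Node`, `ucb`, and `backpropagate` names are hypothetical stand-ins, not the paper's code.

```python
import math

# Illustrative sketch: plugging a path-level reward (e.g., an SCPRM
# score) into MCTS. Class and function names are assumptions.

class Node:
    def __init__(self, path, parent=None):
        self.path = path          # reasoning steps taken so far
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0          # accumulated path scores

def ucb(node, c=1.4):
    # Standard UCT: exploit the reward model's value estimate,
    # explore under-visited branches.
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(
        math.log(node.parent.visits) / node.visits)

def backpropagate(node, reward):
    # Credit the path score from the reward model up to the root,
    # so risky branches are visited less in later simulations.
    while node is not None:
        node.visits += 1
        node.value += reward
        node = node.parent
```

In each simulation, the search would select children by `ucb`, expand one reasoning hop, score the resulting path with the reward model, and call `backpropagate` with that score.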