ComplianceNLP: Knowledge-Graph-Augmented RAG for Multi-Framework Regulatory Gap Detection
Dongxin Guo, Jikun Wu, Siu Ming Yiu
TLDR
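ComplianceNLP is a knowledge-graph-augmented RAG system for automated regulatory gap detection. It reaches 87.7 F1 on the authors' benchmark, 3.5 points above GPT-4o+RAG, and delivered a 3.1× sustained analyst efficiency gain in a four-month parallel-run deployment.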
Key contributions
- Integrates a KG-augmented RAG pipeline grounded in a regulatory knowledge graph (SEC, MiFID II, Basel III).
- Performs multi-task obligation extraction using NER, deontic classification, and cross-reference resolution.
- Conducts compliance gap analysis by mapping obligations to internal policies with severity-aware scoring.
- Achieves 87.7 F1 on gap detection on the authors' benchmark, 3.5 F1 above GPT-4o+RAG.
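To make the knowledge-graph re-ranking idea concrete, here is a minimal sketch in Python. It is not the authors' implementation: the scoring formula, the `graph_weight` parameter, the `Candidate` type, and the provision identifiers are all hypothetical, chosen only to show how cross-reference links in a regulatory graph could promote a provision that a text-only retriever ranks lower.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    provision_id: str        # hypothetical identifier of a retrieved provision
    retrieval_score: float   # assumed text-similarity score in [0, 1]


def kg_rerank(candidates, cross_refs, query_provisions, graph_weight=0.3):
    """Re-rank candidates by blending retrieval score with a graph bonus.

    cross_refs maps a provision id to the set of provision ids it cites.
    query_provisions is the set of provisions already tied to the query.
    The linear blend is an illustrative assumption, not the paper's formula.
    """
    def blended(candidate):
        cited = cross_refs.get(candidate.provision_id, set())
        if query_provisions:
            # Fraction of query-linked provisions this candidate cites.
            bonus = len(cited & query_provisions) / len(query_provisions)
        else:
            bonus = 0.0
        return (1 - graph_weight) * candidate.retrieval_score + graph_weight * bonus

    return sorted(candidates, key=blended, reverse=True)


# Toy data: prov-B scores lower on text similarity but cites prov-Q,
# which is already linked to the query.
candidates = [Candidate("prov-A", 0.80), Candidate("prov-B", 0.78)]
cross_refs = {"prov-B": {"prov-Q"}}
query_provisions = {"prov-Q"}

ranked = kg_rerank(candidates, cross_refs, query_provisions)
print([c.provision_id for c in ranked])  # ['prov-B', 'prov-A']
```

With `graph_weight=0.0` the ordering falls back to pure retrieval score, which is one simple way to ablate the graph signal in a sketch like this.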
Why it matters
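Financial institutions must track over 60,000 regulatory events a year, more than manual compliance teams can handle, and the industry has paid over USD 300 billion in fines and settlements since the 2008 financial crisis. ComplianceNLP automates the monitoring of regulatory changes, the extraction of obligations, and the detection of compliance gaps against internal policies. In a four-month parallel-run deployment it reached 96.0% estimated recall and 90.7% precision, with a 3.1× sustained analyst efficiency gain.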
Original Abstract
Financial institutions must track over 60,000 regulatory events annually, overwhelming manual compliance teams; the industry has paid over USD 300 billion in fines and settlements since the 2008 financial crisis. We present ComplianceNLP, an end-to-end system that automatically monitors regulatory changes, extracts structured obligations, and identifies compliance gaps against institutional policies. The system integrates three components: (1) a knowledge-graph-augmented RAG pipeline grounding generations in a regulatory knowledge graph of 12,847 provisions across SEC, MiFID II, and Basel III; (2) multi-task obligation extraction combining NER, deontic classification, and cross-reference resolution over a shared LEGAL-BERT encoder; and (3) compliance gap analysis that maps obligations to internal policies with severity-aware scoring. On our benchmark, ComplianceNLP achieves 87.7 F1 on gap detection, outperforming GPT-4o+RAG by +3.5 F1, with 94.2% grounding accuracy ($r=0.83$ vs. human judgments) and 83.4 F1 under realistic end-to-end error propagation. Ablations show that knowledge-graph re-ranking contributes the largest marginal gain (+4.6 F1), confirming that structural regulatory knowledge is critical for cross-reference-heavy tasks. Domain-specific knowledge distillation (70B $\to$ 8B) combined with Medusa speculative decoding yields $2.8\times$ inference speedup; regulatory text's low entropy ($H=2.31$ bits vs. $3.87$ general text) produces 91.3% draft-token acceptance rates. In four months of parallel-run deployment processing 9,847 updates at a financial institution, the system achieved 96.0% estimated recall and 90.7% precision, with a $3.1\times$ sustained analyst efficiency gain. We report deployment lessons on trust calibration, GRC integration, and distributional shift monitoring for regulated-domain NLP.
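Two of the abstract's metrics can be unpacked with a few lines of Python. The sketch below computes the F1 implied by the deployment precision and recall (a derived figure, not one the abstract reports, and based on an estimated recall), and shows what entropy in bits measures using toy distributions. The function names and the toy probabilities are assumptions for illustration; how the authors measured $H$ is not stated here.

```python
import math


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def shannon_entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2(p)) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# F1 implied by the deployment figures (90.7% precision, 96.0% estimated recall).
deployment_f1 = f1_score(0.907, 0.960)
print(round(deployment_f1 * 100, 1))  # 93.3

# Toy next-token distributions over four tokens (illustrative values only).
uniform = [0.25, 0.25, 0.25, 0.25]   # every token equally likely
peaked = [0.70, 0.10, 0.10, 0.10]    # one token dominates

print(shannon_entropy_bits(uniform))  # 2.0
print(shannon_entropy_bits(peaked) < shannon_entropy_bits(uniform))  # True
```

The peaked distribution has lower entropy because the next token is easier to predict, which is the intuition behind the abstract's link between low-entropy regulatory text and high draft-token acceptance in speculative decoding.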