
Beyond RAG for Cyber Threat Intelligence: A Systematic Evaluation of Graph-Based and Agentic Retrieval

2604.11419

Dzenan Hamzic, Florian Skopik, Max Landauer, Markus Wurzenberger, Andreas Rauber

cs.AI, cs.CR

TLDR

Evaluates four RAG architectures for Cyber Threat Intelligence, including graph-based and agentic variants, and shows that hybrid graph-text RAG improves answer quality on multi-hop questions by up to 35% over vector RAG.

Key contributions

  • Systematically evaluates four RAG architectures for Cyber Threat Intelligence analysis.
  • Finds graph grounding improves performance on structured factual CTI queries.
  • Hybrid graph-text RAG improves answer quality on multi-hop questions by up to 35% over vector RAG.
  • Hybrid approach offers more reliable performance than graph-only systems for CTI analysis.

Why it matters

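CTI analysis depends on reasoning over relationships between entities such as threat actors, malware, and vulnerabilities, which standard vector retrieval handles poorly because the evidence is spread across multiple text fragments and documents. The paper shows that combining graph queries with text retrieval improves answer quality on multi-hop questions by up to 35% over vector RAG while remaining more reliable than graph-only systems, and it clarifies when graph grounding helps in CTI systems.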

Original Abstract

Cyber threat intelligence (CTI) analysts must answer complex questions over large collections of narrative security reports. Retrieval-augmented generation (RAG) systems help language models access external knowledge, but traditional vector retrieval often struggles with queries that require reasoning over relationships between entities such as threat actors, malware, and vulnerabilities. This limitation arises because relevant evidence is often distributed across multiple text fragments and documents. Knowledge graphs address this challenge by enabling structured multi-hop reasoning through explicit representations of entities and relationships. However, multiple retrieval paradigms, including graph-based, agentic, and hybrid approaches, have emerged with different assumptions and failure modes. It remains unclear how these approaches compare in realistic CTI settings and when graph grounding improves performance. We present a systematic evaluation of four RAG architectures for CTI analysis: standard vector retrieval, graph-based retrieval over a CTI knowledge graph, an agentic variant that repairs failed graph queries, and a hybrid approach combining graph queries with text retrieval. We evaluate these systems on 3,300 CTI question-answer pairs spanning factual lookups, multi-hop relational queries, analyst-style synthesis questions, and unanswerable cases. Results show that graph grounding improves performance on structured factual queries. The hybrid graph-text approach improves answer quality by up to 35 percent on multi-hop questions compared to vector RAG, while maintaining more reliable performance than graph-only systems.
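
The hybrid idea described in the abstract (graph queries combined with text retrieval) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the entity names, triples, and report snippets are invented, the function names are hypothetical, and simple word overlap stands in for vector similarity so the example runs without external libraries.

```python
from collections import defaultdict

# Toy CTI knowledge graph as (subject, relation, object) triples.
# All entities here are invented placeholders, not data from the paper.
TRIPLES = [
    ("ActorA", "uses", "MalwareX"),
    ("MalwareX", "exploits", "CVE-0000-0001"),
    ("ActorB", "uses", "MalwareY"),
]

# Toy report snippets standing in for a corpus of narrative reports.
CHUNKS = [
    "ActorA was observed deploying MalwareX against energy providers.",
    "MalwareX exploits CVE-0000-0001 to gain initial access.",
    "ActorB phishing campaigns deliver MalwareY.",
]


def build_graph(triples):
    """Index triples by (subject, relation) for fast hop lookups."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[(subject, relation)].append(obj)
    return graph


def graph_multi_hop(graph, start, relations):
    """Follow a chain of relations from `start`; return the entities reached."""
    frontier = [start]
    for relation in relations:
        frontier = [o for e in frontier for o in graph.get((e, relation), [])]
    return frontier


def text_retrieve(chunks, query, k=2):
    """Rank chunks by word overlap with the query (assumed stand-in for vector search)."""
    query_words = set(query.lower().split())
    scored = []
    for chunk in chunks:
        chunk_words = set(chunk.lower().replace(".", "").split())
        overlap = len(query_words & chunk_words)
        if overlap > 0:
            scored.append((overlap, chunk))
    scored.sort(key=lambda pair: -pair[0])  # stable sort keeps corpus order on ties
    return [chunk for _, chunk in scored[:k]]


def hybrid_retrieve(graph, chunks, query, start, relations):
    """Return structured graph facts together with supporting text passages."""
    return {
        "graph_facts": graph_multi_hop(graph, start, relations),
        "passages": text_retrieve(chunks, query),
    }


graph = build_graph(TRIPLES)
result = hybrid_retrieve(
    graph,
    CHUNKS,
    query="Which vulnerability does MalwareX exploit",
    start="ActorA",
    relations=["uses", "exploits"],
)
print(result)
```

In this sketch the graph supplies the two-hop answer (actor to malware to vulnerability), while the text passages provide context that a language model could use when the graph query returns nothing. A real system would presumably replace the overlap scorer with embedding search and the dictionary with a graph database.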
