ArXiv TLDR

Leveraging LLM-GNN Integration for Open-World Question Answering over Knowledge Graphs

2604.13979

Hussein Abdallah, Ibrahim Abdelaziz, Panos Kalnis, Essam Mansour

cs.CL cs.AI cs.DB

TLDR

GLOW is a hybrid LLM-GNN system for open-world question answering over incomplete knowledge graphs, combining symbolic and semantic reasoning.

Key contributions

  • Introduces GLOW, a hybrid LLM-GNN system for Open-World Question Answering (OW-QA) on incomplete KGs.
  • Combines GNN-predicted candidates with KG facts in structured prompts to guide LLM reasoning.
  • Enables joint symbolic and semantic reasoning without relying on retrieval or fine-tuning.
  • Presents GLOW-BENCH, a new 1,000-question benchmark for OW-QA over incomplete KGs.
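The prompt-construction step above can be sketched as follows. This is an illustrative sketch only, not the authors' actual code: the function name, prompt wording, and example triples are hypothetical, assuming the GNN supplies scored candidate answers and the prompt serializes KG facts as triples alongside them, as the paper describes.

```python
def build_prompt(question, kg_triples, gnn_candidates):
    """Serialize KG facts and GNN-predicted candidates into one structured
    prompt that asks the LLM to reason jointly over both signals.
    (Hypothetical sketch; not the authors' implementation.)"""
    # KG facts as (subject, relation, object) triples
    facts = "\n".join(f"({h}, {r}, {t})" for h, r, t in kg_triples)
    # Top-k candidates from the GNN, with their prediction scores
    cands = "\n".join(
        f"{i + 1}. {c} (score={s:.2f})"
        for i, (c, s) in enumerate(gnn_candidates)
    )
    return (
        "Known facts (subject, relation, object):\n" + facts + "\n\n"
        "Candidate answers ranked by a link-prediction GNN:\n" + cands + "\n\n"
        "Question: " + question + "\n"
        "Using the facts and the candidates, infer the most likely answer, "
        "even if it is not explicitly stated in the facts."
    )

# Toy example with made-up triples and scores
prompt = build_prompt(
    "Which country is Montreal located in?",
    [("Montreal", "locatedIn", "Quebec"), ("Quebec", "partOf", "Canada")],
    [("Canada", 0.91), ("France", 0.04)],
)
print(prompt)
```

The point of the structured format is that the LLM sees both symbolic evidence (the triples) and the GNN's structural predictions in one context, so it can accept, reject, or re-rank candidates using semantic knowledge, without retrieval or fine-tuning.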

Why it matters

Real-world KGs are often incomplete or evolving, so closed-world KGQA methods that assume every answer already exists in the graph fall short. Existing systems also become unreliable when links are missing or multi-hop reasoning is required. GLOW addresses this by pairing a GNN's structural predictions with an LLM's semantic reasoning, advancing reliable and generalizable KGQA.

Original Abstract

Open-world Question Answering (OW-QA) over knowledge graphs (KGs) aims to answer questions over incomplete or evolving KGs. Traditional KGQA assumes a closed world where answers must exist in the KG, limiting real-world applicability. In contrast, open-world QA requires inferring missing knowledge based on graph structure and context. Large language models (LLMs) excel at language understanding but lack structured reasoning. Graph neural networks (GNNs) model graph topology but struggle with semantic interpretation. Existing systems integrate LLMs with GNNs or graph retrievers. Some support open-world QA but rely on structural embeddings without semantic grounding. Most assume observed paths or complete graphs, making them unreliable under missing links or multi-hop reasoning. We present GLOW, a hybrid system that combines a pre-trained GNN and an LLM for open-world KGQA. The GNN predicts top-k candidate answers from the graph structure. These, along with relevant KG facts, are serialized into a structured prompt (e.g., triples and candidates) to guide the LLM's reasoning. This enables joint reasoning over symbolic and semantic signals, without relying on retrieval or fine-tuning. To evaluate generalization, we introduce GLOW-BENCH, a 1,000-question benchmark over incomplete KGs across diverse domains. GLOW outperforms existing LLM-GNN systems on standard benchmarks and GLOW-BENCH, with improvements of up to 53.3% and 38% on average. GitHub code and data are available.

📬 Weekly AI Paper Digest

Get the top 10 AI/ML arXiv papers from the week — summarized, scored, and delivered to your inbox every Monday.