ArXiv TLDR

Robust Multimodal Recommendation via Graph Retrieval-Enhanced Modality Completion

arXiv:2605.00670

Yuan Li, Jun Hu, Jiaxin Jiang, Bryan Hooi, Bingsheng He

cs.IR, cs.SI

TLDR

GRE-MC makes multimodal recommendation robust to missing modalities: it retrieves semantically relevant subgraphs from the whole graph and uses a graph transformer to reconstruct the missing features from that context.

Key contributions

  • Proposes GRE-MC, a novel framework for robust multimodal recommendation with incomplete data.
  • Employs modality-aware subgraph retrieval to gather rich, semantically relevant context.
  • Uses a graph transformer with global attention for joint encoding and feature completion.
  • Improves robustness with a learnable sparse-routing codebook that regularizes latent embeddings into compact bases.
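
The retrieve-then-complete idea in the bullets above can be illustrated with a minimal sketch. It is not the paper's implementation: the node layout, the function names `retrieve_context` and `complete_image`, the cosine-similarity retrieval rule, and the `temperature` parameter are all assumptions made for illustration, and a softmax-weighted average stands in for the graph transformer.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve_context(query_text, nodes, k=2):
    """Hypothetical retrieval rule: among nodes that still have an image
    feature, return the k whose text feature is closest to the query's."""
    candidates = [n for n in nodes if n["image"] is not None]
    candidates.sort(key=lambda n: cosine(query_text, n["text"]), reverse=True)
    return candidates[:k]

def complete_image(query_text, context, temperature=0.1):
    """Softmax-weighted average of the retrieved image features.
    A simple stand-in for attention, not the paper's graph transformer."""
    scores = [cosine(query_text, n["text"]) / temperature for n in context]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(context[0]["image"])
    return [
        sum(w * n["image"][d] for w, n in zip(weights, context)) / total
        for d in range(dim)
    ]

# Toy graph nodes: item "d" is missing its image modality.
nodes = [
    {"id": "a", "text": [1.0, 0.0], "image": [1.0, 0.0, 0.0]},
    {"id": "b", "text": [0.9, 0.1], "image": [0.8, 0.2, 0.0]},
    {"id": "c", "text": [0.0, 1.0], "image": [0.0, 0.0, 1.0]},
    {"id": "d", "text": [1.0, 0.1], "image": None},
]
query = nodes[3]
context = retrieve_context(query["text"], nodes, k=2)
completed = complete_image(query["text"], context)
print([n["id"] for n in context], completed)
```

In this toy setup the retrieved nodes need not be graph neighbors of the query, which is the point the bullets make about gathering context from beyond the local neighborhood.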

Why it matters

Real-world multimodal recommendation datasets are often missing modalities because of sensor failures, scarce annotations, or privacy constraints, and this degrades model performance. GRE-MC reconstructs the missing features using graph retrieval and a graph transformer, rather than relying only on a node or its immediate neighbors. The authors report that it consistently outperforms state-of-the-art methods on multimodal recommendation benchmarks.

Original Abstract

Multimodal data plays a critical role in web-based recommendation systems, where information from diverse modalities such as vision and text enhances representation learning. However, real-world multimodal datasets often suffer from modality incompleteness due to sensor failures, annotation scarcity, or privacy constraints, which substantially degrade model performance and reliability. One effective solution to address this issue is modality completion, which reconstructs missing features to provide modality-complete graphs for downstream tasks. Given a query node with missing multimodal features, existing modality completion methods typically infer information from the node itself or its neighbors to reconstruct the missing modality. However, these methods may overlook semantically relevant context in the graph, which contains valuable cues that are non-trivial to capture through simple methods like neighborhood aggregation. In this work, we propose GRE-MC, a Graph Retrieval-Enhanced Modality Completion framework, to overcome these limitations. By introducing a modality-aware subgraph retrieval mechanism, GRE-MC selects semantically relevant subgraphs from the entire graph, providing richer contextual information for completing missing modalities. Subsequently, a graph transformer jointly encodes the query node and the retrieved subgraph via global attention to complete the missing features, while a learnable sparse-routing codebook regularizes latent embeddings into compact bases for improved robustness. Extensive experiments on multimodal recommendation benchmarks demonstrate that GRE-MC consistently outperforms state-of-the-art methods, validating the effectiveness of subgraph retrieval and joint-encoding graph transformer for robust modality completion.
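
As an illustration separate from the abstract text, the sparse-routing codebook can be pictured as follows. This is a rough sketch under stated assumptions, not the authors' formulation: the function name `sparse_route`, the dot-product scoring, the top-k softmax routing, and the fixed toy codebook are all hypothetical (in the paper the codebook is learnable).

```python
import math

def sparse_route(z, codebook, top_k=2):
    """Express embedding z using only its top_k best-matching codebook bases.
    Returns (chosen indices, routing weights, reconstructed embedding)."""
    # Score every basis by its dot product with z (an assumed scoring rule).
    logits = [sum(a * b for a, b in zip(z, basis)) for basis in codebook]
    chosen = sorted(range(len(codebook)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Softmax over the selected bases only; all other bases get zero weight.
    peak = max(logits[i] for i in chosen)
    exps = [math.exp(logits[i] - peak) for i in chosen]
    total = sum(exps)
    weights = [e / total for e in exps]
    recon = [
        sum(w * codebook[i][d] for w, i in zip(weights, chosen))
        for d in range(len(z))
    ]
    return chosen, weights, recon

# Fixed toy codebook; a real model would train these bases.
codebook = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [-1.0, 0.0, 0.0],
]
z = [2.0, 1.0, -0.5]
indices, weights, z_hat = sparse_route(z, codebook, top_k=2)
print(indices, weights, z_hat)
```

Restricting each embedding to a few shared bases is one way to read "compact bases": noisy or partially reconstructed embeddings are pulled toward a small set of common directions.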
