ArXiv TLDR

Unlocking the Power of Large Language Models for Multi-table Entity Matching

arXiv: 2604.21238

Yingkai Tang, Taoyu Su, Wenyuan Zhang, Xiaoyang Guo, Tingwen Liu

cs.CL, cs.IR

TLDR

LLM4MEM uses LLMs to improve multi-table entity matching by resolving semantic inconsistencies in attributes, keeping matching efficient as entity counts grow across sources, and pruning noisy entities.

Key contributions

  • Proposes LLM4MEM, a novel LLM-based framework for multi-table entity matching.
  • Introduces a multi-style prompt-enhanced LLM module to address semantic inconsistencies in attributes (see the hypothetical sketch after this list).
  • Develops a transitive consensus embedding module for efficient matching across numerous entities.
  • Implements a density-aware pruning module to optimize matching quality by removing noisy entities.
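
To make the first contribution concrete, here is a minimal, hypothetical sketch of what multi-style prompt-based attribute coordination could look like. The paper's actual prompts are not reproduced here: `call_llm`, the prompt styles, and the majority vote below are all illustrative assumptions, not LLM4MEM's design.

```python
# Hypothetical sketch of multi-style prompt-based attribute coordination.
# `call_llm` is a stand-in for any chat-completion client, and the prompt
# styles and majority vote are illustrative, not the paper's exact design.

PROMPT_STYLES = [
    "Rewrite the value '{value}' of attribute '{field}' in a canonical form. "
    "Reply with the value only.",
    "Normalize units and spelling in '{field}: {value}'. Reply with the value only.",
    "Given '{field} = {value}', output an equivalent standardized value.",
]


def call_llm(prompt: str) -> str:
    """Stub: replace with a real chat-completion call."""
    raise NotImplementedError


def coordinate_attribute(field: str, value: str) -> str:
    """Ask the LLM in several prompt styles and keep the majority answer,
    damping quirks that any single phrasing might introduce."""
    answers = [call_llm(style.format(field=field, value=value)).strip()
               for style in PROMPT_STYLES]
    return max(set(answers), key=answers.count)


# e.g. coordinate_attribute("cpu_speed", "1,200 MHz") might normalize
# numerical variants like "1.2 GHz" and "1200MHz" to the same string.
```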

Why it matters

Multi-table entity matching is crucial for integrating diverse data sources, but existing methods struggle with semantic inconsistencies and matching efficiency. This paper leverages LLMs to tackle both core challenges, and the reported average 5.1% F1 improvement over the baseline marks a practical advance in data integration.

Original Abstract

Multi-table entity matching (MEM) addresses the limitations of dual-table approaches by enabling simultaneous identification of equivalent entities across multiple data sources without unique identifiers. However, existing methods relying on pre-trained language models struggle to handle semantic inconsistencies caused by numerical attribute variations. Inspired by the powerful language understanding capabilities of large language models (LLMs), we propose a novel LLM-based framework for multi-table entity matching, termed LLM4MEM. Specifically, we first propose a multi-style prompt-enhanced LLM attribute coordination module to address semantic inconsistencies. Then, to alleviate the matching efficiency problem caused by the surge in the number of entities brought by multiple data sources, we develop a transitive consensus embedding matching module to tackle entity embedding and pre-matching issues. Finally, to address the issue of noisy entities during the matching process, we introduce a density-aware pruning module to optimize the quality of multi-table entity matching. We conducted extensive experiments on 6 MEM datasets, and the results show that our model improves by an average of 5.1% in F1 compared with the baseline model. Our code is available at https://github.com/Ymeki/LLM4MEM.
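
Roughly mirroring the second and third modules the abstract describes, the sketch below pairs embedding-based pre-matching with transitive grouping (via union-find) and a density-aware pruning pass. Everything here, including the `embed` stub, both thresholds, and the grouping scheme, is an assumption for illustration; the authors' actual implementation is in the linked repository.

```python
# Minimal sketch (not the paper's implementation) of embedding-based
# pre-matching with transitive grouping, followed by density-aware pruning.
# `embed`, both thresholds, and the O(n^2) similarity matrix are illustrative
# simplifications; a real system would use an ANN index for efficiency.
import numpy as np


def embed(texts: list[str]) -> np.ndarray:
    """Stub: replace with any sentence-embedding model."""
    raise NotImplementedError


def prematch(entities: list[str], sim_threshold: float = 0.85,
             min_density: float = 0.5) -> list[list[int]]:
    vecs = embed(entities)
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
    sims = vecs @ vecs.T  # cosine similarities

    # Union-find: merging similar pairs makes matches propagate transitively
    # (if A~B and B~C, then A, B, and C end up in one group).
    parent = list(range(len(entities)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    n = len(entities)
    for i in range(n):
        for j in range(i + 1, n):
            if sims[i, j] >= sim_threshold:
                parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    # Density-aware pruning: an entity whose mean similarity to the rest of
    # its group is low is treated as noise and dropped.
    pruned = []
    for members in groups.values():
        if len(members) == 1:
            pruned.append(members)
            continue
        keep = [i for i in members
                if np.mean([sims[i, j] for j in members if j != i]) >= min_density]
        if keep:
            pruned.append(keep)
    return pruned
```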
