SC-Taxo: Hierarchical Taxonomy Generation under Semantic Consistency Constraints using Large Language Models
Shiqiang Cai, Nianhong Niu, Shizhu He, Kang Liu, Jun Zhao
TLDR
SC-Taxo uses LLMs with hierarchy-aware refinement stages to generate semantically consistent scientific taxonomies, making research fields easier to organize and navigate.
Key contributions
- Generates scientific taxonomies using LLMs with hierarchy-aware refinement.
- Employs a bidirectional heading generation mechanism that combines bottom-up abstraction with top-down semantic constraints.
- Captures peer-level semantic dependencies to ensure horizontal consistency.
- Reports consistent improvements in hierarchy alignment and heading quality on multiple benchmark datasets, with cross-lingual generalization validated on Chinese scientific literature.
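To make the bidirectional idea concrete, here is a minimal sketch of a bottom-up pass followed by a top-down pass with a peer-level check. It is an illustration only, not the authors' implementation: the `Node` class, the `bottom_up` and `top_down` functions, and the `toy_summarize` and `toy_refine` helpers are all hypothetical names, and the two toy helpers are simple word-overlap stand-ins for what would be LLM calls in the actual framework.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    papers: List[str] = field(default_factory=list)       # leaf evidence, e.g. paper titles
    children: List["Node"] = field(default_factory=list)
    heading: str = ""


def bottom_up(node: Node, summarize: Callable[[List[str]], str]) -> None:
    """Abstract each heading from the node's children (or its papers at a leaf)."""
    for child in node.children:
        bottom_up(child, summarize)
    evidence = [c.heading for c in node.children] or node.papers
    node.heading = summarize(evidence)


def top_down(node: Node, refine: Callable[[str, str], str]) -> None:
    """Revise child headings under the parent's heading, then separate peers."""
    seen: Dict[str, int] = {}
    for child in node.children:
        child.heading = refine(child.heading, node.heading)
        # Peer-level check (assumed rule): siblings must not share a heading.
        seen[child.heading] = seen.get(child.heading, 0) + 1
        if seen[child.heading] > 1:
            child.heading = f"{child.heading} ({seen[child.heading]})"
        top_down(child, refine)


def toy_summarize(evidence: List[str]) -> str:
    """Toy stand-in for an LLM: keep the words shared by every evidence string."""
    if not evidence:
        return "misc"
    shared = [w for w in evidence[0].split()
              if all(w in e.split() for e in evidence)]
    return " ".join(shared) or "misc"


def toy_refine(heading: str, parent: str) -> str:
    """Toy stand-in for an LLM: drop words the parent heading already covers."""
    kept = [w for w in heading.split() if w not in parent.split()]
    return " ".join(kept) or heading


# Hypothetical two-leaf taxonomy built from made-up paper titles.
root = Node(children=[
    Node(papers=["graph representation learning", "graph contrastive learning"]),
    Node(papers=["reinforcement learning", "offline reinforcement learning"]),
])
bottom_up(root, toy_summarize)   # leaves first, then the root
top_down(root, toy_refine)       # root constrains its children
print(root.heading, [c.heading for c in root.children])
```

Passing the heading functions in as arguments keeps the traversal logic separate from the model: swapping the toy helpers for real LLM prompts would not change the two passes.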
Why it matters
Existing taxonomy generation methods often suffer from structural inconsistencies and semantic misalignment. SC-Taxo addresses this by ensuring hierarchical semantic consistency, making scientific knowledge more organized and accessible. This improves literature exploration and enables downstream applications like trend analysis.
Original Abstract
Scientific literature is expanding at an unprecedented pace, making it increasingly challenging to efficiently organize and access domain knowledge. A high-quality scientific taxonomy offers a structured and hierarchical representation of a research field, facilitating literature exploration and topic navigation, as well as enabling downstream applications such as trend analysis, idea generation, and information retrieval. However, existing taxonomy generation approaches often suffer from structural inconsistencies and semantic misalignment across hierarchical levels. Through empirical analysis, we find that these issues largely stem from inadequate modeling of hierarchical semantic consistency. To address this limitation, we propose a semantic-consistent taxonomy generation (SC-Taxo) framework that leverages large language models (LLMs) with hierarchy-aware refinement stages to ensure semantic consistency. Specifically, SC-Taxo introduces a bidirectional heading generation mechanism that jointly performs bottom-up abstraction and top-down semantic constraint, while further capturing peer-level semantic dependencies to enhance horizontal consistency. Experiments on multiple benchmark datasets demonstrate consistent improvements in hierarchy alignment and heading quality, and additional evaluation on Chinese scientific literature validates its robust cross-lingual generalization.