ArXiv TLDR

BioHiCL: Hierarchical Multi-Label Contrastive Learning for Biomedical Retrieval with MeSH Labels

2604.15591

Mengfei Lan, Lecheng Zheng, Halil Kilicoglu

cs.IR, cs.AI

TLDR

BioHiCL uses hierarchical multi-label contrastive learning with MeSH annotations to improve biomedical information retrieval. It also reports promising results on sentence similarity and question answering.

Key contributions

  • Introduces BioHiCL, a novel hierarchical multi-label contrastive learning framework.
  • Leverages hierarchical MeSH annotations to provide structured supervision for biomedical retrieval.
  • Achieves promising performance on biomedical retrieval, sentence similarity, and question answering tasks.
  • Offers compact, computationally efficient models suitable for deployment: BioHiCL-Base (0.1B parameters) and BioHiCL-Large (0.3B parameters).
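One simple way to turn hierarchical MeSH annotations into a graded relevance signal is to expand each document's MeSH tree numbers so they include every ancestor node, then measure how much two expanded label sets overlap. Documents indexed under sibling descriptors then share their common ancestors and receive partial credit, where a binary signal would give none. The sketch below assumes this approach. The function names, the Jaccard measure, and the tree numbers in the example are illustrative assumptions and do not come from the paper.

```python
from typing import Iterable, Set


def expand_tree_numbers(tree_numbers: Iterable[str]) -> Set[str]:
    """Return each tree number together with all of its ancestors.

    MeSH tree numbers are dot-separated hierarchical paths, so every
    dot-delimited prefix of a tree number names an ancestor node.
    """
    expanded: Set[str] = set()
    for tn in tree_numbers:
        parts = tn.split(".")
        for depth in range(1, len(parts) + 1):
            expanded.add(".".join(parts[:depth]))
    return expanded


def hierarchical_overlap(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard overlap (0.0 to 1.0) of two ancestor-expanded label sets.

    Hypothetical helper: the choice of Jaccard is an assumption for
    illustration, not the measure used by BioHiCL.
    """
    ea, eb = expand_tree_numbers(a), expand_tree_numbers(b)
    union = ea | eb
    if not union:
        return 0.0
    return len(ea & eb) / len(union)


if __name__ == "__main__":
    # Illustrative tree numbers: two siblings under the same parent,
    # and one node in an unrelated branch.
    print(hierarchical_overlap(["C04.557.470"], ["C04.557.337"]))  # 0.5
    print(hierarchical_overlap(["C04.557.470"], ["D27.505"]))      # 0.0
```

The two sibling nodes share two ancestors (`C04` and `C04.557`) out of four distinct nodes, which gives 0.5. The unrelated pair shares nothing, so its score is 0.0.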

Why it matters

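This paper addresses a limitation of existing biomedical retrievers. They are trained on coarse binary relevance signals, so they cannot capture graded semantic overlap between texts. BioHiCL uses the MeSH hierarchy as structured supervision, so its representations can reflect how closely related two biomedical texts are. The models are also small (0.1B and 0.3B parameters), which makes them practical to deploy. Better access to biomedical information benefits both research and clinical applications.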

Original Abstract

Effective biomedical information retrieval requires modeling domain semantics and hierarchical relationships among biomedical texts. Existing biomedical generative retrievers build on coarse binary relevance signals, limiting their ability to capture semantic overlap. We propose BioHiCL (Biomedical Retrieval with Hierarchical Multi-Label Contrastive Learning), which leverages hierarchical MeSH annotations to provide structured supervision for multi-label contrastive learning. Our models, BioHiCL-Base (0.1B) and BioHiCL-Large (0.3B), achieve promising performance on biomedical retrieval, sentence similarity, and question answering tasks, while remaining computationally efficient for deployment.
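The abstract contrasts binary relevance signals with multi-label supervision. One way such supervision could enter a contrastive loss is sketched below. For each anchor in a batch, the target distribution over the other items is proportional to their label overlap with the anchor. The loss is the cross-entropy between that distribution and a temperature-scaled softmax over cosine similarities. This is a generic soft-target variant of InfoNCE, assumed here for illustration, and it is not the paper's published loss. The function names and the `temperature` default are hypothetical. Label sets are assumed to have already been expanded with their MeSH ancestors.

```python
import math
from typing import List, Sequence, Set


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def soft_multilabel_contrastive_loss(
    embeddings: List[Sequence[float]],
    label_sets: List[Set[str]],
    temperature: float = 0.1,
) -> float:
    """Mean over anchors of cross-entropy between label-overlap targets
    and a softmax over similarities to the other items in the batch."""
    n = len(embeddings)
    total, counted = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Target weights: Jaccard overlap of label sets (assumed choice).
        weights = []
        for j in others:
            union = label_sets[i] | label_sets[j]
            inter = label_sets[i] & label_sets[j]
            weights.append(len(inter) / len(union) if union else 0.0)
        z = sum(weights)
        if z == 0.0:
            continue  # no item in the batch shares a label with this anchor
        targets = [w / z for w in weights]
        logits = [cosine(embeddings[i], embeddings[j]) / temperature for j in others]
        # Numerically stable log-sum-exp for the softmax normalizer.
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
        total += -sum(t * (l - log_norm) for t, l in zip(targets, logits))
        counted += 1
    return total / counted if counted else 0.0
```

Anchors that share no label with any other item in the batch are skipped. Under this design, batches would need to be sampled so that most items overlap with at least one other item.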
