ArXiv TLDR

Enhancing Healthcare Search Intent Recognition with Query Representation Learning and Session Context

2605.10021

Harshita Jagdish Sahijwani, Madhav Sigdel, Song Aslan, Priya Gopi Achuthan, Monica D. Skidmore + 2 more

cs.IR

TLDR

Improves healthcare search intent recognition by learning query representations and leveraging session context for better accuracy.

Key contributions

  • Introduces clustering-based query representation learning to handle ambiguous health queries.
  • Proposes a novel loss function capturing multiple intents in healthcare search queries.
  • Defines concordance rate (CR) to measure intent ambiguity and session-global intent misalignment.
  • Incorporates learned query representations into session-based intent classification, boosting accuracy.
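The summary does not give the exact formula for the concordance rate. Here is a minimal sketch, assuming that CR for a query is the share of its session-level intent labels that agree with the query's most frequent (global) intent. The function name `concordance_rates` and the `(query, intent)` input format are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def concordance_rates(session_labels):
    """Hypothetical CR: share of sessions whose intent matches the query's majority intent.

    session_labels: iterable of (query, intent) pairs, one per session occurrence.
    Returns a dict mapping each query to a rate in [0, 1]; 1.0 means every
    session agrees with the global (majority) intent for that query.
    """
    by_query = defaultdict(list)
    for query, intent in session_labels:
        by_query[query].append(intent)

    rates = {}
    for query, intents in by_query.items():
        # Global intent = most common label across all sessions for this query.
        _, majority_count = Counter(intents).most_common(1)[0]
        rates[query] = majority_count / len(intents)
    return rates
```

Under this assumed definition, a query whose sessions split between two intents scores below 1.0. For instance, three sessions labeled "symptoms", "symptoms" and "find_doctor" give a CR of 2/3, which signals both ambiguity and a gap between the global intent and what individual sessions need.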

Why it matters

Accurately identifying healthcare search intent is vital for delivering relevant information, but it is challenged by ambiguous queries and limited labeled data. This paper offers scalable methods to better capture the diversity of query intents and session context, improving classification performance.

Original Abstract

Classifying the intent behind healthcare search queries is crucial for improving the delivery of online healthcare information. The intricate nature of medical search queries, coupled with the limited availability of high-quality labeled data, presents substantial challenges for developing efficient classification models. Previous studies have exploited user interaction data, such as user clicks from search logs, and employed pairwise loss functions to model co-click behavior for query representation learning. However, many health queries could have multiple intents, resulting in ambiguous or divergent click behavior. Furthermore, learning the single most popular intent of queries as inferred from global statistics based on the aggregate behavior of different users could potentially lead to disparity and performance drop when classifying the query intent within specific search sessions. To address these limitations, our work improves the query representation learning by aggregating similar queries via clustering, and introducing a novel loss function designed to capture the multifaceted nature of health search queries, resulting in a more scalable and accurate learning procedure. Furthermore, we quantify the ambiguity of health queries and the misalignment between global search intents and those discerned from individual sessions, by introducing the concordance rate (CR) score, and demonstrate a simple and effective method for incorporating our learned query representation into contextual, session-based search intent classification. Our extensive experimental results and analysis on two real-world search log datasets, i.e., a Health Search (HS) dataset and the publicly available TripClick dataset, demonstrate that our approach not only improves the intrinsic clustering metrics for query representation learning but also enhances accuracy for subsequent search intent classification tasks.
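The abstract describes pooling similar queries via clustering and a loss that captures several intents per query, but it does not state that loss. As an illustration only, here is a minimal sketch assuming (a) queries have already been assigned to clusters and (b) each query has click counts over a fixed set of intents. It pools clicks per cluster and scores predicted intent logits with a soft-label cross-entropy. This is a generic stand-in, not the paper's loss, and the names `aggregate_clicks` and `soft_label_cross_entropy` are hypothetical.

```python
import math


def aggregate_clicks(cluster_of, clicks):
    """Pool per-intent click counts over all queries assigned to the same cluster.

    cluster_of: dict query -> cluster id (assumed to come from some clustering step).
    clicks: dict query -> list of click counts, one entry per intent.
    """
    pooled = {}
    for query, counts in clicks.items():
        cid = cluster_of[query]
        if cid not in pooled:
            pooled[cid] = [0] * len(counts)
        # Element-wise sum so sparse queries borrow evidence from their cluster.
        pooled[cid] = [a + b for a, b in zip(pooled[cid], counts)]
    return pooled


def soft_label_cross_entropy(logits, click_counts):
    """Cross-entropy between softmax(logits) and the normalized click distribution.

    Unlike a single-label loss, every intent that received clicks contributes,
    so a query with split click behavior is not forced toward one intent.
    """
    total = sum(click_counts)
    if total == 0:
        raise ValueError("need at least one click to build a target")
    targets = [c / total for c in click_counts]
    # Numerically stable log-softmax via the log-sum-exp trick.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(t * (x - log_z) for t, x in zip(targets, logits) if t > 0)
```

With this kind of target, a cluster whose clicks split evenly between two intents is best served by a prediction that spreads probability across both; uniform logits over two intents give a loss of log 2, the entropy of that target and therefore its minimum.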
