
TCMIIES: A Browser-Based LLM-Powered Intelligent Information Extraction System for Academic Literature

2605.07507

Hanqing Zhao

cs.CL, cs.IR

TLDR

TCMIIES is a browser-based, zero-installation system leveraging commercial LLMs for privacy-preserving, schema-guided information extraction from academic literature.

Key contributions

  • Introduces TCMIIES, a browser-based, zero-installation platform for structured academic information extraction.
  • Features a novel schema-guided prompting framework with automatic prompt generation, requiring no programming expertise.
  • Ensures data privacy via a pure front-end architecture that processes all information locally in the browser.
  • Reports structured output compliance above 94% and extraction accuracy comparable to domain-expert annotation across several Traditional Chinese Medicine extraction scenarios.
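
As a rough illustration of the schema-guided prompting idea listed above, here is a minimal Python sketch that turns a user-defined field list into a system prompt and checks whether a model reply complies with the schema. This summary does not describe the actual prompt template or validation logic used by TCMIIES, so `Field`, `build_system_prompt`, `is_compliant`, and the example field names are all hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class Field:
    """One entry of a user-defined extraction schema (hypothetical structure)."""
    name: str
    description: str
    type: str = "string"  # one of "string", "number", "list"


def build_system_prompt(fields):
    """Generate a system prompt from the schema, one line per field."""
    lines = [
        "You extract structured information from academic abstracts.",
        "Return a single JSON object with exactly these keys:",
    ]
    for f in fields:
        lines.append(f'- "{f.name}" ({f.type}): {f.description}')
    lines.append("Use null for any field the text does not state. Output JSON only.")
    return "\n".join(lines)


# Mapping from schema type names to the Python types json.loads produces.
_PY_TYPES = {"string": str, "number": (int, float), "list": list}


def is_compliant(raw_reply, fields):
    """Return True if the reply is a JSON object with exactly the schema's
    keys and each value is null or of the declared type."""
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict) or set(data) != {f.name for f in fields}:
        return False
    return all(
        data[f.name] is None or isinstance(data[f.name], _PY_TYPES[f.type])
        for f in fields
    )


if __name__ == "__main__":
    # Illustrative schema; these field names are assumptions, not from the paper.
    schema = [
        Field("herbs", "Herbal ingredients named in the study", "list"),
        Field("sample_size", "Number of participants", "number"),
    ]
    print(build_system_prompt(schema))
    print(is_compliant('{"herbs": ["Huangqi"], "sample_size": 60}', schema))
```

A compliance rate like the one reported could then be estimated as the fraction of replies for which a check of this kind passes, although the paper's own definition of compliance may differ.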

Why it matters

This paper offers a practical, accessible, and privacy-preserving solution for academic information extraction using LLMs. It removes barriers for researchers by providing a browser-based tool that requires no programming or specialized infrastructure, bridging advanced LLM capabilities with domain-specific needs.

Original Abstract

The exponential growth of academic publications has created an urgent need for automated tools capable of extracting structured knowledge from unstructured scientific texts. While large language models (LLMs) have demonstrated remarkable capabilities in natural language understanding and information extraction, existing solutions often require specialized infrastructure, programming expertise, or fine-tuned domain-specific models that create barriers for researchers in specialized fields. This paper presents TCMIIES, a browser-based, zero-installation platform that leverages commercial LLM APIs to perform structured information extraction from academic literature. The system employs a novel schema-guided prompting framework with automatic system prompt generation, enabling researchers to define custom extraction schemas through an intuitive graphical interface without any programming. TCMIIES features a pure front-end architecture that ensures data privacy by processing all information locally in the browser, supports five major LLM providers, implements concurrent batch processing with automatic retry mechanisms, and provides intelligent field mapping for Chinese academic databases including CNKI and Wanfang. We demonstrate the system's effectiveness through comprehensive evaluation across multiple extraction scenarios in Traditional Chinese Medicine research, achieving structured output compliance rates exceeding 94% and information extraction accuracy comparable to domain-expert annotation. The system represents a practical, accessible solution that bridges the gap between advanced LLM capabilities and domain-specific academic information extraction needs, particularly for researchers in specialized fields who require flexible, privacy-preserving, and cost-effective extraction tools.
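
The abstract mentions concurrent batch processing with automatic retry. The sketch below shows one common way to combine the two in Python with `asyncio`: a semaphore caps the number of in-flight requests, and each request is retried with exponential backoff. It is an assumption-laden illustration, not the system's implementation (which runs in the browser); `call_with_retry`, `run_batch`, `flaky_extract`, and all parameter defaults are hypothetical, and the LLM call is replaced by a stand-in that fails once per input.

```python
import asyncio


async def call_with_retry(task, item, max_retries=3, base_delay=0.01):
    """Await task(item), retrying on any exception with exponential backoff.
    Re-raises the last exception once max_retries is exhausted."""
    for attempt in range(max_retries + 1):
        try:
            return await task(item)
        except Exception:
            if attempt == max_retries:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt)


async def run_batch(task, items, concurrency=4, max_retries=3):
    """Process items concurrently, at most `concurrency` at a time.
    Results keep input order; a permanently failing item yields its exception."""
    sem = asyncio.Semaphore(concurrency)

    async def guarded(item):
        async with sem:
            return await call_with_retry(task, item, max_retries)

    return await asyncio.gather(*(guarded(i) for i in items), return_exceptions=True)


# Stand-in for an LLM API call: fails the first time it sees each text.
_seen = set()


async def flaky_extract(text):
    if text not in _seen:
        _seen.add(text)
        raise RuntimeError("simulated transient API error")
    return {"length": len(text)}


if __name__ == "__main__":
    abstracts = ["first abstract", "second abstract", "third abstract"]
    print(asyncio.run(run_batch(flaky_extract, abstracts, concurrency=2)))
```

Returning exceptions in place of results, rather than aborting the whole batch, lets a caller report which records failed after all retries and resubmit only those.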
