TypeScript Repository Indexing for Code Agent Retrieval
Junsong Pu, Yichen Li, Zhuangbin Chen
TLDR
A new TypeScript parser for ABCoder uses the TypeScript Compiler API instead of per-symbol language-server lookups, making code indexing for LLM agents more efficient.
Key contributions
- Addresses a bottleneck in ABCoder's existing parser architecture, where LSP-based symbol resolution requires one JSON-RPC call per symbol lookup.
- Presents `abcoder-ts-parser`, leveraging the TypeScript Compiler API for direct AST and semantic access.
- Produces reliable indexes significantly more efficiently than the existing architecture, evaluated on three open-source TypeScript projects of up to 1.2 million lines of code.
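The contributions above rest on querying the compiler in-process rather than over JSON-RPC. Below is a minimal sketch of that idea, assuming the `typescript` npm package (5.0 or later) is installed. It builds a `ts.Program` over two hypothetical in-memory files, walks each function declaration, and resolves every call site with the type checker, following import aliases to the declaring module. The file contents, the `CallEdge` shape, and the `file#function` id format are illustrative assumptions. They are not abcoder-ts-parser's actual code or UniAST's schema.

```typescript
import * as ts from "typescript";

// Hypothetical two-file project held in memory; main.ts imports from math.ts.
const files: Record<string, string> = {
  "/src/math.ts": [
    "export function add(a: number, b: number) { return a + b; }",
    "export function double(x: number) { return add(x, x); }",
  ].join("\n"),
  "/src/main.ts": [
    'import { double } from "./math";',
    "export function run() { return double(21); }",
  ].join("\n"),
};

// Illustrative edge shape (not UniAST): "file#function" ids of caller and resolved callee.
interface CallEdge {
  caller: string;
  callee: string;
}

// Build a Program whose compiler host reads only from the in-memory map.
function buildProgram(sources: Record<string, string>): ts.Program {
  const options: ts.CompilerOptions = {
    target: ts.ScriptTarget.ES2020,
    module: ts.ModuleKind.ESNext,
    moduleResolution: ts.ModuleResolutionKind.Bundler,
    noLib: true, // skip lib.d.ts so nothing is read from disk
    types: [], // skip automatic @types discovery
  };
  const has = (f: string): boolean => Object.prototype.hasOwnProperty.call(sources, f);
  const host = ts.createCompilerHost(options);
  host.fileExists = (f) => has(f);
  host.readFile = (f) => (has(f) ? sources[f] : undefined);
  // Module resolution skips directories it believes are missing, so report the virtual ones.
  host.directoryExists = (d) =>
    Object.keys(sources).some((f) => f.startsWith(d.replace(/\/$/, "") + "/"));
  host.realpath = (p) => p;
  host.getSourceFile = (f, languageVersion) =>
    has(f) ? ts.createSourceFile(f, sources[f], languageVersion, true) : undefined;
  return ts.createProgram(Object.keys(sources), options, host);
}

// Resolve each call site inside a function declaration (methods and arrow functions omitted for brevity).
function indexCalls(program: ts.Program): CallEdge[] {
  const checker = program.getTypeChecker();
  const edges: CallEdge[] = [];
  for (const sf of program.getSourceFiles()) {
    if (sf.isDeclarationFile) continue;
    const visit = (node: ts.Node, caller: string | undefined): void => {
      if (ts.isFunctionDeclaration(node) && node.name) {
        caller = `${sf.fileName}#${node.name.text}`;
      }
      if (caller && ts.isCallExpression(node)) {
        const sym = checker.getSymbolAtLocation(node.expression);
        // Imported names resolve to an alias symbol; follow it to the declaring module.
        const target =
          sym && sym.flags & ts.SymbolFlags.Alias ? checker.getAliasedSymbol(sym) : sym;
        const decl = target?.declarations?.[0];
        if (target && decl) {
          edges.push({ caller, callee: `${decl.getSourceFile().fileName}#${target.name}` });
        }
      }
      ts.forEachChild(node, (child) => visit(child, caller));
    };
    visit(sf, undefined);
  }
  return edges;
}

const edges = indexCalls(buildProgram(files));
for (const e of edges) console.log(`${e.caller} -> ${e.callee}`);
```

The type checker and module resolution run in the same process as the walk, so each symbol lookup is an ordinary function call rather than a request and response exchanged with a separate language server.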
Why it matters
Graph-based code indexes preserve call chains and dependency relationships that keyword search and similarity retrieval often miss, which can improve context retrieval for LLM-based code agents. Because this parser builds such indexes significantly more efficiently on large TypeScript projects, that kind of retrieval becomes more practical at scale.
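To make the retrieval benefit of a graph-based index concrete, here is a minimal sketch of how an agent might pull context from a function-level call graph. Starting from one function, it follows resolved call edges breadth-first up to a depth limit. Callee bodies therefore reach the model even when they share no keywords with the query. The `FunctionNode` shape and the `collectCallChain` helper are hypothetical. They do not reflect UniAST's actual schema or ABCoder's retrieval logic.

```typescript
// Hypothetical function-level index entry; NOT the actual UniAST schema.
interface FunctionNode {
  id: string; // e.g. "math.ts#add"
  code: string; // source text handed to the model as context
  calls: string[]; // ids of callees resolved at index time
}

type Index = Map<string, FunctionNode>;

// Breadth-first walk over call edges, returning functions within maxDepth hops of the seed.
function collectCallChain(index: Index, seedId: string, maxDepth: number): FunctionNode[] {
  const seen = new Set<string>([seedId]);
  const result: FunctionNode[] = [];
  let frontier: string[] = [seedId];
  for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const id of frontier) {
      const node = index.get(id);
      if (!node) continue; // callee outside the index (e.g. a library function)
      result.push(node);
      for (const callee of node.calls) {
        if (!seen.has(callee)) {
          seen.add(callee); // the seen set also prevents infinite loops on recursive calls
          next.push(callee);
        }
      }
    }
    frontier = next;
  }
  return result;
}

// Hypothetical three-function index: run -> double -> add.
const nodes: FunctionNode[] = [
  { id: "main.ts#run", code: "export function run() { return double(21); }", calls: ["math.ts#double"] },
  { id: "math.ts#double", code: "export function double(x: number) { return add(x, x); }", calls: ["math.ts#add"] },
  { id: "math.ts#add", code: "export function add(a: number, b: number) { return a + b; }", calls: [] },
];
const index: Index = new Map(nodes.map((n): [string, FunctionNode] => [n.id, n]));

const chain = collectCallChain(index, "main.ts#run", 2);
console.log(chain.map((n) => n.id));
```

A keyword query about `run` would likely not surface the body of `add`, but the walk reaches it in two hops. The depth limit keeps the retrieved context bounded on large call graphs.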
Original Abstract
Graph-based code indexing can improve context retrieval for LLM-based code agents by preserving call chains and dependency relationships that keyword search and similarity retrieval often miss. ABCoder is an open-source framework that parses codebases into a function-level code index called UniAST. Its existing parsers combine lightweight AST parsers for syntactic analysis with language servers for semantic resolution. Because LSP-based resolution requires a JSON-RPC call for each symbol lookup, these per-symbol calls become a bottleneck on large TypeScript repositories. We present abcoder-ts-parser, a TypeScript parser built on the TypeScript Compiler API that works directly with the compiler's AST, semantic information, and module resolution logic. We evaluate the parser on three open-source TypeScript projects with up to 1.2 million lines of code and find that it produces reliable indexes significantly more efficiently than the existing architecture. For a live demonstration, watch: https://youtu.be/ryssr7ouvdE
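For contrast with the language-server path described in the abstract, the sketch below builds the messages a client would send for per-symbol definition lookups under the LSP base protocol. Each lookup is a separate JSON-RPC request framed with a `Content-Length` header, and each one costs a round trip to the server. That ABCoder issues `textDocument/definition` specifically is an assumption for illustration, since the abstract says only that each symbol lookup is one JSON-RPC call. The file URI and positions are hypothetical.

```typescript
// Minimal subset of LSP types for a textDocument/definition request.
interface LspPosition {
  line: number; // zero-based
  character: number; // zero-based
}

interface DefinitionRequest {
  jsonrpc: "2.0";
  id: number;
  method: "textDocument/definition";
  params: { textDocument: { uri: string }; position: LspPosition };
}

// Frame one message per the LSP base protocol: header, blank line, JSON body.
function frameLspMessage(message: DefinitionRequest): string {
  const body = JSON.stringify(message);
  const length = new TextEncoder().encode(body).length; // Content-Length counts UTF-8 bytes, not characters
  return `Content-Length: ${length}\r\n\r\n${body}`;
}

// One request per symbol occurrence, each needing its own response from the server.
function definitionRequests(uri: string, occurrences: LspPosition[]): string[] {
  return occurrences.map((position, i) =>
    frameLspMessage({
      jsonrpc: "2.0",
      id: i + 1,
      method: "textDocument/definition",
      params: { textDocument: { uri }, position },
    }),
  );
}

// Hypothetical occurrences of two identifiers in one file.
const msgs = definitionRequests("file:///src/main.ts", [
  { line: 0, character: 9 },
  { line: 1, character: 32 },
]);
console.log(msgs.length);
console.log(msgs[0]);
```

Resolving N symbols this way means N request and response exchanges with a separate server process. This is the per-symbol cost that the Compiler API approach avoids by querying the type checker in-process.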