TypePro: Boosting LLM-Based Type Inference via Inter-Procedural Slicing
Teyu Lin, Minghao Fan, Huaxun Huang, Zhirong Shen, Rongxin Wu
TLDR
TypePro boosts LLM-based type inference for dynamic languages by using inter-procedural slicing to provide more complete contextual information.
Key contributions
- TypePro enhances LLM-based type inference for dynamic languages like Python and JavaScript.
- Employs inter-procedural code slicing to provide more complete contextual information for LLMs.
- Generates candidate complex types from code slice structure, addressing LLM domain knowledge gaps.
- Achieves Top-1 exact match (EM) rates of 88.9% on ManyTypes4Py and 86.6% on ManyTypes4TypeScript, improving on the second-best approach by 7.1 and 10.3 percentage points, respectively.
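To make the idea of inter-procedural context concrete, here is a minimal sketch in Python. It is not TypePro's slicing algorithm, which this summary does not describe in detail. It is a deliberately crude approximation that, for one parameter of one function, collects the call sites in other functions plus the assignments that define the argument passed there. The function name `caller_slice` and the toy `SOURCE` program are hypothetical, and `ast.unparse` requires Python 3.9 or later.

```python
import ast

# Toy program (hypothetical). The type of `path` is not evident inside
# `load_config` alone, but the caller `main` shows a string flowing into it.
SOURCE = '''
def load_config(path):
    with open(path) as fh:
        return parse(fh.read())

def parse(text):
    return dict(line.split("=") for line in text.splitlines())

def main():
    cfg_path = "settings.ini"
    verbose = True
    settings = load_config(cfg_path)
    return settings
'''


def caller_slice(source: str, func_name: str, param: str) -> list:
    """Collect caller-side statements relevant to `param` of `func_name`.

    Simplification (assumption): only top-level statements of each function
    body, positional arguments, and plain `name = ...` assignments are handled.
    """
    tree = ast.parse(source)
    funcs = {n.name: n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
    index = [a.arg for a in funcs[func_name].args.args].index(param)

    lines = []
    for fn in funcs.values():
        for stmt in fn.body:
            for node in ast.walk(stmt):
                is_target_call = (
                    isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Name)
                    and node.func.id == func_name
                    and len(node.args) > index
                )
                if not is_target_call:
                    continue
                # Names the argument expression depends on.
                used = {n.id for n in ast.walk(node.args[index])
                        if isinstance(n, ast.Name)}
                # Earlier assignments in the caller that define those names.
                for prev in fn.body:
                    if prev is stmt:
                        break
                    if isinstance(prev, ast.Assign) and any(
                        isinstance(t, ast.Name) and t.id in used
                        for t in prev.targets
                    ):
                        lines.append(ast.unparse(prev))
                lines.append(ast.unparse(stmt))
    return lines


result = caller_slice(SOURCE, "load_config", "path")
for line in result:
    print(line)
```

In this toy case the slice keeps the assignment to `cfg_path` and the call to `load_config`, and drops the unrelated `verbose` assignment. That caller-side evidence is the kind of context a prompt restricted to the body of `load_config` would lack.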
Why it matters
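Existing approaches use only the direct dependencies of a variable as context, so the model works from incomplete information. TypePro supplies context across function boundaries through inter-procedural slicing and proposes candidate complex types from the structure of the slices. On the ManyTypes4Py and ManyTypes4TypeScript benchmarks, this yields gains of 7.1 and 10.3 percentage points in Top-1 exact match over the second-best approach.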
Original Abstract
Dynamic languages (such as Python and JavaScript) offer flexibility and simplified type handling for programming, but this can also lead to an increase in type-related errors and additional overhead for compile-time type inference. As a result, type inference for dynamic languages has become a popular research area. Existing approaches typically achieve type inference through static analysis, machine learning, or large language models (LLMs). However, current work only focuses on the direct dependencies of variables related to type inference as the context, resulting in incomplete contextual information and thus affecting the accuracy of type inference. To address this issue, this paper proposes a method called TypePro, which leverages LLMs for type inference in dynamic languages. TypePro supplements contextual information by conducting inter-procedural code slicing. Then, TypePro proposes a set of candidate complex types based on the structural information of data types implied in the slices, thereby addressing the lack of domain knowledge of LLMs. We conducted experiments on the ManyTypes4Py and ManyTypes4TypeScript datasets, achieving Top-1 exact match (EM) rates of 88.9% and 86.6%, respectively. Notably, TypePro improves the Top-1 Exact Match by 7.1 and 10.3 percentage points over the second-best approach, showing the effectiveness and robustness of TypePro.
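For readers unfamiliar with the metric, the following sketch shows one way a Top-1 exact match rate can be computed. It assumes that a prediction counts as a hit when it equals the reference annotation after stripping surrounding whitespace. The normalization actually used in the paper's evaluation is not stated here, and the function name and sample data are hypothetical.

```python
def top1_exact_match(predictions, references):
    """Fraction of items whose top-ranked prediction equals the reference type.

    Assumption: equality is plain string comparison after whitespace stripping.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute a rate over an empty set")
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)


# Toy data (hypothetical): three of the four predictions match exactly.
preds = ["int", "str", "List[int]", "dict"]
refs = ["int", "str", "List[str]", "dict"]
rate = top1_exact_match(preds, refs)
print(f"Top-1 EM: {rate:.1%}")
```

Under this definition, `List[int]` against `List[str]` is a miss even though the outer container type agrees, which is why exact match is a strict measure for complex types.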