Xiangru Tang
4 papers · Latest:
The Last Human-Written Paper: Agent-Native Research Artifacts
Ara is a new protocol for machine-executable research packages, enhancing AI's ability to understand, reproduce, and extend scientific work by preserving full research context.
StarCoder 2 and The Stack v2: The Next Generation
StarCoder2 is a next-generation open-source Code LLM trained on a vastly expanded and diverse dataset, achieving state-of-the-art performance on multiple code benchmarks while being more parameter-efficient than larger models.
OctoPack: Instruction Tuning Code Large Language Models
OctoPack introduces instruction tuning for code LLMs using a massive dataset of Git commits, achieving state-of-the-art results on multi-language coding benchmarks without relying on OpenAI data.
Crosslingual Generalization through Multitask Finetuning
This paper demonstrates that multitask finetuning of large multilingual language models on English and machine-translated prompts enables strong zero-shot crosslingual generalization to many languages, including those unseen during training.