Adaptive Budget Allocation in LLM-Augmented Surveys
Zikun Ye, Jiameng Lyu, Rui Tao
TLDR
This paper introduces an adaptive algorithm for allocating a limited human labeling budget across questions in LLM-augmented surveys, substantially reducing wasted labels relative to uniform allocation.
Key contributions
- Proposes an adaptive algorithm for real-time human budget allocation in LLM-augmented surveys.
- Learns LLM reliability per question while collecting human responses, directing budget to harder questions.
- On real survey data, cuts wasted human-labeling budget from the 10-12% incurred by uniform allocation down to 2-6%.
- Achieves the same estimation quality as uniform sampling with fewer human samples, and requires no pilot study.
Why it matters
LLMs can cut survey costs, but their varying reliability demands human oversight. This algorithm optimizes human effort, ensuring more accurate data collection with less waste. It offers a practical, provably efficient solution for integrating LLMs into surveys.
Original Abstract
Large language models (LLMs) can generate survey responses at low cost, but their reliability varies substantially across questions and is unknown before data collection. Deploying LLMs in surveys still requires costly human responses for verification and correction. How should a limited human-labeling budget be allocated across questions in real time? We propose an adaptive allocation algorithm that learns which questions are hardest for the LLM while simultaneously collecting human responses. Each human label serves a dual role: it improves the estimate for that question and reveals how well the LLM predicts human responses on it. The algorithm directs more budget to questions where the LLM is least reliable, without requiring any prior knowledge of question-level LLM accuracy. We prove that the allocation gap relative to the best possible allocation vanishes as the budget grows, and validate the approach on both synthetic data and a real survey dataset with 68 questions and over 2000 respondents. On real survey data, the standard practice of allocating human labels uniformly across questions wastes 10--12% of the budget relative to the optimal; our algorithm reduces this waste to 2--6%, and the advantage grows as questions become more heterogeneous in LLM prediction quality. The algorithm achieves the same estimation quality as traditional uniform sampling with fewer human samples, requires no pilot study, and is backed by formal performance guarantees validated on real survey data. More broadly, the framework applies whenever scarce human oversight must be allocated across tasks where LLM reliability is unknown.
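The abstract's core idea can be illustrated with a toy simulation. This is a hedged sketch, not the paper's actual algorithm: all names (`human_label`, `llm_prediction`, the noise scales, the warm-start size) are illustrative assumptions. The sketch captures the dual role of each human label: the squared gap between the human response and the LLM prediction updates a per-question reliability estimate, and the next label goes to the question where the estimated marginal variance reduction is largest.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 5 questions whose LLM prediction error varies
# (the algorithm does NOT know these noise scales in advance).
n_questions = 5
true_mean = rng.uniform(0.0, 1.0, n_questions)       # true human-response means
llm_noise = np.array([0.05, 0.1, 0.2, 0.4, 0.8])     # per-question LLM error scale

def human_label(q):
    """Simulate one (noisy) human response for question q."""
    return true_mean[q] + rng.normal(0.0, 0.3)

def llm_prediction(q):
    """Simulate the LLM's answer for question q, with question-specific error."""
    return true_mean[q] + rng.normal(0.0, llm_noise[q])

budget = 500

# Warm start: a few labels per question to seed the reliability estimates.
counts = np.full(n_questions, 3)
residual_sq = np.zeros(n_questions)  # running sum of squared human-LLM gaps
for q in range(n_questions):
    for _ in range(counts[q]):
        residual_sq[q] += (human_label(q) - llm_prediction(q)) ** 2

# Adaptive phase: each label both refines question q's estimate and
# updates the estimate of how reliable the LLM is on q.
for _ in range(budget - counts.sum()):
    sigma2 = residual_sq / counts            # estimated LLM error variance
    q = int(np.argmax(sigma2 / counts))      # largest marginal variance reduction
    residual_sq[q] += (human_label(q) - llm_prediction(q)) ** 2
    counts[q] += 1

print(counts)  # questions with noisier LLM predictions receive more labels
```

Under this heuristic the allocation concentrates on questions where the LLM disagrees most with humans, mirroring the paper's claim that the budget flows to the hardest questions without any prior knowledge of question-level LLM accuracy.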