ArXiv TLDR

FutureWorld: A Live Environment for Training Predictive Agents with Real-World Outcome Rewards

arXiv:2604.26733

Zhixin Han, Yanzhi Zhang, Chuyang Wei, Maohang Gao, Xiawei Yue + 9 more

cs.AI, cs.LG

TLDR

FutureWorld is a live RL environment for training predictive agents on real-world events, closing the loop between prediction and outcome.

Key contributions

  • Introduces FutureWorld, a live RL environment for training agents on real-world future prediction.
  • Closes the training loop: prediction, outcome realization, and parameter updates for continuous learning.
  • Trains open-source base models over consecutive days, demonstrating effective learning.
  • Establishes a daily benchmark, evaluating frontier agents and setting performance baselines.

Why it matters

FutureWorld fills a gap in agent training: a live reinforcement learning environment grounded in real-world future prediction. By closing the loop between predictions and realized outcomes, it enables continuous learning, and its daily benchmark gives the community a shared platform for developing and evaluating more robust, adaptable predictive AI systems.

Original Abstract

Live future prediction refers to the task of making predictions about real-world events before they unfold. This task is increasingly studied using large language model-based agent systems, and it is important for building agents that can continually learn from the real world. Just as interactive environments have often driven progress in agents, advancing live future prediction naturally motivates viewing it as a learning environment. Prior works have explored future prediction from several different angles, but have generally not framed it as a unified learning environment. The task is appealing for learning because it can provide a large number of prediction questions grounded in diverse real-world events, while preventing answer leakage. To leverage these advantages, we present FutureWorld, a live agentic reinforcement learning environment that closes the training loop between prediction, outcome realization, and parameter updates. In our environment, we take three open-source base models and train them for consecutive days. The results show that training is effective. Furthermore, we build a daily benchmark based on the environment and evaluate several frontier agents on it to establish performance baselines for current agent systems.
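The closed training loop the abstract describes (predict before the event, wait for the outcome to realize, then update parameters) can be sketched in miniature. The following is a toy illustration, not the paper's actual system: `fetch_open_questions`, the `resolve` callback, and the single-parameter "agent" are all hypothetical stand-ins for a live question feed, an outcome oracle, and an RL-trained language model.

```python
class PredictiveAgent:
    """Toy agent: outputs a probability that an event occurs, then
    nudges its one learnable parameter toward realized outcomes.
    (A stand-in for a reward-driven update to model parameters.)"""

    def __init__(self, lr=0.1):
        self.bias = 0.5  # initial probability estimate
        self.lr = lr

    def predict(self, question):
        return self.bias

    def update(self, prediction, outcome):
        # Move the prediction toward the realized outcome (0.0 or 1.0).
        self.bias += self.lr * (outcome - prediction)
        self.bias = min(max(self.bias, 0.0), 1.0)


def fetch_open_questions(day):
    """Hypothetical stand-in for pulling live, not-yet-resolved questions."""
    return [{"id": (day, i), "text": f"Will event {i} occur today?"} for i in range(3)]


def run_day(agent, day, resolve):
    """One training day: commit predictions before outcomes are known,
    then update parameters once outcomes realize."""
    pending = [(q["id"], agent.predict(q)) for q in fetch_open_questions(day)]
    for qid, p in pending:          # outcomes realize after prediction
        agent.update(p, resolve(qid))


agent = PredictiveAgent()
for day in range(5):                # train over consecutive days
    run_day(agent, day, resolve=lambda qid: 1.0)  # toy oracle: events always occur
```

Because predictions are committed before resolution, answer leakage is structurally impossible in this loop; that ordering, rather than any particular update rule, is the property the environment enforces.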
