ArXiv TLDR

Artificial Intelligence

Research on AI systems, knowledge representation, planning, and general intelligence.

cs.AI · 1428 papers

Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training

A new principle for LM post-training uses sparse rewards for strong teachers and dense distillation for students, outperforming direct sparse RL.

2605.12483 · May 12, 2026 · Yuanda Xu, Hejian Sang, Zhengze Zhou +3

ToolCUA: Towards Optimal GUI-Tool Path Orchestration for Computer Use Agents

ToolCUA enables Computer Use Agents to optimally orchestrate GUI actions and high-level tools using a staged training paradigm, achieving new SOTA.

2605.12481 · May 12, 2026 · Xuhao Hu, Xi Zhang, Haiyang Xu +6

OmniNFT: Modality-wise Omni Diffusion Reinforcement for Joint Audio-Video Generation

OmniNFT proposes a novel diffusion RL framework to improve joint audio-video generation by addressing multi-modal challenges like gradient imbalance.

2605.12480 · May 12, 2026 · Guohui Zhang, XiaoXiao Ma, Jie Huang +9

Reward Hacking in Rubric-Based Reinforcement Learning

This paper investigates reward hacking in rubric-based RL, finding that even strong verifiers fail to prevent it when rubrics are flawed, leading to declines in output quality.

2605.12474 · May 12, 2026 · Anas Mahmoud, MohammadHossein Rezaei, Zihao Wang +3

KV-Fold: One-Step KV-Cache Recurrence for Long-Context Inference

KV-Fold enables stable, training-free long-context inference by treating the KV-cache as an accumulator, achieving high fidelity and memory efficiency.

2605.12471 · May 12, 2026 · Alireza Nadali, Patrick Cooper, Ashutosh Trivedi +1

Solve the Loop: Attractor Models for Language and Reasoning

Attractor Models introduce a stable, efficient fixed-point refinement method for iterative Transformers, significantly boosting performance in language and reasoning tasks.

2605.12466 · May 12, 2026 · Jacob Fein-Ashley, Paria Rashidinejad

Towards Affordable Energy: A Gymnasium Environment for Electric Utility Demand-Response Programs

DR-Gym is a new Gymnasium environment for training RL agents to optimize electric utility demand-response programs, improving grid flexibility and affordability.

2605.12462 · May 12, 2026 · Jose E. Aguilar Escamilla, Lingdong Zhou, Xiangqi Zhu +1

Enabling AI-Native Mobility in 6G: A Real-World Dataset for Handover, Beam Management, and Timing Advance

This paper introduces a real-world dataset from a commercial 5G network to enable AI-native mobility, covering handover, beam management, and timing advance.

2605.12453 · May 12, 2026 · Mannam Veera Narayana, Rohit Singh, Deepa M. R +1

The Algorithmic Caricature: Auditing LLM-Generated Political Discourse Across Crisis Events

This paper audits LLM-generated political discourse during crises, finding it lacks population realism compared to observed online content.

2605.12452 · May 12, 2026 · Gunjan, Sidahmed Benabderrahmane, Talal Rahwan

A Causal Language Modeling Detour Improves Encoder Continued Pretraining

A causal language modeling detour during encoder continued pretraining boosts downstream performance, outperforming standard MLM, especially in biomedicine.

2605.12438 · May 12, 2026 · Rian Touchent, Eric de la Clergerie

CAAFC: Chronological Actionable Automated Fact-Checker for misinformation / non-factual hallucination detection and correction

CAAFC is a new framework that automates fact-checking, detecting and correcting misinformation and AI hallucinations with actionable, source-backed justifications.

2605.12436 · May 12, 2026 · Islam Eldifrawi, Shengrui Wang, Amine Trabelsi

Formalize, Don't Optimize: The Heuristic Trap in LLM-Generated Combinatorial Solvers

LLMs generating combinatorial solvers should formalize problems rather than optimize search, as attempts at search optimization fall into a "heuristic trap" that reduces correctness.

2605.12421 · May 12, 2026 · Haoyu Wang, Yuliang Song, Tao Li +5

Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space

LLMs update beliefs in a low-dimensional conceptual space, with in-context learning unfolding as trajectories through that space, grounded in structured representations.

2605.12412 · May 12, 2026 · Eric Bigelow, Raphaël Sarfati, Daniel Wurgaft +5

Predicting Decisions of AI Agents from Limited Interaction through Text-Tabular Modeling

A new text-tabular model, using an "LLM-as-Observer," accurately predicts unfamiliar AI agent decisions in negotiation games from limited interactions.

2605.12411 · May 12, 2026 · Eilam Shapira, Moshe Tennenholtz, Roi Reichart

Semantic Reward Collapse and the Preservation of Epistemic Integrity in Adaptive AI Systems

Semantic Reward Collapse (SRC) is proposed to explain why adaptive AI systems suppress uncertainty; Constitutional Reward Stratification (CRS) is introduced to preserve epistemic integrity.

2605.12406 · May 12, 2026 · William Parris

OGLS-SD: On-Policy Self-Distillation with Outcome-Guided Logit Steering for LLM Reasoning

OGLS-SD enhances LLM reasoning by using outcome-guided logit steering to correct teacher-student mismatches in on-policy self-distillation.

2605.12400 · May 12, 2026 · Yuxiao Yang, Xiaoyun Wang, Weitong Zhang

Detecting overfitting in Neural Networks during long-horizon grokking using Random Matrix Theory

A new Random Matrix Theory method detects overfitting in neural networks, including large LLMs, by identifying "Correlation Traps" in weight matrices.

2605.12394 · May 12, 2026 · Hari K. Prakash, Charles H Martin

SEMIR: Semantic Minor-Induced Representation Learning on Graphs for Visual Segmentation

SEMIR is a graph-based representation learning framework for visual segmentation that efficiently handles small, sparse structures by decoupling inference from the image grid.

2605.12389 · May 12, 2026 · Luke James Miller, Yugyung Lee

Scalable Token-Level Hallucination Detection in Large Language Models

TokenHD is a scalable pipeline for training token-level hallucination detectors in LLMs, outperforming larger models in detecting reasoning errors.

2605.12384 · May 12, 2026 · Rui Min, Tianyu Pang, Chao Du +2

Fill the GAP: A Granular Alignment Paradigm for Visual Reasoning in Multimodal Large Language Models

GAP proposes a granular alignment paradigm that stabilizes visual latent reasoning in MLLMs by addressing feature-space mismatches, improving performance.

2605.12374 · May 12, 2026 · Yanting Miao, Yutao Sun, Dexin Wang +8

Page 3 of 72
