Artificial Intelligence
Research on AI systems, knowledge representation, planning, and general intelligence.
cs.AI · 1428 papers

Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training
A new principle for LM post-training uses sparse rewards for strong teachers and dense distillation for students, outperforming direct sparse RL.
ToolCUA: Towards Optimal GUI-Tool Path Orchestration for Computer Use Agents
ToolCUA enables Computer Use Agents to optimally orchestrate GUI actions and high-level tools using a staged training paradigm, achieving new SOTA.
OmniNFT: Modality-wise Omni Diffusion Reinforcement for Joint Audio-Video Generation
OmniNFT proposes a novel diffusion RL framework to improve joint audio-video generation by addressing multi-modal challenges like gradient imbalance.
Reward Hacking in Rubric-Based Reinforcement Learning
This paper investigates reward hacking in rubric-based RL, finding that even strong verifiers fail to prevent it when rubrics are flawed, degrading output quality.
KV-Fold: One-Step KV-Cache Recurrence for Long-Context Inference
KV-Fold enables stable, training-free long-context inference by treating the KV-cache as an accumulator, achieving high fidelity and memory efficiency.
Solve the Loop: Attractor Models for Language and Reasoning
Attractor Models introduce a stable, efficient fixed-point refinement method for iterative Transformers, significantly boosting performance in language and reasoning tasks.
Towards Affordable Energy: A Gymnasium Environment for Electric Utility Demand-Response Programs
DR-Gym is a new Gymnasium environment for training RL agents to optimize electric utility demand-response programs, improving grid flexibility and affordability.
Enabling AI-Native Mobility in 6G: A Real-World Dataset for Handover, Beam Management, and Timing Advance
This paper introduces a real-world dataset from a commercial 5G network to enable AI-native mobility, focusing on handover and timing advance.
The Algorithmic Caricature: Auditing LLM-Generated Political Discourse Across Crisis Events
This paper audits LLM-generated political discourse during crisis events, finding it lacks the population-level realism of observed online content.
A Causal Language Modeling Detour Improves Encoder Continued Pretraining
A Causal Language Modeling detour during encoder continued pretraining boosts downstream performance, outperforming standard MLM, especially in biomedicine.
CAAFC: Chronological Actionable Automated Fact-Checker for misinformation / non-factual hallucination detection and correction
CAAFC is a new framework that automates fact-checking, detecting and correcting misinformation and AI hallucinations with actionable, source-backed justifications.
Formalize, Don't Optimize: The Heuristic Trap in LLM-Generated Combinatorial Solvers
LLMs should formalize, not optimize, combinatorial solvers, as attempts at search optimization lead to a "heuristic trap" and reduced correctness.
Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space
LLMs update beliefs in a low-dimensional conceptual space, showing in-context learning as trajectories through this space, grounded in structured representations.
Predicting Decisions of AI Agents from Limited Interaction through Text-Tabular Modeling
A new text-tabular model, using an "LLM-as-Observer," accurately predicts unfamiliar AI agent decisions in negotiation games from limited interactions.
Semantic Reward Collapse and the Preservation of Epistemic Integrity in Adaptive AI Systems
Semantic Reward Collapse (SRC) explains why AI suppresses uncertainty; Constitutional Reward Stratification (CRS) is proposed to preserve epistemic integrity.
OGLS-SD: On-Policy Self-Distillation with Outcome-Guided Logit Steering for LLM Reasoning
OGLS-SD enhances LLM reasoning by using outcome-guided logit steering to correct teacher-student mismatches in on-policy self-distillation.
Detecting overfitting in Neural Networks during long-horizon grokking using Random Matrix Theory
A new Random Matrix Theory method detects overfitting in neural networks, even in large LLMs, by identifying "Correlation Traps" in weight matrices.
SEMIR: Semantic Minor-Induced Representation Learning on Graphs for Visual Segmentation
SEMIR is a graph-based representation learning framework for visual segmentation that efficiently handles small, sparse structures by decoupling inference from the image grid.
Scalable Token-Level Hallucination Detection in Large Language Models
TokenHD is a scalable pipeline for training token-level hallucination detectors in LLMs, outperforming larger models in detecting reasoning errors.
Fill the GAP: A Granular Alignment Paradigm for Visual Reasoning in Multimodal Large Language Models
GAP proposes a granular alignment paradigm to stabilize visual latent reasoning in MLLMs by addressing feature-space mismatches, improving performance.