ArXiv TLDR
Machine Learning

Papers on learning algorithms, neural networks, deep learning, and optimization.

cs.LG · 1353 papers

Predicting Decisions of AI Agents from Limited Interaction through Text-Tabular Modeling

A new text-tabular model, using an "LLM-as-Observer," accurately predicts unfamiliar AI agent decisions in negotiation games from limited interactions.

2605.12411 · May 12, 2026 · Eilam Shapira, Moshe Tennenholtz, Roi Reichart

Model-based Bootstrap of Controlled Markov Chains

This paper proposes a model-based bootstrap for controlled Markov chains in offline RL, yielding consistent estimators and valid confidence intervals.

2605.12410 · May 12, 2026 · Ziwei Su, Imon Banerjee, Diego Klabjan

OGLS-SD: On-Policy Self-Distillation with Outcome-Guided Logit Steering for LLM Reasoning

OGLS-SD enhances LLM reasoning by using outcome-guided logit steering to correct teacher-student mismatches in on-policy self-distillation.

2605.12400 · May 12, 2026 · Yuxiao Yang, Xiaoyun Wang, Weitong Zhang

Detecting overfitting in Neural Networks during long-horizon grokking using Random Matrix Theory

A new Random Matrix Theory method detects overfitting in neural networks, even in LLMs, by identifying "Correlation Traps" in weight matrices.

2605.12394 · May 12, 2026 · Hari K. Prakash, Charles H Martin

Trajectory-Agnostic Asteroid Detection in TESS with Deep Learning

This paper introduces a deep learning W-Net method for trajectory-agnostic asteroid detection in TESS data, robust to varying speeds and directions.

2605.12391 · May 12, 2026 · Brian P. Powell, Jorge Martinez-Palomera, Amy Tuson +3

SEMIR: Semantic Minor-Induced Representation Learning on Graphs for Visual Segmentation

SEMIR is a graph-based representation learning framework for visual segmentation that efficiently handles small, sparse structures by decoupling inference from the image grid.

2605.12389 · May 12, 2026 · Luke James Miller, Yugyung Lee

Scalable Token-Level Hallucination Detection in Large Language Models

TokenHD is a scalable pipeline for training token-level hallucination detectors in LLMs, outperforming larger models in detecting reasoning errors.

2605.12384 · May 12, 2026 · Rui Min, Tianyu Pang, Chao Du +2

Fill the GAP: A Granular Alignment Paradigm for Visual Reasoning in Multimodal Large Language Models

GAP proposes a granular alignment paradigm to stabilize visual latent reasoning in MLLMs by addressing feature-space mismatches, improving performance.

2605.12374 · May 12, 2026 · Yanting Miao, Yutao Sun, Dexin Wang +8

Attacks and Mitigations for Distributed Governance of Agentic AI under Byzantine Adversaries

This paper analyzes attacks on agentic AI governance from compromised centralized providers and proposes Byzantine-resilient, monitoring, and auditing solutions.

2605.12364 · May 12, 2026 · Matthew D. Laws, Alina Oprea, Cristina Nita-Rotaru

Multi-Variable Conformal Prediction: Optimizing Prediction Sets without Data Splitting

Multi-Variable Conformal Prediction (MCP) optimizes prediction sets by extending conformal prediction to vector-valued scores, eliminating data splitting.

2605.12341 · May 12, 2026 · Laura Lützow, Simone Garatti, Marco C. Campi +2

EHR-RAGp: Retrieval-Augmented Prototype-Guided Foundation Model for Electronic Health Records

EHR-RAGp is a retrieval-augmented foundation model for EHRs, dynamically integrating relevant patient history via a prototype-guided module for better clinical predictions.

2605.12335 · May 12, 2026 · Saeed Shurrab, Mariam Al-Omari, Dana El Samad +1

From Model Uncertainty to Human Attention: Localization-Aware Visual Cues for Scalable Annotation Review

This paper introduces visual cues for spatial uncertainty in AI-assisted annotation, improving label quality and speed by guiding human attention.

2605.12303 · May 12, 2026 · Moussa Kassem Sbeyti, Joshua Holstein, Philipp Spitzer +2

Reconstruction of Personally Identifiable Information from Supervised Finetuned Models

This paper reveals that PII can be reconstructed from supervised finetuned LLMs, proposing COVA to enhance reconstruction under prefix attacks.

2605.12264 · May 12, 2026 · Sae Furukawa, Alina Oprea

TMRL: Diffusion Timestep-Modulated Pretraining Enables Exploration for Efficient Policy Finetuning

TMRL introduces diffusion timestep-modulated pretraining to enable efficient exploration and finetuning of robot policies, improving sample efficiency.

2605.12236 · May 12, 2026 · Matthew M. Hong, Jesse Zhang, Anusha Nagabandi +1

Optimal Policy Learning under Budget and Coverage Constraints

This paper characterizes optimal policy learning under budget and coverage constraints, showing that the problem has a knapsack structure and admits near-optimal algorithms.

2605.12235 · May 12, 2026 · Giovanni Cerulli

No More, No Less: Task Alignment in Terminal Agents

A new benchmark, TAB, reveals terminal agents struggle with selectively following relevant instructions while ignoring distractors, highlighting a gap in task alignment.

2605.12233 · May 12, 2026 · Sina Mavali, David Pape, Jonathan Evertz +5

TriBand-BEV: Real-Time LiDAR-Only 3D Pedestrian Detection via Height-Aware BEV and High-Resolution Feature Fusion

TriBand-BEV introduces a real-time LiDAR-only 3D pedestrian detection method using a height-aware BEV encoding, outperforming prior methods on KITTI.

2605.12220 · May 12, 2026 · Mohammad Khoshkdahan, Alexey Vinel

Self-Supervised Laplace Approximation for Bayesian Uncertainty Quantification

Introduces Self-Supervised Laplace Approximation (SSLA) to quantify Bayesian model predictive uncertainty by refitting on self-predicted data, outperforming classical methods.

2605.12208 · May 12, 2026 · Julian Rodemann, Alexander Marquard, Thomas Augustin +1

PrivacySIM: Evaluating LLM Simulation of User Privacy Behavior

PrivacySIM evaluates LLMs' ability to simulate individual privacy decisions, finding persona conditioning improves accuracy but models still struggle.

2605.12147 · May 12, 2026 · James Flemings, Murali Annavaram

Learning What Matters: Adaptive Information-Theoretic Objectives for Robot Exploration

QOED improves robot exploration by adaptively identifying and prioritizing observable parameter directions, suppressing nuisance effects for better learning.

2605.12084 · May 12, 2026 · Youwei Yu, Jionghao Wang, Zhengming Yu +2