ArXiv TLDR
โ† All categories

Machine Learning

Papers on learning algorithms, neural networks, deep learning, and optimization.

cs.LG · 1353 papers

Quantifying Concentration Phenomena of Mean-Field Transformers in the Low-Temperature Regime

This paper quantifies how token distributions in mean-field transformers rapidly concentrate in the low-temperature regime while remaining metastable.

2605.10931 · May 11, 2026 · Albert Alcalde, Leon Bungert, Konstantin Riedl +1

Dynamic Skill Lifecycle Management for Agentic Reinforcement Learning

SLIM dynamically manages external skills for LLM agents in RL, optimizing their active skill set for improved task performance.

2605.10923 · May 11, 2026 · Junhao Shen, Teng Zhang, Xiaoyan Zhao +1

Equivariant Reinforcement Learning for Clifford Quantum Circuit Synthesis

Introduces an equivariant RL agent for efficient, scalable Clifford quantum circuit synthesis across varying qubit counts.

2605.10910 · May 11, 2026 · Richie Yeung, Aleks Kissinger, Rob Cornish

DataMaster: Towards Autonomous Data Engineering for Machine Learning

DataMaster automates data engineering for ML, using a novel agent framework with tree search, shared data, and memory to boost model performance.

2605.10906 · May 11, 2026 · Yaxin Du, Xiyuan Yang, Zhifan Zhou +12

Beyond Red-Teaming: Formal Guarantees of LLM Guardrail Classifiers

This paper introduces a novel method to formally verify LLM guardrail classifiers by analyzing their pre-activation space, revealing hidden safety vulnerabilities.

2605.10901 · May 11, 2026 · Nikita Kezins, Urbas Ekka, Pascal Berrang +1

RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards

RubricEM is a meta-RL framework that uses rubrics to guide policy decomposition and reflection for training research agents without verifiable rewards.

2605.10899 · May 11, 2026 · Gaotang Li, Bhavana Dalvi Mishra, Zifeng Wang +9

V4FinBench: Benchmarking Tabular Foundation Models, LLMs, and Standard Methods on Corporate Bankruptcy Prediction

V4FinBench introduces a new large-scale dataset for corporate bankruptcy prediction, benchmarking tabular foundation models and LLMs against standard methods.

2605.10896 · May 11, 2026 · Marcin Kostrzewa, Sebastian Tomczak, Roman Furman +4

Unmasking On-Policy Distillation: Where It Helps, Where It Hurts, and Why

This paper introduces a diagnostic framework for analyzing on-policy distillation, revealing that it helps most on incorrect rollouts and that the optimal distillation context varies.

2605.10889 · May 11, 2026 · Mohammadreza Armandpour, Fatih Ilhan, David Harrison +6

LoKA: Low-precision Kernel Applications for Recommendation Models At Scale

LoKA introduces a system-model co-design framework to make FP8 low-precision arithmetic practical and efficient for large recommendation models.

2605.10886 · May 11, 2026 · Liang Luo, Yinbin Ma, Quanyu Zhu +20

AssayBench: An Assay-Level Virtual Cell Benchmark for LLMs and Agents

AssayBench is a new benchmark for phenotypic screen prediction in virtual cell models, evaluating LLMs and agents on diverse cellular phenotypes.

2605.10876 · May 11, 2026 · Edward De Brouwer, Carl Edwards, Alexander Wu +9

Compute Where it Counts: Self Optimizing Language Models

Self-Optimizing Language Models (SOL) dynamically allocate computation per token, improving LLM inference efficiency and quality over static methods.

2605.10875 · May 11, 2026 · Yash Akhauri, Mohamed S. Abdelfattah

BEACON: A Multimodal Dataset for Learning Behavioral Fingerprints from Gameplay Data

BEACON is a large, multimodal dataset from competitive Valorant gameplay for continuous authentication and behavioral fingerprinting research.

2605.10867 · May 11, 2026 · Ishpuneet Singh, Gursmeep Kaur, Uday Pratap Singh Atwal +3

Masked Generative Transformer Is What You Need for Image Editing

EditMGT, a novel Masked Generative Transformer, offers faster, more precise image editing by localizing changes, outperforming diffusion models.

2605.10859 · May 11, 2026 · Wei Chow, Linfeng Li, Xian Sun +14

The Generalized Turing Test: A Foundation for Comparing Intelligence

The Generalized Turing Test (GTT) offers a formal, dataset-agnostic framework to compare AI agent intelligence via indistinguishability.

2605.10851 · May 11, 2026 · Daniel Mitropolsky, Susan S. Hong, Riccardo Neumarker +2

Clin-JEPA: A Multi-Phase Co-Training Framework for Joint-Embedding Predictive Pretraining on EHR Patient Trajectories

Clin-JEPA is a multi-phase co-training framework for JEPA pretraining on EHR patient trajectories, enabling accurate forecasting and risk prediction.

2605.10840 · May 11, 2026 · Yixuan Yang, Mehak Arora, Ryan Zhang +10

Transcoda: End-to-End Zero-Shot Optical Music Recognition via Data-Centric Synthetic Training

Transcoda is a zero-shot OMR system using advanced synthetic data, normalized encodings, and grammar-based decoding to achieve state-of-the-art performance.

2605.10835 · May 11, 2026 · Daniel Dratschuk, Paul Swoboda

Predicting 3D structure by latent posterior sampling

This paper introduces a method for 3D structure prediction by combining NeRFs with diffusion models for probabilistic latent posterior sampling.

2605.10830 · May 11, 2026 · Azmi Haider, Dan Rosenbaum

SLIM: Sparse Latent Steering for Interpretable and Property-Directed LLM-Based Molecular Editing

SLIM enhances LLM molecular editing by using sparse latent steering to precisely control properties and improve success rates.

2605.10831 · May 11, 2026 · Mingxu Zhang, Yuhan Li, Lujundong Li +3

LLMs for Secure Hardware Design and Related Problems: Opportunities and Challenges

A review of LLMs in hardware design, covering their capabilities, introduced vulnerabilities, and essential security countermeasures.

2605.10807 · May 11, 2026 · Johann Knechtel, Ozgur Sinanoglu, Ramesh Karri

The Last Word Often Wins: A Format Confound in Chain-of-Thought Corruption Studies

Chain-of-thought corruption studies are confounded by explicit answer formats; models often follow the final answer text, not the reasoning.

2605.10799 · May 11, 2026 · Gabriel Garcia
Page 6 of 68
