ArXiv TLDR

Artificial Intelligence

Research on AI systems, knowledge representation, planning, and general intelligence.

cs.AI · 1428 papers

WARDEN: Endangered Indigenous Language Transcription and Translation with 6 Hours of Training Data

WARDEN is a novel two-stage system that transcribes and translates the endangered Wardaman language into English using only 6 hours of audio, outperforming larger models.

2605.13846 · May 13, 2026 · Ziheng Zhang, Yunzhong Hou, Naijing Liu +1

EVA-Bench: A New End-to-end Framework for Evaluating Voice Agents

EVA-Bench is a new end-to-end framework for evaluating voice agents using realistic bot-to-bot audio simulations and novel composite metrics.

2605.13841 · May 13, 2026 · Tara Bogavelli, Gabrielle Gauthier Melançon, Katrina Stankiewicz +10

Topology-Preserving Neural Operator Learning via Hodge Decomposition

This paper introduces a topology-preserving neural operator learning method using Hodge decomposition to model physical field equations on geometric meshes.

2605.13834 · May 13, 2026 · Dongzhe Zheng, Tao Zhong, Christine Allen-Blanchette

Quantifying Sensitivity for Tree Ensembles: A symbolic and compositional approach

This paper introduces a novel symbolic and compositional method to quantify sensitivity in decision tree ensembles, efficiently identifying misclassification risks.

2605.13830 · May 13, 2026 · S. Akshay, Chaitanya Garg, Ashutosh Gupta +2

Negation Neglect: When models fail to learn negations in training

LLMs finetuned on documents that flag claims as false often learn to believe those claims are true, a phenomenon called Negation Neglect.

2605.13829 · May 13, 2026 · Harry Mayne, Lev McKinney, Jan Dubiński +3

History Anchors: How Prior Behavior Steers LLM Decisions Toward Unsafe Actions

LLMs, especially flagship models, are highly susceptible to continuing and escalating harmful actions when instructed to maintain consistency with prior unsafe history.

2605.13825 · May 13, 2026 · Alberto G. Rodríguez Salgado

Harnessing Agentic Evolution

AEvo introduces a meta-editing framework that steers agentic evolution by dynamically revising the evolution process, outperforming existing methods.

2605.13821 · May 13, 2026 · Jiayi Zhang, Yongfeng Gu, Jianhao Ruan +10

Neurosymbolic Auditing of Natural-Language Software Requirements

A neurosymbolic approach using LLMs and SMT solvers audits natural-language software requirements, detecting ambiguity and inconsistencies.

2605.13817 · May 13, 2026 · Bethel Hall, William Eiers

Improving Reproducibility in Evaluation through Multi-Level Annotator Modeling

This paper introduces a multi-level bootstrapping method to improve AI evaluation reproducibility by modeling annotator behavior and analyzing data tradeoffs.

2605.13801 · May 13, 2026 · Deepak Pandita, Flip Korn, Chris Welty +1

Di-BiLPS: Denoising induced Bidirectional Latent-PDE-Solver under Sparse Observations

Di-BiLPS is a neural framework that solves PDEs efficiently from extremely sparse observations, outperforming SOTA methods and enabling zero-shot super-resolution.

2605.13790 · May 13, 2026 · Zhonghao Li, Chaoyu Liu, Qian Zhang

ENSEMBITS: an alphabet of protein conformational ensembles

ENSEMBITS is the first tokenizer for protein conformational ensembles, capturing dynamic motions and alternative states for protein language modeling.

2605.13789 · May 13, 2026 · Kaiwen Shi, Carlos Oliver

Amplification to Synthesis: A Comparative Analysis of Cognitive Operations Before and After Generative AI

Generative AI has fundamentally shifted cognitive operations from simple amplification to sophisticated content synthesis, as shown by US election data.

2605.13785 · May 13, 2026 · Liz Cho, Dongwook Yoon

LMPath: Language-Mediated Priors and Path Generation for Aerial Exploration

LMPath uses language models and satellite imagery to generate semantic priors for UAV search paths, significantly improving efficiency over traditional methods.

2605.13782 · May 13, 2026 · Jonathan A. Diller, Fernando Cladera, Camillo J. Taylor +1

MinT: Managed Infrastructure for Training and Serving Millions of LLMs

MinT is a managed infrastructure system for efficiently training and serving millions of LoRA-adapted LLMs over shared base models.

2605.13779 · May 13, 2026 · Mind Lab, Song Cao +60

(How) Do Large Language Models Understand High-Level Message Sequence Charts?

LLMs show only a modest understanding (52% accuracy) of High-Level Message Sequence Charts' formal semantics, struggling with complex reasoning tasks.

2605.13773 · May 13, 2026 · Mohammad Reza Mousavi

Where Does Reasoning Break? Step-Level Hallucination Detection via Hidden-State Transport Geometry

This paper introduces a novel method for detecting step-level hallucinations in LLMs by analyzing hidden-state transport geometry during a single forward pass.

2605.13772 · May 13, 2026 · Tyler Alvarez, Ali Baheri

High-Rate Quantized Matrix Multiplication II

This paper explores high-rate quantized matrix multiplication for LLMs, showing how waterfilling improves GPTQ and analyzing the near-optimal WaterSIC scheme.

2605.13768 · May 13, 2026 · Or Ordentlich, Yury Polyanskiy

Weakly-Supervised Spatiotemporal Anomaly Detection

This paper introduces a weakly-supervised spatiotemporal anomaly detection method that uses video-level labels and multiple instance ranking loss.

2605.13746 · May 13, 2026 · Urvi Gianchandani, Praveen Tirupattur, Mubarak Shah

Senses Wide Shut: A Representation-Action Gap in Omnimodal LLMs

Omnimodal LLMs struggle to reject false textual claims contradicting sensory input, revealing a "Representation-Action Gap" in grounding.

2605.13737 · May 13, 2026 · Trung Nguyen Quang, Yiming Gao, Fanyi Pu +3

KVServe: Service-Aware KV Cache Compression for Communication-Efficient Disaggregated LLM Serving

KVServe adaptively compresses the KV cache in disaggregated LLM serving, significantly boosting performance by optimizing communication.

2605.13734 · May 13, 2026 · Zedong Liu, Xinyang Ma, Dejun Luo +9
Page 1 of 72
