Artificial Intelligence
Research on AI systems, knowledge representation, planning, and general intelligence.
cs.AI · 1424 papers
WARDEN: Endangered Indigenous Language Transcription and Translation with 6 Hours of Training Data
WARDEN is a two-stage system that transcribes the endangered Wardaman language and translates it into English using only 6 hours of training audio, outperforming larger models.
EVA-Bench: A New End-to-end Framework for Evaluating Voice Agents
EVA-Bench is a new end-to-end framework for evaluating voice agents using realistic bot-to-bot audio simulations and novel composite metrics.
Topology-Preserving Neural Operator Learning via Hodge Decomposition
This paper introduces a topology-preserving neural operator learning method using Hodge decomposition to model physical field equations on geometric meshes.
Quantifying Sensitivity for Tree Ensembles: A symbolic and compositional approach
This paper introduces a novel symbolic and compositional method to quantify sensitivity in decision tree ensembles, efficiently identifying misclassification risks.
Negation Neglect: When models fail to learn negations in training
LLMs finetuned on documents that flag claims as false often learn to believe those claims are true, a phenomenon called Negation Neglect.
History Anchors: How Prior Behavior Steers LLM Decisions Toward Unsafe Actions
LLMs, especially flagship models, are highly susceptible to continuing and escalating harmful actions when instructed to maintain consistency with prior unsafe history.
Harnessing Agentic Evolution
AEvo is a meta-editing framework that steers agentic evolution by dynamically revising the evolution process itself, outperforming existing methods.
Neurosymbolic Auditing of Natural-Language Software Requirements
A neurosymbolic approach using LLMs and SMT solvers audits natural-language software requirements, detecting ambiguity and inconsistencies.
Improving Reproducibility in Evaluation through Multi-Level Annotator Modeling
This paper introduces a multi-level bootstrapping method to improve AI evaluation reproducibility by modeling annotator behavior and analyzing data tradeoffs.
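Multi-level (hierarchical) bootstrapping is a standard way to get uncertainty estimates when ratings are nested inside annotators: resample annotators first, then resample each selected annotator's ratings. The sketch below illustrates that generic procedure only; the function name and toy data are hypothetical, and it is not the paper's evaluation pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy nested data: each annotator contributes several item-level scores.
ratings_by_annotator = {
    "a1": np.array([0.80, 0.70, 0.90]),
    "a2": np.array([0.50, 0.60]),
    "a3": np.array([0.75, 0.80, 0.70, 0.65]),
}

def hierarchical_bootstrap_mean(data: dict, n_boot: int = 2000):
    """Generic two-level bootstrap: resample annotators, then their ratings."""
    names = list(data)
    means = np.empty(n_boot)
    for b in range(n_boot):
        picked = rng.choice(names, size=len(names), replace=True)      # level 1: annotators
        scores = np.concatenate([
            rng.choice(data[a], size=len(data[a]), replace=True)       # level 2: ratings
            for a in picked
        ])
        means[b] = scores.mean()
    return means.mean(), np.percentile(means, [2.5, 97.5])

est, ci = hierarchical_bootstrap_mean(ratings_by_annotator)
print(est, ci)   # point estimate and a 95% interval reflecting both levels of variation
```

Resampling at both levels is what lets the resulting interval account for disagreement between annotators as well as noise within each annotator's ratings.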
Di-BiLPS: Denoising induced Bidirectional Latent-PDE-Solver under Sparse Observations
Di-BiLPS is a neural framework that solves PDEs efficiently under extremely sparse observations, outperforming state-of-the-art methods and supporting zero-shot super-resolution.
ENSEMBITS: an alphabet of protein conformational ensembles
ENSEMBITS is the first tokenizer for protein conformational ensembles, capturing dynamic motions and alternative states for protein language modeling.
Amplification to Synthesis: A Comparative Analysis of Cognitive Operations Before and After Generative AI
Generative AI has fundamentally shifted cognitive operations from simple amplification to sophisticated content synthesis, as shown by US election data.
LMPath: Language-Mediated Priors and Path Generation for Aerial Exploration
LMPath uses language models and satellite imagery to generate semantic priors for UAV search paths, significantly improving efficiency over traditional methods.
MinT: Managed Infrastructure for Training and Serving Millions of LLMs
MinT is a managed infrastructure system for efficiently training and serving millions of LoRA-adapted LLMs over shared base models.
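For readers unfamiliar with the serving pattern this summary refers to, the sketch below shows how many lightweight LoRA adapters can share one frozen base weight at inference time. It is a minimal, generic illustration of LoRA adaptation; the tenant names, shapes, and the `lora_registry` dict are hypothetical, and this is not MinT's implementation.

```python
import numpy as np

d_in, d_out, rank = 64, 64, 8
base_W = np.random.randn(d_in, d_out) * 0.02            # shared, frozen base weight

# Each "tenant" owns only a small (A, B) pair instead of a full d_in x d_out matrix.
lora_registry = {
    tenant: (np.random.randn(d_in, rank) * 0.02,         # A: d_in x rank
             np.zeros((rank, d_out)))                    # B: rank x d_out (zero-init)
    for tenant in ("tenant_a", "tenant_b", "tenant_c")
}

def forward(x: np.ndarray, tenant: str, scale: float = 1.0) -> np.ndarray:
    """Apply the shared base weight plus the tenant's low-rank update."""
    A, B = lora_registry[tenant]
    return x @ base_W + scale * (x @ A @ B)

x = np.random.randn(4, d_in)                             # a batch of 4 activations
print(forward(x, "tenant_a").shape)                      # (4, 64)
```

Because each adapter adds only 2·d·r parameters on top of the shared d×d weight, very large numbers of adapters can be kept resident and selected per request, which is the property such serving systems exploit.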
(How) Do Large Language Models Understand High-Level Message Sequence Charts?
LLMs show only a modest understanding (52% accuracy) of High-Level Message Sequence Charts' formal semantics, struggling with complex reasoning tasks.
Where Does Reasoning Break? Step-Level Hallucination Detection via Hidden-State Transport Geometry
This paper introduces a novel method for detecting step-level hallucinations in LLMs by analyzing hidden-state transport geometry during a single forward pass.
High-Rate Quantized Matrix Multiplication II
This paper explores high-rate quantized matrix multiplication for LLMs, showing how waterfilling improves GPTQ and analyzing the near-optimal WaterSIC scheme.
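Waterfilling here refers to the classical rate-allocation rule: under a total bit budget, higher-variance coordinates receive more bits, with a common "water level" chosen so the budget is met. The sketch below is only that textbook rule (variable names and the bisection setup are my own); how the paper couples it with GPTQ is not shown.

```python
import numpy as np

def waterfill_bits(variances: np.ndarray, total_bits: float) -> np.ndarray:
    """Classical reverse-waterfilling sketch: b_i = max(0, 0.5*log2(var_i / mu)),
    with the water level mu found by bisection so that sum(b_i) == total_bits."""
    lo, hi = 1e-12, variances.max()
    bits = np.zeros_like(variances)
    for _ in range(100):                       # bisection on the water level mu
        mu = 0.5 * (lo + hi)
        bits = np.maximum(0.0, 0.5 * np.log2(variances / mu))
        if bits.sum() > total_bits:
            lo = mu                            # too many bits -> raise the water level
        else:
            hi = mu
    return bits

vars_ = np.array([4.0, 1.0, 0.25, 0.01])       # toy per-column variances
print(waterfill_bits(vars_, total_bits=6.0))   # roughly [3, 2, 1, 0] bits
```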
Weakly-Supervised Spatiotemporal Anomaly Detection
This paper introduces a weakly-supervised spatiotemporal anomaly detection method that uses video-level labels and multiple instance ranking loss.
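The ranking objective named in this summary is a standard formulation in weakly-supervised video anomaly detection: with only video-level labels, each video is treated as a bag of segments, and the top segment score of an anomalous bag is pushed above the top score of a normal bag by a margin. The hinge form below is a generic sketch of that idea (function and variable names are mine), not the paper's exact loss.

```python
import numpy as np

def mil_ranking_loss(anomalous_scores: np.ndarray,
                     normal_scores: np.ndarray,
                     margin: float = 1.0) -> float:
    """Generic multiple-instance ranking loss sketch.

    Each input holds per-segment anomaly scores for one video (a "bag").
    Only the video-level label is known, so the loss compares the most
    anomalous segment of each bag rather than individual segments."""
    return max(0.0, margin - anomalous_scores.max() + normal_scores.max())

# Toy example: segment scores from an anomalous and a normal video.
anom = np.array([0.10, 0.90, 0.30])   # one segment looks clearly anomalous
norm = np.array([0.20, 0.10, 0.15])   # all segments look normal
print(mil_ranking_loss(anom, norm))   # 0.3 -> small loss, ranking nearly satisfied
```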
Senses Wide Shut: A Representation-Action Gap in Omnimodal LLMs
Omnimodal LLMs struggle to reject false textual claims contradicting sensory input, revealing a "Representation-Action Gap" in grounding.
KVServe: Service-Aware KV Cache Compression for Communication-Efficient Disaggregated LLM Serving
KVServe adaptively compresses the KV cache in disaggregated LLM serving, significantly boosting performance by reducing the communication needed to transfer cache state between serving stages.