Artificial Intelligence

Research on AI systems, knowledge representation, planning, and general intelligence.

cs.AI · 1428 papers

A Family of Quaternion-Valued Differential Evolution Algorithms for Numerical Function Optimization

This paper introduces Quaternion-Valued Differential Evolution (QDE) algorithms, showing improved convergence and performance for numerical optimization.

2605.12362 · May 12, 2026 · Gerardo Altamirano-Gomez, Álvaro Gallardo, Carlos Ignacio Hernández Castellanos

MedHopQA: A Disease-Centered Multi-Hop Reasoning Benchmark and Evaluation Framework for LLM-Based Biomedical Question Answering

MedHopQA is a new disease-centered multi-hop reasoning benchmark for evaluating LLMs in biomedical QA, designed to resist saturation and contamination.

2605.12361 · May 12, 2026 · Rezarta Islamaj, Robert Leaman, Joey Chan +13

EHR-RAGp: Retrieval-Augmented Prototype-Guided Foundation Model for Electronic Health Records

EHR-RAGp is a retrieval-augmented foundation model for EHRs, dynamically integrating relevant patient history via a prototype-guided module for better clinical predictions.

2605.12335 · May 12, 2026 · Saeed Shurrab, Mariam Al-Omari, Dana El Samad +1

Set-Aggregated Genome Embeddings for Microbiome Abundance Prediction

This paper uses Set-Aggregated Genome Embeddings (SAGE) with genomic language models to predict microbiome abundance from DNA, showing improved generalization.

2605.12286 · May 12, 2026 · Younhun Kim, Georg K. Gerber, Travis E. Gibson

Iterative Audit Convergence in LLM-Managed Multi-Agent Systems: A Case Study in Prompt Engineering Quality Assurance

LLM agents iteratively audited prompt specifications in a multi-agent system (AEGIS), surfacing 51 consistency defects and demonstrating audit convergence.

2605.12280 · May 12, 2026 · Elias Calboreanu

TMRL: Diffusion Timestep-Modulated Pretraining Enables Exploration for Efficient Policy Finetuning

TMRL introduces diffusion timestep-modulated pretraining to enable efficient exploration and finetuning of robot policies, improving sample efficiency.

2605.12236 · May 12, 2026 · Matthew M. Hong, Jesse Zhang, Anusha Nagabandi +1

No More, No Less: Task Alignment in Terminal Agents

A new benchmark, TAB, reveals terminal agents struggle with selectively following relevant instructions while ignoring distractors, highlighting a gap in task alignment.

2605.12233 · May 12, 2026 · Sina Mavali, David Pape, Jonathan Evertz +5

TriBand-BEV: Real-Time LiDAR-Only 3D Pedestrian Detection via Height-Aware BEV and High-Resolution Feature Fusion

TriBand-BEV introduces a real-time LiDAR-only 3D pedestrian detection method using a height-aware BEV encoding, outperforming prior methods on KITTI.

2605.12220 · May 12, 2026 · Mohammad Khoshkdahan, Alexey Vinel

Self-Supervised Laplace Approximation for Bayesian Uncertainty Quantification

This paper introduces the Self-Supervised Laplace Approximation (SSLA), which quantifies the predictive uncertainty of Bayesian models by refitting on self-predicted data and outperforms classical methods.

2605.12208 · May 12, 2026 · Julian Rodemann, Alexander Marquard, Thomas Augustin +1

Uncertainty Quantification for LLM-based Code Generation

RisCoSet quantifies uncertainty in LLM code generation by creating risk-controlled prediction sets, significantly reducing incorrect code generation.

2605.12201 · May 12, 2026 · Senrong Xu, Yuhao Tan, Yanke Zhou +6

Premover: Fast Vision-Language-Action Control by Acting Before Instructions Are Complete

Premover speeds up Vision-Language-Action policies by letting robots start acting before the user's instruction is complete, reducing idle time.

2605.12160 · May 12, 2026 · Joonha Park, Jiseung Jeong, Taesik Gong

CIDR: A Large-Scale Industrial Source Code Dataset for Software Engineering Research

CIDR is a new large-scale dataset of 2,440 proprietary industrial software repositories from 12 partners, designed for diverse software engineering research.

2605.12153 · May 12, 2026 · Vladislav Savenkov

Learning What Matters: Adaptive Information-Theoretic Objectives for Robot Exploration

QOED improves robot exploration by adaptively identifying and prioritizing observable parameter directions, suppressing nuisance effects for better learning.

2605.12084 · May 12, 2026 · Youwei Yu, Jionghao Wang, Zhengming Yu +2

Property-Level Reconstructability of Agent Decisions: An Anchor-Level Pilot Across Vendor SDK Adapter Regimes

This paper pilots a method to assess the reconstructability of AI agent decisions across various vendor SDK regimes, finding significant variability.

2605.12078 · May 12, 2026 · Oleg Solozobov

The Deepfakes We Missed: We Built Detectors for a Threat That Didn't Arrive

Deepfake detection research is misaligned: it focuses on manipulation of public figures, while the real threats are non-consensual intimate imagery (NCII), voice scams, and emotional fraud.

2605.12075 · May 12, 2026 · Shaina Raza

Scaling Laws and Tradeoffs in Recurrent Networks of Expressive Neurons

ELM Networks demonstrate optimal resource allocation in recurrent networks, favoring more complex neurons as scale increases, challenging simple-unit defaults.

2605.12049 · May 12, 2026 · Aaron Spieler, Georg Martius, Anna Levina

SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces

SkillSafetyBench evaluates how reusable skills in LLM agents create new attack surfaces, revealing vulnerabilities beyond model-level alignment.

2605.12015 · May 12, 2026 · Chang Jin, An Wang, Zeming Wei +7

Random-Set Graph Neural Networks

This paper introduces Random-Set Graph Neural Networks (RS-GNNs) to model node-level epistemic uncertainty using belief functions for improved predictions.

2605.11987 · May 12, 2026 · Tommy Woodley, Shireen Kudukkil Manchingal, Matteo Tolloso +2

Cooperative Robotics Reinforced by Collective Perception for Traffic Moderation

This paper introduces a cooperative humanoid robot that uses collective perception and V2X to moderate traffic and prevent collisions at non-line-of-sight intersections.

2605.11972 · May 12, 2026 · Mohammad Khoshkdahan, John Pravin Arockiasamy, Andy Flores Comeca +1

AccLock: Unlocking Identity with Heartbeat Using In-Ear Accelerometers

AccLock passively authenticates users via unique in-ear heartbeat signals captured by accelerometers, overcoming limitations of prior systems.

2605.11901 · May 12, 2026 · Lei Wang, Jiangxuan Shen, Xi Zhang +6

