Artificial Intelligence
Research on AI systems, knowledge representation, planning, and general intelligence.
cs.AI · 1428 papers
A Family of Quaternion-Valued Differential Evolution Algorithms for Numerical Function Optimization
This paper introduces Quaternion-Valued Differential Evolution (QDE) algorithms, showing improved convergence and performance for numerical optimization.
MedHopQA: A Disease-Centered Multi-Hop Reasoning Benchmark and Evaluation Framework for LLM-Based Biomedical Question Answering
MedHopQA is a new disease-centered multi-hop reasoning benchmark for evaluating LLMs in biomedical QA, designed to resist saturation and contamination.
EHR-RAGp: Retrieval-Augmented Prototype-Guided Foundation Model for Electronic Health Records
EHR-RAGp is a retrieval-augmented foundation model for EHRs, dynamically integrating relevant patient history via a prototype-guided module for better clinical predictions.
Set-Aggregated Genome Embeddings for Microbiome Abundance Prediction
This paper uses Set-Aggregated Genome Embeddings (SAGE) with genomic language models to predict microbiome abundance from DNA, showing improved generalization.
Iterative Audit Convergence in LLM-Managed Multi-Agent Systems: A Case Study in Prompt Engineering Quality Assurance
LLM agents iteratively audited prompt specifications in a multi-agent system (AEGIS), surfacing 51 consistency defects and demonstrating audit convergence.
TMRL: Diffusion Timestep-Modulated Pretraining Enables Exploration for Efficient Policy Finetuning
TMRL introduces diffusion timestep-modulated pretraining to enable efficient exploration and finetuning of robot policies, improving sample efficiency.
No More, No Less: Task Alignment in Terminal Agents
A new benchmark, TAB, reveals terminal agents struggle with selectively following relevant instructions while ignoring distractors, highlighting a gap in task alignment.
TriBand-BEV: Real-Time LiDAR-Only 3D Pedestrian Detection via Height-Aware BEV and High-Resolution Feature Fusion
TriBand-BEV introduces a real-time LiDAR-only 3D pedestrian detection method using a height-aware BEV encoding, outperforming prior methods on KITTI.
Self-Supervised Laplace Approximation for Bayesian Uncertainty Quantification
Introduces Self-Supervised Laplace Approximation (SSLA) to quantify Bayesian model predictive uncertainty by refitting on self-predicted data, outperforming classical methods.
Uncertainty Quantification for LLM-based Code Generation
RisCoSet quantifies uncertainty in LLM code generation by creating risk-controlled prediction sets, significantly reducing incorrect code generation.
Premover: Fast Vision-Language-Action Control by Acting Before Instructions Are Complete
Premover speeds up Vision-Language-Action policies by enabling robots to start acting before user instructions are fully complete, reducing idle time.
CIDR: A Large-Scale Industrial Source Code Dataset for Software Engineering Research
CIDR is a new large-scale dataset of 2,440 proprietary industrial software repositories from 12 partners, designed for diverse software engineering research.
Learning What Matters: Adaptive Information-Theoretic Objectives for Robot Exploration
QOED improves robot exploration by adaptively identifying and prioritizing observable parameter directions, suppressing nuisance effects for better learning.
Property-Level Reconstructability of Agent Decisions: An Anchor-Level Pilot Across Vendor SDK Adapter Regimes
This paper pilots a method to assess the reconstructability of AI agent decisions across various vendor SDK regimes, finding significant variability.
The Deepfakes We Missed: We Built Detectors for a Threat That Didn't Arrive
Deepfake detection research is misaligned: it focuses on public-figure manipulation, while the real threats are non-consensual intimate imagery (NCII), voice scams, and emotional fraud.
Scaling Laws and Tradeoffs in Recurrent Networks of Expressive Neurons
ELM Networks demonstrate optimal resource allocation in recurrent networks, favoring more complex neurons as scale increases, challenging simple-unit defaults.
SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces
SkillSafetyBench evaluates how reusable skills in LLM agents create new attack surfaces, revealing vulnerabilities beyond model-level alignment.
Random-Set Graph Neural Networks
This paper introduces Random-Set Graph Neural Networks (RS-GNNs) to model node-level epistemic uncertainty using belief functions for improved predictions.
Cooperative Robotics Reinforced by Collective Perception for Traffic Moderation
This paper introduces a cooperative humanoid robot that uses collective perception and V2X to moderate traffic and prevent collisions at non-line-of-sight intersections.
AccLock: Unlocking Identity with Heartbeat Using In-Ear Accelerometers
AccLock passively authenticates users via unique in-ear heartbeat signals captured by accelerometers, overcoming limitations of prior systems.