Artificial Intelligence
Research on AI systems, knowledge representation, planning, and general intelligence.
cs.AI · 1428 papers
Proteus: A Self-Evolving Red Team for Agent Skill Ecosystems
Proteus is a self-evolving red-team framework that uncovers adaptive leakage in LLM agent skills, showing current vetting underestimates risk.
IPI-proxy: An Intercepting Proxy for Red-Teaming Web-Browsing AI Agents Against Indirect Prompt Injection
IPI-proxy is an intercepting proxy for red-teaming web-browsing AI agents against indirect prompt injection by rewriting whitelisted HTTP responses.
Very Efficient Listwise Multimodal Reranking for Long Documents
ZipRerank is a highly efficient listwise multimodal reranker that significantly speeds up M-RAG for long documents by reducing input length and eliminating autoregressive decoding.
EvoNav: Evolutionary Reward Function Design for Robot Navigation with Large Language Models
EvoNav uses LLMs and an efficient three-stage evolutionary framework to automatically design superior reward functions for robot navigation.
Multi-Timescale Conductance Spiking Networks: A Sparse, Gradient-Trainable Framework with Rich Firing Dynamics for Enhanced Temporal Processing
Multi-timescale conductance SNNs offer rich dynamics, sparse activity, and direct gradient training, outperforming SOTA in temporal processing.
Behavioral Integrity Verification for AI Agent Skills
This paper introduces Behavioral Integrity Verification (BIV) to audit AI agent skills, finding widespread deviations and improving malicious skill detection.
A Research Agenda on Agents and Software Engineering: Outcomes from the Rio A2SE Seminar
This paper outlines a community-driven research agenda for agents and software engineering, covering six key thematic areas identified by experts.
Self-organized MT Direction Maps Emerge from Spatiotemporal Contrastive Optimization
A spatiotemporal TDANN model, trained with self-supervised learning, spontaneously develops brain-like direction maps resembling those of visual area MT.
Cochise: A Reference Harness for Autonomous Penetration Testing
Cochise is a minimal Python reference harness for LLM-driven autonomous penetration testing, providing reusable infrastructure for research and comparison.
Exact Stiefel Optimization for Probabilistic PLS: Closed-Form Updates, Error Bounds, and Calibrated Uncertainty
This paper introduces an end-to-end framework for Probabilistic PLS using exact Stiefel optimization, offering calibrated uncertainty and improved accuracy.
EpiCastBench: Datasets and Benchmarks for Multivariate Epidemic Forecasting
EpiCastBench introduces 40 diverse multivariate epidemic datasets and a standardized benchmark for evaluating forecasting models.
The Evaluation Differential: When Frontier AI Models Recognise They Are Being Tested
This paper introduces the Evaluation Differential, showing AI models behave differently when tested, challenging safety claims from current evaluations.
LPDP: Inference-Time Reward Control for Variable-Length DNA Generation with Edit Flows
LPDP enables training-free, inference-time reward control for variable-length DNA generation using biologically plausible edit flows.
Options, Not Clicks: Lattice Refinement for Consent-Driven MCP Authorization
Conleash is a client-side middleware that uses a risk lattice and policy engine to provide consent-driven, boundary-scoped authorization for MCP tool invocations.
Human-AI Productivity Paradoxes: Modeling the Interplay of Skill, Effort, and AI Assistance
A new model explains how increased AI assistance can paradoxically degrade productivity and polarize skills due to unreliability or skill development.
Much of Geospatial Web Search Is Beyond Traditional GIS
This paper reveals that geospatial web search is far more prevalent and practically oriented than previously understood, often exceeding traditional GIS capabilities.
Natural Language based Specification and Verification
This paper explores using LLMs to generate and verify code implementations based on natural language specifications, showing promising preliminary results.
Unlocking LLM Creativity in Science through Analogical Reasoning
Analogical Reasoning (AR) enables LLMs to generate significantly more diverse and novel solutions for scientific problems, mitigating mode collapse.
ELF: Embedded Language Flows
ELF proposes a continuous diffusion model for language, leveraging flow matching in embedding space to achieve superior generation quality with fewer steps.
Variational Inference for Lévy Process-Driven SDEs via Neural Tilting
This paper introduces a neural exponential tilting framework for variational inference in Lévy-driven SDEs, addressing challenges in modeling extreme events.