Artificial Intelligence
Research on AI systems, knowledge representation, planning, and general intelligence.
cs.AI · 1428 papers
Robust and Explainable Bicuspid Aortic Valve Diagnosis Using Stacked Ensembles on Echocardiography
An explainable AI model accurately distinguishes bicuspid aortic valve (BAV) from tricuspid aortic valve (TAV) using routine echocardiography.
Coordinating Multiple Conditions for Trajectory-Controlled Human Motion Generation
CMC is a decoupled framework that generates human motions from text and trajectories, resolving conflicts and improving control accuracy.
AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation
AnyFlow introduces an any-step video diffusion model using flow map distillation, outperforming consistency-based methods and scaling with sampling steps.
Children's English Reading Story Generation via Supervised Fine-Tuning of Compact LLMs with Controllable Difficulty and Safety
Fine-tuning compact 8B LLMs with expert curricula generates children's English stories with controllable difficulty and safety, outperforming larger models.
Identifying AI Web Scrapers Using Canary Tokens
This paper introduces a novel method using canary tokens to reliably identify which web scrapers are feeding data to specific large language models.
RTLC -- Research, Teach-to-Learn, Critique: A three-stage prompting paradigm inspired by the Feynman Learning Technique that lifts LLM-as-judge accuracy on JudgeBench with no fine-tuning
RTLC, a three-stage prompting paradigm inspired by the Feynman Learning Technique, significantly boosts LLM-as-judge accuracy on JudgeBench without fine-tuning.
Beyond Perplexity: A Geometric and Spectral Study of Low-Rank Pre-Training
Low-rank pre-training methods yield geometrically distinct solutions from full-rank models and each other, even with similar perplexity, requiring deeper evaluation metrics.
Causality-Aware End-to-End Autonomous Driving via Ego-Centric Joint Scene Modeling
CaAD is a causality-aware end-to-end autonomous driving framework that models ego-vehicle and agent interactions for reliable trajectory prediction.
AttenA+: Rectifying Action Inequality in Robotic Foundation Models
AttenA+ rectifies action inequality in robotic foundation models by prioritizing kinematically critical, low-velocity segments for improved manipulation.
RealICU: Do LLM Agents Understand Long-Context ICU Data? A Benchmark Beyond Behavior Imitation
RealICU is a new benchmark for evaluating LLM agents on long-context ICU data, revealing recall-safety tradeoffs and anchoring biases in existing models.
Locale-Conditioned Few-Shot Prompting Mitigates Demonstration Regurgitation in On-Device PII Substitution with Small Language Models
An on-device PII substitution pipeline uses locale-conditioned few-shot prompting to keep small language models from regurgitating demonstrations, though rule-based methods better support downstream NER.
CUBic: Coordinated Unified Bimanual Perception and Control Framework
CUBic is a novel framework for bimanual robot control that unifies perception and coordination, outperforming state-of-the-art visuomotor baselines.
AI Harness Engineering: A Runtime Substrate for Foundation-Model Software Agents
Proposes AI Harness Engineering, a runtime substrate, to make foundation-model software agents reliable by mediating their interaction with projects.
Inducing Overthink: Hierarchical Genetic Algorithm-based DoS Attack on Black-Box Large Language Reasoning Models
A hierarchical genetic algorithm can induce "overthink" in black-box LLMs, creating DoS attacks by significantly increasing response length and resource consumption.
The Readability Spectrum: Patterns, Issues, and Prompt Effects in LLM-Generated Code
LLMs generate code with readability comparable to human code but distinct issue patterns, with prompt design having limited impact.
Improving Code Translation with Syntax-Guided and Semantic-aware Preference Optimization
CTO enhances LLM code translation using syntax-guided and semantic-aware preference optimization, outperforming baselines.
Protocol-Driven Development: Governing Generated Software Through Invariants and Evidence
Protocol-Driven Development (PDD) governs generated software by using machine-enforceable protocols, invariants, and verifiable evidence chains.
AgentLens: Revealing The Lucky Pass Problem in SWE-Agent Evaluation
AgentLens reveals the 'Lucky Pass' problem in SWE-agent evaluation, introducing a process-level framework to assess trajectory quality beyond simple pass/fail.
AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward
AlphaGRPO enhances multimodal generation in UMMs using GRPO and a novel Decompositional Verifiable Reward for self-reflection and reasoning.
Learning, Fast and Slow: Towards LLMs That Adapt Continually
Fast-Slow Training enables LLMs to adapt continually with improved efficiency and less forgetting by combining fast context and slow parameter updates.