Artificial Intelligence
Research on AI systems, knowledge representation, planning, and general intelligence.
cs.AI · 1428 papers
Robust and Explainable Bicuspid Aortic Valve Diagnosis Using Stacked Ensembles on Echocardiography
An explainable AI model accurately distinguishes bicuspid aortic valve (BAV) from tricuspid aortic valve (TAV) using routine echocardiography.
Coordinating Multiple Conditions for Trajectory-Controlled Human Motion Generation
CMC is a decoupled framework that generates human motions from text and trajectories, resolving conflicts and improving control accuracy.
AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation
AnyFlow introduces an any-step video diffusion model using flow map distillation, outperforming consistency-based methods and scaling with sampling steps.
Children's English Reading Story Generation via Supervised Fine-Tuning of Compact LLMs with Controllable Difficulty and Safety
Fine-tuning compact 8B LLMs with expert curricula generates children's English stories with controllable difficulty and safety, outperforming larger models.
Identifying AI Web Scrapers Using Canary Tokens
This paper introduces a novel method using canary tokens to reliably identify which web scrapers are feeding data to specific large language models.
RTLC -- Research, Teach-to-Learn, Critique: A three-stage prompting paradigm inspired by the Feynman Learning Technique that lifts LLM-as-judge accuracy on JudgeBench with no fine-tuning
RTLC, a three-stage prompting paradigm inspired by the Feynman Learning Technique, significantly boosts LLM-as-judge accuracy on JudgeBench without fine-tuning.
Beyond Perplexity: A Geometric and Spectral Study of Low-Rank Pre-Training
Low-rank pre-training methods yield geometrically distinct solutions from full-rank models and each other, even with similar perplexity, requiring deeper evaluation metrics.
Causality-Aware End-to-End Autonomous Driving via Ego-Centric Joint Scene Modeling
CaAD is a causality-aware end-to-end autonomous driving framework that models ego-vehicle and agent interactions for reliable trajectory prediction.
AttenA+: Rectifying Action Inequality in Robotic Foundation Models
AttenA+ rectifies action inequality in robotic foundation models by prioritizing kinematically critical, low-velocity segments for improved manipulation.
RealICU: Do LLM Agents Understand Long-Context ICU Data? A Benchmark Beyond Behavior Imitation
RealICU is a new benchmark for evaluating LLM agents on long-context ICU data, revealing recall-safety tradeoffs and anchoring biases in existing models.
Locale-Conditioned Few-Shot Prompting Mitigates Demonstration Regurgitation in On-Device PII Substitution with Small Language Models
An on-device PII substitution pipeline uses locale-conditioned few-shot prompting to keep small language models from regurgitating demonstrations, though rule-based methods better support downstream NER.
CUBic: Coordinated Unified Bimanual Perception and Control Framework
CUBic is a novel framework for bimanual robot control that unifies perception and coordination, outperforming state-of-the-art visuomotor baselines.
AI Harness Engineering: A Runtime Substrate for Foundation-Model Software Agents
Proposes AI Harness Engineering, a runtime substrate, to make foundation-model software agents reliable by mediating their interaction with projects.
Inducing Overthink: Hierarchical Genetic Algorithm-based DoS Attack on Black-Box Large Language Reasoning Models
A hierarchical genetic algorithm can induce "overthink" in black-box LLMs, creating DoS attacks by significantly increasing response length and resource consumption.
The Readability Spectrum: Patterns, Issues, and Prompt Effects in LLM-Generated Code
LLMs generate code with readability comparable to human code but distinct issue patterns, with prompt design having limited impact.
Improving Code Translation with Syntax-Guided and Semantic-aware Preference Optimization
CTO enhances LLM code translation using syntax-guided and semantic-aware preference optimization, outperforming baselines.
Protocol-Driven Development: Governing Generated Software Through Invariants and Evidence
Protocol-Driven Development (PDD) governs generated software by using machine-enforceable protocols, invariants, and verifiable evidence chains.
AgentLens: Revealing The Lucky Pass Problem in SWE-Agent Evaluation
AgentLens reveals the 'Lucky Pass' problem in SWE-agent evaluation, introducing a process-level framework to assess trajectory quality beyond simple pass/fail.
AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward
AlphaGRPO enhances multimodal generation in UMMs using GRPO and a novel Decompositional Verifiable Reward for self-reflection and reasoning.
Learning, Fast and Slow: Towards LLMs That Adapt Continually
Fast-Slow Training enables LLMs to adapt continually with improved efficiency and less forgetting by combining fast context and slow parameter updates.