Qifeng Chen

6 papers · Latest: May 12, 2026

CausalCine: Real-Time Autoregressive Generation for Multi-Shot Video Narratives

CausalCine is a real-time autoregressive framework for generating multi-shot video narratives, enabling interactive, coherent storytelling across shot changes.

2605.12496 · May 12, 2026

Computer Vision

MedHorizon: Towards Long-context Medical Video Understanding in the Wild

MedHorizon introduces a new benchmark for long-context medical video understanding, revealing that current MLLMs struggle with sparse evidence retrieval and clinical reasoning.

2605.06537 · May 7, 2026

Artificial Intelligence

Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond

This paper introduces a "levels x laws" taxonomy for agentic world models, synthesizing over 400 works and outlining a roadmap for future development.

2604.22748 · Apr 24, 2026

Computer Vision

Divide-then-Diagnose: Weaving Clinician-Inspired Contexts for Ultra-Long Capsule Endoscopy Videos

This paper introduces a new task, dataset (VideoCAP), and framework (DiCE) for diagnosis-driven summarization of ultra-long capsule endoscopy videos.

2604.21814 · Apr 23, 2026

Computer Vision

AnimationBench: Are Video Models Good at Character-Centric Animation?

AnimationBench is a new benchmark for evaluating image-to-video models' ability to generate character-centric animation, addressing limitations of realism-focused tools.

2604.15299 · Apr 16, 2026

Robotics

Switch: Learning Agile Skills Switching for Humanoid Robots

Switch is a hierarchical multi-skill system enabling humanoid robots to seamlessly transition between diverse locomotion skills using a skill graph and online scheduler.

2604.14834 · Apr 16, 2026
