Kun Wang

4 papers · Latest: May 5, 2026

RoboAlign-R1: Distilled Multimodal Reward Alignment for Robot Video World Models

RoboAlign-R1 improves robot video world models by using reward-aligned post-training and stabilized long-horizon inference, boosting task consistency and realism.

2605.03821 · May 5, 2026

Cryptography & Security

ProjLens: Unveiling the Role of Projectors in Multimodal Model Safety

ProjLens demystifies MLLM backdoors, revealing how projector fine-tuning introduces vulnerabilities and detailing their low-rank structure and activation mechanism.

2604.19083 · Apr 21, 2026

Computer Vision

NTIRE 2026 Challenge on Video Saliency Prediction: Methods and Results

NTIRE 2026 challenge overview: methods and results for video saliency prediction using a new 2,000-video dataset and crowdsourced fixations.

2604.14816 · Apr 16, 2026

Cryptography & Security

Hijacking Large Audio-Language Models via Context-Agnostic and Imperceptible Auditory Prompt Injection

AudioHijack injects imperceptible audio prompts to hijack large audio-language models, forcing unauthorized actions and exposing critical vulnerabilities.

2604.14604 · Apr 16, 2026
