Kun Wang
4 papers · Latest:
RoboAlign-R1: Distilled Multimodal Reward Alignment for Robot Video World Models
RoboAlign-R1 improves robot video world models by using reward-aligned post-training and stabilized long-horizon inference, boosting task consistency and realism.
ProjLens: Unveiling the Role of Projectors in Multimodal Model Safety
ProjLens demystifies backdoors in multimodal large language models (MLLMs). It shows how fine-tuning the projector introduces these vulnerabilities and characterizes the backdoors' low-rank structure and activation mechanism.
NTIRE 2026 Challenge on Video Saliency Prediction: Methods and Results
An overview of the NTIRE 2026 challenge on video saliency prediction, presenting the submitted methods and results on a new dataset of 2,000 videos with crowdsourced fixations.
Hijacking Large Audio-Language Models via Context-Agnostic and Imperceptible Auditory Prompt Injection
AudioHijack injects imperceptible audio prompts to hijack large audio-language models, forcing unauthorized actions and exposing critical vulnerabilities.