Computer Vision

Papers on image recognition, object detection, video analysis, and visual understanding.

cs.CV · 701 papers

R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow

R-DMesh solves pose misalignment in video-guided 3D animation using a novel VAE and rectification offset for high-fidelity 4D mesh generation.

2605.13838 · May 13, 2026 · Zijie Wu, Lixin Xu, Puhua Jiang +3

Unlocking Patch-Level Features for CLIP-Based Class-Incremental Learning

This paper introduces SPA, a novel method that unlocks and aligns CLIP's patch-level features with semantic descriptions for state-of-the-art class-incremental learning.

2605.13835 · May 13, 2026 · Hao Sun, Zi-Jun Ding, Da-Wei Zhou

QLAM: A Quantum Long-Attention Memory Approach to Long-Sequence Token Modeling

QLAM introduces a quantum long-attention memory, extending state-space models to efficiently capture long-range dependencies using quantum superposition.

2605.13833 · May 13, 2026 · Hoang-Quan Nguyen, Sankalp Pandey, Khoa Luu

Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context

This paper introduces MMProLong, a new recipe for training long-context vision-language models effectively, generalizing beyond 128K context.

2605.13831 · May 13, 2026 · Zhaowei Wang, Lishu Luo, Haodong Duan +9

History Anchors: How Prior Behavior Steers LLM Decisions Toward Unsafe Actions

LLMs, especially flagship models, are highly susceptible to continuing and escalating harmful actions when instructed to maintain consistency with prior unsafe history.

2605.13825 · May 13, 2026 · Alberto G. Rodríguez Salgado

OmniLiDAR: A Unified Diffusion Framework for Multi-Domain 3D LiDAR Generation

OmniLiDAR is a unified diffusion framework that generates 3D LiDAR scans across eight diverse domains using text conditioning, addressing single-domain limitations.

2605.13815 · May 13, 2026 · Youquan Liu, Weidong Yang, Ao Liang +9

JANUS: Anatomy-Conditioned Gating for Robust CT Triage Under Distribution Shift

JANUS introduces a physiology-guided dual-stream architecture for robust CT triage, improving accuracy and reliability under distribution shifts.

2605.13813 · May 13, 2026 · Lavsen Dahal, Yubraj Bhandari, Geoffrey Rubin +1

EvoGround: Self-Evolving Video Agents for Video Temporal Grounding

EvoGround introduces self-evolving agents for video temporal grounding, achieving state-of-the-art results without human-labeled data.

2605.13803 · May 13, 2026 · Minjoon Jung, Byoung-Tak Zhang, Lorenzo Torresani

VoxCor: Training-Free Volumetric Features for Multimodal Voxel Correspondence

VoxCor provides training-free volumetric features from frozen 2D ViTs for robust multimodal 3D medical image voxel correspondence.

2605.13798 · May 13, 2026 · Guney Tombak, Ertunc Erdil, Ender Konukoglu

BlitzGS: City-Scale Gaussian Splatting at Lightning Speed

BlitzGS is a distributed 3DGS framework for lightning-fast city-scale reconstruction, optimizing Gaussian workload across system, model, and view levels.

2605.13794 · May 13, 2026 · Zhongtao Wang, Huishan Au, Yilong Li +6

Realtime-VLA FLASH: Speculative Inference Framework for Diffusion-based VLAs

Realtime-VLA FLASH uses speculative inference with a lightweight draft model to significantly reduce latency in diffusion-based VLAs for real-time embodied tasks.

2605.13778 · May 13, 2026 · Jiahui Niu, Kefan Gu, Yucheng Zhao +5

RoboEvolve: Co-Evolving Planner-Simulator for Robotic Manipulation with Limited Data

RoboEvolve co-evolves a VLM planner and VGM simulator to overcome data scarcity in robotic manipulation, achieving high efficiency with limited unlabeled data.

2605.13775 · May 13, 2026 · Harold Haodong Chen, Sirui Chen, Yingjie Xu +2

Generative Texture Diversification of 3D Pedestrians for Robust Autonomous Driving Perception

This paper generates diverse 3D pedestrian textures with StyleGAN2 for synthetic training data, aiming to make autonomous driving perception more robust.

2605.13755 · May 13, 2026 · Arka Bhowmick, Enes Ozeren, Ahmed Abdullah +1

Weakly-Supervised Spatiotemporal Anomaly Detection

This paper introduces a weakly-supervised spatiotemporal anomaly detection method that uses video-level labels and multiple instance ranking loss.

2605.13746 · May 13, 2026 · Urvi Gianchandani, Praveen Tirupattur, Mubarak Shah

Aligning Network Equivariance with Data Symmetry: A Theoretical Framework and Adaptive Approach for Image Restoration

This paper introduces a theoretical framework and adaptive network for image restoration, aligning network equivariance with data symmetry to improve performance.

2605.13744 · May 13, 2026 · Feiyu Tan, Qi Xie, Zongben Xu +1

LEXI-SG: Monocular 3D Scene Graph Mapping with Room-Guided Feed-Forward Reconstruction

LEXI-SG is the first dense monocular visual mapping system for open-vocabulary 3D scene graphs using only RGB camera input, enabling scalable reconstruction.

2605.13741 · May 13, 2026 · Christina Kassab, Hyeonjae Gil, Matías Mattamala +2

Coordinating Multiple Conditions for Trajectory-Controlled Human Motion Generation

CMC is a decoupled framework that generates human motions from text and trajectories, resolving conflicts and improving control accuracy.

2605.13729 · May 13, 2026 · Deli Cai, Haoyang Ma, Changxing Ding

AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation

AnyFlow introduces an any-step video diffusion model trained with on-policy flow map distillation, outperforming consistency-based methods and improving as the number of sampling steps increases.

2605.13724 · May 13, 2026 · Yuchao Gu, Guian Fang, Yuxin Jiang +4

Guide, Think, Act: Interactive Embodied Reasoning in Vision-Language-Action Models

GTA-VLA is an interactive Vision-Language-Action framework that uses user-provided spatial guidance to improve robot reasoning and robustness in embodied tasks.

2605.13632 · May 13, 2026 · Yiran Ling, Qing Lian, Jinghang Li +6

LoREnc: Low-Rank Encryption for Securing Foundation Models and LoRA Adapters

LoREnc is a training-free framework that secures foundation models and LoRA adapters against IP leakage and model recovery attacks with minimal overhead.

2605.13163 · May 13, 2026 · Beomjin Ahn, Jungmin Kwon, Chanyong Jung +1
