Haowen Sun

2 papers · Latest: May 8, 2026

Proxy3D: Efficient 3D Representations for Vision-Language Models via Semantic Clustering and Alignment

Proxy3D introduces efficient 3D representations for Vision-Language Models by using semantic-aware clustering of scene features from video frames.

2605.08064 · May 8, 2026

Robotics

AffordSim: A Scalable Data Generator and Benchmark for Affordance-Aware Robotic Manipulation

AffordSim is a simulation framework that generates affordance-aware robotic manipulation data, enabling more realistic and challenging tasks for learning.

2604.11674 · Apr 13, 2026
