Haowen Sun
2 papers · Latest:
Computer Vision
Proxy3D: Efficient 3D Representations for Vision-Language Models via Semantic Clustering and Alignment
Proxy3D introduces efficient 3D representations for Vision-Language Models by using semantic-aware clustering of scene features from video frames.
2605.08064
Robotics
AffordSim: A Scalable Data Generator and Benchmark for Affordance-Aware Robotic Manipulation
AffordSim is a novel simulation framework that generates affordance-aware robotic manipulation data, enabling more realistic and challenging task generation for learning.
2604.11674