Hao Li
5 papers · Latest:
Fill the GAP: A Granular Alignment Paradigm for Visual Reasoning in Multimodal Large Language Models
GAP proposes a granular alignment paradigm to stabilize visual latent reasoning in MLLMs by addressing feature-space mismatches, improving performance.
A Universal Dance of Galactic Disks: Ubiquitous Precession and Its Implications
Galactic disk precession is ubiquitous, driven by tidal torques, and significantly impacts galaxy evolution, including warps and satellite alignment.
PhysInOne: Visual Physics Learning and Reasoning in One Suite
PhysInOne is a new large-scale dataset with 2 million videos and detailed annotations for training AI in physics-grounded visual reasoning.
Do AI Coding Agents Log Like Humans? An Empirical Study
AI coding agents log differently than humans, often less, and struggle to follow explicit logging instructions, requiring human intervention.
ViVa: A Video-Generative Value Model for Robot Reinforcement Learning
ViVa is a video-generative value model that improves robot reinforcement learning by using a pretrained video generator to estimate future dynamics and task value.