Yu Qiao

3 papers · Latest: May 7, 2026

MARBLE: Multi-Aspect Reward Balance for Diffusion RL

MARBLE introduces a gradient-space optimization framework that balances multiple rewards in diffusion RL, improving all reward dimensions simultaneously without manual weighting.

2605.06507 · May 7, 2026

Computer Vision

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

InternVL is a 6-billion-parameter vision-language foundation model that aligns large-scale vision models with LLMs to achieve state-of-the-art results across diverse visual-linguistic tasks.

2312.14238 · Dec 21, 2023

Machine Learning

Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization

MODPO is a novel, RL-free method for aligning language models to multiple human preferences simultaneously, achieving stable and efficient optimization across diverse objectives.

2310.03708 · Oct 5, 2023
