Yu Qiao
3 papers · Latest:
Computer Vision
MARBLE: Multi-Aspect Reward Balance for Diffusion RL
MARBLE introduces a gradient-space optimization framework to balance multiple rewards for diffusion RL, improving all dimensions simultaneously without manual weighting.
2605.06507
Computer Vision
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
InternVL is a 6-billion parameter vision-language foundation model that aligns large-scale vision models with LLMs to achieve state-of-the-art results across diverse visual-linguistic tasks.
2312.14238
Machine Learning
Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization
MODPO is a novel, RL-free method for aligning language models to multiple human preferences simultaneously, achieving stable and efficient optimization across diverse objectives.
2310.03708