SSL-R1: Self-Supervised Visual Reinforcement Post-Training for Multimodal Large Language Models
Jiahao Xie, Alessio Tonioni, Nathalie Rauschmayr, Federico Tombari, Bernt Schiele
TLDR
SSL-R1 introduces a self-supervised visual reinforcement learning framework for MLLMs, deriving verifiable rewards directly from images.
Key contributions
- Presents SSL-R1, a self-supervised RL framework that generates verifiable rewards directly from images.
- Reformulates widely-used self-supervised learning tasks into visual puzzles for RL post-training.
- Eliminates the need for human or external model supervision for reward design.
- Substantially improves MLLM performance on multimodal understanding and reasoning benchmarks.
Why it matters
Current reinforcement learning for MLLMs relies on language-centric priors and costly manual annotations, which hinder scalability and limit intrinsic visual understanding. SSL-R1 addresses this with a self-supervised, vision-centric RL framework that derives verifiable rewards directly from images. This enables more scalable and effective MLLM post-training, improving multimodal reasoning without human supervision.
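To make the idea concrete, here is a minimal sketch of how a verifiable reward can be derived directly from an image via a classic self-supervised pretext task. The paper reformulates "widely-used SSL tasks" into verifiable puzzles; which exact tasks it uses is not specified here, so rotation prediction is chosen purely as an illustrative example, and the function names are hypothetical:

```python
import random
import numpy as np

def make_rotation_puzzle(image: np.ndarray):
    """Build a self-supervised puzzle: rotate the image by a random
    multiple of 90 degrees. The rotation index is the label, derived
    from the image itself -- no human annotation required."""
    k = random.randrange(4)  # ground-truth rotation index in {0, 1, 2, 3}
    return np.rot90(image, k), k

def verifiable_reward(predicted_k: int, true_k: int) -> float:
    """Binary verifiable reward for RL post-training: 1.0 if the model's
    predicted rotation matches the self-generated label, else 0.0."""
    return 1.0 if predicted_k == true_k else 0.0

# Example with a dummy 4x4 "image": the puzzle and its label come for free.
img = np.arange(16).reshape(4, 4)
rotated, label = make_rotation_puzzle(img)
print(verifiable_reward(label, label))  # a correct prediction earns 1.0
```

In an RLVR loop, the MLLM would receive the rotated image, reason about its content, and output a predicted rotation; the binary reward then drives policy optimization without any external supervision.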
Original Abstract
Reinforcement learning (RL) with verifiable rewards (RLVR) has demonstrated the great potential of enhancing the reasoning abilities in multimodal large language models (MLLMs). However, the reliance on language-centric priors and expensive manual annotations prevents MLLMs' intrinsic visual understanding and scalable reward designs. In this work, we introduce SSL-R1, a generic self-supervised RL framework that derives verifiable rewards directly from images. To this end, we revisit self-supervised learning (SSL) in visual domains and reformulate widely-used SSL tasks into a set of verifiable visual puzzles for RL post-training, requiring neither human nor external model supervision. Training MLLMs on these tasks substantially improves their performance on multimodal understanding and reasoning benchmarks, highlighting the potential of leveraging vision-centric self-supervised tasks for MLLM post-training. We think this work will provide useful experience in devising effective self-supervised verifiable rewards to enable RL at scale. Project page: https://github.com/Jiahao000/SSL-R1.