Saadia Gabriel
2 papers · Latest:
Machine Learning
When Can LLMs Learn to Reason with Weak Supervision?
LLMs generalize under weak supervision when reward saturation is slow and reasoning is faithful, with SFT on traces being crucial.
2604.18574
Artificial Intelligence
SUPERNOVA: Eliciting General Reasoning in LLMs with Reinforcement Learning on Natural Instructions
SUPERNOVA is a data curation framework that uses RL with natural instructions to significantly improve LLM general reasoning by adapting existing instruction-tuning datasets.
2604.08477