Hamid Palangi
2 papers · Latest:
Machine Learning
When Can LLMs Learn to Reason with Weak Supervision?
LLMs generalize under weak supervision when reward saturation is slow and reasoning is faithful, with SFT on traces being crucial.
2604.18574
Natural Language Processing
Orca: Progressive Learning from Complex Explanation Traces of GPT-4
Orca is a 13B parameter model that improves small model reasoning by progressively learning from GPT-4's complex explanation traces and step-by-step thought processes, achieving state-of-the-art zero-shot performance on challenging benchmarks.
2306.02707