Colin Raffel
3 papers · Latest:
How Can We Synthesize High-Quality Pretraining Data? A Systematic Study of Prompt Design, Generator Model, and Source Data
A systematic study of synthetic pretraining data for LLMs finds that structured prompt formats and the choice of source data matter most, while large generator models do not, leading to the efficient FinePhrase dataset.
Crosslingual Generalization through Multitask Finetuning
This paper demonstrates that multitask finetuning of large multilingual language models on English and machine-translated prompts enables strong zero-shot crosslingual generalization to many languages, including those unseen during training.
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
This paper introduces a unified text-to-text framework for transfer learning in NLP, achieving state-of-the-art results across diverse language tasks by systematically exploring pre-training and fine-tuning strategies.