Teven Le Scao
2 papers
Machine Learning
Mixtral of Experts
Mixtral 8x7B is a Sparse Mixture of Experts language model that achieves performance on par with or exceeding much larger models like Llama 2 70B and GPT-3.5 by dynamically routing tokens through a subset of experts.
2401.04088
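The summary above describes sparse routing, where each token is sent to only a few experts rather than the full set (per the paper, 2 of 8 for Mixtral). A minimal NumPy sketch of top-k gating, with illustrative names and toy dimensions that are assumptions, not Mixtral's actual architecture:

```python
import numpy as np

def moe_forward(x, router_w, experts, k=2):
    """Sparse MoE layer sketch: route token x to its top-k experts.

    x: (d,) token representation; router_w: (n_experts, d) gating weights;
    experts: list of callables, each mapping (d,) -> (d,).
    """
    logits = router_w @ x                 # one gating score per expert
    top = np.argsort(logits)[-k:]         # indices of the k highest-scoring experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                  # softmax over the selected experts only
    # Weighted sum of the chosen experts' outputs; unselected experts are never run.
    return sum(g * experts[i](x) for g, i in zip(gates, top))

# Toy usage: 4 experts, each a random linear map on 8-dim tokens.
rng = np.random.default_rng(0)
experts = [lambda x, W=rng.standard_normal((8, 8)): W @ x for _ in range(4)]
router_w = rng.standard_normal((4, 8))
y = moe_forward(rng.standard_normal(8), router_w, experts)
print(y.shape)
```

Because only k experts execute per token, compute scales with k rather than the total expert count, which is why such a model can match much denser models at a fraction of the inference cost.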
Natural Language Processing
Crosslingual Generalization through Multitask Finetuning
This paper demonstrates that multitask finetuning of large multilingual language models on English and machine-translated prompts enables strong zero-shot crosslingual generalization to many languages, including those unseen during training.
2211.01786