Luke Zettlemoyer
7 papers · Latest:
Fast Byte Latent Transformer
The Fast Byte Latent Transformer (BLT) introduces novel training and generation techniques to significantly speed up byte-level language models.
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation
Tuna-2 is a unified multimodal model that uses pixel embeddings for both understanding and generation, outperforming vision-encoder-based approaches while simplifying the architecture.
Micro Language Models Enable Instant Responses
Micro LMs (8M–30M parameters) enable instant, contextually grounded responses on edge devices by initiating replies that cloud models then complete.
LIMA: Less Is More for Alignment
LIMA shows that fine-tuning a large language model on just 1,000 curated examples can achieve performance comparable to state-of-the-art models, highlighting the dominant role of pretraining over extensive instruction tuning.
Toolformer: Language Models Can Teach Themselves to Use Tools
Toolformer enables language models to autonomously learn to use external tools via APIs, significantly enhancing their performance on diverse tasks without extra supervision.
OPT: Open Pre-trained Transformer Language Models
OPT is a suite of openly released large-scale transformer language models comparable to GPT-3 but developed with significantly lower environmental impact.
RoBERTa: A Robustly Optimized BERT Pretraining Approach
RoBERTa revisits BERT pretraining with optimized hyperparameters and more data, achieving state-of-the-art NLP performance and revealing that BERT was originally undertrained.