Neil Houlsby
4 papers · Latest:
Gemini: A Family of Highly Capable Multimodal Models
Gemini is a new family of multimodal AI models excelling in image, audio, video, and text understanding, achieving state-of-the-art results across numerous benchmarks including human-expert level on MMLU.
Transcending Scaling Laws with 0.1% Extra Compute
UL2R fine-tuning significantly improves large language model performance and scaling efficiency with only 0.1% extra compute, enabling substantial computational savings and emergent abilities.
Scaling Vision Transformers
This paper studies how Vision Transformers scale with model size and data, improving their architecture and training to achieve state-of-the-art ImageNet accuracy with a 2-billion parameter model.
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
This paper demonstrates that a pure Transformer model applied directly to image patches can achieve state-of-the-art image classification performance without relying on convolutional networks.