Neil Houlsby
4 papers · Latest:
Gemini: A Family of Highly Capable Multimodal Models
Gemini is a new family of multimodal AI models excelling in image, audio, video, and text understanding, achieving state-of-the-art results across numerous benchmarks including human-expert level on MMLU.
Transcending Scaling Laws with 0.1% Extra Compute
UL2R fine-tuning significantly improves large language model performance and scaling efficiency with only 0.1% extra compute, enabling substantial computational savings and emergent abilities.
Scaling Vision Transformers
This paper studies how Vision Transformers scale with model size and data, improving their architecture and training to achieve state-of-the-art ImageNet accuracy with a 2-billion parameter model.
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
This paper demonstrates that a pure Transformer model applied directly to image patches can achieve state-of-the-art image classification performance without relying on convolutional networks.