Tri Dao
5 papers · Latest:
Search Your Block Floating Point Scales!
ScaleSearch improves Block Floating Point quantization by searching for the per-block scales that minimize quantization error, significantly improving the performance of quantized generative models.
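For illustration, here is a minimal NumPy sketch of the general idea: quantize each block of values with a shared scale, then pick the scale that minimizes quantization error over a small candidate grid. The function names, bit width, and candidate grid below are assumptions made for this sketch, not the paper's actual algorithm.

```python
# Illustrative sketch of block floating point (BFP) quantization with a
# brute-force scale search; names, bit width, and the candidate grid are
# assumptions, not the paper's method.
import numpy as np

def bfp_quantize(block: np.ndarray, scale: float, bits: int = 4) -> np.ndarray:
    """Round a block to signed integers sharing one scale, then dequantize."""
    qmax = 2 ** (bits - 1) - 1
    q = np.clip(np.round(block / scale), -qmax - 1, qmax)
    return q * scale

def search_scale(block: np.ndarray, bits: int = 4, n_candidates: int = 64) -> float:
    """Pick the candidate scale minimizing mean squared quantization error."""
    qmax = 2 ** (bits - 1) - 1
    base = np.abs(block).max() / qmax                  # max-abs baseline scale
    candidates = base * np.linspace(0.5, 1.0, n_candidates)  # assumed grid
    errors = [np.mean((block - bfp_quantize(block, s, bits)) ** 2)
              for s in candidates]
    return candidates[int(np.argmin(errors))]

block = np.random.randn(32).astype(np.float32)
s = search_scale(block)
print("chosen scale:", s,
      "mse:", np.mean((block - bfp_quantize(block, s)) ** 2))
```

Scales smaller than the max-abs baseline clip the largest values but round the rest more finely; the search simply trades clipping error against rounding error per block.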
StarCoder 2 and The Stack v2: The Next Generation
StarCoder2 is a next-generation open-source Code LLM trained on a vastly expanded and diverse dataset, achieving state-of-the-art performance on multiple code benchmarks while being more parameter-efficient than larger models.
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Mamba is a novel linear-time sequence model using selective state space parameters that enables efficient, content-based reasoning and outperforms Transformers on long sequences across multiple modalities.
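As a rough illustration of the "selective" idea, the sketch below runs a diagonal state space recurrence whose step size, B, and C are simple functions of the current input, so the state update depends on content. The shapes, projections, and discretization are simplified assumptions for illustration, not Mamba's exact parameterization.

```python
# Minimal sketch of a selective state space recurrence: the step size,
# B, and C depend on the input x_t, letting the model gate what enters
# its state. Simplified assumptions, not Mamba's exact formulation.
import numpy as np

rng = np.random.default_rng(0)
N, T = 8, 16                                  # state size, sequence length
A = -np.exp(rng.standard_normal(N))           # diagonal, negative for stability
x = rng.standard_normal(T)                    # one input channel

# Input-dependent ("selective") parameters as simple linear maps of x_t.
w_delta, w_B, w_C = rng.standard_normal(3) * 0.5

h = np.zeros(N)
ys = []
for t in range(T):
    delta = np.log1p(np.exp(w_delta * x[t]))  # softplus keeps step size positive
    B = w_B * x[t] * np.ones(N)
    C = w_C * x[t] * np.ones(N)
    A_bar = np.exp(delta * A)                 # zero-order-hold discretization of A
    h = A_bar * h + delta * B * x[t]          # simple Euler-style input term
    ys.append(C @ h)
print(np.round(ys, 3))
```

Because the recurrence is a per-step scan rather than an attention over all pairs, the cost grows linearly in sequence length.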
StarCoder: may the source be with you!
StarCoder is a 15.5B parameter open-source code generation model trained on a trillion tokens that outperforms existing open Code LLMs across multiple languages and offers advanced safety and usability features.
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
FlashAttention is an IO-aware exact attention algorithm that significantly speeds up Transformer training and enables longer context lengths by optimizing GPU memory access patterns.
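The core trick can be illustrated with an online-softmax tiling sketch: keys and values are processed in blocks while running max and normalizer statistics are maintained, so the full attention matrix is never materialized. This NumPy version is illustrative only; block size and shapes are assumptions, and it omits the GPU kernel details that make FlashAttention fast.

```python
# Illustrative tiling sketch behind IO-aware attention: process K/V in
# blocks with running softmax statistics, never forming the T-by-T matrix.
import numpy as np

def tiled_attention(Q, K, V, block: int = 16):
    T, d = Q.shape
    out = np.zeros_like(V)
    m = np.full(T, -np.inf)          # running row-wise max of the logits
    l = np.zeros(T)                  # running softmax normalizer
    for start in range(0, T, block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        S = Q @ Kb.T / np.sqrt(d)               # logits for this key block
        m_new = np.maximum(m, S.max(axis=1))
        scale = np.exp(m - m_new)               # rescale old accumulators
        P = np.exp(S - m_new[:, None])
        l = l * scale + P.sum(axis=1)
        out = out * scale[:, None] + P @ Vb
        m = m_new
    return out / l[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((64, 32)) for _ in range(3))
logits = Q @ K.T / np.sqrt(32)
ref = np.exp(logits - logits.max(axis=1, keepdims=True))
ref = (ref / ref.sum(axis=1, keepdims=True)) @ V
print(np.allclose(tiled_attention(Q, K, V), ref))  # exact, not approximate
```

The rescaling step is what makes the result exact: whenever a new block raises the running max, previously accumulated sums are multiplied back into a consistent scale.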