Tri Dao
5 papers · Latest:
Search Your Block Floating Point Scales!
ScaleSearch improves Block Floating Point quantization by searching for the per-block scales that minimize quantization error, significantly improving the performance of quantized generative models.
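For illustration, here is a minimal NumPy sketch of the general idea: quantize each block of values with a shared scale, then pick the scale that minimizes quantization error over a small candidate grid. The function names, bit width, and candidate grid below are assumptions made for this sketch, not the paper's actual algorithm.

```python
# Illustrative sketch of block floating point (BFP) quantization with a
# brute-force scale search; names, bit width, and the candidate grid are
# assumptions, not the paper's method.
import numpy as np

def bfp_quantize(block: np.ndarray, scale: float, bits: int = 4) -> np.ndarray:
    """Round a block to signed integers sharing one scale, then dequantize."""
    qmax = 2 ** (bits - 1) - 1
    q = np.clip(np.round(block / scale), -qmax - 1, qmax)
    return q * scale

def search_scale(block: np.ndarray, bits: int = 4, n_candidates: int = 64) -> float:
    """Pick the candidate scale minimizing mean squared quantization error."""
    qmax = 2 ** (bits - 1) - 1
    base = np.abs(block).max() / qmax                  # max-abs baseline scale
    candidates = base * np.linspace(0.5, 1.0, n_candidates)  # assumed grid
    errors = [np.mean((block - bfp_quantize(block, s, bits)) ** 2)
              for s in candidates]
    return candidates[int(np.argmin(errors))]

block = np.random.randn(32).astype(np.float32)
s = search_scale(block)
print("chosen scale:", s,
      "mse:", np.mean((block - bfp_quantize(block, s)) ** 2))
```

Scales smaller than the max-abs baseline clip the largest values but round the rest more finely; the search simply trades clipping error against rounding error per block.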
StarCoder 2 and The Stack v2: The Next Generation
StarCoder2 is a next-generation open-source Code LLM trained on a vastly expanded and diverse dataset, achieving state-of-the-art performance on multiple code benchmarks while being more parameter-efficient than larger models.
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Mamba is a novel linear-time sequence model using selective state space parameters that enables efficient, content-based reasoning and outperforms Transformers on long sequences across multiple modalities.
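As a rough illustration of the "selective" idea, the sketch below runs a diagonal state space recurrence whose step size, B, and C are simple functions of the current input, so the state update depends on content. The shapes, projections, and discretization are simplified assumptions for illustration, not Mamba's exact parameterization.

```python
# Minimal sketch of a selective state space recurrence: the step size,
# B, and C depend on the input x_t, letting the model gate what enters
# its state. Simplified assumptions, not Mamba's exact formulation.
import numpy as np

rng = np.random.default_rng(0)
N, T = 8, 16                                  # state size, sequence length
A = -np.exp(rng.standard_normal(N))           # diagonal, negative for stability
x = rng.standard_normal(T)                    # one input channel

# Input-dependent ("selective") parameters as simple linear maps of x_t.
w_delta, w_B, w_C = rng.standard_normal(3) * 0.5

h = np.zeros(N)
ys = []
for t in range(T):
    delta = np.log1p(np.exp(w_delta * x[t]))  # softplus keeps step size positive
    B = w_B * x[t] * np.ones(N)
    C = w_C * x[t] * np.ones(N)
    A_bar = np.exp(delta * A)                 # zero-order-hold discretization of A
    h = A_bar * h + delta * B * x[t]          # simple Euler-style input term
    ys.append(C @ h)
print(np.round(ys, 3))
```

Because the recurrence is a per-step scan rather than an attention over all pairs, the cost grows linearly in sequence length.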
StarCoder: may the source be with you!
StarCoder is a 15.5B parameter open-source code generation model trained on a trillion tokens that outperforms existing open Code LLMs across multiple languages and offers advanced safety and usability features.
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
FlashAttention is an IO-aware exact attention algorithm that significantly speeds up Transformer training and enables longer context lengths by optimizing GPU memory access patterns.
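The core trick can be illustrated with an online-softmax tiling sketch: keys and values are processed in blocks while running max and normalizer statistics are maintained, so the full attention matrix is never materialized. This NumPy version is illustrative only; block size and shapes are assumptions, and it omits the GPU kernel details that make FlashAttention fast.

```python
# Illustrative tiling sketch behind IO-aware attention: process K/V in
# blocks with running softmax statistics, never forming the T-by-T matrix.
import numpy as np

def tiled_attention(Q, K, V, block: int = 16):
    T, d = Q.shape
    out = np.zeros_like(V)
    m = np.full(T, -np.inf)          # running row-wise max of the logits
    l = np.zeros(T)                  # running softmax normalizer
    for start in range(0, T, block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        S = Q @ Kb.T / np.sqrt(d)               # logits for this key block
        m_new = np.maximum(m, S.max(axis=1))
        scale = np.exp(m - m_new)               # rescale old accumulators
        P = np.exp(S - m_new[:, None])
        l = l * scale + P.sum(axis=1)
        out = out * scale[:, None] + P @ Vb
        m = m_new
    return out / l[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((64, 32)) for _ in range(3))
logits = Q @ K.T / np.sqrt(32)
ref = np.exp(logits - logits.max(axis=1, keepdims=True))
ref = (ref / ref.sum(axis=1, keepdims=True)) @ V
print(np.allclose(tiled_attention(Q, K, V), ref))  # exact, not approximate
```

The rescaling step is what makes the result exact: whenever a new block raises the running max, previously accumulated sums are multiplied back into a consistent scale.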