Jeffrey Quesnelle
2 papers · Latest: Long Context Pre-Training with Lighthouse Attention
Natural Language Processing
Long Context Pre-Training with Lighthouse Attention
Lighthouse Attention enables efficient long-context transformer pre-training via a subquadratic, gradient-free hierarchical attention mechanism that is removed after training.
2605.06554
Natural Language Processing
Efficient Pre-Training with Token Superposition
Token-Superposition Training (TST) is a simple, drop-in method that significantly boosts LLM pre-training efficiency, cutting training time by up to 2.5x.
2605.06546