Jeffrey Quesnelle
2 papers · Latest: Long Context Pre-Training with Lighthouse Attention
Natural Language Processing
Long Context Pre-Training with Lighthouse Attention
Lighthouse Attention enables efficient long-context transformer pre-training via a subquadratic, gradient-free hierarchical attention mechanism that is removed after training.
2605.06554
Natural Language Processing
Efficient Pre-Training with Token Superposition
Token-Superposition Training (TST) is a simple, drop-in method that significantly boosts LLM pre-training efficiency, cutting training time by up to 2.5x.
2605.06546