OASIS: Online Activation Subspace Learning for Memory-Efficient Training
Sakshi Choudhary, Utkarsh Saxena, Kaushik Roy
TLDR
OASIS is an online activation subspace learning algorithm that significantly reduces memory requirements for training large language models while maintaining performance.
Key contributions
- Continuously learns and updates a low-dimensional activation subspace online during training.
- Projects intermediate activations onto this evolving subspace to reduce memory footprint (see the sketch after this list).
- Maintains gradients and optimizer states directly within the learned low-rank activation subspace.
- Uses a projection-aware optimizer for stable training across subspace updates.
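Since the paper's exact update rule is not given in this digest, the following is a minimal, hypothetical sketch of the idea behind the first three contributions: track an estimate of the top-r activation subspace online (here via an exponential moving average of the activation covariance), store activations as low-rank coefficients, and reconstruct them on demand. The class name `OnlineActivationSubspace` and the `rank` and `momentum` hyperparameters are illustrative assumptions, not OASIS's actual API.

```python
# Illustrative sketch, NOT the paper's algorithm: an EMA of the activation
# covariance provides an online estimate of the dominant subspace; activations
# are stored as r-dimensional coefficients and reconstructed when needed.
import numpy as np

class OnlineActivationSubspace:
    def __init__(self, dim, rank, momentum=0.9):
        self.rank = rank                      # target subspace dimension r (assumed hyperparameter)
        self.momentum = momentum              # EMA factor for the covariance (assumed hyperparameter)
        self.cov = np.zeros((dim, dim))       # running activation covariance estimate
        self.basis = np.eye(dim)[:, :rank]    # current orthonormal basis U (d x r)

    def update(self, acts):
        """Update the covariance EMA and refresh the top-r eigenbasis."""
        batch_cov = acts.T @ acts / max(len(acts), 1)
        self.cov = self.momentum * self.cov + (1 - self.momentum) * batch_cov
        # Eigenvectors of the symmetric covariance, reordered to descending eigenvalue.
        _, eigvecs = np.linalg.eigh(self.cov)
        self.basis = eigvecs[:, ::-1][:, :self.rank]

    def compress(self, acts):
        """Project activations (n x d) into the subspace -> (n x r) coefficients."""
        return acts @ self.basis

    def decompress(self, coeffs):
        """Approximate reconstruction of the original activations (n x d)."""
        return coeffs @ self.basis.T


# Usage: store only the r-dimensional coefficients instead of full activations.
subspace = OnlineActivationSubspace(dim=512, rank=32)
acts = np.random.randn(128, 512)      # a batch of intermediate activations
subspace.update(acts)
coeffs = subspace.compress(acts)      # kept in place of the full activations
approx = subspace.decompress(coeffs)  # reconstructed when gradients are needed
```

A full d x d covariance is of course impractical at LLM scale; this toy only conveys the mechanics of projecting activations onto an evolving low-rank basis.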
Why it matters
This paper tackles the activation memory bottleneck in large language model training. OASIS introduces an online subspace learning method that cuts peak memory by up to 2x relative to full fine-tuning, without sacrificing performance or altering the forward pass, and outperforms prior low-rank techniques. This enables more memory-efficient training of larger LLMs.
Original Abstract
Training large language models (LLMs) is constrained by memory requirements, with activations accounting for a substantial fraction of the total footprint. Existing approaches reduce memory using low-rank weight parameterizations or low-rank gradient subspaces for optimizer states, while activation memory is addressed through architectural modifications or compression schemes based on periodically updated projections. We propose OASIS, an online activation subspace learning algorithm for memory-efficient training that tracks and continuously updates a low-dimensional activation subspace during training. Intermediate activations are projected onto this evolving subspace, reducing memory without modifying forward-pass computations. The evolving activation subspace induces low-rank gradient representations, enabling both gradients and optimizer states to be maintained directly in this subspace, while a projection-aware optimizer consistently transports optimizer states across subspace updates for stable training. Across various finetuning and pretraining tasks, OASIS achieves up to $2\times$ lower peak memory than full fine-tuning while matching its performance and outperforming prior low-rank methods.
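The abstract also mentions a projection-aware optimizer that "consistently transports optimizer states across subspace updates." One plausible realization (an assumption for illustration, not a confirmed detail of OASIS) is to re-express moments stored in the old basis through the change-of-basis map U_new^T U_old whenever the basis is refreshed:

```python
# Hedged sketch: transporting Adam-style moments kept in the low-rank subspace
# when the basis changes. The transport map and the second-moment handling are
# assumptions for illustration, not the paper's stated update.
import numpy as np

def transport_optimizer_state(m, v, basis_old, basis_new):
    """Map moment estimates expressed in the old basis into the new basis.

    m, v      : (r,) or (r, k) first/second moments stored in the old subspace
    basis_old : (d, r) previous orthonormal basis
    basis_new : (d, r) updated orthonormal basis
    """
    transport = basis_new.T @ basis_old   # (r, r) change-of-basis map
    m_new = transport @ m                 # rotate the first moment into the new basis
    v_new = (transport ** 2) @ v          # heuristic, non-negativity-preserving transport of the second moment
    return m_new, v_new
```

Keeping the moments in the r-dimensional subspace is what lets optimizer-state memory shrink along with the activations; the transport step is what keeps those states meaningful when the subspace drifts during training.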