ArXiv TLDR

Transformers Efficiently Perform In-Context Logistic Regression via Normalized Gradient Descent

arXiv:2605.06609

Chenyang Zhang, Yuan Cao

cs.LG, stat.ML

TLDR

Multi-layer softmax transformers can efficiently perform in-context logistic regression, with each layer executing one step of normalized gradient descent on an in-context loss.

Key contributions

  • Constructs multi-layer transformers that perform in-context logistic regression.
  • Each transformer layer exactly performs one step of normalized gradient descent on an in-context logistic loss (see the sketch after this list).
  • Shows the constructed transformer can be recovered by training a single self-attention layer (supervised by one-step gradient descent) and then recurrently applying it as a looped model.
  • Provides convergence and out-of-distribution generalization guarantees for the model.
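
A minimal sketch of the per-layer algorithm, assuming the context is n labeled examples (x_i, y_i) with y_i in {-1, +1} and a step size eta; these notational choices are assumptions for illustration, not the paper's exact transformer construction. Each constructed layer's effect corresponds to one step of normalized gradient descent on the average in-context logistic loss.

    # Illustrative sketch only; not the paper's transformer construction.
    import numpy as np

    def normalized_gd_step(w, X, y, eta=1.0):
        """One normalized gradient descent step on the average logistic loss
        over the context (X, y); rows of X are x_i, entries of y are +/-1."""
        margins = y * (X @ w)                          # y_i * <x_i, w>
        sigm = 1.0 / (1.0 + np.exp(margins))           # equals sigmoid(-margin)
        grad = -(X.T @ (y * sigm)) / len(y)            # gradient of the in-context loss
        norm = np.linalg.norm(grad)
        return w if norm == 0.0 else w - eta * grad / norm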

Why it matters

This work advances the theoretical understanding of in-context learning by showing how softmax transformers can implicitly execute algorithmic steps, here normalized gradient descent on an in-context logistic loss. It provides a concrete ICL mechanism, backed by training convergence and out-of-distribution generalization guarantees.

Original Abstract

Transformers have demonstrated remarkable in-context learning (ICL) capabilities. The strong ICL performance of transformers is commonly believed to arise from their ability to implicitly execute certain algorithms on the context, thereby enhancing prediction and generation. In this work, we investigate how transformers with softmax attention perform in-context learning on linear classification data. We first construct a class of multi-layer transformers that can perform in-context logistic regression, with each layer exactly performing one step of normalized gradient descent on an in-context loss. Then, we show that our constructed transformer can be obtained through (i) training a single self-attention layer supervised by one-step gradient descent, and (ii) recurrently applying the trained layer to obtain a looped model. Training convergence guarantees of the self-attention layer and out-of-distribution generalization guarantees of the looped model are provided. Our results advance the theoretical understanding of ICL mechanism by showcasing how softmax transformers can effectively act as in-context learners.
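
For intuition on item (ii) in the abstract, here is a self-contained sketch of how a looped model acts as an in-context learner: the same normalized-GD "layer" is applied T times to the context, and the query is classified by the sign of its inner product with the resulting weight vector. The zero initialization, step size eta, and data-generating setup below are illustrative assumptions, not the paper's setting.

    # Illustrative sketch only; not the paper's looped-transformer construction.
    import numpy as np

    def looped_icl_predict(X_ctx, y_ctx, x_query, T=10, eta=0.5):
        """Apply the same normalized-GD step T times (the looped model),
        then label the query by the sign of <x_query, w_T>."""
        w = np.zeros(X_ctx.shape[1])
        for _ in range(T):
            margins = y_ctx * (X_ctx @ w)
            sigm = 1.0 / (1.0 + np.exp(margins))             # sigmoid(-margin)
            grad = -(X_ctx.T @ (y_ctx * sigm)) / len(y_ctx)  # in-context loss gradient
            norm = np.linalg.norm(grad)
            if norm == 0.0:
                break
            w = w - eta * grad / norm                        # one normalized GD step
        return np.sign(x_query @ w)

    rng = np.random.default_rng(0)
    w_star = rng.standard_normal(5)                          # ground-truth direction
    X_ctx = rng.standard_normal((32, 5))
    y_ctx = np.sign(X_ctx @ w_star)                          # linearly separable context
    x_query = rng.standard_normal(5)
    print(looped_icl_predict(X_ctx, y_ctx, x_query))         # compare to np.sign(x_query @ w_star)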
