ArXiv TLDR

Biologically-Grounded Multi-Encoder Architectures as Developability Oracles for Antibody Design

arXiv: 2604.09369

Simon J. Crouzet

q-bio.BM cs.LG q-bio.QM

TLDR

CrossAbSense introduces biologically-grounded multi-encoder neural oracles that predict antibody developability, offering a path toward substantially reduced experimental screening costs.

Key contributions

  • Presents CrossAbSense, a framework combining frozen protein language model encoders with attention decoders.
  • Achieves 12-20% improvement over baselines on 3/5 developability assays on the GDPa1 benchmark.
  • Shows self-attention suffices for aggregation, while cross-attention is required for stability and expression.
  • Reveals heavy-chain dominance in aggregation and balanced contributions for thermal stability.
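The decoder design contrast above (self-attention within each chain vs. bidirectional cross-attention between chains, pooled with a learned chain-fusion weight) can be sketched as a minimal PyTorch module. This is an illustrative assumption of how such a head could be wired, not the paper's implementation; all names, dimensions, and the sigmoid-gated fusion are hypothetical.

```python
import torch
import torch.nn as nn

class ChainFusionDecoder(nn.Module):
    """Hypothetical sketch of a property-specific decoder head.

    Frozen per-chain encoder embeddings are combined either by
    self-attention within each chain or by bidirectional cross-attention
    between chains, then pooled with a learned chain-fusion weight w_H
    (w_L = 1 - w_H). Illustrative only, not the CrossAbSense code.
    """

    def __init__(self, dim: int = 64, heads: int = 4, cross: bool = True):
        super().__init__()
        self.cross = cross
        self.attn_h = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_l = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Single logit -> sigmoid gives w_H in (0, 1); w_L = 1 - w_H.
        self.fusion_logit = nn.Parameter(torch.zeros(1))
        self.head = nn.Linear(dim, 1)  # scalar developability prediction

    def forward(self, h: torch.Tensor, l: torch.Tensor) -> torch.Tensor:
        # h: (B, Lh, dim) heavy-chain embeddings; l: (B, Ll, dim) light-chain.
        if self.cross:
            # Bidirectional cross-attention: each chain queries the other.
            h_out, _ = self.attn_h(h, l, l)
            l_out, _ = self.attn_l(l, h, h)
        else:
            # Self-attention within each chain independently.
            h_out, _ = self.attn_h(h, h, h)
            l_out, _ = self.attn_l(l, l, l)
        w_h = torch.sigmoid(self.fusion_logit)  # learned chain weight
        pooled = w_h * h_out.mean(dim=1) + (1 - w_h) * l_out.mean(dim=1)
        return self.head(pooled).squeeze(-1)

# Usage: score a batch of 2 antibodies from (mock) frozen-encoder embeddings.
oracle = ChainFusionDecoder(cross=True)
scores = oracle(torch.randn(2, 120, 64), torch.randn(2, 110, 64))
print(scores.shape)  # torch.Size([2])
```

Switching `cross=False` reproduces the self-attention-only variant that, per the paper, suffices for aggregation-related properties.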

Why it matters

Generative antibody design is bottlenecked by expensive biophysical characterization. CrossAbSense offers a cost-effective solution by accurately predicting developability, accelerating therapeutic discovery. This reduces experimental screening costs and provides key insights into antibody chain interactions.

Original Abstract

Generative models can now propose thousands of \emph{de novo} antibody sequences, yet translating these designs into viable therapeutics remains constrained by the cost of biophysical characterization. Here we present CrossAbSense, a framework of property-specific neural oracles that combine frozen protein language model encoders with configurable attention decoders, identified through a systematic hyperparameter campaign totaling over 200 runs per property. On the GDPa1 benchmark of 242 therapeutic IgGs, our oracles achieve notable improvements of 12--20\% over established baselines on three of five developability assays and competitive performance on the remaining two. The central finding is that optimal decoder architectures \emph{invert} our initial biological hypotheses: self-attention alone suffices for aggregation-related properties (hydrophobic interaction chromatography, polyreactivity), where the relevant sequence signatures -- such as CDR-H3 hydrophobic patches -- are already fully resolved within single-chain embeddings by the high-capacity 6B encoder. Bidirectional cross-attention, by contrast, is required for expression yield and thermal stability -- properties that inherently depend on the compatibility between heavy and light chains. Learned chain fusion weights independently confirm heavy-chain dominance in aggregation ($w_H = 0.62$) versus balanced contributions for stability ($w_H = 0.51$). We demonstrate practical utility by deploying CrossAbSense on 100 IgLM-generated antibody designs, illustrating a path toward substantial reduction in experimental screening costs.
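The reported chain fusion weights are easiest to read if one assumes they are normalized to sum to one (the abstract reports only $w_H$); under that assumption, $w_H = 0.62$ for aggregation implies $w_L = 0.38$, and $w_H = 0.51$ for stability implies $w_L = 0.49$. A short worked example, with a softmax parameterization that is an assumption rather than the paper's stated method:

```python
import math

def fusion_weights(logit_h: float, logit_l: float) -> tuple[float, float]:
    """Softmax over two chain logits -> normalized fusion weights (w_H, w_L).

    Hypothetical parameterization: the paper reports the weights themselves,
    not how they are produced.
    """
    eh, el = math.exp(logit_h), math.exp(logit_l)
    return eh / (eh + el), el / (eh + el)

# A heavy-chain logit of ~0.49 over a zero light-chain logit yields the
# aggregation-like split reported in the abstract.
w_h, w_l = fusion_weights(0.49, 0.0)
print(round(w_h, 2), round(w_l, 2))  # 0.62 0.38
```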
