CellxPert: Inference-Time MCMC Steering of a Multi-Omics Single-Cell Foundation Model for In-Silico Perturbation
Andac Demir, Erik W. Anderson, Jeremy L. Jenkins, Srayanta Mukherjee
TLDR
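CellxPert is a multi-omics single-cell foundation model that runs in-silico perturbations with a Metropolis-Hastings (MCMC) sampler, aiming for biologically interpretable trajectories. The authors report that it outperforms baselines in cell-type annotation, perturbation response prediction, and multi-omic integration.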
Key contributions
- Unifies single-cell and spatial multi-omics (scRNA-seq, ATAC-seq, CITE-seq, MERFISH) into a common representation.
- Introduces a Metropolis-Hastings (MCMC) sampler for in-silico perturbations, mitigating the out-of-distribution artifacts caused by abrupt token manipulation.
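- Reports outperforming classical and state-of-the-art baselines in cell-type annotation (154 identities), perturbation response prediction, and multi-omic integration on the PBMC68K, Replogle Perturb-seq, Systema, and BMMC benchmarks.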
- Supports efficient fine-tuning with LoRA and genome-wide transcriptomic response prediction.
Why it matters
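CellxPert offers a single, scalable foundation model for single-cell and spatial multi-omics data. Prior single-cell foundation models approximate gene perturbations by deleting or reordering tokenized gene expression ranks, which can push inputs out of distribution. CellxPert instead uses a Metropolis-Hastings sampler to move between transcriptomic states conditioned on the perturbed genes, which the authors argue yields more biologically interpretable in-silico experiments. They also report gains in cell-type annotation and perturbation response prediction.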
Original Abstract
In this work, we introduce CellxPert, a scalable multimodal foundation model that unifies single-cell and spatial multi-omics within a common representation space. CellxPert jointly encodes transcriptomic (scRNA-seq), chromatin-accessibility (ATAC-seq), and surface-proteomic (CITE-seq) measurements, while directly incorporating MERFISH and imaging mass-cytometry data as 2D or 3D spatial-visual layers. CellxPert facilitates four key downstream tasks out of the box: (i) cell-type annotation across a broad ontology of 154 largely overlapping identities -- the largest label space addressed to date and a stringent test of fine-grained discrimination, (ii) efficient fine-tuning using Low Rank Adaptation (LoRA), (iii) genome-wide transcriptomic response prediction to in-silico perturbations (ISP), and (iv) seamless multi-omic integration across various assays and platforms. Unlike current single-cell foundation models, which approximate gene perturbations by deleting or reordering tokenized gene expression ranks, CellxPert employs a Metropolis-Hastings sampler whose proposal kernel uses the model's masked conditional distributions to transition to new transcriptomic states conditioned on the perturbed genes. This Markov-chain procedure mitigates out-of-distribution artifacts introduced by abrupt token manipulation and produces trajectories that are biologically interpretable. Evaluations on PBMC68K, Replogle Perturb-seq, Systema, and BMMC benchmarks show that CellxPert surpasses classical and state-of-the-art baselines in cell-type annotation, perturbation response prediction, and multi-omic integration.
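The sampler described in the abstract can be illustrated with a toy Metropolis-Hastings loop. This is a minimal sketch under strong assumptions, not the authors' implementation: gene states are binary, the target is a small hypothetical pairwise model (`W`, `b`, `log_score`), and `masked_conditional` stands in for the foundation model's masked conditional distribution. The abstract does not specify the target density or the acceptance rule, so the standard Metropolis-Hastings ratio is assumed here. Perturbed genes are clamped and never resampled.

```python
import numpy as np

def log_score(x, W, b):
    # Unnormalized log-probability of a toy pairwise model (hypothetical target).
    return float(b @ x + 0.5 * (x @ W @ x))

def masked_conditional(x, i, W, b, temperature=1.0):
    # Stand-in for a model's masked conditional over gene i given all other genes.
    # Assumes W is symmetric with a zero diagonal. Returns [P(x_i=0), P(x_i=1)].
    logit = (b[i] + W[i] @ x) / temperature
    p1 = 1.0 / (1.0 + np.exp(-logit))
    return np.array([1.0 - p1, p1])

def mh_perturb(x0, clamped, W, b, n_steps, rng, temperature=1.0):
    # clamped: dict mapping perturbed gene index -> fixed state (0 or 1).
    x = x0.copy()
    for gene, value in clamped.items():
        x[gene] = value
    free = [i for i in range(len(x)) if i not in clamped]
    trajectory, accepted = [x.copy()], 0
    for _ in range(n_steps):
        i = int(rng.choice(free))              # pick a non-perturbed gene to "mask"
        q = masked_conditional(x, i, W, b, temperature)
        new = int(rng.choice(2, p=q))          # propose a new state for gene i
        proposal = x.copy()
        proposal[i] = new
        # MH log-ratio: target ratio times reverse/forward proposal ratio.
        # Both proposal terms share the same context because only gene i changes.
        log_alpha = (log_score(proposal, W, b) - log_score(x, W, b)
                     + np.log(q[x[i]]) - np.log(q[new]))
        if np.log(rng.random()) < log_alpha:
            x, accepted = proposal, accepted + 1
        trajectory.append(x.copy())
    return np.array(trajectory), accepted / n_steps

def make_toy_model(n_genes, rng):
    # Random symmetric couplings with zero diagonal, plus per-gene biases.
    A = rng.normal(size=(n_genes, n_genes))
    W = (A + A.T) / 2.0
    np.fill_diagonal(W, 0.0)
    return W, rng.normal(size=n_genes)
```

When the proposal is the exact conditional of the target (here, `temperature=1.0`), the ratio cancels and the loop reduces to Gibbs sampling. A mismatched proposal, such as `temperature=2.0` in this toy, is what makes the accept/reject step matter. For example, `mh_perturb(x0, {0: 0}, W, b, 200, rng, temperature=2.0)` simulates a knockdown of gene 0 and returns the visited states along with the acceptance rate.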