ArXiv TLDR

A Quasi-Regression Method for the Mediation Analysis of Zero-Inflated Single-Cell Data

2604.08507

Seungjun Ahn, Donald Porchia, Panos Roussos, Maaike van Gerwen, Qing Lu + 1 more

stat.ME, q-bio.QM, stat.AP

TLDR

QuasiMed is a new quasi-regression method for causal mediation analysis in single-cell data, relaxing strict distributional assumptions.

Key contributions

  • Introduces QuasiMed, a quasi-regression framework for causal mediation analysis in single-cell data.
  • Relaxes strict distributional assumptions by specifying only mean functions in mediation models.
  • Screens mediators and estimates indirect effects considering both average expression and cell proportion.
  • Demonstrates high power, FDR control, and computational efficiency in real-data-inspired simulations.
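The idea of specifying only a mean function can be illustrated with a minimal sketch. This is not the QuasiMed implementation: the log link, the working variance proportional to the mean, and the function name `fit_log_mean` are assumptions chosen for illustration. The fit solves the quasi-score equations directly, so no full likelihood for the outcome is ever written down.

```python
import numpy as np

def fit_log_mean(X, y, n_iter=50, tol=1e-10):
    """Fit E[y | X] = exp(X @ beta) from the mean function alone.

    Assumption (illustrative, not taken from the paper): the working
    variance is phi * mu, so the quasi-score equations reduce to
    X.T @ (y - mu) = 0, solved here by Fisher scoring.
    Column 0 of X is assumed to be an intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())          # starting value for the intercept
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)          # quasi-score (phi cancels)
        info = X.T @ (X * mu[:, None])  # expected information (phi cancels)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = np.exp(X @ beta)
    # Pearson estimate of the dispersion phi; values above 1 indicate
    # more variability than a Poisson model would allow.
    phi = np.sum((y - mu) ** 2 / mu) / (len(y) - X.shape[1])
    return beta, phi

# Hypothetical overdispersed counts with many zeros.
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
mu_true = np.exp(0.5 + 0.3 * x)
y = rng.negative_binomial(2, 2 / (2 + mu_true))
beta_hat, phi_hat = fit_log_mean(np.column_stack([np.ones_like(x), x]), y)
print(beta_hat, phi_hat)
```

The simulated counts are negative binomial, yet the fit never uses that fact: only the mean model and a working variance enter the estimating equations, which is the sense in which a quasi-regression approach relaxes distributional assumptions.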

Why it matters

This paper addresses a gap in causal mediation analysis for single-cell data, which record both gene expression levels and the proportion of expressing cells and are therefore structurally different from bulk data. Because QuasiMed specifies only the mean functions of the mediation models, it avoids the specific distributional assumptions that existing approaches often require. Identifying mediating causal pathways in single-cell datasets could improve our understanding of gene regulation and disease mechanisms.

Original Abstract

Recent advances in single-cell technologies have improved our understanding of gene regulation and cellular heterogeneity at single-cell resolution. Single-cell data contain both gene expression levels and the proportion of expressing cells, which makes them structurally different from bulk data. Currently, methodological work on causal mediation analysis for single-cell data remains limited and often requires specific distributional assumptions. To address this challenge, we present QuasiMed, a mediation framework specialized for single-cell data. Our proposed method comprises three steps: (i) screening mediator candidates through penalized regression and marginal models (similar to sure independence screening), (ii) estimation of indirect effects through the average expression and the proportion of expressing cells, and (iii) hypothesis testing with multiplicity control. The key benefit of QuasiMed is that it specifies only the mean functions of the mediation models through a quasi-regression framework, thereby relaxing strict distributional assumptions. The method's performance was evaluated through real-data-inspired simulations, in which it demonstrated high power, false discovery rate control, and computational efficiency. Lastly, we applied QuasiMed to ROSMAP single-cell data to illustrate its potential to identify mediating causal pathways. An R package is freely available on GitHub at https://github.com/sjahnn/QuasiMed.
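Step (iii) combines hypothesis testing with multiplicity control. The abstract does not say which procedure QuasiMed uses, so the sketch below assumes the Benjamini-Hochberg step-up procedure, a standard way to control the false discovery rate. The function name `benjamini_hochberg` and the p-values are hypothetical.

```python
import numpy as np

def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a boolean mask of rejected hypotheses at FDR level alpha.

    Assumption: Benjamini-Hochberg is used here only as a generic example
    of multiplicity control; the source does not name the procedure.
    """
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                          # indices of sorted p-values
    thresholds = alpha * np.arange(1, m + 1) / m   # i * alpha / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        # Step-up rule: find the largest rank whose p-value meets its
        # threshold, then reject every hypothesis up to that rank.
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True
    return reject

# Hypothetical p-values, one per candidate mediator.
mediator_pvalues = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060]
print(benjamini_hochberg(mediator_pvalues, alpha=0.05))
```

The step-up rule rejects every hypothesis ranked at or below the largest rank that meets its threshold, even if some smaller p-values missed their own thresholds.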
