ArXiv TLDR

Flow Matching for Count Data

2605.07746

Ganchao Wei, John Pearson

stat.ML cs.LG q-bio.QM

TLDR

Introduces count-FM, a flow-matching framework for high-dimensional count data built on a continuous-time birth-death process, achieving better sample quality and efficiency than baselines.

Key contributions

  • Proposes count-FM: a flow-matching framework for count data using a continuous-time birth-death process.
  • Efficiently learns marginal transitions in count space via simulation-free training of conditional rates.
  • Achieves better sample quality and modeling efficiency than baselines with fewer parameters.
  • Applied to scRNA-seq and neural spike-trains for generation, transport, and conditional generation.
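The core object in the bullets above is a continuous-time birth-death process with local unit jumps, whose transition rates count-FM learns from data. As a rough intuition for how such a process generates count samples, here is a minimal Gillespie-style simulator; the function names and the toy constant/linear rates are placeholders of ours, standing in for the learned conditional rate networks described in the paper.

```python
import numpy as np

def simulate_birth_death(x0, birth_rate, death_rate, t_end, rng):
    """Simulate a continuous-time birth-death process with unit jumps
    (Gillespie algorithm). `birth_rate` and `death_rate` are functions
    of (x, t); in count-FM these would be learned conditional rates."""
    x, t = x0, 0.0
    while True:
        b, d = birth_rate(x, t), death_rate(x, t)
        total = b + d
        if total <= 0:  # absorbing state: no jumps possible
            break
        t += rng.exponential(1.0 / total)  # time to next jump
        if t >= t_end:
            break
        # birth (+1) with prob b/total, death (-1) otherwise
        x += 1 if rng.random() < b / total else -1
    return x

# Toy example: constant birth rate 5, death rate proportional to x.
# This M/M/inf-style process has a Poisson(5) stationary distribution,
# so long runs from x0 = 0 yield counts with mean near 5.
rng = np.random.default_rng(0)
samples = [
    simulate_birth_death(0, lambda x, t: 5.0, lambda x, t: 1.0 * x,
                         t_end=10.0, rng=rng)
    for _ in range(2000)
]
mean_count = float(np.mean(samples))
```

Because the death rate vanishes at x = 0, the process never leaves the nonnegative integers, which is what makes this dynamics "natural" for count data compared with continuous relaxations.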

Why it matters

The paper addresses a key challenge in analyzing high-dimensional count data by modeling counts directly in count space rather than through categorical or continuous relaxations. This yields more efficient generation and transport for applications such as single-cell RNA sequencing and neuroscience, along with interpretable transport paths.

Original Abstract

High-dimensional count data arise in applications such as single-cell RNA sequencing and neural spike trains, where mapping between distributions across successive batches or time points forms a critical component of data analysis. The recent success of diffusion- and flow-based deep generative models for images, video, and text motivates extending these ideas to count-valued settings, but many existing methods either treat each count as a categorical state or transform counts into a continuous space, neither of which is natural or efficient when the count range is large. We propose count-FM, a flow-matching framework for count data based on a continuous-time birth-death process with local unit jumps. Count-FM learns marginal transitions efficiently in count space through simulation-free training of conditional transition rates, allowing transport between arbitrary count-distributed source and target populations. In simulation, count-FM achieves better sample quality than representative baselines while using substantially fewer parameters. We further apply count-FM to scRNA-seq and neural spike-train data for unconditional generation, transport, and conditional generation. Across these tasks, count-FM yields improved sample quality, greater modeling efficiency, and interpretable transport paths.
