Set-Aggregated Genome Embeddings for Microbiome Abundance Prediction

May 12, 2026 · arXiv:2605.12286

Younhun Kim, Georg K. Gerber, Travis E. Gibson

q-bio.GN, cs.AI

TLDR

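This paper uses set-aggregated genome embeddings (SAGE), built on genomic language models, to predict community-level abundance profiles from the raw DNA sequences of a microbiome's members. It reports better generalization to novel genomes than classical bioinformatics approaches.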

Key contributions

Introduces Set-Aggregated Genome Embeddings (SAGE) for microbiome abundance prediction.
Leverages genomic language models (GLMs) for few-shot learning from raw DNA sequences.
Demonstrates improved generalization on novel genomes over classical bioinformatics methods.
Shows community-level latent representations and intermediate transformations enhance prediction.
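
To make the idea of set aggregation concrete, here is a minimal sketch, not the authors' architecture. It assumes each genome has already been reduced to a fixed-length embedding (random vectors stand in for GLM outputs here), uses mean pooling as the permutation-invariant aggregation, and scores each genome with a hypothetical linear layer followed by a softmax. None of these choices is confirmed by the summary above. The function names, the dimension, and the weights are illustrative only.

```python
import math
import random


def mean_pool(embeddings):
    """Permutation-invariant aggregation: average the per-genome vectors."""
    dim = len(embeddings[0])
    n = len(embeddings)
    return [sum(e[d] for e in embeddings) / n for d in range(dim)]


def softmax(scores):
    """Turn raw scores into positive values that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def predict_abundances(embeddings, weights):
    """Score each genome from [own embedding ; community latent], then normalize.

    `weights` is a hypothetical linear scoring vector of length 2 * dim;
    a trained model would learn it (or a deeper network) from data.
    """
    community = mean_pool(embeddings)  # community-level latent (assumed form)
    scores = []
    for e in embeddings:
        features = e + community  # list concatenation, length 2 * dim
        scores.append(sum(w * x for w, x in zip(weights, features)))
    return softmax(scores)


random.seed(0)
DIM = 4  # illustrative embedding size, far smaller than a real GLM's
# Stand-ins for GLM embeddings of three genomes in one community.
genome_embeddings = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(3)]
weights = [random.gauss(0, 1) for _ in range(2 * DIM)]

abundances = predict_abundances(genome_embeddings, weights)
print(abundances)
```

Because the community latent is an average over the set, reordering the genomes reorders the predicted abundances in the same way and leaves their values unchanged up to floating-point rounding. Dropping `community` from `features` gives a per-genome baseline that ignores community context, which is the kind of comparison an ablation over community-level representations would make.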

Why it matters

This work predicts a community-level property, the abundance profile, directly from the raw DNA sequences of a community's members. The authors report that the approach generalizes to novel genomes better than classical bioinformatics methods, which is relevant for understanding and manipulating complex microbial ecosystems. Potential application areas include health, agriculture, and environmental science.

Original Abstract

Microbiome functions are encoded within the genes of the community-wide metagenome. A natural question is whether properties of a microbial community can be predicted just from knowing the raw DNA sequences of its members. In this work, we employ set-aggregated genome embeddings (SAGE) to predict community-level abundance profiles, exploiting the few-shot learning capabilities of genomic language models (GLMs). We benchmark this approach to show improved generalization on novel genomes compared to classical bioinformatics approaches. Model ablation shows that community-level latent representations directly result in improved performance. Lastly, we demonstrate the benefits of intermediate transformations between latent representations and demonstrate the differences between GLM embedding choices.

