Genomics
Computational genomics, gene expression analysis, and DNA sequence modeling.
q-bio.GN · 53 papers
A Resampling-Based Framework for Network Structure Learning in High-Dimensional Data
RSNet is an R package for robust, interpretable network inference in high-dimensional data, using resampling and graphlet analysis for structural insights.
scShapeBench: Discovering geometry from high dimensional scRNAseq data
scShapeBench introduces a benchmark and scReebTower, a new method for automated shape detection in high-dimensional scRNAseq data, outperforming baselines.
Set-Aggregated Genome Embeddings for Microbiome Abundance Prediction
This paper uses Set-Aggregated Genome Embeddings (SAGE) with genomic language models to predict microbiome abundance from DNA, showing improved generalization.
LPDP: Inference-Time Reward Control for Variable-Length DNA Generation with Edit Flows
LPDP enables training-free, inference-time reward control for variable-length DNA generation using biologically plausible edit flows.
SCOPE: Siamese Contrastive Operon Pair Embeddings for Functional Sequence Representation and Classification
SCOPE introduces a Siamese MLP with protein language model embeddings for scalable operon pair classification, achieving competitive ROC-AUC.
MicroFuse: Protein-to-Genome Expert Fusion for Microbial Operon Reasoning
MicroFuse integrates protein and genome context using a Mixture-of-Experts model to accurately predict microbial operons, outperforming baselines.
A Linear-Transformer Hybrid for SNP-Based Genotype-to-Phenotype Prediction in Grapevine
LiT-G2P, a linear-Transformer hybrid, improves genotype-to-phenotype prediction in grapevines, enhancing breeding decisions and genetic gain.
Feature Dimensionality Outweighs Model Complexity in Breast Cancer Subtype Classification Using TCGA-BRCA Gene Expression Data
This study shows that feature dimensionality is more critical than model complexity for breast cancer subtype classification, with logistic regression excelling.
A Versatile AI Agent for Rare Disease Diagnosis and Risk Gene Prioritization
Hygieia is a versatile AI agent that integrates multi-modal data for accurate rare disease diagnosis and risk gene prioritization, outperforming physicians.
OmicsLM: A Multimodal Large Language Model for Multi-Sample Omics Reasoning
OmicsLM is a multimodal LLM that connects quantitative omics data with natural language for biological reasoning, outperforming existing models.
When Does Gene Regulatory Network Inference Break? A Controlled Diagnostic Study of Causal and Correlational Methods on Single-Cell Data
This paper diagnoses why causal gene regulatory network inference methods often fail, revealing they excel in clean data but are vulnerable to specific pathologies.
Statistics of a multi-factor function from its Fourier transform
A new theorem enables deriving multi-factor function statistics from its Fourier transform, revealing hidden relationships via index annihilation.
ORBIT: Learning Gene Program Co-Activation Structure for Cell-Type-Stratified Pathway Rewiring Analysis in Single-Cell Transcriptomics
ORBIT is a self-supervised transformer that learns asymmetric gene program dependencies from single-cell RNA-seq, revealing cell-type-specific pathway rewiring.
EFGPP: Exploratory framework for genotype-phenotype prediction
EFGPP is a reproducible framework that integrates diverse genetic and clinical data to improve complex human trait prediction, demonstrated on migraine.
PhenotypeToGeneDownloaderR: automated multi-source retrieval and validation of phenotype-associated genes
PhenotypeToGeneDownloaderR automates multi-source retrieval, validation, and harmonization of phenotype-associated genes for downstream analysis.
Beyond Continuity: Simulation-free Reconstruction of Discrete Branching Dynamics from Single-cell Snapshots
Unbalanced Schrödinger Bridge (USB) reconstructs discrete branching cell dynamics from snapshots, integrating stochastic and birth-death events.
CellxPert: Inference-Time MCMC Steering of a Multi-Omics Single-Cell Foundation Model for In-Silico Perturbation
CellxPert is a multi-omics single-cell foundation model that uses inference-time MCMC steering for biologically interpretable in-silico perturbation, with reported superior performance.
CRC-Screen: Certified DNA-Synthesis Hazard Screening Under Taxonomic Shift
CRC-Screen offers certified DNA-synthesis hazard screening, maintaining low miss and false-flag rates even under taxonomic shifts.
Hyper Input Convex Neural Networks for Shape Constrained Learning and Optimal Transport
Introducing HyCNNs, a novel neural network architecture for learning convex functions, combining Maxout and ICNNs for better efficiency and performance.
Robust Clustering Analysis of Genes Related to Age-related Macular Degeneration using RNA-Seq
This paper presents a robust gene clustering analysis of Age-related Macular Degeneration (AMD) RNA-Seq data, identifying novel and known hub genes.