Genomics
Computational genomics, gene expression analysis, and DNA sequence modeling.
q-bio.GN · 53 papers
Transcriptomic Models for Immunotherapy Response Prediction Show Limited Cross-cohort Generalisability
This paper finds that current transcriptomic models for predicting immunotherapy response show limited cross-cohort generalisability and inconsistent biomarker signals.
Entropy, Disagreement, and the Limits of Foundation Models in Genomics
High entropy in genomic sequences causes poor performance and instability in foundation models, suggesting self-supervised training limitations.
An Imbalanced Dataset with Multiple Feature Representations for Studying Quality Control of Next-Generation Sequencing
A new imbalanced dataset with two distinct feature representations is introduced to improve quality control of next-generation sequencing data.
Synonymous Codon Usage Bias Overrides Phylogeny to Reflect Convergent Frond Architecture in a Rapidly Radiating Fern Family Thelypteridaceae
In the fern family Thelypteridaceae, synonymous codon usage bias (CUB) can override phylogeny, reflecting convergent frond architecture driven by specific photosynthesis genes.
High-dimensional Many-to-many-to-many Mediation Analysis
Introduces a high-dimensional many-to-many-to-many (MMM) mediation analysis framework for variable selection, effect estimation, and outcome prediction.
Re-analysis of the Human Transcription Factor Atlas Recovers TF-Specific Signatures from Pooled Single-Cell Screens with Missing Controls
This paper re-analyzes the human TF Atlas, recovering robust TF-specific signatures from pooled single-cell screens despite missing internal controls.
QuantumXCT: Learning Interaction-Induced State Transformation in Cell-Cell Communication via Quantum Entanglement and Generative Modeling
QuantumXCT uses quantum entanglement and generative modeling to learn cell-cell communication as state transformations, moving beyond static ligand-receptor databases.
Benchmarking Heritability Estimation Strategies Across 86 Configurations and Their Downstream Effect on Polygenic Risk Score Performance
This study benchmarks heritability estimation strategies across 86 configurations, finding significant variability in estimates but surprisingly robust polygenic risk score performance.
annbatch unlocks terabyte-scale training of biological data in anndata
annbatch provides an out-of-core mini-batch loader for anndata, enabling terabyte-scale training on biological data and drastically speeding up ML workflows.
VeloTree: Inferring single-cell trajectories from RNA velocity fields with varifold distances
VeloTree infers single-cell differentiation trees from RNA velocity fields using a novel varifold distance-based dissimilarity measure.
Non-ignorable fuzziness in granular counts: the case of RNA-seq data
This paper shows that fuzzy counts in RNA-seq data lead to non-ignorable reporting mechanisms and introduces a hierarchical model to address this.
Large Language Models for Variant-Centric Functional Evidence Mining
This paper introduces AcmGENTIC, an LLM-powered pipeline and benchmark for automating the extraction and classification of functional evidence for genomic variants.
Genetic algorithms for multi-omic feature selection: a comparative study in cancer survival analysis
Sweeping*, a new multi-view genetic algorithm, improves multi-omic feature selection for cancer survival prediction by optimizing accuracy and biomarker set size.