ArXiv TLDR

Mamba-SSM with LLM Reasoning for Biomarker Discovery: Causal Feature Refinement via Chain-of-Thought Gene Evaluation

arXiv: 2604.14334

Pushpa Kumar Balan, Aijing Feng

q-bio.QM · cs.AI

TLDR

Mamba-SSM saliency plus LLM chain-of-thought filtering refines biomarker discovery, improving classification with far fewer features even though the LLM's reasoning is only selectively faithful.

Key contributions

  • Mamba-SSM identifies candidate biomarkers from RNA-seq data using gradient saliency.
  • LLM Chain-of-Thought reasoning filters initial gene lists, removing confounders for improved performance.
  • LLM-filtered 17-gene set (AUC 0.927) outperforms a 5,000-gene variance baseline (AUC 0.903) while using 294x fewer features.
  • Introduces 'selective faithfulness,' where targeted confounder removal boosts performance despite incomplete recall.
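The candidate-extraction step above can be sketched in a few lines. This is a hypothetical illustration, not the paper's code: a linear logit stands in for the trained Mamba-SSM, and the weights and expression profile are synthetic.

```python
import numpy as np

# Hypothetical sketch (not the paper's code): gradient-saliency gene
# ranking. A linear logit stands in for the trained Mamba-SSM; the
# weights and the RNA-seq profile below are synthetic toy data.
rng = np.random.default_rng(0)
n_genes = 5000
w = rng.normal(size=n_genes) / np.sqrt(n_genes)  # "trained" weights (toy)
x = rng.lognormal(size=n_genes)                  # one expression profile (toy)

# For logit f(x) = w @ x and p = sigmoid(f(x)), the input gradient is
# dp/dx = p * (1 - p) * w; its absolute value is the per-gene saliency.
p = 1.0 / (1.0 + np.exp(-(w @ x)))
saliency = np.abs(p * (1.0 - p) * w)

# Keep the 50 most salient genes as biomarker candidates, mirroring
# the paper's extraction step before LLM filtering.
top50 = np.argsort(saliency)[::-1][:50]
```

With a real sequence model, the same gradient would come from autograd (backpropagating the logit to the input expression vector) rather than a closed form, but the ranking step is identical.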

Why it matters

This research combines a deep sequence model with LLM reasoning for biomarker discovery, raising AUC from 0.903 to 0.927 while cutting the feature count from 5,000 genes to 17. The concept of 'selective faithfulness' offers a new perspective on LLM utility in scientific discovery: reasoning need not be comprehensively correct to be useful downstream.

Original Abstract

Gradient saliency from deep sequence models surfaces candidate biomarkers efficiently, but the resulting gene lists are contaminated by tissue-composition confounders that degrade downstream classifiers. We study whether LLM chain-of-thought (CoT) reasoning can faithfully filter these confounders, and whether reasoning quality drives downstream performance. We train a Mamba SSM on TCGA-BRCA RNA-seq and extract the top-50 genes by gradient saliency; DeepSeek-R1 evaluates every candidate with structured CoT to produce a final 17-gene set. The raw 50-gene saliency set (no LLM) performs worse than a 5,000-gene variance baseline (AUC 0.832 vs. 0.903), while the LLM-filtered set surpasses it (AUC 0.927), using 294x fewer features. A faithfulness audit (COSMIC CGC, OncoKB, PAM50) reveals only 6 of 17 selected genes (35.3%) are validated BRCA biomarkers, yet 10 of 16 known BRCA genes in the input were missed - including FOXA1. This gap between downstream performance and reasoning faithfulness suggests selective faithfulness: targeted confounder removal is sufficient for performance gains even without comprehensive recall.
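The LLM-filtering stage described in the abstract can be sketched as follows. This is a hypothetical sketch, not the authors' code: `ask_llm` is a stub standing in for a structured DeepSeek-R1 chain-of-thought call, and the confounder set is made up for illustration.

```python
# Hypothetical sketch (not the authors' code): filter saliency-ranked
# candidate genes with an LLM chain-of-thought judge. ask_llm() stubs
# the structured DeepSeek-R1 call; the confounder set is illustrative.
TOY_CONFOUNDERS = {"ACTA2", "MYH11"}  # stand-in tissue-composition genes

def ask_llm(gene: str) -> dict:
    """Stub: return a structured CoT verdict for one candidate gene."""
    return {
        "gene": gene,
        "reasoning": "stubbed chain-of-thought rationale",
        "verdict": "drop" if gene in TOY_CONFOUNDERS else "keep",
    }

def filter_genes(candidates: list[str]) -> list[str]:
    """Keep only genes the LLM judge does not flag as confounders."""
    return [g for g in candidates if ask_llm(g)["verdict"] == "keep"]

candidates = ["FOXA1", "ACTA2", "GATA3", "MYH11", "ESR1"]
print(filter_genes(candidates))  # → ['FOXA1', 'GATA3', 'ESR1']
```

In the paper the judge sees each of the top-50 saliency genes and returns structured reasoning plus a keep/drop decision, yielding the final 17-gene set; the stub only mimics that interface.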
