Re-analysis of the Human Transcription Factor Atlas Recovers TF-Specific Signatures from Pooled Single-Cell Screens with Missing Controls

April 2, 2026 · arXiv:2604.02511

cs.LG · q-bio.GN · q-bio.MN

TLDR

This paper re-analyzes the human TF Atlas, recovering robust TF-specific signatures from pooled single-cell screens despite missing internal controls.

Key contributions

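Developed a reproducible pipeline for quality control, MORF barcode demultiplexing, per-TF differential expression, and functional enrichment of pooled single-cell TF overexpression screens with missing controls.
Recovered TF-specific signatures for 59 of 61 testable TFs (versus 27 with one-vs-rest alone) by using embryoid body (EB) cells as an external baseline and removing shared artifacts through background subtraction.
Identified the strongest transcriptional remodelers (HOPX, MAZ, PAX6, FOS, FEZF2) and linked individual TFs to pathways, e.g., EGR1 to Hippo and cardiac programs and NFIC to collagen biosynthesis.
Validated per-TF effect sizes against Joung et al.'s published rankings (Spearman ρ = -0.316, p = 0.013), showing that the analysis holds up despite the missing intra-pool controls.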
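The background-subtraction idea can be illustrated with a small sketch. This is not the authors' pipeline: it assumes expression matrices are cells × genes, scores each TF against the EB baseline with a pseudocounted log2 fold change of mean expression, and treats the per-gene median across all TFs in the pool as the shared batch/transduction artifact. The function names and the choice of the median are hypothetical.

```python
import numpy as np


def log_fold_change(group: np.ndarray, baseline: np.ndarray,
                    pseudocount: float = 1.0) -> np.ndarray:
    """Per-gene log2 fold change of mean expression (cells x genes), group vs baseline."""
    return (np.log2(group.mean(axis=0) + pseudocount)
            - np.log2(baseline.mean(axis=0) + pseudocount))


def background_subtracted_effects(tf_cells: dict[str, np.ndarray],
                                  eb_cells: np.ndarray) -> dict[str, np.ndarray]:
    """Hypothetical helper: TF-vs-EB effects with the pool-wide shared component removed."""
    # 1. Compare each TF's cells with the external EB baseline.
    raw = {tf: log_fold_change(x, eb_cells) for tf, x in tf_cells.items()}
    # 2. Assumption: the per-gene median across all TFs in the pool approximates
    #    artifacts shared by every transduced cell (batch, transduction, culture).
    background = np.median(np.stack(list(raw.values())), axis=0)
    # 3. Subtract the shared component so only TF-specific deviations remain.
    return {tf: lfc - background for tf, lfc in raw.items()}
```

Because every pooled cell shares the same transduction and batch history, a response common to all TFs is removed while TF-specific deviations survive; the median rather than the mean keeps a few strong remodelers from dominating the background estimate. The trade-off is that a response genuinely shared by most TFs in the pool would also be subtracted.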

Why it matters

Public single-cell atlases are valuable, but their deposited data often lack complete metadata and controls; in the human TF Atlas, the barcode mapping omits the GFP and mCherry negative controls present in the original library. This paper shows that an external baseline plus background subtraction can still recover TF-specific signatures, so that datasets with missing controls remain usable for TF-level transcriptional and pathway analyses.

Original Abstract

Public pooled single-cell perturbation atlases are valuable resources for studying transcription factor (TF) function, but downstream re-analysis can be limited by incomplete deposited metadata and missing internal controls. Here we re-analyze the human TF Atlas dataset (GSE216481), a MORF-based pooled overexpression screen spanning 3,550 TF open reading frames and 254,519 cells, with a reproducible pipeline for quality control, MORF barcode demultiplexing, per-TF differential expression, and functional enrichment. From 77,018 cells in the pooled screen, we assign 60,997 (79.2%) to 87 TF identities. Because the deposited barcode mapping lacks the GFP and mCherry negative controls present in the original library, we use embryoid body (EB) cells as an external baseline and remove shared batch/transduction artifacts by background subtraction. This strategy recovers TF-specific signatures for 59 of 61 testable TFs, compared with 27 detected by one-vs-rest alone, showing that robust TF-level signal can be rescued despite missing intra-pool controls. HOPX, MAZ, PAX6, FOS, and FEZF2 emerge as the strongest transcriptional remodelers, while per-TF enrichment links FEZF2 to regulation of differentiation, EGR1 to Hippo and cardiac programs, FOS to focal adhesion, and NFIC to collagen biosynthesis. Condition-level analyses reveal convergent Wnt, neurogenic, EMT, and Hippo signatures, and Harmony indicates minimal confounding batch effects across pooled replicates. Our per-TF effect sizes significantly agree with Joung et al.'s published rankings (Spearman ρ = -0.316, p = 0.013; negative because lower rank indicates stronger effect). Together, these results show that the deposited TF Atlas data can support validated TF-specific transcriptional and pathway analyses when paired with principled external controls, artifact removal, and reproducible computation.
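Why the reported correlation is negative can be checked with a short sketch. It assumes published rankings use 1 for the strongest TF, computes Spearman's ρ as the Pearson correlation of average ranks (so ties are handled), and uses toy numbers rather than the paper's data; the helper names are hypothetical.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a block of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined for a constant input")
    return cov / (vx * vy) ** 0.5


effects = [5.0, 3.0, 1.0, 0.5]  # toy per-TF effect sizes (larger = stronger)
published_rank = [1, 2, 3, 4]   # toy published ranks (1 = strongest)
print(spearman_rho(effects, published_rank))  # -1.0
```

With a perfectly consistent ordering, larger effect sizes pair with smaller rank numbers and ρ = -1, so a negative ρ signals agreement rather than disagreement. `scipy.stats.spearmanr` returns the same coefficient together with a p-value.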
