ArXiv TLDR

A Unified HI Rotation Curve Corpus for Computational Astrophysics: 438 Galaxies from SPARC, THINGS, LITTLE THINGS, and WALLABY DR2

2604.13489

David C. Flynn

astro-ph.GA, astro-ph.IM

TLDR

This paper introduces a unified corpus of 8,963 spatially resolved HI rotation curve measurements across 423 galaxies (438 catalog entries, including 15 metadata-only THINGS galaxies), drawn from four major surveys.

Key contributions

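  • Presents a unified corpus of 8,963 HI rotation curve measurements across 423 galaxies (438 catalog entries) from four major surveys: SPARC, THINGS, LITTLE THINGS, and WALLABY DR2.
  • Data is distributed as a single structured JSON file and a 438-row flat CSV, with data-quality annotations.
  • Designed for both traditional numerical analysis and Large Language Model (LLM) retrieval-augmented generation (RAG) pipelines.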

Why it matters

This corpus brings four separate HI rotation curve datasets into a single format with consistent units (radii in kiloparsecs, velocities in km/s), which simplifies access for researchers. It supports both traditional numerical analysis and LLM-based workflows for studying galaxy dynamics and dark matter distribution.

Original Abstract

We present a unified corpus of 8,963 spatially resolved HI rotation curve measurements across 423 galaxies (438 total catalog entries including 15 metadata-only THINGS galaxies), drawn from four major surveys: SPARC (175), THINGS (34), LITTLE THINGS (26), and WALLABY DR2 (203). The corpus is distributed as a single structured JSON file with nested per-ring kinematic data, survey metadata, column definitions, and data-quality annotations, accompanied by a 438-row flat CSV for catalog-level filtering. All radii are in kiloparsecs, all velocities in km/s. Kinematic parameters have been verified against scanned primary tables. A two-tier quality system distinguishes hand-curated rotation curves with per-point uncertainties (Tier 1) from automated pipeline products (Tier 2). The corpus was designed for both traditional numerical analysis and Large Language Model retrieval-augmented generation (RAG) pipelines. Three worked examples demonstrate single-galaxy rotation curve plotting, multi-component baryonic analysis, and corpus-level parameter-space exploration, each requiring fewer than 15 lines of Python. The corpus is publicly available at Zenodo (DOI: 10.5281/zenodo.19563417) under CC BY 4.0.
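
The abstract describes a nested JSON layout with per-ring kinematic data and a two-tier quality flag. As a rough illustration of how such a structure might be traversed, here is a minimal sketch. The key names (`galaxies`, `rings`, `r_kpc`, `v_kms`, `tier`), the galaxy names, and every number in the sample are hypothetical placeholders, not the published schema or real measurements; consult the column definitions shipped with the corpus for the actual field names.

```python
import json

# Hypothetical miniature corpus. The key names are illustrative assumptions,
# not the published schema, and all values are made-up placeholders.
SAMPLE = """
{
  "galaxies": [
    {"name": "GalaxyA", "survey": "SPARC", "tier": 1,
     "rings": [{"r_kpc": 1.0, "v_kms": 50.0, "e_v_kms": 3.0},
               {"r_kpc": 2.0, "v_kms": 80.0, "e_v_kms": 4.0}]},
    {"name": "GalaxyB", "survey": "WALLABY DR2", "tier": 2,
     "rings": [{"r_kpc": 3.0, "v_kms": 120.0}]},
    {"name": "GalaxyC", "survey": "THINGS", "tier": 1, "rings": []}
  ]
}
"""


def summarize(corpus, tier=None):
    """Return (name, n_rings, v_max) for each galaxy with at least one ring.

    If `tier` is given, only galaxies with that quality tier are kept.
    """
    rows = []
    for gal in corpus["galaxies"]:
        if tier is not None and gal["tier"] != tier:
            continue
        rings = gal["rings"]
        if not rings:  # metadata-only entry: no kinematic data to summarize
            continue
        v_max = max(ring["v_kms"] for ring in rings)
        rows.append((gal["name"], len(rings), v_max))
    return rows


corpus = json.loads(SAMPLE)
all_rows = summarize(corpus)             # every galaxy that has ring data
tier1_rows = summarize(corpus, tier=1)   # hand-curated (Tier 1) only
print(all_rows)
print(tier1_rows)
```

With the real file, `json.loads(SAMPLE)` would be replaced by `json.load` on the downloaded corpus, and the keys adjusted to match its documented schema. Skipping entries with no rings mirrors the abstract's distinction between the 423 galaxies with measurements and the 15 metadata-only THINGS entries.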
