ArXiv TLDR

PHALAR: Phasors for Learned Musical Audio Representations

arXiv:2605.03929

Davide Marincione, Michele Mancusi, Giorgio Strano, Luca Cerovaz, Donato Crisostomi + 2 more

cs.SD, cs.AI, cs.LG, eess.SP

TLDR

PHALAR is a novel contrastive framework for musical stem retrieval, achieving state-of-the-art accuracy with fewer parameters and faster training.

Key contributions

  • Introduces PHALAR, a contrastive framework for musical stem retrieval.
  • Achieves a relative accuracy increase of up to ≈70% over the prior state-of-the-art, with fewer than 50% of the parameters and a 7× training speedup.
  • Employs Learned Spectral Pooling and a complex-valued head for pitch/phase equivariance.
  • Sets new state-of-the-art in stem retrieval across MoisesDB, Slakh, and ChocoChorales.
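
To make the phase-equivariance item concrete: any complex-linear map commutes with a global phase rotation, so rotating the input by e^{iθ} rotates the output by the same angle and leaves output magnitudes unchanged. The sketch below checks this property with NumPy. The function name `complex_head`, the matrix `W`, and the dimensions are illustrative assumptions; the source does not describe the actual structure of PHALAR's complex-valued head.

```python
import numpy as np

def complex_head(z, W):
    """Hypothetical complex-valued linear head: a plain complex matrix product."""
    return W @ z

rng = np.random.default_rng(0)
# Assumed sizes: 8 complex input features mapped to 4 complex outputs.
W = rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))
z = rng.standard_normal(8) + 1j * rng.standard_normal(8)

theta = 0.7                      # arbitrary global phase shift, in radians
rot = np.exp(1j * theta)

out_then_rotate = rot * complex_head(z, W)   # apply head, then rotate
rotate_then_out = complex_head(rot * z, W)   # rotate input, then apply head

# Equivariance: both orders give the same result.
equivariant = np.allclose(out_then_rotate, rotate_then_out)
# Consequence: output magnitudes ignore the global phase.
mag_invariant = np.allclose(np.abs(complex_head(z, W)), np.abs(rotate_then_out))
print(equivariant, mag_invariant)
```

The property holds only while every operation is complex-linear; a nonlinearity applied separately to real and imaginary parts would generally break it.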

Why it matters

PHALAR improves musical stem retrieval by addressing a limitation of prior models, which discard temporal information, and it does so with higher accuracy, fewer parameters, and faster training. Its combination of Learned Spectral Pooling and a complex-valued head sets a new state-of-the-art on MoisesDB, Slakh, and ChocoChorales, making it relevant for music production and audio analysis. Zero-shot beat tracking and linear chord probing indicate that the model also captures musical structure beyond its primary task.

Original Abstract

Stem retrieval, the task of matching missing stems to a given audio submix, is a key challenge currently limited by models that discard temporal information. We introduce PHALAR, a contrastive framework achieving a relative accuracy increase of up to $\approx 70\%$ over the state-of-the-art while requiring $<50\%$ of the parameters and a 7$\times$ training speedup. By utilizing a Learned Spectral Pooling layer and a complex-valued head, PHALAR enforces pitch-equivariant and phase-equivariant biases. PHALAR establishes new retrieval state-of-the-art across MoisesDB, Slakh, and ChocoChorales, correlating significantly higher with human coherence judgment than semantic baselines. Finally, zero-shot beat tracking and linear chord probing confirm that PHALAR captures robust musical structures beyond the retrieval task.
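
As a rough illustration of how a contrastive setup can be used for stem retrieval, the sketch below scores submix embeddings against candidate stem embeddings with cosine similarity, computes an InfoNCE loss, and measures top-1 retrieval accuracy. InfoNCE is one common contrastive objective and is an assumption here, since the abstract does not name PHALAR's loss. The embeddings are random toy vectors, and the names `info_nce`, `top1_accuracy`, and the temperature value are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def info_nce(mix_emb, stem_emb, temperature=0.1):
    """InfoNCE loss where row i of mix_emb is the positive for row i of stem_emb."""
    logits = l2_normalize(mix_emb) @ l2_normalize(stem_emb).T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

def top1_accuracy(mix_emb, stem_emb):
    """Fraction of submixes whose most similar candidate is the true stem."""
    sims = l2_normalize(mix_emb) @ l2_normalize(stem_emb).T
    return float(np.mean(sims.argmax(axis=1) == np.arange(len(sims))))

rng = np.random.default_rng(0)
stems = rng.standard_normal((16, 32))                     # toy stem embeddings
mixes_aligned = stems + 0.05 * rng.standard_normal((16, 32))  # near their stems
mixes_random = rng.standard_normal((16, 32))              # unrelated to the stems

loss_aligned = info_nce(mixes_aligned, stems)
loss_random = info_nce(mixes_random, stems)
acc_aligned = top1_accuracy(mixes_aligned, stems)
print(loss_aligned < loss_random, acc_aligned)
```

Training pushes the loss toward the aligned case, so that at retrieval time the missing stem can be found by ranking candidates by similarity to the submix embedding.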
