Dimensionality-Aware Anomaly Detection in Learned Representations of Self-Supervised Speech Models

May 4, 2026 · arXiv:2605.02715

Sandra Arcos-Holzinger, Sarah M. Erfani, James Bailey, Sanjeev Khudanpur

eess.AS, cs.CR, cs.LG

TLDR

This paper introduces GRIDS, a framework using Local Intrinsic Dimensionality to detect anomalies and monitor performance in self-supervised speech models.

Key contributions

Introduces GRIDS, a framework using Local Intrinsic Dimensionality (LID) for analyzing S3M representations.
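Finds that LID increases under all low signal-to-noise ratio (SNR) perturbations, while at high SNR benign noise converges toward the clean profile and adversarial inputs retain elevated early-layer LID.
Shows that LID elevation co-occurs with increased Word Error Rate (WER) in downstream automatic speech recognition (ASR).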
Enables transcript-free anomaly detection in S3Ms using layer-wise LID features, achieving high AUROC (0.78-1.00).
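To make the LID quantity in these contributions concrete, here is a minimal sketch of one common LID estimator, the maximum-likelihood estimate computed from k-nearest-neighbor distances. The summary does not say which estimator GRIDS uses, so the estimator choice, the function name lid_mle, the value of k, and the random stand-in embeddings are all assumptions made for illustration.

```python
import numpy as np

def lid_mle(query, reference, k=20):
    """Maximum-likelihood LID estimate at `query` from its k nearest
    neighbours in `reference` (rows are points). Illustrative only."""
    # Euclidean distance from the query to every reference point.
    dists = np.linalg.norm(reference - query, axis=1)
    # Drop zero distances (exact copies of the query), keep the k smallest.
    dists = np.sort(dists[dists > 0])[:k]
    # LID = -1 / mean(log(r_i / r_k)), where r_k is the largest kept distance.
    return -1.0 / np.mean(np.log(dists / dists[-1]))

# Hypothetical stand-in for one layer's frame embeddings (not real S3M output).
rng = np.random.default_rng(0)
layer_embeddings = rng.normal(size=(2000, 768))
print(lid_mle(layer_embeddings[0], layer_embeddings[1:], k=20))
```

Applying such an estimator to each layer's representations would yield one LID value per layer, which is the kind of layer-wise profile the contributions above refer to.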

Why it matters

GRIDS offers a way to monitor the robustness of self-supervised speech models without transcripts. Because it tracks local geometric changes in the learned representations, and those changes co-occur with higher WER, it could help flag degraded or adversarial inputs in deployed systems where reference transcripts are unavailable.

Original Abstract

Self-supervised speech models (S3Ms) achieve strong downstream performance, yet their learned representations remain poorly understood under natural and adversarial perturbations. Prior studies rely on representation similarity or global dimensionality, offering limited visibility into local geometric changes. We ask: how do perturbations deform local geometry, and do these shifts track downstream automatic speech recognition (ASR) degradation? To address this, we present GRIDS, a framework using Local Intrinsic Dimensionality (LID) across layer-wise representations in WavLM and wav2vec 2.0. We find that LID increases for all low signal-to-noise ratio (SNR) perturbations and diverges at high SNR: benign noise converges toward the clean profile, while adversarial inputs retain early-layer LID elevation. We show LID elevation co-occurs with increased WER, and that layer-wise LID features enable anomaly detection (AUROC 0.78-1.00), opening the door to transcript-free monitoring in S3Ms.
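As a rough illustration of how an AUROC such as the one quoted in the abstract is computed, the sketch below scores each utterance by the mean of its layer-wise LID values and measures how well that score separates clean from perturbed inputs. The abstract does not describe the actual detector, so the mean-over-layers score, the function name auroc, and the small LID profiles are hypothetical and do not come from the paper.

```python
import numpy as np

def auroc(scores_clean, scores_anomalous):
    """AUROC as the probability that a random anomalous score exceeds
    a random clean score, with ties counted as half."""
    clean = np.asarray(scores_clean, dtype=float)
    anom = np.asarray(scores_anomalous, dtype=float)
    # Compare every (anomalous, clean) pair.
    greater = (anom[:, None] > clean[None, :]).sum()
    ties = (anom[:, None] == clean[None, :]).sum()
    return (greater + 0.5 * ties) / (anom.size * clean.size)

# Hypothetical layer-wise LID profiles: rows are utterances, columns are layers.
clean_profiles = np.array([[10.0, 12.0, 11.0], [9.5, 11.5, 10.5]])
perturbed_profiles = np.array([[14.0, 13.0, 11.5], [13.5, 12.5, 11.0]])

# Assumed anomaly score: mean LID across layers (higher means more anomalous).
print(auroc(clean_profiles.mean(axis=1), perturbed_profiles.mean(axis=1)))
```

An AUROC of 0.5 corresponds to chance-level separation and 1.0 to perfect separation, which gives a scale for reading the reported 0.78-1.00 range.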
