Environmental Sound Deepfake Detection Using Deep-Learning Framework

April 21, 2026 | arXiv:2604.19652

Lam Pham, Khoi Vu, Dat Tran, Phat Lam, Vu Nguyen + 4 more

cs.SD, cs.AI

TLDR

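This paper proposes a deep-learning framework for detecting environmental sound deepfakes. Its best model, WavLM fine-tuned with a three-stage training strategy, reaches 0.98 Accuracy on the EnvSDD Test subset and 0.88 Accuracy on ESDD-Challenge-TestSet.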

Key contributions

Proposes a deep-learning framework for environmental sound deepfake detection (ESDD).
Shows that detecting deepfake sound scenes and events should be treated as distinct tasks.
Demonstrates that fine-tuning pre-trained models (e.g., WavLM) outperforms training from scratch.
Achieves 0.98 Accuracy on the EnvSDD Test subset and 0.88 Accuracy on ESDD-Challenge-TestSet with WavLM fine-tuned using a three-stage training strategy.

Why it matters

Deepfake audio is a security and authenticity threat, and environmental sound deepfakes (fake sound scenes and sound events) need dedicated detection. This work provides a deep-learning detector and identifies two strategies that help: treating scene and event detection as separate tasks, and fine-tuning a pre-trained model rather than training from scratch.

Original Abstract

In this paper, we propose a deep-learning framework for environmental sound deepfake detection (ESDD) -- the task of identifying whether the sound scene and sound event in an input audio recording are fake or not. To this end, we conducted extensive experiments to explore how individual spectrograms, a wide range of network architectures and pre-trained models, and ensembles of spectrograms or network architectures affect ESDD task performance. The experimental results on the benchmark datasets EnvSDD and ESDD-Challenge-TestSet indicate that detecting deepfake audio of sound scenes and detecting deepfake audio of sound events should be considered as individual tasks. We also show that finetuning a pre-trained model is more effective than training a model from scratch for the ESDD task. Our best model, which was finetuned from the pre-trained WavLM model with the proposed three-stage training strategy, achieves an Accuracy of 0.98, F1 Score of 0.95, and AUC of 0.99 on the EnvSDD Test subset, and an Accuracy of 0.88, F1 Score of 0.77, and AUC of 0.92 on the ESDD-Challenge-TestSet dataset.
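
The abstract reports three metrics (Accuracy, F1 Score, AUC) for a binary real-versus-fake decision. As a reading aid, here is a minimal sketch of how these metrics are commonly computed from detector scores. It is not the authors' evaluation code: the label convention (1 = fake, 0 = real), the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of clips whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive clip scores higher
    than a random negative clip (ties count as half a win)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (assumed): 1 = fake, 0 = real; scores are "fake" probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(f"Accuracy: {accuracy(y_true, y_pred):.2f}")
print(f"F1 Score: {f1_score(y_true, y_pred):.2f}")
print(f"AUC:      {roc_auc(y_true, scores):.2f}")
```

Accuracy and F1 depend on the chosen threshold, while AUC is computed from the raw scores. On an imbalanced test set, Accuracy can stay high while F1 is noticeably lower, so the three numbers are best read together.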
