ArXiv TLDR

Synthesizing real-world distributions from high-dimensional Gaussian Noise with Fully Connected Neural Network

2604.09091

Joanna Komorniczak

cs.LG

TLDR

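A new method based on a fully connected neural network (FCNN) transforms Gaussian noise into synthetic tabular data. It outperforms state-of-the-art generative methods and reaches reference MMD scores orders of magnitude faster than modern deep learning approaches.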

Key contributions

  • Introduces a time-efficient synthetic data generation method using a Fully Connected Neural Network.
  • Employs a randomized loss function that transforms Gaussian noise so that it approximates the distribution of a target real-world dataset.
  • Outperforms state-of-the-art generative methods on 25 diverse tabular datasets and reaches reference MMD scores orders of magnitude faster than modern deep learning solutions.
  • Uses PCA for dimensionality reduction, which further enhances data privacy, can improve classification quality, and reduces time and memory complexity.
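MMD (maximum mean discrepancy), mentioned in the contributions above, measures how far apart two samples are as distributions. The summary does not state which kernel or bandwidth the paper uses. The minimal sketch below therefore assumes a Gaussian (RBF) kernel with a hypothetical bandwidth `sigma=1.0` and computes the standard biased estimate of squared MMD in plain Python. The function names are illustrative. Lower values mean the synthetic sample is harder to distinguish from the real one.

```python
import math
import random

# Assumption: RBF kernel with sigma=1.0; the paper's kernel settings are not given here.


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel: exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2_biased(xs, ys, sigma=1.0):
    """Biased estimate of squared MMD between sample lists xs and ys.

    MMD^2 = mean k(x, x') + mean k(y, y') - 2 * mean k(x, y).
    """
    m, n = len(xs), len(ys)
    k_xx = sum(rbf_kernel(a, b, sigma) for a in xs for b in xs) / (m * m)
    k_yy = sum(rbf_kernel(a, b, sigma) for a in ys for b in ys) / (n * n)
    k_xy = sum(rbf_kernel(a, b, sigma) for a in xs for b in ys) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy


if __name__ == "__main__":
    rng = random.Random(0)
    real = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
    similar = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
    shifted = [[rng.gauss(3.0, 1.0), rng.gauss(3.0, 1.0)] for _ in range(200)]
    print("similar:", mmd2_biased(real, similar))  # close to 0
    print("shifted:", mmd2_biased(real, shifted))  # much larger
```

The estimator needs O(n²) kernel evaluations, so larger datasets would call for subsampling or vectorized array code.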

Why it matters

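This paper addresses the need for synthetic data that is both cheap to generate and close to the original distribution. Its FCNN-based approach reaches reference MMD scores orders of magnitude faster than modern deep learning generators. That matters for the main uses of synthetic data: data augmentation, privacy preservation, and method assessment on fully synthetic data. Because it also outperforms state-of-the-art generative methods on 25 tabular datasets, it is a practical option for real-world machine learning pipelines.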

Original Abstract

The use of synthetic data in machine learning applications and research offers many benefits, including performance improvements through data augmentation, privacy preservation of original samples, and reliable method assessment with fully synthetic data. This work proposes a time-efficient synthetic data generation method based on a fully connected neural network and a randomized loss function that transforms a random Gaussian distribution to approximate a target real-world dataset. The experiments conducted on 25 diverse tabular real-world datasets confirm that the proposed solution surpasses the state-of-the-art generative methods and achieves reference MMD scores orders of magnitude faster than modern deep learning solutions. The experiments involved analyzing distributional similarity, assessing the impact on classification quality, and using PCA for dimensionality reduction, which further enhances data privacy and can boost classification quality while reducing time and memory complexity.
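The abstract describes the core mechanism only at a high level. A fully connected network turns Gaussian noise into samples and is fitted with a randomized loss so that its outputs approximate a target dataset.

The sketch below illustrates that general idea under stated assumptions. It is not the paper's implementation. Because the paper's randomized loss is not specified here, it swaps in a sliced Wasserstein loss along one random projection direction per training step. The network size, learning rate, step count, and all names (`TinyFCNN`, `sliced_loss_grad`, `train`) are hypothetical. The code uses only the Python standard library, with hand-written backpropagation so that it stays self-contained.

```python
import math
import random

# Illustrative sketch: a sliced Wasserstein loss stands in for the paper's unspecified randomized loss.


def random_direction(dim, rng):
    """Draw a unit vector uniformly on the sphere (normalized Gaussian)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def sliced_loss_grad(gen, real, theta):
    """Squared 1D Wasserstein distance between projections of two equal-size
    batches onto theta, plus its gradient w.r.t. each generated sample."""
    b = len(gen)
    p = [sum(t * v for t, v in zip(theta, x)) for x in gen]
    q = sorted(sum(t * v for t, v in zip(theta, y)) for y in real)
    order = sorted(range(b), key=lambda i: p[i])  # rank matching = optimal 1D transport
    loss, grads = 0.0, [None] * b
    for rank, i in enumerate(order):
        diff = p[i] - q[rank]
        loss += diff * diff / b
        grads[i] = [2.0 * diff * t / b for t in theta]
    return loss, grads


class TinyFCNN:
    """Hypothetical generator: x = W2 @ tanh(W1 @ z + b1) + b2."""

    def __init__(self, noise_dim, hidden, out_dim, rng):
        self.w1 = [[rng.gauss(0.0, 1.0 / math.sqrt(noise_dim)) for _ in range(noise_dim)]
                   for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [[rng.gauss(0.0, 1.0 / math.sqrt(hidden)) for _ in range(hidden)]
                   for _ in range(out_dim)]
        self.b2 = [0.0] * out_dim

    def forward(self, z):
        h = [math.tanh(sum(w * v for w, v in zip(row, z)) + b)
             for row, b in zip(self.w1, self.b1)]
        x = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(self.w2, self.b2)]
        return h, x

    def sample(self, n, rng):
        d = len(self.w1[0])
        return [self.forward([rng.gauss(0.0, 1.0) for _ in range(d)])[1] for _ in range(n)]

    def step(self, zs, real, theta, lr):
        """One SGD step on the sliced loss, with hand-written backpropagation."""
        cache = [self.forward(z) for z in zs]
        loss, gx = sliced_loss_grad([x for _, x in cache], real, theta)
        gw1 = [[0.0] * len(row) for row in self.w1]
        gb1 = [0.0] * len(self.b1)
        gw2 = [[0.0] * len(row) for row in self.w2]
        gb2 = [0.0] * len(self.b2)
        for z, (h, _), g in zip(zs, cache, gx):
            for o, go in enumerate(g):  # output layer
                gb2[o] += go
                for j, hj in enumerate(h):
                    gw2[o][j] += go * hj
            for j, hj in enumerate(h):  # hidden layer: chain rule through tanh
                da = sum(self.w2[o][j] * g[o] for o in range(len(g))) * (1.0 - hj * hj)
                gb1[j] += da
                for k, zk in enumerate(z):
                    gw1[j][k] += da * zk
        for params, grads in ((self.w1, gw1), (self.w2, gw2)):
            for row, grow in zip(params, grads):
                for j in range(len(row)):
                    row[j] -= lr * grow[j]
        for vec, gvec in ((self.b1, gb1), (self.b2, gb2)):
            for j in range(len(vec)):
                vec[j] -= lr * gvec[j]
        return loss


def train(real_data, noise_dim=4, hidden=8, batch=32, steps=400, lr=0.05, seed=0):
    """Each step: fresh Gaussian noise, a random real batch, a random projection."""
    rng = random.Random(seed)
    net = TinyFCNN(noise_dim, hidden, len(real_data[0]), rng)
    for _ in range(steps):
        zs = [[rng.gauss(0.0, 1.0) for _ in range(noise_dim)] for _ in range(batch)]
        real = rng.sample(real_data, batch)
        net.step(zs, real, random_direction(len(real_data[0]), rng), lr)
    return net, rng


if __name__ == "__main__":
    data_rng = random.Random(42)
    # Toy stand-in for a tabular dataset: two shifted, correlated features.
    data = []
    for _ in range(500):
        a = data_rng.gauss(0.0, 1.0)
        data.append([3.0 + a, -2.0 + 0.5 * a + 0.3 * data_rng.gauss(0.0, 1.0)])
    net, rng = train(data)
    synth = net.sample(500, rng)
    print("synthetic feature means:", [sum(c) / len(synth) for c in zip(*synth)])
```

Sorting both projections and pairing them by rank gives the exact optimal transport plan in one dimension, which keeps this loss cheap to compute. A practical version would use a tensor library with automatic differentiation and larger batches.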
