NoGAN, a New Generation of Synthetic Data Algorithms

I introduce NoGAN, a new alternative to standard data synthetization methods. It is designed to run several orders of magnitude faster than training a generative adversarial network (GAN). In addition, the quality of the generated data is far superior to that of almost all other products available on the market.

Many evaluation metrics used to measure faithfulness have critical flaws: because they rely on low-dimensional indicators, they sometimes rank a replication as excellent when it is actually a failure. I fix this problem by using the full multivariate empirical cumulative distribution function (ECDF). As an additional benefit, for both synthetization and evaluation, all types of features (categorical, ordinal, or continuous) are processed with a single formula, regardless of type, even in the presence of missing values.
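To illustrate why a multivariate ECDF catches failures that low-dimensional indicators miss, here is a minimal Python sketch. It is not the NoGAN evaluation metric itself: the function names (`multivariate_ecdf`, `ecdf_distance`), the choice of query points, and the Gaussian toy data are all assumptions made for illustration. It also assumes purely numeric features with no missing values, so categorical features would first need a numeric encoding.

```python
import numpy as np

def multivariate_ecdf(data, points):
    """Fraction of rows in `data` that are <= each query point in every coordinate."""
    # data: (n, d), points: (m, d) -> result: (m,)
    below = data[None, :, :] <= points[:, None, :]   # shape (m, n, d)
    return below.all(axis=2).mean(axis=1)

def ecdf_distance(real, synth, n_points=500, seed=0):
    """Largest ECDF gap over query points sampled from the pooled data (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([real, synth])
    idx = rng.choice(len(pooled), size=min(n_points, len(pooled)), replace=False)
    points = pooled[idx]
    gap = np.abs(multivariate_ecdf(real, points) - multivariate_ecdf(synth, points))
    return gap.max()

# Toy data (an assumption for this sketch): two strongly correlated features.
rng = np.random.default_rng(42)
cov = [[1.0, 0.8], [0.8, 1.0]]
real = rng.multivariate_normal([0, 0], cov, size=1000)

# A faithful synthetization: a fresh sample from the same distribution.
good = rng.multivariate_normal([0, 0], cov, size=1000)

# A failed synthetization: each column shuffled independently, so every
# one-dimensional marginal is identical to the real data, but the
# correlation between the features is destroyed.
bad = np.column_stack([rng.permutation(real[:, 0]), rng.permutation(real[:, 1])])

d_good = ecdf_distance(real, good)
d_bad = ecdf_distance(real, bad)
print(f"faithful sample: {d_good:.3f}, shuffled columns: {d_bad:.3f}")
```

In this toy setup, any metric that compares the features one at a time would rate the shuffled dataset as a perfect match, while the multivariate ECDF distance reports a clearly larger gap for it than for the faithful sample.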

In a real-life case study involving tabular data, the synthetic data was generated in less than 5 seconds, versus 10 minutes with a GAN. It also produced much better results, as verified via cross-validation. Because the implementation is so fast, the hyperparameters can be fine-tuned automatically and efficiently. I also discuss next steps to further improve the speed and the faithfulness of the generated data, as well as applications other than synthetization.

Session Summary

Dr. Vincent Granville
