Amazon helps launch workshop on synthetic data generation

Workshop at ICLR 2021 unites communities investigating synthetic data generation to improve machine learning and protect privacy.

By Sergul Aydore, Krishnaram Kenthapadi

May 3, 2021

We are excited to announce the first Workshop on Synthetic Data Generation, to be held virtually at ICLR 2021 on May 7, 2021.

Synthetic data is a powerful solution to two different problems: data limitations and privacy risks. In cases of limited labeled data, synthetic data can be used to augment training data, mitigating overfitting. In the case of protecting privacy, data curators can share synthetic data instead of real data in a manner that both protects the privacy of users and preserves the utility of the original data.

Although these two scenarios share similar technical challenges, such as quality and fairness, they are often studied separately. Our workshop aims to deepen our understanding of the challenges of synthetic data generation in both scenarios.

Two Amazon scientists, applied scientist Sergul Aydore *(left)* and principal scientist Krishnaram Kenthapadi, are among the organizers of the first Workshop on Synthetic Data Generation at this year's ICLR.

The workshop is organized by a team of researchers from academia and industry with expertise in topics such as privacy, fairness, healthcare, and robustness in machine learning. The team includes two Amazon scientists: Sergul Aydore, an applied scientist on the Amazon Web Services external-security-services team, and Krishnaram Kenthapadi, a principal applied scientist on the Amazon Web Services machine learning team. The other organizers are Haipeng Chen, from Harvard University; Edward Choi, from the Korea Advanced Institute of Science and Technology (KAIST); Jamie Hayes, from Google DeepMind; Mario Fritz, from the CISPA Helmholtz Center for Information Security; and Rachel Cummings, from Columbia University.

Our workshop includes invited talks, contributed talks, poster sessions, and a panel discussion, and it involves a diverse group of researchers and practitioners. We are proud to host the following seven invited talks (in order of appearance):

Can machine learning revolutionize healthcare? Synthetic data may be the answer, Mihaela van der Schaar, University of Cambridge, the Alan Turing Institute, UCLA
Generative models for image synthesis, Jan Kautz, NVIDIA
Differentially private synthetic data generations using generative adversarial networks, Jinsung Yoon, Google Cloud AI
Towards financial synthetic data, Manuela M. Veloso, J. P. Morgan, CMU
Bias and generalization of deep generative models, Stefano Ermon, Stanford University
Generative modeling for music generation, Sander Dieleman, DeepMind
Ethical considerations of generative AI, Emily Denton, Google’s Ethical AI team

The workshop features 24 accepted papers, each of which will have an individual breakout session for a poster presentation. Among these papers, the following seven will have oral presentations:

Synthetic data for model selection, Matan Fintz (Amazon); Alon Shoshan (Technion); Nadav Bhonker (Amazon); Igor Kviatkovsky (Amazon); Gérard Medioni (USC)
Ensembles of GANs for synthetic training data generation, Gabriel Eilertsen (Linköping University); Apostolia Tsirikoglou (Linköping University); Claes Lundström (Linköping University); Jonas Unger (Linköping University)
Few-shot learning via tensor hallucination, Michalis M. L. Lazarou (Imperial College London); Tania Stathaki (Imperial College London); Yannis Avrithis (Inria)
Leveraging public data for practical private query release, Terrance Liu (Carnegie Mellon University); Giuseppe Vietri (University of Minnesota); Thomas Steinke (Google); Jonathan Ullman (Northeastern University); Steven Wu (Carnegie Mellon University)
FFPDG: Fast, fair and private data generation, Weijie Xu (Amazon); Jinjin Zhao (Amazon); Francis Iannacci (Amazon); Bo Wang (Amazon)
Overcoming barriers to data sharing with medical image generation: A comprehensive evaluation, August DuMont Schütte (Max Planck Institute for Intelligent Systems); Jürgen Hetzel (University Hospital of Tübingen); Sergios Gatidis (University of Tübingen); Tobias Hepp (Max Planck Institute for Intelligent Systems); Benedikt Dietz (ETH Zurich); Stefan Bauer (Max Planck Institute); Patrick Schwab (ETH Zurich)
Imperfect imaGANation: Implications of GANs exacerbating biases on facial data, Niharika Jain (Arizona State University); Alberto Olmo (Arizona State University); Sailik Sengupta (Arizona State University); Lydia Manikonda (Rensselaer Polytechnic Institute); Subbarao Kambhampati (Arizona State University)

We will conclude the workshop with a panel discussion with the invited speakers and an award ceremony.

About the Authors

Sergul Aydore

Sergul Aydore is an applied scientist with Amazon Web Services.

Krishnaram Kenthapadi

Krishnaram Kenthapadi is a principal scientist with Amazon Web Services.
