Create artificial data with Gretel Synthetics and Google Colaboratory

Alexander Watson
Gretel.ai Engineering and Data Science
2 min read · Mar 31, 2020

In this post we’ll use Gretel Synthetics and Google Colaboratory’s free GPUs to train a machine learning model that automatically generates fake, anonymized data with differential privacy guarantees.

Today we will walk through some of the new features in version 0.6.0 of gretel_synthetics, Gretel’s open-source synthetic data library, including:

  • Google SentencePiece support for unsupervised tokenization, with configurable vocabulary size & character coverage.
  • smart_open support to load datasets from AWS, GCP, and Azure.
  • The ability to launch directly into Colaboratory.
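To make these options concrete, here is a minimal sketch of how the tokenizer and data-loading settings might be passed to the library. It assumes the `LocalConfig` and `train_rnn` entry points and the `vocab_size`, `character_coverage`, `dp`, `input_data_path`, and `checkpoint_dir` parameter names from the gretel_synthetics 0.6.x release; check them against the version you install. The bucket path and all values are hypothetical placeholders, not tuned recommendations.

```python
from pathlib import Path

# Hypothetical settings for illustration only.
settings = {
    # Placeholder remote path; smart_open is what allows a cloud URL here.
    "input_data_path": "s3://example-bucket/rides.csv",
    "checkpoint_dir": (Path.cwd() / "checkpoints").as_posix(),
    # SentencePiece tokenizer options named in the feature list above.
    "vocab_size": 20000,        # target size of the learned vocabulary
    "character_coverage": 1.0,  # fraction of input characters the tokenizer covers
    # Train with differential privacy enabled.
    "dp": True,
    "epochs": 15,
}


def train(settings):
    """Build a config from the settings and train a model.

    The imports are deferred so the settings can be inspected without
    the library installed. The module paths and names below are
    assumptions based on the 0.6.x release.
    """
    from gretel_synthetics.config import LocalConfig
    from gretel_synthetics.train import train_rnn

    config = LocalConfig(**settings)
    train_rnn(config)
```

In a Colab notebook with the library installed, calling `train(settings)` would start training. Lowering `character_coverage` below 1.0 is typically only useful for datasets with very large character sets.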

Check out the walk-through screencast below, or click the Colab link to get started creating your own synthetic dataset!

Try out Gretel-Synthetics in Google Colaboratory
https://vimeo.com/400326654

For more on anonymizing precise location data, check out our previous deep dive on scooter ride-share data, where we discovered privacy concerns in public ride-share feeds and partnered with Uber to fix them.

Alexander Watson
Gretel.ai Engineering and Data Science

Co-Founder at Gretel.ai, previously GM at AWS. Love artificial intelligence and security. @alexwatson405