What is synthetic data? And how can it accelerate public policy research?

Photo by Markus Spiske on Unsplash

The solution we are proposing is synthetic data, but perhaps not as you currently know it. (And if you’re like most people, you don’t currently know it at all.)

Synthetic data is like an artist’s impression of the information

Synthetic data is like taking some original data and smudging it to look more like an artist’s impression. It’s a new version of a data set that is generated at random, but made to follow the structure and some of the patterns of the original data set.

We’re making it easy

As well as getting the word out, we have released a Python notebook that guides a user through generating low-fidelity synthetic data. It is designed to be self-explanatory for anyone who works with data, even if they have limited experience with Python. We wrote it mainly because existing tools, methods and tutorials for generating synthetic data focus on high-fidelity synthetic data. By writing a guided script that generates only low-fidelity, lower-risk synthetic data, we keep things simpler and safer for researchers and data owners.
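To make the idea concrete, here is a minimal sketch of what generating low-fidelity synthetic data could look like. It is not the code from our notebook. It assumes that "low-fidelity" means keeping the column structure and roughly the per-column value frequencies of the original, while deliberately not preserving relationships between columns. The function name `make_low_fidelity_synthetic` and the toy records are hypothetical, invented purely for illustration.

```python
import random


def make_low_fidelity_synthetic(rows, n_rows, seed=None):
    """Return n_rows synthetic records with the same columns as `rows`.

    Assumption for this sketch: each column is sampled independently from
    the values observed in that column. Per-column frequencies are roughly
    preserved, but links between columns are deliberately broken.
    """
    if not rows:
        raise ValueError("need at least one original record")
    rng = random.Random(seed)
    columns = list(rows[0].keys())
    # Build a pool of observed values for every column.
    pools = {col: [row[col] for row in rows] for col in columns}
    # Draw each cell independently, so no synthetic row is copied as a unit.
    return [
        {col: rng.choice(pools[col]) for col in columns}
        for _ in range(n_rows)
    ]


# Hypothetical toy data, invented for illustration only.
original = [
    {"age": 34, "region": "North", "employed": True},
    {"age": 51, "region": "South", "employed": False},
    {"age": 27, "region": "North", "employed": True},
    {"age": 45, "region": "East", "employed": True},
]

synthetic = make_low_fidelity_synthetic(original, n_rows=6, seed=0)
for record in synthetic:
    print(record)
```

Because every cell is drawn on its own, a synthetic row can still match an original row by chance, and individual values are reused as they are. A real workflow would likely also coarsen or perturb rare or identifying values before sharing anything. That step is left out here to keep the sketch short.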

Designing our world for who and how we are: brought to you by the Behavioural Insights Team — The Nudge Unit

BIT

We are The Behavioural Insights Team (BIT), one of the world’s leading behavioural science organisations, working around the world to improve people’s lives.