The one with the dragons

Johan Müllern-Aspegren
AIE Nordics
4 min read · Oct 24, 2022

This week much of the team spread out over Europe for vacation. As much as we miss them in the office, we are very happy to receive pictures of relaxed colleagues in the sun.

As for those of us who stayed behind, we keep working with startups and global enterprises to uncover potential business opportunities at the intersection of emerging tech, shifts in consumer behaviour and demographic changes.

… and data, data, data …

Data and the Reign of Dragons

We have all heard it: data is the new orange, the new oil, the new gold. And, as would be expected, a lot of people will be just as happy to tell you that it is in fact not the new gold :-)

Fig 1: A corporate dragon keeping a watchful eye on that sweet, sweet data

Anecdotal evidence from the health care industry supports the former notion: data is gold, so better pile it up in a cosy and safe dungeon.

As a direct effect, the situation for data-driven innovation in public health is a very frustrating one. AI- and ML-based applications are already capable of helping to relieve pain and identify illnesses. They can also improve planning, forecast which treatments will have the best effect and much, much more, potentially addressing many of the dire challenges our society will face in the coming years. And the solutions are out there. That is the good part.

However, most of the novel ideas need to be trained on sensitive and/or locked-up data. Privacy-protecting regulation (which makes sense) and outdated business models (which make less sense) are holding everyone back. That is the frustrating part.

Do You Believe in Make-believe?

You should. And, more importantly, so should your ML algorithms. It is called synthetic data, and it will very likely rock your world, though you may not notice.

Synthetic data is artificially generated data that looks like real data, feels like real data and acts like real data, but is not real data. This gives it some extremely interesting properties. To start with, it can be generated instead of collected, which frees up tons of time. And, to put the cherry on top, when it is generated properly it can be distributed freely without privacy issues. (You may want to read that again, because that is dynamite.)
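The idea of data that is "generated instead of collected" can be illustrated with a deliberately simple approach. The Python sketch below fits a normal distribution to a single numeric column and then samples fresh values from it. The function names `fit_gaussian` and `synthesize`, the example ages and the Gaussian assumption are all hypothetical, chosen only for illustration; this is not how any particular synthetic-data tool works.

```python
import random
import statistics


def fit_gaussian(column):
    """Summarise a numeric column by its mean and sample standard deviation."""
    return statistics.fmean(column), statistics.stdev(column)


def synthesize(column, n, seed=0):
    """Draw n synthetic values from a normal distribution fitted to `column`.

    Only the two fitted parameters are used for sampling; the output values
    are drawn from that distribution rather than copied from the real column.
    """
    mu, sigma = fit_gaussian(column)
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]


# Hypothetical "sensitive" column, e.g. patient ages.
real_ages = [34, 51, 29, 62, 45, 38, 57, 41]
synthetic_ages = synthesize(real_ages, n=10_000)

# The synthetic column should have roughly the same mean and spread as the real one.
print(statistics.fmean(real_ages), statistics.fmean(synthetic_ages))
print(statistics.stdev(real_ages), statistics.stdev(synthetic_ages))
```

Sampling each column independently like this throws away correlations between columns, and a naive fit like this one offers no formal privacy guarantee; practical tools rely on much richer models.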

Synthetic data has been around for a few years, especially in the field of medical research, but has recently started to gain a lot of attention in wider circles. The need for data to train increasingly mature AI and ML models is skyrocketing, and the tools for generating relevant synthetic data to meet that need are now in place.

Some of the most advanced tools use a concept called a GAN (Generative Adversarial Network) to generate data. One can think of it as two "AI machines" battling it out at a hysterical pace: one (the generator) trying to create plausible data and the other (the discriminator) trying to tell that data apart from the real thing. The real data is used to train this duel, and from what they learn the models can create images, personal records, texts and even videos. What gets shared afterwards is only the generated output, mind you; the sensitive records themselves stay with the data owner and thus remain as protected from prying eyes as the data owner is from a lawsuit.
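To make the "two machines battling it out" picture a little more concrete, here is a deliberately tiny GAN sketch in plain Python for one-dimensional data. Everything in it is a hypothetical toy choice: a linear generator `G(z) = a*z + b`, a logistic discriminator `D(x) = sigmoid(w*x + c)`, a Gaussian stand-in for the real data and hand-picked hyperparameters. Real GAN-based tools use deep networks and a framework with automatic differentiation. Note that the discriminator does look at real samples during training; only samples from the trained generator would be shared.

```python
import math
import random


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(real_sampler, steps=2000, lr=0.01, seed=0):
    """Alternating GAN updates on a 1-D toy problem (hypothetical example).

    Generator:      G(z) = a*z + b, with z ~ N(0, 1)
    Discriminator:  D(x) = sigmoid(w*x + c), estimated probability that x is real
    """
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters
    w, c = 0.0, 0.0  # discriminator parameters

    for _ in range(steps):
        # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
        x_real = real_sampler(rng)
        x_fake = a * rng.gauss(0.0, 1.0) + b
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        # Gradients of -log D(x_real) - log(1 - D(x_fake)) w.r.t. w and c.
        grad_w = (d_real - 1.0) * x_real + d_fake * x_fake
        grad_c = (d_real - 1.0) + d_fake
        w -= lr * grad_w
        c -= lr * grad_c

        # Generator step: move fakes toward what D currently calls "real",
        # using the non-saturating loss -log D(G(z)).
        z = rng.gauss(0.0, 1.0)
        x_fake = a * z + b
        d_fake = sigmoid(w * x_fake + c)
        grad_x = (d_fake - 1.0) * w  # derivative of -log D(x) w.r.t. x
        a -= lr * grad_x * z
        b -= lr * grad_x

    return (a, b), (w, c)


def real_sampler(rng):
    # Stand-in for sensitive measurements, hypothetically ~ N(4, 1.25).
    return rng.gauss(4.0, 1.25)


(a, b), (w, c) = train_toy_gan(real_sampler)

# Only the trained generator is used to produce shareable synthetic samples.
gen_rng = random.Random(1)
synthetic = [a * gen_rng.gauss(0.0, 1.0) + b for _ in range(5)]
print(synthetic)
```

A linear discriminator like this one is too weak to judge the spread of the data, so the toy mainly shows the generator's offset `b` being pulled toward the region the discriminator considers real. Plain GAN training is also known to oscillate rather than settle, which is one reason practical tools add much more machinery.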

The Alchemist

The alchemists of old were tasked by their masters with creating physical gold, and despite efforts that were sometimes as innovative as they were unmentionable, it was a march down Failure Avenue.

The formula for digital gold, though, turned out to be crackable, and the result is even better than we could have hoped for. Within a few years, public actors will likely have large amounts of synthetic data, including data replicating the most sensitive records. This data can be used for strategic planning, it can be shared with other organisations and jurisdictions, and it can be used for testing and maintaining critical applications.

But most of all, it can be zipped and shared freely with entrepreneurs, companies, students and whoever wants to develop AI-driven solutions to bring us happiness and health, without security or privacy issues.

We like to talk about the 10x factor: the idea that innovation should not just make things twice as good but rather 10 times as good. With synthetic data, we believe it is within our grasp to accelerate data-driven innovation a hundredfold.

Interesting times ahead!
