Close-up: Anonymisation is dead, long live data protection

Thomas Wilke
9 min read · Aug 27, 2019

Michael Platzer on data-driven innovation in the era of privacy

Mostly AI is part of our portfolio and is funded with one million euros in the seed phase. This interview with co-founder and CEO Michael Platzer is part of our Close-up series, in which we present entrepreneurs from our portfolio: their milestones, their thinking, their learnings.

Very early on, Michael realised that traditional anonymisation was no longer working in the age of big data and began searching for a completely new way to solve this global problem. As early as 2014, during his research work, he determined how reliably customer behaviour can be simulated using artificial intelligence. Back then, data protection was something people only paid lip service to. After the data scandals of recent years, however, large corporations are now listening to him: in Asia, the USA and, of course, in Europe. This is because he and his team provide them with synthetic customers who exhibit real behaviour. As an investor, I am fascinated by how far ahead of the curve Michael was in recognising the billion-dollar conflict between data-driven innovation and privacy. It was not just his years at Microsoft that enabled him to think big; his models also understand information in such depth that they can recreate entire worlds of data. All three founders of Mostly AI are characterised by this way of thinking. Good reason for an interview, then.

You founded the enterprise back in 2017. What was the impetus for Mostly AI? Can you give us an idea of the big picture behind it?

Michael Platzer: Nobody disputes that data is of immense value to society. Health data, for example, opens up new insights into the progression of diseases and illuminates previously undiscovered interrelationships for research. This poses a dilemma, however, and not just for the world of research but for society as a whole: how is it possible to analyse large data sets without violating the privacy of the people behind them? Corporations face this question even more acutely: they need to keep innovating without losing the trust of their customers, yet customer focus is only possible with meaningful data. Up until a few years ago, the narrative went along these lines: if you want to stay competitive internationally, you must loosen data protection in order to exploit the data treasure.

The common approach to anonymisation is simply to remove or distort personal information. As several scandals have shown, however, this is not sufficient in practice, because the possibilities of re-identification are underestimated and the methods for doing so are developing rapidly. Another common problem of anonymisation today is that it heavily reduces the statistical significance and thus the value of data for analysis and research. Both before and after we founded Mostly AI, the impetus for our research was the knowledge that the more data you collect, the bigger this problem gets, because more data makes it easier to trace records back to individuals. From today’s perspective, the only response to this would be to destroy even more information to ensure anonymity.

“Anonymisation in the conventional sense is not fit for purpose”

What doesn’t work about anonymisation as we know it, and how long has this been known?

Date of birth, gender and zip code are all that is needed to clearly identify over 87 percent of the population. Therefore, it is far from sufficient to simply remove first names and surnames. Even if we remove all personal information, I can still be re-identified on the basis of my behavioural data. Netflix already had to admit this in court in 2009, after it passed customer data about film preferences to third parties. It was normally the case that no two people had watched the same films on the same days. Through comparison with a publicly available film forum, it was possible to quickly re-identify users and disclose their entire history of watched films. Way back in 2015, researchers such as Yves-Alexandre de Montjoye also demonstrated and published this by using more sensitive data, such as financial transaction or movement data. These findings and the associated risk for companies have recently been receiving a lot of attention in the media.
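The point about date of birth, gender and zip code can be illustrated with a small uniqueness check. The following is a minimal sketch under stated assumptions: the records are invented toy data, and the function name `unique_fraction` and the field names are hypothetical choices made for this illustration, not something taken from the interview.

```python
from collections import Counter

def unique_fraction(records, quasi_identifiers):
    """Share of records whose quasi-identifier combination occurs exactly once."""
    if not records:
        return 0.0
    # Build one key per record from the selected quasi-identifier fields.
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    # A record is re-identifiable in this sense if nobody else shares its key.
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)

# Hypothetical toy records: names are already removed, as in naive anonymisation.
records = [
    {"birth_date": "1980-03-14", "gender": "F", "zip": "1010"},
    {"birth_date": "1980-03-14", "gender": "M", "zip": "1010"},
    {"birth_date": "1975-11-02", "gender": "F", "zip": "1020"},
    {"birth_date": "1975-11-02", "gender": "F", "zip": "1020"},
    {"birth_date": "1992-07-30", "gender": "M", "zip": "1030"},
]
QUASI_IDENTIFIERS = ("birth_date", "gender", "zip")

fraction = unique_fraction(records, QUASI_IDENTIFIERS)
print(f"{fraction:.0%} of records are unique on {QUASI_IDENTIFIERS}")
```

In this toy set, three of the five records are unique on the three fields, even though no names are present. Anyone holding a second data set with the same three fields plus a name could link those records back to a person.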

How exactly does your product work?

Classic anonymisation takes a destructive approach. In an attempt to protect information, it destroys it. In contrast, we work with a generative approach. We generate new customer data on the basis of learned patterns, which is very challenging: it is much more difficult to bake a cake than to eat it.

Our algorithms analyse existing data and map it using a probability model. We subsequently use this model to generate a completely new population of synthetic customers with synthetic data. The magic here is that the individual records are realistic and can hardly be distinguished from real data. The entire synthetic population is also statistically representative; it can be used not only for analyses but also to train further models.
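The two steps described here, fitting a probability model to existing data and then sampling a new population from it, can be sketched in a few lines. The interview does not describe the actual model, so the following is only a toy stand-in: it assumes a simple chain in which each column depends on the previous one, estimated from counts. The column names, the rows and the functions `fit` and `sample` are hypothetical, and a model this simple offers no privacy guarantee by itself.

```python
import random
from collections import Counter, defaultdict

def fit(rows, columns):
    """Count how often each value follows the previous column's value."""
    # model[i][prev] holds counts of values in column i given the value `prev`
    # in column i-1 (prev is None for the first column).
    model = [defaultdict(Counter) for _ in columns]
    for row in rows:
        prev = None
        for i, col in enumerate(columns):
            model[i][prev][row[col]] += 1
            prev = row[col]
    return model

def sample(model, columns, n, rng):
    """Draw n synthetic rows column by column from the fitted counts."""
    synthetic = []
    for _ in range(n):
        row, prev = {}, None
        for i, col in enumerate(columns):
            counts = model[i][prev]
            # Sample a value in proportion to how often it was observed.
            prev = rng.choices(list(counts), weights=list(counts.values()))[0]
            row[col] = prev
        synthetic.append(row)
    return synthetic

# Hypothetical toy customer data.
COLUMNS = ("age_band", "plan", "churned")
rows = [
    {"age_band": "18-29", "plan": "prepaid", "churned": "yes"},
    {"age_band": "18-29", "plan": "prepaid", "churned": "no"},
    {"age_band": "30-49", "plan": "contract", "churned": "no"},
    {"age_band": "30-49", "plan": "contract", "churned": "no"},
    {"age_band": "50+", "plan": "contract", "churned": "no"},
    {"age_band": "50+", "plan": "prepaid", "churned": "yes"},
]

model = fit(rows, COLUMNS)
for synthetic_row in sample(model, COLUMNS, 5, random.Random(42)):
    print(synthetic_row)
```

The sampled rows are drawn from the learned distribution rather than copied one by one, which is the basic idea behind the generative approach. A production system would need a far richer model and explicit checks that no original record is reproduced.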

What kind of AI are we talking about here?

We’re talking about generative AI here. It does not just try to hit a single target variable as accurately as possible; it must also extract complete structures and deeper connections and reliably reproduce them. In recent years, academic research has produced impressive results in the area of image synthesis. We have further developed the methods from image synthesis and were able to apply them to personal data. Another advantage of having an AI generate the data lies in the nature of machine learning. It is designed to learn as much information as possible without depending on individual records or on the individuals stored in the data set. Machine learning looks for formulas that not only apply to one person, but can also be generalised to other individuals. This automatically ensures that each individual’s unique information remains protected.

“The bigger the company, the easier the pitch.”

Who are your customers and which companies are interested in your product?

Our first customers came from the financial industry. Data handling was an important issue for them even before the General Data Protection Regulation was introduced. The industry is sensitive to data and keeps it strictly under lock and key. However, it is also under pressure to change, to produce digital innovations and to remain customer-driven. Our customers now also include health insurers and telecommunication companies. Not only do they preside over very large data sets, their data is also particularly exciting when it comes to analysing customer behaviour. In particular, large companies that have already developed a data strategy are opening their doors to us. And it is also true that the bigger the company, the easier the pitch. This is because it is almost impossible to trace who accesses what data in large organisations.

How much awareness of your product is there in the international market?

Much like environmental protection, data protection used to be something people paid mere lip service to. The scandals of the last two years have definitely been a wake-up call for the USA. The topic is much discussed, not least by the CEOs of the big players themselves such as Apple and Microsoft. We are also currently involved in the Plug & Play Accelerator program in Silicon Valley, where we are encountering a lot of interest in synthetic data. At the same time, we are seeing an increase in momentum in markets such as Germany and Austria. We are personally very happy about this development.

You founded Mostly AI back in 2017, long before the data abuse and leaks connected to Facebook. What made you believe in your solution when the market hadn’t even admitted that there was a problem?

All of us in the founding team are mathematicians. I have a doctorate in marketing sciences, and my co-founders Roland and Klaudius are Doctors of Medical Physics. This means that we’ve always been very familiar with the topic of sensitive data. Roland also worked in the telco industry for several years and was all too familiar with the challenge of completely anonymising mobility data. We were very impressed with the progress made in AI image synthesis at that time. Although hardly any research had been done in this direction, it was clear to us that these methods could be applied to other areas just as well. For us, there was never any doubt about the solution and the market for it, because traditional anonymisation was no longer fit for the ever-growing amounts of data. Maybe that’s why we’ve been approaching this with confidence for years now and have never let ourselves be thrown off course.

You just won a competition held by SONY in Japan and you’ve pitched to companies in the USA — two completely different cultures. What drives them with regard to privacy and how do you tell your story?

The story is not the synthetic customers, but rather effective data protection that enables innovation, and this is cross-cultural. The fact that traditional anonymisation offers no protection is something that governments and companies around the world now understand. This topic easily provides us with starting points for connecting with telco giants in Japan. However, data leaks concerning well-known individuals are also a big issue there. One only has to imagine politicians being re-identified, which is possible not only through geodata such as their place of residence. In the USA, on the other hand, data protection is strongly driven by the big tech companies, albeit not as a matter of compliance; they are raising it to the strategic level to avoid becoming the next media scandal. As a start-up based in Vienna, we definitely benefit from Europe’s pioneering role in data protection and the good reputation that goes with it.

You are still a relatively young company, have 10 employees and are currently recruiting. How do you find talent?

We, the founders, are already 41 and 35 years old and have a personal network that has grown over the years. In addition, each of us has several years of experience in the industry. We were able to build up a good reputation before founding Mostly AI. Accordingly, we were able to put together a very strong core team of talented people via our network. It’s a lot of fun exchanging ideas and developing new ones in such a team each day. What most attracts talent, however, is that we are truly at the forefront of AI research and are also developing a radically new product for a global market which builds on that research. It couldn’t be more exciting.

“The market is just emerging. In this phase, it pays off to be a good listener.”

Can you tell us what your next steps will be, e.g. in terms of financing, after the 2018 seed round?

Right now, our primary focus is our customers. They have top priority and provide us with essential feedback. We’ve seen that we already correctly anticipated many things, but the market for synthetic data is only just emerging, and in this phase, it pays off to be a good listener. In addition, we intend to continue investing heavily in research and development. We assume that the competition will focus on data quality; however, usability is also an important aspect for companies. The easier the data synthesis is to understand and the more accurate it is, the greater its value. In view of the global market for our solution, we are currently also considering a next step in the form of a further financing round. As you well know, investor interest in privacy tech solutions is growing massively, and we are excited that the topic is now receiving so much attention.

What are the most important things you have learnt so far?

This may sound a little unusual for a data scientist, but one important learning experience is that it invariably makes sense to trust your gut when making big decisions. In this regard, choosing the right investors (an experience we went through together, of course) is no different from choosing a romantic partner. In any case, we had a very good basis for discussions with our seed investors right from the start, which could not be taken as a given with such a complex subject. We’re pleased with the ongoing support, but also with the critical questioning, which of course you already know.

