What data science can & can’t do to track Covid-19, a talk with Maxime Agostini, cofounder at Sarus

XAnge
Published in XAngeVC
4 min read · Jun 8, 2020
Maxime Agostini is the cofounder of Sarus

Because of the fundamental tension that exists between making data useful and strongly anonymous, the ongoing health crisis has forced us to make a choice. But there should be other ways. Maxime co-founded Sarus with the idea that organizations can learn how to leverage their data without ever compromising on privacy.

Is it fair to say that Western Europe has prioritized privacy over efficiency when managing the crisis?

European leaders have been cautious with personal data protection, and still managed to find some level of efficiency. Take European telecom companies: they used technical data from mobile devices to map aggregate movements during lockdown. To a limited extent, it helped governments anticipate possible outbreaks without threatening privacy.

They took strong protection measures to enforce data anonymity: it was aggregated to a level that makes sure no one can use it for any other purpose than following high-level population movements. The flip side is that this data cannot be put to work on other tasks. Because it is strongly anonymous, its utility has to be very narrow.
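As an illustration of what "aggregated to a level" can mean in practice, here is a minimal sketch, not the telecom operators' actual pipeline. The zone names, the `(device_id, origin_zone, destination_zone)` record layout and the `MIN_DEVICES` threshold are all assumptions made for the example. Flows are counted per pair of zones, and any cell with too few devices is suppressed before release.

```python
from collections import Counter

# Hypothetical threshold: flows seen on fewer devices than this are not published.
MIN_DEVICES = 50

def aggregate_flows(trips, min_devices=MIN_DEVICES):
    """Count distinct devices per (origin_zone, destination_zone) flow.

    `trips` is an iterable of (device_id, origin_zone, destination_zone)
    tuples (an assumed layout). Flows below `min_devices` are dropped so
    that small groups cannot be singled out in the published table.
    """
    # Deduplicate so each device contributes at most once to a given flow.
    unique = {(device, origin, dest) for device, origin, dest in trips}
    counts = Counter((origin, dest) for _, origin, dest in unique)
    return {flow: n for flow, n in counts.items() if n >= min_devices}
```

The published table only contains zone-to-zone totals, which is why it is hard to reuse for anything other than high-level movement analysis.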

What kind of data would we need to collect if we wanted to build the most efficient model to understand the pandemic — and curb its spread?

You could think of rich models that would follow precisely how people interact with one another and in what context. They could also leverage age, medical records and of course the location of those who tested positive.

As a scientist, I would love to work on virus propagation models in families, in homes and in schools. But this is science fiction: the data is far too personal, it does not exist today, and it probably should not be generated in the first place.

The truth is, very few organizations can lead this fight on their own. Public or private, they don’t have the data at their disposal and politically the subject is as sensitive as it gets. No democratic leader wants to be remembered for tracking down and monitoring all citizens.

Does that mean we should still collect more data if we want to build efficient models?

I’m not in a position to say we should collect more data, but in the case of Covid, if we want to build better models, we’re going to need more of it.

While most privacy professionals advocate for less data collection, we believe that reasonable data collection alongside proper protection measures can go a long way. What matters most is that the data doesn’t fall into the wrong hands. Everything should be done to limit exposure and potential leakage.

How is it possible to use richer data and still make sure nobody has access to it?

A new generation of startups is building alternative solutions to improve data protection while still offering advanced data applications. Technologies with names such as “homomorphic encryption”, “federated learning”, or “differentially-private learning” make it possible to look at the problem from a different angle: maybe you can achieve the same outcome without exposing the data. Sarus was created with the vision that data practitioners never need to access personal information to build successful data applications or carry out research.

We take a privacy-by-design approach to the entire data science workflow. The general principle is that data scientists should be able to work in exactly the same way even though they never access the original data. Naturally, the outcome of their work is anonymous as well. Very sensitive data can therefore be safely leveraged for innovation, and not at the cost of data protection.

In the case of Covid, you could think of a model to answer questions such as: for what classes and ages should we start reopening schools for each city? The information that is needed here is extremely personal (health records, COVID test results, GPS traces…) but the answer to our question is not. The idea is to get straight to the answer without having to expose personal information along the way.
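To make "getting straight to the answer" more concrete, here is a minimal sketch of one textbook technique from the differential privacy family: the Laplace mechanism applied to a counting query. It is not a description of how Sarus works. The record fields (`age_group`, `tested_positive`) and the function names are hypothetical, chosen only for the example.

```python
import random

def laplace_noise(scale, rng=random):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng=random):
    """Return a noisy count of the records matching `predicate`.

    Adding or removing one person changes a count by at most 1
    (sensitivity 1), so Laplace noise of scale 1/epsilon yields an
    epsilon-differentially-private answer.
    """
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Hypothetical usage: the analyst only ever sees the noisy aggregate.
records = [
    {"age_group": "6-10", "tested_positive": True},
    {"age_group": "6-10", "tested_positive": False},
    {"age_group": "11-14", "tested_positive": True},
]
noisy = private_count(
    records,
    lambda r: r["age_group"] == "6-10" and r["tested_positive"],
    epsilon=1.0,
)
```

A smaller `epsilon` means more noise and stronger protection; a larger one means a more accurate answer. That trade-off is the same tension between utility and anonymity discussed above.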

The model sounds counter-intuitive, doesn’t it?

It is! Culturally and technically, few people realize that you can work on data sets without accessing them. We are just at the beginning but potential applications in mobility, in banking and of course in healthcare are huge — anywhere data is sensitive and needs to be protected.
