Privacy-preserving Machine Learning

Apr 16, 2018

Over the last few weeks we have witnessed a rising concern about privacy. For the first time, many people are realizing that their data can be used against them. In the wrong hands, Machine Learning can be used to diminish individual liberties, to influence public opinion, or as a control mechanism by totalitarian states.

My data is worthless, but my privacy is priceless.

We see Privacy as a Fundamental Human Right, and at the same time we acknowledge that not everyone has the money to pay for services that have become a necessity. That’s why, many months ago, we started to work on Privacy-preserving Machine Learning, as a way to keep augmenting and improving people’s lives without the risk of a third party storing, misusing or losing sensitive data.

If computers are a bicycle for the mind, ML is a kite. It will let us fly.

Gathering data and training models in the cloud is the default design choice nowadays. It has many advantages for companies and Data Scientists and is the easy choice for them, but not for users. With Privacy-preserving ML we want to empower users, putting them in control of the algorithms they interact with. This will be fundamental for healthcare applications and critical infrastructure, among many others. Privacy by design will be the norm, not the exception.

Distributed (or Federated) ML is not an easy problem to solve. It is mainly a problem of efficient communication, synchronization and computing power at the edge.

We started by replicating the Federated Learning paper from Google on a bunch of Raspberry Pis using TensorFlow. You can find tutorials and code here!

This was really cool and the results are great! However, the benefits of distributing the learning across more devices fade because of communication overhead. We wanted a solution that can run on mobile devices, is communication-efficient, offers the same or stronger privacy guarantees than the federated averaging approach, and is unsupervised (the only way to label data in a distributed setup is through the user’s interaction with the device or app).
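For readers who have not seen it, the federated averaging idea mentioned above can be sketched in a few lines. This is only a minimal illustration in plain Python, not our code and not Google's implementation: it assumes a toy linear model, and every name in it (local_update, federated_average, federated_round, the learning rate and epoch count) is hypothetical.

```python
from typing import List, Tuple

Weights = List[float]
Dataset = List[Tuple[List[float], float]]  # (features, target) pairs held by one device


def predict(w: Weights, x: List[float]) -> float:
    """Toy linear model: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def local_update(w: Weights, data: Dataset, lr: float = 0.05, epochs: int = 5) -> Weights:
    """Train on one device: a few epochs of plain SGD on squared error.

    The raw data never leaves this function; only the new weights do.
    """
    w = list(w)  # copy so the caller's global model is not mutated
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def federated_average(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Server step: average the client models, weighted by local sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def federated_round(global_w: Weights, clients: List[Dataset]) -> Weights:
    """One communication round: broadcast, train locally, aggregate."""
    updates = [local_update(global_w, data) for data in clients]
    sizes = [len(data) for data in clients]
    return federated_average(updates, sizes)
```

Calling federated_round repeatedly, starting from an all-zero weight vector, moves the shared model towards one that fits every device's data, while the server only ever sees weights. Each round costs one model download and one upload per device, which is where the communication overhead described above comes from.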

We have evaluated different approaches, such as homomorphic encryption and secure multi-party computation, and at this moment they are not feasible solutions. More on this in a future post. People need something now, not in 5 to 10 years.

The good news is that we have developed a novel unsupervised federated algorithm. We get up to 50x speedups compared to state-of-the-art methods, and the algorithm is up to 1 million times more robust on constrained devices.

It is a work in progress. With this approach we can solve only a subset of the problems in ML, but it is a step towards training models on the edge, without data leaving the device. We foresee many applications in mobile devices, industrial IoT and digital healthcare.

When you get a competitor that gives back the ownership of the data but is just as efficient, change can happen overnight.

— Alex Pentland

If you want to know more, stay tuned for further details or contact us at contact@acuratio.com.

— The Acuratio Team

Thanks to Boost VC for their support during this journey.

Written by Acuratio