Private Analytics with SDA
…and an open-source Rust implementation
Today we’re happy to open source our Secure Distributed Aggregator (or SDA), and for several reasons: not only do we believe that cryptography should be out in the open, we would also love to see a world moving away from centralised storage of sensitive information, benefitting not only users but also companies — we hope this can be a push in that direction.
Analytics and surveys
Analytics plays a big part in our digital world today, ranging from small startups trying to understand how users behave in their app, to big advertising companies building detailed user profiles.
Most often this is done by first gathering all user data in a central location and only then distilling it into useful information. However, while easy and flexible, this process has implications for both parties, with users giving up on privacy and the companies being either limited to insensitive data or having to put measures in place to avoid a costly data leakage.
And if we move away from analytics the problem can quickly grow bigger. For instance, it doesn’t seem unrealistic to assume that health related surveys are limited today due to privacy concerns, and that significant progress could be made in healthcare if users could somehow be sure that their privacy would not suffer by participating.
One solution concept to the above problems come from modern cryptography, namely secure multi-party computation, where a function is computed on a set of inputs in such a way that the output of the function is revealed but none of the individual inputs.
Secure Distributed Aggregator
While general secure multi-party computation protocols have been around since the 80s, our protocol in SDA differs in that it is optimised to work on weak and sporadic devices such as mobile phones, and with only a single company motivated in investing computing resources. The price to pay is that these optimisations are achieved partly by the fact that the above applications only require us to compute linear functions, yet this still allows us to perform more advanced computations such as generating location-based heatmaps.
The way SDA works is simple but technical, and uses two inventions from modern cryptography: secret sharing and homomorphic encryption. We’re already talked about both in former blog posts, but the useful principle behind them is simple: while completely hidden, data that is respectively shared or encrypted using these schemes may still be computed on.
In slightly more detail, when a user gives an input to SDA, this input is first secret shared between the company and a set of clerks. Due to the properties of the secret sharing scheme, this means that neither of them can learn anything about the input from seeing their own share, yet can still combine shares from several users in order to obtain a share of a linear function of the inputs! Once done, by bringing the combined shares together, the function output may be reconstructed, but nothing else.
Moreover, by encrypting the shares of the clerks using a homomorphic encryption scheme, the work of adding these together may instead be performed by the company, reducing the work of the clerks to a simple decryption operation!
The privacy of our protocol is guaranteed when the users have enough trust in the clerks that hold and process the shares. Users need to trust them to be competent enough to not expose their share by accident, that they play by the rules, and that they do not collude with the other clerks or the company — if they do it would expose the original input.
For any organisation, setting up partnerships with people or organisations to play the role of a clerk can be complicated and slow, and it’s even more difficult when you are a small entity, like a startup company. On the other hand, in the past years there have been a few successful attempts at crowdsourcing trust or privacy from online communities.
Tor, for instance, is a network of secured proxies allowing a user to interact with a web-server through a multiple hop chain. It can used as a way to work around censorship, but also to protect a user IP address. Trust is established by the fact that each server in the chain will perform one level of anonymisation from the next server, so that an attacker would need to corrupt all the servers in an unpredictable chain to be able to spy on a user.
Another example is Bitcoin. Based on a blockchain, a distributed database running as a peer-to-peer network, it implements an alternate currency with trust rooted in a community of non-colluding peers.
Both cases have different involvement models: Tor nodes are often run by altruist system administrators, considering that the cost and the risk involved stays below the benefits of enforcing privacy on Internet. Participating in the blockchain is rewarded in bitcoins. But both examples show that crowd-sourcing trust might be an option, at least when a balance can found between the cost of involvement and some kind concrete or abstract gratification.
Piggybacking on mobile apps
So in order to consider crowd-sourcing a good option, we aimed to lower the involvement of clerks to a bare minimum, to the point where all operations can be run in the background of an iOS or Android application without it getting in the way of the user or draining the battery. This way, a start-up company can get its own distributed trust network just by using its end-user base.
This also gives us a framework of operational constraints: these devices are relatively small in terms of computing power, and operate on limited and intermittent connection. A user can choose to remove an application at any time, so an high level of redundancy has to be built-in.
But by a careful choice of cryptographic schemes, including packed secret sharing, we managed to stay away from heavy computations, to keep the amount of data received by each clerk small, and to distribute the work across more clerks for demanding applications (see our upcoming whitepaper for details).
And of course, one may also consider more capable and more reliable clerks than mobile phones; in particular, the operations of clerks might in many cases be efficient enough to run in a browser plugin, or on a home server such as a NAS.
With SDA we have focused on a very simple problem that has real business value and incur an important (and often ignored) invasion of privacy. We have tried to bridge a gap between our operational realities, our specific applications, and generic MPC protocols studied in academia.
By open-sourcing this work it is our hope to generate some interest in the area, including building a community and developing more powerful protocols for the setting.
Secure Distributed Aggregator can be found on github.
If you enjoyed this article, it would really help if you hit recommend below :)
If you want to work on AI + Privacy, check our jobs page!