Federated Learning: An Overview

Pooja Vinod
Secure and Private AI Writing Challenge
Aug 12, 2019

Breaking down what Federated Learning is, the opportunities and challenges it presents

Organizational Hierarchy of your Company

Imagine yourself as Miky Davis, Chief Executive Officer (CEO) of a successful corporation. As CEO, you have too little time to oversee the small details of the daily work that goes on in your firm. You want to stay up to speed on everything, but in a much more efficient manner: you do not intend to be briefed on every little problem every single employee faced, yet at the same time you want to know how each department is progressing. This is exactly where the federated organizational structure shown above proves useful.

The Financial Manager, Technical Manager, Human Resources Manager and Administration Manager each report to the CEO with a comprehensive daily update that is, more or less, a summary of the progress their department made that day. This update does not mention every task each staff member did; rather, it gives an overall picture of what the department as a whole accomplished on the company's current project. The CEO now has information about the status of every department under his leadership, and can make good decisions with all of this in mind, while knowing very little about the specifics.

Federated Learning draws on the same concept. You could think of every mobile user as a source generating data (commonly used vocabulary and app preferences, for example) that could potentially be used to train a model for better performance. But snooping directly on user data would be a privacy crime, because it is very much possible to leak sensitive details about users, details they might not wish to make public, simply by accessing user-generated statistics.

Security and privacy are the founding principles of Federated Learning. So, instead of taking user data from a device (which raises the question of permission) to a cloud where it is aggregated with data from several other devices, Federated Learning chooses to bring the model to be trained down to each user device instead.

As a sequence of steps, Federated Learning works as follows:

  1. A local copy of the model is downloaded to each user device.
  2. The device is checked for being ‘eligible’ for training (making sure it is not currently engaged in any major tasks and can be allocated to training the model, so concerns about battery drain are alleviated).
  3. User data is then used to train the downloaded model.
  4. The training results (just like the comprehensive updates the managers gave the CEO every day, in our earlier example) from millions of devices are sent up to the central server, where all these results are averaged before making the big update to our original model. At this point our model has become smarter, but does not have A CLUE about the specific details of any individual user, simply because it never directly saw any of the individual updates reported from each mobile device; it saw only the average of these updates.

This smarter model is now downloaded to each user device, to provide better services to every user. Despite each user not generating sufficient data to make the model smarter, every user is getting the benefit of the user community collectively generating sufficient training data.
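The cycle above can be sketched in a few lines of plain Python. This is a toy simulation, not production code: the “model” is a single weight w in a linear model y = w·x, each “device” is just a list of (x, y) pairs held privately, and the names `local_train` and `federated_round` are illustrative, not from any particular library. The server only ever sees the averaged weights, never the raw data.

```python
def local_train(weights, data, lr=0.1, epochs=5):
    """Toy on-device training: fit y = w * x to this device's private
    data by gradient descent on squared error."""
    w = weights
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_round(global_w, client_datasets):
    """One round of Federated Averaging: every device trains a local
    copy of the model, and the server averages the resulting weights."""
    local_weights = [local_train(global_w, data) for data in client_datasets]
    return sum(local_weights) / len(local_weights)

# Each client's private data roughly follows y = 2x, with per-user noise.
clients = [
    [(1.0, 2.1), (2.0, 4.0)],
    [(1.0, 1.9), (3.0, 6.2)],
    [(2.0, 3.8)],
]

w = 0.0  # initial global model
for _ in range(20):
    w = federated_round(w, clients)
# w converges near 2.0 without the server ever seeing any (x, y) pair
```

Real systems (e.g. Google's Gboard deployment) weight each client's contribution by its amount of data and add the secure aggregation layer described below, but the train-locally-then-average loop is the core idea.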

See these cool cartoons from Google that summarize the process creatively:

You may now ask: is it possible to ever intercept these updates and somehow override the privacy we are going for?

Secure aggregation is a cryptographic protocol that takes care of this. It works by coordinating the exchange of random masks among pairs of participating clients, such that the masks cancel out once a sufficient number of inputs are received. As mentioned earlier, the server will only ever be able to view an AVERAGE of all the training updates, never any individual update.
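Here is a minimal sketch of the masking idea at the heart of secure aggregation. This is a deliberately simplified illustration (all names are my own, and it leaves out the key agreement, dropout recovery, and finite-field arithmetic a real protocol uses): each pair of clients shares one random mask, which one of them adds and the other subtracts, so every mask vanishes when the server sums the masked updates.

```python
import random

def pairwise_masks(num_clients, seed=0):
    """One shared random mask per pair of clients. In a real protocol
    each pair would derive this secretly via key agreement."""
    rng = random.Random(seed)
    return {(i, j): rng.uniform(-1, 1)
            for i in range(num_clients)
            for j in range(i + 1, num_clients)}

def mask_update(client_id, update, masks, num_clients):
    """Client i adds the mask it shares with each higher-numbered
    client and subtracts the mask shared with each lower-numbered one."""
    masked = update
    for j in range(num_clients):
        if client_id < j:
            masked += masks[(client_id, j)]
        elif j < client_id:
            masked -= masks[(j, client_id)]
    return masked

updates = [0.5, -0.2, 0.9]  # raw (private) model updates, one per client
masks = pairwise_masks(len(updates))
masked = [mask_update(i, u, masks, len(updates)) for i, u in enumerate(updates)]

# Each masked value looks like noise on its own, yet the masks cancel
# pairwise, so the server recovers the exact sum (and hence the average).
assert abs(sum(masked) - sum(updates)) < 1e-9
```

The server sees only `masked`, from which individual updates cannot be read off; dividing the recovered sum by the number of clients gives the average used to update the global model.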

To play around with Federated Learning, you can use an extension of the PyTorch framework called PySyft, which offers tools to perform deep learning techniques on remote machines. If you feel inspired after reading this article, check out this awesome notebook by Andrew Trask curated for the Facebook Udacity Secure and Private AI Challenge 2019, where you can explore the creation of remote workers and employing them to implement federated learning: https://github.com/udacity/private-ai/blob/master/Section%202%20-%20Federated%20Learning.ipynb

Applications

This kind of technology could be used to safely aggregate medical statistics from wearable tracking devices to speed up disease diagnosis, improve timely and accurate recognition of symptoms in users, and inform medical personnel based on predictions. Another application could be in the timely servicing of vehicles: tracking devices installed in vehicles of a particular brand could be used to learn when these cars need servicing, helping the manufacturer quickly detect malfunctions, accurately predict when a vehicle needs to go to the mechanic, make the right technical corrections, and provide the best maintenance services to its customers. In both these use cases, however, the question of complete privacy and security still remains. Users may not feel comfortable with their medical statistics being divulged for the purpose of training public models, while in the case of vehicle manufacturers, things could go downhill if company A somehow hacked into company B’s model and figured out shortfalls it could use to gain an unfair competitive advantage.

While it may not yet be a perfect solution, Federated Learning is, in short, one of those awe-inspiring technologies that shows the promise and potential to help protect the fundamental human right of privacy, even in this ‘data-is-the-new-oil’ age where we as users always risk our privacy being taken for granted.
