Federated Learning: A Collaborative Approach to Machine Learning

Pragati Baheti
4 min read · Feb 3, 2020


Figure 1: Operational structure of federated learning

“Alone we can do so little; together we can do so much.” The concept of federated learning is very much based on this idea. Federated learning, considered a new milestone among the achievements of AI, is a decentralized way of training with data distributed over millions of devices. It finds its role where accumulating all the data in one place would invade privacy, and users are therefore unwilling to hand it over.

Beyond giving Android users a smarter keyboard, Google is exploring the use of federated learning to improve security, Google head of account security Mark Risher told VentureBeat AI staff writer Kyle Wiggers in a recent phone interview. The shift in the approach of tech giants such as Google, Microsoft, and Facebook toward implementing federated learning is itself a new change in the era of data science.

Federated Learning vs Centralized Learning

Figure 2: Comparative study of the federated and centralized approaches to training

A comparison between the naive, centralized approach to training and the distributed approach is a good proxy for what federated learning is all about. In centralized learning, all the data is accumulated in a single place and training is done on all of it at once. This naive approach is not only time-consuming but also requires large resources to process huge quantities of personal data.

The shift towards a decentralized approach resembles a distributed server framework, where computation is split between central server(s) and multiple client machines. Each node trains a model on its own set of data. Federated learning is a type of remote execution in which the models from these distributed nodes are sent to a central aggregator. This eliminates the need to store sensitive training data on a central server.
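To make the idea concrete, here is a minimal sketch of what "each node trains a model on its own set of data" can look like. The function name `local_train` and the plain linear model with squared loss are illustrative assumptions, not any particular library's API; the key point is that the raw data stays inside the client and only updated weights are returned.

```python
import numpy as np

def local_train(weights, X, y, lr=0.1, epochs=5):
    """One client's local update: plain gradient descent on its own data.

    Uses a linear model with squared loss for illustration. The raw
    data (X, y) never leaves this function; only weights are returned.
    """
    w = weights.copy()
    for _ in range(epochs):
        # Gradient of mean squared error for a linear model
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w  # only the updated weights are shared, not X or y
```

In a real deployment this loop would run on the device itself (phone, browser, IoT node), and only the returned weights would travel over the network.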

How does it work?

Figure 3: An example of federated learning for the task of next-word prediction on mobile phones. Devices communicate with a central server periodically to learn a global model.

CREATE A MODEL- A data scientist creates a model using PyTorch, TensorFlow, or Keras, and defines a training configuration (number of epochs, learning rate, etc.).

HOST A MODEL ON DISTRIBUTED NETWORK- Using PySyft, a data scientist uploads their private model to a secure Grid node. It can now be accessed by devices, which download the model for local training.

USERS TRAIN THE MODEL- In parallel, many different devices update their local models by training on local data. These devices could be mobile phones (Android or iOS), web browsers, IoT devices (such as a Raspberry Pi), or other cloud machines.

AVERAGE THE RESULTS- After a certain amount of training, the local models are averaged into a new global model. This privately aggregates the intelligence collected by the local models and effectively trains the model on the entire dataset without ever accumulating the data in one place.

DELIVER THE RESULTS SECURELY- Once a success criterion is met, the model is delivered back to the model owner as a global update. Mission accomplished!

What will Enable the Growth of Federated Learning?

You may not have noticed, but two of the world’s most popular machine learning frameworks — TensorFlow and PyTorch — have taken steps in recent months toward privacy with solutions that incorporate federated learning.

Federated learning improves data privacy by incorporating differential privacy: updates sent from devices can still contain some personal data or reveal something about a person, so differential privacy is used to add Gaussian noise to the updates shared by devices.
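A common recipe for this (as in DP-SGD-style training) is to clip each update's norm and then add Gaussian noise before it leaves the device. The sketch below uses illustrative constants; in practice the noise scale must be chosen with a privacy accountant to meet a target (epsilon, delta) budget.

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_std=0.1, rng=None):
    """Clip the update's L2 norm, then add Gaussian noise before sending.

    clip_norm and noise_std are illustrative values, not recommendations;
    a real deployment derives noise_std from a formal privacy budget.
    """
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    # Scale down any update whose norm exceeds the clipping bound
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    return clipped + rng.normal(0.0, noise_std, size=update.shape)
```

Clipping bounds how much any single person's data can move the model; the noise then masks whatever individual signal remains.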

The use of federated learning, for example, led to a 50x decrease in the number of communication rounds necessary to reach reasonable accuracy. This saves not only time but also computational power and resources.

A challenge still to be overcome by federated learning is the bandwidth consumed by transferring local updates to the server and the updated global model back to the devices. This can be mitigated by sharing only the model parameters (e.g., the weights) rather than the complete model.
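One way to picture "share only the parameters": serialize just the weight tensors for transmission, while the model's architecture and code stay on the device. The helper names below are hypothetical; this is a sketch using NumPy's compressed archive format, not any framework's wire protocol.

```python
import io
import numpy as np

def serialize_params(params):
    """Pack only the parameter tensors (a dict of arrays) into bytes.
    The model architecture is never re-sent; both sides already have it."""
    buf = io.BytesIO()
    np.savez_compressed(buf, **params)
    return buf.getvalue()

def deserialize_params(blob):
    """Recover the parameter dict from the transmitted bytes."""
    with np.load(io.BytesIO(blob)) as data:
        return {name: data[name] for name in data.files}
```

Compression helps further here, and real systems go beyond this with quantization or sparsification of the updates to cut bandwidth even more.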

Frameworks for implementing federated learning

PYSYFT

PySyft is a Python library for secure, private machine learning. PySyft extends PyTorch, TensorFlow, and Keras with capabilities for remote execution, federated learning, differential privacy, and multi-party computation.

PYGRID

PyGrid is a platform for private AI using PySyft, enabling one to privately host models and datasets in the cloud for encrypted, federated prediction and training.

Get hands-on experience with these libraries and dive deeper into this unique and smart way of training models on distributed records.


Pragati Baheti

SDE at Microsoft | Amalgamation of different technologies | Deep learning enthusiast