Adaptive Distillation for Decentralized Learning from Heterogeneous Clients (ICPR’20)

Published in OMRON SINIC X, Jan 4, 2021

We are pleased to announce that our recent work on federated learning for heterogeneous clients will be presented at the International Conference on Pattern Recognition (ICPR) 2020. Our presentation is scheduled for poster session T1.13 (2 pm to 3 pm GMT on Jan. 15).

Jiaxin Ma, Ryo Yonetani, and Zahid Iqbal, “Adaptive Distillation for Decentralized Learning from Heterogeneous Clients”, Proc. ICPR, 2020 [arXiv] [Video presentation]

This project was done while the last author, Zahid Iqbal of Universiti Sains Malaysia, was an intern at OMRON SINIC X.

What’s Wrong with Federated Learning?

Suppose that you want to train a deep neural network for a classification task, but you have no labeled data to train it with. You then find that people around the world hold relevant data on their devices, such as smartphones. However, accessing that data directly is impossible due to privacy and security concerns. How, then, can you leverage such distributed data to train your model?

Overview of a typical federated learning framework

A promising approach is Federated Learning (FL). FL is a form of collaborative learning between a server and clients distributed around the world. Instead of exchanging the data themselves, the server and clients exchange a model (the global model) and train it collaboratively as follows: 1) the server distributes the global model to a random subset of clients, 2) the clients train the model on their own data and send the updated models back to the server, and 3) the server aggregates the clients' models to update the global model and distributes it again. These steps repeat until the global model achieves sufficient performance.
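
To make the three-step loop concrete, here is a minimal sketch in plain Python. It assumes a toy one-parameter linear model and a FedAvg-style aggregation rule (an average weighted by each client's number of samples); the post does not prescribe a particular aggregation rule, and the names `local_update`, `federated_round`, and `train` are hypothetical.

```python
import random

def local_update(w, data, lr=0.1, epochs=5):
    """Hypothetical client-side training: a few epochs of full-batch gradient
    descent on squared error for a 1-D linear model y = w * x."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_round(w_global, clients, num_sampled, rng):
    # 1) The server distributes the global model to a random subset of clients.
    sampled = rng.sample(clients, num_sampled)
    # 2) Each sampled client trains on its own data and sends its model back.
    updates = [(local_update(w_global, data), len(data)) for data in sampled]
    # 3) The server aggregates the client models; here (an assumption),
    #    a FedAvg-style average weighted by each client's sample count.
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total

def train(clients, rounds=20, num_sampled=2, seed=0):
    rng = random.Random(seed)
    w = 0.0  # initial global model
    for _ in range(rounds):
        w = federated_round(w, clients, num_sampled, rng)
    return w

# Hypothetical toy data: three clients whose private data all follow y = 3x.
clients = [
    [(x / 10, 3 * x / 10) for x in range(1, 11)],
    [(x / 5, 3 * x / 5) for x in range(1, 6)],
    [(x / 4, 3 * x / 4) for x in range(1, 5)],
]
w_final = train(clients)
print(f"global model after training: w = {w_final:.4f}")
```

Note that every client trains the same model (here the single weight `w`) and the server contacts clients in every round; these are exactly the two properties the limitations below are about.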

While FL has been studied extensively in recent years, we identify two limitations:

Each client must train a model of the same architecture. This is problematic when clients' devices have different hardware: someone may have a high-end server with GPUs that can update a ResNet in a few seconds, while others may have a smartphone with limited computational resources that needs much more time for model updates.
The server and clients must keep communicating with each other. Although many studies have tried to make this communication efficient, we are interested in training a good model under much more limited communication conditions, for example, when each client can submit its model only once.

Adaptive Distillation for Federated Learning

Our Decentralized Learning via Adaptive Distillation (DLAD) framework

Based on the observations above, we have developed a new learning framework named Decentralized Learning via Adaptive Distillation (DLAD). The key idea is to leverage a network distillation technique to transfer the recognition ability learned by the client models to the global model.

More specifically, we assume that the server and clients have access to plenty of unlabeled data, which we refer to as “distillation data”. The server then trains the global model on the distillation data to imitate the outputs of the (pre-trained) client models. This way, each client can use a model of its own architecture and needs to share its trained model with the server only once.
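
A minimal sketch of this distillation step, assuming a single pre-trained client model acting as the teacher and a linear softmax classifier standing in for the global (student) model; in practice the global model would be a deep network. The student never sees labels: it is fit by gradient descent on the cross-entropy between the teacher's soft outputs and its own predictions, which is equivalent to minimizing KL(teacher || student). The teacher weights, the data, and all function names are hypothetical.

```python
import math
import random

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def linear_logits(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def kl(p, q):
    """KL(p || q): how far the student distribution q is from the target p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distill(teacher, distill_data, num_classes, lr=0.5, epochs=200):
    """Fit a linear softmax student to the teacher's soft outputs on unlabeled
    distillation data by minimizing cross-entropy(p_teacher, q_student).
    For a linear model, the gradient w.r.t. W[c] is (q[c] - p[c]) * x."""
    dim = len(distill_data[0])
    n = len(distill_data)
    W = [[0.0] * dim for _ in range(num_classes)]
    for _ in range(epochs):
        grad = [[0.0] * dim for _ in range(num_classes)]
        for x in distill_data:
            p = teacher(x)                    # soft target; no label needed
            q = softmax(linear_logits(W, x))  # student prediction
            for c in range(num_classes):
                for d in range(dim):
                    grad[c][d] += (q[c] - p[c]) * x[d] / n
        for c in range(num_classes):
            for d in range(dim):
                W[c][d] -= lr * grad[c][d]
    return W

def mean_kl(teacher, W, data):
    return sum(kl(teacher(x), softmax(linear_logits(W, x))) for x in data) / len(data)

# Hypothetical pre-trained client model: a fixed 3-class linear classifier over
# inputs (x1, x2, 1), where the constant 1 acts as a bias feature.
TEACHER_W = [[2.0, 0.0, 0.5], [0.0, 2.0, 0.0], [-1.0, -1.0, 0.0]]

def teacher(x):
    return softmax(linear_logits(TEACHER_W, x))

# Hypothetical unlabeled distillation data.
rng = random.Random(0)
distill_data = [(rng.uniform(-1, 1), rng.uniform(-1, 1), 1.0) for _ in range(50)]

W_student = distill(teacher, distill_data, num_classes=3)
kl_before = mean_kl(teacher, [[0.0] * 3 for _ in range(3)], distill_data)
kl_after = mean_kl(teacher, W_student, distill_data)
print(f"mean KL before: {kl_before:.4f}, after: {kl_after:.4f}")
```

Because the teacher is only queried for its outputs, it could have any architecture, which is what lets each client keep its own model. With several clients, the target `p` would instead combine their outputs; how to combine them is the question addressed next.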

Computing confidence weights for adaptive aggregation

A technical challenge here is how to perform such imitation with multiple client models, each trained on non-identically distributed data. We address this problem by asking each client to train an additional classifier that distinguishes its own data from the distillation data. Once trained, the output of this classifier can serve as the client's “confidence” for an input sample, since it indicates how similar that sample is to the client's own data. Through extensive experimental evaluation, we show that this confidence can be used to adaptively aggregate the outputs of multiple clients, allowing the global model to learn more from the clients that are more confident about a given sample.
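
The sketch below illustrates both pieces under simplifying assumptions: a 1-D logistic regression stands in for each client's own-data-vs-distillation-data classifier, and the aggregated target for an input is taken to be the average of the clients' class probabilities weighted by their confidences. The exact classifier architecture and weighting formula used in DLAD may differ, and all names and numbers here are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_confidence_classifier(own_data, distill_data, lr=0.5, epochs=500):
    """Hypothetical stand-in for the client-side classifier: 1-D logistic
    regression labeling the client's own inputs 1 and distillation inputs 0.
    Its output sigmoid(w * x + b) is used as the client's confidence on x."""
    samples = [(x, 1.0) for x in own_data] + [(x, 0.0) for x in distill_data]
    n = len(samples)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        gw = sum((sigmoid(w * x + b) - y) * x for x, y in samples) / n
        gb = sum((sigmoid(w * x + b) - y) for x, y in samples) / n
        w, b = w - lr * gw, b - lr * gb
    return lambda x: sigmoid(w * x + b)

def aggregate(client_probs, confidences):
    """Confidence-weighted average of the clients' class probabilities for one
    input (an assumed rule). Falls back to a plain average if all weights are 0."""
    total = sum(confidences)
    if total == 0:
        confidences = [1.0] * len(confidences)
        total = float(len(confidences))
    num_classes = len(client_probs[0])
    return [
        sum(c * p[k] for c, p in zip(confidences, client_probs)) / total
        for k in range(num_classes)
    ]

# Hypothetical setup: client A's private inputs lie near -2, client B's near +2,
# and the shared unlabeled distillation data spans both regions.
distill_data = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
conf_a = train_confidence_classifier([-2.5, -2.0, -1.5], distill_data)
conf_b = train_confidence_classifier([1.5, 2.0, 2.5], distill_data)

# Hypothetical outputs of the two client models on the input x = -2.
x = -2.0
probs_a, probs_b = [0.9, 0.1], [0.2, 0.8]
weights = [conf_a(x), conf_b(x)]
target = aggregate([probs_a, probs_b], weights)
print(f"confidences: {weights[0]:.3f}, {weights[1]:.3f}; target: {target}")
```

At x = -2, which resembles client A's data, client A receives the larger weight, so the aggregated target leans toward its prediction. That target can then play the role of the soft label the global model is distilled towards.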

Here’s our presentation at ICPR 2020.

What’s Next?

While this framework targets supervised classification tasks, we have also developed a similar framework for transfer reinforcement learning, in which each client shares a policy so that the server can train a new agent in a sample-efficient manner. For more details, please refer to this post.

At OMRON SINIC X, we will continue fundamental research on computer vision, machine learning, and robotics. If you are interested in working with us as an intern, send your application to internships@sinicx.com and get in touch!

Relevant posts:

MULTIPOLAR: Multi-Source Policy Aggregation for Transfer Reinforcement Learning between Diverse Environmental Dynamics

Written by Ryo Yonetani