Machine Learning Intern Journal — Federated Learning
As the title indicates, this is the journal of a Machine Learning (ML) intern at the impactIA Foundation. I’ll be attempting to keep a weekly journal of my activities in the Foundation to keep track of my progress and leave a roadmap for the interns who come after me.
Introduction
Last week was a time that many researchers look forward to: the annual Conference on Neural Information Processing Systems (NeurIPS). It was started in 1987 as an open interdisciplinary meeting for researchers exploring biological and artificial neural networks. It is today one of the most popular conferences in Artificial Intelligence, attracting thousands of paper submissions every year. This edition, due to COVID-19, was held virtually.
Our Foundation puts a significant accent on education on the job, understanding that continually developing one’s knowledge in a constantly evolving field like AI is imperative to the Foundation’s ambitions. I was therefore able to spend half my working hours last week attending various tutorials and workshops. There was so much on offer that it was quite overwhelming. The main talks I attended were:
- ‘Where Neuroscience meets AI (And What’s in Store for the Future)’ by Jane Wang, Kevin Miller, Adam Marblestone
- ‘Offline Reinforcement Learning: From Algorithm Design to Practical Applications’ by Sergey Levine, Aviral Kumar
- ‘Abstraction & Reasoning in AI systems: Modern Perspectives’ by Francois Chollet, Melanie Mitchell, Christian Szegedy
- ‘Federated Learning and Analytics: Industry Meets Academia’ by Brendan McMahan, Virginia Smith, Peter Kairouz
They were all incredibly exciting and informative (and yes, I’ll admit, at times beyond my comprehension). In this blog I want to discuss Federated Learning, an exciting new avenue of Machine Learning.
Time For Change
Over recent years, big tech companies like Amazon and Google have built up a cloud-based data empire. Amazon Web Services and Google Cloud have become key components of these behemoths’ business strategies. In the world of AI, many users store their models and data on these popular cloud services in a centralised way. While these offer immediate practical solutions, such a centralised model is not good for society in the long run. Clive Humby said ‘Data is the new oil’, and we do not want to live in a world where all our data is owned and stored by big tech. There is a historic hearing taking place this year in the United States involving the big tech companies over antitrust issues. People care about their data and want to retain a certain control over it.
Over the past few years, a new approach has been gaining traction: Federated Learning (FL). While traditional model architectures require all the data to be stored in one place, FL instead distributes the learning process over the edge. Wait, what?
What is Federated Learning?
Okay, let’s take a step back. Imagine you’re building a mobile translation service. You create and train a neural network on your training data, deploy it, and now you want to collect user-generated data from your website or app to keep on improving the service. This local data is usually sent over to a centralised machine or data centre (the ‘cloud’). There are a few downsides to this approach. Firstly, the data collected by local devices or sensors has to be sent back to the centralised machine or data centre for processing before being returned to the local devices or sensors. This takes time and limits the capacity for real-time learning. Secondly, there are data privacy issues with collecting and storing your users’ data in one centralised place. We’ve seen the promises of blockchain’s decentralised systems gain popularity over the years. FL also uses a decentralised framework to improve on the above limitations.
FL instead downloads the current model onto the local device or sensor and computes an updated model using local data. The local device then encrypts and sends an update (usually the new model weights) back to the central server, which aggregates all of these small updates from various sources into a single consolidated and improved global model. The improved model is then sent back to the devices, and the cycle repeats.
Let’s quickly do a step-by-step to make sure the above paragraph is clear (a minimal code sketch follows the list):
- A local device downloads a generic model from the central server
- The local device calculates the model’s error with respect to its local data
- The local device encrypts and prepares an update (the error, usually expressed as new weights or gradients), which it sends back to the central server
- The central server aggregates (i.e. averages) all the small updates it has received to form a consensus change to the shared model
- The procedure is repeated
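To make the loop concrete, here is a minimal sketch of federated averaging simulated on a single machine with NumPy. The toy linear model, the synthetic client data and the weighted-average aggregation are my own illustrative assumptions, not the exact protocol of any production FL system, and encryption and networking are omitted entirely.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_update(global_w, X, y, lr=0.1, epochs=5):
    """Client step: start from the global model and refine it on local data only."""
    w = global_w.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of the mean squared error
        w -= lr * grad
    return w, len(y)  # only the weights leave the device, never the data

def federated_round(global_w, clients):
    """Server step: aggregate client models, weighted by how much data each holds."""
    updates = [local_update(global_w, X, y) for X, y in clients]
    total = sum(n for _, n in updates)
    return sum((n / total) * w for w, n in updates)

# Toy setup: three 'devices', each holding a private slice of data from y = 3x + noise.
clients = []
for _ in range(3):
    X = rng.normal(size=(20, 1))
    y = 3 * X[:, 0] + 0.1 * rng.normal(size=20)
    clients.append((X, y))

global_w = np.zeros(1)
for _ in range(10):  # ten rounds of download, local training and aggregation
    global_w = federated_round(global_w, clients)
print("learned weight:", global_w)  # should approach 3.0
```

In a real deployment the local update would run on each device, only the weights would travel over the network, and the server would never see the raw data.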
Now while this approach could theoretically have existed for a while, it is only recently that it has become practical. Over the past few years, smartphones have been getting ‘smarter’ (no really, I mean it for once). Apple, Samsung and the like have been kitting out their flagship phones with ‘neural engines’ capable of rapid matrix multiplication (the workhorse of current AI systems; for a great explanation, go read my colleague’s recent blog about graphics processing units).
Why is this useful?
Firstly, it addresses the problem of real-time prediction and learning by downloading the model locally and improving it based on local data. Secondly, it’s a step towards user privacy, since the central server never comes into contact with the user’s raw data. This enables several organisations to collaborate on the development of a shared model, in effect pooling their data, without ever directly exchanging the sensitive data itself. FL therefore decentralises machine learning by removing the need to store data in a central location. This is invaluable for hospitals and other organisations working in the medical domain, where user privacy is paramount.
Challenges
This may all seem too good to be true, and as with everything in life, there are some challenges ahead for Federated Learning.
Communication is critical in FL networks, since the data remains on each local device and only the ‘updates’ from the model are shared with the central server. This can become a bottleneck: a simplistic implementation of a federated network requires each device to send a full model update back to the central server in every round. You can imagine that this gets impractical when models get very large. However, research is underway to address this challenge, such as “Federated Learning: Strategies for Improving Communication Efficiency” (Oct. 2017), which investigates methods that can reduce the uplink communication costs.
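To give a flavour of the kind of trick such work explores, here is a toy sketch of one generic idea: sending only the largest entries of an update instead of the full dense vector. The sizes and the top-k scheme are my own illustrative assumptions, not the specific methods proposed in that paper.

```python
import numpy as np

def sparsify_update(update, k):
    """Client side: keep only the k largest-magnitude entries of a model update,
    so that (indices, values) is sent instead of the full dense vector."""
    idx = np.argsort(np.abs(update))[-k:]
    return idx, update[idx]

def densify_update(idx, values, size):
    """Server side: rebuild a dense update, with zeros for the entries not sent."""
    dense = np.zeros(size)
    dense[idx] = values
    return dense

update = np.random.default_rng(1).normal(size=1_000_000)  # a large model delta
idx, vals = sparsify_update(update, k=10_000)             # roughly 1% of the entries
approx = densify_update(idx, vals, update.size)           # lossy, but much cheaper to send
```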
Furthermore, federated networks must be able to cope with only a small fraction of the devices being active at any one time, as well as with variability in hardware, which affects storage, computational and communication capabilities. They also need to handle devices dropping out of the network.
Maybe most importantly, although FL is a step towards user privacy, an adversary who intercepts the gradient or model updates from the participating nodes can, in some cases, reconstruct the original ‘private’ data. For stronger privacy guarantees, we need differential privacy (beyond the scope of this blog, but this is a good introduction).
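As a rough illustration of the underlying idea (not a full differentially private algorithm, and with made-up clipping and noise values rather than calibrated privacy parameters), a client can bound and blur its contribution before sending it:

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_std=0.1, rng=None):
    """Clip the client update to a maximum L2 norm, then add Gaussian noise,
    so any single client's contribution is bounded and blurred."""
    if rng is None:
        rng = np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    return clipped + rng.normal(scale=noise_std, size=update.shape)

# Each client would apply this to its update before sending it to the server.
noisy = privatize_update(np.array([0.5, -2.0, 1.5]))
```

Choosing the clipping norm and noise scale so that they yield a formal privacy guarantee is exactly what the differential privacy literature works out.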
Conclusion
This blog was meant as a gentle introduction to Federated Learning, and therefore has not addressed all aspects of this approach. If you are interested in learning more, I strongly suggest you read “Federated Learning: Challenges, Methods, and Future Directions” (Aug. 2019) by a group of Carnegie Mellon University researchers, which provides an extensive summary of recent work in this domain.