Federated Learning — a business explainer
The history of data and algorithms has been simple:
Get all your data in one place and do stuff with it
Whether it is a data warehouse or a business-specific data store, this is the way things have always been done: you bring the data into one place and then run your queries and algorithms on it. In a Collaborative Data Ecosystem, however, you can’t always guarantee that you can centralize all the data.
But what if you can’t do that? What if, due to privacy or IP protection concerns, people won’t just hand over their data to you? What if it is so damned large that shifting it just isn’t feasible? What if you don’t really trust the people you’re working with but know you need to work with them?
This is where Federated Learning comes in, along with other technologies such as Differential Privacy and Homomorphic Encryption (for a quick explainer on those elements, see this example on how it can help sell beer). With Federated Learning you leave the data where it is, and a learning process trains remotely in each environment and then coordinates across them to create a single coherent model.
Federated Learning means you can work with external companies and protect privacy and IP while working together to create better outcomes. It represents a whole new way for companies to work together. Capgemini’s work on COVID response in Spain just wouldn’t have been possible without Federated Learning: it allowed pharma companies to collaborate with hospitals while ensuring patient privacy.
There are two types of Federated Learning, horizontal and vertical. Without going into the full details, the broad differences are as follows.
Vertical Federated Learning is where multiple parties have data on the same ‘thing’, for example a set of people. In this case the distributed data sets need to be unified in some way to facilitate federated learning, so that you can identify that Person A in Company A’s data is the same individual as Person B in Company B’s data. So either a set of identifiers is matched (for example via Private Set Intersection) or more advanced privacy-preserving approaches are used. The point of vertical federated learning is that you are aiming to work across companies around the same entities. The data is still federated, but the goal is linking the unique ‘things’ between the data sets.
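To make the matching step concrete, here is a minimal sketch in Python. It is an illustration only, and a deliberately simplified one: it compares salted hashes of identifiers, which is much weaker than a real cryptographic Private Set Intersection protocol, because anyone holding the salt could test guessed identifiers against the hashes. The function names, the example email addresses and the salt value are all hypothetical.

```python
import hashlib


def blind(identifier: str, shared_salt: str) -> str:
    """Hash a normalised identifier with a salt both parties agreed on."""
    normalised = identifier.strip().lower()
    return hashlib.sha256((shared_salt + normalised).encode("utf-8")).hexdigest()


def align(ids_a, ids_b, shared_salt):
    """Return the identifiers from party A that also appear in party B's list.

    Only blinded (hashed) values are compared. In a real deployment each
    party would blind its own list locally and exchange only the hashes.
    NOTE: salted hashing is a simplified stand-in for true PSI.
    """
    blinded_b = {blind(i, shared_salt) for i in ids_b}
    return [i for i in ids_a if blind(i, shared_salt) in blinded_b]


# Hypothetical customer lists held by two separate companies.
company_a = ["alice@example.com", "bob@example.com", "carol@example.com"]
company_b = ["BOB@example.com ", "dave@example.com", "carol@example.com"]

matched = align(company_a, company_b, shared_salt="hypothetical-shared-secret")
print(matched)
```

In this sketch only the people both companies already know about end up in the matched list, and those are the records the vertical learning process would then work on.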
Horizontal Federated Learning is what Capgemini did in the COVID response. We had similar data in multiple separate places, each giving us more information about a specific challenge (COVID response), and by being able to learn across those stores we could see more and learn more. So horizontal learning is about the problem or challenge to be solved: we aren’t looking to identify the same people or things and link them across stores, but to find similar behaviors and similar discoveries and to create a coherent model, as if we could run a single learning process across all of that data.
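For readers who want to see the mechanics, the sketch below shows one common way of doing this, usually called federated averaging. It is an assumption for illustration and not a description of what Capgemini actually built: the three "hospitals", their toy data and every function name are hypothetical. Each site takes a training step on its own data, only the resulting model weights travel to a coordinator, and the coordinator averages them, weighted by how much data each site holds.

```python
def local_update(weights, data, lr):
    """One gradient step for y ~ w*x + b, run on-site on that site's own rows."""
    w, b = weights
    n = len(data)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
    # Only these two numbers leave the site, never the rows themselves.
    return (w - lr * grad_w, b - lr * grad_b)


def federated_round(weights, sites, lr):
    """Coordinator averages the site models, weighted by each site's row count."""
    total = sum(len(data) for data in sites)
    updates = [(local_update(weights, data, lr), len(data)) for data in sites]
    w = sum(update[0] * n for update, n in updates) / total
    b = sum(update[1] * n for update, n in updates) / total
    return (w, b)


def train_federated(sites, rounds=2000, lr=0.02):
    """Repeat the local-step-then-average cycle from a zero starting model."""
    weights = (0.0, 0.0)
    for _ in range(rounds):
        weights = federated_round(weights, sites, lr)
    return weights


# Hypothetical toy data: every row follows y = 2x + 1, split across three sites.
hospital_a = [(0, 1), (1, 3), (2, 5)]
hospital_b = [(3, 7), (4, 9)]
hospital_c = [(5, 11), (6, 13), (7, 15), (8, 17)]

federated = train_federated([hospital_a, hospital_b, hospital_c])
# For comparison: the same training with all rows pooled in a single place.
pooled = train_federated([hospital_a + hospital_b + hospital_c])

print("federated:", federated)
print("pooled:   ", pooled)
```

In this simplified setting (one full step per site per round, weighted by row count) the federated result should match the pooled result up to rounding, with both landing close to w = 2 and b = 1. Real systems typically run several local steps per round and add protections such as Differential Privacy, so the match becomes approximate.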
Federated Learning means that while your technical business processes might be trapped inside your organization, your ability to work with data is not.