The Mystery of Data Sharing and Privacy Protection: What Is Federated Learning?
As part of unlocking the value of data, GoodData protects data privacy and enables secure sharing by combining several technologies. In the previous article, we discussed the important role differential privacy plays in preventing data leakage. Now we introduce another technology GoodData applies to help multiple participants perform machine learning while keeping their data private and secure: federated learning.
The concept of federated learning
Federated learning is a machine learning technique that trains an algorithm across multiple decentralized edge devices or servers holding local data samples, without exchanging those samples. Participants do not upload their data to a server; instead, each one trains the model locally. Only model parameters are exchanged between the server and each node, which addresses the data privacy problem.
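To make the parameter-only exchange concrete, here is a minimal Python sketch in the style of federated averaging (FedAvg), one widely used aggregation method; the article does not say which algorithm GoodData uses. The linear model, the learning rate, the client datasets, and the names `local_train` and `federated_round` are hypothetical choices for illustration. Each client runs gradient descent on its own data, and the server only ever receives and averages the `(w, b)` parameters.

```python
"""Minimal federated averaging (FedAvg-style) sketch.

Assumption: each client holds private (x, y) pairs for a 1-D linear model
y ~ w * x + b. Only the parameters (w, b) travel between server and clients.
"""
from typing import List, Tuple

Params = Tuple[float, float]          # (w, b)
Dataset = List[Tuple[float, float]]   # private (x, y) pairs, never sent anywhere


def local_train(params: Params, data: Dataset, lr: float = 0.01, epochs: int = 50) -> Params:
    """Run plain gradient descent on one client's private data."""
    w, b = params
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_round(global_params: Params, clients: List[Dataset]) -> Params:
    """One round: every client trains locally, then the server averages the results."""
    updates = [local_train(global_params, data) for data in clients]
    sizes = [len(data) for data in clients]
    total = sum(sizes)
    # Average weighted by sample count; only parameters reach the server.
    w = sum(u[0] * s for u, s in zip(updates, sizes)) / total
    b = sum(u[1] * s for u, s in zip(updates, sizes)) / total
    return w, b


if __name__ == "__main__":
    # Hypothetical private datasets, both generated from y = 2x + 1.
    client_a = [(x, 2 * x + 1) for x in [0.0, 1.0, 2.0, 3.0]]
    client_b = [(x, 2 * x + 1) for x in [4.0, 5.0, 6.0]]
    params: Params = (0.0, 0.0)
    for _ in range(100):
        params = federated_round(params, [client_a, client_b])
    print(f"w={params[0]:.2f}, b={params[1]:.2f}")  # prints: w=2.00, b=1.00
```

Weighting by sample count lets a client with more data pull the global model further toward its local solution. Encryption or secure aggregation can be layered on top of this parameter exchange.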
Depending on how data is distributed among the data owners, federated learning can be divided into three categories: horizontal federated learning, vertical federated learning, and federated transfer learning.
1. Horizontal federated learning
Horizontal federated learning applies when the participants' datasets share many features but few users. For example, banks A and B in different regions run similar businesses but serve different users. With the cooperation of a third party (such as GoodData), the system aligns the encrypted samples from A and B, selects the samples that share the same features but belong to different users, and then jointly trains a machine learning model in GoodData. Throughout this process, the participants' data is processed in an encrypted environment, which protects data privacy.
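One way to keep each participant's update hidden from the aggregator, in the spirit of the encrypted environment described above, is additive pairwise masking, a simplified form of secure aggregation. This is a swapped-in illustration, not GoodData's documented scheme: the sketch assumes every client stays online, and it uses a single seeded random generator to stand in for the pairwise secret keys a real protocol would derive through key agreement. The function names and the bank update vectors are hypothetical.

```python
"""Simplified pairwise-masking secure aggregation sketch (no dropout handling).

Assumption: one seeded RNG simulates the secrets each pair of clients would
share in a real protocol. Not necessarily the scheme GoodData uses.
"""
import random
from typing import Dict, List


def masked_updates(updates: Dict[str, List[float]], seed: int = 0) -> Dict[str, List[float]]:
    """Add pairwise random masks that cancel out when all masked vectors are summed."""
    rng = random.Random(seed)  # stand-in for pairwise shared secrets
    names = sorted(updates)
    dim = len(next(iter(updates.values())))
    masked = {n: list(updates[n]) for n in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            mask = [rng.uniform(-1000, 1000) for _ in range(dim)]
            # Client a adds the mask and client b subtracts it.
            masked[a] = [v + m for v, m in zip(masked[a], mask)]
            masked[b] = [v - m for v, m in zip(masked[b], mask)]
    return masked


def server_average(masked: Dict[str, List[float]]) -> List[float]:
    """The server sees only masked vectors; their sum equals the true sum."""
    dim = len(next(iter(masked.values())))
    total = [sum(vec[k] for vec in masked.values()) for k in range(dim)]
    return [t / len(masked) for t in total]


if __name__ == "__main__":
    # Hypothetical local model updates from three banks.
    updates = {"bank_a": [0.5, -1.2], "bank_b": [0.7, -0.8], "bank_c": [0.9, -1.0]}
    masked = masked_updates(updates)
    print(masked["bank_a"])        # masked values, not bank_a's true update
    print(server_average(masked))  # approximately [0.7, -1.0]
```

With only two participants, each one can recover the other's update from the average and its own update, so masking of this kind is most meaningful with three or more clients.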
2. Vertical federated learning
Vertical federated learning targets joint learning among data owners whose data share few sample features but many users. For example, hospital A and bank B in the same region both hold data on users in that region. Because their businesses differ, their sample features also differ. Through vertical federated learning, A and B can jointly improve model performance while protecting their data, without giving up their original data.
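Before vertical training can start, the parties must find the users they have in common without exposing their full user lists. Production systems typically use a cryptographic private set intersection (PSI) protocol for this; the sketch below substitutes a simpler keyed-hash (HMAC-SHA256) matching step, assuming the two data owners share a secret key that the coordinator does not know. The key, the user IDs, and the function names are hypothetical, and this simplification still reveals to both parties which of their users are in the overlap.

```python
"""Simplified record alignment for vertical federated learning.

Assumption: HMAC-SHA256 with a key shared only by the data owners stands in
for a real private set intersection (PSI) protocol.
"""
import hashlib
import hmac
from typing import Dict, List, Set


def blind_ids(user_ids: List[str], shared_key: bytes) -> Dict[str, str]:
    """Return {token: local_user_id}; only the tokens are sent out."""
    return {
        hmac.new(shared_key, uid.encode("utf-8"), hashlib.sha256).hexdigest(): uid
        for uid in user_ids
    }


def intersect_tokens(tokens_a: Set[str], tokens_b: Set[str]) -> Set[str]:
    """Run by the coordinator, which sees tokens but never raw IDs."""
    return tokens_a & tokens_b


def aligned_ids(local: Dict[str, str], common: Set[str]) -> List[str]:
    """Sort shared tokens identically on both sides so row i is the same user."""
    return [local[token] for token in sorted(common)]


if __name__ == "__main__":
    key = b"hypothetical-shared-key"               # never given to the coordinator
    hospital = blind_ids(["u1", "u2", "u3"], key)  # hypothetical hospital A users
    bank = blind_ids(["u2", "u3", "u4"], key)      # hypothetical bank B users
    common = intersect_tokens(set(hospital), set(bank))
    hospital_rows = aligned_ids(hospital, common)
    bank_rows = aligned_ids(bank, common)
    print(hospital_rows == bank_rows, sorted(hospital_rows))  # True ['u2', 'u3']
```

Once the rows are aligned, hospital A's features and bank B's features for the same user can be used together in training without either party handing its columns to the other.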
3. Federated transfer learning
Federated transfer learning applies when participants share few features and few samples, such as hospital A and bank B in different regions. Essentially, the data owners exploit similarities between their data to apply a model learned in the source domain to the target domain. This is similar to how a person who can already sketch draws on that skill to learn watercolor painting.
Application of federated learning
Because it supports both data security and data sharing, federated learning is used in many fields. Take the medical industry as an example.
Data "information silos" are a major obstacle to the development of the medical and health industry. Medical data is neither interconnected nor governed by a unified standard across hospitals in different regions. With federated learning, medical institutions update model parameters only by transmitting encrypted information through protocols, without uploading their medical data to a server or exchanging data samples, so they gain the benefits of training on the data without exposing private information. In addition, federated learning can help solve key problems such as confirming data ownership rights, protecting privacy, and accessing heterogeneous data, which provides strong support for a future Metaverse built on data. It is also an indispensable part of how the GoodData blockchain enables data monetization.
GoodData's official accounts
WeChat subscription account: Good Data Foundation