Empowering Secure Machine Learning in Retail with Federated Learning

Midhun Mohan
Litmus7 Systems Consulting
Jun 2, 2023

Introduction

AI has been a buzzword in the industry for quite a while, with widespread applications and implementations across all sectors. Data is the raw ingredient on which AI is built, and knowingly or unknowingly we produce petabytes of it every day.

Data, especially personally identifiable information, is a double-edged sword: it can certainly help businesses understand their stakeholders and drive growth, but if it is not managed and secured properly it can turn into one of their biggest threats.

Data privacy and security are receiving increasing attention. Governments all over the world are taking this seriously by bringing into effect laws and regulations with varying degrees of constraint and restraint, GDPR and CCPA being among the foremost.

For retailers this is all the more pertinent, since their business depends on tapping into consumer behavior, understanding customers better, and tracking their ever-changing needs. This data is what enables a retailer to support a customer's decision-making journey, guide them toward better product discovery, and eventually master clienteling [1].

Federated learning is a recent technology paradigm for AI models deployed at the edge (for example, on mobile devices and wearables): the model does not have to share data with, or "talk" to, a central server to deliver insights, but can do so by training and executing on the edge device itself. [2]

Distributed Learning

Distributed machine learning is one of the most critical components of the ML stack. A multi-node ML system boosts accuracy, handles larger input data volumes, and improves performance. It further reduces machine errors, assists with data analysis and decision-making, and is capable of handling large datasets.

Large amounts of data are often challenging to work with because machine learning techniques run into scalability and efficiency problems.

Large-scale learning therefore calls for distributed ML algorithms that spread learning operations across a number of workstations. Because the data is so large, practitioners frequently use parallel loading and retrain models incrementally to avoid interfering with the workflow. [3]

What is federated learning?

Federated learning provides a way to train AI models without sharing private user data.

The objective of federated learning is to move computation to where the data resides. When there is a globally shared model but the data lives, for instance, on smartphones, transferring the model to those devices allows us to train it collectively. [3]

Anyone can take part in federated learning on their devices, directly or indirectly. Even where computationally constrained devices might become a bottleneck, edge devices such as smartphones and IoT devices can benefit from on-device data without that data ever leaving the device. [5]

Moving computation to the data is a sound way to build intelligent systems while protecting user privacy; federated learning is, in effect, the decentralized form of machine learning.

How Federated Learning Works

Federated learning enables learning at the periphery, allowing model training to be applied to data that is dispersed among millions of devices. It also makes it possible to improve results using data from these remote locations.

The first step is to choose a model to launch with, one that has either never been trained or has already been pre-trained on the main server. The next phase is to distribute this initial model to the clients.

Each client then trains the model locally on its own data. It is crucial that this training data, which could include private emails, chat logs, personal images, and health measurements, be kept secret. Collecting such data into a cloud-based environment might be challenging or even impossible.

After local training, the updated models are transmitted to the main server over encrypted communication channels. It is crucial to keep in mind that the server only receives trained model parameters, never the actual data. The updates from all clients are then averaged and combined to improve the shared model's accuracy, and this model is sent back to all clients.

The exciting aspect of federated learning is its iterative training process: the server and clients can keep passing updated parameters back and forth, so the participants continue to collaborate without jeopardizing their privacy. [5]
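To make the round-based protocol above concrete, here is a minimal federated averaging (FedAvg) sketch in Python/NumPy. The flat weight vector, the toy linear model inside local_train, and the learning rate and round counts are illustrative assumptions, not a production implementation.

```python
import numpy as np

def local_train(weights, X, y, lr=0.1, epochs=5):
    """Train a toy linear model locally with plain gradient descent (illustrative only)."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # mean-squared-error gradient
        w -= lr * grad
    return w

def federated_averaging_round(global_weights, client_datasets):
    """One round: broadcast the global model, train locally, average the updates."""
    client_weights, client_sizes = [], []
    for X, y in client_datasets:             # raw data never leaves the client
        client_weights.append(local_train(global_weights, X, y))
        client_sizes.append(len(y))
    # Weighted average of client models (FedAvg), weighted by local dataset size.
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Toy usage: three clients with private data, several communication rounds.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ true_w + 0.1 * rng.normal(size=50)))

global_w = np.zeros(2)
for round_id in range(10):
    global_w = federated_averaging_round(global_w, clients)
print(global_w)   # approaches [2.0, -1.0]
```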

How different is Federated learning from classical distributed/centralized learning?

The primary distinction between federated learning and distributed learning is the assumptions made about the properties of the local datasets:

  • distributed learning’s original goal is to parallelize computing power
  • federated learning’s initial goal is to train on heterogeneous datasets

In conventional machine learning, the participants' data are assumed to be independent and identically distributed (i.i.d.). Federated learning, on the other hand, makes a non-i.i.d. assumption, because different users hold different types of data.
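To make the i.i.d. versus non-i.i.d. distinction concrete, the hypothetical sketch below partitions a labeled toy dataset two ways: a uniform shuffle, as classical distributed training assumes, and a per-label split that mimics the skewed data each federated client actually holds. The class counts and the client assignment rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
labels = rng.integers(0, 10, size=1000)      # toy dataset: 1000 samples, 10 classes

# i.i.d. split: shuffle, then deal samples evenly to 5 "workers".
shuffled = rng.permutation(len(labels))
iid_clients = np.array_split(shuffled, 5)

# Non-i.i.d. split: each "client" only sees two of the ten classes,
# mimicking users whose devices hold very different kinds of data.
non_iid_clients = [
    np.where((labels == 2 * c) | (labels == 2 * c + 1))[0] for c in range(5)
]

for c, idx in enumerate(non_iid_clients):
    print(f"client {c}: classes {sorted(set(labels[idx]))}, {len(idx)} samples")
```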

In conventional machine learning, all of the training is done on a single server after the data has been gathered. This presents numerous privacy issues, since the data has to be shared with the primary cloud server.

Federated learning has grown in importance relative to distributed data training. It lets users collectively train local models on local data, which protects their private information from being shared with a central cloud server. It allows for continuous learning on end-user devices while guaranteeing that no end-user data leaves the device.

In traditional machine learning, a centralized environment uses all the training data to build a single ML model. This works without a hitch as long as a central server is available to deliver the predictions. [3]

Such a model can also be deployed on the end-user device, but continuous learning there is difficult, because the model needs to be trained on a large dataset to which the end-user device does not have access.

Role of Federated Learning in Personalization while maintaining data privacy

Federated learning applies machine learning techniques to data that stays distributed across decentralized edge devices or servers. The raw data is never moved to a centralized server; it persists on the device.

Once the training algorithm has finished processing the local data, the results are sent back to the server in encrypted form, so nobody can examine the outputs and reconstruct the original data from them. To increase security further, the results can be encrypted with a key the server does not know, making it nearly impossible to decrypt any individual contribution. By sending many batches of training updates over time, each device helps produce a high-quality model, after which the training algorithm can be removed from the device. To keep uploads as fast as possible, these updates are quantized and randomly rotated.

With the federated averaging technique, the server only uses the averaged results of the updates. An alternative strategy is secure aggregation, in which the server aggregates the encrypted results from any number of edge devices and can only decode the aggregate. This adds another layer of security, making it even more difficult to recover the original data. Before transmitting its training results, each edge device adds zero-sum masks to them using the secure aggregation protocol, so the individual results arrive in an obscured form, but the masks cancel out exactly when the training results are summed.
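A simplified sketch of the zero-sum masking idea is shown below. Real secure aggregation protocols negotiate the pairwise masks cryptographically and cope with client dropouts; this toy version simply assumes every pair of clients already shares a random mask and verifies that the masks cancel in the sum.

```python
import numpy as np

rng = np.random.default_rng(7)
num_clients, dim = 4, 3
updates = [rng.normal(size=dim) for _ in range(num_clients)]   # true local updates

# Each pair of clients (i, j) agrees on a shared random mask:
# client i adds it, client j subtracts it, so the masks cancel in the sum.
pair_masks = {(i, j): rng.normal(size=dim)
              for i in range(num_clients) for j in range(i + 1, num_clients)}

masked = []
for i in range(num_clients):
    m = updates[i].copy()
    for (a, b), mask in pair_masks.items():
        if a == i:
            m += mask
        elif b == i:
            m -= mask
    masked.append(m)           # what the server actually sees: looks like noise

# The server can only recover the aggregate, never an individual update.
print(np.allclose(sum(masked), sum(updates)))   # True
```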

Even though the server cannot access any single edge device's results, there is still a privacy concern: what happens if one device contributes unusual information that stands out from the other findings? The question is whether combining the data from that one device with the other results on the server might compromise its source's privacy. The unsettling answer is that it might. To prevent this, anomalous data is removed; the argument is that, for machine learning to work at its best, it should recognize and exploit the common patterns in the data.

An alternative strategy is differential privacy. This limits how much data from any single edge device can be used in the model, and noise can be added to mask uncommon data. This prevents one device from supplying too much data and having an outsized impact on the model being built, a failure mode known as model memorization. [4]
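The following sketch illustrates the clipping-plus-noise idea behind differential privacy for client updates. The clip norm and noise scale are arbitrary illustrative values; a real deployment would calibrate them to a formal privacy budget.

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_std=0.5, rng=None):
    """Clip a single client's update and add Gaussian noise (DP-style, illustrative)."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))  # bound one device's influence
    return clipped + rng.normal(scale=noise_std, size=update.shape)

rng = np.random.default_rng(0)
raw_updates = [rng.normal(size=4) for _ in range(100)]
private_avg = np.mean([privatize_update(u, rng=rng) for u in raw_updates], axis=0)
print(private_avg)   # close to the true mean, but no single update is recoverable
```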

Model Size and Federated Learning

Model size describes the amount of storage and processing power needed to store and use a machine-learning model. Larger models usually have more parameters and can capture more intricate patterns in the data, but they also take more time and money to train and use.

The model size is a crucial consideration when using federated learning, a distributed machine learning technique that enables multiple parties to train a shared model without sharing their data.

Each participant in federated learning trains a local version of the model on their own data, and the updates are then combined to enhance the overall model. When the model is large, participants may find it difficult to train it locally, because it can demand an excessive amount of memory and processing power. As a result, training takes longer and communicating updates to the global model becomes slow.

To alleviate this problem, methods such as model compression or pruning can reduce the size of the model without noticeably degrading its performance. It is also crucial to design a model architecture that can be trained and deployed effectively within the device and network constraints of the end devices.
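As one example of shrinking what each client has to upload, the sketch below applies simple uniform 8-bit quantization to a model update before transmission. The bit width and the linear quantization scheme are illustrative choices, not a specific compression method from the cited sources.

```python
import numpy as np

def quantize(update, num_bits=8):
    """Uniformly quantize a weight update to num_bits integers for cheaper upload."""
    lo, hi = update.min(), update.max()
    levels = 2 ** num_bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = np.round((update - lo) / scale).astype(np.uint8)
    return q, lo, scale             # the server needs lo and scale to dequantize

def dequantize(q, lo, scale):
    return q.astype(np.float32) * scale + lo

rng = np.random.default_rng(1)
update = rng.normal(size=10_000).astype(np.float32)      # a "large" model update
q, lo, scale = quantize(update)
restored = dequantize(q, lo, scale)
print(update.nbytes, q.nbytes)                            # 40000 vs 10000 bytes
print(np.abs(update - restored).max())                    # small quantization error
```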

Another option is to use methods such as Federated Learning by Co-gradient Descent, which let the parties train on a subset of the global model parameters, lowering the memory and processing demands on the local devices.

By managing model size alongside federated learning, large models can be trained on distributed data, which can improve the efficiency and scalability of machine learning systems. The design and implementation of federated learning systems must, however, carefully address the challenges this strategy raises, including communication and privacy issues.

How to achieve Personalization in FL Models?

With the freedom to store data on the user’s device for further processing, the potential for personalizing the user experience has increased dramatically. Generally, there are three stages in creating a personalized federated learning experience.

  • Initial Stage

The initial stage marks the first step in creating a communication channel between the different participants in a network, and refers to the initial exchange of information. While edge computing encrypts end-user data, in federated learning it is the locally trained models that are shared across the network, so the initial set of communications is offloaded to the next layer [7] [8]

  • Training Stage

In the training stage, a local model is built using the end user's data: in edge computing this happens at the edge, and in federated learning on the device itself. It is an iterative process in which the global model, aggregated on the central server, is updated collectively, improving accuracy with each learning cycle [7] [8]

  • Personalization Stage

This stage is a continuation of the training stage. Personal information is used to build a local model on the local device. In federated learning, this stage focuses on the end-user device, the source of the custom data, and enables the local models to align more closely with user information, creating an overall personalized experience [7] [8]

What motivates Personalization in Federated Learning?

Federated learning enables the development of a local model on each user's device. However, no two devices or users are the same, and there are numerous variations in the on-device models built across edge devices. This leads to heterogeneity of the data on different levels, which is the main driver for introducing personalization concepts into the communication network. The motivations for implementing personalization are categorized below. [8]

  • Device Heterogeneity

The increasing use of IoT devices and the constant improvement in the communication capabilities of end devices have led to a multitude of options. Edge devices now vary widely in data storage, processing, communication, and hardware capabilities. All of these characteristics directly affect the data-processing and communication costs of running models locally. It becomes particularly challenging when devices of varying capability are subjected to the same iterative process of model training through global updates. Communication problems with heterogeneous devices can therefore only be addressed by implementing customized federated learning techniques [7] [8]

  • Data Heterogeneity

The key term in this category is non-IID data distribution, i.e. data that is not independent and identically distributed across devices. Each user operates their device in a different environment, resulting in a unique data distribution on each device. Devices also differ in the types of data they hold and in the number of samples available to train the local model. This problem can lead to divergence of the global model, the exact opposite of the desired outcome.

Although there are solutions such as federated averaging (FedAvg) for dealing with data heterogeneity [9], these may suffer from reduced performance and can produce insignificant results. It is therefore very much necessary to implement personalized federated learning models [7] [8]

  • Model Heterogeneity

When multiple devices are connected over a network, users typically agree on a predefined model architecture. This enables an efficient environment for the convergence of local models and eases the deployment of the global model to all edge devices. However, with IoT devices, the need for a more adaptable, personalized model has come to light. Due to the variety of environments and resource constraints, each device looks for the model that best suits its operation while keeping its architecture private. An adaptation of the FL model is therefore required, in which different model architectures can be connected and communicate via a global standard model.

While these heterogeneity problems increase the complexity of the communication network, the challenges have also initiated the search for likely solutions. They have therefore become the main motivating factors behind the rise of personalized federated learning [7] [8]

Techniques Used in Federated Learning for Personalization

  • Meta-Learning

The concept of meta-learning focuses on increasing the adaptability of the local model by exposing the learning algorithm to a variety of tasks [10]. Meta-learning presents the local model with several variations of data, improving its ability to train on and adapt to new data. This increases the local model's ability to train variations of the global model.

Some examples of meta-learning algorithms are Model-Agnostic Meta-Learning (MAML) [11] and Reptile [12]. They are popular for their fast data processing and swift adaptability. MAML splits processing into two phases: meta-training and meta-testing. While the training phase focuses on building the global model using a variety of tasks, the testing phase adapts the global learning to the local models. MAML operates on a concept similar to federated averaging, with the Reptile algorithm playing a role similar to FedAvg during the meta-testing phase.

So the idea is to improve the global model's accuracy in a way that makes it easy to adapt at the local endpoint, enabling meta-learning to address personalization both at the core and for end users.
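The sketch below illustrates this idea with a Reptile-style meta-update over federated clients: the global weights are nudged toward each client's locally adapted weights, so the resulting model personalizes well after only a few local steps. The toy linear model, learning rates, and client tasks are illustrative assumptions, not the full MAML or Reptile algorithm from [11] [12].

```python
import numpy as np

def local_train(w, X, y, lr=0.05, steps=10):
    """A few local gradient steps on one client's data (toy linear model)."""
    w = w.copy()
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w

def reptile_round(global_w, client_datasets, meta_lr=0.5):
    """Reptile-style meta-update: move the global weights toward each client's
    locally adapted weights, so the global model becomes easy to personalize
    with only a few local steps."""
    adapted = [local_train(global_w, X, y) for X, y in client_datasets]
    return global_w + meta_lr * (np.mean(adapted, axis=0) - global_w)

# Toy usage: heterogeneous client tasks, then per-client personalization.
rng = np.random.default_rng(3)
clients = []
for shift in (-1.0, 0.0, 1.0):
    X = rng.normal(size=(40, 2))
    clients.append((X, X @ np.array([1.0, shift])))

w = np.zeros(2)
for _ in range(20):
    w = reptile_round(w, clients)
personal_w = local_train(w, *clients[0])   # client 0 fine-tunes on its own data
print(w, personal_w)
```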

  • Transfer Learning [7] [8]

As the name suggests, this technique is based on transferring what a machine learning model has learned in one setting to another. It allows previously trained models to be reused and speeds up the process at the local end. In federated learning, the idea revolves around sharing the global model with edge devices so that they can customize it at the local end.

The process is mainly carried out through two approaches.

In the first approach, the global model is first trained via the traditional FL approach and the output is shared with the end-user devices. Each device then uses the global model and its own data to create a local model. Only a few selected parameters are retrained with the local data to avoid training challenges, while the lower layers of the global model are transferred and re-used.

The second approach divides learning into two sets of layers: foundation and personalization. The foundation layers are common and are trained collectively using traditional federated learning. The personalization layers are trained locally with customer-specific data. Each end-user device trains these custom layers on top of the shared global model, leading to improved incorporation of personalized learning techniques into FL models.
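A minimal sketch of this second approach follows, assuming the model parameters are split into a shared "base" (foundation) part and a per-client "head" (personalization) part: only the base is averaged by the server, while each head stays on its device. The layer names and the way the head update is produced are hypothetical placeholders.

```python
import numpy as np

def aggregate_base_layers(client_models):
    """FedAvg only over the shared 'base' parameters; each client's
    'head' (personalization layers) never leaves the device."""
    base_avg = np.mean([m["base"] for m in client_models], axis=0)
    return {"base": base_avg}

def personalize(client_model, global_base, local_head_update):
    """Clients adopt the shared base but keep and refine their own head locally."""
    return {"base": global_base["base"].copy(),
            "head": client_model["head"] + local_head_update}

# Toy usage: two clients sharing a base layer but owning distinct heads.
clients = [{"base": np.ones(4), "head": np.array([0.1])},
           {"base": np.full(4, 3.0), "head": np.array([-0.2])}]
global_base = aggregate_base_layers(clients)
clients = [personalize(c, global_base, local_head_update=np.array([0.05]))
           for c in clients]
print(clients[0]["base"], clients[0]["head"])   # shared base, personal head
```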

  • Multi-Task Learning

This technique focuses on identifying the relationships between different local models during training. It learns separate tasks for multiple devices at the same time without compromising the privacy of the input data, resulting in personalized models, and allows each device to benefit from the other devices while training its initial model.

The central server handles the similarity of model parameters between different clients, allowing the end devices to update their models based on the identified data relationships. As a result, the issue of data heterogeneity is addressed, along with improved quality of personalization at the local end. MOCHA [13] is a standard algorithm for implementing multi-task learning.
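The sketch below is not MOCHA itself, but a simplified illustration of the underlying multi-task idea: each client keeps its own model, and a server-provided similarity matrix adds a penalty that pulls related clients' parameters toward each other during their local updates. All names and numbers are illustrative.

```python
import numpy as np

def multitask_step(client_weights, client_grads, similarity, lr=0.1, lam=0.5):
    """One update per client: follow the local gradient, plus a penalty that
    pulls each client's weights toward clients the server deems similar."""
    new_weights = []
    for i, (w, g) in enumerate(zip(client_weights, client_grads)):
        pull = sum(similarity[i][j] * (client_weights[j] - w)
                   for j in range(len(client_weights)) if j != i)
        new_weights.append(w - lr * g + lr * lam * pull)
    return new_weights

# Toy usage: three clients, where clients 0 and 1 are marked as closely related.
weights = [np.array([0.0]), np.array([1.0]), np.array([5.0])]
grads = [np.zeros(1)] * 3                       # pretend local gradients are zero
similarity = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]  # server-side relationship matrix
for _ in range(50):
    weights = multitask_step(weights, grads, similarity)
print([round(w.item(), 3) for w in weights])    # clients 0 and 1 converge toward 0.5
```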

Conclusion

Personalization has become significant in the rapidly evolving market of digital communications. With the progressive ideas of edge computing and FL, the idea of personalization evolved as well, leading to personalized federated learning. User experience can be improved by implementing the three major personalization stages: Initial, Training, and Personalization.

The key challenge for personalization remains the diversity of stakeholders within a single network. This is called heterogeneity, which can be multi-fold in federated learning: spread across variations of edge devices, data types, and model architectures, it creates diverse challenges. Having said that, these challenges are the key motivating factors for developing custom FL models.

The standard mechanisms implemented to achieve personalization incorporate Meta-Learning, Transfer Learning, and Multi-Task Learning. All of these techniques focus on generating a personalized experience for end-device users without compromising privacy, while solving various heterogeneity issues. [7] [8]

References

[1] https://pwc.com/future-of-cx

[2] https://www.analyticsvidhya.com/blog/2021/05/federated-learning-a-beginners-guide/

[3] https://research.ibm.com/blog/what-is-federated-learning

[4] https://analyticsindiamag.com/distributed-machine-learning-vs-federated-learning-which-is-better/

[5] https://federated.withgoogle.com/

[6] https://research.aimultiple.com/federated-learning/

[7] A. Z. Tan, H. Yu, L. Cui, and Q. Yang, “Towards Personalized Federated Learning,” IEEE

[8] https://blog.nimbleedge.ai/personalized-fl-101/

[9] S. P. Karimireddy, S. Kale, M. Mohri, S. J. Reddi, S. U. Stich, and A. T. Suresh, “SCAFFOLD: Stochastic Controlled Averaging for Federated Learning,” in ICML, 2020, pp. 5132–5143

[10] T. Hospedales, A. Antoniou, P. Micaelli, and A. Storkey, “Metalearning in neural networks: A survey,” IEEE TPAMI, no. 01, pp. 1–1, 2020

[11] C. Finn, P. Abbeel, and S. Levine, “Model-agnostic meta-learning for fast adaptation of deep networks,” in ICML, 2017, pp. 1126–1135

[12] A. Nichol, J. Achiam, and J. Schulman, “On First-Order Meta-Learning Algorithms,” arXiv:1803.02999, 2018

[13] V. Smith, C.-K. Chiang, M. Sanjabi, and A. Talwalkar, “Federated Multi-Task Learning,” in NeurIPS, vol. 30, 2017, pp. 4427–4437
