Tackling Non-Independent and Identically Distributed (Non-IID) Data in Federated Learning

Laxman Kumarapu
Nov 7, 2022


[Source: https://miro.medium.com/max/1400/1*Bbgj7VNT-01KD3U2lWXDgw.png]

Main Motivation behind Federated Learning

We’ve all seen how useful crowdsourcing can be. On mobile applications like Waze, users can report all kinds of driving conditions: car accidents, traffic jams, police speed traps, cars stopped on the side of the road, and so on. In turn, other users of the platform benefit from this cooperation to make wiser driving choices. As a basic illustration, Waze can select an alternative route to get me to my destination if there is a significant traffic delay on a particular road.

What if we could use the same characteristics of crowdsourcing to tackle significant deep learning and machine learning problems?

Imagine, for example, if we could train machine learning systems using all the data that is available from hospitals throughout the world. Such a partnership would undoubtedly help applications like melanoma or breast cancer diagnostics. The same reasoning applies to insurance and banking firms, which could use private information to create more accurate prediction models. However, another crucial issue, privacy, is what really stands in the way of this kind of collaboration.

You and I should be able to let these companies benefit from our data to build better deep learning models, but without ever actually handing the data over.

These are the primary motivations behind Federated Learning.

But Federated Learning has its own set of challenges, a few of which will be discussed in the next section.

Challenges in Federated Learning

If training is done on heterogeneous devices without data being shared, the challenges mainly come from two sources: non-IID data and unreliable devices.

Non-IID data: Unlike in classic distributed learning, the data is not independent and identically distributed across nodes. Training performance can vary significantly depending on how unbalanced the local data samples are and on the probability distribution of the training examples at each node. (A small simulation sketch of such skewed clients follows these two points.)

Unreliable devices: Devices can range from small IoT edge devices to phones and sometimes servers, all with varying compute, storage, and network connectivity. The training process therefore has to be robust to the failures and limitations of the nodes in the network.
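
To make the non-IID setting concrete, here is a small sketch of one common way to simulate label-skewed client data in experiments: each class is split across clients according to proportions drawn from a Dirichlet distribution, so some clients end up dominated by a few classes. The function and parameter names are illustrative and not taken from either paper.

```python
import numpy as np

def dirichlet_label_skew(labels, num_clients, alpha=0.5, seed=0):
    """Split sample indices across clients with Dirichlet-distributed class proportions.

    Smaller alpha -> more skewed (more strongly non-IID) client datasets.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])       # samples of class c
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cut_points = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for k, split in enumerate(np.split(idx, cut_points)):  # one chunk per client
            client_indices[k].extend(split.tolist())
    return client_indices
```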

In this write-up, we will look at two papers that tackle the non-IID data issue from two different angles. This write-up is a detailed version of a presentation that my teammate Sandeep and I gave for our graduate course. Check out the companion part on tackling the unreliable-devices issue in Federated Learning here.

Model-contrastive learning (MOON)

Model-contrastive learning (MOON) [1] tackles non-IID data by correcting the local updates: it maximizes the agreement between the representation learned by the current local model and the representation learned by the global model. Because of the heterogeneity of data in federated learning, when each client node updates its local model, its local objective may be far from the global objective. As a result, the averaged global model, obtained by averaging the weights of the client nodes as in the vanilla federated learning algorithm, drifts away from the global optimum.
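
For reference, the vanilla aggregation mentioned above (FedAvg) simply averages the clients' weights, typically weighted by local dataset size. A minimal sketch, with illustrative names not taken from either paper:

```python
import torch

def fedavg_aggregate(client_states, client_sizes):
    """Weighted average of client state_dicts, weighted by local dataset size."""
    total = float(sum(client_sizes))
    return {
        key: sum(state[key].float() * (n / total)
                 for state, n in zip(client_states, client_sizes))
        for key in client_states[0]
    }
```

Weighting by local dataset size is the standard FedAvg choice; uniform weights are the simplest alternative.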

MOON [1] tries to decrease the distance between the representation learned by the local model and the representation learned by the global model, while increasing the distance between the current local representation and the representation learned by the local model in the previous round. This second term is needed because every local node has non-IID data, and representations learned purely from this skewed local data hurt the model's performance.

Fig. a and Fig. b show a visual representation of how weight updates are done in model-contrastive learning. The equation below shows how the model-contrastive loss is calculated.

Now let us move into the technical details of how the model-contrastive loss is calculated at each local node. As shown in Fig. a, the output of the global model at the end of its projection head (just before the softmax layer) is computed for an image, giving a feature vector z_glob. Similarly, each client node receives the parameters of the global model, updates them on its local data, and computes the output of this updated local model at the end of its projection head, giving a feature vector z_local for the same image.

Once we have the feature vectors from the global model (z_glob) and the updated local model (z_local), we try to maximize the similarity between these representations. To do so, MOON introduces the model-contrastive loss, which takes the negative log of the exponentiated cosine similarity between the two feature vectors. The higher the contrastive loss, the bigger the gap between the representations of the global model and the local model. This loss term is added to the supervised classification loss, and the combined objective is minimized over iterations using gradient descent.
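
To make this concrete, below is a minimal PyTorch-style sketch of a model-contrastive loss of this form. The previous round's local representation z_prev plays the role of the negative term described earlier, tau is a temperature, and mu weights the contrastive term against the supervised loss; all names and default values here are illustrative rather than the paper's exact notation.

```python
import torch
import torch.nn.functional as F

def moon_loss(logits, labels, z_local, z_glob, z_prev, tau=0.5, mu=1.0):
    sup_loss = F.cross_entropy(logits, labels)                     # supervised classification loss

    sim_glob = F.cosine_similarity(z_local, z_glob, dim=-1) / tau  # pull toward the global model
    sim_prev = F.cosine_similarity(z_local, z_prev, dim=-1) / tau  # push away from the previous local model

    # Negative log of the softmax over the two similarities (the contrastive term)
    con_loss = -torch.log(
        torch.exp(sim_glob) / (torch.exp(sim_glob) + torch.exp(sim_prev))
    ).mean()

    return sup_loss + mu * con_loss
```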

MOON achieves about 7% higher accuracy on the CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets when compared with baseline federated learning algorithms such as FedAvg and FedProx.

FairFed: Enabling Group Fairness in Federated Learning

The second paper [2] proposes a novel algorithm called FairFed that mitigates potential bias against certain populations via a fairness-aware aggregation of the model weights, aiming to provide fair model performance across different sensitive groups (e.g., racial and gender groups) while maintaining high utility.

In sensitive machine learning applications such as health care and loan assessment, a data sample often contains private and sensitive demographic information that can lead to models that discriminate on sensitive attributes (such as gender or race). In particular, we assume that each data point is associated with a sensitive binary attribute A, such as gender or race. For a model with a binary output Ŷ(θ, x) (e.g., a loan-assessment model), fairness is evaluated with respect to how the model performs across the groups defined by the sensitive attribute A.

Several works have been published on measuring model bias with respect to sensitive demographic information, and one of the most commonly used metrics is the Equal Opportunity Difference: the difference between the true positive rates conditioned on the values of the sensitive attribute. A model is considered fair from the equal-opportunity perspective if its true positive rate is independent of the sensitive attribute A.

For centralized learning, we calculate the bias of a model using the metric below:

Equal Opportunity Difference (EOD):

EOD = Pr(Ŷ = 1|A = 0, Y = 1) − Pr(Ŷ = 1|A = 1, Y = 1).
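
As a quick illustration, the EOD above can be computed directly from predictions when all the data sits in one place. A small NumPy sketch (array names are illustrative; all arrays hold 0/1 values, and `a` is the sensitive attribute A):

```python
import numpy as np

def equal_opportunity_difference(y_true, y_pred, a):
    """EOD = Pr(Ŷ=1 | A=0, Y=1) − Pr(Ŷ=1 | A=1, Y=1)."""
    tpr_a0 = float(np.mean(y_pred[(a == 0) & (y_true == 1)]))  # true-positive rate for group A=0
    tpr_a1 = float(np.mean(y_pred[(a == 1) & (y_true == 1)]))  # true-positive rate for group A=1
    return tpr_a0 - tpr_a1
```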

For federated learning, we define a global metric and a per-client metric:

F_global = Pr(Ŷ = 1|A = 0, Y = 1) − Pr(Ŷ = 1|A = 1, Y = 1)

F_k = Pr(Ŷ = 1|A = 0, Y = 1, C = k) − Pr(Ŷ = 1|A = 1, Y = 1, C = k), where C = k denotes the data held by client k.

The problem, however, is how to calculate this global EOD without sharing the local data.

Computing Global EOD metrics (without sharing the local dataset):

Equations for computing the global EOD without sharing data

The global EOD metric F_global can be computed by aggregating the values of m_global,k from the K clients.

Note that the conditional distributions in the definition of m_global,k above are local performance metrics that can easily be computed by client k using its local dataset D_k.

The only non-local terms in m_global,k are the full-dataset statistics S = {Pr(Y = 1, A = 0), Pr(Y = 1, A = 1)}.

These statistics S can be aggregated at the server using a single round of a secure aggregation scheme at the start of training and then shared with the K participating clients, enabling each of them to compute its global fairness component m_global,k.
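
Putting these pieces together, here is a hedged sketch of how the global EOD could be assembled without sharing raw data. It follows from the law of total probability, Pr(Ŷ = 1 | A = a, Y = 1) = Σ_k Pr(Ŷ = 1 | A = a, Y = 1, C = k) · Pr(C = k | A = a, Y = 1): each client reports its group counts once (via secure aggregation in the actual protocol), the server broadcasts the aggregated totals, and each client then computes its contribution m_global,k locally. Counts are used in place of the probabilities in S for simplicity, and the function and variable names are illustrative rather than the paper's notation.

```python
import numpy as np

def local_group_counts(y_true, a):
    """Client-side: number of positive (Y=1) samples in each sensitive group."""
    return {
        "n_y1_a0": int(((a == 0) & (y_true == 1)).sum()),
        "n_y1_a1": int(((a == 1) & (y_true == 1)).sum()),
    }

def m_global_k(y_true, y_pred, a, counts_k, totals):
    """Client k's contribution to the global EOD, given the aggregated totals."""
    def tpr(group):  # local true-positive rate for the given sensitive group
        mask = (a == group) & (y_true == 1)
        return float(np.mean(y_pred[mask]))
    share_a0 = counts_k["n_y1_a0"] / totals["n_y1_a0"]  # ≈ Pr(C=k | A=0, Y=1)
    share_a1 = counts_k["n_y1_a1"] / totals["n_y1_a1"]  # ≈ Pr(C=k | A=1, Y=1)
    return tpr(0) * share_a0 - tpr(1) * share_a1

# Server side: sum the clients' local_group_counts entry-wise to get `totals`,
# broadcast them, then F_global is the sum of the clients' m_global_k values.
```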

Fairness-aware aggregation of the parameters:

Equations for weighting the model based on the EOD metric

In the equation above, the weight ω_k assigned to the k-th client node is based on the difference between the global fairness metric F_global and the local fairness metric F_k, where β is a parameter that controls the fairness budget, i.e., the trade-off between model utility and fairness. Higher values of β give the fairness metric a larger influence on the model optimization.
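
Since the exact update rule lives in the equation shown above, the sketch below is only an illustrative weighting consistent with that description, not necessarily the precise FairFed rule: clients whose local fairness metric F_k deviates more from the global metric F_global receive a smaller aggregation weight, with β controlling how strongly.

```python
import numpy as np

def fairness_aware_weights(f_global, f_locals, beta):
    """Normalized aggregation weights, one per client (illustrative form)."""
    gaps = np.abs(np.asarray(f_locals, dtype=float) - f_global)  # |F_k − F_global| per client
    raw = np.exp(-beta * gaps)        # larger fairness gap -> smaller raw weight
    return raw / raw.sum()            # normalize so the weights sum to 1

# beta = 0 recovers uniform weights; larger beta penalizes "unfair" clients more.
```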

For a highly heterogeneous data distribution, FairFed improved the EOD on the Adult and COMPAS datasets by 87% and 13%, respectively, at the cost of only a small accuracy reduction (a 1% and 2.5% decrease, respectively).

Observations:

To tackle the problem of non-IID data, both FairFed and MOON minimize a discrepancy between the global model and the client models. MOON minimizes the difference between the representations learned by the global and client models, whereas FairFed minimizes the gap between the global and local Equal Opportunity Difference metrics.

Knowledge distillation across servers and clients seems to be a recurring theme in tackling both non-IID data and fairness.

Conclusion:

Federated Learning currently faces two main challenges: non-IID data and unreliable devices in the network. The papers we discussed address these challenges effectively. We observed that non-IID data issues can be mitigated by aligning feature distributions during training.

Federated learning holds a lot of potential. Not only does it protect sensitive user data, but it also learns from data across many users, searches for common patterns, and strengthens the model over time.

The model improves itself based on user data, safeguards that data, and reemerges smarter, once again ready to put itself to the test with its own users. Both testing and training get smarter!

Federated Learning ushers in a new age of protected AI, whether in training, testing, or information privacy.

Federated learning is still in its infancy, so its design and implementation present many difficulties. A good way to tackle this is to define the federated learning problem carefully and design the data pipeline so that it can be properly productized.

References:

[1] Li, Qinbin, Bingsheng He, and Dawn Song. “Model-contrastive federated learning.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.

[2] Ezzeldin, Yahya H., et al. “FairFed: Enabling Group Fairness in Federated Learning.”
