Federated Learning Aggregation Method (1): FedSGD vs. FedAVG


FedSGD vs. FedAVG

FedSGD is the baseline of federated learning.
A randomly selected client that holds n training samples in federated learning ≈ a randomly selected sample in traditional deep learning.

There are two approaches (a minimal sketch follows the list):

  1. The client computes the gradient and sends it to the server.
  2. The client computes the gradient, updates its local model, and sends the updated model back to the server.
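
Here is a minimal sketch of the first approach in plain NumPy. The linear model, the `local_gradient` helper, and `fedsgd_round` are illustrative assumptions rather than code from the FedSGD paper; the point is only that the server combines client gradients weighted by sample count and applies a single update.

```python
import numpy as np

def local_gradient(w, X, y):
    # Full-batch gradient of a mean-squared-error loss for a linear model
    # (an illustrative choice; any differentiable model behaves the same way).
    return 2 * X.T @ (X @ w - y) / len(y)

def fedsgd_round(w, clients, lr=0.1):
    # Approach 1: every client sends one gradient; the server averages them,
    # weighted by client dataset size, and takes a single gradient step.
    n_total = sum(len(y) for _, y in clients)
    g = sum(len(y) / n_total * local_gradient(w, X, y) for X, y in clients)
    return w - lr * g
```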

If each client updates the model multiple times locally before sending it back to the server for aggregation, the method is called FedAVG.
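
A minimal FedAVG sketch under the same assumptions, reusing the hypothetical `local_gradient` helper from the snippet above: every client runs several local updates starting from the shared weights, and the server averages the returned models, weighted by dataset size.

```python
def fedavg_round(w, clients, lr=0.1, local_steps=5):
    # FedAVG: each client performs several local updates starting from the
    # shared weights; the server then averages the returned models,
    # weighted by client dataset size.
    n_total = sum(len(y) for _, y in clients)
    w_new = np.zeros_like(w)
    for X, y in clients:
        w_k = w.copy()
        for _ in range(local_steps):        # multiple local updates
            w_k -= lr * local_gradient(w_k, X, y)
        w_new += len(y) / n_total * w_k     # weighted model average
    return w_new
```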

Model average vs. gradient average

Gradient averaging guarantees convergence (one round is equivalent to a single step of large-batch SGD over all clients' data), while model averaging does not come with the same guarantee.
Gradient averaging also has a heavy communication cost, because clients have to communicate at every local iteration, whereas model averaging only communicates once per round and is much cheaper.
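
As a rough illustration (the numbers here are made up): if each client ran 5 local epochs with 10 mini-batches per epoch, sharing gradients would mean 5 × 10 = 50 uploads per round, while FedAVG uploads the locally updated model just once per round.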

In the extreme case where all participants start from the same initial model weights and each performs exactly one local gradient update before aggregation, the model average is identical to the gradient average.
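
A quick numerical check of that equivalence, again using the hypothetical `local_gradient` helper from the sketches above: with a shared starting point and a single local step, averaging the models gives exactly the same result as averaging the gradients and then taking one step.

```python
rng = np.random.default_rng(0)
w0 = rng.normal(size=3)                      # shared initial weights
clients = [(rng.normal(size=(n, 3)), rng.normal(size=n)) for n in (20, 50, 30)]
lr = 0.1
n_total = sum(len(y) for _, y in clients)

# Gradient average: the server averages the gradients, then takes one step.
g_avg = sum(len(y) / n_total * local_gradient(w0, X, y) for X, y in clients)
w_grad = w0 - lr * g_avg

# Model average: each client takes one step, the server averages the weights.
w_model = sum(len(y) / n_total * (w0 - lr * local_gradient(w0, X, y))
              for X, y in clients)

print(np.allclose(w_grad, w_model))          # True
```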

Isn't it a naive method to simply average the model weights?

Yes, it is, and it can lead to poorer performance. Therefore, there are layer-wise aggregation methods such as FedMA and PFNM. You can find the papers here and here.

I will find time to write some details about them.

Want to know more about federated learning?

Architecture of three Federated learning

Federated Learning Aggregation Method (2) (Working on it)

References

https://inst.eecs.berkeley.edu/~cs294-163/fa19/slides/federated-learning.pdf

https://arxiv.org/pdf/1602.05629.pdf

https://ieeexplore.ieee.org/document/8940936
