
This operation (the accumulate) must be done in FP32 to preserve accuracy; the result is then converted back to FP16. The FP32 accumulate runs at half rate on the RTX 2080 and RTX 2080 Ti, but at full rate on the Titan RTX. In practice, this makes the Titan RTX 10% to 20% faster wherever Tensor Cores are utilized.
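To make this data path concrete, here is a minimal PyTorch sketch that emulates the pattern described above: FP16 inputs are multiplied, the partial products are summed in FP32, and the result is converted back to FP16. The shapes and tensor names are illustrative; on these GPUs, a plain FP16 `torch.matmul` typically performs this FP32 accumulation inside the Tensor Cores automatically.

```python
import torch

# Illustrative shapes; any FP16 matmul follows the same pattern.
a = torch.randn(512, 512, dtype=torch.float16, device="cuda")
b = torch.randn(512, 512, dtype=torch.float16, device="cuda")

# Emulate the Tensor Core data path explicitly:
acc = a.float() @ b.float()  # FP16 inputs, partial sums accumulated in FP32
c = acc.half()               # result converted back to FP16
```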
The key requirement in federated learning is to preserve the privacy of the participants' data. Yet even when the raw data is never exposed, the repeated model weight updates can be exploited to reveal properties that are not global to the data but specific to individual contributors.
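A minimal sketch of one such leak, with all names and sizes hypothetical: in a federated text model with an embedding layer, a client's weight update is non-zero only at the embedding rows for tokens that client actually used, so the server can read those tokens directly off the update without ever seeing the raw data.

```python
import torch
import torch.nn as nn

# Toy shared model: embedding table + linear head (sizes are illustrative).
vocab_size, dim = 100, 8
emb = nn.Embedding(vocab_size, dim)
head = nn.Linear(dim, 2)

# One client's private batch: these token IDs are the "individual property".
client_tokens = torch.tensor([[3, 17, 42]])
labels = torch.tensor([1])

logits = head(emb(client_tokens).mean(dim=1))
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()

# The gradient (the client's contribution to the weight update) is non-zero
# only at the embedding rows the client used, revealing their tokens.
leaked = emb.weight.grad.abs().sum(dim=1).nonzero().flatten()
print(leaked.tolist())  # -> [3, 17, 42]
```

Defenses such as secure aggregation or differential privacy target exactly this gap between "the data is never sent" and "the update still identifies the contributor".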