Secure Aggregation with Flower

Brooke Joseph
7 min read · Feb 20, 2023


What is Flower?

Flower is an open-source framework for experimenting with federated learning. If you’re unfamiliar with what federated learning is, make sure to check out my article or my video before reading the rest :)


Why do we need secure aggregation?

A major reason why federated learning is so revolutionary is that clients with sensitive data no longer have to hand that data to someone else, which brings a lot of privacy benefits. However, the model updates that clients send can still leak privacy-sensitive information about the data they were trained on, and clients may not want to share that with the server. Therefore, it's important that the updates themselves are masked so that the server never sees an individual client's update in the clear. This is where researchers realized the importance of securing the aggregation process, which means protecting the updates on their way from the clients to the server so that the server only learns their aggregate. Many solutions have been proposed to solve this issue.

However, many of the current solutions are not very efficient because they typically transmit a lot of data, and they fail when a client drops out. Dropped clients are clients that leave the federated learning process partway through. For example, a device typically must not be in use in order to take part, so it can disappear as soon as someone picks it up. New solutions are still being researched and tested. In this article, I focus primarily on SecAgg+ and LightSecAgg. I will also explain the general outline of what the code looks like using Flower.

SecAgg+

SecAgg+ is an updated and improved version of SecAgg. Both are protocols for securing the aggregation process in federated learning. The hard part is dealing with clients that drop out, because the contributions of every client are needed for the aggregation to come out right: a dropped client leaves behind masks that no longer cancel. Research is going into how to recover from a device that drops out in the middle of the process. It’s also crucial that the server itself doesn’t learn any individual client's updated parameters. Taking all of this into consideration, protocols have been adopted, and extensive research is still going into developing better ones. The two main ingredients are:

  • Pairwise random-seed agreements between users. Each pair of clients agrees on a random seed, and the clients use those seeds to generate random masks that hide their models.
  • Secret sharing of the random seeds, which enables the reconstruction and cancellation of masks when a client drops out.

SecAgg works by masking the local models with random values derived from shared keys. A fancier way of putting it is that the model's privacy is kept through a process called pairwise random masking. Through something called a key agreement, each pair of users will “agree” on a random seed. The idea is that every client ends up with one masking vector for every other client it is paired with. For each pair, one client adds the masking vector and the other subtracts it, so the two sum to zero and cancel out when the server adds the masked models together. This allows the clients to send a masked model, while still allowing the server to compute the correct aggregated model. To help better understand this I have made a YouTube video with helpful visuals.
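To make the cancellation concrete, here is a minimal Python sketch of pairwise masking with three clients. It is only an illustration under my own assumptions: the seeds are hard-coded instead of coming from a key agreement, `random.Random` stands in for a cryptographic pseudorandom generator, and the names `prg`, `mask_update` and `seeds_for` are made up for this example.

```python
import random
from typing import Dict, List

MODULUS = 2**16  # all arithmetic is done modulo this value (an assumption of this sketch)


def prg(seed: int, length: int) -> List[int]:
    """Expand a seed into a pseudorandom vector (stand-in for a real PRG)."""
    rng = random.Random(seed)
    return [rng.randrange(MODULUS) for _ in range(length)]


def mask_update(client_id: int, update: List[int], pair_seeds: Dict[int, int]) -> List[int]:
    """Add the pair mask if our id is the smaller one, subtract it otherwise."""
    masked = list(update)
    for peer_id, seed in pair_seeds.items():
        sign = 1 if client_id < peer_id else -1
        mask = prg(seed, len(update))
        masked = [(m + sign * r) % MODULUS for m, r in zip(masked, mask)]
    return masked


# Three clients with small integer "model updates".
updates = {0: [1, 2, 3, 4], 1: [10, 20, 30, 40], 2: [100, 200, 300, 400]}

# One agreed seed per pair of clients (hard-coded here for illustration).
seeds = {(0, 1): 11, (0, 2): 22, (1, 2): 33}


def seeds_for(client_id: int) -> Dict[int, int]:
    """Collect the seeds this client shares, keyed by the other client's id."""
    return {
        (j if i == client_id else i): seed
        for (i, j), seed in seeds.items()
        if client_id in (i, j)
    }


masked_updates = {
    cid: mask_update(cid, update, seeds_for(cid)) for cid, update in updates.items()
}

# The server only sums the masked vectors; every pair mask appears once with
# a plus sign and once with a minus sign, so they all cancel.
aggregate = [sum(column) % MODULUS for column in zip(*masked_updates.values())]
expected = [sum(column) % MODULUS for column in zip(*updates.values())]
print(aggregate)  # [111, 222, 333, 444]
```

Each individual entry of `masked_updates` looks random to the server, yet the sum matches the sum of the real updates.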

The biggest problem here is that this was too expensive to be efficient: every client had to agree on a mask with, and share secrets with, every other client, and recovering from dropped clients added even more work. This is when SecAgg+ came out. One of the main differences between SecAgg+ and SecAgg is the number of clients each client is paired with. Rather than every client exchanging seeds with every other client, a client only does so with a select few clients.
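Here is a small sketch of what "a select few clients" could look like in code. It places the clients on a randomly shuffled ring and connects each one to its k nearest neighbours. This is just one simple way to build such a graph that I am assuming for illustration; it is not necessarily the construction Flower or the SecAgg+ paper uses, and `ring_neighbors` is a made-up name.

```python
import random
from typing import Dict, Iterable, List, Optional


def ring_neighbors(
    client_ids: Iterable[int], k: int, seed: Optional[int] = None
) -> Dict[int, List[int]]:
    """Give every client k neighbours: k // 2 on each side of a shuffled ring."""
    order = list(client_ids)
    n = len(order)
    if k % 2 != 0 or not 0 < k < n:
        raise ValueError("k must be even and satisfy 0 < k < number of clients")
    random.Random(seed).shuffle(order)  # hide any structure in the client ids

    graph: Dict[int, List[int]] = {}
    for position, client in enumerate(order):
        graph[client] = sorted(
            order[(position + offset) % n]
            for offset in range(-(k // 2), k // 2 + 1)
            if offset != 0
        )
    return graph


graph = ring_neighbors(range(10), k=4, seed=0)
for client, neighbours in sorted(graph.items()):
    print(client, "->", neighbours)
```

With 10 clients and k = 4, each client handles 4 pairings instead of 9, and the saving grows with the number of clients.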

The Diffie-Hellman key agreement is also used here. Each client has a private key (a secret number) and computes its public key by raising a publicly known base to that secret number. The clients then send their public keys to the server. The server sorts through the values and sends each client every public key besides its own. Each client then raises the public keys it received to the power of its own private key, which gives both clients in a pair the same shared value. These shared values are scalars that are used to seed a pseudorandom number generator, which expands each seed into a very large masking vector. This solves the efficiency issue, because the clients only have to exchange small values instead of whole vectors. Next, something called k-out-of-n threshold secret sharing is used to address the issue of dropped clients. The idea is to let the server recover the value that is missing when a client drops out and its masks no longer cancel at the server.
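The key agreement and the seed expansion can be sketched in a few lines of Python. Everything here is a toy under my own assumptions: the prime `P` and base `G` are small illustrative parameters rather than a standardized group, `random.Random` stands in for a cryptographic pseudorandom generator, and the function names are hypothetical.

```python
import hashlib
import random
import secrets

# Toy public parameters, known to everyone (illustration only).
P = 2**127 - 1  # a Mersenne prime used as the modulus
G = 3           # the public base


def keypair():
    """Return (private, public), where public = G^private mod P."""
    private = secrets.randbelow(P - 2) + 1  # secret exponent, never shared
    public = pow(G, private, P)             # safe to send to the server
    return private, public


def shared_seed(my_private, their_public):
    """Both sides compute G^(a*b) mod P and hash it down to a 64-bit seed."""
    shared = pow(their_public, my_private, P)
    digest = hashlib.sha256(str(shared).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def expand_mask(seed, length, modulus=2**16):
    """Expand the small seed into a long masking vector."""
    rng = random.Random(seed)
    return [rng.randrange(modulus) for _ in range(length)]


alice_private, alice_public = keypair()
bob_private, bob_public = keypair()

# Each side combines its own private key with the other side's public key.
seed_a = shared_seed(alice_private, bob_public)
seed_b = shared_seed(bob_private, alice_public)

mask_a = expand_mask(seed_a, 5)
mask_b = expand_mask(seed_b, 5)
print(seed_a == seed_b, mask_a == mask_b)  # True True
```

Only the two public keys ever travel over the network, yet both clients end up with the same mask.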

This can be done by coming up with a polynomial of degree k-1 whose y-intercept is the secret value (the seed behind the missing mask). Each client creates its own random polynomial, evaluates it at n points, and shares one of those points with each of the other clients. Any k of the points are enough to pin down the polynomial, while fewer than k reveal nothing about the secret. So if a client drops out, the server goes back to the clients that are still there and asks for the points they hold for the missing client. The server then interpolates the polynomial through those points and recovers the missing value by reading off the y-intercept.
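The polynomial trick is usually called Shamir secret sharing. Below is a minimal sketch of it in Python, assuming arithmetic modulo a prime (the prime `PRIME` and the function names are my own choices for the example, not taken from Flower).

```python
import secrets

PRIME = 2**61 - 1  # a Mersenne prime; all arithmetic is done modulo this value


def make_shares(secret, k, n):
    """Split `secret` into n points on a random degree k-1 polynomial."""
    # The constant term (the y-intercept) is the secret itself.
    coefficients = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]

    def evaluate(x):
        result = 0
        for coefficient in reversed(coefficients):  # Horner's rule
            result = (result * x + coefficient) % PRIME
        return result

    return [(x, evaluate(x)) for x in range(1, n + 1)]


def reconstruct(shares):
    """Recover the y-intercept by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (x_i, y_i) in enumerate(shares):
        numerator, denominator = 1, 1
        for j, (x_j, _) in enumerate(shares):
            if i != j:
                numerator = (numerator * -x_j) % PRIME
                denominator = (denominator * (x_i - x_j)) % PRIME
        # Divide by multiplying with the modular inverse (Fermat's little theorem).
        inverse = pow(denominator, PRIME - 2, PRIME)
        secret = (secret + y_i * numerator * inverse) % PRIME
    return secret


seed = 123456789
shares = make_shares(seed, k=3, n=5)

# Any 3 of the 5 shares are enough, even if two holders dropped out.
recovered = reconstruct([shares[0], shares[2], shares[4]])
print(recovered)  # 123456789
```

Here k = 3 and n = 5, so the secret survives as long as at least three of the five share holders stay online.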

This process does attempt to address each of the problems stated earlier, but it has been found to be inefficient. It also doesn’t address clients that are delayed in sending their updates. LightSecAgg has been found to be much more effective and efficient. We will get into that later though.

We can break down the overall process into 5 stages:

  1. Setup Parameters: The server sends out the parameters to the clients.
  2. Ask Keys: As I previously mentioned, each client creates private and public keys and shares only the public keys with the server.
  3. Share Keys: Each client creates a randomly generated seed and shares it with a certain number of other clients.
  4. Ask Vectors: Each client creates the mask vectors for its model, using the seeds it agreed on with its neighbours and its own randomly generated seed. It then sends the masked vector of the model to the server.
  5. Unmask Vectors: The clients are asked to share the seeds they received from other clients with the server so that the server can remove the leftover masks and aggregate the models.
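The five stages can be simulated end to end in plain Python. The sketch below is a heavy simplification under my own assumptions: every client is paired with every other client, seeds are handed to the server directly instead of being rebuilt from secret shares, `random.Random` stands in for a real pseudorandom generator, and none of the names come from Flower. Client 3 drops out before sending its vector.

```python
import random
from itertools import combinations

MODULUS = 2**16
DIM = 3


def prg(seed):
    """Expand a seed into a mask vector (stand-in for a real PRG)."""
    rng = random.Random(seed)
    return [rng.randrange(MODULUS) for _ in range(DIM)]


def add(u, v, sign=1):
    """Return u + sign * v, element by element, modulo MODULUS."""
    return [(a + sign * b) % MODULUS for a, b in zip(u, v)]


def pair_sign(me, peer):
    """The lower id adds the pair mask, the higher id subtracts it."""
    return 1 if me < peer else -1


# Stage 1 (setup): the participants and their local updates.
clients = [0, 1, 2, 3]
updates = {0: [1, 1, 1], 1: [2, 2, 2], 2: [3, 3, 3], 3: [4, 4, 4]}

# Stages 2-3 (ask keys, share keys): one seed per pair plus one self seed each.
rng = random.Random(42)
pair_seed = {pair: rng.getrandbits(32) for pair in combinations(clients, 2)}
self_seed = {client: rng.getrandbits(32) for client in clients}


def masked_vector(me):
    """Stage 4 on the client: update + self mask +/- every pair mask."""
    vector = add(updates[me], prg(self_seed[me]))
    for peer in clients:
        if peer != me:
            seed = pair_seed[tuple(sorted((me, peer)))]
            vector = add(vector, prg(seed), pair_sign(me, peer))
    return vector


# Stage 4 (ask vectors): client 3 drops out and never sends its vector.
survivors, dropped = [0, 1, 2], [3]
received = {client: masked_vector(client) for client in survivors}

# Stage 5 (unmask vectors): the server sums what it received, then removes
# the survivors' self masks and the pair masks shared with dropped clients.
total = [0] * DIM
for vector in received.values():
    total = add(total, vector)
for client in survivors:
    total = add(total, prg(self_seed[client]), -1)
    for gone in dropped:
        seed = pair_seed[tuple(sorted((client, gone)))]
        total = add(total, prg(seed), -pair_sign(client, gone))

print(total)  # [6, 6, 6]
```

The result is the sum of the three surviving updates, even though the dropped client's pair masks were baked into every vector the server received.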

Okay, so let’s walk through some generalized code that Flower provides on its website.

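from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class SecAggPlusProtocol(ABC):
    @abstractmethod
    def generate_graph(
        self, clients: List[ClientProxy], k: int
    ) -> ClientGraph:
        """Decide which k neighbours each client shares its secrets with."""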

Here the code defines the abstract class SecAggPlusProtocol, which describes the type of secure aggregation process being used. The generate_graph method takes the list of participating clients and decides which clients will work with one another in the secret-sharing process. The parameter k tells us that each client will be sharing its secrets with k other clients.

    @abstractmethod
    def setup_config(
        self, clients: List[ClientProxy], config_dict: Dict[str, Scalar]
    ) -> SetupConfigResultsAndFailures:
        """Send the protocol configuration to every client."""

Here the code looks very similar, but it handles the setup stage. The method takes the list of clients together with a dictionary of configuration values, so that every client can be configured to run the protocol with the same settings. Judging by the return type, it hands back the results along with any failures.

    @abstractmethod
    def ask_keys(
        self,
        clients: List[ClientProxy], ask_keys_ins_list: List[AskKeysIns]
    ) -> AskKeysResultsAndFailures:
        """Ask every client to generate keys and return the public ones."""

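Here the server asks each client for its keys. Each client makes its own private and public keys, keeps the private ones, and returns only the public ones. The function runs through the list of clients that was previously made so that each of them produces its own unique keys.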

    @abstractmethod
    def share_keys(
        self,
        clients: List[ClientProxy], public_keys_dict: Dict[int, AskKeysRes],
        graph: ClientGraph
    ) -> ShareKeysResultsAndFailures:
        """Forward the public keys to each client's neighbours in the graph."""

Now that the keys have been made, they need to be shared. Only the public keys are distributed, in the same way that I outlined at the beginning of the article.

    @abstractmethod
    def ask_vectors(
        self,
        clients: List[ClientProxy],
        forward_packet_list_dict: Dict[int, List[ShareKeysPacket]],
        client_instructions: Optional[Dict[int, FitIns]] = None,  # Optional is from typing
    ) -> AskVectorsResultsAndFailures:
        """Ask every client for its masked model vector."""

The server is now asking for the masked vectors from the list of clients.

    @abstractmethod
    def unmask_vectors(
        self,
        clients: List[ClientProxy],
        dropout_clients: List[ClientProxy],
        graph: ClientGraph
    ) -> UnmaskVectorsResultsAndFailures:
        """Collect what the server needs to remove the remaining masks."""

Finally, the models are unmasked. The method also takes the clients that dropped out, organized in a list, so that the masks they left behind can be cancelled.

All of this code is generalized and will look different depending on what exactly you need to do. However, this will provide a basic framework as to how this can be done.

LightSecAgg

Now that we’ve talked about SecAgg in depth, let's talk a little more about the newer, theoretically better way of securing the aggregation process. There are several reasons why LightSecAgg is an upgrade over the previous forms of secure aggregation. The main reason is that it has a much more robust way of dealing with clients that have dropped out: it completely eliminates the need to reconstruct each seed that was lost from a dropped device.

Overall, this is still a very new and actively researched topic, and it looks different depending on what framework is being used. In addition, some people are using combinations of various protocols, rather than just focusing on one.


My name is Brooke Joseph. I am an 18-year-old girl who loves math and her dog :)