Image classification using TFF

Brooke Joseph
7 min read · Dec 31, 2022


What is Federated learning?

Federated learning (FL) is a machine learning technique that trains something called a global model using data from a bunch of decentralized devices. Okay, lots to unpack there, so let's break it down step by step. In FL there are local devices (typically phones, computers, etc.) that store and keep all their data on the device. There is also a global model that receives updated parameters from those devices, aggregates the updates, and uses them to build a new, better model. The cool part is that the model is trained using data from a bunch of devices without that data ever being shared! Crazy, right?

(Figure: the federated learning cycle — https://i1.wp.com/softwareengineeringdaily.com/wp-content/uploads/2020/10/FederatedLearning.jpg?resize=730%2C389&ssl=1)

The local devices use their local data to run the model on their device, and by doing this they update the parameters (weights and biases). These updated parameters are sent to the global model, which aggregates them, a fancy word for combining them, typically by averaging.
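To make "aggregation" concrete, here is a minimal NumPy sketch of the weighted averaging step FedAvg performs on the server. The client parameter vectors and example counts are made-up illustration values:

import numpy as np

# Hypothetical parameter vectors sent back by three clients, and how many
# training examples each client used (more data means more influence).
client_weights = [np.array([0.2, 0.5]),
                  np.array([0.4, 0.1]),
                  np.array([0.3, 0.3])]
num_examples = [100, 50, 150]

# FedAvg's server step: a weighted average of the client parameters.
new_global_weights = np.average(client_weights, axis=0, weights=num_examples)
print(new_global_weights)  # the updated global model parameters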

All of this is actually pretty new: Google introduced the approach back in 2016, so there is a lot of potential in this technology. In this article I will be examining, in depth, the code for image classification using TensorFlow Federated.

What is TensorFlow Federated (TFF)?

TFF is an open-source framework for training models on decentralized data that is spread across many devices. It provides many useful tools to work with federated data and train federated models. Federated data is different from the centralized data normally used for AI models: it comes from many different sources, stays on those devices, and no single party ever sees all of it.

What tools does TFF provide?

Before we dive into the code, I wanted to give a brief overview of the “tools” that TFF provides that make it well suited to training federated models.

  • tff.learning.build_federated_averaging_process: This function lets users work with federated averaging (FedAvg), the most commonly used federated learning algorithm. The global model sends the current parameters out to the devices, the devices run the model locally and update the parameters, then send them back. The global model averages these updates, applies them, and sends the new parameters back to the local devices. This repeats until the model meets a certain criterion, which is called convergence. (See the sketch after this list for the interface these builders share.)
  • tff.learning.build_federated_evaluation: This function lets programmers evaluate how well the federated model performs. The evaluation runs on TFF’s runtime system and computes specific metrics like accuracy, precision and recall.
  • tff.learning.build_federated_sgd_process: This function builds a process for training a model with federated stochastic gradient descent (FedSGD), where each client computes gradients on its local data and the server applies the averaged gradient, rather than each client training for several local epochs as in FedAvg.
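The two training builders hand you back an iterative process with the same two-step interface: initialize() creates the starting server state, and next() runs one federated round (evaluation, by contrast, returns a single computation you call directly). A minimal sketch of that pattern, using the model_fn and federated_train_data built later in this article:

process = tff.learning.build_federated_averaging_process(
    model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.2))

state = process.initialize()  # the starting global model state
for _ in range(10):  # each call to next() is one federated round
    state, metrics = process.next(state, federated_train_data)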

I would also like to add that it will be helpful to have a little bit of a background in TensorFlow and Python.

The process

Let's break down the process:

  1. Loading the TFF libraries
  2. Preprocess the data
  3. Create the model to train data
  4. Set up a federated averaging process
  5. Analyze metrics
  6. Set up the evaluation computations
  7. Analyze the evaluation metrics

# collections is used for OrderedDict in preprocessing; random is used
# later to shuffle the test client ids.
import collections
import random

import numpy as np
import tensorflow as tf
import tensorflow_federated as tff

These are just the imports needed to run the program. I would like to mention that in order to use tensorflow_federated you must be running Python 3.9. I’m using Visual Studio Code, so I downloaded that version to my computer; people often use Google Colab as well, but note that it may also be running an older version of Python.

NUM_EPOCHS = 5
BATCH_SIZE = 20
SHUFFLE_BUFFER = 500
NUM_CLIENTS = 3

Here we are defining the number of epochs we want to train for, meaning the number of times we want the model to go through the training dataset. We then define the batch size, which is the number of examples the model processes in a single training step. SHUFFLE_BUFFER controls how many examples are held in memory while shuffling, and finally we pick how many clients we want to train on.

tf.compat.v1.enable_v2_behavior()
emnist_train, emnist_test = tff.simulation.datasets.emnist.load_data()
(Figure: sample digits from the EMNIST dataset)

The first line here enables TensorFlow 2 behavior, which is necessary for the rest of the code to work. In the second line we’re loading the data we want to use, which is the EMNIST dataset. Note that TFF ships with some datasets we can play around with. For this example the EMNIST dataset is used, which consists of hand-drawn digits: 28 x 28 pixel images, each with a label. I would also like to note that this is a simulated environment, meaning all the data is available to us centrally. This process, specifically the preprocessing of the data, would look different in a real federated environment. I also provided a quick picture above of what the EMNIST dataset looks like.
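A quick way to poke around the simulated data is to pull out a single client's local dataset and look at one example; the structure below (a 'pixels' image and a 'label') is what the preprocessing function in the next step consumes:

example_dataset = emnist_train.create_tf_dataset_for_client(
    emnist_train.client_ids[0])

example_element = next(iter(example_dataset))
print(example_element['label'].numpy())         # the digit this image shows
print(example_element['pixels'].numpy().shape)  # (28, 28)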

def preprocess(dataset):
  def element_fn(element):
    return collections.OrderedDict([
        ('x', tf.reshape(element['pixels'], [-1])),
        ('y', tf.reshape(element['label'], [1])),
    ])
  return dataset.repeat(NUM_EPOCHS).map(element_fn).shuffle(
      SHUFFLE_BUFFER).batch(BATCH_SIZE)

Here the data is being preprocessed and organised so it's easier to work with. Start by defining the function preprocess, which defines an inner function element_fn. The element_fn returns each element reshaped: the pixel tensor becomes a one-dimensional tensor, since passing -1 to tf.reshape flattens the 28 x 28 image into a single dimension of size 784 (28 * 28). The label is reshaped to a one-element tensor. The two are packed into an OrderedDict so the model receives named 'x' and 'y' features, which makes the data easy to access and iterate over. Finally, the dataset is repeated for the number of epochs, shuffled, and batched.
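To sanity-check the pipeline, you can preprocess one client's dataset (the example_dataset from the snippet above) and pull a single batch. With the settings defined earlier, 'x' should come out with shape (20, 784) and 'y' with shape (20, 1):

preprocessed = preprocess(example_dataset)
sample_batch = tf.nest.map_structure(
    lambda x: x.numpy(), next(iter(preprocessed)))
print(sample_batch['x'].shape)  # (BATCH_SIZE, 784)
print(sample_batch['y'].shape)  # (BATCH_SIZE, 1)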

def make_federated_data(client_data, client_ids):
  return [preprocess(client_data.create_tf_dataset_for_client(x))
          for x in client_ids]

Now it's time to feed this data into the TFF simulation. The function make_federated_data takes two arguments, client_data and client_ids, and returns a list of preprocessed datasets, one per client. It’s important to note that this function only works because the data is locally available; it would look a little different if the data were truly decentralized.

sample_clients = emnist_train.client_ids[0:NUM_CLIENTS]

federated_train_data = make_federated_data(emnist_train, sample_clients)
print(f'Number of client datasets: {len(federated_train_data)}')
print(f'First dataset: {federated_train_data[0]}')

Now the clients are chosen. It's important to note that in a real federated environment, only a select few devices would be available to train at any given time, and which ones is effectively random. For this example, all the data is centralized, so clients can be chosen however you like; here the training data is built from the first NUM_CLIENTS clients of the EMNIST training split.
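If you want to mimic that randomness in the simulation, you could sample a fresh subset of clients instead of always taking the first few ids; a small optional sketch:

# Optional: randomly sample clients, closer to how a real deployment
# picks whichever devices happen to be available in a given round.
sampled_ids = random.sample(emnist_train.client_ids, NUM_CLIENTS)
resampled_train_data = make_federated_data(emnist_train, sampled_ids)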

def create_keras_model():
  return tf.keras.models.Sequential([
      tf.keras.layers.InputLayer(input_shape=(784,)),
      tf.keras.layers.Dense(10, kernel_initializer='zeros'),
      tf.keras.layers.Softmax(),
  ])

Now it's time to create the Keras model. Keras is a high-level API, developed at Google, for building and training models. If you’re familiar with the structure of neural networks this should be pretty straightforward: there is an input layer, one dense layer, and then a softmax, which turns the outputs into prediction probabilities. The input layer has 784 inputs because, remember, the pixel tensor was squished into a one-dimensional tensor. The dense layer has 10 nodes, one per digit class, and the kernel initializer sets the starting weights (here, to zeros).
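A quick way to confirm the architecture is to print the model summary; with 784 inputs and a 10-node dense layer you should see 7,850 trainable parameters (784 * 10 weights + 10 biases):

create_keras_model().summary()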

def model_fn():
  keras_model = create_keras_model()
  return tff.learning.from_keras_model(
      keras_model,
      # The input_spec tells TFF the (batched) shape of the data; we can
      # take it from the federated training data built above.
      input_spec=federated_train_data[0].element_spec,
      loss=tf.keras.losses.SparseCategoricalCrossentropy(),
      metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])

To use any Keras model with TFF, you wrap it with tff.learning.from_keras_model, which allows it to work with federated data. Here the model that was previously defined is passed in, along with an example of what the data looks like, represented by the input_spec. Note that model_fn constructs a fresh model each time it is called: TFF needs to build the model inside its own context rather than reuse one created outside. This is the final step in setting up the model.

iterative_process = tff.learning.build_federated_averaging_process(
    model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.2),
    server_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.5))

Now it's time to run the FedAvg algorithm, which is exactly what I talked about at the beginning of the article. Stochastic gradient descent (SGD) is used and a learning rate is defined for each side. Note that there are two different optimizer functions here: one for the clients and one for the server. The client optimizer computes updates on the local data on each client’s device, while the server optimizer applies the average of those updates to the global model.
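You can inspect what the process expects and returns through its type signatures, which is a handy way to see that the model state lives at the server while the training data lives at the clients:

print(iterative_process.initialize.type_signature)
print(iterative_process.next.type_signature)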

state = iterative_process.initialize()
state, metrics = iterative_process.next(state, federated_train_data)
print('round 1, metrics={}'.format(metrics['train']))

Now that everything is set up for the training process, it's time to train the model. Here a single round of training is run: each call to next performs one round of FedAvg (broadcast the model, train locally, average the updates on the server). Now that we have a basic idea of how to run a single round, let's run a few more.

NUM_ROUNDS = 11
for round_num in range(2, NUM_ROUNDS):
  state, metrics = iterative_process.next(state, federated_train_data)
  print('round {:2d}, metrics={}'.format(round_num, metrics['train']))

We can run multiple rounds with a for loop. The code here is pretty much the same as before, except that the training data now passes through the process several times.
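If you want to track progress rather than just print it, you can collect a metric each round, for example to plot a learning curve afterwards. This assumes the metrics dictionary exposes sparse_categorical_accuracy under 'train', as it does in the version of TFF used here:

accuracies = []
for round_num in range(2, NUM_ROUNDS):
    state, metrics = iterative_process.next(state, federated_train_data)
    accuracies.append(float(metrics['train']['sparse_categorical_accuracy']))
print(accuracies)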

evaluation = tff.learning.build_federated_evaluation(model_fn)

shuffled_ids = emnist_test.client_ids.copy()
random.shuffle(shuffled_ids)
sample_clients = shuffled_ids[0:NUM_CLIENTS]

federated_test_data = make_federated_data(emnist_test, sample_clients)

len(federated_test_data), federated_test_data[0]

Finally, the evaluation process takes place on the model that was trained. We start by building a federated evaluation computation from model_fn, then build the test data by randomly shuffling the test clients and sampling a few of them.
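One step is missing from the snippet above: actually invoking the evaluation. With this generation of the tff.learning API, the evaluation computation is called with the trained model weights held in the server state and the federated test data:

test_metrics = evaluation(state.model, federated_test_data)
print(str(test_metrics))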

Overall, this is the code necessary to use a federated model on the EMNIST dataset for image classification. Again, this is just a simulated use case of federated learning, which looks quite different in a real environment. I highly suggest looking into the Flower framework for a more practical, real-life example of this.
