Taming the Hydra: How to Create a Data Pipeline for Multi-Head Classification with Tensorflow

Gernot Mueller
Jumio Engineering & Data Science
8 min read · Jul 6, 2022

Unleash the power of the tf.data.Dataset API

We at Jumio are always seeking to improve the efficiency and performance of our machine learning models. The data pipeline is a vital component of the training procedure. Multi-core CPUs and GPUs speed up backpropagation, but without an efficient data pipeline that supplies data fast enough, training remains slow.

For our models running on edge devices we chose Tensorflow as our training framework because it makes deploying those models on the devices simple. The most efficient way to implement a data pipeline in Tensorflow is to use the tf.data.Dataset API. During our development process we found little information on implementing such a pipeline to support multi-head models.

In this guide we will show step by step how to do it and share some of our discoveries along the way. We use the 11k Hands dataset, which offers several attributes/features of human hands (skin color, gender, orientation), as an example of how to classify these categorical features with a multi-head model.

Source Code

I used Google Colab to write the code for this tutorial. Colab, or “Colaboratory”, allows you to write and execute Python in your browser, with

  • Zero configuration required
  • Access to GPUs free of charge
  • Easy sharing

Dataset preparation

We employed the 11k Hands dataset. The following examples show several reference images and illustrate some of the dataset features.

Some example images from the Hands11k dataset

Download and unzip the dataset to a local folder or Google Drive and load the CSV using pandas. Make sure to add the full image path as a column to the data frame for later use when loading the images in the pipeline.
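A minimal sketch of this preparation step, assuming the dataset was unzipped to a Google Drive folder and that the metadata CSV uses an imageName column; adjust the paths to your setup:

import os
import pandas as pd

# Assumed locations -- adjust to wherever you unzipped the dataset.
DATASET_DIR = '/content/drive/MyDrive/11k_hands'
CSV_PATH = os.path.join(DATASET_DIR, 'HandInfo.csv')
IMAGE_DIR = os.path.join(DATASET_DIR, 'Hands')

# Load the metadata CSV with pandas.
df = pd.read_csv(CSV_PATH)

# Add the full image path as a column for later use in the pipeline.
df['image_path'] = df['imageName'].apply(lambda name: os.path.join(IMAGE_DIR, name))

# The accessories column is a 0/1 flag in the CSV; mapping it to readable
# string labels (an assumption) lets us treat it like the other categorical features.
df['accessories'] = df['accessories'].map({1: 'accessory', 0: 'no_accessory'})

df.head()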

This is how the data looks after importing it with the above code. It contains several different features such as gender, skinColor, and accessories.

Hands11k dataset after preprocessing it

One-hot encoder

We choose two categorical features from our dataset as an example and create a model that can classify both of them at once. Start by defining the feature columns to indicate which ones are used for model training. You can expand this to use more features as needed.

feature_columns = ['aspectOfHand', 'accessories']

The next step is to create one-hot labels for each column. We chose sklearn’s MultiLabelBinarizer to accomplish this. It supports generating both single-label and multi-label encodings. Our example only requires single labels, so LabelBinarizer, which doesn’t support multi-labeling, would also work. The Binarizer converts the categorical features to a 0–1 representation and can also convert the 0–1 representation back into the categorical string names.

The create_mlb function creates our Binarizer objects. We then save the one-hot encoded values for each sample as a column in our data frame.
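A sketch of what such a create_mlb function can look like; the exact structure is an assumption, but it produces the Binarizer objects, the new data frame columns, and the y_columns mapping shown below:

from sklearn.preprocessing import MultiLabelBinarizer

def create_mlb(df, feature_columns):
    """Create a MultiLabelBinarizer per feature column and store the
    one-hot encoded labels as new columns in the data frame."""
    mlbs = {}
    y_columns = {}
    for feature in feature_columns:
        print(f'make mlb for feature: {feature}')
        mlb = MultiLabelBinarizer()
        # Wrap each value in a list so the binarizer treats it as a single label.
        one_hot = mlb.fit_transform(df[feature].apply(lambda v: [v]))
        print(f'Classes: {mlb.classes_}')
        one_hot_column = f'one_hot_{feature}'
        # Store the one-hot vector of each sample in a new data frame column.
        df[one_hot_column] = list(one_hot)
        mlbs[feature] = mlb
        y_columns[feature] = one_hot_column
    return mlbs, y_columns

mlbs, y_columns = create_mlb(df, feature_columns)
print('y_columns:', y_columns)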

Here is the output from the above code snippet.

make mlb for feature: accessories
Classes: ['accessory' 'no_accessory']
make mlb for feature: aspectOfHand
Classes: ['dorsal left' 'dorsal right' 'palmar left' 'palmar right']
y_columns: {'accessories': 'one_hot_accessories',
'aspectOfHand': 'one_hot_aspectOfHand'}

The y_columns dictionary provides the mapping from string features to one-hot encoded features. We use that dictionary later in our code.

These are our new one-hot encoded feature columns in the data frame:

Alternative

Another way to create one-hot representations of categorical labels is to incorporate the conversion inside the model using the CategoryEncoding layer in Keras. This requires first converting your string labels to integer labels.
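As a hedged sketch of that alternative (the vocabulary and feature below are just for illustration), the StringLookup layer converts the strings to integer indices and CategoryEncoding produces the one-hot vectors; both layers can also be placed directly inside a model:

import tensorflow as tf

# Example vocabulary for the 'aspectOfHand' feature.
aspect_vocab = ['dorsal left', 'dorsal right', 'palmar left', 'palmar right']

# StringLookup converts the string labels to integer indices ...
lookup = tf.keras.layers.StringLookup(vocabulary=aspect_vocab, num_oov_indices=0)
# ... and CategoryEncoding turns those indices into one-hot vectors.
encode = tf.keras.layers.CategoryEncoding(num_tokens=len(aspect_vocab), output_mode='one_hot')

labels = tf.constant(['palmar left', 'dorsal right'])
one_hot = encode(lookup(labels))
print(one_hot.numpy())  # [[0. 0. 1. 0.], [0. 1. 0. 0.]]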

Create tf.data.Dataset pipeline

Next we build a tf.data.Dataset to provide the image data and all labels for training a multi-head output model. Using tf.data.Dataset is the preferred way to create data pipelines in Keras/Tensorflow as of version 2.x. The official tensorflow documentation shows why these pipelines are so efficient. This blog post demonstrates their increased processing speed and flexibility over the baseline Keras data generators. Information on building a tf.data.Dataset pipeline for a multi-head model is sparse. The solution we adopted is described in this reference.

Load Images

The first step in our pipeline is to tell tf.data.Dataset how to load images from the 11k Hands dataset. We apply the method from_tensor_slices() to our image_path data frame column. Next we run the load_image function on this dataset to read, decode, and optionally resize the images. The resulting tf_image_dataset will contain all images in the 11k Hands dataset.
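A minimal sketch of this step, assuming the image_path column from the dataset preparation above and a 224×224 target size:

import tensorflow as tf

IMG_SIZE = (224, 224)

def load_image(image_path):
    """Read, decode, and resize a single image."""
    raw = tf.io.read_file(image_path)
    image = tf.io.decode_jpeg(raw, channels=3)
    image = tf.image.resize(image, IMG_SIZE)
    return image

# Build a dataset of image paths and map the loading function onto it.
tf_image_dataset = tf.data.Dataset.from_tensor_slices(df['image_path'].values)
tf_image_dataset = tf_image_dataset.map(load_image, num_parallel_calls=tf.data.AUTOTUNE)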

Let's try it out:

We can iterate over tf_image_dataset and display the images using matplotlib so we can view our resized hand images.
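For example, something along these lines displays the first few images:

import matplotlib.pyplot as plt

plt.figure(figsize=(10, 4))
# take(3) pulls the first three images out of the pipeline for a quick check.
for i, image in enumerate(tf_image_dataset.take(3)):
    plt.subplot(1, 3, i + 1)
    # The resized images are float tensors, so cast back to uint8 for display.
    plt.imshow(tf.cast(image, tf.uint8).numpy())
    plt.axis('off')
plt.show()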

Resulting matplotlib plot of the resized hand images

Data Augmentation

We use tf.numpy_function to add data augmentations with Albumentations as a Tensorflow operation. Albumentations is a Python library for fast and flexible image augmentations; it efficiently implements a rich variety of image transform operations. The sketch below shows how such Albumentations image transformations can be defined and wrapped for the pipeline. You can also refer to the Albumentations documentation for additional details on this usage.
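Here is a sketch of that idea; the specific transforms (flip, brightness/contrast, rotation) are just examples, and the img_augmentation wrapper turns the numpy-based augmentation into an operation that can be mapped onto the dataset:

import albumentations as A
import numpy as np
import tensorflow as tf

# A small set of example augmentations; pick transforms that fit your data.
transforms = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.3),
    A.Rotate(limit=20, p=0.5),
])

def aug_fn(image):
    """Apply the Albumentations pipeline to a single numpy image."""
    # Albumentations expects uint8 (or [0, 1] float) images.
    augmented = transforms(image=image.astype(np.uint8))
    return augmented['image'].astype(np.float32)

def img_augmentation(image):
    # tf.numpy_function wraps the Python/numpy augmentation as a TF operation.
    aug_img = tf.numpy_function(func=aug_fn, inp=[image], Tout=tf.float32)
    # The shape is lost inside numpy_function, so set it again explicitly.
    aug_img.set_shape((224, 224, 3))
    return aug_img

# Applied to the image dataset, e.g.:
# tf_image_dataset = tf_image_dataset.map(img_augmentation, num_parallel_calls=tf.data.AUTOTUNE)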

This is what our augmented images look like:

Image Normalization

Normalization in machine learning means modifying the data so that its mean is zero and its standard deviation is one. It helps decrease the risk of exploding/vanishing gradients during model training.

To normalize the image data we convert the RGB values into a range roughly between -1 and 1. Since the model’s initial weights are pretrained on ImageNet, we use ImageNet’s per-channel mean and standard deviation.
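A sketch of such a normalization step, using the commonly quoted ImageNet per-channel statistics:

import tensorflow as tf

# Per-channel mean and standard deviation computed on ImageNet.
IMAGENET_MEAN = tf.constant([0.485, 0.456, 0.406])
IMAGENET_STD = tf.constant([0.229, 0.224, 0.225])

def normalize_image(image):
    """Scale pixels to [0, 1] and standardize with the ImageNet statistics."""
    image = tf.cast(image, tf.float32) / 255.0
    return (image - IMAGENET_MEAN) / IMAGENET_STD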

Create tf.data.Dataset for each label column

Remember we previously defined y_columns as:

y_columns: {'accessories': 'one_hot_accessories', 'aspectOfHand': 'one_hot_aspectOfHand'}

This is where the magic happens in creating a tf.data.Dataset pipeline which supports training a multi-head model. We create a dictionary where each key is one of the feature columns we want to train on and each value is the corresponding tf.data.Dataset. This means for each feature (accessories, aspectOfHand) we create a dataset. We then simply “zip” together the image dataset we created in the last step with this dictionary of feature datasets.
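A sketch of that zipping step, assuming the data frame columns created by the one-hot encoding above:

import numpy as np
import tensorflow as tf

# One tf.data.Dataset per classification head, keyed by feature name.
label_datasets = {}
for feature, one_hot_column in y_columns.items():
    one_hot_labels = np.stack(df[one_hot_column].values)
    label_datasets[feature] = tf.data.Dataset.from_tensor_slices(one_hot_labels)

# Zip the image dataset together with the dictionary of label datasets.
tf_dataset = tf.data.Dataset.zip((tf_image_dataset, label_datasets))

# Batch and prefetch so the structure matches the PrefetchDataset shown below.
tf_dataset = tf_dataset.batch(32).prefetch(tf.data.AUTOTUNE)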

If we print the resulting tf.data.Dataset structure, we see that it is a combination of the image dataset and the feature datasets. We can now compare the tensor shapes to check if they are correct:

<PrefetchDataset element_spec=(TensorSpec(shape=(None, 224, 224, 3), dtype=tf.float32, name=None),
{'accessories': TensorSpec(shape=(None, 2), dtype=tf.int32, name=None),
'aspectOfHand': TensorSpec(shape=(None, 4), dtype=tf.int32, name=None)})>

The dataset shows that the tensor shape for images is (224, 224, 3) which matches our intended model input image dimensions. The feature “accessories” has a tensor shape of (None, 2) which corresponds to the two possible class labels (accessory, no accessory). The feature “aspectOfHand” has a tensor shape of (None, 4) which corresponds to the four possible hand orientations. Check the one-hot encoder section to see the class labels/distributions again.

We have implemented the different pipeline pieces; now it's time to put them together into one function, which we use later to get a training and a validation dataset.
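One possible way to combine the pieces into such a function, reusing load_image, img_augmentation, and normalize_image from above (the function name and defaults are assumptions):

def tf_data(df, y_columns, batch_size=32, augment_fn=None):
    """Build the full pipeline: load images, optionally augment, normalize,
    and zip them together with the one-hot label datasets."""
    image_ds = tf.data.Dataset.from_tensor_slices(df['image_path'].values)
    image_ds = image_ds.map(load_image, num_parallel_calls=tf.data.AUTOTUNE)
    if augment_fn is not None:
        image_ds = image_ds.map(augment_fn, num_parallel_calls=tf.data.AUTOTUNE)
    image_ds = image_ds.map(normalize_image, num_parallel_calls=tf.data.AUTOTUNE)

    # One label dataset per classification head, keyed by feature name.
    label_datasets = {
        feature: tf.data.Dataset.from_tensor_slices(np.stack(df[col].values))
        for feature, col in y_columns.items()
    }

    dataset = tf.data.Dataset.zip((image_ds, label_datasets))
    return dataset.batch(batch_size).prefetch(tf.data.AUTOTUNE)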

Create Model

Next we create the model. We use MobileNetV2 as a backbone and attach our multiple classification heads on top of it. The backbone parameters are defined as a dictionary which we pass to the MobileNetV2 Keras class for instantiation. You could use any other backbone architecture, custom or pre-built from Keras.

The key step is creating a classification head for each feature, as sketched below. We use the MultiLabelBinarizer objects to get the number of classes and the feature names. Every classification head is named after the feature it predicts, which is important in the next steps of compiling and fitting the model.
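A sketch of such a model definition; the backbone parameters below are assumptions chosen to match the summary that follows (MobileNetV2 with alpha 0.35, frozen ImageNet weights, average pooling):

import tensorflow as tf

# Backbone parameters passed to the MobileNetV2 Keras class.
backbone_params = {
    'input_shape': (224, 224, 3),
    'alpha': 0.35,
    'include_top': False,
    'pooling': 'avg',
    'weights': 'imagenet',
}

def create_model(mlbs):
    inputs = tf.keras.Input(shape=(224, 224, 3), name='image')
    backbone = tf.keras.applications.MobileNetV2(**backbone_params)
    backbone.trainable = False  # train only the heads to start with
    x = backbone(inputs)
    x = tf.keras.layers.Flatten(name='flatten')(x)

    # One classification head per feature, named after the feature it predicts.
    outputs = {}
    for feature, mlb in mlbs.items():
        outputs[feature] = tf.keras.layers.Dense(
            len(mlb.classes_), activation='softmax', name=feature)(x)

    return tf.keras.Model(inputs=inputs, outputs=outputs)

model = create_model(mlbs)
model.summary()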

The model summary looks like this now. You can see that each feature has its own head which is connected with the flattened layer output of the MobileNetV2.

Model: "model"
____________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================
image (InputLayer) [(None, 224, 224, 3)] 0 []

mobilenetv2_0.35_224 (None, 1280) 410208 image

flatten (Flatten) (None, 1280) 0 mobilenetv2_0.35_224

accessories (Dense) (None, 2) 2562 flatten

aspectOfHand (Dense) (None, 4) 5124 flatten

====================================================================
Total params: 417,894
Trainable params: 7,686
Non-trainable params: 410,208
____________________________________________________________________

Compile and Fit Model

Now we are ready to compile the model. For this we need to define a loss function for each classification head. Since all features use the same type of classification head, we can use Categorical Crossentropy for each of them.

In order to specify the loss function for each head we use a python dictionary. The same goes for the metrics. We choose to apply Categorical Accuracy as our metric.

In the snippet below you can see the resulting loss functions for each classification head in a dictionary. For compilation we use the Adam optimizer with a standard learning rate.
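A sketch of that compilation step, with the losses and metrics keyed by the head names:

# One loss and one metric per classification head, keyed by the head's name.
losses = {feature: tf.keras.losses.CategoricalCrossentropy() for feature in y_columns}
metrics = {feature: tf.keras.metrics.CategoricalAccuracy() for feature in y_columns}

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss=losses,
    metrics=metrics,
)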

Finally, we use our previously defined tf_data function to set up the data pipelines for training and validation. For training we pass the img_augmentation function so that our training images get augmented; for validation we don't. We add a TensorBoard callback to log the metrics so we can see the progress inside TensorBoard later.
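Putting it together might look like the following sketch; the train/validation split, epoch count, and log directory are assumptions:

import datetime

# Hypothetical train/validation split of the data frame.
train_df = df.sample(frac=0.8, random_state=42)
val_df = df.drop(train_df.index)

# Training data gets augmented, validation data does not.
train_ds = tf_data(train_df, y_columns, augment_fn=img_augmentation)
val_ds = tf_data(val_df, y_columns)

# Log metrics to a timestamped directory for TensorBoard.
log_dir = '/content/drive/logs/' + datetime.datetime.now().strftime('%Y%m%d-%H%M%S')
tensorboard_cb = tf.keras.callbacks.TensorBoard(log_dir=log_dir)

model.fit(train_ds, validation_data=val_ds, epochs=20, callbacks=[tensorboard_cb])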

While the training is running we can watch the progress. For each epoch the metrics we defined are printed in the log. We can track the loss and accuracy values for each classification head separately.

Model Evaluation

After model training has finished we can run TensorBoard within our Colab notebook to look at our training progress and the validation metrics.

# Load the TensorBoard notebook extension
%load_ext tensorboard
%tensorboard --logdir /content/drive/logs/20220504-121019
left: loss per epoch for each head / right: accuracy per epoch for each head

The blue lines show the validation accuracy/loss and the orange lines show the training accuracy/loss. Validation accuracy is higher than training accuracy simply because training uses data augmentation, which makes the training examples harder for the model to learn.

We see that the model learned to predict the features, although it could benefit from a longer training time. Nevertheless, the model seems to converge. We will leave it at that, because model optimization is not the focus of this article.

Final thoughts

tf.data.Dataset is a very powerful way of implementing a data pipeline for Tensorflow. With this article we have just scratched the surface of what is possible.

There is a built-in profiler in TensorBoard to analyze the efficiency of your data pipeline. It also gives you recommendations on how to improve your pipeline in terms of efficiency and speed.

Here is a reference to the profiler and how to use it. It is part of the Tensorflow documentation and gives a great overview of how to implement a performant input pipeline with tf.data.Dataset.


Gernot Mueller
Jumio Engineering & Data Science

“Learning never exhausts the mind” — Jumio lets me learn new things every day. Thanks, I love it!