Multi-Task Learning with Deep Neural Networks
Multi-task learning is a subfield of machine learning in which the goal is to perform multiple related tasks at the same time. The system learns the tasks simultaneously so that each task helps in learning the others. This mimics human intelligence, i.e. how humans perform multiple tasks at the same time. For example, if you see a dog, you can distinguish that it is a dog and not a cat, and almost instantly you may also guess the breed of the dog. To take another example, if you see a person, you may correctly identify their gender and also guess their approximate age without a second thought.
All of this happens inside our complex brain, with billions of neurons interacting and activating together to perform these complex tasks of classification and recognition. For years researchers have tried to mimic this in the field of computer vision, which led to the creation of neural networks. Given the advancement of research and the remarkable capabilities of neural networks on single tasks, it is quite interesting to employ them for performing multiple tasks. When we want to perform closely related tasks, such as predicting color and texture, multi-task learning is especially helpful: it shares resources and parameters across tasks and avoids the cost of training two models separately.
In this blog post I will share the steps for performing multi-task learning in deep neural networks. I have used TensorFlow's Slim API for this task, but even if you are not familiar with TensorFlow, the general approach described here will help you do the same in any other deep learning framework.
Steps for performing Multi Task Learning —
- Creating dataset
- Creating network architecture
- Defining multitask loss function
To make this concrete, let's take a simple problem statement. Say you want to predict the type of a flower (Rose or Daisy) as well as its color (red, pink or white). This fits perfectly as a multi-task learning problem because we want to do two things at the same time using the same network. At the end of one inference we will have two classification results: the type of the flower and its color.
So, Lets begin :)
1) Creating dataset
For any training task the first and foremost requirement is the dataset. We need data to train our neural network, and since this is a supervised learning task, our data must also include the correct labels for each image.
For this task we need images with two labels: the type of the flower and its color. I scraped Google Images for Roses and Daisies in red, pink and white. At the end, I had 3200 images of Roses and Daisies in the three colors.
After this step we need to split the dataset into training, validation and test sets. I chose to keep 60% of the images in the training set, 25% in the validation set and the remaining 15% in the test set. I created three separate text files, each storing three pieces of information: the image path, the flower-type label and the color label. Flower label 0 is associated with Rose and 1 with Daisy, and the color labels are 0, 1 and 2 for red, pink and white respectively.
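The 60/25/15 split can be sketched in plain Python. This is a minimal sketch, assuming the dataset is a list of (image_path, flower_label, color_label) tuples; `split_dataset` and the dummy records are illustrative names, not part of the original code:

```python
import random

def split_dataset(samples, train_frac=0.60, val_frac=0.25, seed=42):
    """Shuffle (image_path, flower_label, color_label) tuples and
    split them into train / validation / test subsets."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    n = len(samples)
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]       # the remaining ~15%
    return train, val, test

# Dummy records standing in for the scraped images:
records = [("/data/img%d.jpg" % i, i % 2, i % 3) for i in range(100)]
train, val, test = split_dataset(records)
```

Writing each subset out to its text file is then a matter of formatting each tuple as a comma-separated line.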
So in the end my train.txt, val.txt and test.txt look like this —
/data/img1.jpg, 0, 1
/data/img2.jpg, 1, 2
/data/img3.jpg, 1, 2
/data/img4.jpg, 0, 0
/data/img5.jpg, 0, 1
/data/img6.jpg, 1, 1
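Reading these files back is straightforward. Here is a minimal sketch of a parser for the format above; `read_label_file` is a hypothetical helper name, not from the original code:

```python
def read_label_file(lines):
    """Parse 'path, flower_label, color_label' lines into parallel lists."""
    paths, flower_labels, color_labels = [], [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue                        # skip blank lines
        path, flower, color = [field.strip() for field in line.split(",")]
        paths.append(path)
        flower_labels.append(int(flower))
        color_labels.append(int(color))
    return paths, flower_labels, color_labels

# Works the same whether lines come from a list or from open("train.txt"):
sample = ["/data/img1.jpg, 0, 1", "/data/img2.jpg, 1, 2"]
paths, flowers, colors = read_label_file(sample)
```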
Once we have these files we can move on to the next step, i.e. creating the network architecture.
2) Creating network architecture
Before defining the network architecture, one must be able to visualize it, and our visualization looks something like this —
In the figure above we have hidden nodes that we call the “shared layers”, because the weights of these layers are common to both tasks. Then we have the “task-specific layers”, where the computation related to each task is carried out separately, without sharing parameters across tasks. In these task-specific layers the network learns information specific to one task, and each of them produces its own output. In our case, one predicts the flower type (Rose or Daisy) and the other predicts the flower color (red, pink or white).
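The shared-trunk / two-head idea can be sketched as a toy NumPy forward pass. This is only an illustration of the structure, with made-up layer sizes (8 input features, 16 shared units), not the actual SqueezeNet architecture used below:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared layer: one weight matrix used by both tasks.
W_shared = rng.standard_normal((8, 16))
# Task-specific heads: flower type has 2 classes, color has 3.
W_type = rng.standard_normal((16, 2))
W_color = rng.standard_normal((16, 3))

def forward(x):
    hidden = np.maximum(0.0, x @ W_shared)   # shared representation (ReLU)
    type_logits = hidden @ W_type            # task 1: Rose vs Daisy
    color_logits = hidden @ W_color          # task 2: red / pink / white
    return type_logits, color_logits

x = rng.standard_normal((4, 8))              # dummy batch of 4 feature vectors
type_logits, color_logits = forward(x)
```

One forward pass through the shared trunk yields both sets of logits, which is exactly the property that makes inference cheaper than running two separate models.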
To build this architecture I implemented SqueezeNet v1.1 in TensorFlow and modified its last layers to perform multi-task learning. I used the layers up to fire8 as shared layers, split the network after fire8, and replicated the remaining layers twice, once per task. So my final network layers look like this —
net = fire_module(net, 48, 192, scope='fire7')
net = fire_module(net, 64, 256, scope='fire8')
net = slim.max_pool2d(net, [3, 3], stride=2, scope='maxpool8')
## splitting network here
## ------------------ Network 1 - flower classification ------------
type_net = fire_module(net, 64, 256, scope='type_fire9')
type_net = slim.conv2d(type_net, 2, [1, 1], stride=1, padding="VALID", scope='type_conv10')
type_net = slim.avg_pool2d(type_net, [10, 10], padding="VALID", scope='type_avgpool10')
flower_type = tf.squeeze(type_net, [1, 2], name='flower_type')
## ------------------ Network 2 - color classification -------------
color_net = fire_module(net, 64, 256, scope='color_fire9')
color_net = slim.conv2d(color_net, 3, [1, 1], stride=1, padding="VALID", scope='color_conv10')
color_net = slim.avg_pool2d(color_net, [10, 10], padding="VALID", scope='color_avgpool10')
flower_color = tf.squeeze(color_net, [1, 2], name='flower_color')
Creating a multi-task network is easy. I used SqueezeNet as the base model here because I was already familiar with its architecture and parameters and knew its performance, so splitting it and training the network was fast and provided an added advantage.
3) Defining multitask loss function
Before actually starting to train our network, we need to define the loss function. Our loss functions for the two tasks look like this —
flower_type_loss = slim.losses.softmax_cross_entropy(predicted_flower_type, original_flower_type)
flower_color_loss = slim.losses.softmax_cross_entropy(predicted_flower_color, original_flower_color)
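For intuition about what these calls compute, here is a minimal NumPy sketch of softmax cross-entropy between logits and one-hot labels (an illustration of the math, not the Slim implementation itself):

```python
import numpy as np

def softmax_cross_entropy(logits, onehot_labels):
    """Mean cross-entropy between softmax(logits) and one-hot labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.mean(np.sum(onehot_labels * log_probs, axis=1))

# One confidently correct prediction and one uniform (unsure) prediction:
logits = np.array([[10.0, -10.0], [0.0, 0.0]])
labels = np.array([[1.0, 0.0], [1.0, 0.0]])
loss = softmax_cross_entropy(logits, labels)
```

The confident correct prediction contributes almost zero loss, while the uniform one contributes log 2, so the mean is about 0.347.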
Now, after defining the loss functions for the two tasks separately, we need to optimize them in order to train our network. There are two ways to do this —
- Define loss function for two tasks and optimize them separately.
- Define loss function for two tasks and optimize them jointly.
The first way is suited to alternate training: you have a batch of task 1 data and a batch of task 2 data, and you train on them alternately, calling each task's optimizer in turn.
The second way is better suited if you want to perform the learning at the same time. You simply add the losses and optimize the joint loss. This preserves the separate task-specific loss functions while training both tasks at once. I wanted to train the network jointly, so I used this method.
total_loss = flower_type_loss + flower_color_loss
train = optimizer.minimize(total_loss)
Now, instead of optimizing both losses separately, we optimize one single joint loss: we define an optimizer that is responsible for minimizing total_loss.
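To see why a single optimizer on the summed loss trains both tasks, here is a toy NumPy sketch: two made-up quadratic task losses share a parameter vector, and gradient descent on their sum drives both toward their minima. The losses and learning rate are purely illustrative:

```python
import numpy as np

# Shared parameter vector; each toy task loss depends on part of it,
# and gradients from both tasks accumulate when we minimize the sum.
w = np.array([2.0, -1.0])

def grad_total(w):
    # Analytic gradient of total_loss = (w0 - 1)^2 + (w1 + 2)^2,
    # i.e. the sum of the two task losses.
    return np.array([2.0 * (w[0] - 1.0), 2.0 * (w[1] + 2.0)])

lr = 0.1
for _ in range(100):
    w -= lr * grad_total(w)   # one optimizer acting on the joint loss
```

After a hundred steps, w is near (1, -2): each component has been pulled to its own task's minimum by the single joint update, which is the same mechanism at work in `optimizer.minimize(total_loss)`.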
Once we have defined our network architecture, it's time to train. Remember that we created text files for the train, validation and test sets, so our first task is to read those files and feed the image and label information into our network. Once that is done, we are all set to start our multi-task training.
In this blog post we went through one way of performing multi-task learning with deep neural networks, using a very simple problem. The approach generalizes to more complex problems, such as recognizing facial expressions while also determining a person's facial attributes. Multi-task learning is helpful when you want to perform similar tasks using the same network. It not only reduces training time but also reduces inference time compared with running predictions through two separate models. Such models are especially useful on mobile devices, where we need to optimize runtime memory, battery utilization and CPU usage.
This is my first machine learning blog post and I hope it was helpful. For any feedback and suggestions, please feel free to comment or reach out to me by email, or find me on LinkedIn. I would love to hear from you!