The benefits of multi-tasking!

What is multi-task learning?

Jaideep Ray
Better ML
3 min read · Jun 2, 2022


  • In machine learning, the usual objective is to learn one model for a task, given the dataset corresponding to that task. This can be seen as single-task learning (STL). Learning a single model jointly for multiple tasks is termed multi-task learning (MTL).
  • For example, say you are building a content classifier to detect emotions: laugh, hate, anger, love, and so on. You can learn a single multi-task, multi-label model instead of learning one model per emotion.

MTL architectures:

  • Hard parameter sharing (the most common): all tasks share the hidden layers, and each task gets its own output head on top of them.
  • Soft parameter sharing: each task has its own model with its own parameters, and the distance between the models' parameters is regularized to encourage them to stay similar.
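As a rough sketch of the two architectures (using NumPy, with hypothetical layer sizes chosen purely for illustration): in hard sharing, every task head reads the same hidden representation; in soft sharing, each task keeps its own parameters and a distance penalty pulls them together.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_HID, N_TASKS = 16, 8, 3   # hypothetical sizes for illustration

# --- Hard parameter sharing: one shared trunk, one small head per task ---
W_shared = rng.normal(size=(D_IN, D_HID))
W_heads = [rng.normal(size=(D_HID, 1)) for _ in range(N_TASKS)]

def forward_hard(x):
    h = np.maximum(x @ W_shared, 0.0)                          # shared ReLU layer
    return [1.0 / (1.0 + np.exp(-(h @ W))) for W in W_heads]   # per-task sigmoid score

scores = forward_hard(rng.normal(size=(4, D_IN)))              # batch of 4 examples

# --- Soft parameter sharing: separate per-task weights, kept close by a penalty ---
W_task_a = rng.normal(size=(D_IN, D_HID))
W_task_b = rng.normal(size=(D_IN, D_HID))

def soft_sharing_penalty(lam=0.1):
    # L2 distance between the tasks' parameters, added to the training loss;
    # it shrinks as the two parameter sets become more similar.
    return lam * np.sum((W_task_a - W_task_b) ** 2)
```

The hard-sharing variant produces one score per task per example; the soft-sharing penalty is just an extra term in the loss, so each task's model stays separate at serving time.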

Advantages of using MTL

Model quality

  • Reduced overfitting: in both hard and soft parameter sharing, the more tasks we learn simultaneously, the more the MTL model has to favor a representation that serves the objectives of all the tasks. This substantially reduces the risk of overfitting to any single task.
  • Implicit data augmentation: MTL effectively increases the sample size used for training. All tasks are at least somewhat noisy; when training a model on some task A, the aim is to learn a representation that ignores the data-dependent noise and generalizes well. Because different tasks have different noise patterns, a model that learns two tasks simultaneously learns a more general representation.
  • Focus on relevant features: if a task is very noisy, or its data is limited and high-dimensional, it can be difficult for a model to distinguish relevant from irrelevant features. MTL can help the model focus on the features that actually matter, since other tasks provide additional evidence for their relevance or irrelevance.
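Mechanically, this regularization effect comes from training the shared parameters against a joint objective. A minimal sketch (function names and weights are illustrative, not from the post): the total loss is a weighted sum of per-task losses, so gradients from every task flow into the shared layers.

```python
import numpy as np

def bce(y_true, y_pred, eps=1e-7):
    """Binary cross-entropy for one task's predictions."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def joint_loss(task_targets, task_preds, task_weights=None):
    """Weighted sum of per-task losses: the shared representation is
    updated by every task's gradient, so it must serve all of them."""
    if task_weights is None:
        task_weights = [1.0] * len(task_targets)
    return sum(w * bce(y, p)
               for w, y, p in zip(task_weights, task_targets, task_preds))

# Toy example: three emotion tasks over a batch of 4 examples.
targets = [np.array([1, 0, 1, 0]), np.array([0, 0, 1, 1]), np.array([1, 1, 0, 0])]
preds = [np.array([0.9, 0.2, 0.8, 0.1])] * 3
loss = joint_loss(targets, preds)
```

The per-task weights are a common knob for balancing tasks whose losses sit on different scales.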

Perf / cost / capacity:

  • One model vs. many: instead of building and maintaining multiple models, you maintain only one, which reduces maintenance cost.
  • Foundational models: MTL is often the learning paradigm for foundational pre-trained models. For example, an MTL content-understanding model for integrity applications can give you multiple signals (spam, hate, low quality, clickbait, etc.) that you can reuse in various downstream classifiers and products.
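One hypothetical sketch of that downstream reuse (the signal names come from the example above; the function and threshold are invented for illustration): treat the MTL heads' scores as a fixed feature vector for a simple downstream consumer.

```python
# Hypothetical reuse of MTL head outputs as downstream features.
SIGNALS = ("spam", "hate", "low_quality", "clickbait")

def to_features(mtl_scores: dict) -> list:
    """Turn the MTL model's per-head scores into a fixed-order feature vector."""
    return [mtl_scores.get(name, 0.0) for name in SIGNALS]

def should_demote(mtl_scores: dict, threshold: float = 0.8) -> bool:
    """Toy downstream consumer: flag content if any integrity signal is high."""
    return max(to_features(mtl_scores)) >= threshold
```

A real downstream classifier would typically feed `to_features(...)` into its own model rather than a hard threshold, but the shape of the integration is the same: one MTL model, many consumers.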

Evaluation of MTL models:

  • Despite all the advantages above, MTL models add a layer of complexity over single-task learning (STL). The complexity shows up in the data-reading layer, training, evaluation, and serving (serving one head vs. all heads, and different tasks may have different cost models).
  • Hence, MTL should be carefully evaluated against STLs on the primary quality metric, as well as on other dimensions such as serving cost.
MTL vs STL classifier evaluation
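A minimal sketch of such a comparison (the metric numbers are invented purely for illustration): evaluate each task's STL baseline and the corresponding MTL head on the same primary metric, alongside a serving-cost proxy such as the number of models to host.

```python
def compare(stl_metrics: dict, mtl_metrics: dict) -> dict:
    """Per-task quality delta (MTL minus STL) plus a model-count cost proxy."""
    report = {task: round(mtl_metrics[task] - stl_metrics[task], 3)
              for task in stl_metrics}
    report["models_to_serve"] = {"stl": len(stl_metrics), "mtl": 1}
    return report

# Invented per-task AUC numbers, for illustration only.
stl = {"spam": 0.91, "hate": 0.88, "clickbait": 0.84}
mtl = {"spam": 0.92, "hate": 0.87, "clickbait": 0.86}
report = compare(stl, mtl)
```

A per-task delta matters because MTL can help some tasks while hurting others (negative transfer); a single aggregate number would hide that.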

Key takeaways:

  • MTL is a powerful learning paradigm that exploits the similarity between single tasks to jointly learn one model capable of providing quality predictions for all of them.
  • MTL has the clear advantage of reducing overfitting by learning a more robust representation.
  • Do compare STLs vs. MTL to determine whether the ROI is worth the added complexity.

