A Primer on Multi-task Learning — Part 2

Neeraj Varshney
Oct 9 · 3 min read

Towards building a “Generalist” model

This is Part 2 of the article series on multi-task learning (MTL) and covers the basic approaches and training strategies for MTL.

Part 1 of this article series is available here and gives an introduction to Multi-task learning.

Outline Part 2:

  1. Approaches for Multi-task learning
    — Hard Parameter Sharing
    — Soft Parameter Sharing
  2. Basic Training Strategies for Multi-task Learning
    — Instance Sampling Approaches
    — Epoch Sampling Approaches

Part 3 of this article series is now available here.

Approaches for Multi-task Learning

In this section, we will look at the common ways to perform multi-task learning in deep neural networks.

Hard Parameter Sharing

Figure 1: Typical architecture for hard parameter sharing of hidden layers in MTL.

In the hard parameter sharing approach, the model shares the hidden layers across all tasks and keeps a few task-specific layers to specialize in each task.
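
A minimal sketch of this setup is shown below, assuming a PyTorch-style model; the layer sizes and the two task heads ("task_a", "task_b") are made up purely for illustration.

```python
# Hard parameter sharing (illustrative sketch): one shared encoder, one small
# head per task. All names and sizes here are assumptions for the example.
import torch
import torch.nn as nn

class HardSharingModel(nn.Module):
    def __init__(self, input_dim=128, hidden_dim=64, classes_a=3, classes_b=2):
        super().__init__()
        # Hidden layers shared across all tasks.
        self.shared = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
        )
        # A few task-specific layers that specialize in each task.
        self.heads = nn.ModuleDict({
            "task_a": nn.Linear(hidden_dim, classes_a),
            "task_b": nn.Linear(hidden_dim, classes_b),
        })

    def forward(self, x, task):
        return self.heads[task](self.shared(x))

model = HardSharingModel()
logits = model(torch.randn(4, 128), task="task_a")  # shape: (4, 3)
```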

Soft Parameter Sharing

Figure 2: Typical architecture for soft parameter sharing of hidden layers in MTL.

In the soft parameter sharing approach, each task has its own set of parameters. During training, the distance between the parameters of corresponding layers across the task-specific models is regularized, which encourages the layers to have similar weights while still allowing each task to specialize in specific components.
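
A minimal sketch of this idea is shown below, assuming two identically shaped task towers and an L2 penalty on the distance between their corresponding parameters; the architecture and the regularization weight are illustrative assumptions, not values from the article.

```python
# Soft parameter sharing (illustrative sketch): each task keeps its own copy of
# the layers, and an L2 penalty pulls corresponding parameters toward each other.
import torch
import torch.nn as nn

def make_tower(input_dim=128, hidden_dim=64, num_classes=3):
    return nn.Sequential(
        nn.Linear(input_dim, hidden_dim),
        nn.ReLU(),
        nn.Linear(hidden_dim, num_classes),
    )

tower_a = make_tower()  # parameters for task A
tower_b = make_tower()  # parameters for task B

def soft_sharing_penalty(model_a, model_b, weight=1e-3):
    # Sum of squared distances between corresponding parameters of the two towers.
    penalty = sum(
        (pa - pb).pow(2).sum()
        for pa, pb in zip(model_a.parameters(), model_b.parameters())
    )
    return weight * penalty

# During training, the penalty is added to the sum of the per-task losses, e.g.:
# loss = loss_a + loss_b + soft_sharing_penalty(tower_a, tower_b)
```

The penalty could also be restricted to a subset of layers (for example, the lower ones), leaving the task-specific output layers free to diverge.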

Basic Training Strategies for Multi-task Learning

In this section, we will go over the basic training strategies for the MTL problem where the output space for all tasks is the same.

Instance Sampling Approaches:

These approaches determine how many instances to draw from each task's dataset in each epoch (a code sketch follows this list).

  1. Uniform
    — Uniformly sample instances for each task.
    — Number of instances for a task is bottlenecked by the task having the smallest dataset.
    — Tasks with large datasets suffer constrained learning as they fail to use the entire dataset for training.
  2. Size-dependent
    — Sample instances in proportion to their dataset size.
    — Favors tasks with large datasets.
    — This can result in underfitting on tasks with small datasets and overfitting on tasks with larger datasets.
  3. Uniform → Size
    — Sample uniformly for the first half of training and in proportion to dataset size for the second half.
  4. Dynamic
    — Sample instances based on the gap between the performance in the current epoch and the performance of the corresponding single-task model.
    — The number of instances sampled for each task changes after every epoch: more instances are drawn for tasks that still need training (large performance gap from the single-task counterpart model) and fewer for tasks that have converged (marginal performance gap from the single-task counterpart model).
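
Below is a minimal sketch of how these strategies could decide per-task instance counts; the dataset sizes, instance budget, and performance numbers are made-up values for illustration.

```python
# Instance sampling strategies (illustrative sketch): how many instances each
# task contributes to an epoch. All numbers below are made up for the example.
dataset_sizes = {"task_1": 10_000, "task_2": 2_000, "task_3": 500}

def uniform_counts(sizes):
    # Every task contributes the same number of instances,
    # bottlenecked by the smallest dataset.
    n = min(sizes.values())
    return {task: n for task in sizes}

def size_dependent_counts(sizes, budget=5_000):
    # Instances are drawn in proportion to dataset size.
    total = sum(sizes.values())
    return {task: round(budget * s / total) for task, s in sizes.items()}

def dynamic_counts(sizes, multi_task_perf, single_task_perf, budget=5_000):
    # Instances are drawn in proportion to the gap between the current
    # multi-task performance and the single-task baseline for each task.
    gaps = {t: max(single_task_perf[t] - multi_task_perf[t], 0.0) for t in sizes}
    total_gap = sum(gaps.values()) or 1.0
    return {t: round(budget * g / total_gap) for t, g in gaps.items()}

print(uniform_counts(dataset_sizes))         # {'task_1': 500, 'task_2': 500, 'task_3': 500}
print(size_dependent_counts(dataset_sizes))  # {'task_1': 4000, 'task_2': 800, 'task_3': 200}
print(dynamic_counts(dataset_sizes,
                     multi_task_perf={"task_1": 0.80, "task_2": 0.70, "task_3": 0.68},
                     single_task_perf={"task_1": 0.82, "task_2": 0.78, "task_3": 0.70}))
```

The Uniform → Size schedule simply switches from uniform_counts to size_dependent_counts halfway through training.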

Epoch Sampling Approaches:

These approaches determine how the sampled instances are ordered and batched within an epoch (a code sketch follows this list).

  1. Partitioned Batches
    — Train on the tasks sequentially, i.e., use all the instances of one task for training before beginning training on the next task.
    — This is bound to lead to catastrophic forgetting (forgetting the previously learned tasks as you learn the new tasks).
  2. Homogeneous Batches
    — Each batch contains instances of only one task, but the batches from different tasks are shuffled, i.e., the model learns all the tasks together while each individual batch contains samples from a single task.
  3. Heterogeneous Batches
    — Combine the datasets for all tasks and shuffle the entire data collection.
    — Each batch can contain instances of many tasks.
  4. Uniform Batches (Forced heterogeneity)
    — Equal number of instances of each task in every batch.
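
Below is a minimal sketch of these batching strategies, assuming each instance is simply tagged with its task name; the toy data and batch sizes are illustrative.

```python
# Epoch sampling strategies (illustrative sketch): how the sampled instances are
# grouped into batches within an epoch. The toy data below is made up.
import random

def partitioned_batches(task_data, batch_size=4):
    # All batches of one task before moving on to the next task
    # (prone to catastrophic forgetting).
    batches = []
    for task, instances in task_data.items():
        for i in range(0, len(instances), batch_size):
            batches.append(instances[i:i + batch_size])
    return batches

def homogeneous_batches(task_data, batch_size=4):
    # Each batch still contains a single task, but the batch order is shuffled.
    batches = partitioned_batches(task_data, batch_size)
    random.shuffle(batches)
    return batches

def heterogeneous_batches(task_data, batch_size=4):
    # Pool all tasks' instances and shuffle, so a batch may mix tasks.
    pool = [x for instances in task_data.values() for x in instances]
    random.shuffle(pool)
    return [pool[i:i + batch_size] for i in range(0, len(pool), batch_size)]

def uniform_batches(task_data, per_task=2):
    # Forced heterogeneity: an equal number of instances from each task in every batch.
    shuffled = {t: random.sample(xs, len(xs)) for t, xs in task_data.items()}
    n_batches = min(len(xs) for xs in shuffled.values()) // per_task
    return [
        [x for t in shuffled for x in shuffled[t][b * per_task:(b + 1) * per_task]]
        for b in range(n_batches)
    ]

task_data = {
    "task_a": [("task_a", i) for i in range(8)],
    "task_b": [("task_b", i) for i in range(8)],
}
print(heterogeneous_batches(task_data)[0])  # e.g., a mix of task_a and task_b instances
print(uniform_batches(task_data)[0])        # always 2 task_a + 2 task_b instances
```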

References:

  • Ruder, Sebastian. “An overview of multi-task learning in deep neural networks.” arXiv preprint arXiv:1706.05098 (2017).
  • Worsham, Joseph, and Jugal Kalita. “Multi-task learning for natural language processing in the 2020s: where are we going?.” Pattern Recognition Letters (2020).
  • Stanford CS330: Multi-Task and Meta-Learning, 2019.
  • Gottumukkala, Ananth, et al. “Dynamic Sampling Strategies for Multi-Task Reading Comprehension.” Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020.
