Overcome model déjà vu by leveraging diverse datasets with deep multi-task learning

Published in Cognizant AI · 6 min read · Apr 21, 2021

By Elliot Meyerson, PhD, Research Scientist

Often in real-world tasks, there isn’t enough data to take full advantage of deep learning. However, it is possible to leverage other datasets to reach a critical mass. Sharing knowledge across diverse datasets leads to more general knowledge, deeper insights and better-informed decisions. This is especially true in domains like healthcare, where data for any particular task can be expensive or dangerous to collect.

Modeling datasets separately wastes useful structure that could be shared between them. To solve this problem, a general-purpose deep learning structure can be discovered across diverse datasets using Evolutionary AI.

Model Déjà Vu

When training a model for a new application, sometimes there is a strange feeling that the model is learning something you’ve seen other models learn before. This phenomenon, known as model déjà vu, is common among researchers and data scientists who are developing diverse applications with machine learning toolsets at an increasingly rapid pace.

This is not a spurious feeling. Your models do share common learned structure. In theory, any set of tasks share some amount of information [1]. Model déjà vu happens when such information is learned over and over again across different tasks. However, the theory also tells us that if we explicitly share learned structure across tasks, we can improve the performance in each of them, by directly exploiting the underlying regularities found across datasets.

This observation suggests a path forward for overcoming model déjà vu: Train increasingly broad sets of tasks together in a joint model to exploit the regularities between them and avoid relearning diluted copies of the same core knowledge. If this core knowledge can be successfully captured in learned modules, it functions as a new learned tool in your general problem-solving toolbox.

Ways that Core Knowledge is Captured

Substantial structure is already shared indirectly across diverse problems through common methodologies. For example, deep learning approaches are built from a relatively small set of common components: compositions and concatenations of linear maps and elementwise nonlinearities, common layer types, popular initialization and regularization schemes, SGD variants, etc. The fact that these tools are successful across diverse domains implies that the domains have a lot in common. Through trial and error and hard labor, humans refine these general tools. In other words, over the years, a large population of human experts have discovered priors that encode core knowledge in the form of the deep learning toolbox.

Beyond these tools, there is shared knowledge stored in the data scientists themselves. This is the knowledge that practitioners gain as they hone their skills across a set of related tasks, i.e., different modeling problems. They exploit this knowledge by expertly applying methodologies, and it is this same expertise that leads to the model déjà vu that keeps them up at night. These are deep skills that a data scientist may not even be able to explain; skills that are used to exploit regularities across modeling tasks.

More generally, humans develop general problem-solving skills by working on a wide variety of problems. This ability yields individuals who can adapt quickly inside a changing environment (e.g., a business or organization), who are able to take on new tasks as they appear, who can seemingly do anything. These are the generalists you try to keep around you no matter what project you’re working on. Within a business, different tasks and projects are related precisely because they are part of the same business, even if they appear unrelated on a surface level. There is a trove of core knowledge in every business that generalists are able to harness, and this knowledge makes them flourish.

General problem solving has tremendous value, but both discovering new general methodologies and building general skills in humans are high-cost activities. A machine that automatically learns general functionality avoids these human costs and can discover sub-symbolic regularities that would be difficult for humans to discover on their own.

Deep Multi-task Learning

Deep multi-task learning is one avenue for developing approaches that make this discovery possible. In deep multi-task learning (DMTL), the architectures for a set of tasks are aligned so that they share some subset of their parameters. The tasks are then trained jointly by minimizing their combined loss. In the standard setting, a human will identify a set of tasks that seem closely related, and then design a way to link up the architectures of these tasks. The simplest and most common approach is to share the architecture entirely until the final classifier portion of the model. Such an approach has been applied successfully within core areas of AI, including vision [2], natural language [3], speech [4], and reinforcement learning [5].
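To make the standard setup concrete, here is a minimal sketch of hard parameter sharing in PyTorch (the framework, layer sizes, and toy tasks are assumptions for illustration, not the setup used in the cited work): a single shared trunk feeds one small head per task, and the tasks are trained jointly by minimizing the sum of their losses.

```python
import torch
import torch.nn as nn

class SharedTrunkMTL(nn.Module):
    """Hard parameter sharing: one trunk shared by all tasks, one head per task."""
    def __init__(self, in_dim, hidden_dim, task_out_dims):
        super().__init__()
        # Shared trunk: these parameters are reused by every task.
        self.trunk = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # Task-specific heads: the only parameters not shared.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, d) for d in task_out_dims]
        )

    def forward(self, x, task_idx):
        return self.heads[task_idx](self.trunk(x))

# Two hypothetical classification tasks with 10 and 3 classes.
model = SharedTrunkMTL(in_dim=32, hidden_dim=64, task_out_dims=[10, 3])
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One joint training step: a batch per task, losses summed before the update.
batches = [
    (torch.randn(16, 32), torch.randint(0, 10, (16,))),  # task 0 (toy data)
    (torch.randn(16, 32), torch.randint(0, 3, (16,))),   # task 1 (toy data)
]
total_loss = sum(loss_fn(model(x, t), y) for t, (x, y) in enumerate(batches))
optimizer.zero_grad()
total_loss.backward()
optimizer.step()
```

Because the trunk's gradients accumulate signal from every task, regularities shared across tasks are captured in the shared parameters rather than relearned separately per model.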

A recent line of work has developed methods for assembling shared modules automatically in different ways for different tasks [6,7,8], including methods using LEAF ENN. However, these methods can still only be applied across closely-related architectures. To address these limitations, and move towards the highly general sharing at which humans excel, our recent work extends DMTL to sharing across arbitrary architectures and task modalities, i.e., across fundamentally different kinds of data [9].

MUiR: Integrating Diverse Domains and Architectures

The approach is called Modular Universal Reparameterization, or MUiR: Modular, because it encodes shared knowledge in learned parameter modules; Universal, because the functional form of the modules enables them to be applied across any set of deep architectures; Reparameterization, because the method reframes the problem of training a set of disjoint architectures as training a set of modules while optimizing how they are used.

Image 1: MUiR reframes the training of diverse architectures for diverse tasks by training a set of generic modules and optimizing how they are used to solve subproblems in each task. Such modules learn to encode knowledge for general problem-solving.

The idea is that any deep learning architecture can be decomposed into a set of equally-sized subproblems or pseudo-tasks. Each of these pseudo-tasks can then be solved by a learned module with a generic form. If a module solves multiple pseudo-tasks effectively, then it has learned general functionality. MUiR introduces an efficient algorithm for training modules with gradient descent, while simultaneously using evolutionary search to learn which module best solves each pseudo-task.
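The following toy sketch illustrates this alternation in spirit only; it is not the paper's implementation. Each "pseudo-task" stands in for a fixed-size subproblem inside a larger architecture, a shared pool of generic modules is trained by gradient descent, and a simple mutate-and-keep-if-better step evolves which module each pseudo-task uses. The pool size, module shape, and fitness proxy are all assumptions made for illustration.

```python
import random
import torch
import torch.nn as nn

D = 16                                    # generic module width (assumed)
module_pool = nn.ModuleList([nn.Linear(D, D) for _ in range(4)])
num_pseudo_tasks = 6
# assignment[t] = index of the module currently solving pseudo-task t
assignment = [random.randrange(len(module_pool)) for _ in range(num_pseudo_tasks)]
optimizer = torch.optim.SGD(module_pool.parameters(), lr=1e-2)

def pseudo_task_loss(task_id, module):
    # Toy stand-in for the loss contributed by one D-by-D subproblem.
    torch.manual_seed(task_id)
    x = torch.randn(8, D)
    target = torch.randn(8, D)
    return nn.functional.mse_loss(module(x), target)

for step in range(100):
    # Gradient step: jointly train the modules as currently assigned.
    loss = sum(pseudo_task_loss(t, module_pool[assignment[t]])
               for t in range(num_pseudo_tasks))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Evolutionary step: mutate one pseudo-task's module and keep it if better.
    candidate = list(assignment)
    t = random.randrange(num_pseudo_tasks)
    candidate[t] = random.randrange(len(module_pool))
    with torch.no_grad():
        old = pseudo_task_loss(t, module_pool[assignment[t]])
        new = pseudo_task_loss(t, module_pool[candidate[t]])
    if new < old:
        assignment = candidate
```

A module that ends up assigned to many pseudo-tasks, and performs well on them, is exactly the kind of general, reusable functionality the approach aims to discover.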

The power of this approach was first demonstrated in a classic multi-task learning benchmark. It was then scaled up to sharing across popular architectures for vision (image classification), NLP (language modeling), and genomics (CRISPR binding prediction). The architectures are very different: 2D-convolutional for vision, 1D-recurrent for NLP, and 1D-convolutional for DNA. Yet despite these surface-level differences, MUiR successfully discovers regularities across the domains, encodes them into sharable modules, and applies these modules to improve performance in the individual tasks. The results confirmed that sharing learned functionality across diverse domains and architectures is indeed beneficial, thus establishing this key tool for general problem-solving.

A Fresh Look at the Dataverse

Armed with a technology that allows us to exploit core knowledge across disparate application areas, we can now ask questions that previously might have seemed incoherent. These are questions like: How can prediction of sentiment on Twitter help improve disease prediction models? How can image classification improve your revenue predictions? How can earthquake prediction inform education policy? Of course, for any given problem, MUiR is just one of many orthogonal approaches to improving your models; others include better architectures, data collection, and preprocessing. However, by exploiting connections that humans would never think to look for, MUiR pushes AI another step beyond what is possible by humans, into the intelligence unknown. And if there is one thing that AI has made clear, it is that the underlying structure of the “dataverse” is much stranger than humans can comprehend on their own.

This paper was originally published here. The contents were recently updated and presented at the NeurIPS conference in Vancouver, Canada.

Feel free to comment below if you have any questions for me or click here to learn more about Cognizant Evolutionary AI.
