Multi-Task Deep Learning with PyTorch
In machine learning, we typically optimize a single metric for a single task.
Multi-task learning (MTL) has led to successes in many applications of machine learning, from natural language processing and speech recognition to computer vision and drug discovery.
The most famous case of MTL is probably Tesla’s autopilot.
You already know that Tesla's Autopilot has to solve a multitude of tasks: object detection, depth estimation, 3D reconstruction, video analysis, tracking, and more. You might assume they run 10+ separate deep learning models working together, but that's not the case.
Introduction to HydraNet
Tesla’s main idea is fairly simple: one body, several heads. Leveraging a single model to solve multiple tasks.
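The "one body, several heads" idea can be sketched in PyTorch as a shared backbone feeding several task-specific heads. This is a minimal illustration, not Tesla's actual architecture; the layer sizes, the two example tasks (classification and depth regression), and the class name are all assumptions for demonstration.

```python
import torch
import torch.nn as nn

class HydraNetSketch(nn.Module):
    """Minimal sketch of a multi-headed network: one shared body, several heads."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Shared feature extractor (the "body"), computed once per image
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)),
            nn.Flatten(),
        )
        feat_dim = 16 * 8 * 8
        # Task-specific heads reading the same shared features
        self.class_head = nn.Linear(feat_dim, num_classes)  # e.g. object classification
        self.depth_head = nn.Linear(feat_dim, 1)            # e.g. scalar depth estimate

    def forward(self, x):
        features = self.backbone(x)  # shared computation
        return self.class_head(features), self.depth_head(features)

model = HydraNetSketch()
logits, depth = model(torch.randn(2, 3, 32, 32))
print(logits.shape, depth.shape)  # torch.Size([2, 10]) torch.Size([2, 1])
```

Because the backbone runs once per image regardless of how many heads are attached, adding a new task costs only one extra head rather than a whole new network.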
At a high level, features are first extracted from each camera image by a feature extraction model. These per-camera features are fused into a "super image", which is then combined with super images from previous timesteps, bringing time into the equation. That temporal fusion is typically done with 3D CNNs, RNNs, Transformers, etc.
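The temporal fusion step above can be sketched with one of the mentioned options, an RNN: extract features from each frame with a shared extractor, then run a GRU over the resulting sequence. The shapes, layer sizes, and choice of GRU are illustrative assumptions, not the actual pipeline.

```python
import torch
import torch.nn as nn

# Hypothetical input: a batch of 4 clips, each with 8 frames of 3x32x32 images
frames = torch.randn(4, 8, 3, 32, 32)  # (batch, time, channels, H, W)
b, t = frames.shape[:2]

# Shared per-frame feature extractor (stands in for the "super image" features)
per_frame = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
# Fold time into the batch dimension to process all frames at once
features = per_frame(frames.reshape(b * t, 3, 32, 32)).reshape(b, t, 16)

# Fuse features across the time axis with a GRU
rnn = nn.GRU(input_size=16, hidden_size=32, batch_first=True)
fused, _ = rnn(features)  # (batch, time, hidden): temporally-aware features
print(fused.shape)  # torch.Size([4, 8, 32])
```

A 3D CNN or a Transformer encoder would slot into the same place as the GRU, consuming the (batch, time, features) sequence.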