Various Types of Supervision in Machine Learning

Behnam Sabeti
3 min read · Jun 8, 2019

In this post I’m going to review different kinds of supervision strategies for training a machine learning model. The intention of this post is to introduce the concepts and terminology, so I won’t go deep into the details of each one.

Data-driven machine learning models (as you probably already know) are categorized based on their use of labeled samples:

  • Supervised: the model is trained on a set of (x, y) pairs, where x is the feature vector and y is the associated label.
  • Unsupervised: the model is trained on the feature vectors alone, with no label information.
  • Semi-Supervised: a combination of labeled and unlabeled samples is used for training.
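
To make these three settings concrete, here is a minimal sketch in scikit-learn; the random data and the particular model choices are just illustrative assumptions, not part of any definition:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import LabelPropagation

X = np.random.rand(100, 4)           # feature vectors
y = np.random.randint(0, 2, 100)     # labels (only available in the supervised case)

# Supervised: the model sees (x, y) pairs.
clf = LogisticRegression().fit(X, y)

# Unsupervised: the model sees the feature vectors alone.
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)

# Semi-supervised: known labels plus unlabeled samples, marked here with -1
# (the convention scikit-learn uses for "no label").
y_partial = y.copy()
y_partial[50:] = -1                  # pretend half the labels are missing
semi = LabelPropagation().fit(X, y_partial)
```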
Supervision Strategies based on the number of hand-labeled samples

The best performance usually comes from supervised models. However, labeled samples are often expensive to acquire, and several approaches have been proposed to address this issue.

These approaches try to reduce the cost of generating labeled samples (a short sketch of each one follows the list):

  • Self-Supervision: In this case labels are extracted from the data itself. For example, to train a language model we need a sequence of words as the features and the next word as the label, and any document (which is available all over the internet!) can be turned into this kind of dataset. Another example is Emoji prediction, where we want to build a model that predicts the Emojis associated with a tweet. Any tweet that contains an Emoji can be used for training, and since the labels come with the data itself, we don’t need to hand-label anything (yoo-hoo!).
  • Distant Supervision: We have a set of unlabeled samples and an external source from which labels can be inferred. For instance, suppose we have a set of documents and want to find out which ones contain the names of married people. We can use a database of married people to tag each document as “true” or “false”, and the generated dataset can then be used to train a classifier.
  • Weak Supervision: Using a set of heuristics, functions, distributions, and domain knowledge, we can provide noisy labels for our classifier, which then trains on the noisy labels produced by each resource. (Snorkel is a great tool for weak supervision if you’re interested.)
  • Self-Training: In this strategy we have a set of labeled and unlabeled samples. We use the labeled samples to train a model, then use that model to annotate the unlabeled samples. The predicted labels, alongside the original labeled samples, are then used to retrain the model. A related approach is to train different classifiers on different types of features and let them provide labels for one another; this is called “Co-Training”.
  • Active Learning: Providing machine learning models with labels is often expensive. To make the most of a minimal number of labeled samples, the classifier is asked to identify the samples that would be most informative if they were labeled. These samples are annotated (by a domain expert) and fed back into training, and the process is repeated until the required performance is achieved.
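
To make the self-supervision idea concrete, here is a minimal sketch of extracting (context, next word) pairs from raw text, in the spirit of the language-model example above; the naive tokenization and window size are simplifying assumptions:

```python
# Self-supervision: derive (features, label) pairs from raw text alone.
def next_word_examples(document, window=3):
    tokens = document.split()              # naive whitespace tokenization
    examples = []
    for i in range(len(tokens) - window):
        context = tokens[i : i + window]   # features: the preceding words
        target = tokens[i + window]        # label: the next word, for free
        examples.append((context, target))
    return examples

pairs = next_word_examples("the cat sat on the mat and purred")
# [(['the', 'cat', 'sat'], 'on'), (['cat', 'sat', 'on'], 'the'), ...]
```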
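For distant supervision, here is a minimal sketch of the married-people example; the toy “database” and documents are purely illustrative assumptions:

```python
# Distant supervision: infer labels from an external knowledge source.
married_people = {"Alice Smith", "Bob Jones"}    # the external database

documents = [
    "Alice Smith gave a talk on NLP yesterday.",
    "The weather in Paris was lovely all week.",
]

def distant_label(doc):
    # "true" if the document mentions anyone from the database
    return any(name in doc for name in married_people)

labeled = [(doc, distant_label(doc)) for doc in documents]
# -> [(..., True), (..., False)]; these noisy pairs can train a classifier
```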
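For weak supervision, here is a hand-rolled sketch in which several noisy labeling functions vote on each sample. This is only a simplified stand-in for what Snorkel does: Snorkel also models the accuracy and correlations of the labeling functions rather than taking a plain majority vote.

```python
# Weak supervision: heuristics ("labeling functions") emit noisy labels.
ABSTAIN, NEG, POS = -1, 0, 1

def lf_contains_great(text):          # heuristic: a positive keyword
    return POS if "great" in text else ABSTAIN

def lf_contains_awful(text):          # heuristic: a negative keyword
    return NEG if "awful" in text else ABSTAIN

def lf_exclamation(text):             # heuristic: a weak stylistic cue
    return POS if text.endswith("!") else ABSTAIN

def majority_vote(text, lfs):
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    return max(set(votes), key=votes.count) if votes else ABSTAIN

lfs = [lf_contains_great, lf_contains_awful, lf_exclamation]
print(majority_vote("what a great movie!", lfs))   # -> 1 (a noisy positive)
```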
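For self-training, here is a minimal sketch of the pseudo-labeling loop; the confidence threshold, the number of rounds, and the synthetic data are illustrative assumptions (scikit-learn also ships a SelfTrainingClassifier that wraps this loop for you):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_lab, y_lab = rng.normal(size=(20, 4)), rng.integers(0, 2, 20)
y_lab[:2] = 0, 1                       # ensure both classes appear in the seed labels
X_unlab = rng.normal(size=(200, 4))    # the unlabeled pool

model = LogisticRegression().fit(X_lab, y_lab)

for _ in range(3):                     # a few self-training rounds
    probs = model.predict_proba(X_unlab)
    confident = probs.max(axis=1) > 0.9                # keep confident pseudo-labels only
    X_aug = np.vstack([X_lab, X_unlab[confident]])
    y_aug = np.concatenate([y_lab, probs[confident].argmax(axis=1)])
    model = LogisticRegression().fit(X_aug, y_aug)     # retrain on both
```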
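And finally, a minimal active-learning loop using uncertainty sampling, one common way to pick the most informative samples. The “expert” here is simulated by a hidden ground-truth array, purely for illustration; in practice each query would go to a human annotator:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X_pool = rng.normal(size=(500, 4))
oracle = (X_pool[:, 0] > 0).astype(int)        # hidden ground truth (the "expert")

# Start from a small balanced seed set of hand-labeled samples.
pos = np.flatnonzero(oracle == 1)[:5]
neg = np.flatnonzero(oracle == 0)[:5]
labeled = list(np.concatenate([pos, neg]))

for _ in range(5):                             # five annotation rounds
    model = LogisticRegression().fit(X_pool[labeled], oracle[labeled])
    uncertainty = 1 - model.predict_proba(X_pool).max(axis=1)
    uncertainty[labeled] = -1                  # never re-query known samples
    query = int(np.argmax(uncertainty))        # the most informative sample
    labeled.append(query)                      # the expert labels it
```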
