Unsupervised Progressive Learning (UPL): A new problem for AI

Unsupervised Progressive Learning (UPL) is a problem that involves an agent that analyzes a sequence of unlabelled data vectors (a data stream) and learns representations from them.

Felipe Bormann
DAIR.AI
5 min read · Apr 4, 2020


Introduction

Unsupervised Progressive Learning (UPL) is a problem that involves an agent that analyzes a sequence of unlabelled data vectors (a data stream) and learns representations from them. Each data vector is associated with a class that the agent never sees while learning the representations. At some point, the agent's learned representations can be used to perform offline tasks, which may be unsupervised, supervised, or semi-supervised. UPL posits that data is fed to the agent live, which means that it progressively learns about more classes as they are introduced over time. UPL could potentially represent how animals perform perceptual learning: they learn gradually about their surroundings and categorize objects as finer distinctions are made. In humans, this type of learning eventually leads to classes being associated with words or with other aspects important to survival, such as reward, taste, and fear. The figure below summarizes each step of Unsupervised Progressive Learning:

As you can see in the image, each time a new digit appears in the stream (unlabeled, so the model receives only the image itself and nothing else), the number of classes grows over time.

You might be thinking that this is similar to other problems such as self-supervised learning or continual learning. Smith et al. (2019) explain in detail how UPL differs from these.

STAM Architecture

Smith et al. (2019) propose an architecture to solve the problem, called the STAM architecture, which is organized into four main modules:

Hierarchy of receptive fields

This is a group of layers defined by the user/developer in which, instead of neurons or hidden-layer units, each layer consists of STAM units; in its simplest form, a STAM unit functions as a clustering module. The hierarchy is responsible for defining the prototypes stored by the architecture.
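To make this concrete, here is a minimal sketch in Python of how a hierarchy of receptive fields might extract patches from an input image. The patch sizes, stride, and three-layer depth are illustrative assumptions, not the paper's exact values.

```python
import numpy as np

def extract_patches(image, patch_size, stride=1):
    """Slide a square receptive field over the image and collect flattened patches."""
    h, w = image.shape
    patches = []
    for i in range(0, h - patch_size + 1, stride):
        for j in range(0, w - patch_size + 1, stride):
            patches.append(image[i:i + patch_size, j:j + patch_size].ravel())
    return np.array(patches)

# Hypothetical three-layer hierarchy: the receptive field grows with depth.
image = np.random.rand(28, 28)                  # an MNIST-sized input
layer_patch_sizes = [8, 13, 20]                 # illustrative sizes, not the paper's
patches_per_layer = [extract_patches(image, s) for s in layer_patch_sizes]
```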

Online clustering

It takes the input patches processed by the STAM units in layer l, call them x_m^l (these are what create the prototypes), and merges them into a set of centroids. This means that all STAM units of layer l share the same, time-varying set of centroids C_l(t).
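As a rough illustration, online clustering at one layer could look like the following sketch: each incoming patch is assigned to its nearest centroid, which is then nudged toward the patch. The learning rate alpha is a hypothetical value.

```python
import numpy as np

def assign_and_update(patch, centroids, alpha=0.1):
    """Assign a patch to its nearest centroid and move that centroid
    slightly toward the patch. `alpha` is an illustrative learning rate."""
    distances = np.linalg.norm(centroids - patch, axis=1)  # centroids: (k, d) array
    nearest = int(np.argmin(distances))
    centroids[nearest] += alpha * (patch - centroids[nearest])
    return nearest
```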

Novelty Detection

When an input patch x_m at layer l is significantly different from all centroids at that layer, a new centroid based on x_m is added to the set C_l. This is what makes the architecture capable of learning new classes as they appear in the unlabeled data stream.
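Building on the previous sketch, novelty detection can be added with a distance threshold: if no existing centroid is close enough to the patch, the patch itself seeds a new centroid. The threshold value here is an assumption for illustration.

```python
import numpy as np

def cluster_or_create(patch, centroids, threshold=5.0, alpha=0.1):
    """centroids: a Python list of 1-D numpy arrays. `threshold` and
    `alpha` are illustrative values, not the paper's."""
    if not centroids:
        centroids.append(patch.copy())                    # first patch seeds the first centroid
        return 0
    distances = [np.linalg.norm(c - patch) for c in centroids]
    nearest = int(np.argmin(distances))
    if distances[nearest] > threshold:                    # novelty: nothing is close enough
        centroids.append(patch.copy())
        return len(centroids) - 1
    centroids[nearest] += alpha * (patch - centroids[nearest])
    return nearest
```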

Dual-Memory Organization

The memory is separated into Short-Term Memory (STM) and Long-Term Memory (LTM). Centroids are stored temporarily (for a duration set by a parameter) in the STM, which has a very small capacity and a high learning rate. Every time a centroid is selected as the nearest neighbor of an input patch, its values are updated. If it is selected more than a threshold number of times, it is copied into the LTM, which has a very large storage capacity and a small learning rate. This memory organization is inspired by the Complementary Learning Systems framework.
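Below is a minimal sketch of the dual-memory idea. The capacity, promotion threshold, learning rates, and the FIFO eviction policy are all illustrative assumptions rather than the paper's settings.

```python
import numpy as np

class DualMemory:
    """Minimal sketch of the STM/LTM split with illustrative parameters."""

    def __init__(self, stm_capacity=100, promote_after=5,
                 stm_alpha=0.3, ltm_alpha=0.01):
        self.stm = []                    # short-term centroids: small capacity, fast updates
        self.ltm = []                    # long-term centroids: large capacity, slow updates
        self.hits = []                   # times each STM centroid was a nearest neighbor
        self.stm_capacity = stm_capacity
        self.promote_after = promote_after
        self.stm_alpha = stm_alpha
        self.ltm_alpha = ltm_alpha       # slow LTM rate, not exercised in this sketch

    def insert(self, patch):
        """Store a novel patch as a new STM centroid, evicting the oldest if full."""
        if len(self.stm) >= self.stm_capacity:
            self.stm.pop(0)              # simple FIFO eviction (an assumption)
            self.hits.pop(0)
        self.stm.append(patch.copy())
        self.hits.append(0)

    def match(self, patch):
        """Update the nearest STM centroid; copy it to LTM once matched often enough."""
        if not self.stm:
            self.insert(patch)
            return
        j = int(np.argmin([np.linalg.norm(c - patch) for c in self.stm]))
        self.stm[j] += self.stm_alpha * (patch - self.stm[j])   # fast STM learning rate
        self.hits[j] += 1
        if self.hits[j] == self.promote_after:
            self.ltm.append(self.stm[j].copy())  # consolidated copy would now learn slowly
```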

Experiments and results

For evaluation, the authors create a data stream in which small groups of classes appear in successive phases, a setting they refer to as Incremental UPL. Each phase contains only two classes, and those classes appear only in that phase.
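A sketch of how such an incremental stream could be constructed, assuming five two-class phases over digit data and a hypothetical per-phase sample count:

```python
import numpy as np

def make_incremental_stream(images, labels,
                            phases=((0, 1), (2, 3), (4, 5), (6, 7), (8, 9)),
                            per_phase=300):
    """Build an Incremental UPL stream: each phase contains only its own
    pair of classes. `per_phase` is a hypothetical count, not the paper's."""
    stream = []
    for phase_classes in phases:
        idx = np.flatnonzero(np.isin(labels, phase_classes))
        chosen = np.random.choice(idx, size=per_phase, replace=False)
        stream.extend(images[i] for i in chosen)  # only images enter the stream: no labels
    return stream
```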

For the experiments, the paper evaluates two tasks, classification and clustering, comparing STAM against two baseline methods: a convolutional autoencoder (CAE) and a self-supervised method based on rotation prediction (RotNet). The datasets used are MNIST, EMNIST, and SVHN. For each task, the results are averaged over three trials, each with 1,500 images.

The classification task

Given a few labeled examples for the classes that have been present in the stream up to time t, the algorithm is asked to classify the test data. This problem focuses on the expanding case, meaning that at each phase t the model needs to classify all classes seen so far.
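One simple way to combine learned prototypes with a few labeled examples is nearest-centroid voting, sketched below. This is a simplification of the paper's actual classification scheme, and the function names are hypothetical.

```python
import numpy as np

def label_centroids(centroids, labeled_patches, labels):
    """Associate each centroid with the majority label of the labeled
    patches it attracts. A simplified stand-in for the paper's scheme."""
    votes = {i: {} for i in range(len(centroids))}
    for p, y in zip(labeled_patches, labels):
        j = int(np.argmin([np.linalg.norm(c - p) for c in centroids]))
        votes[j][y] = votes[j].get(y, 0) + 1
    return {j: max(v, key=v.get) for j, v in votes.items() if v}

def classify(patches, centroids, centroid_labels):
    """Each patch votes with its nearest labeled centroid's class.
    Assumes at least one patch matches a labeled centroid."""
    counts = {}
    for p in patches:
        j = int(np.argmin([np.linalg.norm(c - p) for c in centroids]))
        if j in centroid_labels:
            y = centroid_labels[j]
            counts[y] = counts.get(y, 0) + 1
    return max(counts, key=counts.get)
```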

Each figure shows the results on one of the datasets. As more examples (images) arrive, the average accuracy of every model decreases, which is expected: with every new batch of examples on the stream, the task gets harder as the number of classes increases. The STAM architecture, however, handles the catastrophic-forgetting aspect of the problem best: even if a class appeared at the beginning of the stream, STAM is still able to classify it properly.

The clustering task

Given that we have the same number of test vectors per class, we associate each cluster with the most-represented class in that cluster. Any instance of another class in that cluster is counted as an error. The number of clusters k is equal to the number of classes seen up to that phase in the unlabeled data stream.

In other words, as in the classification task, the model is evaluated on all classes seen up to time t.
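The evaluation described above amounts to cluster purity, which can be computed as in this short sketch:

```python
import numpy as np

def clustering_accuracy(cluster_ids, true_labels):
    """Map each cluster to its most-represented class and count everything
    else in that cluster as an error, as described above."""
    cluster_ids = np.asarray(cluster_ids)
    true_labels = np.asarray(true_labels)
    correct = 0
    for c in np.unique(cluster_ids):
        members = true_labels[cluster_ids == c]
        _, counts = np.unique(members, return_counts=True)
        correct += counts.max()                 # majority-class members count as correct
    return correct / len(true_labels)
```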

For MNIST, STAM consistently performs better than the other two methods, and its accuracy stays almost constant going from 4 classes to 10. For SVHN, RotNet performs significantly better. Finally, for EMNIST, STAM outperforms the two deep learning methods without a significant loss of accuracy after the first 10 phases (20 classes).

Conclusion

In summary, the STAM architecture has the following characteristics that are essential to solving the UPL problem: online learning, transfer learning, resistance to catastrophic forgetting, and an expanding learning capacity. It achieves all of this with no direct access to previous experience, since it only stores the learned prototypes.

Paper: Smith, J., Baer, S., Kira, Z., & Dovrolis, C. (2019). Unsupervised Progressive Learning and the STAM Architecture.
