CORe50: a new Dataset and Benchmark for Continual/Lifelong Deep Learning

Vincenzo Lomonaco
ContinualAI
Published Aug 13, 2017 · 6 min read
Official Home Page of the CORe50 project.

Hi guys, what’s uuup? :-) I’m Vincenzo Lomonaco, a 2nd year PhD student @ University of Bologna and in this second story on Medium I’d like to cover our latest work on Continual/Lifelong Learning with Deep Architectures.

This work, currently under peer-review, is all about a new dataset and benchmark specifically designed for Continual Learning in the context of Vision, called CORe50.

Below you can find a 5-minute video presentation (no math, very easy to follow) with the key aspects of our work, or you can just skip to the next section!

5-minute clip on the key concepts of CORe50.

Why CORe50?

At this point you may think: “What? Are you crazy? Why do we need another Dataset for Object Recognition? There are plenty out there!”. Well, none of them has been specifically designed for Continual/Lifelong learning.

Continual/Lifelong learning of high-dimensional data streams is a challenging research problem. In fact, fully retraining models each time new data becomes available is infeasible, due to computational and storage issues, while naïve incremental strategies have been shown to suffer from catastrophic forgetting.

In the context of real-world object recognition applications (e.g. robotics), where continual learning is crucial, very few datasets and benchmarks are available to evaluate and compare emerging techniques.

SpotMini is the new smaller version of the Spot robot from Boston Dynamics weighing around 65 lbs [1]

Think for a second about the long-dreamed domestic robot which may (very soon) be in your house, and let’s call her Puppy (hoping she will be a little nicer than the SpotMini from Boston Dynamics in the figure above, ah ah!). Puppy is a nice and smart dog robot which, during her stay in the house, will get in touch with a lot of new objects, tasks and situations. It is clear that in this scenario, collecting beforehand data that is representative of all the possible situations and tasks she may encounter is impossible.

Given the high-dimensional, multi-modal and streaming nature of the perception data that will flow through Puppy’s sensors and cameras (estimated at around 50 GB/s for humans), it would be impossible, other than incredibly inefficient, to collect all the data during the day (~4,320 TB) and re-train Puppy’s entire brain from scratch each night with the data accumulated until then (especially if we want her to stay in the house indefinitely). Indeed, after just a week you'd already have ~30,240 TB of data on which to train your model the following night!
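To see where those figures come from, here is the back-of-envelope arithmetic, taking the text's rough 50 GB/s estimate of human perceptual bandwidth at face value:

```python
# Back-of-envelope estimate of the raw sensory data Puppy would accumulate
# if we tried to store everything for nightly retraining.
GB_PER_SECOND = 50                  # rough estimate of human-level perception bandwidth
SECONDS_PER_DAY = 24 * 60 * 60

daily_tb = GB_PER_SECOND * SECONDS_PER_DAY / 1000   # GB -> TB
weekly_tb = daily_tb * 7

print(f"Per day:  {daily_tb:,.0f} TB")    # ~4,320 TB
print(f"Per week: {weekly_tb:,.0f} TB")   # ~30,240 TB
```

Even with generous compression, these volumes make the "store everything, retrain nightly" strategy a non-starter.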

This is why learning continually and adaptively about the external world is essential.

Indeed, what we need are scalable and efficient techniques which (as for biological learning systems) can learn online for the autonomous incremental development of ever more complex skills and knowledge.

This is where CORe50 comes into play. If we want to create and assess new Continual Learning strategies we need to start simple and deal with catastrophic forgetting first. The ideal place to start is a simple yet real-world dataset in a well-studied context (Object Recognition) with manageable dimensions (in order to run experiments swiftly and speed up the research cycle, as recently argued by Y. Bengio)!

Dataset

Let’s have a closer look at the Dataset now. CORe50, specifically designed for (C)ontinuous (O)bject (Re)cognition, is a collection of 50 domestic objects belonging to 10 categories: plug adapters, mobile phones, scissors, light bulbs, cans, glasses, balls, markers, cups and remote controls. Classification can be performed at object level (50 classes) or at category level (10 classes). The first task (the default one) is much more challenging because objects of the same category are very difficult to distinguish under certain poses.

The dataset has been collected in 11 distinct sessions (8 indoor and 3 outdoor) characterized by different backgrounds and lighting. For each session and for each object, a 15 seconds video (at 20 fps) has been recorded with a Kinect 2.0 sensor delivering 300 RGB-D frames.

Objects are held in the operator's hand and the camera point-of-view is that of the operator's eyes. A subjective point-of-view with objects at grab-distance is the perfect context for assessing robotic applications like the interaction with our lovely Puppy! :-)

Example images of the 50 objects in CORe50. Each column denotes one of the 10 categories.

The presence of temporally coherent sessions (i.e., videos where the objects gently move in front of the camera) is another key feature, since temporal smoothness can be used to simplify object detection, improve classification accuracy and address semi-supervised (or unsupervised) scenarios [2][3], besides resembling actual streaming visual perception data.

In the figure above you can see some image examples of the 50 objects in CORe50, where each column denotes one of the 10 categories and each row a different object. The full dataset consists of 164,866 128×128 RGB-D images: 11 sessions × 50 objects × (around 300) frames per session. For more information about the dataset take a look at the section “CORe50” in the preprint arXiv paper.
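A quick sanity check of the numbers above (sessions × objects × frames) shows why the released total is slightly below the nominal one:

```python
# Sanity check on CORe50's stated structure: 11 sessions x 50 objects,
# each recorded as a ~15 s video at 20 fps (~300 frames of 128x128 RGB-D).
sessions, objects = 11, 50
frames_per_video = 15 * 20            # seconds * fps = 300

nominal_total = sessions * objects * frames_per_video
print(nominal_total)                  # 165,000 nominal frames

# The released dataset contains 164,866 images, so a handful of frames
# per video were lost during acquisition/post-processing.
actual_total = 164_866
print(nominal_total - actual_total)   # 134 frames short of nominal
```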

Benchmark

Now let’s see how we can use CORe50 for Continual Learning!

Popular datasets such as ImageNet and Pascal VOC provide a very good playground for classification and detection approaches. However, they have been designed with “static” evaluation protocols in mind, where the entire dataset is split in just two parts: a training set used for offline learning and a separate test set used for accuracy evaluation.

Splitting the training set into a number of batches is essential to train and test Continual Learning approaches. Unfortunately, most of the existing datasets are not well suited to this purpose because they lack a fundamental ingredient: the presence of multiple (temporally coherent and unconstrained) views of the same objects taken in different sessions (varying background, lighting, pose, occlusions, etc.).

With CORe50 we are able to consider instead three different Continual Learning scenarios:

  • New Instances (NI): new training patterns of the same classes become available in subsequent batches with new poses and environmental conditions (illumination, background, occlusions, etc.). A good model is expected to incrementally consolidate its knowledge about the known classes without compromising what it learned before.
  • New Classes (NC): new training patterns belonging to different classes become available in subsequent batches. In this case the model should be able to deal with the new classes without losing accuracy on the previous ones.
  • New Instances and Classes (NIC): new training patterns belonging both to known and new classes become available in subsequent training batches. A good model is expected to consolidate its knowledge about the known classes and to learn the new ones.
Mid-CaffeNet accuracy in the NI, NC and NIC scenarios respectively (average over 10 runs, shuffling the batches order). Colored areas stand for standard deviation.
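To make the protocols concrete, here is a toy sketch of how NI and NC batches could be built from a stream of (sample, class, session) records. The helper names and record layout are illustrative, not the official CORe50 loader API:

```python
from collections import defaultdict

# Toy records: (sample_id, class_label, session_id).
# In CORe50 the session id tells you which of the 11 acquisition
# sessions (backgrounds/lighting) a frame comes from.

def ni_batches(records, n_batches):
    """New Instances: every batch contains all classes, but with new
    poses/conditions; here we simply group records by session id."""
    by_session = defaultdict(list)
    for r in records:
        by_session[r[2]].append(r)
    sessions = sorted(by_session)
    return [sum((by_session[s] for s in sessions[i::n_batches]), [])
            for i in range(n_batches)]

def nc_batches(records, classes_per_batch):
    """New Classes: each batch introduces previously unseen classes."""
    by_class = defaultdict(list)
    for r in records:
        by_class[r[1]].append(r)
    classes = sorted(by_class)
    return [sum((by_class[c] for c in classes[i:i + classes_per_batch]), [])
            for i in range(0, len(classes), classes_per_batch)]

# NIC would mix the two: each batch brings new sessions of known
# classes plus a few brand-new classes.

toy = [(i, i % 5, i % 3) for i in range(30)]   # 30 samples, 5 classes, 3 sessions
print(len(ni_batches(toy, 3)), len(nc_batches(toy, 2)))
```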

As shown by many researchers, naïve approaches (simply continuing back-prop, with no access to previous data) cannot avoid catastrophic forgetting in complex real-world scenarios such as NC and NIC.

In our work we have designed simple baselines which can perform markedly better than naïve strategies but still leave much room for improvement with respect to the “cumulative” strategy, where the model has access to previous data and is retrained from scratch as soon as a new batch of data becomes available (see the accuracy gap in the figure above)!
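The naïve-vs-cumulative gap can be reproduced in miniature. The sketch below uses a tiny linear softmax classifier on synthetic blobs as a stand-in for a deep model (the paper's actual baselines use CNNs such as Mid-CaffeNet); in an NC-style setting, naïve fine-tuning on the second batch degrades accuracy on the first batch's classes, while the cumulative strategy preserves it:

```python
import numpy as np

# Toy NC scenario: batch 1 holds classes 0-1, batch 2 holds classes 2-3.
rng = np.random.default_rng(0)
MEANS = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0], [3.0, 3.0]])

def blobs(classes, n=100):
    X = np.concatenate([rng.normal(MEANS[c], 0.5, size=(n, 2)) for c in classes])
    return X, np.repeat(classes, n)

def train(W, b, X, y, epochs=200, lr=0.5):
    """Plain full-batch softmax regression via gradient descent."""
    Y = np.eye(4)[y]
    for _ in range(epochs):
        logits = X @ W + b
        p = np.exp(logits - logits.max(1, keepdims=True))
        p /= p.sum(1, keepdims=True)
        g = (p - Y) / len(X)
        W -= lr * (X.T @ g)
        b -= lr * g.sum(0)
    return W, b

def acc(W, b, X, y):
    return ((X @ W + b).argmax(1) == y).mean()

X1, y1 = blobs([0, 1])              # first batch
X2, y2 = blobs([2, 3])              # second batch
Xt, yt = blobs([0, 1])              # held-out data for the *old* classes

# Naive: keep back-propagating on the new batch only (no access to X1).
W, b = train(*train(np.zeros((2, 4)), np.zeros(4), X1, y1), X2, y2)
naive_acc = acc(W, b, Xt, yt)

# Cumulative: retrain from scratch on everything seen so far.
Wc, bc = train(np.zeros((2, 4)), np.zeros(4),
               np.concatenate([X1, X2]), np.concatenate([y1, y2]))
cum_acc = acc(Wc, bc, Xt, yt)

print(f"old-class accuracy -> naive: {naive_acc:.2f}, cumulative: {cum_acc:.2f}")
```

With the old classes never appearing as targets in the second batch, their logits get pushed down and the naïve model's accuracy on them collapses, which is exactly the interference the baselines in the paper try to mitigate.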

Download it, try it!

We have already released the code and a lot of additional materials (which we are continuing to improve) so you can easily try out new Continual Learning strategies and speed up research progress in this new and hot DL research area!

Have a look at the official project page: https://vlomonaco.github.io/core50 and let me know what you think or if you find any issues!

If you’d like to see more posts about my work on AI and Continual/Lifelong Deep Learning, follow me on Medium and on my socials: Linkedin, Twitter and Facebook, or join our growing community of CL enthusiasts at ContinualAI.com!
If you want to get in touch or you just want to know more about me and my research, visit my website vincenzolomonaco.com or leave a comment below! :-)
