Meta-Learning: Learning to Learn

Although artificial intelligence and machine learning are currently extremely fashionable, applying machine learning to real-life problems remains very challenging. Data scientists need to evaluate various learning algorithms and tune their numerous parameters, based on their assumptions and experience, against concrete problems and training data sets. This is a long, tedious, and resource-intensive task. Meta-learning is a recent technique that aims to overcome this problem by automating it: it uses machine learning itself to automatically learn the most appropriate algorithms and parameters for a given machine learning problem.


Teaching machines how to learn is hard

Artificial intelligence and machine learning are currently extremely fashionable. In recent years, this technology has left the realm of research and universities behind and is getting more and more attention from industries, governments, and public bodies. Moreover, it is increasingly the subject of public debate. It has also been shown that advanced machine learning algorithms, like neural networks, have the potential to be successfully applied to many domains, e.g. machine translation, image recognition, self-driving cars, and automation.
 
Driven by emerging technologies, like the Internet of Things (IoT), cyber-physical systems (CPS), and the so-called Industry 4.0, today more data than ever is produced and collected across many different industries. Often measured by sensors, this data captures the behaviour of a system. By detecting patterns in the collected data, one can derive valuable insights and predictions. Therefore, it is no surprise that more and more companies, aside from the big players, are starting to explore how artificial intelligence and machine learning can be an asset to their businesses and how the collected data can be used to create value.

Unfortunately, without deep machine learning and mathematical expertise it is very difficult or even impossible to build suitable learning models, not to mention implementing one's own algorithms. The choices of algorithms and their parameters are numerous, and it is extremely difficult to understand their impact. When should one apply restricted Boltzmann machines, long short-term memory neural networks, deep belief networks, convolutional neural networks, auto-encoders, random forests, or k-nearest neighbours? Or perhaps a Gaussian mixture model performs better for this specific case? How to choose the number of layers in a neural network, and how to determine the best value for k in k-nearest neighbours? What is a ReLU or a pooling layer? What is a vanishing gradient, and why does it matter?

No need to understand the mechanics of a car engine in order to drive a car

In theory, machine learning models should be applicable like black boxes, i.e. usable without knowing the internals, just like we don’t have to know all the internals of how a motor works to drive a car. In practice, data scientists need to evaluate various learning algorithms and tune their numerous parameters, based on their assumptions, against concrete problems and training data sets. These assumptions are known as inductive bias or learning bias, and working through them is usually a long, tedious, and expensive task. Currently, machine learning is very hard to use, with its thousands of knobs and parameters, and most of the time default settings don’t work. It is as if we had to configure thousands of parameters on our motors before we could drive a car. In addition, the field is moving extremely fast, so that nearly every day new algorithms, refinements to existing ones, and new findings are proposed. Today, often only big tech companies have the resources (know-how, data, and budget) to follow this and to successfully apply machine learning and AI to large-scale projects.

But even once a model has been successfully created and trained, the challenges do not stop. While the choices made might be satisfactory today, they can be wrong tomorrow, when the conditions change. Learning algorithms are designed to converge, which naturally creates a resistance to learning changes and adapting to new situations. This is extremely challenging for domains like IoT and Industry 4.0, which need to handle continuously evolving data.

Let the machines learn themselves how to learn

The question is how this situation can be improved. Machine learning algorithms have been created to learn from data without being explicitly programmed. Recently, researchers from different universities and big tech companies, like Google, started to investigate whether machine learning algorithms could be used to actually learn how to learn, i.e. to learn which algorithms and which parameters would be the most appropriate ones for a given problem. This process is referred to as meta-learning and is currently receiving much attention. With AutoML, Google rolled out a commercial platform; others, like Auto-Keras, are open source and free to use. While current offerings are experimental and at a very early stage, they provide an interesting glimpse of what could be the future of machine learning.

The current work on meta-learning focuses on neural networks and deep learning. Neural networks are powerful, generic, and versatile tools to learn relationships between inputs and outputs. They can be used to solve a wide variety of problems, from regression to classification to computer vision. They are suitable for both supervised and unsupervised learning. However, this versatility and generic problem-solving capability come at a high price. There are lots of possible configurations to build a neural network. In fact, neural networks perhaps show most clearly how challenging it is to find the right algorithms and configuration parameters to train a model. This can be so confusing and unclear that it is sometimes referred to as “the neural zoo”. Manually designing a neural network is difficult because the search space of all possible networks can be combinatorially large. Therefore, it is no surprise that today’s efforts around meta-learning focus on this type of learning, although meta-learning is not limited to it. Meta-learning can help to reduce the burden of finding the best configuration of a neural network for a specific problem by using machine learning for this process itself. This is known as neural architecture search.

The neural zoo, source: http://www.asimovinstitute.org/neural-network-zoo/

Different approaches to neural architecture search have been suggested, and it has been shown that they can compare well with, or even outperform, the best hand-designed neural networks, both in terms of accuracy and performance.

The basic neural architecture search algorithm works as follows: it first proposes a candidate model, then it evaluates this candidate model against a data set, and finally it uses the results as feedback to teach the neural architecture search network.
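The propose, evaluate, and feedback steps can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the search space, the toy `evaluate` function (which stands in for actually training a network and measuring its validation accuracy), and the simple preference-weight update used as feedback. It is not the algorithm of any particular system, only the shape of the loop.

```python
import random

# Hypothetical search space: all names and values are illustrative assumptions.
SEARCH_SPACE = {
    "num_layers": [1, 2, 3, 4],
    "units": [16, 32, 64, 128],
    "activation": ["relu", "tanh"],
}


def propose(prefs, rng):
    """Step 1: sample a candidate, favouring options with a higher preference."""
    return {
        name: rng.choices(options, weights=prefs[name])[0]
        for name, options in SEARCH_SPACE.items()
    }


def evaluate(candidate):
    """Step 2: stand-in for training the candidate and measuring accuracy.

    A real system would train a network here. This toy score peaks at
    3 layers, 64 units, and ReLU so that the loop has something to find.
    """
    score = 1.0
    score -= 0.1 * abs(candidate["num_layers"] - 3)
    score -= 0.002 * abs(candidate["units"] - 64)
    score -= 0.0 if candidate["activation"] == "relu" else 0.05
    return score


def feedback(prefs, candidate, score, baseline, lr=0.5):
    """Step 3: shift preferences towards choices that beat the baseline."""
    for name, options in SEARCH_SPACE.items():
        i = options.index(candidate[name])
        # Keep weights strictly positive so every option stays reachable.
        prefs[name][i] = max(0.05, prefs[name][i] + lr * (score - baseline))


def search(iterations=200, seed=0):
    rng = random.Random(seed)
    prefs = {name: [1.0] * len(opts) for name, opts in SEARCH_SPACE.items()}
    best, best_score, baseline = None, float("-inf"), 0.0
    for step in range(1, iterations + 1):
        candidate = propose(prefs, rng)
        score = evaluate(candidate)
        feedback(prefs, candidate, score, baseline)
        baseline += (score - baseline) / step  # running mean of all scores
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


best, best_score = search()
print(best, round(best_score, 3))
```

In a real setting, `evaluate` dominates the cost, since each call means training a full model. This is why the feedback step matters: it steers the proposals towards promising regions instead of sampling blindly.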

How meta-learning works

Different variants of this algorithm have been proposed. For example, Google suggests a neural architecture search with reinforcement learning. Their algorithm uses a recurrent neural network to generate the model descriptions of neural networks and trains this recurrent neural network with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation set. Other approaches use ensemble learning. Neural architecture search algorithms largely rely on vast computational resources to optimize the accuracy of the network. To address this problem, Baidu suggested a multi-objective neural architecture search, whose reward function takes network accuracy, computational resources, and training time into consideration. It is safe to say that this is just the beginning. Over time, we will see more and more clever algorithms finding more and more accurate neural network architectures for specific problems in ever-decreasing time.
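To make the idea of a multi-objective reward more concrete, here is a small Python sketch. It is not the reward function from the work mentioned above. It is a plain weighted sum, and the `Evaluation` fields, the budgets, the weights, and the two example candidates are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    accuracy: float       # validation accuracy in [0, 1]
    flops: float          # compute cost of the network (hypothetical measure)
    train_seconds: float  # wall-clock training time


def multi_objective_reward(ev, flops_budget=1e9, time_budget=3600.0,
                           w_compute=0.2, w_time=0.1):
    """Accuracy minus weighted costs, each expressed relative to a budget.

    Budgets and weights are illustrative assumptions, not published values.
    """
    compute_cost = ev.flops / flops_budget
    time_cost = ev.train_seconds / time_budget
    return ev.accuracy - w_compute * compute_cost - w_time * time_cost


# Two made-up candidates: a large, slightly more accurate network
# and a small, cheaper one.
large = Evaluation(accuracy=0.95, flops=2e9, train_seconds=7200.0)
small = Evaluation(accuracy=0.92, flops=0.5e9, train_seconds=1800.0)

by_accuracy = max([large, small], key=lambda ev: ev.accuracy)
by_reward = max([large, small], key=multi_objective_reward)
print(by_accuracy is large, by_reward is small)
```

With accuracy as the only objective, the large network is preferred. Once compute and training time enter the reward, the search favours the small one. How strongly depends entirely on the chosen weights.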

The meta-learning level can be viewed from different angles, either as a hyper-parameter search or as an ensemble method, where a strong model can be created out of several weak ones. This ensemble can contain different neural networks with different parameters, but could also contain completely different machine learning algorithms, not just neural networks. Ultimately, this can allow a system to automatically learn the best learning strategy, but also to adapt its own learning strategy to changing conditions.
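The ensemble view can be illustrated with a deliberately tiny Python sketch. The data set and the three "weak models" are assumptions constructed for this example: each model looks at only one feature, while the true label depends on the majority of all three. A simple majority vote then combines them into a stronger model.

```python
from itertools import product

# Toy data set (an illustrative assumption): every combination of three
# features in {-1, 1}; the label is 1 when most features are positive.
samples = list(product([-1, 1], repeat=3))
labels = [1 if sum(x) > 0 else 0 for x in samples]


def make_weak_model(i):
    """A weak model that only looks at feature i."""
    return lambda x: 1 if x[i] > 0 else 0


def majority_vote(models):
    """Combine several models into one by letting them vote."""
    def ensemble(x):
        votes = sum(m(x) for m in models)
        return 1 if 2 * votes > len(models) else 0
    return ensemble


def accuracy(model):
    """Fraction of samples the model labels correctly."""
    correct = sum(model(x) == y for x, y in zip(samples, labels))
    return correct / len(samples)


weak_models = [make_weak_model(i) for i in range(3)]
ensemble = majority_vote(weak_models)

weak_scores = [accuracy(m) for m in weak_models]
ensemble_score = accuracy(ensemble)
print(weak_scores, ensemble_score)
```

On this toy data each weak model is right on 6 of the 8 samples, while the vote is right on all of them, because at most one model is wrong on any given sample. Real ensembles rarely combine this cleanly, but the principle is the same: the members should make different mistakes.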

Meta-learning: the future of machine learning?

The potential of meta-learning is huge. While a far-off future promises truly intelligent, self-learning, and self-adaptive systems, which push the limits of automation to a whole new level, in the near future meta-learning can provide deep learning tools to domain experts with limited data science or machine learning background. The training of high-quality custom machine learning models can become a lot more accessible. This will open up machine learning to entirely new domains and businesses.

As the domain evolves, we should transition towards a situation where machine learning can actually be used as a black box, without requiring deep computer science expertise. Besides the complexity of meta-learning, the required computational resources are still a major challenge. It is often already very costly to train a single neural network, not to mention training and evaluating a complete ensemble. However, with constantly improving computational power, dedicated machine learning hardware like Tensor Processing Units, and advances in meta-learning algorithms, this cost will become less of an obstacle over time. While it will probably still take some time until we can see the fruits of meta-learning, this technology has the potential to be the next big thing in machine learning and AI.

Acknowledgement

This work is supported by the Luxembourg National Research Fund, FNR (https://www.fnr.lu/), under the Industrial Fellowship program (grant 12490856).