BANANAS: A new method for neural architecture search
In this post, we discuss a new state-of-the-art algorithm for neural architecture search.
Arxiv paper: https://arxiv.org/abs/1910.11858
Source code: https://www.github.com/naszilla/bananas
Neural architecture search (NAS) is one of the hottest research areas in machine learning, with hundreds of papers released in the last few years (see this website). In neural architecture search, the goal is to use an algorithm (sometimes even a neural network) to learn the best neural architecture for a given dataset.
The most popular techniques for NAS include reinforcement learning, evolutionary algorithms, Bayesian optimization, and gradient-based methods. Each technique has its own strengths and drawbacks. For example, Bayesian optimization (BayesOpt) is theoretically one of the most promising methods and has seen huge success in hyperparameter optimization for machine learning, but it is challenging to apply to NAS in practice. Bayesian optimization works by building a model of the space of neural architectures and then automatically suggesting which architecture to try next. See our previous blog post for an introduction to BayesOpt for NAS. However, setting up BayesOpt for NAS requires a large amount of human effort: you need to hand-craft a distance function between architectures and tune a Gaussian process (GP) model.
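To see where that effort goes, here is a minimal, hypothetical sketch (not the paper's code) of the GP machinery that standard BayesOpt for NAS needs: an architecture_distance() that someone has to design by hand, an RBF-style kernel built on top of it, and GP hyperparameters such as the length scale and noise level that then have to be tuned.

```python
# A minimal sketch of GP-based BayesOpt for NAS. architecture_distance() is a
# hypothetical hand-crafted function; designing it well is the hard part.
import numpy as np

def architecture_distance(arch_a, arch_b):
    # Hypothetical hand-crafted distance, e.g. how many operations differ
    # between two architectures described as lists of operations.
    return sum(a != b for a, b in zip(arch_a, arch_b))

def gp_kernel(arch_a, arch_b, length_scale=1.0):
    # RBF-style kernel on top of the hand-crafted distance; the length scale
    # is one of the GP hyperparameters that must be tuned.
    d = architecture_distance(arch_a, arch_b)
    return np.exp(-d ** 2 / (2 * length_scale ** 2))

def gp_posterior(train_archs, train_accs, test_arch, noise=1e-6):
    # Standard GP posterior mean/variance for a candidate architecture; an
    # acquisition function built on these decides which architecture to train next.
    K = np.array([[gp_kernel(a, b) for b in train_archs] for a in train_archs])
    k_star = np.array([gp_kernel(test_arch, a) for a in train_archs])
    K_inv = np.linalg.inv(K + noise * np.eye(len(train_archs)))
    mean = k_star @ K_inv @ np.array(train_accs)
    var = gp_kernel(test_arch, test_arch) - k_star @ K_inv @ k_star
    return mean, var
```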
In our new paper, we design BANANAS, a novel NAS algorithm that uses Bayesian optimization with a neural network model instead of a GP model. That is, in every iteration of Bayesian optimization, we train a meta neural network to predict the accuracy of unseen neural architectures in the search space. This technique removes the problems with BayesOpt for NAS described above: the model is powerful enough to predict neural network accuracies, and there is no need to construct a distance function between neural networks by hand.
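To make this concrete, here is a rough sketch of the outer loop under simplifying assumptions. It swaps in scikit-learn's MLPRegressor for the meta neural network and uses a small ensemble of them to get an uncertainty estimate; sample_random_architecture(), encode_architecture(), and train_and_evaluate() are hypothetical helpers for whatever search space you are using, and the candidate pool here is drawn at random rather than with the paper's own candidate-generation strategy.

```python
# A sketch of a BANANAS-style Bayesian optimization loop, not the paper's code.
import numpy as np
from sklearn.neural_network import MLPRegressor

def bananas_style_search(sample_random_architecture, encode_architecture,
                         train_and_evaluate, num_init=10, num_iterations=40,
                         num_candidates=100, ensemble_size=5):
    # Start from a handful of randomly sampled, fully trained architectures.
    archs = [sample_random_architecture() for _ in range(num_init)]
    accs = [train_and_evaluate(a) for a in archs]

    for _ in range(num_iterations):
        X = np.array([encode_architecture(a) for a in archs])
        y = np.array(accs)

        # Ensemble of meta neural networks trained on the architectures seen so far.
        ensemble = [MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500,
                                 random_state=seed).fit(X, y)
                    for seed in range(ensemble_size)]

        # Score a pool of unseen candidates: sample each candidate's predicted
        # accuracy from a normal fit to the ensemble's mean and spread.
        candidates = [sample_random_architecture() for _ in range(num_candidates)]
        C = np.array([encode_architecture(c) for c in candidates])
        preds = np.stack([m.predict(C) for m in ensemble])   # (ensemble, candidates)
        samples = np.random.normal(preds.mean(axis=0), preds.std(axis=0) + 1e-8)

        # Train the most promising candidate for real and add it to the data.
        best = candidates[int(np.argmax(samples))]
        archs.append(best)
        accs.append(train_and_evaluate(best))

    return archs[int(np.argmax(accs))]
```

The last few lines of the loop are the acquisition step: each candidate's predicted accuracy is sampled from a normal distribution fit to the ensemble's outputs, so the uncertainty comes from disagreement between ensemble members rather than from a GP posterior.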
We use a path-based scheme to encode each neural architecture, which drastically improves the predictive accuracy of our meta neural network. After training on just 200 random neural architectures, we can predict the validation accuracy of a new architecture to within one percent of its true accuracy on average, across multiple popular search spaces. BANANAS also uses a novel variant of Thompson sampling as the acquisition function in Bayesian optimization.
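As a toy illustration of what a path-based encoding can look like: a NASBench-style cell is a small DAG, each path from the input node to the output node is a sequence of operations, and the encoding is a binary vector with one entry per possible operation sequence. The cell format and helper names below are illustrative assumptions, not the repository's actual API.

```python
# A toy sketch of a path-based encoding for a NASBench-style cell.
from itertools import product

OPS = ["conv3x3", "conv1x1", "maxpool3x3"]   # NASBench-style operation set
MAX_NODES = 7                                # input node, up to 5 ops, output node

def get_paths(adjacency, ops):
    """Enumerate every sequence of operations on a path from input to output.

    Assumes nodes are topologically ordered (NASBench adjacency matrices are
    upper-triangular), with ops[0] the input node and ops[-1] the output node.
    """
    n = len(ops)
    paths = []

    def dfs(node, path):
        if node == n - 1:                    # reached the output node
            paths.append(tuple(path))
            return
        for nxt in range(node + 1, n):
            if adjacency[node][nxt]:
                dfs(nxt, path + ([ops[nxt]] if nxt != n - 1 else []))

    dfs(0, [])
    return paths

def path_encoding(adjacency, ops):
    """Binary vector with one entry per possible operation sequence (path)."""
    all_paths = [p for length in range(MAX_NODES - 1)
                 for p in product(OPS, repeat=length)]
    index = {p: i for i, p in enumerate(all_paths)}
    encoding = [0] * len(all_paths)
    for path in get_paths(adjacency, ops):
        encoding[index[path]] = 1
    return encoding

# Example: a tiny cell with edges input -> conv3x3 -> output and input -> output.
adj = [[0, 1, 1],
       [0, 0, 1],
       [0, 0, 0]]
ops = ["input", "conv3x3", "output"]
enc = path_encoding(adj, ops)   # 364-dim vector with 1s for () and ("conv3x3",)
```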
We tested BANANAS on two of the most popular search spaces, the NASBench and DARTS search spaces, and our algorithm performed better than all other algorithms we tried, including evolutionary search, reinforcement learning, standard BayesOpt, AlphaX, ASHA, and DARTS. The best architecture found by BANANAS achieved 2.57% test error on CIFAR-10, on par with state-of-the-art NAS algorithms.
Included in the GitHub repository is a Jupyter notebook which lets you easily train a meta neural network on the NASBench dataset. Input your favorite combination of hyperparameters to try to achieve the best prediction accuracy on NASBench!