Automate Architecture Modelling with Neural Architecture Search (NAS)

Jithmi Shashirangana · Published in Analytics Vidhya · 6 min read · Oct 12, 2020

What’s the deal with Neural Architecture Search?

AI that creates AI: "Neural architecture search (NAS) is a technique for automating the design of artificial neural networks (ANN), a widely used model in the field of machine learning."

Over the years, the Deep Learning paradigm has made striking progress on a variety of tasks such as image recognition, speech recognition, and machine translation. Leading companies like Google, as well as individual academic researchers, have produced a handful of state-of-the-art deep learning models that deliver breathtaking results on these applications. If we narrow the topic down to image classification, very deep architectures like VGG16, GoogLeNet, and ResNet perform really well on standard datasets like ImageNet and CIFAR-10, but not nearly as well on some organization-specific datasets. Nevertheless, the machine learning models in use today have mostly been designed manually by human experts, such as skilled teams of engineers and scientists, which is a difficult, time-consuming, and error-prone process.

Therefore, a sub-discipline of AutoML called Neural Architecture Search focuses on automating the design of machine learning models while making the process more accessible. Neural Architecture Search aims to automate the process of finding the best model architecture for a given dataset.

As I started looking into NAS for my final-year research project, I found a very helpful survey on NAS by Elsken et al. (2019). In this blog, I will try to explain the theory behind NAS in an intuitive and simple manner, as I personally experienced a lot of difficulty when I first learned it.

Overview of NAS

Typically, there are three major components in a NAS system.

1. Search space

This component describes which neural architectures a NAS approach can discover in principle. It defines a set of operations (e.g. convolution, fully-connected, pooling) and how those operations can be connected to form valid network architectures. Although we call NAS an automated task, this phase still requires human intervention to design a search space specific to the application at hand. For instance, for a computer vision task the search space can be a space of state-of-the-art convolutional networks, and for a language modeling task a space of recurrent networks. This introduces a human bias into the overall process, which may prevent finding novel architectural building blocks that go beyond current human knowledge, but it also helps reduce the size of the search space and simplify the search.
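To make this concrete, here is a minimal sketch of what a search-space specification could look like in code. The operation names, hyperparameter ranges, and the simple chain connection rule are my own illustrative assumptions, not a standard API.

```python
# A hypothetical search-space specification: which operations are allowed,
# what hyperparameters they expose, and how layers may be connected.
SEARCH_SPACE = {
    "operations": {
        "conv3x3":         {"filters": [16, 32, 64], "stride": [1, 2]},
        "conv5x5":         {"filters": [16, 32, 64], "stride": [1, 2]},
        "max_pool":        {"pool_size": [2, 3]},
        "fully_connected": {"units": [64, 128, 256]},
    },
    "max_layers": 20,           # upper bound on network depth
    "connection_rule": "chain"  # layer i feeds layer i + 1
}
```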

Sequential Layer-wise Operations

This is perhaps the most naive and simplest way to design a search space for neural network architectures. The network consists of a sequence of n layers, where the i-th layer L_i receives its input from layer (i − 1) and its output serves as the input for layer (i + 1).

The search space is parameterized by:

  1. The number of layers
  2. The type of operation each layer executes (e.g. convolution, pooling, depthwise separable convolution, etc.)
  3. The hyperparameters associated with each operation (e.g. number of filters, kernel size, and stride for a convolutional layer; number of units for a fully-connected layer)

Figure caption: Each node in the graph corresponds to a layer in a neural network, e.g. a convolutional or pooling layer. Different layer types are visualized in different colors.
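As a concrete illustration, the following is a minimal sketch of how one might randomly sample an architecture from such a layer-wise search space, reusing the kind of specification sketched earlier. The operation list and hyperparameter ranges are illustrative assumptions.

```python
import random

# Illustrative candidate operations and their hyperparameter ranges.
OPERATIONS = {
    "conv3x3":  {"filters": [16, 32, 64], "stride": [1, 2]},
    "conv5x5":  {"filters": [16, 32, 64], "stride": [1, 2]},
    "max_pool": {"pool_size": [2, 3]},
}

def sample_architecture(max_layers=10):
    """Sample a chain-structured architecture as a list of layer descriptions."""
    num_layers = random.randint(1, max_layers)          # 1. number of layers
    architecture = []
    for _ in range(num_layers):
        op_name = random.choice(list(OPERATIONS))        # 2. operation type
        hparams = {name: random.choice(values)           # 3. its hyperparameters
                   for name, values in OPERATIONS[op_name].items()}
        architecture.append({"op": op_name, **hparams})
    return architecture

print(sample_architecture())
```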

Multi-branch Space

This extends the simple chain structure with additional layer types, multiple branches, and skip connections, resulting in a more complex search space.

Cell-based Representation

If we look at state-of-the-art vision architectures like Inception, ResNet, and MobileNet, we see that they are built from repeated modules, or cells. Therefore, rather than hand-designing the whole architecture for the search space, we can define the search space as the same cell repeated multiple times, where each cell contains several operations predicted by the NAS algorithm. This also makes it easy to scale the model size up or down by adjusting the number of cell repeats. A major advantage of a cell-based search space is the speed-up gained from the reduced search space (cells usually consist of significantly fewer layers than whole architectures). The NASNet search space, for example, learns two types of cells for network construction:

  1. Normal Cell: The input and output feature maps have the same dimension.
  2. Reduction Cell: The output feature map has its width and height reduced by half.

Figure caption: An architecture built by stacking cells sequentially. Note that cells can also be combined in a more complex manner, such as in multi-branch spaces, by simply replacing layers with cells.
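Below is a minimal sketch, assuming PyTorch, of how a network can be assembled by stacking a normal cell and a reduction cell. The internal structure of the two cells here is a hypothetical placeholder; in a real NAS system, the operations inside each cell are exactly the part chosen by the search algorithm.

```python
import torch
import torch.nn as nn

class NormalCell(nn.Module):
    """Placeholder 'normal' cell: keeps spatial size and channel count."""
    def __init__(self, channels):
        super().__init__()
        self.op = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
        )

    def forward(self, x):
        return self.op(x) + x  # simple residual connection inside the cell

class ReductionCell(nn.Module):
    """Placeholder 'reduction' cell: halves width/height, doubles channels."""
    def __init__(self, channels):
        super().__init__()
        self.op = nn.Conv2d(channels, channels * 2, kernel_size=3,
                            stride=2, padding=1)

    def forward(self, x):
        return self.op(x)

def build_network(in_channels=16, cells_per_stage=3, num_stages=3):
    """Stack normal cells, inserting a reduction cell between stages."""
    layers, c = [], in_channels
    for stage in range(num_stages):
        layers += [NormalCell(c) for _ in range(cells_per_stage)]
        if stage < num_stages - 1:        # no reduction after the last stage
            layers.append(ReductionCell(c))
            c *= 2
    return nn.Sequential(*layers)

# Scaling the model up or down only means changing the cell counts.
net = build_network(cells_per_stage=2, num_stages=3)
out = net(torch.randn(1, 16, 32, 32))   # -> shape (1, 64, 8, 8)
```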

2. Search Algorithm

This component determines how to explore the search space in order to find a good architecture. A naive way to explore the search space is trial and error: random search simply samples valid architecture candidates from the search space at random, with no learning model involved. Nevertheless, given a well-designed search space, random search can still give very good results.
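As a sketch, random search is just a loop; `sample_architecture` is the sampler sketched earlier, and `evaluate` stands in for whatever evaluation strategy is used (the third component below). Both names are illustrative assumptions.

```python
def random_search(sample_architecture, evaluate, num_trials=100):
    """Sample `num_trials` random architectures and keep the best-scoring one."""
    best_arch, best_score = None, float("-inf")
    for _ in range(num_trials):
        arch = sample_architecture()    # draw a valid candidate from the space
        score = evaluate(arch)          # e.g. validation accuracy
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score
```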

Broadly, the existing NAS search algorithms (or search strategies) can be categorized into two groups: black-box optimization strategies and differentiable architecture search strategies.

Black-box Optimization Strategies

Black-box optimization strategies treat the search space as discrete, so gradient-based optimization cannot be applied directly. Various adaptive methods such as reinforcement learning, evolutionary algorithms, and Bayesian optimization fall under the black-box category.
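As one example from the black-box family, here is a minimal sketch of an evolutionary search loop in the spirit of regularized evolution: keep a population of architectures, repeatedly mutate a good one, and discard the oldest. The `sample_architecture`, `mutate`, and `evaluate` helpers are illustrative assumptions.

```python
import random
from collections import deque

def evolutionary_search(sample_architecture, mutate, evaluate,
                        population_size=20, num_cycles=200, sample_size=5):
    """Tiny evolutionary loop: tournament selection + mutation, oldest dies."""
    population = deque()
    for _ in range(population_size):                      # initial random population
        arch = sample_architecture()
        population.append((arch, evaluate(arch)))

    for _ in range(num_cycles):
        tournament = random.sample(list(population), sample_size)
        parent, _ = max(tournament, key=lambda item: item[1])  # best of the sample
        child = mutate(parent)                             # e.g. swap one operation
        population.append((child, evaluate(child)))
        population.popleft()                               # drop the oldest member

    return max(population, key=lambda item: item[1])
```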

Differentiable Architecture Search Strategies

Unlike black-box optimization strategies, differentiable architecture search strategies relax the search space to be continuous, so gradient descent can be used for optimization. DARTS and PC-DARTS are two such differentiable architecture search strategies, and they are considerably faster and more efficient than the earlier black-box optimization strategies.
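To give a flavor of the continuous relaxation, here is a minimal sketch, assuming PyTorch, of a DARTS-style mixed operation: each edge computes a softmax-weighted sum of candidate operations, and the mixing weights (the architecture parameters) are learned by gradient descent together with the network weights. The candidate operation list is an illustrative subset.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """A DARTS-style mixed operation: softmax-weighted sum of candidates."""
    def __init__(self, channels):
        super().__init__()
        # Illustrative candidate operations; all preserve the feature map shape.
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.Conv2d(channels, channels, kernel_size=5, padding=2),
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Identity(),
        ])
        # One continuous architecture parameter per candidate operation,
        # optimized by gradient descent along with the network weights.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

mixed = MixedOp(channels=16)
y = mixed(torch.randn(1, 16, 32, 32))   # same shape as the input
```

After the search, a discrete architecture is obtained by keeping only the operation with the largest weight on each edge (roughly, `self.ops[self.alpha.argmax()]`).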

3. Evaluation Strategy

As explained earlier, the search algorithm aims to find a neural architecture that maximizes some performance measure, such as accuracy on unseen data. Therefore, we need a strategy to estimate or predict the performance of a given architecture in order to guide the search algorithm.

Training from scratch

One obvious way to do this is to train each candidate model independently from scratch on the training data and then evaluate its performance on validation data. However, this is computationally very expensive, potentially requiring thousands of GPU days when every architecture to be evaluated is trained from scratch. Therefore, several evaluation strategies have been proposed to speed up performance estimation.

Proxy Task Performance

This approach estimates the performance of a child network in a cheaper and faster fashion. It involves training on a smaller dataset, often called a proxy dataset, which reduces the time needed to train the child network by a large factor. Training time can also be reduced by training for fewer epochs. Although these approximations reduce the computational cost, they introduce bias into the estimate, as performance will typically be underestimated. However, this may not be a problem as long as the search strategy only relies on ranking different architectures and the relative ranking remains stable.
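As a sketch, proxy-task evaluation can look like the following; `build_model`, `train_one_epoch`, and `validate` are hypothetical callables supplied by the user, and the small epoch count and reduced dataset are the proxies.

```python
def proxy_evaluate(arch, build_model, train_one_epoch, validate,
                   proxy_train_data, proxy_val_data, epochs=5):
    """Cheaply estimate an architecture's quality on a reduced (proxy) task."""
    model = build_model(arch)                      # instantiate the candidate
    for _ in range(epochs):                        # far fewer epochs than usual
        train_one_epoch(model, proxy_train_data)   # small, subsampled dataset
    # The returned score is mainly used to *rank* candidates, so a consistent
    # underestimate of the true performance is usually acceptable.
    return validate(model, proxy_val_data)
```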

Summary

In this article, we reviewed the problem of neural architecture search by dividing it into three major components: the search space, the search algorithm, and the evaluation strategy. This should give you a basic understanding of Neural Architecture Search before diving into deeper implementations.

I hope you found this article useful.

In a later article, we will start looking at some efficient Architecture Search strategies like DARTS in more detail.

References

[1] T. Elsken, J. H. Metzen, and F. Hutter, "Neural Architecture Search: A Survey" (2019). https://arxiv.org/abs/1808.05377

[2] H. Liu, K. Simonyan, and Y. Yang, "DARTS: Differentiable Architecture Search" (2019). https://arxiv.org/abs/1806.09055
