# [Paper] Single-Path NAS: Designing Hardware-Efficient ConvNets in less than 4 Hours (Image Classification)

## Training Time 5,000× Faster Than MnasNet, 25× Faster Than ProxylessNAS, 11× Faster Than FBNet. Outperforms MnasNet, ProxylessNAS, FBNet & MobileNetV2

In this story, **“Single-Path NAS: Designing Hardware-Efficient ConvNets in less than 4 Hours” (Single-Path NAS)**, by Carnegie Mellon University, Microsoft, and Harbin Institute of Technology, is presented.

- The NAS (Neural Architecture Search) problem remains challenging due to the combinatorially large design space, causing a significant searching time (at least 200 GPU-hours).
- In this paper, Single-Path NAS is proposed, which **drastically decreases the number of trainable parameters and the search cost down to a few epochs.**
- Finally, **hardware-efficient ConvNets can be searched in less than 4 hours.**

This is a paper in **2019 ECML PKDD** with over **70 citations**. (Sik-Ho Tsang @ Medium)

# Outline

1. **Multi-Path NAS vs Single-Path NAS**
2. **Single-Path NAS: Search Space**
3. **Single-Path NAS: Differentiable Runtime Loss**
4. **Experimental Results**

# 1. Multi-Path NAS vs Single-Path NAS

## 1.1. Multi-Path NAS

- e.g., for a mobile-efficient ConvNet with 22 layers, choosing among five candidate operations per layer yields 5²² ≈ 10¹⁵ possible ConvNet architectures.
- These multi-path techniques remain **considerably costly**.

## 1.2. Single-Path NAS

- Without having to choose among different paths/operations as in multi-path methods, we instead **solve the NAS problem as finding which subset of kernel weights to use in each ConvNet layer.**

Different candidate convolutional operations in NAS can be viewed as subsets of a single “superkernel”.
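This "subset" view can be illustrated with a minimal numpy sketch (not the authors' code): a 3×3 depthwise kernel is just the inner core of a single 5×5 superkernel, with the outer shell zeroed out.

```python
import numpy as np

# A single 5×5 "superkernel" holding the weights of both candidate ops.
superkernel = np.random.randn(5, 5)

# Mask selecting the inner 3×3 core of the 5×5 superkernel.
core_mask = np.zeros((5, 5))
core_mask[1:4, 1:4] = 1.0

w_3x3 = superkernel * core_mask                  # candidate 3×3 op: outer shell zeroed
w_5x5_minus_3x3 = superkernel * (1 - core_mask)  # the "outer shell" weights

# Choosing the 5×5 op is equivalent to keeping core + shell:
w_5x5 = w_3x3 + w_5x5_minus_3x3
assert np.allclose(w_5x5, superkernel)
```

Both candidates thus share one set of trainable weights, which is what shrinks the number of parameters compared to multi-path methods that keep separate weights per path.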

# 2. Single-Path NAS: Search Space

## 2.1. Single-Path NAS Search Space

- **1st row**: A fixed macro-architecture is used, consisting of **blocks 1 to 7**.
- **2nd row**: In each block, **up to 4 layers (MBConv) are used.** Each layer of these blocks follows a mobile inverted bottleneck convolution (MBConv) micro-architecture.
- **3rd row**: Each MBConv consists of a point-wise (1×1) convolution, a *k*×*k* depthwise convolution, and a linear 1×1 convolution. **Each MBConv layer is parameterized by *k***, i.e., the kernel size of the depthwise convolution, **and by expansion ratio *e***, i.e., the ratio between the output and input channels of the first 1×1 convolution.
- Each MBConv is denoted as MBConv-*k*×*k*-*e*. Mobile-efficient NAS aims to choose each MBConv-*k*×*k*-*e* layer by selecting among different values of *k* and *e*.

- **4th row**: MBConv layers consider **kernel sizes {3, 5}** and **expansion ratios {3, 6}**. NAS also considers a special **skip-op “layer”**, which “zeroes-out” the kernel and feeds the input directly to the output, i.e., the entire layer is dropped.

## 2.2. Searching for Kernel Size

- Denote the weights of the two candidate kernels as *w*3×3 and *w*5×5.
- The weights of the 3×3 kernel can be viewed as the inner core of the weights of the 5×5 kernel, obtained by “zeroing” out the weights of the “outer” shell. *w*5×5\3×3 denotes this outer shell, i.e., the weights of *w*5×5 outside *w*3×3.
- The NAS decision is directly encoded into the superkernel of an MBConv layer as a function of the kernel weights:
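The equation here was an image in the original post; based on the paper's description of the superkernel, it can be reconstructed as:

$$
w_k = w_{3\times3} + \mathbb{1}\big(\lVert w_{5\times5 \setminus 3\times3} \rVert^2 > t_{k=5}\big) \cdot w_{5\times5 \setminus 3\times3}
$$

i.e., the outer shell is added to the inner core only if its group Lasso norm exceeds the learnable threshold *t*k=5.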

- where 1(·) is the indicator function that encodes the architectural NAS choice. The group Lasso term over the outer-shell weights is used as the decision condition:

- where *t*k=5 is a latent variable that controls the decision (e.g., a threshold value). The indicator function 1(·) is then relaxed to a sigmoid function *σ*(·) when computing gradients.
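A minimal sketch of this relaxation (assumed notation, not the authors' code): the hard indicator 1(‖*w*shell‖² > *t*) is replaced by a sigmoid so that the threshold *t* receives gradients.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_indicator(w_shell, t):
    # Relaxed decision σ(||w_shell||² − t):
    # ≈1 keeps the outer shell (use 5×5), ≈0 drops it (use 3×3).
    return sigmoid(np.sum(w_shell ** 2) - t)

w_shell = np.full(16, 0.5)  # toy "outer shell" weights, ||w||² = 4.0
keep = soft_indicator(w_shell, t=0.0)   # norm well above threshold → near 1
drop = soft_indicator(w_shell, t=8.0)   # norm well below threshold → near 0
```

During training the soft value gates the shell weights; at the end of search, the decision is hardened back to 0/1 to read off the architecture.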

## 2.3. Searching for Expansion Ratio

- Similarly, an MBConv-*k*×*k*-3 layer with expansion ratio *e*=3 can be viewed as using one half of the channels of an MBConv-*k*×*k*-6 layer with expansion ratio *e*=6, while “zeroing” out the second half of channels *w*k,6\3.
- By “zeroing” out the first half of the output filters as well, the layer becomes the “skip-op” path, i.e., only the residual connection remains.

- Hence, for input *x*, the output of the *i*-th MBConv layer of the network is:
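The layer-output equation was an image in the original post; a reconstruction consistent with the paper's nested-decision formulation is:

$$
o^i(x) = \mathrm{conv}\big(x,\, w^i_{k,e}\big), \quad
w_{k,e} = \mathbb{1}\big(\lVert w_{k,3} \rVert^2 > t_{e=3}\big) \cdot
\Big( w_{k,3} + \mathbb{1}\big(\lVert w_{k,6\setminus3} \rVert^2 > t_{e=6}\big) \cdot w_{k,6\setminus3} \Big)
$$

so a single superkernel encodes the kernel-size, expansion-ratio, and skip-op decisions at once.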

# 3. Single-Path NAS: Differentiable Runtime Loss

- To design hardware-efficient ConvNets, **the differentiable objective** should reflect both the **accuracy** of the searched architecture and its **inference latency** on the target hardware. Hence, a latency-aware formulation is used:
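The objective here was an image in the original post; following the paper, it can be written as:

$$
\min_{w,\,t}\; \mathrm{CE}\big(w \,|\, t\big) + \lambda \cdot \log\big(R(w \,|\, t)\big)
$$

where *w* are the superkernel weights and *t* the latent decision thresholds.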

- The first term **CE** corresponds to the **cross-entropy loss** of the single-path model.

- The hardware-related term **R** is the **runtime in milliseconds (ms)** of the searched NAS model on the target mobile platform.

- Finally, the coefficient *λ* modulates the trade-off between cross-entropy and runtime.

- The total network latency *R* of a mobile ConvNet can be modeled as **the sum of each *i*-th layer’s runtime *R*i**, since the runtime of each operator is independent of the other operators:
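The summation here was an image in the original post; for an *N*-layer network it is simply:

$$
R = \sum_{i=1}^{N} R^i
$$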

- In Single-Path NAS, the target mobile platform (Pixel 1) is used to record the runtime of each candidate kernel operation per layer *i*, i.e., *R*i3×3,3, *R*i3×3,6, *R*i5×5,3, and *R*i5×5,6.
- Specifically, the runtime of layer *i* is first defined as a function of the expansion ratio decision:
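This equation was an image in the original post; a reconstruction consistent with the expansion-ratio decisions described above (using the recorded 5×5 runtimes as the base case) is:

$$
R^i_e = \mathbb{1}\big(\lVert w^i_{k,3} \rVert^2 > t_{e=3}\big) \cdot
\Big( R^i_{5\times5,3} + \mathbb{1}\big(\lVert w^i_{k,6\setminus3} \rVert^2 > t_{e=6}\big) \cdot \big( R^i_{5\times5,6} - R^i_{5\times5,3} \big) \Big)
$$

so dropping the layer (skip-op) yields zero runtime, and choosing *e*=6 adds the extra cost over *e*=3.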

- By incorporating the kernel size decision, the total runtime is:

- Again, the indicator function is relaxed to a sigmoid function *σ*(·) when computing gradients. The runtime model is shown to be accurate, with an average prediction error of 1.76%.
- Single-Path NAS follows the MnasNet training schedule. Finally, the hardware-efficient ConvNet found by Single-Path NAS is shown in the figure.

# 4. Experimental Results

## 4.1. ImageNet classification

- **Single-Path NAS achieves better top-1 accuracy than ProxylessNAS by +0.31%**, while maintaining an on-par target latency of ≤ 80ms on the same target mobile phone.
- **Single-Path NAS also outperforms the other methods in this mobile latency range: MnasNet (+0.35%), FBNet-B (+0.86%), and MobileNetV2 (+1.37%).**
- **MnasNet** requires **40k train epochs**.
- **ChamNet** trains 240 candidate samples for five epochs each, a total of **1.2k epochs**.
- **ProxylessNAS** reports a 200× search-cost improvement over MnasNet, hence its overall cost is the TPU-equivalent of **200 epochs**.
- Finally, **FBNet** reports **90 epochs** of training on a proxy dataset (10% of ImageNet).

In comparison, Single-Path NAS has a total cost of eight epochs, which is 5,000× faster than MnasNet, 25× faster than ProxylessNAS, and 11× faster than FBNet.
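A quick back-of-the-envelope check of these speedups, using the epoch counts quoted above:

```python
# Search cost in (TPU-equivalent) epochs, as quoted in the text.
single_path_epochs = 8
mnasnet_epochs = 40_000
proxylessnas_epochs = 200
fbnet_epochs = 90

print(mnasnet_epochs / single_path_epochs)       # 5000.0 → "5,000× faster"
print(proxylessnas_epochs / single_path_epochs)  # 25.0   → "25× faster"
print(fbnet_epochs / single_path_epochs)         # 11.25  → "11× faster"
```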

- By rescaling the networks using a width multiplier, the Single-Path NAS model consistently outperforms prior methods under varying runtime settings.
- For instance, Single-Path NAS with 79.48ms runtime is 1.56× faster than the MobileNetV2 scaled model of similar accuracy.

## Reference

[2019 ECML PKDD] [Single-Path NAS]

Single-Path NAS: Designing Hardware-Efficient ConvNets in less than 4 Hours

## Image Classification

[LeNet] [AlexNet] [Maxout] [NIN] [ZFNet] [VGGNet] [Highway] [SPPNet] [PReLU-Net] [STN] [DeepImage] [SqueezeNet] [GoogLeNet / Inception-v1] [BN-Inception / Inception-v2] [Inception-v3] [Inception-v4] [Xception] [MobileNetV1] [ResNet] [Pre-Activation ResNet] [RiR] [RoR] [Stochastic Depth] [WRN] [ResNet-38] [Shake-Shake] [Cutout] [FractalNet] [Trimps-Soushen] [PolyNet] [ResNeXt] [DenseNet] [PyramidNet] [DRN] [DPN] [Residual Attention Network] [DMRNet / DFN-MR] [IGCNet / IGCV1] [Deep Roots] [MSDNet] [ShuffleNet V1] [SENet] [NASNet] [MobileNetV2] [CondenseNet] [IGCV2] [IGCV3] [FishNet] [SqueezeNext] [ENAS] [PNASNet] [ShuffleNet V2] [BAM] [CBAM] [MorphNet] [AmoebaNet] [ESPNetv2] [MnasNet] [Single-Path NAS]