Simple, Powerful, and Fast — RegNet Architecture from Facebook AI

Understanding the core architecture of RegNet from Facebook AI

Shreejal Trivedi
VisionWizard
9 min read · Jun 18, 2020


Getting to know RegNet, the new 2020 take on ResNet/ResNeXt from Facebook AI.

This article will mainly focus on the architectural design of RegNet as described in the paper Designing Network Design Spaces¹.

After finishing this blog, you will know the core skeleton of the RegNet architecture and its two network families, RegNetX and RegNetY.

TABLE OF CONTENTS

Step 1 — Generic ResNet Architecture

Step 2 — Making of AnyNet Population Models

Step 3 — Making of RegNet Models: RegNetX and RegNetY

Step 4 — PyTorch Implementation of RegNetX/RegNetY Models

Step 1 — Generic ResNet Architecture

Let’s quickly refresh the general structure of ResNet. This will help us generate the different AnyNet models mentioned in the paper.

Fig. 1 Baseline Architecture of ResNet
  • As shown in Fig. 1, the ResNet architecture consists of a Stem Block, Layer Blocks, and a Head Block.

Stem —

  • It is a Convolution + BN + ReLU block with stride = 2 and filter size = 3. The number of output filters is 32 or 64, based on the requirements (Red Block in Fig. 1).
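For reference, below is a minimal PyTorch sketch of such a stem. The class name Stem and the default of 32 output filters are illustrative assumptions, not definitions from the paper.

import torch.nn as nn

# A minimal stem sketch: 3x3 Conv (stride = 2) + BN + ReLU.
# The default of 32 output filters is an illustrative assumption.
class Stem(nn.Sequential):
    def __init__(self, in_channels=3, out_channels=32):
        super().__init__(
            nn.Conv2d(in_channels, out_channels, kernel_size=3,
                      stride=2, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )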

Layer —

Fig. 2 Chains of residual blocks in a Layer Block (yellow block in Fig. 1)
  • It consists of chains of residual blocks (blue color in Fig. 2). Let us denote the number of blocks in a layer as depth = d. The number of channels remains constant throughout a particular Layer Block; it is denoted as width = w in the paper.
  • Each layer takes an input feature map of W1×R×R and outputs a W2×R/2×R/2 feature map, as shown in Fig. 2 (the first block of every layer converts the channels W1 → W2, and all subsequent blocks output the same number of channels, W2).
  • A residual block (blue color in Fig. 2) follows either the Simple or the Bottleneck structure, as shown in the figure below.
Fig. 3 Simple (Left) and Bottleneck (Right) structures present in any residual block (blue color in Fig. 2)

The downsample structure is only used in the first residual block of every layer. All other blocks follow the identical Bottleneck structure shown in Fig. 3 (Top Right).

NOTE: We will only follow the bottleneck structure from now on, as it was taken as the baseline architecture of AnyNet.

  • As shown in Fig. 3 (Right), two parameters are listed: the bottleneck ratio b and the convolutional group size g.
  • The bottleneck ratio is used to reduce the number of channels of the input feature map, and the group size is used to perform parallel group convolutions (as used in ResNeXt). A minimal sketch of such a block is given below.
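Here is a minimal PyTorch sketch of such a bottleneck block, assuming g is the group width (channels per group), so the 3x3 conv uses groups = w_b // g. The class and argument names are mine, not the paper's.

import torch.nn as nn

# A hedged sketch of the bottleneck block in Fig. 3 (Right).
# w_in/w_out are block widths, b the bottleneck ratio, g the group width.
class Bottleneck(nn.Module):
    def __init__(self, w_in, w_out, stride=1, b=1, g=1):
        super().__init__()
        w_b = w_out // b  # channels inside the bottleneck
        self.f = nn.Sequential(
            nn.Conv2d(w_in, w_b, kernel_size=1, bias=False),
            nn.BatchNorm2d(w_b), nn.ReLU(inplace=True),
            nn.Conv2d(w_b, w_b, kernel_size=3, stride=stride, padding=1,
                      groups=w_b // g, bias=False),  # grouped 3x3 conv
            nn.BatchNorm2d(w_b), nn.ReLU(inplace=True),
            nn.Conv2d(w_b, w_out, kernel_size=1, bias=False),
            nn.BatchNorm2d(w_out),
        )
        # Downsample shortcut only when the shape changes (first block of a layer)
        self.proj = None
        if w_in != w_out or stride != 1:
            self.proj = nn.Sequential(
                nn.Conv2d(w_in, w_out, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(w_out),
            )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        shortcut = self.proj(x) if self.proj is not None else x
        return self.relu(self.f(x) + shortcut)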

Head —

  • It contains a simple structure of an AveragePool2D and a fully connected layer; this is the classification part of the model. A minimal sketch is given below.
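A minimal PyTorch sketch of the head might look like this (the class name and arguments are illustrative):

import torch.nn as nn

# A minimal head sketch: global average pooling followed by a linear classifier.
class Head(nn.Sequential):
    def __init__(self, w_in, num_classes=1000):
        super().__init__(
            nn.AdaptiveAvgPool2d(1),  # AveragePool2D down to 1x1
            nn.Flatten(),
            nn.Linear(w_in, num_classes),
        )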

NOTE: Remember these parameters, i.e. depth d, width w, bottleneck ratio b, and group size g. Tweaking these parameters will help us create the different AnyNet models.

Step 2 — Making of AnyNet Population Models

This step builds up the baseline structures of AnyNet.

  • In the paper Designing Network Design Spaces¹, the authors take the generic skeleton of the ResNet architecture to understand and design the different populations of AnyNet models.
  • The four parameters mentioned in Step 1 can be changed to generate different AnyNet architectures. The configuration of the parameters with their input values is given below.

Out of all possible combinations, they sample the best N (= 500 in the paper) models for each AnyNet family by log-uniform sampling over the four parameters.

# Configuration input set for the four parameters
Depth d = [1, 2, 3, ..., 16]           # Total = 16
Width w = [8, 16, 24, ..., 1024]       # Multiples of 8 <= 1024; Total = 128
Bottleneck Ratio b = [1, 2, 4]         # Total = 3
Group Size g = [1, 2, 4, 8, 16, 32]    # Total = 6

The total number of degrees of freedom in the AnyNetX models is 16 (four layers, each of which can be tweaked with the four parameters in the snippet above).
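To make this concrete, here is a hedged NumPy sketch of drawing one unconstrained AnyNetXA configuration. The helper log_uniform and the snap-to-nearest-value step are my assumptions about the sampling, not the paper's exact procedure.

import numpy as np

rng = np.random.default_rng(0)
depths = np.arange(1, 17)                # d: 1..16
widths = np.arange(8, 1025, 8)           # w: multiples of 8 up to 1024
ratios = np.array([1, 2, 4])             # b
groups = np.array([1, 2, 4, 8, 16, 32])  # g

def log_uniform(values):
    # Draw uniformly in log space, then snap to the nearest allowed value
    x = rng.uniform(np.log(values[0]), np.log(values[-1]))
    return int(values[np.abs(np.log(values) - x).argmin()])

# One AnyNetXA sample: 4 layers x 4 parameters = 16 degrees of freedom
config = [dict(d=log_uniform(depths), w=log_uniform(widths),
               b=log_uniform(ratios), g=log_uniform(groups))
          for _ in range(4)]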

  • There are five AnyNet models mentioned in the paper. Let’s go through them one by one.

1. AnyNetXA

  • AnyNetXA is the unconstrained generic ResNet architecture (Step 1) with all possible values of the four parameters.
  • Total number of structures possible in AnyNetXA = (16 * 128 * 3 * 6)⁴, which is ~10¹⁸ structures.

2. AnyNetXB

  • AnyNetXB is constructed from AnyNetXA by enforcing a shared bottleneck ratio: b(Layer 1) = b(Layer 2) = b(Layer 3) = b(Layer 4).
  • Total number of structures possible in AnyNetXB = (16 * 128 * 6)⁴ * 3, which is ~10¹⁶ structures.

3. AnyNetXC

  • AnyNetXC is constructed from AnyNetXB by enforcing a shared group size: g(Layer 1) = g(Layer 2) = g(Layer 3) = g(Layer 4).
  • Total number of structures possible in AnyNetXC = (16 * 128)⁴ * 3 * 6, which is ~10¹⁴ structures.

4. AnyNetXD

  • AnyNetXD is constructed from AnyNetXC by enforcing non-decreasing widths: w(Layer 1) ≤ w(Layer 2) ≤ w(Layer 3) ≤ w(Layer 4).
  • Total number of structures possible in AnyNetXD = (16 * 128)⁴ * 3 * 6 / 4!, which is ~10¹³ structures.

5. AnyNetXE

  • AnyNetXE is constructed from AnyNetXD by enforcing non-decreasing depths: d(Layer 1) ≤ d(Layer 2) ≤ d(Layer 3) ≤ d(Layer 4) (the constraint is not always applied to the last layer).
  • Total number of structures possible in AnyNetXE = (16 * 128)⁴ * 3 * 6 / (4!)², which is ~10¹¹ structures.

Putting these constraints on the four parameters helped the authors reduce the design space by a factor of O(10⁷) compared to the unconstrained AnyNetXA model. A hedged sketch of these constraint checks is given below.
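Continuing the sampling sketch from earlier, the B through E constraints can be written as a simple filter over sampled configurations. This is an illustration of the constraints only; the paper samples until N valid models are found, and the depth constraint may be relaxed for the last layer.

def is_anynet_xe(config):
    # Shared bottleneck ratio (AnyNetXB) and shared group size (AnyNetXC)
    shared_b = len({layer['b'] for layer in config}) == 1
    shared_g = len({layer['g'] for layer in config}) == 1
    # Non-decreasing widths (AnyNetXD) and depths (AnyNetXE)
    widths_ok = all(cur['w'] <= nxt['w'] for cur, nxt in zip(config, config[1:]))
    depths_ok = all(cur['d'] <= nxt['d'] for cur, nxt in zip(config, config[1:]))
    return shared_b and shared_g and widths_ok and depths_ok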

Step 3 — Making of RegNet Models: RegNetX and RegNetY

This step builds the RegNetX and RegNetY architectures.

  • The authors use the AnyNetXE family of models to derive RegNetX and RegNetY, based on their experiments.
  • After rigorous ablation studies of the accuracy gains from plugging different values of the four parameters into AnyNetXE, they found consistent trends and a set of equations that yield the best-fit RegNetX model for the input configuration described below.

— Initial Width w0: The width of the first layer in the ResNet architecture.

— Slope Parameter wa: Parameter that sets the linear trend of the best-fit models.

— Quantization Parameter wm: Parameter used to quantize the linear trend.

— Network Depth D: Sum of the depths of all layers, D = d1 + d2 + d3 + d4.

— Group Size g and Bottleneck Ratio b

# Configuration domain for the RegNetX model input parameters
Network Depth D = {1, 2, ..., 63} OR {12, 13, ..., 28}
Slope Parameter wa = {0, 1, 2, ..., 255}
Quantization Parameter wm = [1.5, 3]
Initial Width w0 > 0
Bottleneck Ratio b = 1
Group Width g = {1, 2, 4, 8, 16, 32} OR {16, 24, 32, 40, 48, 56, 64}

Generation of RegNetX Model

NOTE: We will not go into the details of the equations, as this blog is meant to explain the basic skeleton of the RegNet architecture. For more details, you can always read the paper in more depth.

  • We need the depth d and width w of all four layers to complete the RegNet architecture. So how do we generate these? You just have to follow three small equations to generate the lists.
  • Let us calculate these values with one small example.
# Input parameter list
D = 13    # Network depth
wa = 36   # Slope parameter
w0 = 24   # Initial width
wm = 2.5  # Quantization parameter
b = 1     # Bottleneck ratio
g = 8     # Group width
  1. Finding possible widths u for the given slope wa and initial width parameter w0.
Eq. 1 Parameterized widths: uⱼ = w0 + wa · j, for 0 ≤ j < D (Source: [1])
import numpy as np
u = w0 + wa * np.arange(D) # Equation 1
print(u)
# Output
[ 24  60  96 132 168 204 240 276 312 348 384 420 456]

2. Finding the possible block sizes s based on the calculated widths u, the initial width w0, and the quantization parameter wm.

Eq. 2 Parameterized block sizes: uⱼ = w0 · wm^(sⱼ), i.e. sⱼ = log(uⱼ / w0) / log(wm) (Source: [1])
s = np.log(u / w0) / np.log(wm) # Equation 2
print(np.round(s, 4)) # Rounded to 4 decimals for display
# Output
[0.     1.     1.5129 1.8605 2.1237 2.3356 2.5129 2.6655 2.7993 2.9184
 3.0259 3.1237 3.2134]

3. Finding the quantized widths w by rounding the parameterized block sizes s. We also make sure that the quantized widths w are divisible by 8.

Eq. 3 Quantized widths: wⱼ = w0 · wm^(round(sⱼ)) (Source: [1])
s = np.round(s) # Round the possible block sizes s
w = w0 * np.power(wm, s) # Equation 3
w = np.round(w / 8) * 8 # Make the whole width list divisible by 8
print(w)
# Output
[ 24.  64. 152. 152. 152. 152. 376. 376. 376. 376. 376. 376. 376.]

4. Finding the depth list d. For this, just count the number of occurrences of each quantized width w. To obtain the final width list w, keep only the unique values.

# Finding the final width and depth lists
w, d = np.unique(w.astype(int), return_counts=True)
print("Width list w: ", w)
print("Depth list d: ", d)
#Output
Width list w: [ 24 64 152 376]
Depth list d: [1 1 4 7]

5. Checking whether each generated width in w is a multiple of the group size g. Due to the bottleneck ratio b, the generated widths may be incompatible, so we correct them with the steps below.

gtemp = np.minimum(g, w // b)
w = np.round(w // b / gtemp) * gtemp # Make all widths compatible with the group sizes of the 3x3 convolutional layers
g = np.unique(gtemp * b)[0]
print("Revised width list w: ", w)
print("Revised group size g: ", g)
# Output
Revised width list w: [ 24.  64. 152. 376.]
Revised group size g: 8

6. Voila! You have generated the final lists of widths w and depths d. Now plug the values of w, d, b, and g into the generic ResNet architecture to obtain your RegNetX model.

  • As the widths and depths increase, so does the compute cost of the model. The model we just generated is RegNetX-200MF, where 200MF refers to 200 Million FLOPs. A consolidated sketch of the generation steps is given below.
  • You can see the other RegNetX models at different MF/GF budgets in the paper.
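For convenience, the whole procedure from steps 1 to 5 can be collected into one small function. This is a sketch of the steps above; the function name generate_regnet is mine, not from the paper.

import numpy as np

# A consolidated sketch of steps 1-5 above
def generate_regnet(D=13, wa=36, w0=24, wm=2.5, b=1, g=8):
    u = w0 + wa * np.arange(D)                  # Eq. 1: per-block widths
    s = np.round(np.log(u / w0) / np.log(wm))   # Eq. 2: rounded block sizes
    w = np.round(w0 * np.power(wm, s) / 8) * 8  # Eq. 3: snap to multiples of 8
    w, d = np.unique(w.astype(int), return_counts=True)
    gtemp = np.minimum(g, w // b)               # make widths group-compatible
    w = (np.round(w // b / gtemp) * gtemp).astype(int)
    g = int(np.unique(gtemp * b)[0])
    return w.tolist(), d.tolist(), g

print(generate_regnet())  # ([24, 64, 152, 376], [1, 1, 4, 7], 8)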

Generation of RegNetY Model

  • The difference between the RegNetX and RegNetY models is the addition of the Squeeze-and-Excitation (SE) layer: RegNetY = RegNetX + SE.
  • After every 3x3 convolutional layer in a residual block of the ResNet architecture, an SE attention module is attached, as shown in the figure below.
Fig. 4 SE Module in a residual block.
  • The SE module consists of two 1x1 convolutional layers and a new parameter, the se-ratio q. The se-ratio is used to decrease the number of input filters by a factor of q.
  • The Squeeze-and-Excitation module can be visualized in the figure below.
Fig. 5 SE Module used in RegNetY
  • All the generation steps for RegNetY are the same as for RegNetX. The only new parameter is the se-ratio q, with range 0 ≤ q ≤ 1. A minimal sketch of the SE module follows this list.
  • You can get good information about Attention Modules from my blog.
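As referenced in the list above, here is a minimal PyTorch sketch of the SE module from Fig. 5. The exact channel count being squeezed (here, the block's channels times q) is my assumption; check the paper or the official code for the precise choice.

import torch.nn as nn

# A minimal Squeeze-and-Excitation sketch: squeeze with global pooling,
# excite with two 1x1 convs, then rescale the input channels.
class SE(nn.Module):
    def __init__(self, channels, q=0.25):
        super().__init__()
        squeezed = max(1, int(channels * q))  # se-ratio q shrinks the channels
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, squeezed, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(squeezed, channels, kernel_size=1),
            nn.Sigmoid(),  # per-channel gates in [0, 1]
        )

    def forward(self, x):
        return x * self.se(x)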

Step 4 — PyTorch Implementation of RegNetX/RegNetY Models

Code Link: https://gist.github.com/shreejalt/2c499be21f45ff404f9fe964d24795cb

I have not covered the design space analysis and the ablation studies the authors performed to arrive at the optimized subspace of models. I recommend going through the paper to get a taste of them.

Also, if you want a quick gist of the paper, you can go through the Medium article by Chris Ha.

I hope you found the content meaningful. If you want to stay updated with the latest research in AI, please follow us at VisionWizard.
