Evolutionary framework for searching backbone architecture — Evolly

Alexander Ribolov
7 min read · Apr 14, 2023

While developing a re-identification model to find election violators who vote multiple times at different polling stations (based on outfit, without using faces), our team came up with the idea of creating a flexible and easy-to-use framework for searching the backbone architecture of a deep neural network (NAS) using an evolutionary approach.

Evolly’s design was inspired by the EvoPose2D paper, which introduced neuroevolution with weight transfer. Expanding this basic idea with additional features allowed us to create a simple, easy-to-adopt Python NAS framework.

Evolly leverages NAS powered by an evolutionary algorithm (EA), together with our own ideas, to make running an evolution as flexible as possible:

  • Pass custom architecture blocks
  • Set multiple branches (stems) of different data types
  • Define custom backbone depth and width
  • Utilize common DL frameworks (TensorFlow, PyTorch)
  • Choose parameters to mutate
  • Customize allowed values and intervals of the mutations
  • Run training in distributed or parallel mode
  • Estimate search space size
  • Monitor evolution via TensorBoard
  • Visualize results
Target metric (mAP@5) visualization during evolution
Backbone visualization during evolution

Basic concept

The idea of neural architecture search is to automate the model design process. Most well-known architectures, e.g. ResNet, were designed manually. But if we apply NAS methods to optimize the ResNet backbone, we can get even higher accuracy than manual backbone design achieves (as shown in the EvoPose2D paper).

Existing neural architecture search libraries that support EA (NASLib, aw_nas, KerasGA and others) are built on top of a single deep learning framework, which makes it difficult to customize the model for your needs.

Evolly aims to solve two tasks: backbone tuning and backbone search.

Currently Evolly supports image and pose data types:

  • Image data: a 3-dimensional array in [height, width, channels] format
  • Pose data: a 2- or 3-dimensional array of coordinates, angles, distances, etc. You can probably feed anything into this input (not only pose data), as long as it fits in 2 or 3 dimensions.

The basic concept of the framework can be visualized as follows:

Evolly’s pipeline

First, we create a custom training pipeline that will:

  1. Build a model from the genotype
  2. Train and evaluate the model
  3. Compute the model’s fitness
  4. Save metadata to JSON
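The four steps above can be sketched as a pipeline skeleton. Everything here is a toy stand-in for illustration (the fake metric, the block fields, the metadata keys); only the step order mirrors the article, not the real Evolly API:

```python
import json

def build_model(genotype):
    # Step 1: toy "model" built from a genotype (a list of block dicts).
    return {"blocks": list(genotype),
            "params": sum(b["filters_out"] for b in genotype)}

def train_and_evaluate(model):
    # Step 2: stand-in for a real training loop; returns a fake val metric.
    return 1.0 / (1.0 + model["params"] / 1000.0)

def run_pipeline(genotype):
    model = build_model(genotype)
    val_metric = train_and_evaluate(model)
    fitness = val_metric  # Step 3: a real pipeline calls Evolly's compute_fitness here
    # Step 4: save metadata to JSON for the evolution loop to parse later.
    return json.dumps({"fitness": fitness, "val_metric": val_metric,
                       "params": model["params"]})

meta = json.loads(run_pipeline([{"filters_out": 64}, {"filters_out": 128}]))
```

The JSON metadata is what the evolution loop reads back each generation to rank candidate models.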

All evolution and model variables should be stored in the config file. Then, we need to initialize the Evolly object and set the required arguments.

After completing the preparation stage, we can start the evolution on any available hardware (GPUs / TPUs / CPUs) in sequential, parallel, or distributed mode. Evolution progress and training logs are available in TensorBoard’s web interface.

Visualized evolution progress in TensorBoard

When the evolution is stopped, launch the analyze_runs method to find the best model. Inside the runs directory, we can retrieve that model by ID and get the cfg (which stores the backbone architecture) along with the corresponding weights.

Evolly’s TensorFlow and PyTorch usage examples can be found here. Follow the getting started guide to run your first evolution.

Model building

Inside your custom training pipeline, call the build_model function to create a model object. It also lets you customize the model as you want:

  • Create multiple branches of any width and depth with different inputs
  • Use built-in initial and output (head) layers or create your own
  • Define forward pass stage (torch only)

A genotype (its raw representation is stored in cfg.genotype.branches) consists of one or more branches, where each branch is a list of blocks. Each block contains the following data: block_id, depth_id, block_type, kernel_size, strides, filters_out, dropout.

Evolly places blocks sequentially between the initial and head layers in ascending order of depth_id.
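An illustrative genotype branch, using the block fields listed above (the exact container types inside cfg.genotype.branches may differ; this is just a sketch of the ordering rule):

```python
# One branch: a list of blocks, each described by the fields from the article.
branch = [
    {"block_id": 2, "depth_id": 1, "block_type": "conv", "kernel_size": 3,
     "strides": 2, "filters_out": 64, "dropout": 0.0},
    {"block_id": 1, "depth_id": 0, "block_type": "conv", "kernel_size": 3,
     "strides": 1, "filters_out": 32, "dropout": 0.0},
]

# Blocks are placed between the initial and head layers
# in ascending depth_id order:
ordered = sorted(branch, key=lambda b: b["depth_id"])
```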

Backbone with a single branch consisting of 3 blocks. N is a batch size

There can be multiple blocks at the same depth level; their outputs are concatenated into a single output along the last axis (filters).

Backbone with two blocks on the third depth level
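A shape-only sketch of that concatenation (helper name is mine, not Evolly’s): blocks at the same depth must agree on batch and spatial dimensions, and their filter counts add up:

```python
def concat_filters(shapes):
    # Outputs of same-depth blocks share [N, H, W] and are
    # concatenated along the last (filters) axis.
    n, h, w = shapes[0][:3]
    assert all(s[:3] == (n, h, w) for s in shapes), "spatial dims must match"
    return (n, h, w, sum(s[3] for s in shapes))

out = concat_filters([(8, 32, 16, 64), (8, 32, 16, 96)])  # -> (8, 32, 16, 160)
```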

Inside each default block we’ve added one convolution with “same” padding and a changeable strides argument to control the output dimensions:

  • If strides equals 2, all dimensions except filters are halved
  • If strides equals 1, all dimensions except filters stay the same
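This follows from the standard “same”-padding rule, where the output spatial size is the input size divided by the stride, rounded up (the helper below is illustrative, not part of Evolly):

```python
import math

def conv_out_shape(n, h, w, filters_out, strides):
    # With 'same' padding, spatial output size is ceil(in / stride);
    # the channel count is set by filters_out.
    return (n, math.ceil(h / strides), math.ceil(w / strides), filters_out)

conv_out_shape(8, 64, 48, 128, strides=2)  # -> (8, 32, 24, 128)
conv_out_shape(8, 64, 48, 128, strides=1)  # -> (8, 64, 48, 128)
```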

If you create custom blocks, don’t forget to add the ability to downsample dimensions. Alternatively, you can implement a separate block type that performs this operation.

Evolly has four default image blocks and three default pose blocks.

You can implement a block of any existing architecture (CNNs, RNNs, transformers — you name it) and pass it to Evolly to enhance your model’s accuracy.

Using custom head layers on top of the backbone, you can solve any deep learning task you want:

  • Object detection
  • Instance or semantic segmentation
  • GANs
  • Pose estimation
  • Action recognition
  • Re-Identification
  • Super resolution
  • Depth estimation
  • etc.

Evolution

Evolution starts by training an ancestor model. If you are going to use weight transfer between parent and child models, we recommend training the ancestor model on a larger dataset to achieve better accuracy.

Evolution with pretraining on a larger dataset (run_4) and without it (run_1)

When the ancestor model is trained, an infinite evolutionary cycle starts. Each cycle represents one generation:

  1. Parse the metadata JSON files
  2. Find the best P parents based on the fitness metric
  3. Mutate C unique child models from these parents
  4. Train the mutated child models in the specified mode:
  • Sequential — train models one by one on a single GPU
  • Parallel — train models on N different accelerators in parallel
  • Distributed — utilize multiple accelerators to train each model
  5. Return to step 1
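A toy version of one generation, covering steps 1–4 (the fitness values, the metadata fields, and the single-gene mutation are stand-ins, not Evolly’s real operators):

```python
import random

def run_generation(metadata, num_parents, num_children, rng):
    # Steps 1-2: rank parsed metadata and keep the best P parents by fitness.
    parents = sorted(metadata, key=lambda m: m["fitness"],
                     reverse=True)[:num_parents]
    # Step 3: derive C children from the parents; here the "mutation"
    # just jitters one gene (filters_out) up or down.
    children = []
    for i in range(num_children):
        parent = parents[i % len(parents)]
        child = dict(parent)
        child["filters_out"] = max(8, parent["filters_out"] + rng.choice([-8, 8]))
        children.append(child)
    return parents, children  # Step 4: children would now be trained

rng = random.Random(0)
metadata = [{"fitness": 0.7, "filters_out": 64},
            {"fitness": 0.9, "filters_out": 96},
            {"fitness": 0.8, "filters_out": 128}]
parents, children = run_generation(metadata, num_parents=2,
                                   num_children=4, rng=rng)
```

After the children are trained and their metadata saved, the next cycle re-ranks everything, so good children can displace their own parents.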

A single evolution cycle (generation) with parents = 2, children = 4, 4 GPUs, and parallel training mode enabled.

Once a model’s training has been completed, its fitness is computed with the built-in compute_fitness function. The formula combines the validation metric with a penalty on model size, where “w” is an argument controlling the trade-off between the validation metric and the number of parameters in the model.

The fitness function lets Evolly control the number of model parameters during evolution, since it sorts and picks the “top” parent models by fitness value.

Use case

Our team used Evolly to optimize the backbone architecture of our re-identification model and achieved 4× fewer trainable parameters than the SOTA solution (CTL).

Future improvements

To make Evolly better, we suggest the following improvements:

Add new data types

Additional data types should be added to increase the framework’s applicability:

  • Numerical data (time series, categorical data, graphs, etc.)
  • NLP data (audio, text)

Utilize mutation rate and add mutation probabilities

The mutation rate will let Evolly control the number of mutations per child: if it’s high, multiple mutations can be applied to the child model; if it’s low, just one. The mutation rate shouldn’t be constant during evolution.

It’s also possible to assign a probability to each mutation type in order to control the chance of applying it.
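A sketch of this proposed scheme (not yet in Evolly; the weight values and helper name are hypothetical): the mutation rate sets how many mutations a child receives, and per-type weights bias which mutation is drawn:

```python
import random

# Hypothetical per-type mutation probabilities (must only be relative weights).
MUTATION_WEIGHTS = {"kernel_size": 0.2, "strides": 0.1,
                    "filters_out": 0.4, "block_type": 0.3}

def sample_mutations(mutation_rate, rng):
    # Higher rate -> more mutations per child (always at least one).
    n = max(1, round(mutation_rate))
    types = list(MUTATION_WEIGHTS)
    weights = [MUTATION_WEIGHTS[t] for t in types]
    return [rng.choices(types, weights=weights)[0] for _ in range(n)]

rng = random.Random(0)
mutations = sample_mutations(mutation_rate=2.6, rng=rng)
```

An adaptive schedule could then shrink mutation_rate as the evolution converges, echoing the note above that the rate shouldn’t stay constant.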

Implement reinforcement learning

A reinforcement learning (RL) model would give Evolly the ability to control the mutation rate and mutation probabilities depending on:

  • Previous mutations
  • Accuracy and metadata of trained models
  • Overall evolution progress

We should also test the hypothesis where the RL model applies mutations directly, without going through probabilities.

Upgrade branch connections

At the moment, all branches are connected after the last backbone block. But this approach yields lower accuracy than connecting branches at earlier blocks (as we found in various tests).

Current approach of merging branch outputs. N is the batch size

For example, we tested merging pose and image branches. The pose branch consisted of multiple convolutional blocks whose outputs were transformed to shape [N, 32, 16, 1] and multiplied by the outputs of the sixth image branch block. This connection achieved better accuracy than connecting the pose branch outputs to the last image branch block.

The general idea of this approach is that the [N, 32, 16, 1] tensor should be a mask representing the data contained in the pose inputs. By applying several Conv2DTranspose layers, we reconstruct that mask from the embedding obtained after the FullyConnected layer.

Proposed approach of branches merging

We tested the proposed approach manually but didn’t implement it into the build_model function.
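A shape-level sketch of that merge (the concrete shapes besides [N, 32, 16, 1] and the helper names are illustrative; this is not part of build_model):

```python
def upsample_shape(shape, times):
    # Each stride-2 Conv2DTranspose roughly doubles H and W;
    # the final layer projects down to a single mask channel.
    n, h, w, c = shape
    for _ in range(times):
        h, w = h * 2, w * 2
    return (n, h, w, 1)

def merge_shapes(image_feat, pose_mask):
    # Elementwise multiply: the 1-channel mask broadcasts across
    # all image-branch filters, so the output shape is unchanged.
    assert image_feat[:3] == pose_mask[:3] and pose_mask[3] == 1
    return image_feat

pose_embedding = (8, 4, 2, 64)                  # reshaped FullyConnected output
mask = upsample_shape(pose_embedding, times=3)  # -> (8, 32, 16, 1)
merged = merge_shapes((8, 32, 16, 256), mask)
```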

Searching for new architecture types

Currently, Evolly searches over the whole model backbone. But what if we added a feature to explore new block architectures by manipulating all of the DL framework’s available layers, operations, connections, and function arguments, instead of optimizing predefined blocks?

Obviously, this would drastically increase the search space, but combined with the reinforcement learning improvements above, the idea looks promising.

Block architecture search

Let’s make Evolly better together!

The improvements discussed above will enhance Evolly’s basic functionality. We hope that together we will achieve significant results in deep learning by exploring new block and model architectures.

If you are interested in cooperating with or investing in our team, please contact us. We are open to any opportunities: revisorteam@pm.me
