A sneak peek into Dynamic Neural Networks

Wassa Team · Published in Wassa · 6 min read · Nov 7, 2017

Define-by-run frameworks are relatively new (2015 for Chainer), but they are starting to become well known. I'm writing this post after some questions were asked on GitHub about the new imperative frameworks. With the release of Gluon (the MXNet API for dynamic neural networks) a month ago, we saw people asking why yet another deep learning API, why they don't finish their Keras v2 compatibility instead, and so on.

I realize not everyone knows the difference between define-by-run and define-and-run frameworks, especially new users. With the release of the Gluon API (MXNet) a month ago and the pre-release of the Eager API (TensorFlow), I think it's time for a clarification. I will not explain how define-by-run frameworks are designed or how they work internally; this is a practical guide to the advantages of these frameworks. Disclaimer: this will not be a fair comparison, I will only talk about the pros. At the beginning of my journey in deep learning (years ago), like many new users, I didn't see the advantages of these frameworks and sometimes did not even understand them. Note that in this post I differentiate between MXNet (the symbolic MXNet API) and Gluon (the imperative MXNet API).

So, let's first split the deep learning frameworks into two (non-exhaustive) groups: define-by-run frameworks (Chainer, PyTorch, Gluon, DyNet…) and define-and-run frameworks (TensorFlow, Keras, MXNet, Theano (RIP)…).

From my point of view, there are four areas where define-by-run frameworks are better than define-and-run ones: an easier way to debug a neural network architecture, an easier way to understand how to define a model, better flexibility and, in my opinion, more expressiveness. These points don't all come from the imperative vs. symbolic distinction; some of them are more a matter of API design.

The three dynamic frameworks I tested (PyTorch, Chainer and Gluon) all have these advantages. Some of them are not intrinsic to imperative execution, but they come de facto with these frameworks. For example, callable networks are possible in Keras too; it's one of the features that make network creation easy.

Debugging

The first point put forward by define-by-run frameworks is debugging. It's easier to debug a neural network with imperative logic than with symbolic logic. But to be honest, as a new user, I didn't fully appreciate this aspect.

If you compare debugging on a small network like LeNet, and only on a "hello world"-class problem like MNIST, you will not see the advantage. This aspect, in my opinion, only appears with complex structures or when the issue is not straightforward. For example, if your network doesn't converge, it's easier to inspect the gradients or the output dimensions. But these are not issues when using small or already working networks. As an end user, you never hit them as long as you don't try to adjust the layer structure, change a hidden layer, the number of features, etc.

With dynamic neural networks you can explore your network step by step and monitor what happens in each layer with your regular debugger.

There is no difference between network debugging and debugging the rest of your program.
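As a minimal sketch (this hypothetical block is not from the original article), dropping a print or a pdb breakpoint directly into a Gluon forward pass could look like this:

```python
from mxnet import gluon, nd
from mxnet.gluon import nn

class DebuggableNet(gluon.Block):
    def __init__(self, **kwargs):
        super(DebuggableNet, self).__init__(**kwargs)
        with self.name_scope():
            self.conv = nn.Conv2D(channels=20, kernel_size=5, activation='relu')
            self.dense = nn.Dense(10)

    def forward(self, x):
        x = self.conv(x)
        # The forward pass is plain Python: print shapes, inspect values,
        # or drop into pdb right here with a regular breakpoint.
        print(x.shape)
        # import pdb; pdb.set_trace()
        return self.dense(x)

net = DebuggableNet()
net.collect_params().initialize()
net(nd.ones((1, 1, 28, 28)))  # the shape is printed during the forward pass
```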

Model definition

Nowadays almost all frameworks have a user-friendly way to define a network; the Python APIs help a lot on this point. The difference is that a define-by-run network can infer the channel dimension coming out of the previous layer.

Here is an example with the Gluon API. We define a simple LeNet Network.
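The original snippet is not reproduced here, but a minimal sketch of such a definition with gluon.nn could look like this (the exact layer sizes are illustrative):

```python
from mxnet.gluon import nn

# A LeNet-style network: note that no input channel counts are given,
# Gluon infers them from the data at the first forward pass.
net = nn.Sequential()
with net.name_scope():
    net.add(nn.Conv2D(channels=20, kernel_size=5, activation='relu'))
    net.add(nn.MaxPool2D(pool_size=2, strides=2))
    net.add(nn.Conv2D(channels=50, kernel_size=5, activation='relu'))
    net.add(nn.MaxPool2D(pool_size=2, strides=2))
    net.add(nn.Flatten())
    net.add(nn.Dense(500, activation='relu'))
    net.add(nn.Dense(10))
```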

In this definition, the number of input features of each layer doesn't need to be defined by the user. In Keras, by contrast, you need to specify the input shape of the first layer, and the following shapes are derived from it. Here, the input dimensions are inferred at the first forward computation, which lets you define a network with fewer parameters to specify.

Another thing I find awesome is network combination. Callable networks are one of the features that make network creation easy. This feature already exists in Keras; for define-by-run frameworks it is a de facto feature.

After this definition and some initialization, you can simply call the networks like this:
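As a hedged sketch (assuming the `net` defined above, plus a hypothetical second network `head` to illustrate composition):

```python
import mxnet as mx
from mxnet import nd
from mxnet.gluon import nn

# A second, hypothetical network chained after the LeNet output
head = nn.Dense(2)

# Parameters are created lazily; shapes are resolved on the first call
net.collect_params().initialize(mx.init.Xavier())
head.collect_params().initialize(mx.init.Xavier())

x = nd.ones((1, 1, 28, 28))   # dummy MNIST-like batch
out = head(net(x))            # networks are called like regular Python functions
print(out.shape)              # (1, 2)
```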

It's easy to build Siamese networks, networks running in parallel, and more complex structures.

Flexibility

Define-and-run frameworks use an immutable network: since the computation graph is statically defined, the control flow needs to be defined as part of this graph.

If you want to introduce complex structures like recursion, conditions or even loops, it must be done in an indirect manner, different from the standard control flow of regular imperative languages. In define-by-run frameworks you can use the regular Python conditions, loops and control flow you already know. Symbolic frameworks introduce a compilation step between definition and execution; this compilation can obscure what happens in your network and adds complexity to network construction.
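As an illustrative sketch (this hypothetical block is not from the original article), plain Python control flow can live directly inside a Gluon forward pass:

```python
from mxnet import gluon, nd
from mxnet.gluon import nn

class LoopyBlock(gluon.Block):
    """A block whose forward pass uses ordinary Python control flow."""

    def __init__(self, **kwargs):
        super(LoopyBlock, self).__init__(**kwargs)
        with self.name_scope():
            self.proj = nn.Dense(64, activation='relu')
            self.step = nn.Dense(64, activation='relu')

    def forward(self, x):
        x = self.proj(x)              # map the input to a fixed width first
        # A regular Python loop and condition, no special graph operators needed
        for _ in range(3):
            x = self.step(x)          # the same layer (same weights) is reused
            if x.mean().asscalar() < 1e-3:
                break
        return x

block = LoopyBlock()
block.collect_params().initialize()
print(block(nd.ones((4, 10))).shape)   # (4, 64)
```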

For example, consider defining a shared layer, as in a Siamese network. In a symbolic framework you need to think about where to store the weights, how to initialize them and how to update them; otherwise, your final network will not be what you wanted. Define-by-run frameworks don't have this additional step, so the network behaves more like a regular Python class or function.
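A minimal sketch of what weight sharing could look like in Gluon (a hypothetical example, not the article's original code): reusing the same sub-block on two inputs automatically shares its parameters.

```python
from mxnet import gluon
from mxnet.gluon import nn

class Siamese(gluon.Block):
    def __init__(self, **kwargs):
        super(Siamese, self).__init__(**kwargs)
        with self.name_scope():
            # A single encoder; its weights are shared by construction
            self.encoder = nn.Dense(128, activation='relu')

    def forward(self, x1, x2):
        # Calling the same sub-block twice reuses the same parameters
        return self.encoder(x1), self.encoder(x2)
```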

Expressiveness

This point is more about API design than an intrinsic advantage of imperative frameworks. But since most of them (PyTorch, Gluon) take inspiration from Chainer's design, all three offer the same level of abstraction.

Here is an example with the MXNet high-level API, where the training part is defined and executed by the fit function. The full code can be found here.
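The original snippet isn't reproduced here, but a hedged sketch of a Module-based fit call could look like this (the symbol, iterators and hyperparameters below are illustrative):

```python
import mxnet as mx

# Assuming `lenet` is a symbolic network and train_iter / val_iter are data iterators
model = mx.mod.Module(symbol=lenet, context=mx.cpu())

# A single call compiles the network, initializes the parameters, sets up the
# optimizer and the evaluation metric, and runs the whole training loop.
model.fit(train_iter,
          eval_data=val_iter,
          optimizer='sgd',
          optimizer_params={'learning_rate': 0.1},
          eval_metric='acc',
          batch_end_callback=mx.callback.Speedometer(batch_size=100, frequent=100),
          num_epoch=10)
```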

The fit function is easy to use and does a lot of things: in one line you compile the network, initialize it, define the optimizer, the evaluation metric, etc. But if you want to change something inside the iterations, monitor the gradients or anything else, you have to redefine the callbacks, metrics, etc. accordingly. This can be tricky, and you need to know exactly how these callbacks work. That demands a lot of work, sometimes just for monitoring or debugging.

On the other side, the Gluon API may look more complex at first glance, but it brings more expressiveness. The full code can be found here.
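Again, the original snippet isn't reproduced, but a minimal sketch of an explicit Gluon training loop (assuming `net` and a `train_data` iterator exist) could look like this:

```python
from mxnet import autograd, gluon

softmax_ce = gluon.loss.SoftmaxCrossEntropyLoss()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

for epoch in range(10):
    for data, label in train_data:
        with autograd.record():            # record the forward pass
            output = net(data)
            loss = softmax_ce(output, label)
        loss.backward()                     # compute the gradients
        trainer.step(data.shape[0])         # update the parameters
        # Everything is explicit: you can inspect gradients, change the loss,
        # or add any monitoring you like right here.
```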

Conclusion

In my opinion, the hype around define-by-run frameworks comes from a good balance between API design and flexibility. A good API makes it easy to understand what happens, even if you don't know the framework; I would say Keras is a good example of that.

If you take one of the easiest APIs among symbolic frameworks, it can be described as "easy to learn, hard to master". Define-by-run frameworks, on the other hand, are a bit harder to learn, but they let you build complex networks with peace of mind.

This good API design is combined with better control over what happens in the training loop and in the network. So, yes, imperative frameworks deserve the hype they're getting. Their advantages are not visible at first glance, but the additional flexibility brought by Chainer, PyTorch or Gluon lets you design complex networks easily, with better debugging and more control.

Dynamic frameworks:
- can be used as a drop-in replacement for NumPy
- are fast for prototyping
- are easy to debug and support conditional flows naturally

Define-by-run frameworks are growing fast, and I think now is a good time to try them out! As a cherry on the cake, Chainer, PyTorch and Gluon have similar APIs (all inspired by Chainer), so it's easy to understand or test another one if you already know one of them.
If you want to compare these frameworks, and much more, you can check this repository (not mine), where Ilia Karmanov lists simple examples (a CNN on CIFAR and an RNN on IMDB) across many DL frameworks.
