Gluon: building blocks for your Deep Learning universe

Launched in October 2017, Gluon is a new Open Source high-level API for Deep Learning developers. Right now, it’s available on top of Apache MXNet.

Yet another API? Well, not quite. Here are ten reasons why you should take a good look at Gluon.

Source: Quanta Magazine

1 — Extraordinary documentation

I’m not exaggerating. Calling it documentation doesn’t do it justice: Gluon actually comes with a full-fledged book on Deep Learning!

Concepts, how to implement them from scratch, how to implement them with Gluon, pretty much all network architectures from perceptrons to Generative Adversarial Networks… and a ton of notebooks.

VERY impressive work by my colleague Zach Lipton. If you’d like to help him out, I’m sure he’d be happy to review your pull requests ;)

2 — Plenty of pre-defined layers and loss functions

Gluon includes an extensive collection of pre-defined layers: from basic ones (Dense, Activation, Dropout, Embedding, etc.) to Convolution (2D, 3D, transposed) to Pooling (average, max and global max in 1D, 2D and 3D).

You’ll also find layers for recurrent networks (RNN, LSTM, GRU), as well as individual cells. The latter give you full control over your networks should you need to build them cell by cell.

In addition, you’ll find a collection of experimental features contributed by the Gluon community, such as convolutional recurrent cells.

Last but not least, Gluon also includes a nice collection of loss functions, from basic ones to more advanced ones like the Triplet Loss function used to build face recognition models.
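
To make the Triplet Loss a bit more concrete, here’s a minimal plain-Python sketch of the usual formulation: the loss drops to zero once the anchor is closer to the positive example than to the negative one by at least a margin. The names `squared_distance` and `triplet_loss`, the squared Euclidean distance and the default margin of 1.0 are my own choices for illustration. This is not Gluon’s implementation.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: max(d(a, p) - d(a, n) + margin, 0)."""
    return max(squared_distance(anchor, positive)
               - squared_distance(anchor, negative) + margin, 0.0)


anchor = [0.0, 0.0]
same_person = [0.1, 0.0]     # embedding that should stay close to the anchor
other_person = [2.0, 0.0]    # embedding that should be pushed away

easy = triplet_loss(anchor, same_person, other_person)   # already well separated
hard = triplet_loss(anchor, other_person, same_person)   # roles swapped on purpose
print(easy, hard)
```

The first triplet is already separated by more than the margin, so it contributes nothing to training. The second one is penalized heavily.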

3 — Simple definition of models

For reference, this is how we’d define a simple network with the symbolic API in Apache MXNet.

import mxnet as mx
from mxnet import sym, mod
data = sym.Variable('data')
fc1 = sym.FullyConnected(data, name='fc1', num_hidden=128)
relu1 = sym.Activation(fc1, name='relu1', act_type="relu")
fc2 = sym.FullyConnected(relu1, name='fc2', num_hidden=64)
relu2 = sym.Activation(fc2, name='relu2', act_type="relu")
out = sym.FullyConnected(relu2, name='out', num_hidden=10)
module = mod.Module(out)

Here’s the same network defined with Gluon. All we have to do is to add layers sequentially.

import mxnet as mx
from mxnet.gluon import nn
net = nn.Sequential()
with net.name_scope():
    net.add(nn.Dense(128, activation="relu"))
    net.add(nn.Dense(64, activation="relu"))
    net.add(nn.Dense(10))

A bit clearer, isn’t it? :)

4 — Automatic shape of input layer

As you can see above, we don’t have to define the input shape when building a network. With Gluon, all we have to do is initialize parameters and forward data to the network: the shape of each layer’s weights is inferred from the first batch it receives.

For instance, this is how we’d apply the network above to a 256-float vector.

net.collect_params().initialize()
# Define a random 256-float vector and forward it to the network
data = mx.nd.random_uniform(low=0, high=1, shape=(1,256))
print(net(data))
[[ 1.3353475e-03 -1.1403845e-02 8.6122309e-05 1.3773030e-02
9.9888537e-03 6.7939619e-03 -1.8021716e-02 -6.2033422e-03
-1.3288442e-02 1.0132480e-02]]

This is an advantage over Keras where we’d have to build the input shape into the model definition.

from keras.models import Sequential
from keras.layers import Dense
model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(256,)))
model.add(Dense(64, activation='relu'))
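
If you’re wondering how a layer can work without knowing its input size, here’s a toy sketch of the idea in plain Python: the weights are simply not allocated until the first batch shows up. `LazyDense` is a hypothetical class written for this post, and the initialization range is arbitrary. Gluon’s actual mechanism is more involved.

```python
import random


class LazyDense:
    """Toy dense layer that learns its input size from the first batch it sees."""

    def __init__(self, units):
        self.units = units
        self.weight = None          # allocated lazily, on the first forward pass

    def forward(self, x):
        if self.weight is None:
            in_units = len(x)       # infer the input size from the data itself
            self.weight = [[random.uniform(-0.07, 0.07) for _ in range(in_units)]
                           for _ in range(self.units)]
        # One output per row of the weight matrix (bias omitted for brevity)
        return [sum(w * v for w, v in zip(row, x)) for row in self.weight]


layer = LazyDense(128)
print(layer.weight)                       # None: no shape is known yet
out = layer.forward([random.random() for _ in range(256)])
print(len(layer.weight), len(layer.weight[0]), len(out))   # 128 256 128
```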

5 — Intuitive access to network layers and parameters

Gluon makes it intuitive to explore network layers, as well as their parameters.

Here’s how we can iterate through layers.

for layer in net:
    print(layer)
    print(layer.params)
Dense(64 -> 128, Activation(relu))
sequential2_dense0_ (
Parameter sequential2_dense0_weight (shape=(128L, 64L), dtype=<type 'numpy.float32'>)
Parameter sequential2_dense0_bias (shape=(128L,), dtype=<type 'numpy.float32'>)
Dense(128 -> 64, Activation(relu))
sequential2_dense1_ (
Parameter sequential2_dense1_weight (shape=(64L, 128L), dtype=<type 'numpy.float32'>)
Parameter sequential2_dense1_bias (shape=(64L,), dtype=<type 'numpy.float32'>)
Dense(64 -> 10, linear)
sequential2_dense2_ (
Parameter sequential2_dense2_weight (shape=(10L, 64L), dtype=<type 'numpy.float32'>)
Parameter sequential2_dense2_bias (shape=(10L,), dtype=<type 'numpy.float32'>)

Reading and writing parameters is equally straightforward.

params = net[0].weight.data()
print("%s %s" % (type(params), params.shape))
<class 'mxnet.ndarray.ndarray.NDArray'> (128L, 64L)
params[0][0] = 0.123
print(params)
[[ 0.123 -0.0177393 -0.00650402 ... -0.04026533 -0.04062188
[ 0.05647313 0.0380233 0.01031513 ... 0.0654735 0.04788432
[ 0.02013787 0.01294949 0.02260739 ... -0.0699827 0.01811036
[-0.04240721 0.01670218 0.0533151 ... 0.000951 0.05940091
[-0.00068477 0.00757013 -0.04234412 ... -0.04753195 0.01538438
[-0.01510854 -0.03736208 0.01939485 ... -0.04374463 -0.03795088

6 — Flexible data loading and transformation

The Data API provides convenient methods to load datasets stored in NDArrays (which is how MXNet stores tensors), numpy arrays, RecordIO files and image folders.

train_data =, y),

We can also download popular datasets like MNIST, Fashion MNIST, CIFAR-10 and CIFAR-100.

train_data =

Transformations can be applied at loading time by providing a transform function. For example, here’s how we would normalize pixel values for the MNIST dataset.

import numpy as np
def transform(data, label):
    return data.astype(np.float32)/255, label.astype(np.float32)
train_data =
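
The load-then-transform pattern itself is easy to sketch without any framework: a dataset object applies the transform when a sample is accessed, and a loader groups samples into (optionally shuffled) batches. `TransformedDataset`, `iterate_batches` and `normalize` are hypothetical names used for illustration only. They are not part of the Gluon Data API.

```python
import random


class TransformedDataset:
    """Wraps paired samples and labels; applies `transform` on access."""

    def __init__(self, samples, labels, transform=None):
        assert len(samples) == len(labels)
        self.samples, self.labels, self.transform = samples, labels, transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        item = (self.samples[idx], self.labels[idx])
        return self.transform(*item) if self.transform else item


def iterate_batches(dataset, batch_size, shuffle=False, seed=None):
    """Yield (data, labels) batches; the last batch may be smaller."""
    order = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        pairs = [dataset[i] for i in order[start:start + batch_size]]
        data, labels = zip(*pairs)
        yield list(data), list(labels)


def normalize(pixels, label):
    """Scale 0-255 pixel values to the [0, 1] range, as in the MNIST example."""
    return [p / 255 for p in pixels], float(label)


images = [[0, 255], [51, 102], [255, 255], [0, 0], [102, 51]]   # five fake 2-pixel images
labels = [0, 1, 2, 3, 4]
dataset = TransformedDataset(images, labels, transform=normalize)
batches = list(iterate_batches(dataset, batch_size=2))
print(len(batches), batches[0])
```

Applying the transform at access time means the raw data stays untouched, so the same dataset can be reused with a different transform.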

7 — Rich model zoo

The Gluon model zoo is more complete than its counterparts in Apache MXNet, Keras and PyTorch.

At the time of writing, you can grab pre-trained versions of AlexNet, DenseNet, Inception V3, ResNet V1, ResNet V2, SqueezeNet, VGG and MobileNet, in multiple depths and configurations.

All of these will come in handy for transfer learning and fine-tuning. Downloading a model couldn’t be simpler.

from mxnet.gluon.model_zoo import vision
net = vision.squeezenet1_1(pretrained=True)

8 — Imperative-style execution

In traditional Deep Learning frameworks like TensorFlow and Apache MXNet, network definition and training run in symbolic mode (aka define-then-run).

Here’s a typical example using Apache MXNet.

# Define network with symbolic API
module = mod.Module(out)
# Train network
module.bind(data_shapes=iter.provide_data, label_shapes=iter.provide_label)
module.fit(iter, num_epoch=50)

There are good reasons for doing this! Since a symbolic network is pre-defined, its execution graph can be optimized for speed and memory prior to training, then run with highly-efficient C++ primitives: all of this makes it more efficient than its imperative counterpart written in Python. However, it comes at the expense of flexibility (networks cannot be modified) and visibility (networks are hard, if not impossible, to inspect).

In contrast, Gluon relies on imperative (aka define-by-run) programming by default: network definition and training loop are based on Python code, allowing us to use all language features (loops, conditional execution, classes, etc.) for maximal flexibility.
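
The difference between the two styles is easier to see on a toy example. The sketch below computes y = relu(w * x + b) twice in plain Python: first by building a tiny expression graph that is executed later, then as an ordinary function. The `Var` and `Op` classes are hypothetical and only meant to illustrate the idea. Real frameworks do far more (graph optimization, memory planning, and so on).

```python
class Var:
    """Placeholder for a value supplied when the graph is run."""
    def __init__(self, name):
        self.name = name

    def run(self, feed):
        return feed[self.name]


class Op:
    """Graph node that applies `fn` to the results of its input nodes."""
    def __init__(self, fn, *inputs):
        self.fn, self.inputs = fn, inputs

    def run(self, feed):
        return self.fn(*(node.run(feed) for node in self.inputs))


def relu(v):
    return max(v, 0.0)


# Define-then-run: describe y = relu(w * x + b) once, then execute it later.
x, w, b = Var("x"), Var("w"), Var("b")
graph = Op(relu, Op(lambda p, q: p + q, Op(lambda p, q: p * q, w, x), b))
symbolic_result = graph.run({"x": 3.0, "w": 2.0, "b": -1.0})


# Define-by-run: ordinary Python, so loops, branches and breakpoints just work.
def forward(x, w, b, clip_at=None):
    y = relu(w * x + b)
    if clip_at is not None and y > clip_at:   # data-dependent branch
        y = clip_at
    return y


imperative_result = forward(3.0, 2.0, -1.0)
print(symbolic_result, imperative_result)   # 5.0 5.0
```

Both versions give the same answer, but only the second one lets you drop a breakpoint or an `if` statement in the middle of the computation.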

To illustrate this, here’s a typical training loop.

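for e in range(epochs):
    cumulative_loss = 0
    for i, (data, label) in enumerate(train_data):
        data = data.as_in_context(model_ctx)
        label = label.as_in_context(model_ctx)
        with autograd.record():
            output = net(data)
            loss = softmax_cross_entropy(output, label)
        # Backpropagate, then update the weights
        # (assumes trainer = gluon.Trainer(...) was created beforehand)
        loss.backward()
        trainer.step(data.shape[0])
        cumulative_loss += nd.sum(loss).asscalar()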

Thanks to imperative programming, it’s possible to debug every step of the training process: inspecting parameters, saving them to disk, tweaking them if certain conditions happen, etc. Even inside of Jupyter notebooks, we can use the Python debugger by inserting a single line of code. This is invaluable when trying to understand why training goes wrong.

import pdb; pdb.set_trace()
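
To see what each step of an imperative training loop does, here’s a framework-free sketch that fits y = 3x + 1 with plain SGD. The gradients are derived by hand, which is the part autograd takes care of in Gluon. The toy data, learning rate and number of epochs are arbitrary choices for illustration.

```python
import random

random.seed(0)
# Toy data for y = 3x + 1, with x in [-1, 1]
xs = [random.uniform(-1, 1) for _ in range(64)]
train_data = [(x, 3.0 * x + 1.0) for x in xs]

w, b = 0.0, 0.0            # parameters
learning_rate = 0.1
epoch_losses = []

for epoch in range(50):
    cumulative_loss = 0.0
    for x, y in train_data:
        # Forward pass (what autograd would record)
        error = (w * x + b) - y
        loss = 0.5 * error ** 2
        # Backward pass: gradients of the loss, derived by hand
        grad_w, grad_b = error * x, error
        # Parameter update (plain SGD step)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        cumulative_loss += loss
    epoch_losses.append(cumulative_loss / len(train_data))

print("w=%.3f b=%.3f first_loss=%.4f last_loss=%.6f"
      % (w, b, epoch_losses[0], epoch_losses[-1]))
```

Because everything is regular Python, any line of the inner loop is a valid place for a print statement, a condition or a breakpoint.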

9 — Combining custom objects and built-in objects

Gluon makes it very easy to define your own objects. Here’s a class for a multi-layer perceptron. Once again, the imperative programming style allows us to define the forward() operation exactly the way we want it: we could apply conditional processing based on network parameters, number of epochs, etc.

from mxnet import nd
from mxnet.gluon import Block, nn

class MLP(Block):
    def __init__(self, **kwargs):
        super(MLP, self).__init__(**kwargs)
        with self.name_scope():
            self.dense0 = nn.Dense(128)
            self.dense1 = nn.Dense(64)
            self.dense2 = nn.Dense(10)

    def forward(self, x):
        x = nd.relu(self.dense0(x))
        x = nd.relu(self.dense1(x))
        return self.dense2(x)

We can also define custom layers, as highlighted by this example taken from the Gluon documentation. As you can see, they can be seamlessly integrated with the rest of the Gluon API, so we still rely on existing objects to make our life easier.

class CenteredLayer(Block):
    def __init__(self, **kwargs):
        super(CenteredLayer, self).__init__(**kwargs)

    def forward(self, x):
        return x - nd.mean(x)

net = nn.Sequential()
with net.name_scope():
    net.add(nn.Dense(128))
    net.add(nn.Dense(10))
    net.add(CenteredLayer())
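
Why do custom and built-in layers mix so easily? A container only needs each layer to expose a forward operation. Here’s a plain-Python sketch of that contract. `Sequential`, `Scale` and `Centered` are simplified stand-ins written for this post, not the Gluon classes.

```python
class Sequential:
    """Minimal container: feeds the output of each layer into the next one."""
    def __init__(self):
        self.layers = []

    def add(self, layer):
        self.layers.append(layer)

    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x


class Scale:
    """Stand-in for a built-in layer: multiplies every value by a constant."""
    def __init__(self, factor):
        self.factor = factor

    def forward(self, x):
        return [self.factor * v for v in x]


class Centered:
    """Custom layer: subtracts the mean, like CenteredLayer above."""
    def forward(self, x):
        mean = sum(x) / len(x)
        return [v - mean for v in x]


net = Sequential()
net.add(Scale(2.0))
net.add(Centered())
print(net.forward([1.0, 2.0, 3.0]))   # [-2.0, 0.0, 2.0]
```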

10 — Flexibility and speed: pick two

We discussed earlier the benefits of imperative programming while noting that performance would be inferior to symbolic programming.

Let’s run a quick test by predicting 1,000 MNIST images with this simple multi-layer perceptron (for the sake of brevity, I’ll just show the network definition).

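def build_network():  # function name is illustrative
    net = nn.Sequential()
    with net.name_scope():
        net.add(nn.Dense(256, activation="relu"))
        net.add(nn.Dense(128, activation="relu"))
        net.add(nn.Dense(10))
    # initialize the parameters
    net.collect_params().initialize()
    return net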

Total prediction time is 0.37 seconds.
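
If you’d like to run this kind of measurement yourself, here’s a small timing harness in plain Python. `time_predictions` and `dummy_predict` are hypothetical helpers, and the dummy function merely stands in for a real forward pass. This is a sketch of the approach, not the code used to produce the numbers in this post.

```python
import time


def time_predictions(predict, samples, repeats=3):
    """Return the best wall-clock time (in seconds) to run `predict` on all samples."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for sample in samples:
            predict(sample)
        best = min(best, time.perf_counter() - start)
    return best


def dummy_predict(sample):
    """Stand-in for a forward pass: a weighted sum followed by ReLU."""
    return max(sum(0.5 * v for v in sample), 0.0)


samples = [[float(i % 7)] * 784 for i in range(1000)]   # 1,000 fake 784-pixel images
elapsed = time_predictions(dummy_predict, samples)
print("Total prediction time: %.4f s" % elapsed)
```

One caveat when timing a real network: MXNet executes operations asynchronously, so the prediction function should force the result (for instance by converting it to a NumPy array) before the clock is stopped.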

Now, let’s replace the Sequential object with its hybrid equivalent. This will allow Gluon to compile the network to symbolic form and to use optimized lower-level primitives.

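def build_network():  # function name is illustrative
    net = nn.HybridSequential()
    with net.name_scope():
        net.add(nn.Dense(256, activation="relu"))
        net.add(nn.Dense(128, activation="relu"))
        net.add(nn.Dense(10))
    # initialize the parameters
    net.collect_params().initialize()
    # compile the network to symbolic form
    net.hybridize()
    return net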

This time, total prediction time is 0.21 seconds, almost 2x faster. Is there a catch? Well, yes: you lose the flexibility to write a custom forward() function as well as the ability to debug it. Still, once you’ve successfully built and trained a network, hybridizing it is an easy way to improve inference performance.

For reference, let’s run the same test with the symbolic API of Apache MXNet.

data = mx.sym.Variable('data')
data = mx.sym.Flatten(data=data)
fc1 = mx.sym.FullyConnected(data=data, name='fc1', num_hidden=64)
act1 = mx.sym.Activation(data=fc1, name='relu1', act_type="relu")
fc2 = mx.sym.FullyConnected(data=act1, name='fc2', num_hidden = 64)
act2 = mx.sym.Activation(data=fc2, name='relu2', act_type="relu")
fc3 = mx.sym.FullyConnected(data=act2, name='fc3', num_hidden=10)
mlp = mx.sym.SoftmaxOutput(data=fc3, name='softmax')

Prediction time is 0.16 seconds, more than 30% faster than the hybridized version. When top speed is required (for inference and even more so for training), the highly-optimized primitives of MXNet remain the best option.


Gluon has a lot going for it. I think it improves on symbolic MXNet and even on Keras in several respects. The documentation and the model zoo alone are worth the price of admission, especially if you’re beginning with Deep Learning. Go try it out and tell me what *you* think :)

As always, thanks for reading. Happy to answer questions here or on Twitter.

Subatomic particles, gamma rays, black holes, lightspeed. Proper Metal material \m/
