Neural Networks Building Blocks

Neural networks are made of smaller modules or building blocks, much like matter is made of atoms and digital circuits are made of logic gates.

Once you know what the blocks are, you can combine them to solve a variety of problems. New building blocks are added constantly, as smart minds discover them.

What are these blocks? How can we use them? Follow me…

Weights

Weights implement the product of an input i with a weight value w, to produce an output o. It seems easy, but additions and multiplications are at the heart of neural networks.

weights implement the product of an input i with a weight value w to give an output o

Neurons

Artificial neurons are one basic building block.

neuron, inputs i, output o, weight w

They sum up some inputs i and use a non-linearity (added as a separate block) to produce an output o, a non-linear version of the sum.

Neurons use weights w to take inputs and scale them, before summing the values.
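To make this concrete, here is a minimal sketch of a single neuron in plain Python. The function name neuron and the choice of tanh as the default non-linearity are my own assumptions for illustration, not a fixed definition.

```python
import math

def neuron(inputs, weights, nonlinearity=math.tanh):
    """Scale each input by its weight, sum the results, apply a non-linearity."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    total = sum(w * i for w, i in zip(weights, inputs))  # weighted sum
    return nonlinearity(total)

o = neuron([1.0, 2.0, 3.0], [0.5, -0.25, 0.0])
# weighted sum = 0.5*1.0 - 0.25*2.0 + 0.0*3.0 = 0.0, and tanh(0.0) = 0.0
print(o)
```

Passing a different function as nonlinearity swaps the non-linear block without touching the weighted sum, which is why the two are treated as separate blocks.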

Identity layers

Identity layers just pass the input to the output. They seem pretty bare, but they are an essential block of neural networks.

identity: i=o

See the residual modules below for why. You will not be disappointed!

Non-linearities

http://pytorch.org/docs/master/nn.html#non-linear-activations

They take the inner value of a neuron and transform it with a non-linear function to produce the neuron output.

non-linear module

The most popular are: ReLU, tanh, sigmoid.
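Here is a small sketch of these three functions using only Python's math module. The function names are mine, and the sigmoid is written in a numerically careful way so that large negative inputs do not overflow.

```python
import math

def relu(x):
    """Pass positive values through, clamp negative values to zero."""
    return max(0.0, x)

def tanh(x):
    """Squash any value into the range (-1, 1)."""
    return math.tanh(x)

def sigmoid(x):
    """Squash any value into the range (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # avoids overflow of exp(-x) for large negative x
    return e / (1.0 + e)

for x in (-2.0, 0.0, 2.0):
    print(x, relu(x), round(tanh(x), 3), round(sigmoid(x), 3))
```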

Linear layers

http://pytorch.org/docs/master/nn.html#linear-layers

These layers are an array of neurons. Each layer takes multiple inputs and produces multiple outputs.

linear layer: input i and output o are vectors of any length, and the two lengths can differ. A weight matrix (left) and a fully connected diagram (right) show two views of the same linear layer.

The number of inputs and the number of outputs are arbitrary and independent of each other.

These building blocks are basically an array of linear combinations of inputs scaled by weights. Weights multiply inputs with an array of weight values, and they are usually learned with a learning algorithm.

Linear layers do not come with non-linearities, so you will have to add one after each layer. If you stack multiple layers, you will need a non-linearity between them, or they will all collapse to a single linear layer.
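The collapse is easy to check by hand. Below is a minimal sketch in plain Python, with weight matrices stored as lists of rows; the helper names linear, matmul and relu and the example weights are assumptions I made up for the demonstration.

```python
def linear(weights, inputs):
    """Linear layer: one row of weights per output value."""
    return [sum(w * i for w, i in zip(row, inputs)) for row in weights]

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]

def relu(vector):
    return [max(0.0, v) for v in vector]

W1 = [[1.0, -1.0, 0.0],
      [0.0, 2.0, 1.0]]   # 3 inputs -> 2 outputs
W2 = [[1.0, 1.0]]        # 2 inputs -> 1 output
x = [1.0, 2.0, 3.0]

stacked = linear(W2, linear(W1, x))          # two layers, nothing in between
collapsed = linear(matmul(W2, W1), x)        # one layer with the merged matrix
with_relu = linear(W2, relu(linear(W1, x)))  # non-linearity in between

# stacked and collapsed are both [6.0]; with_relu is [7.0]
print(stacked, collapsed, with_relu)
```

Without the ReLU, the two layers compute exactly what the single matrix W2·W1 computes, so the extra layer adds nothing. With the ReLU in between, the result can no longer be written as one matrix product.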

Convolutional layers

http://pytorch.org/docs/master/nn.html#convolution-layers

They are exactly like a linear layer, but each output neuron is connected to a locally constrained group of input neurons.

convolution is a 3D matrix operation: an input cuboid of data is multiplied by multiple kernels to produce an output cuboid of data. These are just a lot of multiplications and additions

This group is often called the receptive field, borrowing the name from neuroscience.

Convolutions can be performed in 1D, 2D, 3D… etc.

These basic blocks take advantage of the local features of data, which are correlated in some kinds of inputs. If the inputs are correlated, then it makes more sense to look at a group of inputs rather than a single value, as a linear layer does. A linear layer can be thought of as a convolution with 1 value per filter.
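As a rough illustration of the receptive field, here is a 1D convolution sketched in plain Python. It assumes no padding and a stride of 1, and like most deep learning libraries it does not flip the kernel. The name conv1d and the example kernel are mine.

```python
def conv1d(inputs, kernel):
    """Slide one kernel over the input; each output sees len(kernel) neighbours."""
    k = len(kernel)
    return [sum(kernel[j] * inputs[start + j] for j in range(k))
            for start in range(len(inputs) - k + 1)]

signal = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0]
edge_kernel = [-1.0, 1.0]  # responds to a change between two neighbours
edges = conv1d(signal, edge_kernel)
# edges is [0.0, 1.0, 0.0, 0.0, -1.0]: non-zero only where the signal changes
print(edges)

scaled = conv1d(signal, [2.0])  # a 1-value kernel just scales each input
print(scaled)
```

The same two weights are reused at every position, so the layer needs far fewer weights than a fully connected layer over the same input.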

Recurrent neural network

http://pytorch.org/docs/master/nn.html#rnn

Recurrent neural networks are a special kind of linear layer, where the output of each neuron is fed back as an additional input of the neuron, together with the actual input.

RNN: outputs or an additional state are fed back (values at time t-1) as an input together with input i (at time t)

You can think of an RNN as a combination of the input at time t and the state/output of the same neuron at time t-1.

RNNs can remember sequences of values, since they can recall previous outputs, which again are linear combinations of their inputs.
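A minimal sketch of this feedback loop, for a single recurrent neuron with a scalar input and a scalar state. The weights w_in and w_rec, the tanh non-linearity and the function names are assumptions chosen for illustration.

```python
import math

def rnn_step(x, h_prev, w_in, w_rec):
    """Combine the input at time t with the neuron's own output from time t-1."""
    return math.tanh(w_in * x + w_rec * h_prev)

def run_rnn(sequence, w_in=1.0, w_rec=0.5, h0=0.0):
    """Feed a sequence one value at a time, carrying the state along."""
    h = h0
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_in, w_rec)
        states.append(h)
    return states

# A single pulse followed by zeros: the state stays non-zero after the
# input is gone, which is the "memory" of the RNN fading over time.
states = run_rnn([1.0, 0.0, 0.0])
print(states)
```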

Attention modules

Attention modules are simply a gating function for memories. If you want to specify which values in an array should be passed through attention, you use a linear layer to gate each input by some weighting function.

Attention module: an input vector i is linearly combined by multiplying its values by a weight matrix, producing one or multiple outputs

Attention modules are soft when the weights are real-valued, so the inputs are scaled by those values. Attention is hard when the weights are binary, so each input is either zeroed or passed through. Outputs are also called attention head outputs. More info: nice blog post and another.
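The difference between soft and hard gating can be sketched in a few lines of plain Python. Two things here are my assumptions and not the only way to do it: the soft weights come from a softmax over some scores, and the hard version keeps only the input with the highest score.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_attention(values, scores):
    """Real-valued weights: every input is scaled."""
    return [w * v for w, v in zip(softmax(scores), values)]

def hard_attention(values, scores):
    """Binary weights: one input passes through, the rest become 0."""
    best = scores.index(max(scores))
    return [v if idx == best else 0.0 for idx, v in enumerate(values)]

values = [10.0, 20.0, 30.0]
scores = [0.1, 2.0, -1.0]
print(soft_attention(values, scores))
print(hard_attention(values, scores))  # [0.0, 20.0, 0.0]
```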

Memories

A basic building block that saves multiple values in a table.

memory, input i, output o, optional addressing a

This is basically just a multi-dimensional array. The table can be of any size and any dimensions. It can also be composed of multiple banks or heads. Generally, a memory has a write function that writes to all locations and a read function that reads from all locations. An attention-like module can focus reading and writing on specific locations. See attention modules above. More info: nice blog post.
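Here is one way such a memory could look, as a hedged sketch in plain Python. The class name Memory and the blending rule used for writes are my own assumptions: the address holds one weight per slot, and a weight of 1.0 overwrites the slot while 0.0 leaves it untouched.

```python
class Memory:
    """A table of values with weighted (attention-like) reads and writes."""

    def __init__(self, size):
        self.slots = [0.0] * size

    def write(self, value, address):
        # Every slot is touched, but the address decides how strongly.
        self.slots = [(1.0 - a) * s + a * value
                      for s, a in zip(self.slots, address)]

    def read(self, address):
        # Weighted sum over all slots; a one-hot address reads a single slot.
        return sum(a * s for s, a in zip(self.slots, address))

mem = Memory(3)
mem.write(5.0, [0.0, 1.0, 0.0])  # focus the write on slot 1
mem.write(2.0, [0.5, 0.0, 0.0])  # soft write: blend half of 2.0 into slot 0
focused = mem.read([0.0, 1.0, 0.0])  # 5.0
blended = mem.read([0.5, 0.5, 0.0])  # 0.5*1.0 + 0.5*5.0 = 3.0
print(mem.slots, focused, blended)
```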

Residual modules

Residual modules are a simple combination of layers and pass-through layers or Identity layers (ah-ah! told you they were important!).

residual module can combine input with a cascade of other layers

They became famous in ResNet: https://arxiv.org/abs/1512.03385, and they have revolutionized neural networks since.
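The idea fits in a few lines. This is a minimal sketch in plain Python, where residual wraps any block so that the identity path is added to the block's output. The names residual, zero_block and half_block are hypothetical, and the wrapped block is assumed to return a vector of the same length as its input.

```python
def residual(block):
    """Wrap a block so its output is added to its own input (the identity path)."""
    def wrapped(x):
        return [xi + fi for xi, fi in zip(x, block(x))]
    return wrapped

def zero_block(x):
    """A block that contributes nothing."""
    return [0.0 for _ in x]

def half_block(x):
    """A stand-in for a cascade of layers."""
    return [0.5 * xi for xi in x]

x = [1.0, -2.0]
print(residual(zero_block)(x))  # [1.0, -2.0]: the input passes straight through
print(residual(half_block)(x))  # [1.5, -3.0]: input plus the block's output
```

Even when the inner block outputs zeros, the module still passes its input along unchanged, which is exactly the job of the identity layer.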

LSTM, GRU, etc.

http://pytorch.org/docs/master/nn.html#recurrent-layers

These units use an RNN, residual modules and multiple Attention modules to gate the inputs, outputs and state values (and more) of an RNN, to produce an augmented RNN that can remember longer sequences. More info and figures: great post and nice blog post.
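To give a feel for the gating, here is a scalar GRU-style step sketched in plain Python. It is a simplified illustration, not a reference implementation: the parameter names in the dictionary p are hypothetical, and libraries differ on whether the update gate z multiplies the old state or the new candidate.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h_prev, p):
    """One step of a scalar GRU-like cell; p holds the (hypothetical) weights."""
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])  # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])  # reset gate
    candidate = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    # (1 - z) * h_prev is the pass-through path, z * candidate is the new content.
    return (1.0 - z) * h_prev + z * candidate

base = dict(wz=0.0, uz=0.0, wr=0.0, ur=0.0, br=0.0, wh=1.0, uh=1.0, bh=0.0)
closed = gru_step(1.0, 0.8, dict(base, bz=-50.0))  # gate shut: state is kept
opened = gru_step(1.0, 0.8, dict(base, bz=50.0))   # gate open: state is replaced
print(closed, opened)
```

With the update gate shut, the old state survives the step almost untouched, which is how these cells hold on to information over long sequences.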

Neural networks

Now the hard part: how do you combine these modules to make neural networks that can actually solve interesting problems in the world?

So far it has been smart human minds that have architected these neural network topologies.

But wait, did you not tell me that neural networks were all about learning from data? And yet the most important thing of all, the network architecture, is still designed by hand? What?

This list will grow, stay tuned.

PS1: I do wish some smart minds could help me find automatic ways to learn neural network architectures from data and the problem they need to solve.

If we can do this, we can also create machines that can build their own circuit topologies, and self-evolve. This would be a blast. And my work would be done. Ah, I can finally enjoy life!

About the author

I have almost 20 years of experience in neural networks in both hardware and software (a rare combination). See about me here: Medium, webpage, Scholar, LinkedIn, and more…