Neural Networks Building Blocks
Neural networks are made of smaller modules or building blocks, similar to atoms in matter or logic gates in digital circuits.
Once you know what the blocks are, you can combine them to solve a variety of problems. New building blocks are added constantly, as smart minds discover them.
What are these blocks? How can we use them? Follow me…
Weights implement the product of an input i with a weight value w to produce an output o. It seems trivial, but additions and multiplications are at the heart of neural networks.
Artificial neurons are one basic building block.
They sum up their inputs i and use a non-linearity (added as a separate block) to output (o) a non-linear version of the sum.
Neurons use weights w to scale each input before summing the values.
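To make the weighted-sum-plus-non-linearity idea concrete, here is a minimal sketch in plain Python (the names `neuron` and `relu` are illustrative, not from any library):

```python
def relu(x):
    # a simple non-linearity: zero out negative values
    return max(0.0, x)

def neuron(inputs, weights, bias=0.0):
    # scale each input by its weight, sum, then apply the non-linearity
    s = sum(i * w for i, w in zip(inputs, weights)) + bias
    return relu(s)

print(neuron([1.0, 2.0, 3.0], [0.5, -0.25, 1.0]))  # 0.5 - 0.5 + 3.0 = 3.0
```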
Identity layers just pass the input to the output. They seem pretty bare, but they are an essential block of neural networks.
See below why. You will not be disappointed!
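In code, an identity layer really is as bare as it sounds (a sketch, nothing more):

```python
def identity(x):
    # pass the input straight through to the output, unchanged
    return x

print(identity(42))  # 42
```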
Non-linearities take the inner value of a neuron and transform it with a non-linear function to produce the neuron's output.
The most popular are ReLU, tanh, and sigmoid.
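Here is a quick sketch of those three non-linearities, applied element-wise with NumPy (the function names are my own shorthand):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, 0.0, 2.0])
print(relu(x))       # negatives clipped to zero
print(np.tanh(x))    # odd function, saturates at -1 and +1
print(sigmoid(x))    # squashes values into (0, 1)
```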
Linear layers are an array of neurons: each neuron takes all the inputs, and together they produce multiple outputs.
The numbers of inputs and outputs are arbitrary and independent of each other.
These building blocks are basically an array of linear combinations of the inputs, scaled by weights. The weights multiply the inputs by an array of weight values, and they are usually learned with a learning algorithm.
Linear layers do not come with non-linearities; you will have to add one after each layer. If you stack multiple layers, you will need a non-linearity between them, or they will all collapse into a single linear layer.
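A linear layer is just a weight matrix times an input vector, o = W·i, and the collapse is easy to verify: two stacked linear layers equal one layer whose matrix is the product of the two. A sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))   # linear layer: 3 inputs -> 4 outputs
W2 = rng.standard_normal((2, 4))   # linear layer: 4 inputs -> 2 outputs
x = rng.standard_normal(3)

stacked = W2 @ (W1 @ x)            # two layers, no non-linearity between
collapsed = (W2 @ W1) @ x          # one equivalent linear layer
print(np.allclose(stacked, collapsed))  # True: they compute the same thing
```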
Convolutional layers are exactly like a linear layer, but each output neuron is connected to a locally constrained group of input neurons.
This group is often called a receptive field, borrowing the name from neuroscience.
Convolutions can be performed in 1D, 2D, 3D, and so on.
These basic blocks take advantage of the local features of data, which are correlated in some kinds of inputs. If the inputs are correlated, then it makes more sense to look at a group of inputs rather than a single value as in a linear layer. A linear layer can be thought of as a convolution with one value per filter.
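A minimal 1D convolution makes the "local group of inputs" idea concrete: slide a small filter over the input and take a weighted sum over each receptive field (a sketch, not an optimized implementation):

```python
import numpy as np

def conv1d(signal, kernel):
    # slide the kernel over the signal; each output is a weighted sum
    # over one local group of inputs (the receptive field)
    k = len(kernel)
    return np.array([np.dot(signal[i:i + k], kernel)
                     for i in range(len(signal) - k + 1)])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
# this filter responds to local differences in the signal
print(conv1d(x, np.array([1.0, 0.0, -1.0])))  # [-2. -2. -2.]
```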
Recurrent neural network
Recurrent neural networks are a special kind of linear layer, where the output of each neuron is fed back as an additional input of the neuron, together with the actual input.
You can think of an RNN as a combination of the input at instant t and the state/output of the same neuron at time t-1.
RNNs can remember sequences of values, since they can recall previous outputs, which again are linear combinations of their inputs.
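One recurrent step can be sketched like this: the new state mixes the current input with the previous state through two weight matrices (the names `rnn_step`, `W_in`, and `W_rec` are my own):

```python
import numpy as np

def rnn_step(x, h_prev, W_in, W_rec):
    # combine the input at time t with the state/output from time t-1
    return np.tanh(W_in @ x + W_rec @ h_prev)

rng = np.random.default_rng(1)
W_in = rng.standard_normal((2, 3)) * 0.1   # input weights
W_rec = rng.standard_normal((2, 2)) * 0.1  # recurrent (feedback) weights
h = np.zeros(2)                            # initial state
for x in [np.ones(3), np.zeros(3), np.ones(3)]:
    h = rnn_step(x, h, W_in, W_rec)        # state carries information forward
print(h.shape)  # (2,)
```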
Attention modules are simply a gating function for memories. If you want to specify which values in an array should be passed through, you use a linear layer to gate each input by some weighting function.
Attention can be soft, when the weights are real-valued and the inputs are thus multiplied by those values, or hard, when the weights are binary and each input is either zeroed out or passed through. The outputs are also called attention head outputs. More info: nice blog post and another.
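A sketch of soft attention as gating: a softmax over scores gives real-valued weights that multiply each value, and the output is their weighted mix (in practice the scores come from a learned layer; here they are hand-picked for illustration):

```python
import numpy as np

def softmax(z):
    # turn arbitrary scores into positive weights that sum to 1
    e = np.exp(z - z.max())
    return e / e.sum()

values = np.array([10.0, 20.0, 30.0])
scores = np.array([0.0, 0.0, 5.0])   # strongly attend to the last value
weights = softmax(scores)            # real-valued gates: soft attention
output = np.dot(weights, values)     # weighted mix, close to 30
print(weights, output)
```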
A basic building block that saves multiple values in a table.
This is basically just a multi-dimensional array. The table can be of any size and any number of dimensions. It can also be composed of multiple banks or heads. Generally a memory has a write function that writes to all locations and a read function that reads from all locations. An attention-like module can focus reading and writing on specific locations. See the attention module above. More info: nice blog post.
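A tiny sketch of focusing a memory read with attention-style weights: reading formally mixes all rows of the table, but the weights concentrate on one location:

```python
import numpy as np

memory = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [0.5, 0.5]])            # 3 locations, 2 values each
read_weights = np.array([0.0, 1.0, 0.0])   # attention focused on row 1
read_out = read_weights @ memory           # a read "from all locations"
print(read_out)  # [0. 1.] -- effectively just row 1
```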
Residual modules are a simple combination of layers and pass-through or identity layers (a-ha! told you they were important!).
They became famous in ResNet: https://arxiv.org/abs/1512.03385, and they have revolutionized neural networks ever since.
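A residual module is just o = f(x) + x: the layer's transformation plus an identity pass-through of the input. A sketch (`residual_block` is an illustrative name):

```python
import numpy as np

def residual_block(x, W):
    f = np.maximum(0.0, W @ x)   # a small layer with a ReLU
    return f + x                 # plus the identity skip connection

W = np.zeros((3, 3))             # even a layer that computes nothing...
x = np.array([1.0, -2.0, 3.0])
print(residual_block(x, W))      # ...still passes the input through intact
```

This is why the identity layer matters: the skip connection guarantees the input can always flow through unchanged.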
LSTM, GRU, etc.
These units use an RNN, residual modules, and multiple attention modules to gate the inputs, outputs, and state values (and more) of an RNN, producing an augmented RNN that can remember longer sequences. More info and figures: great post and nice blog post.
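The core trick is the gating of state values, which can be sketched in isolation (this is illustrative, not a full LSTM cell): sigmoid gates in (0, 1) decide how much old state to keep and how much new content to write.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

c_prev = np.array([1.0, -1.0])             # previous state values
candidate = np.array([0.5, 0.5])           # proposed new content
forget = sigmoid(np.array([10.0, -10.0]))  # ~[1, 0]: keep dim 0, drop dim 1
write = 1.0 - forget                       # complementary write gate
c = forget * c_prev + write * candidate    # gated state update
print(c)  # approximately [1.0, 0.5]
```

Because a gate near 1 passes the old state through almost unchanged, information can survive many time steps, which is exactly what lets these units remember longer sequences than a plain RNN.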
Now the hard part: how do you combine these modules to make neural networks that can actually solve interesting problems in the world?
So far it has been smart human minds that have architected these neural network topologies.
But wait, didn't you tell me that neural networks were all about learning from data? And yet the most important thing of all, the network architecture, is still designed by hand? What?
This list will grow, stay tuned.
PS1: I do wish some smart minds would help me find automatic ways to learn neural network architectures from the data and the problems they need to solve.
If we can do this, we can also create machines that can build their own circuit topologies and self-evolve. That would be a blast. And my work would be done. Ah, I could finally enjoy life!