The building blocks of a Deep Learning Neural Network.

Marlon Tabone
Bite-Sized Deep Learning
5 min read · Sep 15, 2020

Un-crumpling paper balls is what machine learning is all about: it seeks to find neat representations for complex, highly folded data manifolds.

Deep Learning takes the approach of incrementally decomposing a complicated geometric transformation into a long chain of elementary ones, which is essentially the strategy a human would follow to uncrumple a paper ball.

A deep-learning model is like a sieve for data processing, made of a succession of increasingly refined data filters — the layers.

What are layers?

The core building block of a neural network is the layer, a data-processing module that you can think of as a filter for data. Some data goes in, and it comes out in a more useful form. Most of deep learning consists of chaining together simple layers that will implement a form of progressive data distillation.

Each layer in a deep network applies a transformation that disentangles the data a little — and a deep stack of layers makes tractable an extremely complicated disentanglement process.

Some important glossary before we move on…

Tensors: At its core, a tensor is a container for data — almost always numerical data. Tensors are a generalisation of matrices to an arbitrary number of dimensions (a dimension is often called an axis). In general, all current machine-learning systems use tensors as their basic data structure. Tensors are fundamental to the field — so fundamental that Google’s TensorFlow was named after them!

A tensor is defined by three key attributes:

1. Number of axes — For instance, a 3D tensor has three axes. This is also called the tensor’s ndim in Python libraries such as NumPy.

2. Shape — This is a tuple of integers that describes how many dimensions the tensor has along each axis. For example, a matrix (2D tensor) could be of shape (3, 5) and a 3D tensor could be of shape (3, 3, 5). A vector has a shape with a single element, such as (5,), whereas a scalar has an empty shape, ().

3. Data type (usually called dtype in Python libraries) — This is the type of the data contained in the tensor; for instance, a tensor’s type could be float32, uint8, float64, and so on. Note that string tensors don’t exist in NumPy (or in most other libraries). This is because tensors live in preallocated, contiguous memory segments and strings are of variable length, hence they would preclude the use of this implementation.
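All three attributes can be inspected directly in NumPy; a quick illustration with a small 3D tensor:

```python
import numpy as np

x = np.zeros((3, 3, 5))  # a 3D tensor

print(x.ndim)   # 3 -> number of axes
print(x.shape)  # (3, 3, 5)
print(x.dtype)  # float64, NumPy's default floating-point dtype
```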

The Dot Operation: The dot operation (also called a tensor product) is the most common and useful tensor operation. Unlike element-wise product operations, which act on matching entries independently, the dot operation combines entries across the input tensors.
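The contrast is easy to see in NumPy, where `*` is element-wise and `np.dot` combines entries (the values below are illustrative):

```python
import numpy as np

a = np.array([[1., 2.],
              [3., 4.]])   # shape (2, 2)
v = np.array([1., 1.])     # shape (2,)

print(a * v)         # element-wise: [[1. 2.], [3. 4.]] -- each entry scaled independently
print(np.dot(a, v))  # dot product: [3. 7.] -- each row of a combined with v
```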

Tensor Reshaping: Before feeding data into a network for training, it is advisable to pass the data through a pre-processing step. Often this involves reshaping the tensor, which means rearranging its rows and columns to match a target shape. Naturally, the reshaped tensor has the same total number of coefficients as the initial tensor. A special case of reshaping that’s commonly encountered is transposition. Transposing a matrix means exchanging its rows and its columns, so that x[i, :] becomes x[:, i].
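Both operations can be sketched in a few lines of NumPy (the shapes here are just examples):

```python
import numpy as np

x = np.arange(6).reshape((3, 2))  # 6 coefficients laid out as 3 rows x 2 columns
print(x.shape)                    # (3, 2)

y = x.reshape((2, 3))             # same 6 coefficients, different layout
print(y.shape)                    # (2, 3)

t = np.transpose(x)               # rows become columns: x[i, :] becomes t[:, i]
print(t.shape)                    # (2, 3)
print(np.array_equal(x[0, :], t[:, 0]))  # True
```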

Neural networks consist entirely of chains of tensor operations and all of these tensor operations are just geometric transformations of the input data. It follows that you can interpret a neural network as a very complex geometric transformation in a high-dimensional space, implemented via a long series of simple steps.

Training, Learning, Weights and a brief introduction to Loss Functions, Optimisers and Metrics.

Preparing a network for training:

To make a network ready for training, we need to pick three important things, as part of the compilation step.

1. A loss function — How the network will measure its performance on the training data, and thus how it will steer itself in the right direction.

2. An optimiser — The mechanism through which the network will update itself based on the data it sees and its loss function.

3. Metrics to monitor during training and testing — An example of a metric is accuracy.
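Conceptually, each of these three ingredients is just a small function. A minimal NumPy sketch, where the function names and toy values are illustrative stand-ins rather than a real framework API:

```python
import numpy as np

def mse_loss(y_pred, y_true):
    """1. Loss function: measures performance on the training data."""
    return np.mean((y_pred - y_true) ** 2)

def sgd_step(weights, grads, lr=0.01):
    """2. Optimiser: updates the weights using the loss's gradient."""
    return weights - lr * grads

def accuracy(y_pred_labels, y_true_labels):
    """3. Metric: monitored during training and testing."""
    return np.mean(y_pred_labels == y_true_labels)

print(mse_loss(np.array([1., 2.]), np.array([1., 3.])))    # 0.5
print(accuracy(np.array([0, 1, 1]), np.array([0, 1, 0])))  # 0.666...
```

In Keras, the corresponding choices are passed as the `optimizer`, `loss`, and `metrics` arguments of the model’s `compile` method.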

We’re now ready to train the network, which in Keras is done via a call to the network’s fit method: we fit the model to its training data.

However, before training, we’ll pre-process the data by reshaping it into the shape the network expects and scaling it so that all values are in the [0, 1] interval, a process called normalisation.
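As a sketch, assuming a batch of 28 × 28 greyscale images with uint8 pixel values (the batch size and image dimensions are illustrative):

```python
import numpy as np

# Hypothetical batch: 100 greyscale 28x28 images, pixel values in 0..255.
images = np.random.randint(0, 256, size=(100, 28, 28), dtype=np.uint8)

# Flatten each image into a vector and scale pixel values into [0, 1].
prepared = images.reshape((100, 28 * 28)).astype('float32') / 255

print(prepared.shape)  # (100, 784)
print(prepared.dtype)  # float32
```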

Keep in mind that in training, two important quantities are displayed:

  1. The loss of the network over the training data
  2. The accuracy of the network over the training data.

output = relu(dot(W, input) + b)

In the above expression, W and b are tensors that are attributes of the layer. They’re called the weights or trainable parameters of the layer (the kernel and bias attributes, respectively). These weights contain the information learned by the network from exposure to training data.

Initially, these weight matrices are filled with small random values (a step called random initialisation). Of course, there’s no reason to expect that relu(dot(W, input) + b), when W and b are random, will yield any useful representations. The resulting representations are meaningless — but they’re a starting point.
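The expression above can be reproduced directly in NumPy; the layer sizes below (4 inputs, 3 outputs) are an assumption for illustration:

```python
import numpy as np

def relu(z):
    # relu zeroes out negative values and passes positives through.
    return np.maximum(z, 0.)

rng = np.random.default_rng(0)

# Random initialisation: small random kernel W, zero bias b.
W = rng.normal(scale=0.01, size=(3, 4))
b = np.zeros((3,))

x = rng.normal(size=(4,))            # one input sample
output = relu(np.dot(W, x) + b)      # the layer's transformation

print(output.shape)  # (3,)
```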

What comes next is to gradually adjust these weights, based on a feedback signal from the loss function. The process within which this learning happens is called a training loop.

The Training Loop in Steps:

  1. Draw a batch of training samples x and corresponding targets y.
  2. Run the network on x (a step called the forward pass) to obtain predictions y_pred.
  3. Compute the loss of the network on the batch, a measure of the mismatch between y_pred and y.
  4. Update all weights of the network in a way that slightly reduces the loss on this batch.
  5. Repeat, drawing a new batch each time, until the loss over the training data is sufficiently low.

After repeating steps 1 to 4 for a predefined number of epochs, you’ll eventually end up with a network that has a very low loss on its training data: that is, a low mismatch between predictions y_pred and expected targets y. This confirms that the network has “learned” to map its inputs to the correct targets.
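The steps above can be sketched end-to-end on a toy model. This is a minimal hand-written loop for a one-parameter linear model trained with gradient descent; the data-generating rule y = 2x + 1, the learning rate, and the epoch count are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy training data: targets follow y = 2*x + 1 (assumed for illustration).
x = rng.normal(size=(100, 1))
y = 2. * x + 1.

w, b = 0., 0.   # initialisation: the starting parameters are meaningless
lr = 0.1        # learning rate

for epoch in range(200):                     # repeat for a fixed number of epochs
    y_pred = w * x + b                       # steps 1-2: forward pass on the batch
    loss = np.mean((y_pred - y) ** 2)        # step 3: mismatch between y_pred and y
    grad_w = np.mean(2. * (y_pred - y) * x)  # feedback signal from the loss
    grad_b = np.mean(2. * (y_pred - y))
    w -= lr * grad_w                         # step 4: nudge weights to reduce the loss
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # close to 2.0 and 1.0
```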


Marlon Tabone is a Maltese Computational Artist based in the United Kingdom.