Linear layers explained in a simple way

Part of a series about the different types of layers in neural networks

Assaad MOAWAD
Oct 1, 2019 · 5 min read

Many people perceive neural networks as black magic. We all sometimes have the tendency to think that there is no rationale or logic behind a neural network's architecture, and that all we can do is try a random selection of layers, throw some computational power (GPUs/TPUs) at it, and wait, lazily.

Although there is no strong formal theory on how to select the layers and configuration of a neural network, and although the only way to tune some hyper-parameters is by trial and error (meta-learning, for instance), there are still heuristics, guidelines, and theories that can help us considerably reduce the search space of suitable architectures. In a previous blog post, we introduced the inner mechanics of neural networks. In this series of blog posts, we will talk about the basic layers, their rationale, their complexity, and their computational capabilities.

y = b //(Learn b)

This layer is basically learning a constant. It’s capable of learning an offset, a bias, a threshold, or a mean. If we create a neural network only from this layer and train it over a dataset, the mean square error (MSE) loss will force this layer to converge to the mean or average of the outputs.

For instance, if we have the dataset {2,2,3,3,4,4} and we force the neural network to compress it into a single value b, the most logical convergence is around b=3 (the average of the dataset), since that minimizes the loss. Learning a constant is a bit like learning the DC component of an electrical signal, an offset, or a reference level to compare against: any value above this offset is positive, any value below it is negative. It's like redefining where zero should start from.
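To make this concrete, here is a minimal sketch (assuming PyTorch and the toy dataset above; the setup is purely illustrative) of a "bias-only" model trained with MSE. The learned constant converges towards the mean, 3.

import torch

# Toy dataset from the example above: only outputs, no inputs.
y = torch.tensor([2., 2., 3., 3., 4., 4.])

# A "bias-only" layer: a single learnable constant b, initialized at 0.
b = torch.zeros(1, requires_grad=True)
optimizer = torch.optim.SGD([b], lr=0.1)

for _ in range(200):
    optimizer.zero_grad()
    loss = torch.mean((b - y) ** 2)  # MSE between the constant and every target
    loss.backward()
    optimizer.step()

print(b.item())  # ~3.0, the mean of the dataset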

Learning a bias = learning a threshold or an average
y = w*x //(Learn w)

A linear layer without a bias is capable of learning an average rate of correlation between the output and the input: if x and y are positively correlated, w will be positive; if they are negatively correlated, w will be negative; and if x and y are totally independent, w will be around 0.

Another way to look at this layer: consider a new variable A = y/x and feed it to the "bias layer" from the previous section. As we said before, that layer learns the average or mean of A, which is the average of output/input, and thus the average rate at which the output changes relative to the input.
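As an illustration, here is a small sketch (again assuming PyTorch, with made-up data where y is exactly 2*x) of a bias-free linear layer learning this rate of change.

import torch

# Made-up data: y = 2*x, so the layer should learn w ≈ 2.
x = torch.tensor([[1.], [2.], [3.], [4.]])
y = 2 * x

# A linear layer without a bias: y_hat = w * x.
layer = torch.nn.Linear(1, 1, bias=False)
optimizer = torch.optim.SGD(layer.parameters(), lr=0.01)

for _ in range(500):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(layer(x), y)
    loss.backward()
    optimizer.step()

print(layer.weight.item())  # ≈ 2.0, the average rate of change between y and x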

A linear curve without a bias = learning a rate of change
y = w*x + b //(Learn w and b)

A feed-forward layer is a combination of a linear layer and a bias. It is capable of learning both an offset and a rate of correlation. Mathematically speaking, it represents the equation of a line. In terms of capabilities:

  • This layer is able to replace both a linear layer and a bias layer.
  • By learning w=0, it reduces to a pure bias layer.
  • By learning b=0, it reduces to a pure linear layer.
  • A linear layer with a bias can represent PCA (for dimensionality reduction), since PCA is just a linear combination of the inputs.
  • A linear feed-forward layer can learn scaling automatically: both a MinMaxScaler and a StandardScaler can be modeled by a linear layer.

By learning w=1/(max-min) and b=-min/(max-min), a linear feed-forward layer is capable of simulating a MinMaxScaler.

Learning a MinMaxScaler through a linear feed-forward layer
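As a quick sanity check, here is a sketch (assuming PyTorch and an arbitrary toy column of values) where the weights of a Linear(1, 1) layer are set directly from the formula w = 1/(max-min), b = -min/(max-min); its output matches the usual min-max scaling (x - min)/(max - min).

import torch

x = torch.tensor([[2.], [3.], [5.], [9.]])
x_min, x_max = x.min().item(), x.max().item()

# Set the layer weights to the MinMaxScaler formula: w = 1/(max-min), b = -min/(max-min).
layer = torch.nn.Linear(1, 1)
with torch.no_grad():
    layer.weight.fill_(1 / (x_max - x_min))
    layer.bias.fill_(-x_min / (x_max - x_min))
    print(layer(x).flatten())                         # tensor([0.0000, 0.1429, 0.4286, 1.0000])
    print(((x - x_min) / (x_max - x_min)).flatten())  # same values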

Similarly, by learning w=1/std and b=-avg/std, a linear feed-forward layer is capable of simulating a StandardScaler.

Learning a StandardScaler through a linear feed-forward layer
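The same check works for standardization. The sketch below (same assumptions as above) fixes w = 1/std and b = -avg/std and compares the layer's output against (x - avg)/std.

import torch

x = torch.tensor([[2.], [3.], [5.], [9.]])
avg = x.mean().item()
std = x.std(unbiased=False).item()  # population standard deviation

# Set the layer weights to the StandardScaler formula: w = 1/std, b = -avg/std.
layer = torch.nn.Linear(1, 1)
with torch.no_grad():
    layer.weight.fill_(1 / std)
    layer.bias.fill_(-avg / std)
    print(layer(x).flatten())           # same as the line below
    print(((x - avg) / std).flatten())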

So next time, if you are not sure which scaling technique to use, consider adding a linear feed-forward layer as the first layer of the architecture to scale the inputs, and another one as the last layer to scale the output back.

A linear feed-forward layer learns the rate of change and the bias (here: rate = 2, bias = 3)
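For completeness, here is a sketch (assuming PyTorch and toy data generated from the line y = 2*x + 3 of the figure above) of a full linear feed-forward layer recovering both the rate and the bias.

import torch

# Toy data on the line y = 2*x + 3 (rate = 2, bias = 3).
x = torch.tensor([[0.], [1.], [2.], [3.]])
y = 2 * x + 3

# A full linear (feed-forward) layer: y_hat = w*x + b.
layer = torch.nn.Linear(1, 1)
optimizer = torch.optim.SGD(layer.parameters(), lr=0.05)

for _ in range(2000):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(layer(x), y)
    loss.backward()
    optimizer.step()

print(layer.weight.item(), layer.bias.item())  # ≈ 2.0 and ≈ 3.0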
  • These three types of linear layers can only learn linear relations. They are incapable of learning any non-linearity (obviously).
  • Stacking these layers immediately one after another is pointless and a waste of computational resources. Here is why:

If we consider 2 consecutive linear feed-forward layers y₁ and y₂:

y₁ = w₁*x + b₁
y₂ = w₂*y₁ + b₂

We can re-write y₂ in the following form:

y₂ = w₂*(w₁*x + b₁) + b₂ = (w₂*w₁)*x + (w₂*b₁ + b₂) = W*x + B

where W = w₂*w₁ is just another weight and B = w₂*b₁ + b₂ is just another bias.

The same reasoning applies to any number of consecutive linear layers: a single linear layer is capable of representing them all. For example, scaling and PCA can be combined into one single linear feed-forward layer at the input.
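Here is a quick numerical sketch of this collapse (assuming PyTorch, with two randomly initialized 1-to-1 layers): building a single layer with W = w₂*w₁ and B = w₂*b₁ + b₂ reproduces the stacked output, up to floating-point rounding.

import torch

# Two consecutive linear feed-forward layers: y1 = w1*x + b1, y2 = w2*y1 + b2.
lin1 = torch.nn.Linear(1, 1)
lin2 = torch.nn.Linear(1, 1)

with torch.no_grad():
    w1, b1 = lin1.weight.item(), lin1.bias.item()
    w2, b2 = lin2.weight.item(), lin2.bias.item()

    # One equivalent layer: W = w2*w1 and B = w2*b1 + b2.
    combined = torch.nn.Linear(1, 1)
    combined.weight.fill_(w2 * w1)
    combined.bias.fill_(w2 * b1 + b2)

    x = torch.tensor([[1.], [2.], [3.]])
    print(lin2(lin1(x)).flatten())  # stacked layers
    print(combined(x).flatten())    # single equivalent layer, same output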

Stacking linear layers is a waste of resources! It adds no benefit over a single linear layer.

If you enjoyed reading, follow us on: Facebook, Twitter, LinkedIn

Written by Assaad MOAWAD

Interested in artificial intelligence, machine learning, neural networks, data science, blockchain, technology, astronomy. Co-founder of Datathings, Luxembourg

DataThings

DataThings blog is where we post about our latest machine learning, big data analytics and neural networks experiments. Feel free to visit our website: www.datathings.com
