
Why Are Neural Nets Non-linear?

Dr. Christian Wiele
Published in The Startup · 4 min read · Oct 23, 2020

For a long time I struggled to get a good semantic understanding of why neural nets have to be non-linear, or, more specifically, what the computational role of the ReLU layers is. But there is actually a pretty straightforward explanation that might help others get their heads around this abstract topic.

When you try to get into deep learning, you are confronted with a lot of abstract concepts and math. This is very different from classical programming.

In classical programming you can read other people's code and understand what it is supposed to do (at least with some experience). There is a close connection between the semantics (what the code is for) and the technical implementation. You can even deepen this connection by giving your variables, classes, and methods descriptive names.

Deep learning is different

Looking at the bare neural net implementation does not tell you much about which problem the system is supposed to solve. You can do some guesswork by looking at the input and output shapes, but in between, the different layers are just shuffling numbers around. You need context information from outside the code to make sense of it.

The most puzzling thing about neural nets is the apparent contradiction between the simple mathematical operations performed on the data (mostly multiplying and adding numbers), and the complex tasks neural nets are able to perform (like classifying or generating images).

The simplest layers

And then there are these ReLU layers, which perform the simplest mathematical operation of all: replacing all negative values with zeros.

And yet, neural nets would not be able to perform any meaningful task without these simple operations.
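In code, the whole operation is a single line. Here is a minimal sketch using NumPy (the function name is mine, for illustration):

    import numpy as np

    def relu(x):
        # Replace every negative entry with zero; pass positives through unchanged.
        return np.maximum(x, 0)

    print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))
    # [0.  0.  0.  1.5 3. ]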

Unfortunately, with these little operations the neural net becomes what mathematicians call non-linear. An often-cited example of why neural nets have to be non-linear is the XOR gate, which cannot be computed by a linear system. But this rather technical example does not give a semantic justification (in an algorithmic sense) for the ReLU layer (or corresponding non-linear layers).
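To make the XOR example concrete anyway: no single linear layer can map the four XOR inputs to their outputs, but a tiny net with one ReLU layer in between can. A sketch with hand-picked (not learned) weights, following the standard textbook construction:

    import numpy as np

    def relu(x):
        return np.maximum(x, 0)

    # XOR(a, b) = ReLU(a + b) - 2 * ReLU(a + b - 1)
    # The kink introduced by ReLU is exactly what a purely linear map cannot provide.
    W1 = np.array([[1.0, 1.0],
                   [1.0, 1.0]])  # weights of the hidden (linear) layer
    b1 = np.array([0.0, -1.0])   # biases of the hidden layer
    w2 = np.array([1.0, -2.0])   # weights of the output layer

    for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        hidden = relu(W1 @ np.array([a, b]) + b1)
        print((a, b), "->", w2 @ hidden)
    # (0, 0) -> 0.0   (0, 1) -> 1.0   (1, 0) -> 1.0   (1, 1) -> 0.0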

So, here is my take on the non-linear layers:

The non-linear layers enable neural nets to learn to make conditional decisions that control the computational flow.

To understand this, we have to go back to classical programming

In classical programming we implement an algorithm for solving a problem. An algorithm is basically a set of rules and instructions that lead to a solution of our problem. There are two main categories of instructions (see the toy snippet below):

  1. Instructions that manipulate data.
  2. Instructions that control the program flow by making decisions (like if … then …)

(We also move data around, store or load data, etc. But we are not concerned with these kinds of utility tasks here.)
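As a toy illustration, both categories show up in even the smallest classical program (the function, the rescaling, and the threshold are made up for this example):

    def classify_reading(value, threshold=0.5):
        # Category 1: manipulate the data (an arbitrary illustrative rescaling).
        scaled = (value - 0.2) * 2.0

        # Category 2: control the program flow with an explicit, hand-written decision.
        if scaled > threshold:
            return "signal"
        return "noise"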

An important property of an algorithm is that we are able to explicitly write down the rules and instructions in plain prose (not just in code).

So, why do we employ deep learning?

There are two main reasons we turn from classical programming to deep learning for solving problems (see also my blog post):

  1. We are not able to explicitly write down the rules (algorithm) for solving the problem. For instance, we are not able to write down the rules for detecting objects in images.
  2. We know the rules, but implementing them is computationally too expensive. So deep learning is employed as an approximation.

So what we do is implement a learning algorithm, not an algorithm for solving the actual problem.

The neural net requires both kinds of instructions

Now, as the neural net replaces the classical algorithm, it is required to implement both data-manipulating and decision-making instructions.

  1. The data manipulation is done by the linear layers. They can, for instance, emphasize patterns in the data (as convolutional filters do).
  2. The decisions are made by the non-linear layers, which drop data points that are less relevant than others (as sketched below).
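Here is a tiny sketch of both roles working together, using a hand-picked 1-D edge filter (the kernel and signal are chosen for illustration, not learned):

    import numpy as np

    signal = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 0.0])

    # Linear step: convolving with the kernel [1, -1] emphasizes changes in the signal.
    edges = np.convolve(signal, [1.0, -1.0], mode="valid")
    print(edges)                 # [ 0.  1.  0. -1.  0.]

    # Non-linear step: ReLU keeps only the rising edge and drops everything else.
    print(np.maximum(edges, 0))  # [0. 1. 0. 0. 0.]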

How is it done?

To be sure, ReLU and other non-linear layers are not instances of classical if/then statements. The decision making is more subtle. With the non-linear layers in place, the system is able to learn parameters in the linear layers that force important values to become positive and less important ones to become negative.

In this sense the system is able to learn to make conditional decisions and to control the computational flow.
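Written out element by element, a ReLU behind a linear unit behaves like a learned threshold test. A minimal sketch in plain Python (the weights, bias, and inputs are hypothetical):

    def relu_unit(x, w, b):
        # One linear unit followed by ReLU. The learned weights w and bias b
        # define a threshold that the weighted sum has to clear.
        s = sum(wi * xi for wi, xi in zip(w, x)) + b
        # Equivalent to: if s > 0, pass the value on; else, drop it.
        return s if s > 0 else 0.0

    # With these made-up parameters, the unit only fires when x[0]
    # exceeds x[1] by more than 0.5.
    print(relu_unit([2.0, 1.0], w=[1.0, -1.0], b=-0.5))  # 0.5 (kept)
    print(relu_unit([1.0, 2.0], w=[1.0, -1.0], b=-0.5))  # 0.0 (dropped)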

Decisions are inherently non-linear

It should be noted that any decision you make is essentially non-linear in nature. Making a decision means comparing different options and pursuing only one of them. The other options are abruptly discontinued at the decision point.

So it is no wonder that neural nets are required to be non-linear, as they have to make decisions to control the computational flow.

What’s the difference from classical machine learning?

Classical machine learning algorithms are often considered linear, but this is only partially true. The difference is that the decision points (and thus the non-linearities) are programmed explicitly.

For instance, splitting the data in a decision tree or assigning data points to certain clusters in a k-means algorithm are non-linear acts. But the decision points and criteria are made explicit and can thus be understood. So the complete control flow of the data is fixed.
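For example, the split in a decision tree is a non-linearity that was written down explicitly (the feature index and threshold here are made-up values):

    def tree_split(x):
        # The decision point is explicit and readable: feature 0 is
        # compared against the fixed threshold 2.5, and nothing else.
        if x[0] < 2.5:
            return "left branch"
        return "right branch"

In a neural net, by contrast, the comparable threshold is buried inside learned weights and biases and can come out differently with every training run.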

As a consequence, these algorithms are limited: they are not able to learn new or different decisions that change the computational flow.

What else?

I hope this helped you gain a deeper understanding of neural nets. If you are interested in other aspects of AI / machine learning, consider subscribing to my YouTube channel.
