The architectures of neural networks. Part 1.

HyperQuant
Published Jun 13, 2018 · 7 min read

This new sub-series of our articles is devoted to neural network architectures, the basis of AI. In each article we will focus on several architectural types and examine their applicability.

Neural networks form a class of models within machine learning. These models can be described as particular families of algorithms that revolutionized the field. They are inspired by the organization and functioning of biological neural networks, the networks of nerve cells in living organisms. The term "neural network" arose from studies of the processes taking place in the brain, specifically from attempts to model those processes.

In our series of articles eight architectures of neural networks are presented. These can be divided into three categories:

1. Feed-forward neural networks trained with error backpropagation.

This is the most common type of neural network in practical applications. The first layer is the input, the last one the output. If there is more than one hidden layer, we call such networks "deep" neural networks. Neuron activity in each layer is a non-linear function of the activity in the previous layer. Convolutional neural networks belong to this group.

2. Recurrent neural networks.

These have directed cycles, which means that following the connections can eventually bring you back to where you started. Their dynamics are complex, which makes training quite challenging. Recurrent networks are more biologically realistic and are a natural way to model sequential data. They are equivalent to deep networks with one hidden layer per time step. Recurrent networks can, in principle, remember information in their hidden state over long periods, but it is hard to train them to use this potential.
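The equivalence to a deep network with one layer per time step can be sketched in a few lines. The sketch below uses a toy scalar recurrent cell with made-up weights (W_H, W_X and the tanh non-linearity are illustrative choices, not part of the original text); the key point is that the same cell, with the same weights, is applied at every step of the sequence.

```python
import math

# Illustrative weights for a minimal scalar RNN cell. Reusing the
# same weights at every time step makes unrolling over T steps
# equivalent to a T-layer deep network with shared parameters.
W_H, W_X, BIAS = 0.5, 1.0, 0.0

def rnn_step(h_prev, x):
    """One recurrent update: new hidden state from old state and input."""
    return math.tanh(W_H * h_prev + W_X * x + BIAS)

def run(sequence, h0=0.0):
    """Unroll the cell over a whole input sequence."""
    h = h0
    states = []
    for x in sequence:
        h = rnn_step(h, x)   # same rnn_step (same weights) each step
        states.append(h)
    return states

states = run([1.0, 0.0, -1.0])  # hidden state after each time step
```

Because every state depends on the previous one, the loop cannot be parallelized across time, which is one reason training recurrent networks is harder than training feed-forward ones.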

3. Symmetrically connected networks.

These networks are similar to recurrent ones, but the connections between their units are symmetrical (the same weight in both directions). Symmetric networks are much easier to analyze than recurrent ones. Symmetrically connected networks without hidden units are called Hopfield networks; those with hidden units are called Boltzmann machines.

Let’s look into the first two neural network architectures out of the eight:

Perceptron

This is the first generation of artificial neural networks. The first article that attempted to model the working of the brain was written by Warren McCulloch in 1943. These ideas were continued by the psychologist Frank Rosenblatt. He proposed a scheme for a device that simulates the process of human perception and called it a "perceptron" (from the Latin perceptio, "perception"). In 1960, Rosenblatt introduced the first neurocomputer, the Mark I, which was able to recognize certain letters of the English alphabet. Thus the perceptron is one of the first models of neural networks, and the Mark I the first neurocomputer in the world.

Perceptrons began to be very actively explored, and many had high hopes for them. As it turned out, however, they had serious limitations. A former classmate of Rosenblatt, Marvin Minsky, did not share the worship of perceptrons and wrote an entire book (1969) containing a detailed analysis of them, showing along the way that they do not have much potential due to severe limitations.

After that, scientists' enthusiasm for the study of perceptrons and artificial networks subsided, although Minsky later said he regretted that his book had dealt such a blow to the concept of perceptrons. Other areas came to seem more promising, and the initial ideas were largely forgotten. But later new types of neural networks were discovered, along with algorithms for training them, which revived interest in the field.

The perceptron is based on a mathematical model of how the brain perceives information. Different researchers define it differently. In its most general form (as described by Rosenblatt), it is a system of elements of three types: sensory (S), associative (A), and reactive (R) elements.

The first to act are the S-elements. Each can be either at rest (signal 0) or in an excited state (signal 1). Signals from the S-elements are transmitted to the A-elements via so-called S-A connections, whose weights can only be -1, 0, or 1. Note that one A-element can be connected to several S-elements. If the weighted sum of the signals arriving at an A-element exceeds a certain threshold θ, the A-element becomes excited and outputs a signal of 1; otherwise (the signals from the S-elements did not exceed the A-element's threshold) it outputs 0. The A-elements are also called associative elements.

A-elements are called associative for the following reason: they aggregate signals from the sensory elements. For example, suppose a group of sensors each recognizes a fragment of the letter "D" in the image under study. Only their combination (i.e., several sensors outputting 1 at once) can fully excite the A-element. The A-element does not react to other letters, only to "D"; that is, it is associated with the letter "D". Hence the name.

The signals produced by the excited A-elements are then sent to the adder (the R-element). On the way they pass along A-R connections, which also have weights; here, however, the weights can take any values (unlike the S-A connections).

The R-element sums the weighted signals from the A-elements and, if a certain threshold is exceeded, produces an output of 1, meaning, say, that we recognized a person's face in the incoming visual stream.

If the threshold is not exceeded, the perceptron outputs -1; that is, we did not pick a face out of the incoming stream of information. Since the R-element determines the output of the perceptron as a whole, it is called reactive.
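The three-stage scheme described above can be sketched in a few lines of code. All weights and thresholds below are made-up illustrative values, not the parameters of any trained model; the only constraints taken from the text are that S-A weights are restricted to -1, 0, or 1, A-elements output 0 or 1 against a threshold, and the R-element outputs +1 or -1.

```python
def a_layer(s_signals, sa_weights, thetas):
    """Associative layer: each A-element fires (1) if the weighted sum
    of sensor signals exceeds its threshold theta, else stays 0.
    sa_weights[j][i] is the S-A weight (-1, 0 or 1) from sensor i
    to A-element j."""
    out = []
    for weights, theta in zip(sa_weights, thetas):
        total = sum(w * s for w, s in zip(weights, s_signals))
        out.append(1 if total > theta else 0)
    return out

def r_element(a_signals, ar_weights, threshold):
    """Reactive element: weighted sum of A-signals; output +1 if the
    threshold is exceeded, otherwise -1."""
    total = sum(w * a for w, a in zip(ar_weights, a_signals))
    return 1 if total > threshold else -1

# Three sensors, two A-elements, one R-element.
s = [1, 1, 0]                      # sensor states: excited / at rest
sa = [[1, 1, 0], [0, -1, 1]]       # S-A weights restricted to -1/0/1
a = a_layer(s, sa, thetas=[1, 0])  # -> [1, 0]
y = r_element(a, ar_weights=[2.5, -1.0], threshold=1.0)  # -> 1
```

Note that the S-A weights are fixed in Rosenblatt's scheme; only the A-R weights are adjusted during learning.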

Convolutional Neural Network

Research in machine learning has, over time, shed considerable light on the difficulties of object recognition. Many things make recognizing objects hard. Real objects can appear next to many other objects, and it is difficult to tell which objects are independent and which are parts of others. Objects can also be deformed. Moreover, a change in viewing angle alters the image in ways that standard learning methods cannot handle.

Convolutional neural networks are a specific type of neural network that revolutionized computer vision and pattern recognition. They are also used for speech recognition, for processing audio signals and time series, and for analyzing the meaning of texts. At the moment they are arguably the most successful model within what is called deep learning.

A convolutional neural network is a special kind of feed-forward neural network. "Feed-forward" means that the neurons of the network are divided into groups called layers, and when such a layered network is applied to data, the activations of the layers (the values of these variables) are computed sequentially: first the activations of the first layer, then of the second, and so on up to the last. The activation of the last layer is the output of the network. The network has many parameters, and each layer's parameters determine how its activation depends on the activation of the previous layer. Just as importantly, activations within one layer do not depend on each other and can be computed in parallel, which makes such networks very convenient and efficient to run on modern processors, including graphics processors.
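The sequential, layer-by-layer computation described above can be sketched as follows. The 2-3-1 network, the tanh non-linearity, and all weights are arbitrary illustrative choices, not anything specified in the article.

```python
import math

def dense(inputs, weights, biases):
    """One layer: each output neuron is a non-linear function of a
    weighted sum of the previous layer's activations. Neurons within
    the layer are independent, so this loop could run in parallel."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(x, layers):
    """Compute activations layer by layer, first to last; the last
    layer's activations are the network's output."""
    for weights, biases in layers:
        x = dense(x, weights, biases)
    return x

# Toy 2-3-1 network with arbitrary illustrative weights.
layers = [
    ([[0.5, -0.2], [0.1, 0.8], [-0.3, 0.3]], [0.0, 0.1, -0.1]),
    ([[1.0, -1.0, 0.5]], [0.0]),
]
output = forward([1.0, 2.0], layers)
```

The independence of neurons within a layer is exactly what GPUs exploit: the list comprehension in `dense` maps directly onto a parallel matrix-vector product.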

Convolutional networks alternate two types of layer computation. In the first, convolution, each neuron computes a weighted sum over a small patch of the previous layer, with the same weights reused at every position. In the second, the activation of neurons at the next level simply summarizes the activations at the previous level: the image becomes smaller because the activations of a group of neighboring neurons are replaced by their maximum or their mean, the so-called pooling procedure. This additional structure makes convolutional neural networks very well suited to working with images. It guarantees, for example, that if two images differ by a small shift, the network will produce a very similar output. For conventional fully connected networks this is generally not true.
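Max pooling, the "maximum" variant of the procedure just described, is simple enough to write out directly. The 2x2 block size and the sample feature map below are illustrative choices.

```python
def max_pool(image, size=2):
    """Replace each non-overlapping size-by-size block of activations
    with its maximum, shrinking the feature map."""
    h, w = len(image), len(image[0])
    return [[max(image[r + dr][c + dc]
                 for dr in range(size) for dc in range(size))
             for c in range(0, w, size)]
            for r in range(0, h, size)]

feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 3],
]
pooled = max_pool(feature_map)  # -> [[4, 2], [2, 7]]
```

Because only the maximum within each block survives, shifting the input by a pixel often leaves the pooled output unchanged, which is the source of the small-shift robustness mentioned above.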

In addition, in convolutional neural networks the number of parameters is small relative to the number of neurons. In absolute terms the count can still be very large, millions or tens of millions in modern convolutional networks. But a conventional fully connected network with the same number of neurons would have hundreds of billions of parameters, and we could never assemble training sets large enough to fit them. A convolutional network with the same number of neurons, by contrast, can be trained on existing datasets, which is what makes these networks so successful.
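A back-of-the-envelope calculation shows where the gap comes from. The layer sizes below (64x64 feature maps, 64 input and 128 output channels, 3x3 kernels) are made-up but representative values, not figures from the article.

```python
def conv_params(in_channels, out_channels, kernel):
    """A convolutional layer shares its weights across all spatial
    positions: kernel*kernel*in_channels weights per output channel,
    plus one bias each."""
    return out_channels * (kernel * kernel * in_channels + 1)

def dense_params(in_neurons, out_neurons):
    """A fully connected layer stores one weight per input-output
    pair, plus one bias per output neuron."""
    return out_neurons * (in_neurons + 1)

# Same neuron counts for both layers: 64x64 maps, 64 -> 128 channels.
conv = conv_params(64, 128, kernel=3)              # 73,856 parameters
dense = dense_params(64 * 64 * 64, 128 * 64 * 64)  # over 100 billion
```

The convolutional layer needs tens of thousands of parameters where the fully connected one needs over a hundred billion, while both have the same number of neurons; this is the weight sharing that makes convolutional networks trainable on realistic datasets.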

More articles on the topic of neural network architecture types will follow — stay tuned!
