The architectures of neural networks. Part 2.

HyperQuant · Jun 19, 2018

In the previous article we considered two architectures of neural networks: the perceptron and the convolutional neural network. Let’s look at the next two on the list.

Recurrent neural network (RNN)

Last time we looked at the perceptron architecture. Such a network can recognize fixed-size data, for example images. However, if the task requires processing an arbitrarily long sequence, such as music, it is important to evaluate not only the content but also the order of the information. Recurrent neural networks were invented for exactly these tasks. In a recurrent network neurons exchange information among themselves: for example, in addition to a new piece of incoming data, a neuron also receives some information about the previous state of the network. In this way the network acquires a “memory”, which fundamentally changes the nature of its work and makes it possible to analyze data sequences in which order matters — from sound recordings to the quotes of financial instruments.

The scheme of a single-layer recurrent neural network is as follows: on each work cycle the inner layer of neurons receives a set of input data X and information about the previous state of the inner layer A, and on this basis generates the response h.
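As a rough illustration of this cycle, here is a minimal NumPy sketch of such a recurrent step. The dimensions, weight names (W_x, W_h, b) and random initialization are assumptions made for readability, not the layout of any particular framework:

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 8, 16

W_x = rng.standard_normal((hidden_size, input_size)) * 0.1   # input -> hidden weights
W_h = rng.standard_normal((hidden_size, hidden_size)) * 0.1  # hidden -> hidden weights
b = np.zeros(hidden_size)

def rnn_step(x_t, a_prev):
    """One work cycle: combine the new input X with the previous state A."""
    return np.tanh(W_x @ x_t + W_h @ a_prev + b)

a = np.zeros(hidden_size)                         # initial state A
for x_t in rng.standard_normal((5, input_size)):  # a toy sequence of 5 inputs
    a = rnn_step(x_t, a)                          # the response h becomes the next state A
print(a.shape)  # (16,)
```

The only thing that distinguishes this from an ordinary feed-forward layer is that the state A is carried from one step to the next, which is exactly where the network’s “memory” of the sequence lives.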

One of the main areas where RNNs are applied today is work with language models, in particular the analysis of context and of how words connect within a text. For an RNN the structure of the language is long-term information that must be remembered: the grammar, as well as the stylistic features of the corpus of texts on which training is conducted. In effect, the RNN memorizes the usual word order and can finish a sentence after receiving a seed phrase. If the seed is random, the result can be a completely meaningless text that stylistically resembles the corpus on which the RNN was trained. If the source text was meaningful, the RNN will help to stylize it; in that case, however, one RNN alone is not enough, since the result would be a mixture of random but stylized text from the RNN and a meaningful but unstylized original part.

RNNs also cope well with writing music. Composing and mixing melodies in an arbitrary style has already been solved successfully with the help of neural networks. Unlike music, however, writing texts with an RNN raises problems. The reason is that instrumental music does not carry meaning in the same sense that most texts do: for the neural network, music without lyrics carries no informational load. RNNs struggle to give meaning to their works: they can learn the grammar of a language very well and remember how a text in a certain style should look, but they cannot create or convey ideas and information without a learned template. At least for the time being.

Another application of RNNs is image analysis. This area is usually associated with convolutional neural networks, but such tasks also suit RNNs: their architecture allows them to recognize details quickly, based on the context and the surroundings.

RNNs work in a similar way in text analysis and text generation. As for more challenging tasks, one may recall the early attempts to use RNNs for classifying the carbon NMR (nuclear magnetic resonance) spectra of various benzene derivatives and, among modern applications, the analysis of negative feedback about products.

Long Short-Term Memory network (LSTM)

The LSTM network is a special type of RNN capable of learning long-term dependencies. LSTM networks were introduced in the work of Hochreiter and Schmidhuber in 1997. In that work the authors described a modification that solves the long-term memory problem of simple RNNs: their neurons “remember” recently received information well, but have no way to permanently store something that was processed many cycles earlier, no matter what that information is. In an LSTM the internal neurons are “equipped” with a complex system of so-called gates, as well as with the notion of a cell state, which acts as a kind of long-term memory. The gates determine what information enters the cell state, what will be erased from it, and what will influence the result.

Any recurrent neural network has the form of a chain of repeating modules. In an ordinary RNN the structure of one such module is very simple: for example, it can be a single layer with a hyperbolic tangent activation function.

The repeating module in the standard RNN consists of a single layer

The LSTM structure also resembles a chain, but its modules look different. Instead of a single neural network layer they contain four, and these layers interact in a special way.

The repeating module in the LSTM network consists of four interacting layers

The key component of an LSTM is the cell state — the horizontal line that runs along the top of the diagram. The cell state resembles a conveyor belt: it passes straight through the entire chain, taking part in only a few linear transformations, so information can easily flow along it without being changed.

However, the LSTM can remove information from the cell state. This process is regulated by structures called gates. Gates let information through selectively, based on certain conditions; each consists of a sigmoid neural network layer and a pointwise multiplication operation.

The sigmoid layer returns numbers between zero and one that indicate what fraction of each block of information should be passed further along the network. Zero in this case means “let nothing through”, and one means “let everything through”. An LSTM network has three such gates, which protect and control the state of the cell.
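To make the three gates tangible, here is a minimal NumPy sketch of one LSTM step. The weight names, dimensions and layout are simplifying assumptions for readability, not the exact layout used by real deep learning libraries:

```python
import numpy as np

rng = np.random.default_rng(1)
input_size, hidden_size = 8, 16

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight matrix per gate plus one for the candidate values; each acts on
# the concatenation of the previous response h and the current input x.
W_f, W_i, W_o, W_c = (
    rng.standard_normal((hidden_size, hidden_size + input_size)) * 0.1
    for _ in range(4)
)
b_f = b_i = b_o = b_c = np.zeros(hidden_size)

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W_f @ z + b_f)        # forget gate: which parts of the cell state to erase
    i = sigmoid(W_i @ z + b_i)        # input gate: which new information to store
    o = sigmoid(W_o @ z + b_o)        # output gate: which part of the state to expose
    c_tilde = np.tanh(W_c @ z + b_c)  # candidate values for the cell state
    c = f * c_prev + i * c_tilde      # the "conveyor belt": only pointwise updates
    h = o * np.tanh(c)                # the response h
    return h, c

h, c = np.zeros(hidden_size), np.zeros(hidden_size)
for x_t in rng.standard_normal((5, input_size)):  # a toy sequence of 5 inputs
    h, c = lstm_step(x_t, h, c)
print(h.shape, c.shape)  # (16,) (16,)
```

Note how the cell state c is touched only by pointwise multiplications and additions controlled by the sigmoid gates, which is precisely what lets information flow along the chain with little change.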

Stay up to date on the continuation of our article series about the architectures of neural networks, artificial intelligence and algorithmic trading!
