What is a Recurrent Neural Network — Elman Net (Part 1)

Hadar Sharvit
3 min read · Feb 26, 2024

So far, we have only considered inputs of fixed size that matched the dimensions of the matrices making up our architecture (whether in a fully connected MLP or in a convolutional NN). Let us relax that assumption and discuss a new type of input.

We consider a sequence signal to be any signal with the following properties:

  • the prediction at some time t depends on history (say, time t−τ)
  • the input may have varying length

For example, Google Translate receives its input as a sequence signal: the translation of a sentence relies on the words that were previously entered, and the sentence provided could be of any length.

Recurrent Neural Network (RNN)

The main idea is that we can receive an input, apply some computation to it, and then pass the result to a future version of our model, which in turn can receive future input as well as the output of the previous model. That is, at every timestamp t, the model:

  • receives an input x_t
  • receives a hidden state h_{t−1} that represents the output of the previous timestamp
  • produces a prediction y_t
  • produces a memory hidden state h_t that can be transferred to future states
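The four steps above can be sketched as a single update function. This is a minimal NumPy sketch, not a reference implementation; the dimensions and the tanh nonlinearity are assumptions chosen for illustration (tanh is the classic choice for Elman networks):

```python
import numpy as np

# Hypothetical dimensions: input size 4, hidden size 3, output size 2
rng = np.random.default_rng(0)
U = rng.normal(size=(3, 4))   # input-to-hidden weights
W = rng.normal(size=(3, 3))   # hidden-to-hidden (recurrent) weights
V = rng.normal(size=(2, 3))   # hidden-to-output weights

def elman_step(x_t, h_prev):
    """One Elman step: consume the input x_t and the previous hidden
    state h_{t-1}, return the prediction y_t and the new hidden state h_t."""
    h_t = np.tanh(U @ x_t + W @ h_prev)  # memory passed to the next timestamp
    y_t = V @ h_t                        # prediction at time t
    return y_t, h_t
```

Note that the same three matrices U, W, V are reused at every timestamp; only x_t and h_t change.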

Generally speaking, we can think of this model as a single generic predictor h that constantly receives inputs and constantly predicts outputs, or (unfolding it) as a sequence of predictors, as the figure below describes. This model is known as the Elman network.

In the diagram above, U, V, W are some matrices that we multiply x_i, o_i, and h_i by (a.k.a. the network parameters).

Notice that every input x_i and every output o_i is of constant size, yet the number of such inputs and outputs is unknown and isn't necessarily constant.
Usually, we will look only at the output at the very end. Going back to our Google Translate example, this is intuitively clear, as a portion of a sentence may not have any meaning until the sentence is finished.
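Unfolding the network over a whole sequence and keeping only the final output can be sketched as a plain loop. This is an illustrative sketch with assumed dimensions, not the article's exact model; the key point is that the same parameters handle sequences of any length:

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_h, d_out = 4, 3, 2          # assumed sizes for illustration
U = rng.normal(size=(d_h, d_in))    # input-to-hidden
W = rng.normal(size=(d_h, d_h))     # hidden-to-hidden (recurrent)
V = rng.normal(size=(d_out, d_h))   # hidden-to-output

def run_rnn(xs):
    """Unfold the Elman network over a sequence xs of arbitrary length
    and return only the final output (the 'look at the end' usage)."""
    h = np.zeros(d_h)               # initial hidden state h_0
    o = np.zeros(d_out)
    for x in xs:
        h = np.tanh(U @ x + W @ h)  # update memory
        o = V @ h                   # prediction at this timestamp
    return o                        # only the last prediction is kept

# Sequences of different lengths go through the very same parameters:
short = [rng.normal(size=d_in) for _ in range(3)]
long = [rng.normal(size=d_in) for _ in range(10)]
```

Both `run_rnn(short)` and `run_rnn(long)` produce an output of the same fixed size, even though the inputs have different lengths.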

Formal definitions

Variable input/output size

One key feature of Elman's network model is that we have the freedom to play with the number of inputs/outputs. In most cases, we distinguish between the following designs:

  • one-to-one: the input and the output are of fixed size; this is similar to models we have already seen before.
  • one-to-many: fixed-size input and arbitrary-size output. For example, we input an image and output its textual description.
  • many-to-one: exactly the opposite. For example, predicting the stock price at time t given historical prices at times t_0, t_1, …, t_n < t.
  • many-to-many: arbitrary-size input and arbitrary-size output. For example, translating Hebrew to English.
Four possible architectures
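To make the many-to-many case concrete, the same loop can simply emit one output per input instead of keeping only the last one. As before, this is a sketch with assumed dimensions, not a production implementation:

```python
import numpy as np

rng = np.random.default_rng(2)
d_in, d_h, d_out = 4, 3, 2          # assumed sizes for illustration
U = rng.normal(size=(d_h, d_in))    # input-to-hidden
W = rng.normal(size=(d_h, d_h))     # hidden-to-hidden (recurrent)
V = rng.normal(size=(d_out, d_h))   # hidden-to-output

def run_many_to_many(xs):
    """Many-to-many: emit one output per input (e.g. producing one
    translated token per source token). The output length tracks the
    input length."""
    h = np.zeros(d_h)
    outputs = []
    for x in xs:
        h = np.tanh(U @ x + W @ h)
        outputs.append(V @ h)
    return outputs

xs = [rng.normal(size=d_in) for _ in range(5)]
ys = run_many_to_many(xs)           # one output per input
```

Dropping all but the last element of `outputs` recovers the many-to-one design; feeding a single x followed by repeated updates would give one-to-many.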
