
Signal Modeling Using Recurrent Neural Networks

Anish Agarwal · Published in Towards AI · Jun 11, 2019

In my previous article, I mentioned that one of the best ways to understand something is to take it apart, see how it works, and then build something with it. In that article, we took apart a Recurrent Neural Network (RNN) and understood its operating principles. This article focuses on the building part and thus deepens our understanding of recurrent networks: we will construct an RNN in TensorFlow to model a Fourier series.

The Basic Groundwork

The base project is available here with setup instructions. I will not go through the basics of TensorFlow, for that is covered extensively elsewhere. I will, however, go through how an RNN is structured in TensorFlow.

Before we get into the details of the code, let's get a high-level understanding of how our signal modeler will work. The signal that we are modeling is simply a set of coordinates (x, y). We want to use the previous values of y to model/predict the oncoming values of y using an RNN. As we feed the values of y into the RNN, the RNN builds a probability distribution over the possible oncoming values of y given the previous values of y.

There is an important concept to consider here. What if y is real-valued? In that case there is an infinite number of possible outcomes for the oncoming values of y. For example, suppose we are trying to model the function y = sin(x). We know that y is constrained to [-1, 1]. However, y can take an infinite number of values within [-1, 1]. So how do we create a probability distribution over a probability space that has an infinite number of outcomes? To keep it simple, we won't. Instead, we will artificially discretize the probability space (i.e. the values of y) to the resolution we require.

In the case of y = sin(x), we would split [-1, 1] into a set of regions. The size of each region dictates the resolution with which the RNN can model the signal. For example, suppose we have 2 regions of size 1: the first region contains all values in [-1, 0] and the other contains all values in [0, 1]. The resolution of our model would be 1. If we modeled y = sin(x) with our RNN at a resolution of 1, the results would not be very accurate, as the model could only predict 2 values. Instead, we want a much higher resolution so that the model is accurate. Keep in mind, however, that if the resolution is too high the size of the RNN will increase, and with it the computation time for forward/backpropagation. We want a balance: a resolution at which the RNN generates an accurate model without requiring too much computation.
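To make the discretization concrete, here is a minimal NumPy sketch of the idea. The bin width (the "resolution") and the range are illustrative choices, not the values used in the actual project:

```python
import numpy as np

# Illustrative choices: a resolution of 0.1 over the range [-1, 1].
resolution = 0.1
num_bins = int(round(2.0 / resolution))          # 20 regions covering [-1, 1]
edges = np.linspace(-1.0, 1.0, num_bins + 1)     # region boundaries

def encode(y):
    """Map a real value y in [-1, 1] to a one-hot vector over the regions."""
    idx = np.clip(np.digitize(y, edges) - 1, 0, num_bins - 1)
    one_hot = np.zeros(num_bins)
    one_hot[idx] = 1.0
    return one_hot

def decode(one_hot):
    """Map a one-hot vector back to the centre of its region."""
    idx = int(np.argmax(one_hot))
    return (edges[idx] + edges[idx + 1]) / 2.0

y = np.sin(1.2)              # ~0.932
print(decode(encode(y)))     # ~0.95, i.e. accurate to the chosen resolution
```

A finer resolution shrinks the round-trip error of encode/decode, but also grows the one-hot vector and hence the network.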

The core idea to understand is that a real-valued function can be modeled by an RNN through discretization. In neural-network terms, each discretized region is represented as a one-hot vector, which is why this step is commonly called one-hot encoding. Once the real-valued function is discretized, an RNN can create a probability distribution over the new values of y given the previous values of y. Now that we have a high-level understanding of the project, let's get into the code.

The Source

The RNN is implemented in the file RNN_Signal_Modeler.py. The rest of the files provide support functionality: reading CSVs, plotting data and logging. Within RNN_Signal_Modeler.py there is a class called RNN_Signal_Modeler, which contains three methods: initialize (abbreviated as init), train and run. Since the RNN is structured in init(), we will primarily study that method.

There are a number of critical parameters that must be specified to define the RNN. These are x_m, x_n, y_m, w_m, w_n, b_m, x_r, x_s, rnn_cell, outputs, states, pred, cost and optimizer. Note that all of these parameters are prefixed with "self" in the source. It's important to understand these parameters as they effectively represent the RNN.

The network is constructed as a grouping of RNN cells, and the number of RNN cells is equal to the batch size. The batch size is defined as x_m (batch size is explained in the next section). This grouping of RNN cells is run at each iteration. The input parameters x_m and x_n are further explained in the image below.

The label matrix, which is standard for most neural networks, is defined as y_m. The label matrix is what you use to compare the prediction with the actual values. The weight matrix is defined as (w_m, w_n) and the bias matrix as (b_m, 1); the bias matrix is not multi-dimensional. As mentioned above, the number of columns of the input matrix (x_n) affects your weight and bias matrices; specifically, it determines the number of columns in your weight matrix. The condition is that w_n = x_n = b_m. The parameter w_m is very important: it decides the size of your RNN. You can set w_m to be very high, and this will increase the complexity of your RNN by increasing the number of neurons within an RNN cell. Note that a cell is not a neuron; a cell contains a large number of neurons. An RNN with higher complexity will be able to model more complex signals; however, it will require more memory and time to train and run.

The parameters x_r and x_s are used to split the input matrix x into many small matrices, each of which is fed separately into the RNN. The outputs parameter, as the name implies, contains the output of the RNN from the current iteration. The states parameter represents the inner state of the RNN cell, which is passed on to the next RNN cell. This is where the temporal retention of data happens, which is the core functionality of RNNs (explained here). The rest of the parameters, pred, cost, optimizer, etc., are standard for most other networks (Convolutional Neural Networks, etc.) in TensorFlow.
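To tie these parameters together, here is a minimal, hypothetical sketch of how such an init() might look in TensorFlow 1.x. It is not the code from RNN_Signal_Modeler.py (names, shapes and details may differ there); it only illustrates the roles played by x_m, x_n, w_m, the cell, outputs, states, pred, cost and optimizer:

```python
import tensorflow as tf  # TensorFlow 1.x API, matching the era of the project

# Hypothetical dimensions (illustrative values, not the repository's):
# x_m = batch size / number of cells, x_n = one-hot size, w_m = RNN size.
x_m, x_n, w_m = 10, 200, 50

x = tf.placeholder(tf.float32, [None, x_m, x_n])  # input: x_m one-hot steps
y = tf.placeholder(tf.float32, [None, x_n])       # label: the next one-hot value

w = tf.Variable(tf.random_normal([w_m, x_n]))     # weight matrix (w_m, w_n = x_n)
b = tf.Variable(tf.random_normal([x_n]))          # bias (b_m = x_n)

# Split x into x_m small matrices, one per RNN cell
# (the role x_r and x_s play in the source).
x_split = tf.unstack(x, x_m, axis=1)

cell = tf.nn.rnn_cell.BasicRNNCell(w_m)           # w_m neurons inside each cell
outputs, states = tf.nn.static_rnn(cell, x_split, dtype=tf.float32)

pred = tf.matmul(outputs[-1], w) + b              # prediction from the final output
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=pred))
optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)
```

The important structural point is visible even in this sketch: the input is split into one slice per cell, the cells share a state that carries information forward, and only the final output is pushed through the weight and bias matrices to produce a prediction over the one-hot bins.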

It's All in the Details: Calibration

Before running the RNN, it is important to calibrate the hyper-parameters of the network. How effectively an RNN models a piece of data depends on these hyper-parameters. You can always use over-powered hyper-parameters so that the RNN can model almost any signal; however, the cost of that is memory and time. To do more with less, that is, accurate modeling with low memory and computation requirements, we must calibrate the RNN.

What are these hyper-parameters? They are the batch size, one-hot encoding resolution, RNN size, learning rate, number of epochs and down-sample factor. We will try out many different combinations of hyper-parameters and see how the neural network responds to the data.

Batch size refers to the number of data points that the RNN will be fed at each iteration to produce an output. The more data points we feed into the network (the larger the batch size), effectively the farther back the network is looking to create the prediction. The larger the batch size, the slower the network per iteration. This is because the batch size is directly proportional to the number of RNN cells that are being run at each iteration. More RNN cells equate to more computation and memory.

One-hot encoding, as mentioned above, refers to the discretization of the signal being modeled. Effectively, the size of the one-hot encoding sets the resolution of the RNN. A higher resolution will create a more accurate model but will require more memory and time to train.

RNN size refers to the complexity within each cell; by complexity, I am referring to the number of neurons within the RNN cell. A higher complexity will allow the RNN to model more complicated signals but will also require more memory and time to train and run. RNN size could be considered part of the network architecture; however, I prefer to treat it as a hyper-parameter because it affects accuracy, computation time and memory more significantly than the other hyper-parameters. Thus it is important to calibrate RNN size to your data.

Learning rate refers to the size of the correction the RNN makes whenever it gets something wrong. If the learning rate is too high, the RNN will always over-correct and never reach the correct prediction. On the other hand, if the learning rate is too low, the RNN will take a very long time (maybe forever) to reach the right prediction. In our code, we start with a high learning rate and then repeatedly decrease it by a factor of ten, as sketched below. This lets the RNN make large corrections in the initial stages of training; once it is predicting in a good range of values, the reduced learning rate allows it to learn the nuances of the signal.
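A rough sketch of that "start high, divide by ten" schedule is below. The starting rate, decay factor and stage count are illustrative placeholders, not the repository's actual values:

```python
# Illustrative staged learning-rate schedule.
learning_rate = 0.1
epochs_per_rate = 5
stages = 3                        # 0.1 -> 0.01 -> 0.001

for stage in range(stages):
    for epoch in range(epochs_per_rate):
        pass                      # a training epoch would run here, feeding
                                  # learning_rate into the optimizer
    learning_rate /= 10.0         # drop by a factor of ten per stage
    print("stage", stage, "done, next learning rate:", learning_rate)
```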

The number of epochs refers to the number of times the RNN repeats training on the data. A higher number of epochs gives the RNN more chances to try, fail and correct its predictions. It also means training takes longer.

The down-sample factor is actually related to the dataset, not the RNN directly. Down-sampling refers to reducing the resolution of the dataset. If we down-sample the dataset by a factor of 10, we take every 10th value and ignore the other 9. Down-sampling allows the RNN to look at the data from a more general perspective. Down-sampling and batch size go hand-in-hand: a high down-sample factor combined with a high batch size allows the RNN to look very far back in time, at the cost of resolution (temporal resolution loss for time-series data), as sketched below.
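Here is a small, illustrative sketch of down-sampling and windowing on a toy signal; the factor, batch size and signal itself are placeholder choices rather than the project's settings:

```python
import numpy as np

downsample_factor = 10
batch_size = 10                               # how far back the RNN "looks"

y = np.sin(0.08 * np.arange(10000))           # a toy signal
y_ds = y[::downsample_factor]                 # keep every 10th value, drop the other 9

# Each training example: batch_size past values and the value that follows them.
windows = [(y_ds[i:i + batch_size], y_ds[i + batch_size])
           for i in range(len(y_ds) - batch_size)]

print(len(y), len(y_ds), len(windows))        # 10000 1000 990
```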

At this point, we’ve architected an RNN. What does that mean? Well, it means you have effectively created an artificial neural network that can see through time. Now let’s feed our neural network data, and see it come to life!

Data Generation

What kind of data will we be feeding? Arbitrary signals. Where are we going to get these arbitrary signals? We will make our own! Using a Fourier series, we will generate our own signals which have varying magnitudes, frequencies and complexities. Then we will use these generated signals to train and test our RNN. For your convenience, a quick primer on the Fourier series is below.

The Fourier series was first theorized by the French mathematician and physicist Joseph Fourier. The basic idea is that a periodic function of time can be approximated by an infinite summation of harmonics, the harmonics being sines and cosines, as shown below.
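For reference, the standard textbook form of the series for a function f(x) with period 2L (this general formula is not specific to this project) is:

f(x) = a_0/2 + Σ (n = 1 to ∞) [ a_n·cos(nπx/L) + b_n·sin(nπx/L) ]

where the coefficients a_n and b_n are obtained by integrating f against the corresponding cosine and sine over one period.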


So essentially we will generate a signal via a Fourier series and then model it with an RNN. We can add increasing complexity and size to challenge the RNN's limits! Thank you, Mr. Fourier!

Here Goes Nothing…

I have chosen an arbitrary signal, defined by the Fourier series below, to model with the RNN. It has a varying range of frequencies and magnitudes, which makes it a good test of our system.

y = 2sin(0.08x)+0.02cos(2x)+sin(0.02x+2)+sin(0.001x)
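To generate this signal for training, a NumPy snippet along these lines will do; the x range and step are arbitrary choices here, and the project's own data-generation script may use different values:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 2000, 0.5)
y = (2 * np.sin(0.08 * x)
     + 0.02 * np.cos(2 * x)
     + np.sin(0.02 * x + 2)
     + np.sin(0.001 * x))

plt.plot(x, y)
plt.title("Fourier-series test signal")
plt.show()
```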

Starting with a Batch Size of 10, RNN Size of 50, Downsample Factor of 2 and a One-hot Encoding resolution of 0.01, we get the following result after 45 epochs of training.
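Collected in one place, that starting configuration looks like this (the key names are illustrative; the repository's own parameter names may differ):

```python
config = {
    "batch_size": 10,
    "rnn_size": 50,
    "downsample_factor": 2,
    "one_hot_resolution": 0.01,
    "epochs": 45,
}
```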

As observed, the network is not modeling the data accurately enough. It might be the case that the data has too much resolution and that the RNN needs to look at data farther back in time. We can either increase the batch size or increase the downsampling factor. For the sake of experimenting, let's increase the down-sample factor to 10.

The predictions improved slightly, but the network can still be sharpened further; it is still not able to model the higher-frequency components accurately. We are only running 5 epochs at each learning rate, so let's give the network more time to train at each rate by increasing the number of epochs per learning rate from 5 to 50.

That’s looking good! Increasing the number of epochs gave the RNN more opportunity to learn the nuances of the signal and so it did!

So by simply changing the down-sample factor, batch size and the number of epochs we were able to calibrate the network to model this signal.

Voila!

Well there you have it, we have figured out how to build a signal modeler using RNNs! I hope this makes RNNs more accessible. It might be a good exercise to generate more complex signals and try to model them with the RNN. It might be even better to modify the RNN code provided in this article to model a multi-variable signal (hint: line 66 in rnn_signal_modeller.py).

RNNs are a core building block in the field of neural networks. As CNNs are effective in modeling space, RNNs are effective in modeling time. CNNs are able to create efficient and effective models of data which is inherently large at every iteration (e.g. videos). RNNs, on the other hand, are effective in modeling data where there is an underlying relationship across iterations. It will be interesting to see how RNNs will develop and be applied alongside the other fundamental building blocks in the field of artificial neural networks. I for one am excited to see the future evolution of RNNs.
