Hypertuning an LSTM with Keras Tuner to forecast solar irradiance

Amin Asbai · Published in Analytics Vidhya · Jul 20, 2021

Project Overview

Most of you already know that one of the main issues with photovoltaic energy, and renewable energy in general, is the uncertainty associated with its production. This issue makes it hard to integrate these technologies into the grid. If you have any background in electrical engineering, you will know that to avoid problems on the grid we have to match the production and the consumption of energy at every moment in time. If that does not happen, the grid becomes unstable.

When we use traditional power plants, we can dispatch energy as we want. The grid operators just have to forecast the power consumption and adjust the power plants accordingly. But when we introduce renewable energy sources, we cannot dispatch them at will. Hence, to schedule and coordinate all the different energy sources so that they supply energy in a reliable manner, we need to estimate the renewable energy production.

Forecasting Solar Irradiance

In my project I used a dataset provided by PVGIS that contains a whole year of meteorological data recorded on an hourly basis.

I will use the data from the last 10 days (240 steps) to forecast the solar irradiance for the next day (24 steps).

LSTM Overview

In recent years, LSTM networks have become a very popular tool for time series forecasting. Some people argue that they aren't that good and that they tend to overfit. As with everything in life, I think there is no silver bullet, so you will always have to try different models to choose the right one.


What is LSTM?

Long Short-Term Memory is a kind of recurrent neural network (RNN) architecture. RNNs are mainly used to process sequential data (text, natural language or image captioning) and for time series forecasting. Their main difference from feedforward or convolutional networks is the fact that they have some sort of 'memory': RNNs feed the output back as an input, making the output dependent on prior events.

Why LSTM?

The idea behind RNNs was to build a neural network that is able to learn to use past information. When the useful information is close in time, an RNN can do the job. But if we need to go further back in time, RNNs fail, and here is where LSTM comes into play. LSTMs are capable of keeping the important information, no matter how far back in time it is, and forgetting the useless information.

On the other hand, if we compare it to traditional statistical techniques, an LSTM can capture complex patterns that those techniques would have a hard time with. For example, solar irradiance has multiple seasonalities (a daily one and a yearly one). Capturing that multiple seasonality with, let's say, ARIMA can be very problematic.

If you want to learn more about LSTMs, see this article.

What is hypertuning and why it is important?

In every machine learning algorithm we have parameters and hyperparameters. Parameters are learned by the algorithm itself during training. Hyperparameters, on the other hand, are set manually by the user.

If you are familiar with neural networks, you will realize there are no clear guidelines or formal procedures for designing a neural network and choosing its hyperparameters. So, trial and error and intuition are usually what we rely on.

To make that process a bit more efficient, Keras has developed a hypertuner (Keras Tuner), which basically allows you to easily configure a search space over which a search algorithm looks for the best hyperparameter combination.

Disclaimer

In this post, I will focus just on building the neural network. But, don’t forget to analyze your data before you start building a model. It’s always good to know the data you are working with.

Let’s code

First of all, we have to import all the modules and packages we will use. They may change according to your project's needs.
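
A minimal set of imports for the snippets in this post could look like the following (assuming TensorFlow 2.x and the keras-tuner package; on older versions the last import may be `from kerastuner.tuners import RandomSearch` instead):

```python
import pickle

import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense
import keras_tuner as kt  # 'kerastuner' on older versions of the package
```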

Data loading

Then, we will load the dataset and clean the data. We won't go into data cleansing because it is beyond the scope of this tutorial.
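
As a minimal sketch, loading could be as simple as reading the hourly data into a pandas DataFrame (the file name, the datetime column name and the cleaning steps below are assumptions; adapt them to your own PVGIS export):

```python
# Hypothetical file name and datetime column; assumes the PVGIS metadata
# header/footer rows have already been stripped from the CSV.
df = pd.read_csv('pvgis_hourly.csv', parse_dates=['time'], index_col='time')
df = df.sort_index()   # make sure the hourly records are in chronological order
df = df.dropna()       # minimal cleaning; do a proper analysis on your own data
```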

Data splitting

Once we have the data, we will split it into a training set and a validation set. As stated before, we want to make a 24-hour-ahead forecast, so we will split the data in a way that both sets are made up of whole days.
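
A simple way to keep whole days in each set is to cut exactly at a day boundary. The helper below is a sketch (the 80/20 ratio is just an example):

```python
def split_data(df, train_fraction=0.8, steps_per_day=24):
    """Split the hourly DataFrame into train/validation sets made up of whole days."""
    n_days = len(df) // steps_per_day
    train_days = int(n_days * train_fraction)
    split_idx = train_days * steps_per_day            # cut exactly at a day boundary
    train = df.iloc[:split_idx]
    val = df.iloc[split_idx:n_days * steps_per_day]   # drop any trailing partial day
    return train, val

train_df, val_df = split_data(df)
```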

Transform it into supervised learning

Then, we have to transform the data into a supervised learning form. That concept is a bit tricky.

In a supervised learning problem we have an input (X) and an output (Y) and the task of our machine learning algorithm is to find a function that maps the input to the output.

When we are working with time series, we have a discrete collection of data. So, we have to reframe that data to a collection of input-output pairs. To do that we will use the previous time steps as input variables and the next step as the output variable.

Let's use an example to make it clear. Suppose we have a simple time series; we can reframe it into a supervised learning problem by pairing each value with the value that follows it:
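
(An illustrative series, not taken from the dataset.)

```
time series:  10, 20, 30, 40, 50

X   ->  Y
10  ->  20
20  ->  30
30  ->  40
40  ->  50
```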

That was a simple case. It gets more complicated when we work with multivariate and multi-step forecasting, as was the case in my project.

If you are not familiar with the concept I highly recommend you to read this article before going further.

To reframe the dataset I used a function along these lines:
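
It is a sliding-window reframing: every sample takes the previous 240 steps of all the features as input and the next 24 values of the target column (the irradiance) as output. The version below is a simplified sketch, and the target column index is an assumption; adapt it to your dataset.

```python
def to_supervised(data, n_in=240, n_out=24, target_col=0):
    """Reframe an array of shape (timesteps, features) into input/output pairs.

    X holds the previous n_in steps of every feature, Y holds the next n_out
    values of the target column (assumed here to be the irradiance).
    """
    X, Y = [], []
    for i in range(len(data) - n_in - n_out + 1):
        X.append(data[i:i + n_in, :])
        Y.append(data[i + n_in:i + n_in + n_out, target_col])
    return np.array(X), np.array(Y)
```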

Scale the data

Neural networks work better and converge faster with scaled data. There are plenty of articles out there explaining the math behind it. For my application I used the MinMaxScaler provided by scikit-learn, but you can use more sophisticated scalers.

The function scale_data takes the training set and the validation set as arguments. It creates one scaler for the features and another for the target, scales the data, and returns the scaled datasets together with the scalers we used.
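
A sketch of such a function is shown below ('G(i)' is assumed to be the name of the irradiance column in the PVGIS data; change it to match your dataset):

```python
def scale_data(train_df, val_df, target_col='G(i)'):
    """Fit the scalers on the training set only and apply them to both sets."""
    x_scaler = MinMaxScaler()
    y_scaler = MinMaxScaler()

    # Fit on the training data only, then reuse the same scaler on the validation data.
    train_scaled = x_scaler.fit_transform(train_df)
    val_scaled = x_scaler.transform(val_df)

    # A separate scaler for the target column makes it easy to invert the predictions later.
    y_scaler.fit(train_df[[target_col]])

    return train_scaled, val_scaled, x_scaler, y_scaler
```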

Please, don't scale the training and the testing sets together. It would leak test information back into the training set. The right way to do this is to fit the scaler on the training set only and then use that same scaler to scale the test data.

Also, remember to save your scaler, because in the deployment phase you will need it to scale the incoming streaming data.

Put it all together!!!

Now that we have built all the functions we need, it's time to put it all together and start training our neural network.
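
Wiring the sketches above together could look like this (note that we scale before reframing so the scalers work on 2D data; 'G(i)' is again the assumed irradiance column):

```python
train_df, val_df = split_data(df)
train_scaled, val_scaled, x_scaler, y_scaler = scale_data(train_df, val_df)

target_idx = list(df.columns).index('G(i)')   # position of the (assumed) irradiance column

X_train, Y_train = to_supervised(train_scaled, n_in=240, n_out=24, target_col=target_idx)
X_test, Y_test = to_supervised(val_scaled, n_in=240, n_out=24, target_col=target_idx)
```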

Keras Tuner

Until now, we have done nothing special. We have just prepared our data for training a neural network. But now we are going to apply the Keras Tuner magic!!!

First, we have to create a function where we will define our model's search space.
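
The model-building function looks roughly like the sketch below (the hyperparameter names and ranges are illustrative, not my exact original values; X_train and Y_train come from the previous step):

```python
def build_model(hp):
    """Model-building function: the `hp` object defines the hyperparameter search space."""
    model = Sequential()

    # First LSTM layer: input_shape = (backward steps, number of features).
    model.add(LSTM(units=hp.Int('input_units', min_value=32, max_value=512, step=32),
                   return_sequences=True,
                   input_shape=(X_train.shape[1], X_train.shape[2])))

    # A variable number of additional LSTM layers.
    for i in range(hp.Int('n_layers', 1, 4)):
        model.add(LSTM(units=hp.Int(f'lstm_{i}_units', min_value=32, max_value=512, step=32),
                       return_sequences=True))

    # Last LSTM layer does not return sequences, because a Dense layer follows.
    model.add(LSTM(units=hp.Int('last_lstm_units', min_value=32, max_value=512, step=32)))
    model.add(Dropout(hp.Float('dropout_rate', min_value=0.0, max_value=0.5, step=0.1)))

    # Output layer: 24 values (a whole day), with an activation that keeps them positive.
    model.add(Dense(Y_train.shape[1],
                    activation=hp.Choice('dense_activation', values=['relu', 'sigmoid'])))

    model.compile(loss='mse', optimizer='adam', metrics=['mse'])
    return model
```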

Here, I will try to break the code down to make it as understandable as possible.

The first thing we have to do is instantiate Sequential(); it will be our stack of layers.

I used an n_layers x LSTM + Dropout + Dense architecture, but you could use any other. As you may have noticed, selecting the architecture is tricky and requires hyperparameter tuning.

What is 'hp'?

hp is an object that we pass to the model-building function and that allows us to define the search space of the hyperparameters.

First LSTM layer

Setting the first LSTM layer is a bit tricky. When we reframed our time series into a supervised learning problem, we created a 3D array. The first dimension is the number of input-output pairs and it is inferred by the layer itself. The second dimension is the number of backward steps; for example, I used the data of the 10 previous days (240 steps). The third dimension is the number of features used, in my case 8 (temperature, irradiance, pressure, etc.). The second and third dimensions have to be set explicitly on the first LSTM layer via the input_shape argument.

Another important thing is to set return_sequences to True (False is the default value). Setting return_sequences to True makes the output usable by another LSTM layer. If the next layer is not an LSTM, just leave return_sequences at its default value.

If you want to learn more about it, check that post.

Choosing the number of neurons

There is no way of knowing the right number of neurons a layer should have. That’s why we use the hp object to define a range of values the hyperparameter can take.
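
This is the corresponding line from the sketch above:

```python
model.add(LSTM(units=hp.Int('input_units', min_value=32, max_value=512, step=32),
               return_sequences=True,
               input_shape=(X_train.shape[1], X_train.shape[2])))
```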

In the code above we are telling the Tuner to use values between 32 and 512 with a step of 32.

Choosing the number of layers

As with the number of neurons, there is no way of knowing the optimal number of layers; that's why we insert an hp.Int() in a for loop.
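
Again from the sketch above:

```python
for i in range(hp.Int('n_layers', 1, 4)):
    model.add(LSTM(units=hp.Int(f'lstm_{i}_units', min_value=32, max_value=512, step=32),
                   return_sequences=True))
```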

Here, on every trial range() will take a new value. For example, on one trial 'n_layers' may take the value 1, which means the loop will run range(1), so we will add 1 LSTM layer; on another it could take the value 4 and add 4 layers.

Dropout layer

To avoid overfitting the neural network, we add a dropout layer. The dropout layer randomly disables neurons, and the fraction of neurons disabled is set by the dropout rate. If the dropout rate is 0.5, the dropout layer will disable half of the neurons on every training iteration.

To find the right dropout rate, we will use hp.Float():
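
(Excerpt from the sketch above; the 0 to 0.5 range and the 0.1 step are illustrative.)

```python
model.add(Dropout(hp.Float('dropout_rate', min_value=0.0, max_value=0.5, step=0.1)))
```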

Intuitively, you may notice it works similarly to hp.Int(): we set a range of values and a step, and the Tuner will choose values from that range according to the chosen search algorithm.

Dense Layer and activation function

In most LSTM deep learning architectures there is a final dense layer. In my case, I had to choose an activation function that ensures positive values (it doesn't make sense to have negative irradiance). To ensure positive values I chose to use either a relu or a sigmoid activation function. As usual, there is no way of knowing which one will perform better. But luckily for us, Keras Tuner provides hp.Choice(), which allows us to set a list of possible choices from which values can be taken.
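
From the sketch above:

```python
model.add(Dense(Y_train.shape[1],
                activation=hp.Choice('dense_activation', values=['relu', 'sigmoid'])))
```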

In the code above, I'm telling the dense layer the number of outputs I want (Y_train.shape[1]) and I'm setting the activation function to be either relu or sigmoid. Remember, Y_train.shape[1] is 24, i.e. a whole day.

Compile the model

Once we have built our model, we have to compile it. We have to set 3 parameters (see the line after this list):

  • Loss: It defines the metric we will use to measure the error of the model during the training phase.
  • Optimizer: It defines the optimization technique we will use to update the network's weights. It's a tricky concept; look at this article if you want to know more.
  • Metric: It is the metric we will use to evaluate our model. It is similar to the loss, but it is not used during training.
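
In the sketch above this is a single line (Adam and MSE are my choices here, but you could tune the optimizer as well):

```python
model.compile(loss='mse', optimizer='adam', metrics=['mse'])
```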

Create a Tuner object

Now that we have defined our model-building function, we can create a tuner object. When we create the tuner object, we set the search algorithm we want to deploy.

I decided to use Random Search for simplicity, but you can choose another one (the code may change a bit); the documentation is pretty clear, so you shouldn't have problems.

Then, when instantiating the tuner class, we have to set some parameters.

We have to pass a hypermodel (the model-building function we just created). We have to set the 'objective'; that's the metric the search algorithm will use to evaluate every model trial. As we are dealing with a forecasting problem, I'm using MSE as the metric.

Now come the two parameters I struggled the most to understand at the beginning, but they are actually pretty simple (both appear in the sketch after this list):

  • max_trials: The number of hyperparameter combinations the search algorithm will try. As you may notice, trying every possible combination is unrealistic, so you set how many combinations you want the search algorithm to try.
  • executions_per_trial: As you may be aware, neural networks have a remarkable stochastic component, i.e. you can train the same neural network with the same data and get slightly different models. So, if you are looking for the best accuracy possible, you may want to train the same architecture several times and keep the best result.
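
Putting it together, creating the tuner could look like this sketch (the values, directory and project name are placeholders; you may prefer 'val_mse' as the objective to rank trials on validation data):

```python
tuner = kt.RandomSearch(
    build_model,                       # the model-building function defined above
    objective='mse',                   # metric used to compare trials ('val_mse' is a common alternative)
    max_trials=20,                     # how many hyperparameter combinations to try (illustrative)
    executions_per_trial=2,            # how many times each combination is trained (illustrative)
    directory='tuner_results',         # placeholder output directory
    project_name='solar_irradiance')   # placeholder project name
```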

Once we have created the tuner object, we can use the search method to find the best model.
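
For example (the epoch and batch-size values are illustrative):

```python
tuner.search(X_train, Y_train,
             epochs=50,
             batch_size=32,
             validation_data=(X_test, Y_test))
```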

If you are familiar with neural networks you should know what epochs and batch_size are; if not, look at this article.

Getting the best model

After the search is done (it may take a long time), we are ready to get the best model.
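
Keras Tuner keeps the trained models ranked by the objective:

```python
best_model = tuner.get_best_models(num_models=1)[0]
best_model.summary()
```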

Now that we have our model, I will show you how to use the predict method.

X_test[0] is a 2D array with 240 time steps (10 days of data) and 8 features (the meteorological features). As you remember, an LSTM only takes 3D arrays, therefore we have to reshape X_test[0] into a 3D array.
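
A sketch of the reshape and the prediction, reusing the y_scaler from earlier to bring the output back to physical units:

```python
# The model expects a 3D input: (samples, timesteps, features).
sample = X_test[0].reshape(1, X_test.shape[1], X_test.shape[2])   # (1, 240, 8)
prediction_scaled = best_model.predict(sample)                    # shape (1, 24)

# Undo the MinMax scaling of the target to get irradiance in W/m^2.
prediction = y_scaler.inverse_transform(prediction_scaled.reshape(-1, 1)).reshape(-1)
```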

Save the model and the scalers

Once you have built and trained a neural network, you may want to keep the model to use it later or to deploy it.

I recommend saving it as an .h5 file.
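
For example (the file name is a placeholder):

```python
best_model.save('best_lstm_model.h5')
```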

As I mentioned before, you should also save the scalers you used in the training phase so you can reuse them later with streaming data. I recommend saving them as a pickle.
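
For example (again, the file name is a placeholder):

```python
with open('scalers.pkl', 'wb') as f:
    pickle.dump({'x_scaler': x_scaler, 'y_scaler': y_scaler}, f)
```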

Conclusion

My main objective with this post was to give an idea of how to use Keras Tuner and how to use LSTM layers in a deep learning context.
