Forecasting the Wind Energy Production From Wind Farms to Maximize Profits of the Wind Energy Producer Using Machine Learning on Historical Energy and Wind Forecast Data

Yonathan Wijaya · Published in The Startup · Jul 26, 2020

Photo of a wind farm (ssuaphotos/Shutterstock)

Introduction

Many renewable energy sources, such as wind and solar power, depend on the environment. Wind energy specifically depends on wind speed and direction. "Wind energy can only be produced when the wind blows" is a common objection to the adoption of wind power: modern society depends on steady, predictable sources of energy, which favors fossil fuels, nuclear energy, and similar sources.

Objective

The objective of this project is to play the role of an energy trader. Energy traders are companies that help predict and trade many aspects of energy, including the expected production of wind energy. In this project, the energy trader's role is to cover any production shortfall through financial instruments and to maximize the client's profit, using our energy production forecast and the given trading algorithm.

The goal is to get a T+18 hour (18 hours into the future) energy forecast, every hour.

Trading Algorithm

  • Your client is paid 10 euro cents per kWh sold to the grid. You can only sell to the grid what you have forecasted for that date.
  • If the actual energy production exceeds the forecast, this excess is absorbed by the grid, but your client is not compensated for this excess.
  • If the actual energy production is below the forecast, you must buy energy from the spot market (at 20 euro cents per kWh) to supply to the grid. You are given a starting cash reserve of 10,000,000 euro cents to buy energy from the spot market. The amount you can buy is limited by the cash you have at hand, and you must have a positive balance to buy.
  • If you have less cash in hand than required to purchase the shortfall, you are fined 100 euro cents per kWh for the amount you cannot buy. This is recorded as a negative value (debt) and added to your cash at hand; debt is cumulative. (A small code sketch of these rules follows below.)
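
To make these rules concrete, below is a minimal Python sketch of how the hourly settlement could be simulated. The function and variable names are my own, and the reading of the payment rules (the client is paid only for what is actually supplied, with fines accruing as debt) is an assumption, not the provided trading algorithm.

# Minimal sketch of the settlement rules above (all prices in euro cents per kWh).
# This is one illustrative reading of the rules, not the provided trading algorithm.
SELL_PRICE = 10    # paid per kWh sold to the grid
SPOT_PRICE = 20    # cost per kWh bought from the spot market
FINE = 100         # fine per kWh of shortfall that cannot be bought

def settle_hour(forecast_kwh, actual_kwh, cash):
    """Return the updated cash balance after one hour of trading."""
    shortfall = max(forecast_kwh - actual_kwh, 0.0)
    affordable = max(cash, 0.0) / SPOT_PRICE      # can only buy with a positive balance
    bought = min(shortfall, affordable)
    uncovered = shortfall - bought
    supplied = forecast_kwh - uncovered           # production above forecast is absorbed, unpaid
    cash += supplied * SELL_PRICE                 # revenue from energy sold to the grid
    cash -= bought * SPOT_PRICE                   # cost of covering the shortfall
    cash -= uncovered * FINE                      # unpayable shortfall accrues as debt
    return cash

cash = 10_000_000                                 # starting cash reserve in euro cents
for forecast, actual in [(120.0, 100.0), (80.0, 95.0)]:   # toy hourly values
    cash = settle_hour(forecast, actual, cash)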

Datasets

There are 2 main datasets: Wind Energy Production and Wind Forecasts.

Wind Energy Production

The wind energy production data comes from Réseau de transport d’électricité (RTE), the French energy transmission authority. Near-realtime wind energy actuals are derived from RTE’s online database, which we averaged and standardized to a 1-hour time base. The dataset is called energy-ile-de-france, since it contains the consolidated wind energy production for the Ile-de-France region surrounding Paris.

  • Data is provided from 01 Jan 2017 to the present.
  • The energy production values are in kWh.

Wind Forecast

We also used the wind forecasts for 8 locations in the Ile-de-France region. Each location represents a major wind farm in Ile-de-France.

The data is provided by Terra Weather. Each location's forecast contains 2 variables, wind speed and wind direction, so there are 8 x 2 = 16 forecast series in all. The forecasts are available from 01 Jan 2017 to the present.

Wind speed is given in m/s and wind direction as a bearing (degrees from North). Wind directions are "from" bearings, e.g. a wind direction of 45 degrees means the wind blows from the northeast.

Data Fusion

Since the data are spread across 17 series (1 energy production series plus 16 wind forecast series), we need to fuse them into a single dataset for training. We therefore fuse the extracted databases into one table with 3 columns per hour: (1) the energy produced, (2) the wind speed averaged over all 8 locations, and (3) the wind direction averaged over all locations.

All data have been interpolated to a time-base of 1 hour.
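
For illustration, here is a rough pandas sketch of this fusion step. The file and column names are assumptions, since the original data-loading code is not shown; note also that a plain mean of bearings ignores the wrap-around at 360 degrees, a difficulty that comes up again in the conclusion.

import pandas as pd

# Illustrative fusion sketch; file and column names are assumptions.
energy = pd.read_csv("energy-ile-de-france.csv", index_col=0, parse_dates=True)

speeds, directions = [], []
for loc in range(1, 9):                                   # 8 forecast locations
    fc = pd.read_csv(f"forecast_loc{loc}.csv", index_col=0, parse_dates=True)
    fc = fc.resample("1H").mean().interpolate()           # common 1-hour time base
    speeds.append(fc["wind_speed"])
    directions.append(fc["wind_direction"])

fused = pd.DataFrame({
    "energy_kwh": energy["production"].resample("1H").mean().interpolate(),
    "wind_speed": pd.concat(speeds, axis=1).mean(axis=1),          # average over the 8 locations
    "wind_direction": pd.concat(directions, axis=1).mean(axis=1),  # simple mean of bearings
}).dropna()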

Figure 1. Sample of the data after data fusion

Data Pre-processing

Normalize the data

Looking at the data from figure 1 above, it is obvious that we need to normalize them. Normalizing the data generally speeds up learning and leads to faster convergence. It ensures that the magnitude of the values that a feature assumes is more or less the same. If the inputs are of different scales, the weights connected to some inputs will be updated much faster than other ones and this generally hurts the learning process.

To summarize, normalization helps because it ensures (1) that both positive and negative values are used as inputs to the next layer, which makes learning more flexible, and (2) that the network's learning treats all input features to a similar extent.

Normalization

After normalizing the data, we split the dataset into a training set and a test set with a 70:30 ratio. The mean and standard deviation of each column were used for the normalization.

The formula used to normalize each column is: X_norm = (X - X_mean) / X_sigma, where X_mean is the column mean and X_sigma the column standard deviation.
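
As a sketch, the normalization and split might look like the following, assuming the fused DataFrame from the fusion step above.

import numpy as np

# Sketch of the column-wise normalization and the 70:30 split.
values = fused.values.astype(np.float64)

mean = values.mean(axis=0)
std = values.std(axis=0)
normalized = (values - mean) / std        # (X - X_mean) / X_sigma, per column

split = int(len(normalized) * 0.7)        # 70% training, 30% test, kept in time order
train, test = normalized[:split], normalized[split:]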

Persistence

Persistence will be used as the benchmark for our model: the persistence forecast simply uses the energy observed at T+0 as the prediction for T+18. We used MAE (Mean Absolute Error) as the metric because it scales roughly linearly with the financial loss ("being off by ten is just twice as bad as being off by five"), even though it does not capture the asymmetry of the trading rules. This makes it more useful for a financial problem like ours, since it helps train the network toward profit rather than just minimizing a generic loss.
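
As a concrete sketch, the persistence benchmark MAE can be computed as follows; the array names come from the normalization sketch above and are assumptions.

import numpy as np

# Persistence baseline: predict the energy at T+18 with the energy observed at T+0.
HORIZON = 18                                    # forecast horizon in hours

energy = normalized[:, 0]                       # normalized energy column
persistence_pred = energy[:-HORIZON]            # value at T+0 ...
persistence_true = energy[HORIZON:]             # ... compared against the value at T+18

persistence_mae = np.mean(np.abs(persistence_true - persistence_pred))
print(persistence_mae)                          # the benchmark reported later is 0.6481707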

Data Windowing and Features Engineering

We use the energy at T+18 (i.e. 18 hours into the future) as the actual output that needs to be predicted, and the energy at T+0 for the difference network. For the input features, we selected the following for our current model:

  1. The mean, difference, max, and standard deviation values of the energy generated (column 1) from period T+0 to T-18 (18 hours into the past)
  2. Values of the forecasted wind speed (column 2) at T+0 and T+18
  3. The mean values of forecasted wind speed from period T+18 to T+0
  4. The maximum values of forecasted wind speed from period T+18 to T+0
  5. The difference of forecasted wind speed from period T+18 to T-18
  6. Values of the forecasted wind direction (column 3) at T+0 and T+18
  7. The difference of forecasted wind direction from period T+18 to T-18
  8. The spatial average of forecasted wind direction from period T+18 to T+0

The idea behind (1) is to learn the pattern and the likelihood of an energy spike at a certain time. The difference features help reduce lag, which usually gives better results for time series data.

The idea behind the wind forecast features is to account for the wind conditions driving energy production at a given time, and to capture the wind conditions of the day as a whole, so the model can anticipate spikes of increasing or decreasing energy. For (5) and (7) we use the period from T+18 to T-18 so that the network can better learn the pattern of change in wind conditions, since they do not change abruptly. A sketch of this windowing is shown below.
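
Here is a pandas sketch of how these windowed features could be built from the fused hourly DataFrame (unnormalized, for clarity). The exact implementation in the project may differ, so treat the column names and rolling windows as illustrative.

import pandas as pd

# Sketch of the windowed features, assuming the fused hourly DataFrame from above.
H = 18
df = fused.copy()

feats = pd.DataFrame(index=df.index)
# (1) statistics of past energy over the period T-18 .. T+0
past = df["energy_kwh"].rolling(window=H + 1)
feats["energy_mean"] = past.mean()
feats["energy_max"] = past.max()
feats["energy_std"] = past.std()
feats["energy_diff"] = df["energy_kwh"] - df["energy_kwh"].shift(H)

# (2)-(5) forecasted wind speed features
feats["speed_t0"] = df["wind_speed"]
feats["speed_t18"] = df["wind_speed"].shift(-H)
future_speed = df["wind_speed"].shift(-H).rolling(window=H + 1)        # period T+0 .. T+18
feats["speed_mean"] = future_speed.mean()
feats["speed_max"] = future_speed.max()
feats["speed_diff"] = df["wind_speed"].shift(-H) - df["wind_speed"].shift(H)   # T+18 minus T-18

# (6)-(8) forecasted wind direction features
feats["dir_t0"] = df["wind_direction"]
feats["dir_t18"] = df["wind_direction"].shift(-H)
feats["dir_diff"] = df["wind_direction"].shift(-H) - df["wind_direction"].shift(H)
feats["dir_mean"] = df["wind_direction"].shift(-H).rolling(window=H + 1).mean()

target = df["energy_kwh"].shift(-H)            # energy at T+18 to be predicted
data = feats.assign(target=target).dropna()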

Model Building

Configuration and Network

We ran the program with multiple configurations to find the parameters that give the best result. The parameters that currently suit us best are stated below:

[dropouts=0.1; bottleneck-size=10; neural network size=256; iters=10000; solver-type=Adam]

As for the network, we use an autoencoder and input scaling to get our current best result. The autoencoder is used because it handles a large input dimension well, and input scaling gave us a better result than leaving it out. We did not use clamping, as we did not see a need for it in our case.

The bottleneck-size sets the width of the autoencoder's middle layer. For its activation we used tanh instead of relu because, in our observation, tanh gave a much better result.

For the neural network model, we used a 4-layer network that runs after the autoencoder and input scaling. We used dropout in every layer, with each layer having 2/3 as many perceptrons as the previous one. The activation is relu, and the network ends with a single inner-product layer with one output. We also used weight decay and regularization, both of which gave better results than leaving them out: a weight decay of 0.0001 and L2 regularization.
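
The configuration above comes from a specific training toolkit, so as an approximation only, here is what a comparable network might look like in Keras: a tanh bottleneck standing in for the autoencoder's middle layer, followed by four relu layers that each shrink to 2/3 of the previous width, with dropout, L2 weight decay, a single-output inner-product layer, and the Adam solver. The input dimension and the way the autoencoder is wired in are assumptions.

import tensorflow as tf
from tensorflow.keras import layers, regularizers

# Rough Keras approximation of the described network; sizes and wiring are assumptions.
INPUT_DIM = 13                  # engineered features from the windowing sketch (assumed)
BOTTLENECK = 10                 # autoencoder middle layer ("bottleneck-size")
WIDTH = 256                     # first hidden layer ("neural network size")
L2 = regularizers.l2(0.0001)    # weight decay / L2 regularization

inputs = tf.keras.Input(shape=(INPUT_DIM,))
# tanh bottleneck standing in for the autoencoder's middle layer
encoded = layers.Dense(BOTTLENECK, activation="tanh", kernel_regularizer=L2)(inputs)

# 4-layer network: each layer is 2/3 the width of the previous one, with dropout
x = encoded
width = WIDTH
for _ in range(4):
    x = layers.Dense(width, activation="relu", kernel_regularizer=L2)(x)
    x = layers.Dropout(0.1)(x)
    width = int(width * 2 / 3)

output = layers.Dense(1)(x)     # single inner-product layer with one output

model = tf.keras.Model(inputs, output)
model.compile(optimizer=tf.keras.optimizers.Adam(), loss="mae")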

Figure 2. Best config’s training and testing losses

Looking at the graph in figure 2 above, we can see that our model can definitely improve further. Over the 25 repeats we ran to ensure that our configuration is consistent and that we obtained the best result, the test loss we achieved is 0.492406.

With this test loss, our model beats the MAE of the persistence benchmark from the pre-processing step, which is 0.6481707.

We ran this configuration, our current best, to get the prediction results shown below:

Figure 3. The training prediction
Figure 4. The test prediction

From the graph in figure 4 above, we can see that the predicted spikes are not perfect. However, they are not bad either, as we did not get any nasty spurious spikes at either the bottom or the top of the range. This shows that our model can predict novel data and is actually learning, not memorizing.

Figure 5. Actual VS prediction plot

As we can see from figure 5 above, the fit is not perfect and can be improved further. However, the points form a roughly linear shape, which suggests that our approach is on the right track and we are moving in the right direction.

Figure 6. The lagged correlation

The graph in figure 6 above shows the lagged correlation of our model. Currently the test correlation still peaks at a lag of 1, which is not ideal, as we would like both the training and test correlations to peak at a lag of 0. However, it is by no means a bad result, since the peak is close to 0.

Final Profit

We ran our model through the trading algorithm that was given to us in order to test whether our predictions could actually earn the client a profit. This simulation used the provided data, not live data.
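
For illustration, such a profit run could reuse the settle_hour sketch from the Trading Algorithm section, feeding it the model's hourly forecasts alongside the actual production. The variable names here are assumptions, not the project's actual simulation code.

# Hypothetical profit run: pair each hourly forecast with the actual production.
cash = 10_000_000                       # starting reserve in euro cents
for forecast, actual in zip(predicted_kwh, actual_kwh):   # de-normalized hourly values
    cash = settle_hour(forecast, actual, cash)
profit = cash - 10_000_000              # final profit in euro cents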

The final result and profit that we got are shown in the picture below:

Figure 7. Final profit

Conclusion and Review

Overall, we think that, with our current understanding, we achieved a fairly good and satisfying result. Notably, we did not end up with a negative profit; although the profit was not high, we still made some. This shows that we are on the right track and that this model can still be improved further to get better results.

From my observations, I believe that more comprehensive input features and a more suitable normalization method could yield a better prediction model, though this goes beyond my current understanding. The wind forecast data are tricky because wind direction is expressed in degrees, making it a circular quantity to handle.

Nevertheless, we have developed a neural network model that is capable of predicting wind energy production from wind farms and that energy traders can use to help their clients with this problem.

Written by: Yonathan Wijaya


A Computer Science student majoring in Intelligence Systems with an interest in Data Science, currently in an internship program as a Data Scientist.