Using Deep Learning to Forecast a Wind Turbines Power Output

Implementing a Long Short-Term Memory (LSTM) model on time series data with a walk-forward validation approach

Published in

The Startup

9 min readJan 28, 2021

Wind or sunshine, there is always an opportunity to harness energy from the elements. This could not be more accurate for the renewable energy sector; however, predicting the intermittency of renewable technology can be challenging.

This post will focus on how to make predictions on the power output of a wind turbine. I will go into detail about the data, additional features added to the dataset and the deep learning model logic to make two-weeks of hourly predictions.

Scroll down to the bottom to see the final result!

The Data

With any data science project, it is essential to have a sound understanding of the data. However, as the purpose of this article is to apply a deep learning model on a time-series dataset, I will give a brief overview of the data, in addition to the feature engineering process and a few insights through Exploratory Data Analysis (EDA).

The original dataset is from the Kaggle website. It can be found here.

The dataset includes observations of a wind turbine located in Turkey throughout 2018. The dataset has no missing values, and the features include:

Date/Time: in 10 min intervals
LV Active Power (kW): the power generated by the turbine
Wind Speed (m/s): speed of the wind at hub height
Theoretical Power (kW): given by the turbine manufacturer
Wind Direction (°): wind direction at hub height

Brief EDA & feature engineering

Even though maintenance is a key requirement to keep wind turbines running efficiently, the purpose of the model is to predict the intermittency of renewable technology; therefore observations meeting the criteria: Wind Speed ≥ 3.3 m/s and LV Active Power = 0 are an indication the turbine is under maintenance and are dropped from the dataset.

A Loss feature is added to the dataset. This is the difference between the Theoretical Power and LV Active Power. The x component (x_com) and y component (y_com) of Wind Direction and Wind Speed are appended to the dataset. The equations below show how both components are calculated.

W_s and W_d represent Wind Speed and Wind Direction, respectively. With the addition of x_com and y_com as new features, Wind Direction is dropped from the dataset.

As explained in the problem statement, the model will predict 2-weeks with hourly intervals of the target variable (LV Active Power). Therefore, the dataset is resampled into hourly intervals. The figure below shows the plot of LV Active Power resampled hourly without maintenance.

Hourly LV Active Power in 2018 for the wind turbine. (photo by Amit Bharadwa)

The Autocorrelation Function (ACF) defines how data points in a time-series are related, on average, to the preceding data points. The Partial Autocorrelation Function (PACF) identifies a relationship between an observation and with an observation at a prior time step. The relationship between intervening observations is removed.

The figures below show the ACF and PACF for LV Active Power over 30 days.

ACF and PACF over 30 days. (photo by Amit Bharadwa)

The ACF plot indicates a clear positive correlation between delayed time lags of LV Active Power. The PACF plot shows a time lag of T-1 as the most significant. Due to this test, an additional feature (T_1) is appended to the dataset.

Using the Augmented Dickey-Fuller test, all series in the dataframe prove stationarity.

The figure below shows the dataframe after the feature engineering process, with Wind Direction dropped. This dataframe is used for building the model.

Dataframe used for modelling. (photo by Amit Bharadwa)

You can find my code on data wrangling, feature engineering and EDA for this project here.

The resampled dataset used for modelling can be found here.

Now let’s move on to the fun stuff!!

Modelling with Deep Learning

Long Short Term Memory (LSTM) is an artificial Recurrent Neural Network (RNN) used in deep learning. LSTM ability to learn long sequences of data is a popular choice for time series forecasting.

The typical RNN experiences a phenomenon called the “vanishing gradient problem”, more on this here. In short, LSTM does not encounter this problem, which makes it a great solution for capturing long term patterns in data.

Preprocessing

As mentioned earlier, the dataset contains Date/Times throughout the year 2018. The first step is to split the data into training and testing datasets. The training data includes all Date/Times from January to November inclusive. December is used for testing.

import pandas as pd df = pd.read_csv('data/hourly_nm.csv',index_col='Date/Time')start_test = '2018-11-31'train, test = df.loc[:start_test],df.loc[start_test:]

Secondly, as the default LSTM activation function is tanh, all values from the test and train dataset are scaled from -1 to 1.

from sklearn.preprocessing import MinMaxScalerSCALER = MinMaxScaler(feature_range=(-1,1))scaler = SCALER.fit(train.to_numpy())train_scaled = scaler.transform(train.to_numpy())
test_scaled = scaler.transform(test.to_numpy())

Lastly, an LSTM layer requires a 3-dimensional input shape [samples, timesteps, features]. The function below does exactly this. The predictor variables (Wind Speed, Theoretical Power, Loss, x_com, y_com, T_1) are returned as one array, following the required input shape of an LSTM layer, with a time step of 2-weeks (2 x 7 x 24) observations. The target variable (LV Active Power) is returned as another array.

import numpy as nptimestep = 24*7*2 # 24hours,7days,2weeksdef create_dataset(dataset, timestep=timestep):
    
    X, y = [], []
    for i in range(len(dataset)):
        target_value = i + timestep
        if target_value == len(dataset):
            break
        feature_chunk, target = dataset[i:target_value, 1:],   dataset[target_value, 0]
        X.append(feature_chunk)
        y.append(target)
    
    return np.array(X), np.array(y)

The diagrams below illustrate how the create_dataset() function works, how a sample (2-weeks of observations) of the predictor variables is mapped to a single observation of the target variable.

Visual representation of the **create_dataset()** function. (photo by Amit Bharadwa)

Following the data splitting convention (X_train, y_train, X_test, y_test) by transforming train_scaled and test_scaled using the create_dataset() function, the code below shows the output shape of X_train, y_train, X_test, y_test.

X_train, y_train = create_dataset(train_scaled)
X_test, y_test = create_dataset(test_scaled)print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)

(7679, 336, 6)
(7679,)
(408, 336, 6)
(408,)

Walk-forward validation logic

To make the most accurate forecast at a time (t), a model would need the latest time step (t-1). A walk-forward validation method follows this approach. The steps behind this method are as follows:

The first step is to build an LSTM model with the training data.
The second step is to make a single prediction for the next timestep (t+1).
The third step is to store the prediction so it can be compared with the real value.
The fourth step is to add the actual value to the training data.
The last step is to repeat this process from step 2, for the length of the predictor variable testing data (X_test).

Putting it all together

Firstly, an LSTM neural network is built and trained. The diagram below illustrates the LSTM model architecture.

The architecture of the LSTM neural network. (photo by Amit Bharadwa)

The hyperparameters for the model are as follows:

Number of Epochs = 35
Batch Size = 14
ADAM Optimizer (learning rate = 0.0005)
Dropout = 0.05
Validation split = 30%

import keras
from keras.layers import LSTM, Dense, Dropout
from keras.callbacks import EarlyStoppingdef create_model(X_train, y_train):
        
        units = 32
        dropout = 0.05
        epochs = 35
        batch_size = 14
        optimizer = keras.optimizers.Adam(learning_rate=0.0005)
        early_stopping = EarlyStopping(patience=7, monitor='loss')        model = keras.Sequential()        model.add(LSTM(units=units, dropout=dropout,      return_sequences=True,input_shape=(X_train.shape[1],    X_train.shape[2])))
        
        model.add(LSTM(units=units, dropout=dropout))
        
        model.add(Dense(units=1))        model.compile(optimizer=optimizer,loss='mean_squared_error')
        
        history = model.fit(X_train, y_train, validation_split=0.3, shuffle=False, epochs=epochs, batch_size=batch_size, verbose=1, callbacks=[early_stopping])
       
        return model, history

The function create_model(), builds an LSTM model with the specified hyperparameters and architecture. It takes in the predictor variables (X_train) and target variables (y_train) and returns a trained model with a history object (more on this during the validation loss section).

def single_prediction(model, history, timestep=timestep):
        
        history = np.array(history)
        history = history.reshape(history.shape[0]*history.shape[1],   history.shape[2])
        
        input_value = history[-timestep:]
        input_value = input_value.reshape(1, input_value.shape[0], input_value.shape[1])
        
        yhat = model.predict(input_value, verbose=0)
        return yhat

The function single_prediction(), returns a prediction for the next time step (t+1) using the trained LSTM model and all data points before time (t) inclusive.

def walk_forward_prediction(X_train, y_train, X_test, timestep):
    
    MODEL, history = create_model(X_train=X_train, y_train=y_train)
    hist_train = [i for i in X_train]
    predictions = []
    
    for i in range(len(X_test)):
        test = X_test[i]
        yhat = single_prediction(model=MODEL, history=hist_train, timestep=timestep)
        predictions.append(yhat) 
        hist_train.append(test)
    
    return predictions, history, MODEL

The walk_forward_prediction() function follows the logic explained in the walk-forward validation logic section. The function returns a list of predictions (values between -1 and 1), the trained LSTM model and the history object.

def prior_inverse(features, targets):
   
    dataset = []
    
    for i in range(features.shape[0]):
        last_row, target = features[i][0], targets[i]
        appended = np.append(last_row, target)
        dataset.append(appended)
    
    return np.array(dataset)

As all the data points have been scaled between -1 and 1. The function prior_inverse() prepares the data into the necessary format to be scaled into a meaningful value.

def experiment(X_train, y_train, X_test, timestep):
    
    pred_seq, history, MODEL = walk_forward_prediction(X_train, y_train, X_test, timestep)
    
    pred_seq = np.array(pred_seq).reshape(-1)    pred = prior_inverse(X_test, pred_seq)
    real = prior_inverse(X_test, y_test)    inv_pred = scaler.inverse_transform(pred)
    inv_real = scaler.inverse_transform(real)    power_pred = inv_pred[:,-1]
    power_real = inv_real[:,-1]
    
    return power_real, power_pred, history, MODEL

The last function experiment() wraps up the logic of all functions together. walk_forward_prediction() creates a model and a list of predictions. The data is then reformated using prior_inverse() and transformed into meaningful values.

Validation loss

Running experiment() starts training the model and returns the predicted power out, actual power output, trained LSTM model and the history object.

power_real, power_pred, history, MODEL = experiment(X_train, y_train, X_test, timestep)loss = history.history['loss']
val_loss = history.history['val_loss']

The end of the training epochs should look like this.

End of training epochs. (photo by Amit Bharadwa)

In this case, the returned history object records the training metric, Mean Squared Error (MSE), at the end of each training epoch. This includes the MSE loss for the training and validation data (30%), for the specified length of training epochs. A visual representation can be seen with the history objects learning curves.

A learning curve is to have an optimal model fit if:

The training loss decreases to the point of stability.
The validation loss decreases to the point of stability.
The generalization gap (the gap between the training and validation loss curve) is minimal.

A plot of MSE loss for the training and validation data can be seen in the figure below. The training and validation set converged at 0.063 and met the requirements for an optimal model.

Training and validation loss (photo by Amit Bharadwa)

Final results

As December is used for testing, 2-weeks of observations for the predictor variables are used to make the first prediction. For this reason, the first prediction is midway through December. The figure below shows a comparison between the true LV Active Power and the LSTM neural network prediction.

2-weeks prediction from LSTM model. (photo by Amit Bharadwa)

Not so bad right!

The model can predict the highs and lows of power production over 2-weeks; however, the model cannot predict periods the turbine is at a complete standstill.

I hope you enjoyed reading this post as much as I have enjoyed writing it. Thank you for your time.

To Get in Touch

Linkedin: https://www.linkedin.com/in/amit-bharadwa123/