Derive Insights using Deep Learning to Optimize Energy Footprints of a Combined Cycle Power Plant

Rohit Malhotra
7 min read · Mar 11, 2022


Photo by Karsten Würth on Unsplash

Introduction

After the recent commitments made at the COP26 summit, we are witnessing a tremendous rush among various stakeholders to chalk out policies, reforms and strategies to decarbonize their economies. This can be done by gradually reducing dependence on fossil fuels while, in parallel, expanding the use of various forms of renewable energy such as solar, wind and green hydrogen. Since new green technologies call for huge capital expenditure, it is imperative, especially for developing countries, to also explore avenues to reduce carbon footprints by optimizing existing fossil-fuel operating practices. This optimization can be achieved by deploying Artificial Intelligence tools in plant operations.

This article illustrates how we can predict the power output of a Combined Cycle Power Plant using a deep learning ANN model. Predictions and insights from the model can be leveraged to optimize, and act in advance on, the key process indicators affecting the efficiency of a power plant.

Problem Statement

We have a dataset containing logged vital parameters of a Combined Cycle Power Plant. The parameters are:

  1. AT: Ambient Temperature
  2. V: Exhaust Vacuum Pressure (steam turbine exhaust)
  3. AP: Ambient Pressure
  4. RH: Relative Humidity
  5. PE: Power Produced by the Plant (the target variable)

The dataset contains 9568 data points collected from a Combined Cycle Power Plant over 6 years (2006–2011), when the power plant was set to work at full load. You can find the data here.

A combined cycle power plant (CCPP) is composed of gas turbines (GT), steam turbines (ST) and heat recovery steam generators.

In a CCPP, electricity is generated by gas and steam turbines combined in one cycle, with heat transferred from one turbine to the other. While the exhaust vacuum is drawn from, and affects, the steam turbine, the other three ambient variables affect the GT performance.

A Schematic Of A Combined Cycle Power Plant

Now we will go through the steps to build an ANN model to predict the power output.

1. Importing the libraries
import numpy as np
import pandas as pd
import tensorflow as tf

2. Data Reading & Pre-Processing

dataset = pd.read_excel('Folds5x2_pp.xlsx')
dataset.head()

The dataset looks like this:

Dataset
dataset.info()
# The output shows that there are no null values in the dataset.
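For an explicit confirmation, a quick sketch counting missing values per column:

# All zeros here means there are no missing values
print(dataset.isnull().sum())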

Now we will separate the dependent and independent variables.

# Separating Dependent and Independent Variables
X=dataset.drop(['PE'],axis=1)
y=dataset['PE']

3. Data Visualization

3.1 Ambient Temperature vs Power Output

Key Insight: Strong negative relation between Power Output and Ambient Temperature.
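The scatter plots themselves are not reproduced here. As a minimal sketch (assuming matplotlib and the column names AT and PE as loaded above), such a plot can be generated like this; analogous plots for V, AP and RH give the insights in 3.2–3.4:

import matplotlib.pyplot as plt
# Scatter of Ambient Temperature (AT) against Power Output (PE)
plt.scatter(dataset['AT'], dataset['PE'], s=2, alpha=0.5)
plt.xlabel('Ambient Temperature (AT)')
plt.ylabel('Power Output (PE)')
plt.title('Ambient Temperature vs Power Output')
plt.show()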

3.2 Exhaust Vacuum Pressure vs Power Output

Key Insight: Strong negative relation between Exhaust Vacuum Pressure and Power Output.

3.3 Ambient Pressure vs Power Output

Key Insight: Strong positive relation between Ambient Pressure and Power Output.

3.4 Relative Humidity vs Power Output

Key Insight: Positive relation between Relative Humidity (RH) and Power Output, though the weakest of the four, as the feature ranking in step 11 confirms.
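These visual insights can also be cross-checked numerically. A quick sketch (assuming the column names as loaded above) that prints the correlation of each feature with the power output:

# Pearson correlation of every column with Power Output (PE)
print(dataset.corr()['PE'].sort_values())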

4. Splitting the dataset into the Training set and Test set

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

5. Scaling of Input Features

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Scaling the input features is important so that gradient descent reaches the
# global minimum faster. Note that the scaler is fitted on the training set only
# and then applied, without refitting, to the test set.

6. Building ANN Model

Initializing the ANN Model

# Initialising the ANN model: creating an instance 'ann' of the Sequential class
ann = tf.keras.models.Sequential()

Adding the input layer

ann.add(tf.keras.layers.Dense(units=15, activation='relu'))
# Adding the input layer with 15 neurons/nodes. There is no need to specify the
# number of input features, as Keras infers it automatically from the data.

Adding the first hidden layer

ann.add(tf.keras.layers.Dense(units=15, activation='relu'))
# Adding a hidden layer with 15 neurons.

Adding the Output Layer

ann.add(tf.keras.layers.Dense(units=1, activation='linear'))
# Adding the final output layer, which predicts the power produced.
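As a side note, the same architecture can equivalently be declared in a single Sequential call; this sketch mirrors the three add() steps above:

# Alternative one-shot declaration of the same network
ann = tf.keras.models.Sequential([
    tf.keras.layers.Dense(units=15, activation='relu'),
    tf.keras.layers.Dense(units=15, activation='relu'),
    tf.keras.layers.Dense(units=1, activation='linear'),
])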

7. Training the ANN Model

Compiling the ANN Model

ann.compile(optimizer = 'adam', loss = 'mean_squared_error')
  • Optimizer: An optimizer is a function or algorithm that modifies the attributes of the neural network, such as its weights and learning rate, thereby helping to reduce the overall loss and improve accuracy.
  • Loss Function: The loss function in a neural network quantifies the difference between the expected outcome and the outcome produced by the model. From the loss function we derive the gradients used to update the weights (see the small numeric sketch below).
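To make the loss concrete, here is a tiny numeric sketch of what mean squared error computes; the values are hypothetical, not taken from the dataset:

# Hypothetical actual power values and model predictions
y_true = np.array([430.0, 445.2, 455.1])
y_hat = np.array([432.1, 443.8, 451.0])
# MSE is the average of the squared differences
mse = np.mean((y_true - y_hat) ** 2)
print(mse)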

Training the ANN Model on Training Set

history = ann.fit(X_train, y_train, validation_split=0.33, epochs=100)
Glimpse of Training Results
  • Accessing Model Training History: Keras provides the capability to register callbacks when training a deep learning model. One of the default callbacks registered when training any deep learning model is the History callback. It records training metrics for each epoch: the loss and the accuracy (for classification problems), as well as the loss and accuracy on the validation dataset, if one is set.
  • Validation Split: A float between 0 and 1, the fraction of the training data to be used as validation data. The model will set apart this fraction of the training data, will not train on it, and will evaluate the loss and any model metrics on this data at the end of each epoch.
  • Epoch: One full iteration of forward and backward propagation over the training data.

The history object is returned from the call to the fit() function used to train the model. Metrics are stored in a dictionary in the history member of the returned object. For example, you can list the metrics collected in a history object using the following snippet of code after the model is trained:
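# Listing the metrics recorded during training
print(history.history.keys())
# Expected for this model: dict_keys(['loss', 'val_loss'])

The recorded losses can then be plotted against the training epochs: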
import matplotlib.pyplot as plt
# Summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
Model Loss Vs Epochs

The graph above shows the model loss on the training and validation data. The loss for both the training and validation sets decreases steadily and the two curves stay close together, which means the model has trained well and shows no sign of overfitting or underfitting.

8. Predicting the results of the Train set

from sklearn.metrics import mean_squared_error, r2_score
y_pred_train = ann.predict(X_train)
r2_score(y_train, y_pred_train)
# We get an R2 score of around 0.93 on the training dataset.

9. Predicting the results of the Test Set

from sklearn.metrics import mean_squared_error, r2_score
y_pred_test = ann.predict(X_test)
r2_score(y_test, y_pred_test)
# We get an R2 score of around 0.94 on the test dataset.
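mean_squared_error is imported above but left unused; as a small sketch, it can complement R2 with an error figure in the plant's own units:

# Root mean squared error of the test-set predictions
rmse = np.sqrt(mean_squared_error(y_test, y_pred_test))
print('Test RMSE:', rmse)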

10. Creating a DataFrame of Predicted Output vs Actual Output

# Converting the 2D prediction array to a 1D array and then to a pandas Series
y_pred_train = pd.Series(np.ravel(y_pred_train))

# Creating a DataFrame of predicted power vs actual power
# (y_train keeps its shuffled index after train_test_split, so reset it to align the rows)
X1 = pd.concat([y_train.reset_index(drop=True), y_pred_train], axis=1)
X1.columns = ['Actual Power', 'Predicted Power']

# Creating an error column: % deviation of prediction from actual
X1['Error'] = (X1['Predicted Power'] - X1['Actual Power']) / X1['Actual Power'] * 100
Actual Power Vs Predicted Power Vs Error

11. Ranking Feature Importance using the Permutation Importance Module of the Sklearn Library

# Permutation feature importance, using a KNN regressor as a surrogate model
# (the importance is computed on this KNN fit, not on the ANN itself)
from sklearn.neighbors import KNeighborsRegressor
from sklearn.inspection import permutation_importance
from matplotlib import pyplot
# Define and fit the model
model = KNeighborsRegressor()
model.fit(X_train, y_train)
# Perform permutation importance
results = permutation_importance(model, X_train, y_train, scoring='neg_mean_squared_error')
# Get importance
importance = results.importances_mean
# Summarize feature importance (X has four columns: AT, V, AP, RH)
for i, v in enumerate(importance):
    print('Feature: %s, Score: %.5f' % (X.columns[i], v))
# Plot feature importance
pyplot.bar([x for x in range(len(importance))], importance)
pyplot.show()
Feature Importance

RANK 1 : Ambient Temperature

RANK 2: Vacuum Pressure

RANK 3: Ambient Pressure

RANK 4: Relative Humidity

12. Conclusion

So we have performed all the steps required to build an ANN model and predict results from it. The R2 value (the metric used to measure the performance of our model) is around 0.93–0.94 for both the train and test data, which is very good. It can be further enhanced using various other tuning tools for ANNs.

The model results show how the process variables are related to the efficiency of a power plant. These variables can be properly monitored and regulated in order to maximize the output. The same concept can be used to harness results at any other industrial plant, such as a refinery or a petrochemical complex.

Thanks for reading.

Keep learning. Keep growing. Keep trying to make things better than they are at present.

You can contact me via my LinkedIn.

Loved reading the article? Become a Medium member to continue learning without limits. I’ll receive a small portion of your membership fee if you use the following link, with no extra cost to you.
