Using Classification Analysis to Improve Efficiency and Sustainability in Conventional Power Plants

Shivang Kumar
4 min read · Nov 11, 2023


Optimising the efficiency and sustainability of power plants is a top priority in the volatile energy sector. Combined Heat and Power (CHP) technology makes it possible to generate electricity and usable heat at the same time, which improves overall energy utilisation. In this blog article, we walk through a classification analysis of a conventional power plant dataset, focusing on CHP capability.

Problem Statement

Our primary goal is to create a predictive model that classifies power plants by their CHP capability. We hope to find trends and extract actionable insights that support operational efficiency, strategic decision-making, and sustainability efforts.

About Dataset

This dataset contains data on conventional power plants for Germany and other selected European countries. The data includes individual power plants with their technical characteristics. These include installed capacity, primary energy source, type of technology, CHP capability, and geographical information. The geographical scope primarily focuses on Germany, Austria, Switzerland, and Luxembourg.

Data Cleaning

Before starting the classification analysis, we did some basic data cleaning, such as removing null values. We removed the id column because all but four of its values were unique, so it carries no useful signal for the model. We removed the block_bnetza, name_uba, street, capacity_gross_uba, chp_capacity_uba, retrofit, shutdown, type, eic_code_plant, eic_code_block, efficiency_data, efficiency_source, energy_source_level_3, merge_comment, and comment columns because more than half of their values were null, which makes them unsuitable for analysis. We also removed the commissioned_original column because it duplicates the information in the commissioned column.
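The cleaning itself was done before loading the data into PredictEasy, and the article does not show how. As a minimal sketch of the same idea, assuming pandas is available, the "more than half null" rule could look like this. The helper name drop_sparse_columns and the toy rows are made up for illustration; they are not the real dataset.

```python
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, max_null_share: float = 0.5) -> pd.DataFrame:
    """Drop every column whose share of null values exceeds max_null_share."""
    null_share = df.isna().mean()  # fraction of nulls per column
    sparse = null_share[null_share > max_null_share].index
    return df.drop(columns=sparse)


# Tiny made-up frame, only to show the mechanics (not the real dataset).
toy = pd.DataFrame(
    {
        "id": ["a", "b", "c", "d"],
        "energy_source": ["Biomass and biogas", "Hydro", None, "Natural gas"],
        "retrofit": [None, None, None, 2005.0],
        "chp": ["yes", "no", "no", "yes"],
    }
)

# Drop mostly-empty columns, then the identifier column, then rows that still contain nulls.
cleaned = drop_sparse_columns(toy).drop(columns=["id"]).dropna()
print(list(cleaned.columns), len(cleaned))
```

Here retrofit is dropped because three of its four values are null, while energy_source is kept and only its single incomplete row is removed.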

PredictEasy Analysis

We select all variables except the chp column as independent variables (x) and the chp column as the dependent variable. The Feature Rank chart from PredictEasy shows that country and latitude do not contribute to the model.

So, to fine-tune the model, we remove country and latitude from the independent variables. We also remove longitude, because latitude and longitude only make sense as a pair.

Feature Rank from PredictEasy

After this first round of fine-tuning, a comparison of the Feature Rank chart and the SHAP chart shows that efficiency_estimate, EEG, energy_source_level_2, postcode, state, and city aren’t contributing either. There was a second reason for removing the EEG variable: most of its values were "no", so the variable was heavily imbalanced.
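An imbalance like the one described for EEG can be checked quickly before modelling. The sketch below is one simple way to do it with the standard library; the function name majority_share and the ten sample values are assumptions for illustration, not figures from the dataset.

```python
from collections import Counter


def majority_share(values):
    """Return the most common value and the fraction of rows it covers."""
    counts = Counter(values)
    value, count = counts.most_common(1)[0]
    return value, count / len(values)


# Made-up column values, for illustration only.
eeg_toy = ["no"] * 9 + ["yes"]
value, share = majority_share(eeg_toy)
print(f"{value!r} covers {share:.0%} of rows")
```

A column where one value covers nearly all rows gives the model very little to split on, which is consistent with it ranking low in the feature charts.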

After further fine-tuning the model in the same way, we arrived at the final model, with an accuracy of 82% and a precision of 82%.
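For readers less familiar with these two metrics, here is a small sketch of how they are computed. It assumes a binary chp label with "yes" as the positive class; the article does not say how PredictEasy computes or averages precision, and the labels below are made up, so this only illustrates the definitions.

```python
def accuracy_and_precision(y_true, y_pred, positive="yes"):
    """Accuracy = correct / all; precision = true positives / predicted positives."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, precision


# Made-up labels, for illustration only.
y_true = ["yes", "yes", "no", "no", "yes"]
y_pred = ["yes", "no", "no", "yes", "yes"]
acc, prec = accuracy_and_precision(y_true, y_pred)
print(f"accuracy={acc:.2f} precision={prec:.2f}")
```

Accuracy answers "how often is the model right overall?", while precision answers "when the model says a plant has CHP, how often is that true?".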

What-If Analysis in PredictEasy

The what-if analysis shows that if energy_source is Biomass and biogas, the model predicts CHP status, which means the power plant is capable of cogeneration.

What-If Analysis by PredictEasy

But if we change energy_source to Hydro and technology to Pumped storage, the model predicts no CHP status.

What-If Analysis by PredictEasy

Likewise, if we change the power plant’s commissioning date to the early 1900s and use an older technology, the model predicts no CHP status.

What-If Analysis by PredictEasy
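The what-if scenarios above all follow the same pattern: take one plant record, change a few inputs, and compare the predictions. A minimal sketch of that pattern is shown below. The what_if helper and toy_predict are hypothetical; toy_predict is a hand-written rule standing in for a trained model, not the PredictEasy model, and the field values are made up.

```python
def what_if(predict, baseline, **changes):
    """Predict for the baseline record and for a copy with some fields changed."""
    scenario = {**baseline, **changes}  # the baseline itself is left untouched
    return predict(baseline), predict(scenario)


def toy_predict(plant):
    """Hypothetical stand-in rule, NOT the PredictEasy model."""
    if plant["technology"] == "Pumped storage":
        return "no"
    return "yes" if plant["energy_source"] == "Biomass and biogas" else "no"


# Made-up plant record, for illustration only.
baseline = {"energy_source": "Biomass and biogas", "technology": "Steam turbine"}
before, after = what_if(
    toy_predict, baseline, energy_source="Hydro", technology="Pumped storage"
)
print(f"baseline: {before}, scenario: {after}")
```

With a real model, predict would be the model's own prediction function, and the comparison would show how sensitive the CHP prediction is to each changed input.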

Key Takeaways and Practical Applications

Our analysis highlights the importance of considering both the energy source and the technology when evaluating a power plant’s CHP capability. Power plant operators can use this information to prioritise improvements and investments in CHP systems, improving energy efficiency and contributing to sustainable energy practices.

Source of Dataset:

https://data.open-power-system-data.org/conventional_power_plants/2018-12-20
