Introduction to Energy Modelling — An ML Approach

Jonathan Legrand
LSEG Developer Community
6 min read · Sep 29, 2022

Read Jason Ramchandani’s full article on the Developer Portal.

This article introduces a simple approach to modelling energy markets. In our case, we will be looking to see whether we can ascertain any impact on German energy prices when the wind blows. Energy markets are quite complex to model; nevertheless, we will have a go at this using some of the tools available in the Python scientific computing ecosystem. I hope to demonstrate that it is possible with a little effort.

We will be using a mix of data from different sources, combining data from both Eikon and the Refinitiv Data Platform as well as our Point Connect service. We will be using some APIs and also some file-based sources. I will include any data files in the code repository so you can replicate the workflow.

Imports and Connections

Start by importing some libraries and setting our credentials.

API Calls

Our code uses the new RDP Search service which is an excellent tool for finding content, in our case Physical Assets such as windfarms — but there are many other search views to help you out such as instruments, people, entities and many more (reference guide). In our code, we provide a filter which contains the query syntax we need to narrow down the Physical Asset universe to the ones we want — wind farms located in Germany.
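As a rough illustration of the kind of filter the article is describing, the sketch below builds a filter string for narrowing Physical Assets to German wind farms. The property names (RCSAssetCategoryLeaf, RCSRegionLeaf) are assumptions for illustration; check the Search reference guide for the exact field names used by the Physical Assets view.

```python
def build_windfarm_filter(country: str) -> str:
    """Build an illustrative RDP Search filter string narrowing
    the Physical Asset universe to wind farms in one country.
    Property names here are assumed, not verified against the API."""
    return (
        "RCSAssetCategoryLeaf eq 'Wind Farm' "
        f"and RCSRegionLeaf eq '{country}'"
    )

filter_str = build_windfarm_filter("Germany")
print(filter_str)
```

This string would then be passed as the filter argument of the search call, alongside the Physical Assets search view.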

Visualization Workflow

We will be creating some visualisations using GeoPandas and shapefiles.

Open Source Wind Data

Next we wish to download some wind data. There is plenty of open-source data available — the data used here is sourced from Deutscher Wetterdienst (see the Further Resources section for more detail). It comes from various weather monitoring stations in Germany, which record wind speed at a height of 10 metres. Maybe you have a store of such or similar information in your own company. Here we are just reading it into a dataframe, wind. Data basis: Deutscher Wetterdienst, aggregated.

We also add two additional features: Total Wind Speed (measured across all weather stations) and Mean Wind Speed.
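A minimal sketch of those two features, using made-up station columns (the real wind dataframe holds one column per DWD monitoring station):

```python
import pandas as pd

# Hypothetical wind-speed readings (m/s) from three stations.
wind = pd.DataFrame(
    {"station_a": [3.1, 4.2], "station_b": [5.0, 6.1], "station_c": [2.2, 3.3]},
    index=pd.to_datetime(["2022-01-01 00:00", "2022-01-01 01:00"]),
)

station_cols = ["station_a", "station_b", "station_c"]
wind["Total Wind Speed"] = wind[station_cols].sum(axis=1)   # sum across stations
wind["Mean Wind Speed"] = wind[station_cols].mean(axis=1)   # average across stations
```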

Locating Weather Monitoring Stations

For more detailed modelling, I have included a section of code where we can download the geographic locations of each of our wind monitoring sites.

Calculate nearest weather station for each windfarm

Here we will calculate the closest weather monitoring station for each of our windfarms using the scipy.spatial.distance cdist routine, which calculates the distance from one point to a range of coordinates. We then select the minimum (closest) one and store it in our dataframe.
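The step above can be sketched as follows, with illustrative coordinates standing in for the real windfarm reference data and DWD station list:

```python
import pandas as pd
from scipy.spatial.distance import cdist

# Made-up (lat, lon) coordinates for two farms and three stations.
farms = pd.DataFrame({"lat": [54.0, 48.5], "lon": [8.9, 11.2]},
                     index=["farm1", "farm2"])
stations = pd.DataFrame({"lat": [53.9, 48.4, 50.0], "lon": [9.0, 11.0, 8.0]},
                        index=["stn_a", "stn_b", "stn_c"])

# Distance from every farm to every station. Euclidean distance in degree
# space is fine for ranking nearby points; use a geodesic metric if you
# need true distances.
dists = cdist(farms[["lat", "lon"]], stations[["lat", "lon"]])

# For each farm, take the station with the minimum distance.
farms["nearest_station"] = stations.index[dists.argmin(axis=1)]
```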

Visualizations using GeoPandas, Shapefiles and Matplotlib

We will now do some visualisations working with GeoPandas and shapefiles.

In this section, we also use hourly air temperature and sum of solar incoming radiation data.

Getting Prices Data (our Labels)

Thus far we have been building our feature matrix (our wind, temperature and solar features) and now we wish to build our label dataset — in our case, hourly EPEX Spot energy prices. All price data on Refinitiv platforms is in GMT, whereas all the other data is in local time (CET), so we need to shift our price data forward by 1 hour to align it correctly. The source is our Point Connect repository — again, I have included this file in the GitHub repo so you can replicate.
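A minimal sketch of the one-hour alignment, with hypothetical prices (a fixed +1h offset as described in the text; note CET is GMT+2 during daylight saving, which a production version would need to handle):

```python
import pandas as pd

# Hypothetical hourly EPEX Spot prices time-stamped in GMT.
idx_gmt = pd.date_range("2022-01-01 00:00", periods=3, freq="h")
prices = pd.DataFrame({"price": [50.0, 48.5, 47.0]}, index=idx_gmt)

# The weather features are in local time (CET = GMT+1 in winter),
# so shift the price index forward one hour to line the two up.
prices.index = prices.index + pd.Timedelta(hours=1)
```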

Creating z-scores

We also wish to create some z-scores to normalise variation across the dataset.

We want to compare the same hourly data for each day, so we also need to create a daily series for each hour in the day; it is these series we want to model.
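The two steps above can be sketched together on made-up data: first the z-score normalisation, then the pivot into one daily series per hour of the day.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly prices over 4 days.
rng = np.random.default_rng(0)
idx = pd.date_range("2022-01-01", periods=96, freq="h")
df = pd.DataFrame({"price": rng.normal(50, 10, len(idx))}, index=idx)

# z-score: centre the series and scale by its standard deviation.
df["price_z"] = (df["price"] - df["price"].mean()) / df["price"].std()

# One daily series per hour of the day: each column holds the
# observations for a single hour (these are the series we model).
hourly = df["price_z"].groupby([df.index.date, df.index.hour]).first().unstack()
```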

Implement Machine Learning Model

For our modelling we will use an xgboost model — a gradient-boosted decision-tree class of algorithm, which is quite popular in the literature.

Split our dataset into train, evaluation and test sets

We need to split our dataset into a training set, an evaluation set and a test set.
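As a sketch, a chronological split (no shuffling, since this is time-series data) might look like the following; the 70/15/15 proportions are an illustrative assumption, not the article's exact split:

```python
import pandas as pd

# Stand-in for the joined feature/label dataframe.
pri = pd.DataFrame({"x": range(100), "y": range(100)})

n = len(pri)
pri_train = pri.iloc[: int(n * 0.70)]            # earliest 70%
pri_eval = pri.iloc[int(n * 0.70): int(n * 0.85)]  # next 15%
pri_test = pri.iloc[int(n * 0.85):]              # most recent 15% (hold-out)
```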

Create feature and label sets

We now need to separate our training (pri_train), evaluation (pri_eval) and test (pri_test) dataframes into their feature and label set components. Here we create a small function to do this for us.
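A small helper of that kind might look like this (column names here are illustrative, not the article's actual ones):

```python
import pandas as pd

def split_features_labels(df: pd.DataFrame, label_col: str):
    """Return (features, labels): every column except label_col,
    and the label column itself."""
    X = df.drop(columns=[label_col])
    y = df[label_col]
    return X, y

pri_train = pd.DataFrame(
    {"wind": [3.0, 4.0], "temp": [1.0, 2.0], "price": [50.0, 48.0]}
)
X_train, y_train = split_features_labels(pri_train, "price")
```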

Define our model to test and hand it our train, eval and test sets

So we are now ready to hand each of our 3 datasets for each hour from 0 to 23 to the XGBoost routine to generate a model for each hour. We define our XGB model using the Regressor class and a squared-error objective function. We then pass the training feature and label sets to the fit function along with our evaluation set. Once the model is generated, we store the feature importance graph, generate a prediction for each hour, and calculate the Mean Squared Error (MSE) of each prediction versus the observed label.

Generate Visual Outputs

We have just created 24 hourly xgboost models. We now want to plot our feature importance (the higher the F-score, the more important the feature) for each hourly series, as well as the accuracy of our ML-generated predictions — using, in our case, the mean squared error (MSE) calculation we created above. We can easily do this in 2 steps.
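For one hourly model, the feature-importance plot might be sketched like this; the F-score values and feature names are made up for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripted plotting
import matplotlib.pyplot as plt

# Hypothetical F-scores for one hourly model (higher = more important).
f_scores = {"Mean Wind Speed": 120, "Total Wind Speed": 95, "Air Temp": 60}

fig, ax = plt.subplots()
ax.barh(list(f_scores), list(f_scores.values()))
ax.set_xlabel("F-score")
ax.set_title("Feature importance (hour 0)")
fig.savefig("importance_h0.png")
```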

Initial Observations

From the line charts above, we can see that the generated models are able to predict some of the downward spikes in spot energy prices. We could expand our model with features that might help us with this demand-side modelling, for example. Another next step might be to use only a subset of the highest F-score features and re-run the models to see if that improves things, or indeed look at a suite of different classes of algorithm — we have only used xgboost, but there are many others that we could implement within the main iterative routine I have given above.

Summary

A lot of ground was covered in this article, so let's summarize. We retrieved a set of physical asset (windfarms, in our case) RICs from our excellent new RDP Search API. We used these to request some reference-type data for these windfarms using our Eikon Data API. We then created a new geometry 'Point' datatype where we stored locational co-ordinates for our windfarms. We followed this by loading some open-source wind data and wind output data from our Point Connect repository and joined these together, creating 2 additional features along the way. We also loaded some open-source solar energy and air temperature data.

We then retrieved the locations of our weather monitoring stations and, as before, created a new geometry datatype for those. We were then able to calculate the closest weather monitoring station for each of our windfarms using the scipy.spatial cdist routine; though we didn't use it in this introductory model other than for visualisation, I included it as it might prove useful were you to develop more complex models. We then downloaded a shapefile for Germany, converted our pandas dataframes to GeoPandas dataframes and plotted overlays for both windfarms and wind monitoring stations.

Next we read in some price data (EPEX Spot) to use for our label data, shifted it forward one hour to align with our local time (CET) features, and then started to normalise our datasets using z-scores. After joining our feature and label dataframes, we added a feature called hour, which tells us which hour the timestamp refers to. We then split our data into training, evaluation and test sets and defined our feature and label sets for each of the 24 hours in the day. Next we defined 24 different XGBoost ML models (one for each hour of the day), handing each the appropriate training and evaluation sets, and plotted our feature importance F-scores.
Finally, we used our model to generate a label prediction based on our hold-out or test feature matrix and plotted that against the observed label values.

Whilst a simple and far from complete example, I hope this article can provide you with an idea of how to approach this area and a practical codebase to experiment and explore further.

Further Resources

Further Resources for Eikon Data API

For Content Navigation in Eikon — please use the Data Item Browser Application: Type ‘DIB’ into Eikon Search Bar.

Further Resources for Refinitiv Data Platform (RDP) APIs
