using pandas to figure out daily energy price spreads

This was an assignment that came to me via a friend (let’s say), who was an energy trader and needed a way to figure out the price spread in daily marginal energy prices. The prices themselves are extracted from daily files that are supplied by energy wholesale suppliers called LMP files (i.e. locational marginal pricing). And they are the typical dense .csv files that one would expect. Energy traders make money by going through these files and figuring out “the spread” between the highest and lowest hourly congestion prices, daily. So lets get to it…

  1. Read the file into Pandas:
import pandas as pd 
LMP = pd.read_csv(‘LMP.csv’, sep=’,’, header=7, index_col=False, skipfooter=1, parse_dates=True, engine=’python’)
the pandas dataframe (congestion and marginal loss prices each numbered 0–23, as hourly rates)

2. Interesting columns: Our friend isn’t really interested in all 80 columns in this dataset and asks for just specific columns that she wants to see. Easily acomplished in pandas (with a dash of python list comprehension):

# extract [date, nodeID, Tpe] columns
LMP_left = LMP[[‘Date’,’PnodeID’, ‘Type’]]
# extract all columns that contain congestionPrice and TotalLMP 
dataCols = [col for col in LMP.columns if 'CongestionPrice' in col or 'Total' in col]
LMP_right = LMP[dataCols]
# combine the two tables to get the final dataframe 
modLMP = pd.concat([LMP_left, LMP_right], axis=1)
the head() of the new datafame with interesting columns only

3. Find the highest congestion prices: we can make a new dataframe for just congestion prices and ask for the highest prices.

# set up a dataframe that contains congestion prices for every hour 
congestion_columns = [col for col in modLMP.columns if ‘CongestionPrice’ in col]
congestion_table = modLMP[congestion_columns]
# find the maximum congestion price by hour 
table.max() shows maximum congestion by hour

4. Find the node ID and time of day of the highest and lowest congestion prices: Pandas doesn’t have a handy function for finding the index of the highest value in the entire table, so we have to write something slightly clever here:

# index of maximum congestion price in main table 
df = congestion_table
s = df.max()[df.max() == df.max(index=1).max()].index
s = str(s[0])
abs_max_index = df.idxmax()[s]
# Node ID of the highest congestion price found 
nodemax = modLMP.loc[abs_max_index, 'PnodeID']
print('max node = '+ str(nodemax))
# the hour that the highest congestion price occured 
high_time = congestion_table.max()[congestion_table.max() == congestion_table.max(axis=1).max()].index
high_C = str(high_time[0])
high_C = high_C[16:] + "hr"
print('maximum congestion occured at '+ high_C)
when and where the maximum congestion price was found

The same logic also works for finding the node ID and time of day for the lowest congestion price:

when and where the minimum congestion price was found

5. Visualize: In order to make sure that we really did find the correct daily spread, we can plot the congestion price dataframe. Here I use boxplots for the hourly prices, fit a spline curve to the highest and lowest hourly prices, and use code to automatically place labels for the daily highest and lowest prices calcualted above.

import seaborn as sns
from matplotlib import pyplot as plt
plt.rcParams['figure.figsize'] = (15.0, 11.0)
maxes = congestion_table.max().tolist()
mins = congestion_table.min().tolist()
from scipy.interpolate import spline
old_x = np.linspace(0,23,24)
new_x = np.linspace(old_x.min(),old_x.max(),300)
maxes_curve = spline(old_x,maxes,new_x)
mins_curve = spline(old_x,mins,new_x)
hi_x_loc = high_C
hi_x = int(hi_x_loc[:-2])
lo_x_loc = low_C
lo_x = int(lo_x_loc[:-2])
plt.plot(new_x,maxes_curve, linewidth=1)
plt.plot(new_x,mins_curve, linewidth=1)
plt.text(hi_x,max_congestion+4, 'high: ' +high_C + '| Node: '+ str(nodemax), color='r')
plt.text(lo_x,min_congestion-4, 'low: '+low_C + '| Node: '+ str(nodemin), color='r')
plt.savefig('spread_chart.png')hourly price
hourly price boxplots show that the result was indeed correct