Petroleum Investment Strategies: Leveraging Production-Consumption Gap for a 32% Alpha

Double L
TradeMind_Analytics
7 min readJul 3, 2024

My strategy harnesses the dynamics of supply and demand in petroleum markets for investment timing. From January 2019 to June 2024, it consistently outperformed traditional petroleum securities, achieving a Sharpe ratio of 1.18 compared to crude oil futures’ 0.39. The strategy delivered an impressive annualized return of 40%, significantly surpassing the 8% annualized return of crude oil futures during the same period.

I. Proposal

Petroleum securities are intricately tied to supply and demand dynamics, a well-established relationship in financial markets. To analyze these dynamics, I intend to leverage supply and demand data sourced from the EIA’s short-term energy outlook. Evaluating the supply-demand balance can be approached in two ways:

1). Firstly, by tracking the moving average of production-consumption differentials. A positive value indicates surplus supply, suggesting potential price declines, and vice versa.

2). Secondly, examining the percentage change in these production-consumption differentials from the previous month offers insights; a positive percentage signifies a deteriorating supply-demand scenario, typically associated with price declines, and vice versa.

Aligning these two indicators guides our decision to take either short or long positions in the market; otherwise, we remain sidelined.

II. Data Source

EIA Short-term Energy Outlook Data Browser

  • 3d. Total Crude Oil Production
  • 3e. World Petroleum and Other Liquid Fuels Consumption

Yahoo Finance

  • Crude Oil Futures (CL=F)
  • United States Oil Fund, LP (USO)

III. Python Codes

# import required packages
import pandas as pd
import numpy as np
import yfinance as yf
import matplotlib.pyplot as plt
import datetime

Step 1: Retrieving Data

I’m simulating a real-world scenario where I adjust my position immediately after the release of the EIA report. Typically, the release day is the first Monday of the month. To ensure that the exchange is open and transactions can be made on that day, I use the closest trading day to the release date as a reference for each month.

def find_release_day(year, month, dates):
first_day = datetime.date(year, month, 1)

# Find the first Monday
offset = (7 - first_day.weekday()) % 7
first_monday = first_day + datetime.timedelta(days=offset)
first_monday = datetime.datetime.combine(first_monday, datetime.datetime.min.time())


# Filter dates in the list that match the specified month and year and are not earlier than the first Monday
dates_in_month = dates[(dates.year == first_monday.year) & (dates.month == first_monday.month) & (dates.day >= first_monday.day)]

if not dates_in_month.empty:
# Find the closest date in the filtered dates
return min(dates_in_month, key=lambda x: abs(x - first_monday))
else:
return -1
# Download the adjusted closing prices for Crude Oil futures (symbol 'CL=F') from Yahoo Finance starting from '2019-01-01'
CLF = yf.download('CL=F', '2019-01-01')['Adj Close']
# Convert the index of the CLF DataFrame to datetime format
CLF.index = pd.to_datetime(CLF.index)

# Initialize an empty list to store the release days
release_days = []
# Loop through the years 2019 to 2023
for yy in range(2019, 2024):
# Loop through each month from January to December
for mm in range(1, 13):
# Append the release day for the given year and month to the release_days list
release_days.append(find_release_day(yy, mm, CLF.index))
# Loop through the months from January to July in 2024
for mm in range(1, 8):
# Append the release day for 2024 for the given month to the release_days list
release_days.append(find_release_day(2024, mm, CLF.index))

# Select the rows in the CLF DataFrame that correspond to the release days
CLF.loc[release_days]
# Read the Total World Petroleum Consumption data from the CSV file
consum = pd.read_csv("/Users/levilan/Desktop/Portfolio Management/All-Weather Porfolio/Total_World_Petroleum_Consumption.csv")
# Convert the 'Month' column to datetime format with the specified format
consum['Month'] = pd.to_datetime(consum['Month'], format='%b-%y')
# Filter the consumption data for dates between '2019-01-01' and '2024-07-31'
consum = consum.loc[(consum.Month >= '2019-01-01') & (consum.Month <= '2024-07-31')]
# Replace the 'Month' column with the reversed list of release days
consum['Month'] = release_days[::-1]
# Set the 'Month' column as the index of the DataFrame
consum.set_index('Month', inplace=True)

# Read the Total World Petroleum Production data from the CSV file
prod = pd.read_csv("/Users/levilan/Desktop/Portfolio Management/All-Weather Porfolio/Total_World_Petroleum_Production.csv")
# Convert the 'Month' column to datetime format with the specified format
prod['Month'] = pd.to_datetime(prod['Month'], format='%b-%y')
# Filter the production data for dates between '2019-01-01' and '2024-07-31'
prod = prod.loc[(prod.Month >= '2019-01-01') & (prod.Month <= '2024-07-31')]
# Replace the 'Month' column with the reversed list of release days
prod['Month'] = release_days[::-1]
# Set the 'Month' column as the index of the DataFrame
prod.set_index('Month', inplace=True)
# Join the consumption and production DataFrames, then join the resulting DataFrame with the USO DataFrame
df = consum.join(prod).join(CLF)
# Rename the columns of the DataFrame to 'Consumption', 'Production', and 'USO'
df.columns = ['Consumption', 'Production', 'CLF']
df.head()

Step 2: Processing Data and Generating Position Signal

# Calculate the gap between production and consumption
df['Gap'] = df['Production'] - df['Consumption']
# Calculate the moving average (MA) of the Gap, reversed, with a window of 2
df['MA'] = df['Gap'][::-1].rolling(2).mean()
# Calculate the percentage difference of the Gap compared to the previous period
df['diff'] = df.Gap / df.Gap.shift(-1) - 1
# Determine the trading position based on conditions of MA and diff
df['Position'] = np.where((df['MA'] < 0) & (df['diff'] < 0), 1,
np.where((df['MA'] > 0) & (df['diff'] > 0), -1, 0))
# Calculate the return of CLF shifted by 1 period
df['Return'] = df.CLF.shift(1) / df.CLF - 1
# Set the return of the last row to 0 to avoid NaN
df.iloc[-1, df.columns.get_loc('Return')] = 0
# Calculate the rebased CLF (Cumulative Logarithmic Frequency) as the cumulative product of returns, reversed
df['rebasedCLF'] = (1 + df['Return'][::-1]).cumprod()[::-1]
# Calculate the performance of the strategy as the position multiplied by the return
df['Performance'] = df['Position'] * df['Return']
# Calculate the cumulative performance of the strategy, reversed
df['CumPerformance'] = (1 + df['Performance'][::-1]).cumprod()[::-1]

# Display the first 5 rows of the DataFrame
df.head()

Step 3: Performance Metric & Visualization

# Calculate the maximum drawdown of the strategy
strategy_running_max = np.maximum.accumulate(df['CumPerformance'][::-1])
strategy_running_max[strategy_running_max < 1] = 1
strategy_drawdown = df.CumPerformance / strategy_running_max - 1
strategy_max_dd = strategy_drawdown.min()
# Add the strategy's maximum drawdown to the DataFrame
df['strategy_MaxDD'] = strategy_drawdown[::-1]

# Calculate the maximum drawdown of the rebased CLF
CLF_running_max = np.maximum.accumulate(df['rebasedCLF'][::-1])
CLF_running_max[CLF_running_max < 1] = 1
CLF_drawdown = df.rebasedCLF / CLF_running_max - 1
CLF_max_dd = CLF_drawdown.min()
# Add the CLF's maximum drawdown to the DataFrame
df['CLF_MaxDD'] = CLF_drawdown[::-1]
print(f"maximum drawdown: {round(strategy_max_dd,2)} (Strategy); {round(CLF_max_dd,2)} (CL=F)")

# Calculate the Sharpe ratio for the strategy
strategy_sharpe_ratio = np.mean(df.Performance) / np.std(df.Performance) * (12 ** 0.5)
# Calculate the Sharpe ratio for CL=F
CLF_sharpe_ratio = np.mean(df.Return) / np.std(df.Return) * (12 ** 0.5)
# Print the Sharpe ratios for the strategy and CL=F
print(f"Sharpe ratio: {round(strategy_sharpe_ratio, 2)} (Strategy); {round(CLF_sharpe_ratio, 2)} (CL=F)")

# Calculate the difference in days between the start and end dates of the DataFrame
diff_day = df.index[0] - df.index[-1]
# Calculate the holding period in days
holding_period = diff_day.days
# Calculate the cumulative return of the strategy
strategy_cum_return = df.CumPerformance.dropna()[0] - 1
# Calculate the annualized return of the strategy
strategy_annual = (strategy_cum_return + 1) ** (365 / holding_period) - 1
# Calculate the cumulative return of CLF (rebased CLF)
CLF_cum_return = df.rebasedCLF.dropna()[0] - 1
# Calculate the annualized return of CLF
CLF_annual = (CLF_cum_return + 1) ** (365 / holding_period) - 1
# Print the annualized returns for the strategy and CL=F
print(f"Annualized return: {round(strategy_annual, 2)} (Strategy); {round(CLF_annual, 2)}")
# Create a figure and a set of subplots with 4 rows and 1 column, sharing the x-axis
fig, axs = plt.subplots(4, 1, figsize=(10, 12), sharex=True)

# Plotting the CLF and the gap (MA) on the first subplot
axs[0].plot(df.index, df['CLF'], label='CLF', color='Purple')
axs[0].set_title('United States Oil Fund LP & Petroleum Supply Demand Gap')
axs[0].legend()
axs[0].grid()

# Creating a secondary y-axis for the gap (MA) on the first subplot
ax2 = axs[0].twinx()
ax2.bar(df.index, df.MA, width=7, alpha=0.7, color='purple', label='Gap')
ax2.legend(loc='lower right')
ax2.axhline(y=0, color='gray', linestyle='--', linewidth=1)
ax2.set_ylabel('Gap')

# Plotting the Position on the second subplot
axs[1].plot(df.index, df['Position'], color='grey')
axs[1].set_title('Position')
axs[1].grid()

# Plotting the rebased CLF and strategy cumulative performance on the third subplot
axs[2].plot(df.index, df['rebasedCLF'], color='red', label='Rebased CLF')
axs[2].plot(df.index, df['CumPerformance'], color='orange', label='Strategy')
axs[2].set_title(f'Cumulative Performance (Most Recent {round(df.CumPerformance.dropna()[0], 2)})')
axs[2].fill_between(df.index, df['CumPerformance'], color='orange', alpha=0.3)
axs[2].legend()
axs[2].grid()

# Plotting the strategy and CLF maximum drawdowns on the fourth subplot
axs[3].plot(df.index, df['strategy_MaxDD'], color='orange', label='Strategy')
axs[3].plot(df.index, df['CLF_MaxDD'], color='red', label='Rebased CLF')
axs[3].set_title(f'Drawdown (Maximum {round(strategy_max_dd, 1) * 100}%)')
axs[3].fill_between(df.index, df['strategy_MaxDD'], color='orange', alpha=0.3)
axs[3].legend()
axs[3].grid()

# Set common labels for the entire figure
fig.text(0.5, 0.04, 'Date', ha='center', va='center')
fig.text(0.04, 0.5, 'Value', ha='center', va='center', rotation='vertical')

# Adjust the layout of the figure
plt.tight_layout(rect=[0.05, 0.05, 1, 1])

# Save the figure as "Full Performance.png"
plt.savefig("Full Performance")

# Display the plot
plt.show()

IV. Additional Comparison

Part 1. Using the United States Oil Fund (USO) as the trading security for Long positions only (no Short). Experimenting with different windows to calculate the moving average of the production-consumption gap.

Part 2. Using the Crude Oil Futures (CL=F) as the trading security, exploring both Long and Short positions. Testing various windows to calculate the moving average of the production-consumption gap.

V. Conclusion

Petroleum securities typically exhibit price fluctuations within a defined range, making a buy-and-hold strategy less advisable. Conversely, timing long or short positions effectively can potentially outperform index performance. This article presents a fundamental approach to timing decisions based on production-consumption differentials, which significantly enhances returns while mitigating risks.

--

--

Double L
TradeMind_Analytics

Macro Economics, Algo. Trading, Quantitative Portfolio Management