An attempt to predict oil futures prices during a global crisis

Benjamin Claverie
Published in Analytics Vidhya
11 min read · Nov 24, 2020

This year has been particularly surprising and turbulent. In less than 10 months, we saw the outbreak of an unknown disease that stopped the global economy and pushed the world back into recession. Workers and students were forced to adapt and work from home. Major shifts over a relatively short period dramatically changed our modern society. This unprecedented situation leaves plenty of room for research to better understand the key drivers behind these changes. During a final-year exchange at Imperial College, we conducted a research project this summer on a topic linked to quantitative finance. Students are always learning from past events and, in particular, past crises are often the periods from which we learn the most. Financial markets, this year, are a textbook case. They reached an all-time high in January 2020, then fell in March to their lowest level since 2008. In parallel, oil producers were engaged in a fierce price war that, combined with vanishing demand worldwide, led to negative WTI futures prices on April 20th.

In partnership with MyDataModels, we led a project focused on predicting oil futures for Q3 and Q4 2020 and on the correlation between WTI futures and the spread of COVID-19. Predictions were first produced with an ARIMA time-series model and then with the machine learning tool provided by MyDataModels.

Oil price war and COVID-19: an explosive cocktail

In March, OPEC and Russia, the world’s biggest oil producers, were supposed to renegotiate the deal regulating their oil production. Earlier this year, in January, COVID-19 began spreading in China and then quickly throughout Asia. The virus was long underestimated by developed countries, especially in Europe and the US, where it was thought to be similar to past epidemics such as SARS. As Westerners, we witnessed a very different scenario from the one we were all expecting. As European countries were badly hit by the virus, governments imposed severe restrictions on their populations, opening the way for total lockdowns.

Our world relies almost exclusively on oil to keep the engine running in our race to unlimited growth. Black gold has therefore always been thought of as a vital resource with a relatively high price that would fluctuate almost exclusively according to the political climate around its producers. Demand shocks, on the other hand, had never been seen as a threat to oil prices. During lockdowns, with few or no reasons for people to use their cars, air travel banned to control the virus and goods transportation slowing dramatically around the world, oil consumption was bound to collapse. Having noticed this, Saudi Arabia, OPEC’s leader, tried to accelerate the negotiation of a deal with Russia to lower oil production in order to withstand the coming crisis. Russia, for its part, thought this approach would only benefit rival US shale producers and refused the deal. This is when rationality stepped down and gave the crown to individualism and self-interest. In response, Saudi Arabia started a price war against Russia, increasing its production by 2.6m barrels a day even though the world would need less oil in the following months. On March the 13th, 3 days before the markets collapsed, the Financial Times released a short video explaining the price war and giving insights into what people were expecting then: “What now? Oil prices have recovered somewhat but no one knows how bad this is going to get. Major oil companies are preparing for a prolonged period of low prices.[…] Again the oil market is preparing for the worst case scenario.” Was it indeed the case?

Figure 1: WTI Futures prices since December 2019, source Yahoo Finance.

The chart in Figure 1 shows downturns between March and April linked to the price war between Saudi Arabia and Russia. On April the 2nd, US President Donald Trump pressured Saudi Arabia to secure a deal with Russia to cut production and stop oil prices from falling further, as the damage to US oil producers would be fatal. On April the 9th, a deal was signed between oil producers. They agreed to reduce production by 10m barrels per day (the positive peak before the oil collapse). Oil futures for delivery in May (expiring on the 21st of April) entered a very steep contango, a situation where their prices were higher than the actual expected price of the commodity at delivery. Due to nonexistent oil demand combined with overwhelmed storage capacities, May WTI futures prices became negative on April the 20th. Among all the financial crises that have hit the oil market, this one stands out as a textbook case in which a mix of external factors led to a dead end. In particular, two things are to be pointed out. First, storage capacities are built on an expected activity that never totally stops. When capacities are full, oil is not only worthless, it is a net loss for producers as it must be thrown away. Second, unlike in previous oil crises, where investors built up reserves of barrels and waited for prices to rise again, lockdowns prevented such action. Even speculation on the product was not possible, which turned WTI futures into a burden.

Why try to predict oil prices in the second half of 2020?

In March, we witnessed a behaviour our society is not used to: panic due to a lack of control. Governments in most developed countries had, and still have, no better solution than to shut down their entire economy to prevent the virus from spreading. Health systems worldwide are not built to absorb a very large and almost instantaneous surge in patient flow. In addition, hospital staff must be careful and rigorous not to catch the virus themselves. With a physical economy that vanished for 2 months this year, many companies will also face difficulties. As Figure 2 below points out, high levels of debt have been observed, especially among oil producers, which the oil crisis hit hard.

Figure 2: Bankruptcies in the Energy sector, source Visual Capitalist.

Our view of the current problem is that the world is holding on while healthcare companies compete to bring a vaccine to market. As a lockdown lasting almost a year is not economically sustainable for any country, we expected a period during the summer in which restrictions would be loosened. Consequently, we expected a new rise in COVID-19 cases in these countries afterwards, leading to a situation where lockdowns could be considered again (and here we are again in Europe!). In that case, oil could once more find itself in a tough position.

In parallel, market finance relies heavily on the priced-in assumption, which states that most investors share a large pool of information that guides their decisions. Consequently, the prices of financial products should reflect all publicly available information. We therefore believed that the most intuitive consequences of COVID-19 should already be priced into financial products, and especially into oil.

Oil futures prediction in practice

This project was built to conduct three different analyses of oil prices. First, we wanted to see whether the WTI futures crash of last April could have been predicted from 1 year of data (closing prices from Yahoo Finance) using an ARIMA time-series model. The idea here was not to create a magical model but to see whether a short-term trend could have hinted at a downward move from a source other than the price war. ARIMA is a class of models used to predict future data points. Its predictions rely on the past values of the series as well as on its lagged forecast errors, a combination that provides more robust predictions. In addition, because it differences the series, it can handle non-stationary data sets, which makes it better suited to crisis times, when the stationarity hypothesis does not hold.

Second, we used the same model to produce predictions for the second half of 2020 to see whether the market was, consciously or not, anticipating a second wave of difficult times for oil. Our results in this part have little robustness, as this model is not complex enough to capture all the complexity of the market. Furthermore, time-series models also share the burden of putting too much weight on recent past values. A dramatic recent fall such as a market crash will have too much influence on short-term predictions. To improve this study, one could add a non-constant weight factor evolving through time and with the spread of the virus. One could also add dummy variables to focus on sectors in case of specific events such as bankruptcies, aborted IPOs, etc.
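To make the ARIMA idea concrete, here is a minimal sketch of the simplest useful member of the family, ARIMA(1,1,0): the series is differenced once and an AR(1) model is fitted to the daily changes by least squares. It deliberately leaves out the moving-average (lagged error) term, and it is not the code used in the project. The function names and the toy price series are our own assumptions, written in plain Python so that it runs without any library.

```python
def difference(series):
    """First difference: the 'I' (integrated) step of ARIMA with d=1."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(diffs):
    """Least-squares fit of d_t = c + phi * d_{t-1} on the differenced series."""
    x, y = diffs[:-1], diffs[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    if var_x == 0:
        phi = 0.0  # constant changes: no autoregressive information
    else:
        cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        phi = cov / var_x
    c = mean_y - phi * mean_x
    return c, phi


def forecast_arima_110(prices, steps):
    """Forecast `steps` future prices with an ARIMA(1,1,0) sketch (no MA term)."""
    diffs = difference(prices)
    c, phi = fit_ar1(diffs)
    last_price, last_diff = prices[-1], diffs[-1]
    forecast = []
    for _ in range(steps):
        last_diff = c + phi * last_diff  # predict the next daily change
        last_price += last_diff          # integrate back to a price level
        forecast.append(last_price)
    return forecast


# Toy, made-up series that falls faster and faster (NOT real WTI prices).
prices = [60.0, 59.0, 57.5, 55.0, 52.0, 48.0, 43.5, 38.0, 32.0, 25.0]
forecast = forecast_arima_110(prices, steps=5)
print(forecast)
```

On this accelerating fall the fitted model simply extends the most recent slope, which illustrates the weakness discussed above: a recent crash dominates the short-term forecast.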

Finally, having faced the inconsistency of our time-series predictions, we wanted to use more complex models and other variables in order to better understand the movements behind oil futures. MyDataModels’ product, TADA, is a machine learning algorithm designed to perform on small data sets. This specificity is an ideal advantage in our case, where we focus on a short time window to conduct our research. As an input, we considered the daily evolution of COVID-19 cases by country (provided by Johns Hopkins University) from the 22nd of January. We gathered the data into geographic areas: Europe, Middle East, North America, South America, Asia and Oceania. In addition, we also considered the VIX index over the same time period as a proxy for investors’ view of the global situation each day. This fear index is regularly considered an indicator of trend reversal, as it increases when the market becomes bearish or volatile. We provided clean data to TADA in order to predict WTI prices, leaving it to choose the inputs that best explain the moves in oil prices during the period we considered. To test the accuracy of the algorithm as well as the rationale behind the results we obtained, we decided to set different time windows for the training sets: from January the 22nd until the crash of March, then until the crash of April and finally from January to July. As expected, the choice of explanatory variables becomes more logical as the size of the training set increases.
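The data preparation described above can be sketched as follows: country-level case counts are summed into regions, joined with the VIX and WTI values for the same day, and then cut into the three training windows. This is only an illustration. The country-to-region mapping is partial and hypothetical, the cut-off dates are our assumptions (the text only names the periods), and every number is a toy placeholder rather than real data. The sketch stops before the call to TADA, whose interface is not shown here.

```python
from datetime import date

# Hypothetical, partial country -> region mapping (extend to all countries).
REGION_OF = {
    "France": "Europe",
    "Italy": "Europe",
    "US": "North America",
    "China": "Asia",
}
REGIONS = sorted(set(REGION_OF.values()))


def aggregate_by_region(cases_by_day):
    """{day: {country: cases}} -> {day: {region: cases}}, missing regions set to 0."""
    out = {}
    for day, by_country in cases_by_day.items():
        totals = {region: 0 for region in REGIONS}
        for country, cases in by_country.items():
            totals[REGION_OF[country]] += cases
        out[day] = totals
    return out


def build_rows(regional_cases, vix, wti):
    """One row per day that has case data, a VIX value and a WTI close."""
    rows = []
    for day in sorted(regional_cases):
        if day in vix and day in wti:  # keeps trading days only
            row = {"date": day, "vix": vix[day], "wti": wti[day]}
            row.update(regional_cases[day])
            rows.append(row)
    return rows


def training_window(rows, start, end):
    """Rows whose date falls in [start, end]."""
    return [r for r in rows if start <= r["date"] <= end]


START = date(2020, 1, 22)
# Assumed cut-off dates for the three training sets.
CUTOFFS = {
    "until_march_crash": date(2020, 3, 16),
    "until_april_crash": date(2020, 4, 20),
    "until_july": date(2020, 7, 31),
}

# Toy placeholder values, NOT real case counts, VIX levels or prices.
cases = {
    date(2020, 1, 22): {"China": 5, "US": 1},
    date(2020, 3, 16): {"China": 2, "US": 7, "France": 4, "Italy": 6},
    date(2020, 4, 20): {"US": 9, "France": 3, "Italy": 2},
    date(2020, 7, 31): {"US": 8, "France": 1},
}
vix = {day: 20.0 for day in cases}
wti = {day: 40.0 for day in cases}

rows = build_rows(aggregate_by_region(cases), vix, wti)
windows = {name: training_window(rows, START, end) for name, end in CUTOFFS.items()}
for name, window in windows.items():
    print(name, len(window))
```

Each window is a plain list of rows with the WTI close as the target column, so it can be exported to whatever tabular format the modelling tool expects.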

On the other hand, we also noticed that, as with time-series models, the training set was not large enough to prevent TADA from giving too much weight to recent crashes in its predictions. In figures 3 to 5 (ARIMA) and 6 to 8 (TADA), we provide some of the estimations we computed with our models.

One main difference between the two models is the robustness of the results. In ARIMA, the prediction curve is too smooth to even approximate the moves in WTI prices, whereas TADA seems to incorporate stochastic behaviour in its predictions. Both ARIMA and TADA rely heavily on past results and give strong weight to short-term past data. In ARIMA, the trend is weaker but remains present throughout the predictions. For instance, in figure 3 we see that the predicted prices from the 4th to the 19th of August were downward sloping, whereas actual prices went up over the same period. In TADA’s case, the algorithm gives even more weight to recent data, and figure 6 shows this perfectly: if prices fall, the model predicts a continuous fall that brings prices to −∞. Even in figures 7 and 8, we see that the crash still weighs heavily on the algorithm’s predictions.

We suppose that TADA was built to behave properly with bio/genetic models, which are different from financial ones. We believe that, apart from modifying the source code (which we did not have access to), we could solve or at least soften this issue by improving the input data. For instance, one may try to implement a time-varying coefficient that gives less weight to the short-term trend and shows TADA a long-term trend. In other words, we could add to the model a sort of momentum-trend filter that could help improve the forecasts. Furthermore, figure 8 provides promising results as, in contrast to figure 4, the predictions this time move in the same direction as the actual prices.
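One possible reading of the momentum-trend filter suggested above is to give the model two extra input columns: a slow exponential moving average as the long-term trend, and the gap between a fast and a slow average as a momentum signal. The sketch below is only an illustration of that idea. The span values, function names and prices are our assumptions, and it has not been tested with TADA.

```python
def ema(values, span):
    """Exponential moving average with smoothing factor alpha = 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    smoothed = [values[0]]
    for value in values[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


def trend_features(prices, long_span=60, short_span=5):
    """Add a slow trend and a momentum column (fast EMA minus slow EMA)."""
    long_trend = ema(prices, long_span)
    short_trend = ema(prices, short_span)
    return [
        {"price": p, "long_trend": slow, "momentum": fast - slow}
        for p, slow, fast in zip(prices, long_trend, short_trend)
    ]


# Toy series: a stable price followed by a sudden crash (NOT real WTI prices).
prices = [50.0] * 30 + [20.0] * 3
features = trend_features(prices)
print(features[-1])
```

After the simulated crash the slow average has barely moved while the momentum column is strongly negative, so a model fed with both columns can in principle tell a short-lived shock apart from a change in the long-term trend.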

Figure 3: 15-day prediction from the 4th of August with ARIMA
Figure 4: Training set ending after the crash, predictions with ARIMA model
Figure 5: Training set ending before the crash, predictions with ARIMA model
Figure 6: Training set ending before the crash, predictions with TADA model
Figure 7: Training set ending shortly after the crash, predictions with TADA model
Figure 8: Training set ending long after the crash, predictions with TADA model

A humble study that can further be improved

A month ago, European countries once more began to impose restrictions on public places, restaurants, pubs and gyms. Countries like Germany and the UK imposed restrictions on tourists coming from countries like France and Spain. Most European countries are now under strict curfew or lockdown. The situation we imagined back in March has become plausible, which supports the intuition behind our work. However, we underestimated governments’ will to keep their economies running in order to soften as much as possible the economic crisis that is to come. We thought that a new negative shock in oil demand would happen but, until now, the world has managed to avoid it.

This project was part of the completion of our academic year at Imperial College, but the approach we took can be improved and, we believe, can provide interesting results. This subject also led us to believe that we could decompose the impact that information has on financial markets. For example, on March the 16th, the market priced in the impact of COVID-19 on developed economies. We could further study the impact on the stock prices of specific companies, in particular tech stocks, where the rally that has taken place from March until now highlighted the now well-known shift from a historically physical economy to a more virtual one. It is highly likely that many companies will face bankruptcy in the coming months, as corporate debt has reached alarming levels. Predictions are vital, especially on small data sets, where flexible models would improve our ability to adapt to the crises we will certainly meet in the future. TADA is an ideal example of a product that shows the potential to be developed even further in this sense.
