Oil Price Forecasting Using Conditional Generative Adversarial Networks (GANs) with Sentiment Analysis

M Alruqimi
4 min read · Jul 3, 2024


Part 1: Introduction

The code and data are available on my GitHub:
https://github.com/Med-Rokaimi/GAN_synthetic_data_time_series

Note: This model is based on the ForGAN model: https://arxiv.org/abs/1903.12549

You can find the related articles via these links:

Brent Oil price: exploratory data analysis (EDA)
Part 2: Dataset preparation
Part 3: Build and train the model.

Introduction

In this article, I will guide you step by step through building a Conditional Generative Adversarial Network (CGAN) to forecast oil prices. By conditioning the model on historical prices and on sentiment analysis scores, we aim to improve the accuracy and reliability of our predictions. The tutorial covers everything from data preprocessing to model training and evaluation, so that you can replicate and understand the forecasting methodology.
But first, let’s define some concepts and mention some related works.

GANs

Generative Adversarial Networks (GANs) are a class of deep learning models introduced by Ian Goodfellow and his colleagues in 2014. GANs have had a major impact on computer vision, particularly image synthesis, and have since been applied to other kinds of data, including text and time series, because they can generate realistic synthetic samples.

GAN models consist of two neural networks: a generator and a discriminator, which are trained simultaneously through adversarial processes. The generator creates synthetic data that resembles real data, while the discriminator evaluates the authenticity of the generated data. The goal is for the generator to produce data so realistic that the discriminator cannot distinguish it from real data.
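For reference, the adversarial game described above is usually written as the minimax objective from the original GAN paper (Goodfellow et al., 2014). The symbols are conventional rather than taken from this article's code: x is a real sample, z is a noise vector, ρ_data is the data distribution, ρ_z is the noise distribution, G is the generator, and D is the discriminator.

```latex
\min_G \max_D V(G, D) =
  \mathbb{E}_{x \sim \rho_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim \rho_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator tries to maximize this value by assigning high probability to real samples and low probability to generated ones, while the generator tries to minimize it.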

Conditional GANs

Conditional GANs (CGANs) extend the GAN framework by conditioning the model on additional information, denoted y, so that this extra information influences the generated outputs. The condition can be any kind of auxiliary data, such as class labels or data from other sources. In our case, it is a time series of crude oil prices together with a sentiment score. Conditioning is achieved by feeding y into both the discriminator and the generator as an additional input. The modified value function V(G, D) for this configuration is:

$$\min_G \max_D V(G, D) = \mathbb{E}_{x \sim \rho_{\text{data}}(x)}\big[\log D(x \mid y)\big] + \mathbb{E}_{z \sim \rho_z(z)}\big[\log\big(1 - D(G(z \mid y))\big)\big]$$

Conditional GAN network https://arxiv.org/abs/1411.1784
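To make the conditional value function more concrete, here is a minimal sketch that estimates V(G, D) = E[log D(x | y)] + E[log(1 - D(G(z | y)))] from a small batch of discriminator outputs. The function name `conditional_value` and the probabilities are made up for illustration; they do not come from the article's repository.

```python
import math

def conditional_value(d_real, d_fake):
    """Monte Carlo estimate of V(G, D) from discriminator outputs.

    d_real: D(x | y) for real samples, each strictly between 0 and 1
    d_fake: D(G(z | y)) for generated samples, each strictly between 0 and 1
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# A confident, correct discriminator pushes the value close to 0 (its upper bound).
strong = conditional_value([0.99, 0.98], [0.01, 0.02])

# A discriminator that cannot tell real from generated outputs 0.5 everywhere,
# which gives exactly -2 * log(2).
confused = conditional_value([0.5, 0.5], [0.5, 0.5])

print(round(strong, 4), round(confused, 4))
```

The second case is the equilibrium the generator is aiming for: once its samples are indistinguishable from real ones, the best the discriminator can do is output 0.5.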

GANs for time series forecasting

In recent years, a few approaches have been proposed for using GANs in time series forecasting. Most of these models employ RNN-based neural networks because of their effectiveness in capturing temporal dependencies within time series data.

Examples include TimeGAN, ForGAN, FinGAN, TSGAN, and Time-Variant GAN.

CGANs for time series forecasting (Brent oil price forecasting)

Next, we will build our CGAN model to predict the next day's price of Brent crude oil.
Our dataset includes historical Brent prices from 2012 to 2021, as well as a cumulative sentiment score derived from oil market sentiment analysis.
The preprocessed data is available on my GitHub:
https://github.com/Med-Rokaimi/GAN_synthetic_data_time_series

The main idea is illustrated in these two figures:

Figure 1 shows the generator network. The generator network takes two inputs:

  • Noise: This is a random vector sampled from a normal distribution. The noise introduces randomness into the generator's input, so that for the same condition the generator can produce a variety of realistic outputs rather than repeating a single value.
  • Condition window: This is a window of historical observation points, including the price and the sentiment score. We will discuss this further in Part 2 (dataset preparation).
Figure 1: The generator
Figure 2: The discriminator
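The two generator inputs can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the helper names `make_windows` and `sample_noise`, the window length of 3, the noise dimension of 4, and the (price, sentiment) rows are all hypothetical, and the actual preprocessing in Part 2 may differ.

```python
import random

def make_windows(rows, window):
    """Split a series into (condition window, next-day price) pairs.

    rows: list of (price, sentiment) tuples ordered by date
    window: number of past days in each condition window
    """
    pairs = []
    for t in range(window, len(rows)):
        condition = rows[t - window:t]  # the `window` days before day t
        target = rows[t][0]             # price on day t, the value to predict
        pairs.append((condition, target))
    return pairs

def sample_noise(dim, rng):
    """Draw a noise vector from a standard normal distribution."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

# Made-up (price, sentiment) rows, for illustration only.
rows = [(70.1, 0.2), (70.8, 0.3), (69.9, 0.1), (71.2, 0.4), (72.0, 0.5)]
pairs = make_windows(rows, window=3)

rng = random.Random(0)  # seeded so the example is reproducible
noise = sample_noise(dim=4, rng=rng)

condition, target = pairs[0]
print(condition, target)
```

Each pair couples a condition window with the price that immediately follows it, and a fresh noise vector is drawn every time the generator is asked for a forecast.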

The objective is to model the probability distribution of the one-step-ahead value x_{t+1} given the historical data c (the condition window), that is, ρ(x_{t+1} | c).

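The generator G and the discriminator D are trained simultaneously in an adversarial setup. The generator learns to transform a known probability distribution ρz (the noise distribution) into the generator's distribution ρG, which should resemble ρdata. The discriminator receives x_{t+1} and determines whether it is real or produced by the generator. Hence, the value function is expressed as follows:

$$\min_G \max_D V(G, D) = \mathbb{E}_{x_{t+1} \sim \rho_{\text{data}}(x_{t+1})}\big[\log D(x_{t+1} \mid c)\big] + \mathbb{E}_{z \sim \rho_z(z)}\big[\log\big(1 - D(G(z \mid c))\big)\big]$$

Here c denotes the condition window of historical observations and z the noise vector.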

The core component of the generator is a GRU (Gated Recurrent Unit) layer, which is followed by two dense layers. The discriminator architecture is similar.
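A generator of this shape could look roughly like the PyTorch sketch below. It is not the code from the repository: the layer sizes, the class name `Generator`, and the choice to concatenate the noise with the GRU's final hidden state (in the style of ForGAN) are assumptions made for illustration. The actual model is built in Part 3.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """GRU over the condition window, followed by two dense layers."""

    def __init__(self, noise_dim=8, cond_features=2, hidden_dim=16):
        super().__init__()
        # cond_features=2 assumes each time step holds (price, sentiment).
        self.gru = nn.GRU(input_size=cond_features, hidden_size=hidden_dim,
                          batch_first=True)
        self.fc1 = nn.Linear(hidden_dim + noise_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, 1)

    def forward(self, noise, condition):
        # condition: (batch, window, cond_features); noise: (batch, noise_dim)
        _, h_n = self.gru(condition)        # h_n: (num_layers, batch, hidden_dim)
        summary = h_n[-1]                   # last layer's state: (batch, hidden_dim)
        x = torch.cat([summary, noise], dim=1)  # assumption: inject noise here
        x = torch.relu(self.fc1(x))
        return self.fc2(x)                  # (batch, 1): next-day price estimate

gen = Generator()
condition = torch.randn(4, 5, 2)  # batch of 4 windows, 5 days, 2 features
noise = torch.randn(4, 8)
prediction = gen(noise, condition)
print(prediction.shape)
```

A discriminator along the same lines would run a GRU over the condition window together with the candidate x_{t+1} and end in a single sigmoid output instead of a price.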

Conclusion

In this part, we introduced the model that we will build in the next parts. We described its components and defined the necessary concepts. In the following parts, we will prepare our dataset, build the model, and finally train and evaluate it.

You can find the related articles via these links:
Part 2: Dataset preparation
Part 3: Build and train the model.
Brent Oil price: exploratory data analysis (EDA)
The full code and dataset (GitHub)
