EUR/USD Forecasting Simplified: An LSTM User’s Guide
LSTM, or Long Short-Term Memory, is a specialized type of Recurrent Neural Network (RNN) designed to recognize patterns in sequences of data. Unlike traditional RNNs, LSTM is adept at learning long-term dependencies thanks to its unique architecture, which includes memory cells governed by mechanisms called gates. These gates regulate the flow of information, ensuring that the network retains crucial data from earlier in the sequence, making LSTMs particularly resistant to the vanishing gradient problem that plagues vanilla RNNs.
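To make the gating mechanism concrete, here is a minimal NumPy sketch of a single LSTM cell step. This is illustrative only, not what Keras literally executes; the weight layout (one matrix W producing all four gate pre-activations) is just one common convention, and the names are hypothetical.
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    # Concatenate previous hidden state and current input, then compute all
    # four gate pre-activations with one matrix multiply
    z = np.concatenate([h_prev, x_t]) @ W + b
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # forget, input, output gates
    g = np.tanh(g)                                # candidate cell-state values
    c_t = f * c_prev + i * g                      # forget old info, write new info
    h_t = o * np.tanh(c_t)                        # gated view of the cell state
    return h_t, c_t

# One step with a hidden size of 4 and a single input feature
hidden, features = 4, 1
rng = np.random.default_rng(0)
W = rng.normal(size=(hidden + features, 4 * hidden))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
h, c = lstm_step(rng.normal(size=features), h, c, W, b)
The forget gate f decides how much of the old cell state to keep, while the input gate i decides how much new information to write; this is what lets gradients flow across long sequences.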
Forecasting financial time series data, such as stock, option, bond, and forex prices, is challenging due to the inherent noise, non-stationarity, and the influence of numerous unpredictable external factors. However, LSTMs tend to perform better on this kind of data than other models, for the following reasons:
The Pros
- Temporal Dependencies: LSTMs are explicitly designed to handle sequences, making them well-suited for time series data.
- Complex Patterns: Financial data can have non-linear patterns that are challenging for traditional models. LSTMs can capture these patterns.
- Flexibility: LSTMs can be designed in various architectures, such as stacked, bidirectional, or even in combination with other neural network types (like CNN-LSTM).
Despite these benefits, LSTMs also have some caveats that you should consider:
The Cons
- Data Volume: LSTMs benefit from large amounts of data. If your dataset is limited, you might not be able to fully leverage the power of LSTMs.
- Computational Cost: Training LSTMs, especially deep architectures, can be computationally intensive. It’s beneficial to have access to powerful hardware.
- Overfitting: LSTMs can overfit to the training data, especially when the data is noisy (as financial data often is). It’s essential to use techniques like dropout, early stopping, or regularization to counteract this.
Data Collection and Preparation
Let’s move on to the juicy part where we’ll be using LSTM to forecast EUR/USD data. Here are some key details about the data that you should be aware of:
- Data Source: Investing.com (EUR/USD Historical Data — Investing.com)
- Data Granularity: Daily
- Data Period: January 1, 2010 – December 31, 2022
   Date        Price   Open    High    Low     Vol.  Change %
0  12/30/2022  1.0702  1.0663  1.0714  1.0639  NaN   0.38%
1  12/29/2022  1.0661  1.0609  1.0691  1.0609  NaN   0.50%
2  12/28/2022  1.0608  1.0642  1.0675  1.0606  NaN   -0.28%
3  12/27/2022  1.0638  1.0638  1.0670  1.0611  NaN   0.03%
4  12/26/2022  1.0635  1.0611  1.0638  1.0604  NaN   0.20%
The dataset contains historical data for EUR/USD, with the following columns:
- Date: The date of the data entry.
- Price: The closing price for the EUR/USD pair on that date.
- Open: The opening price for the EUR/USD pair on that date.
- High: The highest price reached for the EUR/USD pair on that date.
- Low: The lowest price reached for the EUR/USD pair on that date.
- Vol.: Trading volume for the day (this column contains missing values).
- Change %: The percentage change in price from the previous day.
Looking at the first few rows above, you’ll notice some missing Volume data. However, there’s no need for concern, as we will use only the Price column for our forecasting.
The Code
Import Necessary Libraries
First, we’ll import all the essential libraries for our model. Our coding will be done in Python, and while I’ll be using Jupyter Notebook as our integrated development environment (IDE), feel free to use your preferred IDE.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout
from keras.callbacks import EarlyStopping
Data Preparation
Let’s import the CSV data for EUR/USD that you’ve downloaded from investing.com:
data = pd.read_csv("EUR_USD Historical Data.csv")
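With the data loaded, a quick inspection confirms the missing Volume values we noted earlier (this assumes the column names shown in the table above):
# Peek at the raw data and count missing values per column
print(data.head())
print(data.isna().sum())  # 'Vol.' is the only column expected to have gaps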
Since historical data from investing.com is ordered from latest to earliest by default, we first reverse the DataFrame to arrange it chronologically, then extract the Price column.
# Reverse the entire DataFrame to ensure it's in chronological order
data = data.iloc[::-1].reset_index(drop=True)
# Extract the 'Price' column
prices = data['Price'].values
# Reshape into a 2-D array of shape (n_samples, 1), as expected by MinMaxScaler
prices = prices.reshape(-1, 1)
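A quick sanity check that the reversal worked:
# After reversing, the first row should hold the earliest date and the last row the latest
print(data['Date'].iloc[0], '->', data['Date'].iloc[-1])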
Next, we need to normalize the data. Normalization is crucial for LSTMs because they are sensitive to the scale of the input data. Scaling values to a range, typically between 0 and 1, ensures a consistent representation, prevents any single feature from disproportionately influencing the model, and leads to smoother, faster convergence during training.
# Normalize the data
scaler = MinMaxScaler(feature_range=(0, 1))
normalized_prices = scaler.fit_transform(prices)
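One caveat worth flagging: fit_transform here learns the minimum and maximum from the entire series, so information about future price ranges leaks into the training data. For a stricter evaluation, you could fit the scaler on the training portion only. A minimal sketch, assuming the same 80/20 chronological split we use later:
# Leakage-free alternative: learn scaling parameters from the first 80% of prices only
split = int(len(prices) * 0.8)
scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(prices[:split])
normalized_prices = scaler.transform(prices)  # later values may fall slightly outside [0, 1]
For simplicity, the rest of this walkthrough sticks with the fit_transform version above.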
Next, let’s create sequences of a given length to feed into the LSTM. In this case we will use sequence_length=60, i.e., each input sample consists of the previous 60 days of prices.
# Create sequences of a given length (60 days) to feed into the LSTM
sequence_length = 60
X, y = [], []
for i in range(sequence_length, len(normalized_prices)):
    X.append(normalized_prices[i-sequence_length:i, 0])
    y.append(normalized_prices[i, 0])
Data Training and Model Creation
Before training, we first need to split our data into training and validation sets. This allows us to evaluate the model’s performance on unseen data. Note that we pass shuffle=False so the split stays chronological: we train on the past and validate on the future.
X, y = np.array(X), np.array(y)
# Reshape X to (samples, timesteps, features), as expected by Keras LSTM layers
X = np.reshape(X, (X.shape[0], X.shape[1], 1))
# shuffle=False keeps the split chronological
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, shuffle=False)
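It’s worth printing the shapes at this point to confirm the data is in the layout Keras expects:
# Confirm the (samples, timesteps, features) layout: X should be (n, 60, 1), y should be (n,)
print(X_train.shape, y_train.shape)
print(X_val.shape, y_val.shape)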
Now let’s define our LSTM model. This part might be a little difficult to follow, so I’ll try my best to explain it thoroughly:
model = Sequential()
#This is the first LSTM layer with 50 units (or cells).
model.add(LSTM(units=50, return_sequences=True, input_shape=(X_train.shape[1], 1)))
#First Dropout Layer with a rate of 20% to prevent overfitting
model.add(Dropout(0.2))
#This is the second LSTM layer, also with 50 units.
model.add(LSTM(units=50, return_sequences=True))
#Second Dropout Layer with a rate of 20% to prevent overfitting
model.add(Dropout(0.2))
#The third LSTM layer, also with 50 units.
model.add(LSTM(units=50))
#Third Dropout Layer with a rate of 20% to prevent overfitting
model.add(Dropout(0.2))
#This is a fully connected layer that aggregates the features learned by the LSTM layers and produces a single output value
model.add(Dense(units=1))
#The model is compiled using the Adam optimization algorithm, which combines the advantages of two other extensions of stochastic gradient descent: AdaGrad and RMSProp
model.compile(optimizer='adam', loss='mean_squared_error')
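Before training, it can help to print a summary to double-check the architecture and parameter counts:
model.summary()  # three LSTM layers, three Dropout layers, and a final Dense layer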
So, the model is defined using the Sequential class from Keras, which allows us to build a model layer by layer in a linear stack. The model uses three LSTM layers as well as Dropout layers to mitigate overfitting, which can be a concern with deep networks and limited data.
For the first and second LSTM layers:
- The return_sequences=True argument means these LSTM layers return the full sequence to the next layer. This is necessary when stacking LSTM layers, so that the next LSTM layer receives sequences as input.
- input_shape specifies the shape of the input data. In this case, it takes sequences of length X_train.shape[1] (60 time steps), each with 1 feature.
For the third LSTM layer, return_sequences defaults to False, so this layer returns only the output of the last time step. We leave it at False because, after processing the entire input sequence, the model makes a single prediction (the next Price). Hence, only the final output of the sequence is passed to the Dense layer.
For all Dropout layers:
- Dropout is a regularization technique where randomly selected neurons are ignored during training, helping to prevent overfitting. The rate of 0.2 means approximately 20% of the input units will be dropped at each training step.
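To make this concrete, here is a tiny NumPy illustration of what a 0.2 dropout rate does to a layer’s activations during a training step. (Keras uses “inverted” dropout, rescaling the surviving units by 1/0.8 so their expected magnitude is unchanged; at inference time dropout is disabled entirely.)
import numpy as np

rng = np.random.default_rng(42)
activations = rng.normal(size=10)
keep_mask = rng.random(10) >= 0.2        # each unit survives with probability 0.8
dropped = activations * keep_mask / 0.8  # zero out ~20% of units and rescale the rest
print(dropped)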
Now that we’re done defining the model, let’s start training it:
# Stop training once validation loss hasn't improved for 10 epochs, and keep the best weights seen
early_stop = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
history = model.fit(X_train, y_train, epochs=100, batch_size=32, validation_data=(X_val, y_val), callbacks=[early_stop], shuffle=False)
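Because of the EarlyStopping callback, training may stop well before the 100-epoch limit. You can check how many epochs actually ran:
# Number of epochs completed before early stopping halted training
print(len(history.history['loss']))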
Next, let’s visualize the training and validation loss to understand how well the model is learning.
plt.figure(figsize=(10, 6))
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Training and Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Mean Squared Error')
plt.legend()
plt.show()
The output will be the following plot:
MSE (Mean Squared Error)
MSE measures the average squared difference between predicted and actual values; lower values indicate that the LSTM model is predicting the time series more accurately.
Epoch
An epoch represents one full pass of the training data through the LSTM. The model refines its predictions with each epoch, while metrics like the MSE loss above are monitored to gauge learning progress.
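For reference, MSE is straightforward to compute by hand; a minimal sketch:
import numpy as np

def mse(actual, predicted):
    # Average of the squared differences between actual and predicted values
    return np.mean((np.asarray(actual) - np.asarray(predicted)) ** 2)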
Let’s see what this plot tells us.
As the training progresses across epochs, we observe a decreasing trend in both training and validation losses. Each epoch signifies a full pass of the training data through the LSTM network.
The consistent downtrend in Training Loss and Validation Loss is a positive indicator, suggesting the LSTM model is not only assimilating patterns from the training data effectively but also showing a promising ability to generalize to new data.
Forecast Performance: Predicted vs Actual
Let’s use matplotlib to see how the predicted price performs against the actual price:
# Use the trained model to generate predictions on the validation dataset.
# `X_val` contains the input features for the validation set.
y_val_pred = model.predict(X_val)
# The model works on normalized values, so inverse-transform both the predictions
# and the actual targets back to the original price scale before plotting
y_val_pred_rescaled = scaler.inverse_transform(y_val_pred)
y_val_rescaled = scaler.inverse_transform(y_val.reshape(-1, 1))
# val_dates are the dates corresponding to the validation targets
# (the last len(y_val) rows of the chronologically ordered data)
val_dates = pd.to_datetime(data['Date'][-len(y_val):].values)
plt.figure(figsize=(14, 5))
plt.plot(val_dates, y_val_rescaled, color='blue', label='Actual EUR/USD Price')
plt.plot(val_dates, y_val_pred_rescaled, color='red', linestyle='dashed', label='Predicted EUR/USD Price')
plt.title('EUR/USD Price Prediction')
plt.xlabel('Date')
plt.ylabel('EUR/USD Price')
plt.legend()
plt.grid(True)
plt.show()
Looking at this plot, we can see that:
- The LSTM model seems to follow the overall trend of the actual prices quite closely.
- There are certain periods where the model’s predictions deviate from the actual prices. However, the model generally captures the upward and downward movements.
- The LSTM appears to be slightly lagging in capturing rapid changes or peaks in some areas. This is a common characteristic of LSTM models in time series forecasting, especially when rapid or abrupt changes occur in the data.
We can also plot a histogram of the prediction errors to see the distribution of residuals for the LSTM model:
# Calculate prediction errors (actual minus predicted) on the original price scale
errors = y_val_rescaled.flatten() - y_val_pred_rescaled.flatten()
# Plot histogram of errors
plt.figure(figsize=(10, 6))
plt.hist(errors, bins=50, color='blue', alpha=0.7)
plt.title('Histogram of Prediction Errors')
plt.xlabel('Error Value')
plt.ylabel('Frequency')
plt.show()
The code will generate the following chart:
Looking at the histogram above, we can see that:
- Centered Around Zero: Most of the errors are centered around zero, indicating that the model’s predictions are generally close to the actual values.
- Bell-Shaped Curve: The error distribution somewhat resembles a bell-shaped curve (though not perfectly normal). This suggests that large errors (both positive and negative) are less frequent.
- Skewness: The histogram seems to have a slight right skew, indicating that there are a few instances where the model underpredicts (predicts a value lower than the actual value) by a larger margin.
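Before wrapping up, note that mean_squared_error was imported at the start but never used; we can use it now to compute the RMSE figure quoted in the conclusion. Here it is computed on the rescaled prices, matching the plots above (computing it on the normalized values instead would yield a different number):
# RMSE on the original price scale: the square root of the mean squared error
rmse = np.sqrt(mean_squared_error(y_val_rescaled, y_val_pred_rescaled))
print(f"Validation RMSE: {rmse:.4f}")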
Conclusion
In our forecast of EUR/USD exchange rates using an LSTM, the results have been promising: the model’s predictions closely align with the actual observed rates. This is supported by the Root Mean Squared Error (RMSE) of our LSTM model’s predictions on the EUR/USD validation data, which comes out to roughly 0.0286, a small deviation relative to typical EUR/USD rates (on the order of a few percent).
Such results indicate the reliability of LSTM for time series forecasting, especially in financial contexts. While the model isn’t flawless, its performance in this analysis suggests it’s a valuable tool for forecasting tasks.
Feel free to share your thoughts on utilizing LSTM for forecasting Forex data in the comments section below!