AlgoCraft

Algorithmic Trading Ideas

Mastering Forex Forecasting: A Comprehensive Guide to Harnessing XGBoost for Predictive Analytics

In the fast-paced world of finance, Forex trading stands as a towering figure. It’s a massive market, where currencies whirl around in exchange, their values ebbing and flowing like the tide. For us retail traders, it’s very important to have the right tools in our belt to navigate the forex market so that we can generate consistent profit.

Having previously learned how to forecast the forex market with an LSTM, we’ll now try to forecast the same pair with another machine learning algorithm: XGBoost.

XGBoost: The Game-Changer

XGBoost, short for eXtreme Gradient Boosting, is one of the most widely used modern machine learning methods. Born from the idea of optimizing the Gradient Boosting framework, XGBoost combines many decision trees to break down intricate predictive challenges into manageable bits. Noted for its efficiency and accuracy, the algorithm is particularly strong on structured, tabular data such as price histories and the indicators derived from them. What sets XGBoost apart is the way it refines its model iteratively: each new tree is fitted to the errors left by the trees before it, which steadily improves the accuracy of its predictions. In essence, when faced with convoluted prediction scenarios, XGBoost is a dependable and potent ally.

Why XGBoost Is Well Suited for Forex Forecasting

Forex trading is all about predicting currency exchange rates quickly and accurately. XGBoost is particularly helpful for this task: it builds trees sequentially, with each new tree correcting the errors of the ones before it, which suits the precise forecasts needed in the fast-paced world of Forex.

In Forex, where prices change rapidly, XGBoost’s ability to process large amounts of data and make accurate predictions is incredibly valuable. It simplifies the complex Forex landscape, giving traders the confidence they need to make decisions.

XGBoost for Forex Forecasting

To start using XGBoost for Forex forecasting, let’s first import all the necessary libraries:

import yfinance as yf
import numpy as np
import pandas as pd
import xgboost as xgb
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
from sklearn.preprocessing import StandardScaler
from ta.momentum import RSIIndicator, StochasticOscillator
from ta.trend import MACD, PSARIndicator
from ta.volatility import BollingerBands
from ta.utils import dropna

Looking at the libraries above, you will notice that we fetch the forex data using yfinance, a popular library for extracting financial data. numpy and pandas are included for data processing, xgboost is imported for the predictive modeling, scikit-learn supplies the train/test split and the error metrics, and ta supplies the technical indicators. Finally, all data visualization in this forecast is handled by matplotlib.

Next, let’s import our data:

# Define the ticker symbol and fetch data
ticker_symbol = "EURUSD=X"
df = yf.download(ticker_symbol, start="2000-01-01", end="2023-01-01")
df

The code above uses the yfinance library to download the data for our forecast. In this model, we will try to forecast EUR/USD daily data using the requested date range of 2000-01-01 to 2023-01-01, which we store in the DataFrame df.

Raw Data Output for EUR/USD Pair retrieved with yfinance library
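One practical detail before moving on: depending on the installed yfinance version, yf.download may return columns as a two-level MultiIndex (field, ticker) even for a single ticker, in which case df['Close'] is a DataFrame rather than a Series and the indicator code that follows may not accept it. The sketch below is a minimal illustration on a synthetic frame of that shape (the prices are made up). The helper name flatten_price_columns is hypothetical and not part of yfinance.

```python
import pandas as pd

def flatten_price_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy whose columns are plain field names such as 'Close'."""
    out = frame.copy()
    if isinstance(out.columns, pd.MultiIndex):
        # Keep only the first level (the field name) and drop the ticker level.
        out.columns = out.columns.get_level_values(0)
    return out

# Synthetic stand-in for a single-ticker download (values are made up).
columns = pd.MultiIndex.from_product([["Close", "High", "Low"], ["EURUSD=X"]])
raw = pd.DataFrame(
    [[1.10, 1.11, 1.09], [1.12, 1.13, 1.10]],
    index=pd.to_datetime(["2022-01-03", "2022-01-04"]),
    columns=columns,
)

flat = flatten_price_columns(raw)
print(list(flat.columns))
print(type(flat["Close"]).__name__)
```

If the downloaded frame already has flat columns, the helper returns it unchanged, so it should be safe to call in either case.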

Feature Engineering for Forex Forecasting

DALL-E generated illustration for Forex forecasting, just because I thought it would be fun

In predictive modeling, especially in financial forecasting, the quality and relevance of input features play a pivotal role in the accuracy and reliability of predictions. While raw data provides a foundational perspective, it often lacks the nuanced context that can capture the complexity of market dynamics. This is where feature engineering becomes invaluable. By transforming and synthesizing the raw data using established technical indicators, we can provide the model with enriched insights that could explain underlying market patterns and trends. Such enriched data can significantly improve the model’s ability to anticipate future price movements.

For this forecast, we will use several technical indicators for our feature engineering: RSI, MACD, Bollinger Bands, Parabolic SAR, and the Stochastic Oscillator. Additionally, we introduce a lag feature to capture temporal dependencies, so that our model benefits from both current and historical context.

Let’s calculate the features we will use in this forecast:

# Compute RSI
df['momentum_rsi'] = RSIIndicator(close=df['Close']).rsi()

# Compute MACD
macd = MACD(close=df['Close'])
df['trend_macd'] = macd.macd()
df['trend_macd_signal'] = macd.macd_signal()
df['trend_macd_diff'] = macd.macd_diff()

# Compute Bollinger Bands
bollinger = BollingerBands(close=df['Close'])
df['volatility_bbm'] = bollinger.bollinger_mavg()
df['volatility_bbl'] = bollinger.bollinger_lband()
df['volatility_bbh'] = bollinger.bollinger_hband()

# Compute Parabolic SAR
psar = PSARIndicator(high=df['High'], low=df['Low'], close=df['Close']) # Assuming you have 'High' and 'Low' columns in your df
df['trend_psar'] = psar.psar()


# Compute Stochastic Oscillator
stochastic = StochasticOscillator(high=df['High'], low=df['Low'], close=df['Close']) # Assuming you have 'High' and 'Low' columns
df['momentum_stoch'] = stochastic.stoch()
df['momentum_stoch_signal'] = stochastic.stoch_signal()

# Create Lag Features
df['Close_Lag1'] = df['Close'].shift(1)

# Drop NaN values introduced due to lag features and indicators
df = df.dropna()

# Define features and target
X = df[['momentum_rsi', 'trend_macd', 'trend_macd_signal', 'trend_macd_diff', 'volatility_bbm', 'volatility_bbl', 'volatility_bbh', 'trend_psar', 'momentum_stoch', 'momentum_stoch_signal', 'Close_Lag1']]
y = df['Close']

The code above organizes the dataset df into input features and a target variable for our model. The input features, stored in X, consist of the technical indicators and the lag feature we just calculated. The target variable, y, is the Close column, representing the daily closing price of EUR/USD, which our model aims to predict from the provided features.
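To make the feature step less of a black box, here is a minimal sketch of how two of these features can be computed with plain pandas: Bollinger Bands from a rolling mean and standard deviation, and a one-day lag with shift. It runs on a synthetic random walk rather than real EUR/USD data, and the window of 20 and band width of 2 standard deviations are assumptions chosen for illustration, so the values need not match what the ta library produces.

```python
import numpy as np
import pandas as pd

# Synthetic closing prices (a random walk, not real EUR/USD data).
rng = np.random.default_rng(0)
close = pd.Series(1.10 + rng.normal(0, 0.005, 60).cumsum(), name="Close")

window, num_std = 20, 2  # assumed window length and band width

features = pd.DataFrame({"Close": close})
features["bb_mid"] = close.rolling(window).mean()
rolling_std = close.rolling(window).std(ddof=0)
features["bb_high"] = features["bb_mid"] + num_std * rolling_std
features["bb_low"] = features["bb_mid"] - num_std * rolling_std

# Lag feature: yesterday's close becomes an input for today's row.
features["Close_Lag1"] = close.shift(1)

# The first (window - 1) rows have no rolling value, so dropna() removes them.
clean = features.dropna()
print(len(features), len(clean))
```

This also shows why the dropna() call matters: every rolling indicator leaves a warm-up period of missing values at the start of the series, and those rows cannot be used for training.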

Model Initialization and Training

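# Split the data chronologically
# (assumption: the last 20% of rows form the test set; the original split is not shown)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

# Initialize the model
model = xgb.XGBRegressor(
    learning_rate=0.75,
    n_estimators=200,
    max_depth=5,
    subsample=0.9,
    colsample_bytree=0.8,
    colsample_bylevel=0.8,
    gamma=0,
    min_child_weight=1
)

# Train the model
model.fit(X_train, y_train)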

Continuing from the data preparation above, this code initializes and trains the model. xgb.XGBRegressor() creates a regression model with the specified hyperparameters. Key parameters include a learning rate of 0.75, which scales how much each new tree contributes to the prediction (this is well above XGBoost’s default of 0.3, so it is a fairly aggressive setting), 200 estimators or trees, and a maximum depth of 5 for each tree. subsample and the two colsample parameters control the fraction of rows and columns sampled while building trees, while gamma and min_child_weight set how readily a tree is allowed to split. Together, these hyperparameters control the model’s complexity and how tightly it fits the data.
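To build some intuition for what learning_rate and n_estimators do, here is a toy gradient boosting loop written with numpy only. It is a simplified sketch, not XGBoost’s actual algorithm (which adds regularization, second-order gradient information, and deeper trees): each round fits a one-split tree (a stump) to the current residuals and adds a shrunken copy of it to the prediction. The data is a synthetic noisy sine wave, and the function names are hypothetical.

```python
import numpy as np

def fit_stump(x, residual):
    """Find the single threshold split on x that best fits the residuals."""
    best = None
    for threshold in np.unique(x)[:-1]:  # skip the max so both sides are non-empty
        left = residual[x <= threshold]
        right = residual[x > threshold]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left.mean(), right.mean())
    return best[1:]

def boost(x, y, n_estimators, learning_rate):
    """Gradient boosting for squared error using depth-1 trees (stumps)."""
    prediction = np.full_like(y, y.mean(), dtype=float)
    for _ in range(n_estimators):
        residual = y - prediction  # negative gradient of the squared error
        threshold, left_value, right_value = fit_stump(x, residual)
        update = np.where(x <= threshold, left_value, right_value)
        prediction += learning_rate * update  # shrink each tree's contribution
    return prediction

# Synthetic data: a noisy sine wave.
rng = np.random.default_rng(0)
x = np.linspace(0, 6, 200)
y = np.sin(x) + rng.normal(0, 0.1, x.size)

results = {}
for lr in (0.1, 0.75):
    fitted = boost(x, y, n_estimators=20, learning_rate=lr)
    results[lr] = np.mean((y - fitted) ** 2)
    print(f"learning_rate={lr}: training MSE after 20 stumps = {results[lr]:.4f}")
```

A higher learning rate drives the training error down in fewer rounds, which is also why it can overfit more easily. A lower rate usually needs more trees to reach the same training fit.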

After initialization, the model is trained on X_train and y_train, the training portion of a train/test split of X and y, using the fit method. This step allows the model to learn the underlying patterns from the training data, preparing it to make predictions on unseen data.
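For time series, the way the data is divided into training and test sets matters: a random shuffle would let the model train on days that come after some of the days it is tested on. Below is a minimal sketch of a chronological split, assuming the most recent 20% of rows are held out. The 20% figure and the synthetic data are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Synthetic daily frame standing in for the engineered features (made-up values).
dates = pd.date_range("2022-01-03", periods=100, freq="B")
rng = np.random.default_rng(1)
data = pd.DataFrame({"Close_Lag1": rng.normal(1.1, 0.01, 100)}, index=dates)
data["Close"] = data["Close_Lag1"] + rng.normal(0, 0.005, 100)

test_fraction = 0.2  # assumed share of the most recent rows held out for testing
split_at = len(data) - round(len(data) * test_fraction)

# Everything before split_at is training data; everything after is test data.
X_train, X_test = data[["Close_Lag1"]].iloc[:split_at], data[["Close_Lag1"]].iloc[split_at:]
y_train, y_test = data["Close"].iloc[:split_at], data["Close"].iloc[split_at:]

print(len(X_train), len(X_test))
print(X_train.index.max(), "<", X_test.index.min())
```

scikit-learn’s train_test_split gives the same kind of split when called with shuffle=False.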

Performance Evaluation and Testing

# Predict on the test set
y_pred = model.predict(X_test)

# Calculate performance metrics
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)

print(f"Mean Absolute Error: {mae}")
print(f"Mean Squared Error: {mse}")
print(f"Root Mean Squared Error: {rmse}")

y_train_pred = model.predict(X_train)

After training the model on the historical data, we evaluate its performance on unseen test data. Using the predict method of the trained model, predictions (y_pred) are generated for the test dataset X_test. To assess the accuracy and reliability of these predictions, several performance metrics are computed:

  • The Mean Absolute Error (MAE) provides an average magnitude of errors between predicted and actual values.
  • The Mean Squared Error (MSE) squares these errors to emphasize larger discrepancies.
  • Root Mean Squared Error (RMSE) is the square root of MSE, providing error in the same units as the original data.

These metrics are then printed for clear visibility. We conclude by also predicting on the training set (X_train) and storing the result in y_train_pred, so that we can compare the model’s performance on the training and test datasets.

The following output displays the performance metrics, which assess the accuracy of our model’s predictions:
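The three metrics are simple enough to compute by hand, which helps clarify how they relate to each other. The sketch below uses four made-up price pairs (not results from the model) and plain numpy.

```python
import numpy as np

# Made-up actual and predicted prices for illustration only.
y_true = np.array([1.10, 1.12, 1.11, 1.09])
y_hat = np.array([1.11, 1.12, 1.09, 1.10])

errors = y_true - y_hat
mae = np.mean(np.abs(errors))  # average size of the errors
mse = np.mean(errors ** 2)     # squaring penalizes large errors more
rmse = np.sqrt(mse)            # back in the same unit as the prices

print(f"MAE:  {mae:.6f}")
print(f"MSE:  {mse:.6f}")
print(f"RMSE: {rmse:.6f}")
```

RMSE is never smaller than MAE, and the gap between the two widens when a few errors are much larger than the rest.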

Mean Absolute Error: 0.009141215039947168
Mean Squared Error: 0.000303615460154008
Root Mean Squared Error: 0.017424564848340058

  • Mean Absolute Error (MAE): At 0.0091, it shows the model’s average absolute deviation from the actual values.
  • Mean Squared Error (MSE): With a value of 0.0003036, it indicates the average squared error, emphasizing larger mistakes.
  • Root Mean Squared Error (RMSE): At 0.0174, it expresses the error in the original unit (the EUR/USD price), illustrating the typical magnitude of error while still weighting large errors more heavily.

The relatively low values across these metrics suggest that the model’s predictions track the actual closing prices closely. For scale, an MAE of 0.0091 corresponds to roughly 91 pips on EUR/USD (one pip is 0.0001). Because the previous day’s close is one of the input features, these figures are best judged against a simple baseline, such as predicting that today’s close equals yesterday’s.
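One way to put these numbers in context is to express them in pips and to compare them with a persistence baseline that simply predicts yesterday’s close. The sketch below shows both calculations on a synthetic random walk, so the baseline figure it prints says nothing about real EUR/USD data. On real data, the baseline would be computed over the same test period as the model. The helper name mae_in_pips is hypothetical.

```python
import numpy as np
import pandas as pd

PIP = 0.0001  # one pip for EUR/USD quoted to four decimal places

def mae_in_pips(mae: float) -> float:
    """Convert an error expressed in price units into pips."""
    return mae / PIP

# Synthetic random-walk closes (not real EUR/USD data).
rng = np.random.default_rng(42)
close = pd.Series(1.10 + rng.normal(0, 0.006, 500).cumsum())

# Persistence baseline: predict today's close with yesterday's close.
naive_pred = close.shift(1)
naive_mae = (close - naive_pred).abs().mean()  # mean() skips the leading NaN

print(f"Naive baseline MAE (synthetic): {naive_mae:.4f} ({mae_in_pips(naive_mae):.0f} pips)")
print(f"Reported model MAE: 0.0091 ({mae_in_pips(0.0091):.0f} pips)")
```

If a model cannot beat this baseline on the test period, the lag feature is probably doing most of the work.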

Data Visualization

# Create a new DataFrame for visualization
viz_df = pd.DataFrame({'True': y_test, 'Predicted': y_pred})

# Concatenate the training data for a complete view
viz_df_train = pd.DataFrame({'True': y_train, 'Predicted': y_train_pred})
viz_df = pd.concat([viz_df_train, viz_df])

# Plot the results
plt.figure(figsize=(14, 7))
plt.plot(viz_df['True'], label='True', color='blue')
plt.plot(viz_df['Predicted'], label='Predicted', color='red', alpha=0.7)
plt.title('EUR/USD Forecast: True vs Predicted')
plt.legend()
plt.grid(True)
plt.show()

The chart of forecasted versus actual EUR/USD values offers an insightful glimpse into the model’s capabilities. The close alignment between the blue True line and the red Predicted line for most of the chart is in line with the low Mean Absolute Error (MAE) of 0.0091, although most of the chart covers the training period, where a close fit is expected. The few areas where deviations occur are consistent with the Root Mean Squared Error (RMSE) of 0.0174, which weights larger errors more heavily than the MAE.

Notably, the small segment towards the right end, where predictions seem to diverge slightly, underscores the challenges of exact currency forecasting. Nevertheless, the model, as depicted in the graph and corroborated by the performance metrics, has shown remarkable accuracy in capturing the nuances of the EUR/USD exchange rate’s movements.

Conclusion

In conclusion, this exploration into Forex forecasting has underscored the critical interplay between data preprocessing, feature engineering, and model selection. Through this model, we found that XGBoost performs well in predicting the EUR/USD currency pair, demonstrating the algorithm’s robustness and adaptability. Finally, the precision shown by our model reinforces XGBoost’s reputation as an efficient tool for forecasting financial data.

Written by Alwan Alkautsar

🚀 Product Manager | 📈 Stock & Forex Trader with a Dash of Algo | 🎾 Weekend Warrior on the Tennis Court