Artificial Intelligence and Machine Learning for Foreign Exchange (Fx) Trading Part 5 — Features

ml bull


This series of articles is dedicated to understanding AI/ML and how it relates to Fx trading. Most articles on the subject focus on predicting a price, which is almost useless when it comes to finding profitable trading strategies; hence, finding a profitable strategy is the focus here.

About Me

I have traded Fx for 20 years using traditional statistical and chart analysis, and AI/ML for the last 5 years or so. With a Bachelor of Engineering, a Master's degree and several certificates in Machine Learning, I wanted to share some of the pitfalls that took me years to learn and explain why it's difficult, but not impossible, to make a system work.

Introduction

In the first four articles we:
1. Created the most basic “hello world” example: gathered data, generated a model and measured the result.
2. Used class weights to get our model “in the ballpark” (maybe slightly better than guessing) and improved our measurement.
3. Peered under the covers of Logistic Regression to find its limitations and see where we might go next.
4. Looked at normalization and its impact, realizing our hypothesis may be untrue.
5. This article looks at beefing up our hypothesis by adding more features.

Disclaimer

This is in no way financial advice and does not advocate for any specific trading strategy; instead, it is designed to help you understand some of the details of the Fx market and how to apply ML techniques to it.

Feature Engineering

It's likely our previous hypothesis was untrue (that something in the closing prices of the last 4 hours predicts a sudden price move in the next 4 hours). Hence we need to either try a completely new hypothesis (and we will in future articles) or add/modify features to see if we can make them correlate better with our y_true variable. This process is known as feature engineering.

In this article we are going to reuse the previous code but add some additional simple features. Our raw data contains, for each hourly period, the following values for the AUDUSD and EURUSD currency pairs:
- Open Price: The price at the start of each 1 hour period
- High Price: The highest price achieved during that hour
- Low Price: The lowest price achieved during that hour
- Close Price: The price at the end of the 1 hour period
- Volume: The number of price changes during that period (not strictly the number of trades, but a proxy for it; note this is normally very broker specific)

Example Bar Chart

These “OHLC” (Open, High, Low, Close) values are normally represented in bar or candlestick form (bar form above).
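If you want to draw these yourself, below is a minimal sketch that renders the AUDUSD hourly data as candlesticks. It assumes the Fx60.csv layout used throughout this series and that the (third party) mplfinance package is installed; the column renaming is only there because mplfinance expects Open/High/Low/Close/Volume names and a datetime index.

import pandas as pd
import mplfinance as mpf

url = 'https://raw.githubusercontent.com/the-ml-bull/Hello_World/main/Fx60.csv'
df = pd.read_csv(url, parse_dates=['date'], dayfirst=True)

# mplfinance expects a DatetimeIndex and Open/High/Low/Close/Volume column names
ohlc = df.set_index('date')[['audusd_open', 'audusd_high', 'audusd_low', 'audusd_close', 'audusd_volume']]
ohlc.columns = ['Open', 'High', 'Low', 'Close', 'Volume']

# plot the last 100 hourly bars with the volume proxy underneath
mpf.plot(ohlc.tail(100), type='candle', volume=True)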

Currently our features include:
- Close price in the last period (which is essentially “now”)
- Close price 1 period (60 mins) ago
- Close price 2 periods (120 mins) ago
- Close price 3 periods (180 mins) ago

Remember we start a prediction at the start of each period, so the current period’s “open” and the last period’s “close” will be the same (or very similar). Hence “open” and “close” are essentially the same thing and using both wouldn’t make much sense. So, we can use:

Features available in the given data

This gives us 8 features at each time period, or 32 features in total. We also need to include a “base” value so we have a reference point for calculating the “points” moved from the raw price when we normalize (see the last article).
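As a quick sanity check on the claim that the current period’s open matches the previous period’s close, you can compare the two columns directly. This is just an illustrative sketch against the same CSV; the exact differences you see will depend on the broker feed.

import pandas as pd

url = 'https://raw.githubusercontent.com/the-ml-bull/Hello_World/main/Fx60.csv'
df = pd.read_csv(url, parse_dates=['date'], dayfirst=True)

# difference between this hour's open and the previous hour's close, in points
diff_points = (df['audusd_open'] - df['audusd_close'].shift(1)) * 100000
print(diff_points.abs().describe())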

So our hypothesis has now become: will more features, and specifically the features above, improve our model? Let’s find out.

First, let’s simplify the data loading to keep the feature processing separate, as we will be developing it over the coming weeks.

import numpy as np
import pandas as pd
from datetime import datetime

def load_data():
    url = 'https://raw.githubusercontent.com/the-ml-bull/Hello_World/main/Fx60.csv'
    dateparse = lambda x: datetime.strptime(x, '%d/%m/%Y %H:%M')

    df = pd.read_csv(url, parse_dates=['date'], date_parser=dateparse)

    return df
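A quick sanity check that the load worked (the exact row count depends on the current state of the CSV):

df = load_data()
print(df.shape)
print(df.columns.tolist())
print(df.head())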

Next, we create an almost entirely new create_x_values function that creates the base of each feature (without normalization). Instead of fixed names for each feature we now loop through each feature name and the number of periods to go back, creating the new feature as we go. In later articles we will develop this even further. It also returns the list of feature names so we can use them in other parts of the code; by giving them the ‘x’ prefix we can separate them from our generic feature names.

def create_x_values(df, feature_names):

    x_values_df = pd.DataFrame()

    # loop through feature names and "back periods" to go back
    x_feature_names = []
    for feature in feature_names:
        for period in [1, 2, 3, 4]:
            # create the name (eg 'x_audusd_close_t-1')
            feature_name = 'x_' + feature + '_t-' + str(period)
            x_feature_names.append(feature_name)
            x_values_df[feature_name] = df[feature].shift(period)

    # Add "starting" values used in normalization
    x_values_df['x_audusd_open'] = df['audusd_open'].shift(4)
    x_values_df['x_eurusd_open'] = df['eurusd_open'].shift(4)
    x_values_df['audusd_open'] = df['audusd_open']
    x_values_df['eurusd_open'] = df['eurusd_open']

    # add all future y values for future periods
    for period in [0, 1, 2, 3]:
        name = 'y_t-' + str(period)
        x_values_df[name] = df['audusd_close'].shift(-period)

    # y is points 4 periods into the future - the open price now (not close)
    x_values_df['y_future'] = df['audusd_close'].shift(-3)
    x_values_df['y_change_price'] = x_values_df['y_future'] - df['audusd_open']
    x_values_df['y_change_points'] = x_values_df['y_change_price'] * 100000
    x_values_df['y'] = np.where(x_values_df['y_change_points'] >= 200, 1, 0)

    # and reset df and done
    x_values_df = x_values_df.copy()
    return x_values_df, x_feature_names
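As a small usage sketch, passing just a couple of feature names shows the naming convention the loop generates:

df = load_data()
df, x_feature_names = create_x_values(df, ['audusd_close', 'audusd_volume'])

# eg ['x_audusd_close_t-1', 'x_audusd_close_t-2', ..., 'x_audusd_volume_t-4']
print(x_feature_names)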

Our normalization routine has changed the most. It focuses only on the features we need (those with the ‘x’ prefix) and supports different starting values for audusd, eurusd and volume. In the points and percentage methods we also manually scale the volume to put it on roughly the same scale as the other features, and we return the ‘_norm’ field names so we can use them elsewhere.

from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler, StandardScaler

def normalize_data(df, x_fields, method):

    norm_df = df.copy()
    y_fields = ['y_t-0', 'y_t-1', 'y_t-2', 'y_t-3']

    if method == 'price':
        for field in x_fields:
            norm_df[field + '_norm'] = df[field]

        for field in y_fields:
            norm_df[field + '_norm'] = df[field]

    if method == 'points':
        for field in x_fields:
            if 'volume' in field:
                norm_df[field + '_norm'] = df[field] / 100
            elif 'audusd' in field:
                norm_df[field + '_norm'] = (df[field] - df['x_audusd_open']) * 100000
            elif 'eurusd' in field:
                norm_df[field + '_norm'] = (df[field] - df['x_eurusd_open']) * 100000

        for field in y_fields:
            norm_df[field + '_norm'] = (df[field] - df['audusd_open']) * 100000

    if method == 'percentage':
        for field in x_fields:
            if 'volume' in field:
                norm_df[field + '_norm'] = df[field] / 10000
            elif 'audusd' in field:
                norm_df[field + '_norm'] = (df[field] - df['x_audusd_open']) / df[field] * 100
            elif 'eurusd' in field:
                norm_df[field + '_norm'] = (df[field] - df['x_eurusd_open']) / df[field] * 100

        for field in y_fields:
            norm_df[field + '_norm'] = (df[field] - df['audusd_open']) / df[field] * 100

    if method == 'minmax':
        scaler = MinMaxScaler()
        scaled = scaler.fit_transform(df[x_fields + y_fields])
        norm_field_names = [x + '_norm' for x in x_fields + y_fields]
        norm_df[norm_field_names] = scaled

    if method == 'stddev':
        scaler = StandardScaler()
        scaled = scaler.fit_transform(df[x_fields + y_fields])
        norm_field_names = [x + '_norm' for x in x_fields + y_fields]
        norm_df[norm_field_names] = scaled

    x_feature_names_norm = [x + '_norm' for x in x_fields]
    return norm_df, x_feature_names_norm
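To get a feel for what each method does to the scale of the inputs, you can compare summary statistics of a normalized column across methods. A rough sketch (using only the AUDUSD close for brevity):

df = load_data()
df, x_feature_names = create_x_values(df, ['audusd_close'])

for method in ['points', 'percentage', 'stddev']:
    norm_df, x_norm = normalize_data(df, x_feature_names, method=method)
    print(method)
    print(norm_df[x_norm].describe().loc[['mean', 'std', 'min', 'max']])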

There are no real changes to our train/val split, class weight or metric functions. Now, let’s run it for each normalization method and compare the results to last week’s single-feature version.

def get_train_val(df, x_feature_names_norm):
    #
    # Create Train and Val datasets
    #

    x = df[x_feature_names_norm]
    y = df['y']
    y_points = df['y_change_points']

    # Note Fx "follows" (time series) so randomization is NOT a good idea.
    # Create train and val datasets.
    no_train_samples = int(len(x) * 0.7)
    x_train = x[4:no_train_samples]
    y_train = y[4:no_train_samples]

    x_val = x[no_train_samples:-3]
    y_val = y[no_train_samples:-3]
    y_val_change_points = y_points[no_train_samples:-3]

    return x_train, y_train, x_val, y_val, y_val_change_points
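For reference, the same chronological split (ignoring the row trimming for the shifted NaN values) could also be written with scikit-learn’s train_test_split by turning shuffling off. This is only an equivalent sketch, not what the rest of the article uses:

from sklearn.model_selection import train_test_split

df = load_data()
df, x_feature_names = create_x_values(df, ['audusd_close'])
norm_df, x_norm = normalize_data(df, x_feature_names, method='points')

# drop the rows made NaN by the shifts, then split chronologically (no shuffling)
clean_df = norm_df.iloc[4:-3]
x_train, x_val, y_train, y_val = train_test_split(
    clean_df[x_norm], clean_df['y'], train_size=0.7, shuffle=False)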

def get_class_weights(y_train, display=True):

    #
    # Create class weights
    #
    from sklearn.utils.class_weight import compute_class_weight

    num_ones = np.sum(y_train)
    num_zeros = len(y_train) - num_ones

    classes = np.unique(y_train)
    class_weights = compute_class_weight(class_weight='balanced', classes=classes, y=y_train)
    class_weights = dict(zip(classes, class_weights))

    if display:
        print('In the training set we have 0s {} ({:.2f}%), 1s {} ({:.2f}%)'.format(num_zeros, num_zeros/len(y_train)*100, num_ones, num_ones/len(y_train)*100))
        print('class weights {}'.format(class_weights))

    return class_weights
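For intuition, the ‘balanced’ option simply weights each class by n_samples / (n_classes * class_count), so the rarer class gets the larger weight. A tiny standalone example:

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y_toy = np.array([0] * 90 + [1] * 10)  # 90% zeros, 10% ones
weights = compute_class_weight(class_weight='balanced', classes=np.array([0, 1]), y=y_toy)
print(dict(zip([0, 1], weights)))  # {0: 0.555..., 1: 5.0}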
from sklearn.metrics import log_loss, confusion_matrix, precision_score, recall_score, f1_score

def show_metrics(lr, x, y_true, y_change_points, display=True):

    # predict from the val set so we have predictions and true values as binaries
    y_pred = lr.predict(x)

    # basic error types
    log_loss_error = log_loss(y_true, y_pred)
    score = lr.score(x, y_true)

    #
    # Customized metrics
    #
    tp = np.where((y_pred == 1) & (y_change_points >= 0), 1, 0).sum()
    fp = np.where((y_pred == 1) & (y_change_points < 0), 1, 0).sum()
    tn = np.where((y_pred == 0) & (y_change_points < 0), 1, 0).sum()
    fn = np.where((y_pred == 0) & (y_change_points >= 0), 1, 0).sum()

    precision = 0
    if (tp + fp) > 0:
        precision = tp / (tp + fp)

    recall = 0
    if (tp + fn) > 0:
        recall = tp / (tp + fn)

    f1 = 0
    if (precision + recall) > 0:
        f1 = 2 * precision * recall / (precision + recall)

    # output the errors
    if display:
        print('Errors Loss: {:.4f}'.format(log_loss_error))
        print('Errors Score: {:.2f}%'.format(score*100))
        print('Errors tp: {} ({:.2f}%)'.format(tp, tp/len(y_true)*100))
        print('Errors fp: {} ({:.2f}%)'.format(fp, fp/len(y_true)*100))
        print('Errors tn: {} ({:.2f}%)'.format(tn, tn/len(y_true)*100))
        print('Errors fn: {} ({:.2f}%)'.format(fn, fn/len(y_true)*100))
        print('Errors Precision: {:.2f}%'.format(precision*100))
        print('Errors Recall: {:.2f}%'.format(recall*100))
        print('Errors F1: {:.2f}'.format(f1*100))

    errors = {
        'loss': log_loss_error,
        'score': score,
        'tp': tp,
        'fp': fp,
        'tn': tn,
        'fn': fn,
        'precision': precision,
        'recall': recall,
        'f1': f1
    }

    return errors

So let’s run it, going through each normalization method.


for norm_method in ['price', 'points', 'percentage', 'minmax', 'stddev']:
    df = load_data()

    feature_names = ['audusd_open', 'audusd_close', 'audusd_high', 'audusd_low', 'audusd_volume',
                     'eurusd_open', 'eurusd_close', 'eurusd_high', 'eurusd_low', 'eurusd_volume']
    df, x_feature_names = create_x_values(df, feature_names)

    norm_df, x_feature_names_norm = normalize_data(df, x_feature_names, method=norm_method)
    x_train, y_train, x_val, y_val, y_val_change_points = get_train_val(norm_df, x_feature_names_norm)
    class_weights = get_class_weights(y_train, display=False)

    lr = LogisticRegression(class_weight=class_weights)
    lr.fit(x_train, y_train)

    print('Errors for method {}'.format(norm_method))
    errors = show_metrics(lr, x_val, y_val, y_val_change_points, display=True)

We can see that there isn’t much improvement. However, there are some things worth noting.

  1. “Did not converge” warnings, highlighted in italics in the output for the points, percentage and minmax normalization methods. This means the solver trying to find the best “linear” solution couldn’t minimize the loss before it hit its iteration limit. The algorithm is capped at 100 iterations by default, but you can easily raise this to 1000 by adding max_iter=1000 to the LogisticRegression line (see the sketch after this list). That will remove the warning, but let’s explore the run without doing so, since it raises some interesting points. The “did not converge” warning is likely caused by a few things:
    - Not enough data available
    - Too many features (vs not enough data)
    - The data isn’t correlated with the features
    We consistently get precisions above 50%, but only just. Hence our conclusion is that there might be something in the past that predicts the future, but it is very weak or possibly non-existent. We therefore have to balance the number of features, the amount of data and the data/feature correlation to get a workable model.
  2. The points method stands out with a very high F1 score compared to the other methods. Is this an anomaly? It didn’t converge, so it could very well be a random coincidence, or are we onto something? The training loss is also large (remember, in our first article we suggested that loss and return are not necessarily correlated).
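For reference, the iteration cap mentioned in point 1 is a one-line change; whether converging actually improves the trading-oriented metrics is something to verify rather than assume:

from sklearn.linear_model import LogisticRegression

# same model as the loop above, just with a higher solver iteration limit
lr = LogisticRegression(class_weight=class_weights, max_iter=1000)
lr.fit(x_train, y_train)
errors = show_metrics(lr, x_val, y_val, y_val_change_points, display=True)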

As it stands this couldn’t possibly work as a trading system, but there are some tweaks we can make and there are some gaps in our measurement. We will be exploring these in the next article!
