Artificial Intelligence and Machine Learning for Foreign Exchange (Fx) Trading Part 3 — Lifting the Lid on Logistic Regression

ml bull
May 28, 2023

This series of articles is dedicated to understanding AI/ML and how it relates to Fx trading. Most articles focus on predicting a price, which is almost useless when it comes to finding profitable trading strategies, so profitable strategies are the focus here.

About Me

I have traded Fx for 20 years using traditional statistical and chart analysis, and AI/ML for the last 5 years or so. With a bachelor of engineering, a master’s degree and several certificates in Machine Learning, I wanted to share some of the pitfalls that took me years to learn and explain why it’s really difficult to make a system work.

Introduction

In the last article we built on the “hello world” example to get it “in the ball park” and maybe slightly better than guessing. However, it’s still pretty useless. The next step is to take a look behind the scenes of Logistic Regression and see what’s going on so we can find the gaps.

Disclaimer

This is in no way financial advice and does not advocate for any specific trading strategy but instead is designed to help understand some of the details of the Fx market and how to apply ML techniques to it.

Logistic Regression Background

Wikipedia has a very official and mathematical definition “In statistics, the logistic model (or logit model) is a statistical model that models the probability of an event taking place by having the log-odds for the event be a linear combination of one or more independent variables. In regression analysis, logistic regression (or logit regression) is estimating the parameters of a logistic model (the coefficients in the linear combination).”

However, in plain English and specific to Fx trading: it creates a model, based upon input variables, that produces the probability of an event taking place. We feed in the input variables as well as a binary yes or no indicating whether the event happened. The model learns from those inputs, and we can then give it any set of input variables and it predicts the probability of the event happening.

In our example we use the last 4 periods’ close prices as our input variables, and the output was whether the price would move upwards by 200 points (we ignored a down or short movement to keep it simple). That’s our x_t1 (close price now), x_t2 (close price one hour ago), x_t3 (close price 2 hours ago) and x_t4 (close price 3 hours ago) input variables, and a y output variable (true/false for whether the price is more than 200 points higher in 4 hours).

To learn, it uses a formula that multiplies each input variable by a “weight”, sums the results, and then wraps it all in a sigmoid function. Let’s start with the weights.

f(x) = a + (w4 * x_t4) + (w3 * x_t3) + (w2 * x_t2) + (w1 * x_t1)

This is a linear equation (a straight line) that the algorithm fits to the dataset as best it can. By default scikit-learn uses an algorithm called lbfgs (limited-memory Broyden–Fletcher–Goldfarb–Shanno). It also supports other solvers, but there won’t be much difference between them in this scenario.
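As a side note (my addition, not from the article’s code), the solver is just an argument to scikit-learn’s LogisticRegression, so swapping it is a one-line change if you want to compare:

from sklearn.linear_model import LogisticRegression

# lbfgs is the default solver
lr_default = LogisticRegression(solver='lbfgs')

# alternatives include 'liblinear', 'newton-cg', 'sag' and 'saga'
lr_alt = LogisticRegression(solver='liblinear')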

Once fitted we have a “straight line”, which we then pass through a “sigmoid” function.

https://en.wikipedia.org/wiki/Sigmoid_function

This takes our straight line and “curves” it so the output is limited to between 0 and 1 and the predictions are spread out. This is important since we are producing a “probability” (which must be between 0 and 1).
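To make that concrete, here is a minimal sketch of the linear score being squashed into a probability. The intercept, weights and prices below are made-up values, purely for illustration:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

a = -0.5                                        # hypothetical intercept
w = np.array([0.2, -0.1, 0.4, 0.3])             # hypothetical weights w1..w4
x = np.array([0.6650, 0.6655, 0.6662, 0.6671])  # example close prices x_t1..x_t4

score = a + np.dot(w, x)  # the "straight line" f(x)
prob = sigmoid(score)     # squashed into the 0 to 1 range
print('linear score: {:.4f}, probability: {:.4f}'.format(score, prob))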

See It in Action

We will run our example again but with only two variables (it’s easier to chart two variables than four), look at the weights and chart the decision boundary.

Firstly, let’s repeat the data loading from last week with some changes to remove the charting and limit the inputs (or features) to the last 2 periods’ close prices.

#
# IMPORT DATA From github
#

import pandas as pd
from datetime import datetime

url = 'https://raw.githubusercontent.com/the-ml-bull/Hello_World/main/Fx60.csv'
dateparse = lambda x: datetime.strptime(x, '%d/%m/%Y %H:%M')

df = pd.read_csv(url, parse_dates=['date'], date_parser=dateparse)

df.head(n=10)
#
# Create time shifted data as basis for model
#

import numpy as np

df = df[['date', 'audusd_open', 'audusd_close']].copy()

# x is the last 4 values so create x for each
#df['x_t-4'] = df['audusd_close'].shift(4)
#df['x_t-3'] = df['audusd_close'].shift(3)
df['x_t-2'] = df['audusd_close'].shift(2)
df['x_t-1'] = df['audusd_close'].shift(1)

# y is points 4 periods into the future - the open price now (not close)
df['y_future'] = df['audusd_close'].shift(-3)
df['y_change_price'] = df['y_future'] - df['audusd_open']
df['y_change_points'] = df['y_change_price'] * 100000
df['y'] = np.where(df['y_change_points'] >= 200, 1, 0)
#
# Create Train and Val datasets
#
from sklearn.linear_model import LogisticRegression

#x = df[['x_t-4', 'x_t-3', 'x_t-2', 'x_t-1']]
x = df[['x_t-2', 'x_t-1']]
y = df['y']
y_points = df['y_change_points'] # we will use this later

# Note Fx "follows" (time series) so randomization is NOT a good idea
# create train and val datasets.
no_train_samples = int(len(x) * 0.7)
x_train = x[4:no_train_samples]
y_train = y[4:no_train_samples]
y_train_change_points = y_points[4:no_train_samples]

x_val = x[no_train_samples:-3]
y_val = y[no_train_samples:-3]
y_val_change_points = y_points[no_train_samples:-3]
#
# Create class weights
#
from sklearn.utils.class_weight import compute_class_weight

num_ones = np.sum(y_train)
num_zeros = len(y_train) - num_ones
print('In the training set we have 0s {} ({:.2f}%), 1s {} ({:.2f}%)'.format(num_zeros, num_zeros/len(y_train)*100, num_ones, num_ones/len(y_train)*100))

classes = np.unique(y_train)
class_weight = compute_class_weight(class_weight='balanced', classes=classes, y=y_train)
class_weight = dict(zip(classes, class_weight))

print('class weights {}'.format(class_weight))
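As a quick sanity check (my addition, not part of the original article), scikit-learn’s “balanced” mode uses n_samples / (n_classes * samples_per_class), so we can reproduce the numbers by hand and confirm they match:

# manually reproduce the 'balanced' heuristic for our two classes
manual_class_weight = {
    0: len(y_train) / (2 * num_zeros),
    1: len(y_train) / (2 * num_ones),
}
print('manually calculated class weights {}'.format(manual_class_weight))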

Next we will manually run the algorithm step by step and see what happens to the weights as we go. In this example we will use 500 data points at a time and “retrain” our algorithm on each block. After each training interval we will

  • display the new weights of the two input (feature) variables.
  • use the last x and y of the block to calculate a prediction by hand and compare it to the scikit-learn probability prediction.

Note the “warm_start” switch in Logistic Regression tells the library that, when fitting, it should start from the previously learned coefficients instead of from fresh values, i.e. build on what we have already learned.

#
# fit the model (step by step)
#

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

lr = LogisticRegression(warm_start=True)

start_ix = 0
increments = 500
while start_ix < (len(x_train) - increments):

    x = x_train.iloc[start_ix:start_ix + increments].to_numpy()
    y = y_train.iloc[start_ix:start_ix + increments].to_numpy()

    # fit on this block only, continuing from the previous weights (warm_start)
    lr.fit(x, y)

    # pull out the intercept and the two feature weights
    intercept = float(lr.intercept_[0])
    coef_x1 = float(lr.coef_[0, 0])
    coef_x2 = float(lr.coef_[0, 1])

    # use the last sample of the block for the manual prediction check
    x1 = float(x[-1, 0])
    x2 = float(x[-1, 1])

    predicted = float(lr.predict_proba(x[-1].reshape(1, 2))[0, 1])
    calculated = intercept + (coef_x1 * x1) + (coef_x2 * x2)

    print('ix: {}, x1: {:.5f}, x2: {:.5f}, y: {} int: {:.5f}, w1: {:.5f}, w2: {:.5f}, Calc: {:.5f}, CalSig: {:.5f}, Pred: {:.5f}'.format(
        start_ix, x1, x2, y[-1],
        intercept, coef_x1, coef_x2,
        calculated, sigmoid(calculated), predicted))

    start_ix += increments

You can see that in the first few blocks the intercept and weights can change significantly. While they do “settle down” with time, they still move quite a bit (that’s a clue we will be covering later). You can also see that our manual calculation using the weights matches the prediction well, so we know we are calculating things correctly.

This type of exercise is important to run through as it ensures your understanding and math are correct (no fundamental mistakes) and may yield insights into what’s going on (there are some clues here).

Note that if you reduce the block size down from 500 you may get some errors. The lbfgs algorithm needs at least one sample from each class (a 0 and a 1) to fit; with few and scattered 1’s that may not happen if the block size is small.
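One way around this (my own sketch, not from the original code) is a guard inside the training loop so that any block containing only one class is skipped rather than passed to lbfgs:

# Sketch (my addition): place inside the training loop in place of lr.fit(x, y)
y_block = y_train.iloc[start_ix:start_ix + increments].to_numpy()
if len(np.unique(y_block)) < 2:
    print('block starting at {} contains only one class - skipping'.format(start_ix))
else:
    x_block = x_train.iloc[start_ix:start_ix + increments].to_numpy()
    lr.fit(x_block, y_block)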

The Decision Boundary

The decision boundary is the line where samples on or above it are classified as a 1 and samples below it as a 0. From the above we have two variables (x1, x2), a y which is a yes or no, and a formula to predict the probability of y from x1 and x2. Hence we can chart these to see if it yields any new insights.

Note this is why we limited the inputs to 2. Charting with 3 gets complex, and if you can figure out a good way to do 4, please do let me know. In our final model we will have close to 20 features!

Note we have moved our code structure to use functions so we can “loop” through a few different iterations with hyperparameters and see what happens.

Firstly, we calculate the model parameters (intercept, weights etc.) and use them to calculate x2 (the graph’s y axis) from x1, which is set to its min and max values. The formula used, x2 = -(w1/w2) * x1 - (b/w2), can be derived from knowing that the decision boundary is where the predicted probability is 0.5: the sigmoid equals 0.5 when its input is 0, so setting b + w1*x1 + w2*x2 = 0 and solving for x2 gives the line above. It’s a little involved, but there are a few articles that do a good job of explaining it.

# Retrieve the model parameters.

def fit_and_get_parameters(x, y, class_weight):

    lr = LogisticRegression(class_weight=class_weight)
    lr.fit(x, y)

    # intercept and the two feature weights
    b = float(lr.intercept_[0])
    w1, w2 = lr.coef_[0]
    w1 = float(w1)
    w2 = float(w2)

    # Calculate the intercept and gradient of the decision boundary.
    c = float(-b / w2)
    m = float(-w1 / w2)

    # get the min / max values of x1 and use them to find the decision boundary in x2
    min_x1_value = x['x_t-1'].min()
    max_x1_value = x['x_t-1'].max()
    x1_values = np.array([min_x1_value, max_x1_value])
    x2_values = -w1 / w2 * x1_values - (b / w2)

    print('y = {:.2f} + {:.2f} x1 + {:.2f} x2 Intercept(c): {:.2f}, Gradient(m): {:.3f} x1: {}, x2: {}'.format(b, w1, w2, c, m, x1_values, x2_values))

    return x1_values, x2_values

Then we can feed a “graph” function the data points and the decision boundary start and end values to plot.

import matplotlib.pyplot as plt

def plot_decision_boundary(x, y, x1_values, x2_values, heading):

    # put 0's and 1's in two separate lists for display
    list_0_x1, list_0_x2, list_1_x1, list_1_x2 = [], [], [], []
    for ix in range(len(y)):
        if y.iloc[ix] == 0:
            list_0_x1.append(x['x_t-1'].iloc[ix])
            list_0_x2.append(x['x_t-2'].iloc[ix])
        else:
            list_1_x1.append(x['x_t-1'].iloc[ix])
            list_1_x2.append(x['x_t-2'].iloc[ix])

    # scatterplot the 0's and 1's
    plt.scatter(list_0_x1, list_0_x2, marker='o', color='blue')
    plt.scatter(list_1_x1, list_1_x2, marker='x', color='red')

    # Draw the decision boundary
    plt.plot(x1_values, x2_values, linestyle='-', color='black')

    # axis labels and title
    plt.xlabel('x1')
    plt.ylabel('x2')
    plt.title(heading)

    return

We then run the simulation

start_ix, stop_ix = 0, -1
x1_values, x2_values = fit_and_get_parameters(x_train.iloc[start_ix:stop_ix], y_train.iloc[start_ix:stop_ix], class_weight)
plot_decision_boundary(x_train.iloc[start_ix:stop_ix], y_train.iloc[start_ix:stop_ix], x1_values, x2_values,
                       '{} to {} with db {} to {}'.format(start_ix, stop_ix, x1_values, x2_values))

We can run this for a few scenarios of the data with different start and stop values. I have charted some of them below.
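For completeness, here is a sketch of how such a sweep might look. The (start, stop) ranges below are my own illustrative picks, not necessarily the ones used to produce the charts in the article:

scenarios = [(0, 2000), (2000, 4000), (4000, 6000), (0, -1)]  # illustrative ranges only

for start_ix, stop_ix in scenarios:
    x1_values, x2_values = fit_and_get_parameters(
        x_train.iloc[start_ix:stop_ix], y_train.iloc[start_ix:stop_ix], class_weight)
    plot_decision_boundary(
        x_train.iloc[start_ix:stop_ix], y_train.iloc[start_ix:stop_ix],
        x1_values, x2_values,
        '{} to {} with db {} to {}'.format(start_ix, stop_ix, x1_values, x2_values))
    plt.show()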

Some key points

  • Smaller datasets can result in completely nonsensical information.
  • The values of the data points move over time, from around 0.55 up to 0.85 depending upon the range. This is to be expected; the “price” of an Fx pair does change over time, often over a wide range. This is the main issue with our model at the moment, and it is why we need to normalize the data. In the next article we are going to do just that.
  • The distribution of 1’s and 0's (red and blue) is almost uniform. Hence the decision boundary looks only slightly better than guessing (it runs right through the middle with roughly equal numbers of each class on both sides).

What Does It All Mean?

As it stands our model is still pretty much completely useless but we have some insights we can develop.

  • The decision boundary runs through the middle with an almost uniform number of 1’s on both sides (so about as good as guessing).
  • Prices can move a lot and, given our prediction is a weighted sum (i.e. y = w1 * x_t1 etc.), a shift in the level of x_t1 can change the output completely.
  • This model is almost completely useless.

Next Article

We can start to get closer in the next article as we explore different types of normalization.

Links and References

  • Logistic regression: https://en.wikipedia.org/wiki/Logistic_regression
  • Sigmoid function: https://en.wikipedia.org/wiki/Sigmoid_function