Trading in FOREX using Market Structure & Machine Learning

As the final project of my Data Scientist Master's degree, I was asked to submit a piece of work encompassing everything I had learnt throughout the course. This is the story of a strategy developed over one year in the Forex market… A Market Structure story…

The main objective of this project is to assess a number of strategies applied to the EURUSD Forex market. The core problem is to obtain a set of buying/selling points that succeed more than 50% of the time on a 3:1 ratio bet, within the spread-betting market.

The strategies are based on Market Structure theory + Machine Learning algorithms.

The metric used to assess the strategies is:

Precision ≥ 50% in all 3 split dataframes.
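As a rough illustration of this acceptance rule, the sketch below computes precision (the share of trades taken that turned out to be winners) and checks it on three splits. The function names and the toy labels are hypothetical, and returning 0 when no trade is taken is an assumption of this sketch, not something stated in the project.

```python
def precision(y_true, y_pred):
    """Share of predicted trades (y_pred == 1) that were actual winners."""
    taken = [t for t, p in zip(y_true, y_pred) if p == 1]
    if not taken:
        return 0.0  # assumption: no trade taken counts as precision 0
    return sum(taken) / len(taken)


def passes_all_splits(splits, threshold=0.5):
    """splits: list of (y_true, y_pred) pairs, one per dataframe."""
    return all(precision(t, p) >= threshold for t, p in splits)


# Toy labels for three splits (illustrative only, not project data)
splits = [
    ([1, 0, 1, 1], [1, 1, 1, 0]),  # 2 winners out of 3 trades taken
    ([0, 1, 1, 0], [0, 1, 1, 0]),  # 2 winners out of 2 trades taken
    ([1, 0, 0, 1], [1, 1, 0, 0]),  # 1 winner out of 2 trades taken
]
print(passes_all_splits(splits))
```

A strategy is only accepted if the threshold holds on every split, which is stricter than averaging precision over the whole period.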

What is Market Structure

The Forex markets trend in 3 different directions at any given time, and understanding when a shift occurs on the chosen timeframe is pivotal to successful trading. The 3 types of market structure are:

  1. Bull trend (Up)
  2. Bear trend (Down)
  3. Sideways trend (Right)

The bull trend is depicted by higher highs (HH) and higher lows (HL). The trend will continue in that direction until a lower low is printed by the asset price. The trend begins to show signs of weakness when it fails to print a higher high.

The bear trend is the price action of lower lows (LL) and lower highs (LH). The bear trend will continue to fall as long as lower highs continue to print; once a higher high comes into the price, the trend will end. The sign that the trend may be reversing is when price begins to print higher lows or equal lows.

The sideways trend is a trend that has equal highs and equal lows. During this phase the price moves within a range and is said to be in consolidation. Markets can remain in consolidation for a long time. This trend is broken if the price breaks out from the top or bottom of the range, which could be the beginning of one of the first two trends.
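The three definitions above can be read as a comparison between two consecutive swing highs and swing lows. The sketch below is a simplified reading of that idea, not the project's own break-of-structure logic: the function name, the `tol` tolerance and the "mixed" outcome (for cases the definitions do not cover, such as a higher high with a lower low) are all assumptions.

```python
def classify_structure(prev_high, prev_low, high, low, tol=0.0):
    """Classify the structure from two consecutive swing highs and lows.

    tol is a hypothetical tolerance below which two prices count as equal.
    """
    higher_high = high > prev_high + tol
    lower_high = high < prev_high - tol
    higher_low = low > prev_low + tol
    lower_low = low < prev_low - tol

    if higher_high and higher_low:
        return "bull"      # HH + HL
    if lower_high and lower_low:
        return "bear"      # LH + LL
    if not (higher_high or lower_high or higher_low or lower_low):
        return "sideways"  # equal highs and equal lows
    return "mixed"         # not covered by the three definitions


print(classify_structure(1.10, 1.08, 1.12, 1.09))
print(classify_structure(1.12, 1.09, 1.11, 1.07))
print(classify_structure(1.10, 1.08, 1.10, 1.08))
```

In practice the swing points themselves have to be detected first, which depends on the timeframe and on how a swing is defined.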

Here's an example of the 15M EURUSD Market Structure for a certain period, plotted with Plotly:

Data

The data used for this exercise is the EURUSD market candlesticks from 01.2007 to 06.2021.

Parameters

The parameters used are the following. Please note that all these functions have been applied to all timeframes (D1, H4, H1 & M15) and the results merged into a single dataframe further along the line.

  • Closing → It defines whether the last candle closed up (above the opening price) Closing = 1; or closed down (below the opening price) Closing = -1.
  • MS_High & MS_Low → These are the top and bottom prices for the range in which the price moves.
  • Trend → It defines the latest trend. Trend = 1 if the latest break in structure was bullish, and Trend = -1 if it was bearish.
  • N_breaks → Function that calculates the number of previous breaks in the same direction.
  • MS_periods → Function that calculates the number of periods the price remains within the same Market Structure range.
  • MS_range → Function that calculates the number of pips between MS_High & MS_Low.
  • Indicators → A number of common trading indicators are used in the 2nd part of the strategy to improve predictions. The chosen indicators are: RSI_14_SMA, RSI_14_EMA, EMA_MACD_12_26_9, Hist_MACD_12_26_9, MACD_signal and Boll_SMA_20.
  • Retracement Levels → Lastly, and most importantly, a function that calculates, as a percentage (0 to 100%), how far the price has retraced between MS_High & MS_Low, depending on its trend. An example is shown below:

Strategy Parameters

Ratio 3:1 → This means that for every “bet” we place, the Limit at which we collect our profit (whether we buy or sell) will be 3 times the distance of the Stop loss. In simple terms: if we bet 1 and lose, we lose 1; and if we win, we win 3.
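To see why a success rate above 50% is a comfortable target at this ratio, it helps to work out the expected result per bet. The sketch below is plain arithmetic under the 3:1 payoff described above; the function names are hypothetical, and spreads and other trading costs are ignored.

```python
def expected_value(win_rate, reward=3.0, risk=1.0):
    """Average result per bet, in units of the amount risked."""
    return win_rate * reward - (1.0 - win_rate) * risk


def break_even_win_rate(reward=3.0, risk=1.0):
    """Win rate at which the expected value is exactly zero."""
    return risk / (reward + risk)


print(break_even_win_rate())  # 0.25 at a 3:1 ratio
print(expected_value(0.50))   # 1.0 unit gained per unit risked
```

At 3:1 the break-even win rate is 25%, so the 50% target leaves a wide margin before costs.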

Range → The range is the difference between the Market Structure High and the Market Structure Low, measured in pips (0.0001 = 1 pip):

Range = [MS High]- [MS Low]
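The formula above gives the range in price units; dividing by the pip size converts it to pips. This is a minimal sketch with a hypothetical function name. The rounding to one decimal is an assumption, added only to absorb floating-point noise.

```python
PIP = 0.0001  # 1 pip for EURUSD, as stated above


def range_in_pips(ms_high, ms_low, pip=PIP):
    """Distance between MS High and MS Low, expressed in pips."""
    # Rounding is an assumption of this sketch to hide float artefacts
    return round((ms_high - ms_low) / pip, 1)


print(range_in_pips(1.2050, 1.2000))  # 50.0
```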

Pricing

Two pricing functions have been defined:

  • Price_M15_H1: Function that prices all M15 candles based on the Closing price of M15 candles and calculates its limit & stop loss based on the H1 Market Structure (MS_H_H1 & MS_L_H1).
  • Price_M15_H4: Function that prices all M15 candles based on the Closing price of M15 candles and calculates its limit & stop loss based on the H4 Market Structure (MS_H_H4 & MS_L_H4).

For every M15 candle these functions will attribute a 1 or a 0 to:

  • Labelb_M15_H1 → 1 if a buy placed at a 3:1 Limit/stop-loss ratio, with the Limit and the stop-loss set from the H1 Market Structure, is successful; 0 otherwise.
  • Labels_M15_H1 → Same as above but selling instead of buying.
  • Labelb_M15_H4 → 1 if a buy placed at a 3:1 Limit/stop-loss ratio, with the Limit and the stop-loss set from the H4 Market Structure, is successful; 0 otherwise.
  • Labels_M15_H4 → Same as above but selling instead of buying.
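One way to produce labels like the ones above is to walk forward through the candles after the entry and see whether the Limit or the stop-loss is reached first. The sketch below is only an illustration of that idea: the function names are hypothetical, the stop price is taken as an input because the exact rule for deriving it from the Market Structure is not spelled out here, and checking the stop first when both levels fall inside the same candle is a conservative assumption.

```python
def label_buy(entry, stop, future_candles, ratio=3.0):
    """Return 1 if a buy reaches its limit before its stop-loss, else 0.

    future_candles: iterable of (high, low) tuples after the entry candle.
    """
    if stop >= entry:
        return 0  # invalid setup for a buy
    limit = entry + ratio * (entry - stop)
    for high, low in future_candles:
        if low <= stop:    # assumption: stop is checked first
            return 0
        if high >= limit:
            return 1
    return 0  # neither level reached in the available data


def label_sell(entry, stop, future_candles, ratio=3.0):
    """Mirror image of label_buy for a sell."""
    if stop <= entry:
        return 0
    limit = entry - ratio * (stop - entry)
    for high, low in future_candles:
        if high >= stop:
            return 0
        if low <= limit:
            return 1
    return 0


# Buy at 1.2000 with a stop at 1.1990, so the limit sits near 1.2030
print(label_buy(1.2000, 1.1990, [(1.2010, 1.1995), (1.2035, 1.2005)]))
print(label_buy(1.2000, 1.1990, [(1.2010, 1.1985)]))
```

Treating an unresolved trade as 0 is also a choice of this sketch; such rows could instead be dropped.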

The Strategy

The strategy consists of a two-phase approach. First, the data is filtered based on the Retracement Levels, using the Fibonacci retracement levels as thresholds [0, 23.6, 38.2, 50, 61.8, 78.6, 90, 99], and on the Trends for all timeframes.

The second phase applies Machine Learning (a Random Forest classifier), using the indicators described above, to improve the predictions.

Methodology

The methodology used has been:

  • The 15-year dataframe is split into 3 dataframes of equal length (approx. 5 years of data each), and the strategy must be successful on all 3 dataframes.
  • The same strategy is applied to buy signals and sell signals. This guarantees symmetry in the applied strategy, providing more confidence that the model is unbiased.
  • We aim for a ‘healthy’ ROC curve for all models.
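The three-way split described above can be sketched as a positional cut of a time-ordered dataframe. This is a minimal sketch assuming pandas; the function name and the toy data are hypothetical, and the split is by row count rather than by calendar date.

```python
import pandas as pd


def split_in_three(df):
    """Split a time-ordered dataframe into 3 consecutive blocks of
    (almost) equal length, without shuffling."""
    n = len(df)
    bounds = [0, n // 3, (2 * n) // 3, n]
    return [df.iloc[bounds[i]:bounds[i + 1]] for i in range(3)]


# Toy data standing in for the candle dataframe
idx = pd.date_range("2007-01-01", periods=9, freq="D")
df = pd.DataFrame({"Close": range(9)}, index=idx)

df1, df2, df3 = split_in_three(df)
print(len(df1), len(df2), len(df3))
```

Keeping the blocks consecutive, rather than sampling rows at random, means each block covers a different market period.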

1st part of the strategy: Filtering Process

The filtering process is based on the Retracement Levels explained above, which depend on the direction of the trend. If the trend is bullish (Up), the retracement level is calculated as: ([Close Price 15M] - [MS Low]) / (Range).

If the trend is bearish (Dw), the retracement level is calculated as: ([MS High] - [Close Price 15M]) / (Range).
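The two formulas above translate directly into a small function. This is a sketch with a hypothetical name; it multiplies by 100 so that the result lines up with the percentage thresholds used in the filter, and it uses the Trend convention from the Parameters section (1 for bullish, -1 for bearish).

```python
def retracement_pct(close, ms_high, ms_low, trend):
    """Retracement of the M15 close within the MS range, in percent."""
    rng = ms_high - ms_low
    if rng <= 0:
        raise ValueError("MS High must be above MS Low")
    if trend == 1:       # bullish: measured up from MS Low
        frac = (close - ms_low) / rng
    elif trend == -1:    # bearish: measured down from MS High
        frac = (ms_high - close) / rng
    else:
        raise ValueError("trend must be 1 or -1")
    return 100.0 * frac


print(retracement_pct(1.2025, 1.2050, 1.2000, trend=1))   # close to 50
print(retracement_pct(1.2040, 1.2050, 1.2000, trend=-1))  # close to 20
```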

By applying an exploration function, the following Fibonacci retracement levels and trends were found to be the most successful:

Strategy - MS_H1

  • BUY - H1 Trend up + MS_D1 between (38.2%, 50%) + MS_H4(23.6%, 61.8%) + MS_H1(0%, 78.6%) + MS_M15(38.2%, 61.8%)
  • SELL - H1 Trend dw + MS_D1 between (38.2%, 50%) + MS_H4(23.6%, 61.8%) + MS_H1(0%, 78.6%) + MS_M15(38.2%, 61.8%)

Strategy - MS_H4

  • BUY - D1 Trend Up + H4 Trend up + MS_D1 between (0%, 78.6%) + MS_H4(23.6%, 78.6%) + MS_H1(0%, 90%) + MS_M15(0%, 99%)
  • SELL - D1 Trend Up + H4 Trend dw + MS_D1 between (0%, 78.6%) + MS_H4(23.6%, 78.6%) + MS_H1(0%, 90%) + MS_M15(0%, 99%)
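A filter of this kind boils down to a boolean mask over the merged dataframe. The sketch below applies the MS_H4 BUY rule using pandas; the column names (Trend_D1, Ret_H4 and so on) and the toy rows are hypothetical, and treating the band limits as inclusive is an assumption.

```python
import pandas as pd

# Retracement bands (in %) for the MS_H4 BUY rule listed above
H4_BUY_BANDS = {
    "Ret_D1": (0.0, 78.6),
    "Ret_H4": (23.6, 78.6),
    "Ret_H1": (0.0, 90.0),
    "Ret_M15": (0.0, 99.0),
}


def filter_h4_buy(df):
    """Keep rows where D1 and H4 trend up and every retracement is in band."""
    mask = (df["Trend_D1"] == 1) & (df["Trend_H4"] == 1)
    for col, (lo, hi) in H4_BUY_BANDS.items():
        mask &= df[col].between(lo, hi)  # inclusive bounds (assumption)
    return df[mask]


# Toy rows standing in for M15 candles (hypothetical values)
df = pd.DataFrame({
    "Trend_D1": [1, 1, -1],
    "Trend_H4": [1, 1, 1],
    "Ret_D1": [40.0, 40.0, 40.0],
    "Ret_H4": [50.0, 10.0, 50.0],
    "Ret_H1": [30.0, 30.0, 30.0],
    "Ret_M15": [60.0, 60.0, 60.0],
})
print(filter_h4_buy(df))
```

Only the first row survives: the second fails the H4 band and the third fails the D1 trend condition.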

The results of this first stage are as follows:

Strategy - MS_H1
Strategy - MS_H4

  • Mean refers to the average share of winning points.
  • (Count x 30) is the total number of points selected by the filter across the whole dataframe.
  • Total months is the number of months across the whole timeline in which points have been identified. It therefore also indicates the chance of having a point available in any given month.
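The "Total months" figure can be obtained by counting the distinct calendar months that contain at least one filtered point and dividing by the number of months in the timeline. This is a sketch assuming pandas; the function name and the sample timestamps are hypothetical.

```python
import pandas as pd


def monthly_coverage(signal_times, start, end):
    """Return (months with at least one signal, share of all months)."""
    months_with_signal = pd.DatetimeIndex(signal_times).to_period("M").nunique()
    total_months = len(pd.period_range(start=start, end=end, freq="M"))
    return months_with_signal, months_with_signal / total_months


# Hypothetical signal timestamps: two in January, one in March
signals = ["2021-01-04 10:15", "2021-01-20 14:00", "2021-03-02 09:30"]
months, share = monthly_coverage(signals, "2021-01", "2021-04")
print(months, share)  # 2 months out of 4, so a share of 0.5
```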

2nd part of the strategy: Machine Learning

Next we apply a Random Forest Classifier, “throwing” the indicators calculated previously into the “mix” together with any other information that hasn't been used so far.

The Random Forest Classifiers are trained independently on 2 of the 3 dataframes and validated on the remaining one, forming the following combinations:

  • trained on df1 & df2 and validated on df3,
  • trained on df1 & df3 and validated on df2, and
  • trained on df2 & df3 and validated on df1.
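The rotation above, where each block is held out once, can be sketched with scikit-learn as follows. The data here is synthetic and generated purely for illustration; the number of features, the hyperparameters and the helper name make_block are assumptions, not the project's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score

rng = np.random.default_rng(0)


def make_block(n=300):
    """Synthetic stand-in for one dataframe: 4 features, binary label."""
    X = rng.normal(size=(n, 4))
    y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)
    return X, y


blocks = [make_block() for _ in range(3)]  # stand-ins for df1, df2, df3

precisions = []
for held_out in range(3):
    # Train on the two other blocks, validate on the held-out one
    train = [b for i, b in enumerate(blocks) if i != held_out]
    X_train = np.vstack([X for X, _ in train])
    y_train = np.concatenate([y for _, y in train])
    X_val, y_val = blocks[held_out]

    model = RandomForestClassifier(n_estimators=50, random_state=0)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_val)
    precisions.append(precision_score(y_val, y_pred, zero_division=0))

print(precisions)
```

Each precision is measured on data the model has not seen, which is what the "Precision ≥ 50% in all 3 split dataframes" criterion asks for.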

The 4 successful models are the ones that give the best predictions for the buy and sell strategies, one for each label: H1 buying, H1 selling, H4 buying and H4 selling.

The results for the optimum 4 models are as follows:

Strategy - MS_H1 - BUY (ROC curve & precision_recall curve)

ROC & precision recall curves for Strat 1 - BUY
Most relevant features for Strat 1 - BUY

Strategy - MS_H1 - SELL (ROC curve & precision_recall curve)

ROC & precision recall curves for Strat 1 - SELL
Most relevant features for Strat 1 - SELL

Strategy - MS_H4 - BUY (ROC curve & precision_recall curve)

ROC & precision recall curves for Strat 2 - BUY
Most relevant features for Strat 2 - BUY

Strategy - MS_H4 - SELL (ROC curve & precision_recall curve)

ROC & precision recall curves for Strat 2 - SELL
Most relevant features for Strat 2 - SELL

CONCLUSION

Strategy 1:

There is a 70% chance that at least 1 of these points (buying or selling) will appear in any given month. Filtering alone already gives a chance of success of approx. 40% on a 3:1 bet. By applying the Random Forest classifier and sacrificing approx. 40% of these points, we raise our chances of success to over 50%.

Strategy 2:

There is a 33% chance that at least 1 of these points (buying or selling) will appear in any given month. Filtering alone already gives a chance of success of approx. 40% on a 3:1 bet. By applying the Random Forest classifier and sacrificing approx. 40% of these points, we raise our chances of success to over 50% here as well.

WARNING

For anyone who has reached the end of this note: please bear in mind that BIAS is always there. Even though each model has been fitted on only 2/3 of the data and validated on the other 1/3, there is always a chance of BIAS that could invalidate the strategy. Any strategy should first be tested in a DEMO scenario, and only when the statistics hold up is it smart to invest your own money.
