# Algorithmic Trading in Python with Machine Learning: Walkforward Analysis

Implementing a successful trading strategy with code can be a challenging task. While some traders prefer to use basic trading rules and indicators, a more advanced approach involving predictive modeling may be necessary.

In this tutorial, I will guide you through the process of training and backtesting machine learning models in **PyBroker**, an open-source Python framework that I developed for creating trading strategies. We will also learn about **Walkforward Analysis**, a popular technique that helps simulate how a strategy would perform during actual trading.

# Introducing PyBroker

**PyBroker** is a free and open-source Python framework that was designed with machine learning in mind and supports training machine learning models using your favorite ML framework.

Some of the key features of **PyBroker** include:

- A super-fast backtesting engine built using NumPy and accelerated with Numba.
- The ability to create and execute trading rules and models across multiple instruments with ease.
- Access to historical data from Alpaca and Yahoo Finance.
- The option to train and backtest models using Walkforward Analysis, which simulates how the strategy would perform during actual trading.
- More reliable trading metrics that use randomized bootstrapping to provide more accurate results.
- Caching of downloaded data, indicators, and models to speed up your development process.
- Parallelized computations that enable faster performance.

To begin using **PyBroker**, you can install the library with pip:

`pip install lib-pybroker`

Or you can clone the GitHub repository:

`git clone https://github.com/edtechre/pybroker`

# Walkforward Analysis

**PyBroker** uses a robust algorithm called **Walkforward Analysis** to perform backtesting. Walkforward Analysis divides your historical data into multiple time windows and then walks forward in time, emulating the process of executing and retraining the strategy with new data in the real world.

During Walkforward Analysis, your model is initially trained on the earliest window and evaluated on the test data in that window. As the algorithm moves forward to evaluate the next time window, the test data from the previous window is added to the training data. This process is repeated until all the time windows are evaluated.

Walkforward Analysis is also useful in addressing the issue of data mining and overfitting by testing your strategy on out-of-sample data.
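To make the windowing concrete, here is a minimal sketch of how the train/test index ranges could be generated from the description above. It is an illustration only, not PyBroker's actual implementation: the function name `walkforward_splits` and the choice to split the data into equal-length windows are my own assumptions.

```python
def walkforward_splits(n_bars, windows, train_size):
    """Return a list of (train_range, test_range) pairs, one per window.

    Hypothetical helper for illustration; not part of PyBroker.
    """
    window_len = n_bars // windows
    splits = []
    for k in range(windows):
        start = k * window_len
        # Let the last window absorb any leftover bars.
        end = n_bars if k == windows - 1 else start + window_len
        # The first `train_size` fraction of the window is for training.
        test_start = start + int((end - start) * train_size)
        # Training uses every bar before this window's test segment,
        # so earlier test data is folded into later training sets.
        splits.append((range(0, test_start), range(test_start, end)))
    return splits


for train, test in walkforward_splits(n_bars=12, windows=3, train_size=0.5):
    print(f"train=[{train.start}, {train.stop})  test=[{test.start}, {test.stop})")
# train=[0, 2)  test=[2, 4)
# train=[0, 6)  test=[6, 8)
# train=[0, 10)  test=[10, 12)
```

Each test range starts exactly where its training range ends, so the model is always evaluated on bars it has not seen.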

# An Example

Let’s take a look at example code for an indicator that calculates the difference between close prices and a moving average (CMMA). This indicator could be helpful for a mean reversion strategy:

```python
import pybroker
import numpy as np
from numba import njit

def cmma(bar_data, lookback):

    @njit  # Enable Numba JIT.
    def vec_cmma(values):
        # Initialize the result array.
        n = len(values)
        out = np.array([np.nan for _ in range(n)])

        # For all bars starting at lookback:
        for i in range(lookback, n):
            # Calculate the moving average for the lookback.
            ma = 0
            for j in range(i - lookback, i):
                ma += values[j]
            ma /= lookback
            # Subtract the moving average from value.
            out[i] = values[i] - ma
        return out

    # Calculate with close prices.
    return vec_cmma(bar_data.close)
```
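When writing a Numba-compiled indicator, it can help to keep a slow, plain-Python version around for checking results on a handful of values. The sketch below mirrors the loop above; the name `cmma_reference` is hypothetical and the function is not part of PyBroker. As in the indicator code, the moving average covers the `lookback` bars *before* the current bar and excludes the current bar itself.

```python
import math


def cmma_reference(values, lookback):
    """Plain-Python CMMA: value minus the mean of the previous `lookback` values."""
    out = [math.nan] * len(values)
    for i in range(lookback, len(values)):
        window = values[i - lookback:i]  # Excludes the current bar.
        out[i] = values[i] - sum(window) / lookback
    return out


closes = [10.0, 11.0, 12.0, 11.0, 13.0]
print(cmma_reference(closes, lookback=2))
# [nan, nan, 1.5, -0.5, 1.5]
```

The first `lookback` entries are NaN because there is not yet enough history to compute the average.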

We then register the indicator function with PyBroker and specify the lookback parameter as 20 days (bars):

`cmma_20 = pybroker.indicator('cmma_20', cmma, lookback=20)`

Next, we want to build a model that predicts the next day’s return using our 20-day CMMA indicator. A simple linear regression is a good approach to start with, and we can use the LinearRegression model from scikit-learn:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

def train_slr(symbol, train_data, test_data):
    # Train
    # Previous day close prices.
    train_prev_close = train_data['close'].shift(1)
    # Calculate daily returns.
    train_daily_returns = (train_data['close'] - train_prev_close) / train_prev_close
    # Predict next day's return.
    train_data['pred'] = train_daily_returns.shift(-1)
    train_data = train_data.dropna()
    # Train the LinearRegression model to predict the next day's return
    # given the 20-day CMMA.
    X_train = train_data[['cmma_20']]
    y_train = train_data[['pred']]
    model = LinearRegression()
    model.fit(X_train, y_train)

    # Test
    test_prev_close = test_data['close'].shift(1)
    test_daily_returns = (test_data['close'] - test_prev_close) / test_prev_close
    test_data['pred'] = test_daily_returns.shift(-1)
    test_data = test_data.dropna()
    X_test = test_data[['cmma_20']]
    y_test = test_data[['pred']]
    # Make predictions from test data.
    y_pred = model.predict(X_test)
    # Print goodness of fit.
    r2 = r2_score(y_test, np.squeeze(y_pred))
    print(symbol, f'R^2={r2}')

    # Return the trained model.
    return model
```

The `train_slr` function uses the 20-day CMMA as the input feature, or predictor, for the `LinearRegression` model. The function then fits the `LinearRegression` model to the training data for that stock symbol.

The final output of the `train_slr` function is the trained `LinearRegression` model for that specific stock symbol. **PyBroker** will use this model to predict the next day’s return of the stock during the backtest. The `train_slr` function will be called for each stock symbol, and the trained models will be used to predict the next day’s return for each individual stock.
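The part of `train_slr` that is easiest to get wrong is the target alignment: each bar's feature must be paired with the *following* bar's return. Here is a small plain-Python sketch of what the `shift(1)` and `shift(-1)` calls accomplish. The helper name `next_day_targets` and the sample prices are made up for illustration.

```python
def next_day_targets(closes):
    """Return (daily_returns, next_day_returns), aligned bar by bar."""
    # Return of bar i relative to bar i - 1; undefined for the first bar.
    daily = [None] + [(cur - prev) / prev for prev, cur in zip(closes, closes[1:])]
    # Shift back by one bar so that bar i is paired with the return of
    # bar i + 1. The final bar has no next day, so its target is undefined.
    targets = daily[1:] + [None]
    return daily, targets


closes = [100.0, 110.0, 99.0, 108.9]  # Made-up prices.
daily, targets = next_day_targets(closes)
for i, (close, target) in enumerate(zip(closes, targets)):
    print(i, close, target)
```

Rows with an undefined value (here the last bar, and in the tutorial also the first 20 bars where `cmma_20` is NaN) are the ones that `dropna()` removes before fitting.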

Then we register our training function with **PyBroker**, passing our `cmma_20` indicator as training input:

`model_slr = pybroker.model(name='slr', fn=train_slr, indicators=[cmma_20])`

Now, let’s implement trading rules that generate buy and sell signals from our `slr` model:

```python
def hold_long(ctx):
    if not ctx.long_pos():
        # Buy if the next bar is predicted to have a positive return:
        if ctx.preds('slr')[-1] > 0:
            ctx.buy_shares = 100
    else:
        # Sell if the next bar is predicted to have a negative return:
        if ctx.preds('slr')[-1] < 0:
            ctx.sell_shares = 100
```

The `hold_long` function opens a long position when the model predicts a positive return for the next bar, and then closes the position when the model predicts a negative return.

The `ctx.preds('slr')` method is used to access the predictions made by the `'slr'` model for the current stock symbol being executed in the function. The predictions are stored in a NumPy array, and the most recent prediction for the current stock symbol is accessed using `ctx.preds('slr')[-1]`, which is the model’s prediction of the next day’s return.
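To see how this rule behaves over a sequence of predictions, here is a toy simulation of the same decision logic outside of PyBroker. It is a simplification: `simulate_hold_long` is a hypothetical helper, and it assumes every order fills immediately, which a real backtest does not necessarily do.

```python
def simulate_hold_long(preds):
    """Replay the hold_long rule over a list of predicted next-bar returns."""
    in_position = False
    actions = []
    for pred in preds:
        if not in_position and pred > 0:
            actions.append("buy")   # Open a long position.
            in_position = True
        elif in_position and pred < 0:
            actions.append("sell")  # Close the long position.
            in_position = False
        else:
            actions.append("hold")  # No change in position.
    return actions


print(simulate_hold_long([0.01, 0.02, -0.01, -0.02, 0.03]))
# ['buy', 'hold', 'sell', 'hold', 'buy']
```

Note that a negative prediction while flat, or a positive prediction while already long, results in no order at all.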

We create a `Strategy` object that will train our model and run our trading rules on *NVDA* and *AMD* using data downloaded from Yahoo Finance:

```python
from pybroker import Strategy, StrategyConfig, YFinance

config = StrategyConfig(bootstrap_sample_size=100)
strategy = Strategy(YFinance(), '3/1/2017', '3/1/2022', config)
strategy.add_execution(hold_long, ['NVDA', 'AMD'], models=model_slr)
```

Finally, we run our backtest with the Walkforward Analysis algorithm, using 3 time windows, each with a 50/50 train/test data split:

`result = strategy.walkforward(windows=3, train_size=0.5)`

The `result` contains trades and performance metrics from the backtest. There are 35 evaluation metrics in total, but here is a sample of a few:

```
result.metrics_df

trade_count          43
total_pnl      11293.00
max_drawdown  -14177.60
win_rate      76.744186
loss_rate     23.255814
ulcer_index    1.195682
```

Additionally, PyBroker calculates metrics such as **Sharpe Ratio**, **Profit Factor**, and maximum **drawdown** using bootstrapping, which randomly samples your strategy’s returns to simulate thousands of alternate scenarios that could have happened. This allows you to test for statistical significance and have more confidence in the effectiveness of your strategy.
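As a rough illustration of the idea, the sketch below computes a basic percentile bootstrap confidence interval for a per-bar Sharpe Ratio using only the standard library. This is a generic textbook approach under my own assumptions, not necessarily the method PyBroker uses internally. The function names, the sample returns, and the default parameters are all made up for the example.

```python
import random
import statistics


def sharpe(returns):
    """Mean return divided by its standard deviation (no annualization)."""
    sd = statistics.stdev(returns)
    return statistics.mean(returns) / sd if sd > 0 else 0.0


def bootstrap_sharpe_ci(returns, n_boot=1000, sample_size=100, conf=0.95, seed=0):
    """Percentile bootstrap interval for the Sharpe Ratio."""
    rng = random.Random(seed)
    # Resample the returns with replacement and recompute the statistic.
    stats = sorted(
        sharpe(rng.choices(returns, k=sample_size)) for _ in range(n_boot)
    )
    alpha = (1 - conf) / 2
    lower = stats[int(alpha * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha) * n_boot))]
    return lower, upper


returns = [0.01, -0.02, 0.015, -0.005, 0.007, -0.012, 0.003, -0.001]  # Made up.
lower, upper = bootstrap_sharpe_ci(returns)
print(f"95% interval for Sharpe Ratio: [{lower:.3f}, {upper:.3f}]")
```

If the lower bound of such an interval is below zero, the data cannot rule out that the strategy has no edge at all.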

Below are confidence intervals for the log-transformed Profit Factor and Sharpe Ratio of our strategy:

```
result.bootstrap.conf_intervals

Log Profit Factor      lower      upper
97.5%              -1.192473   0.114899
95%                -1.192473  -0.001504
90%                -1.104707  -0.133840

Sharpe Ratio
97.5%              -0.303638  -0.010800
95%                -0.303638  -0.042949
90%                -0.303638  -0.078839
```

The resulting table shows the bounds of the confidence interval at each confidence level. The lower bound provides a more conservative estimate of the strategy’s performance. For example, we can be `97.5%` confident that the Sharpe Ratio is at or above the lower bound listed for that level.

A negative lower bound is not a good sign as it indicates that the strategy is not consistently profitable. In this example, both the Sharpe Ratio and the log-transformed Profit Factor have negative lower bounds, which suggests that the strategy is not reliable.
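To see why a negative log-transformed Profit Factor is bad news, recall that the Profit Factor is commonly defined as gross gains divided by gross losses, so its logarithm is negative exactly when losses exceed gains. Here is a minimal sketch of that calculation on a list of profit-and-loss values. The helper name and the numbers are hypothetical, and PyBroker may compute the metric over different inputs.

```python
import math


def log_profit_factor(pnls):
    """Log of (sum of gains / sum of absolute losses)."""
    gains = sum(p for p in pnls if p > 0)
    losses = -sum(p for p in pnls if p < 0)
    if gains == 0 or losses == 0:
        raise ValueError("need at least one gain and one loss")
    return math.log(gains / losses)


losing = [120.0, -80.0, 40.0, -100.0]   # Gains 160, losses 180.
winning = [120.0, -80.0, 40.0, -20.0]   # Gains 160, losses 100.
print(log_profit_factor(losing) < 0)    # True: losses exceed gains.
print(log_profit_factor(winning) > 0)   # True: gains exceed losses.
```

A value of exactly zero corresponds to a Profit Factor of 1, the break-even point.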

# Conclusion

Obviously, our strategy needs a lot of improvement! But this should give you an understanding of how to train and evaluate a model in **PyBroker**.

Please keep in mind that before conducting regression analysis, it is important to verify certain assumptions such as homoscedasticity, normality of residuals, etc. I have not provided the details for these assumptions here for the sake of brevity and recommend that you perform this exercise on your own.

We are also not limited to just building linear regression models in **PyBroker**. We can train other model types such as gradient boosted machines, neural networks, or any other architecture that we choose with our preferred ML framework.

With this knowledge, you can start building and testing your own models and trading strategies in **PyBroker**, and begin exploring the vast possibilities that this framework offers! Furthermore, I have written additional tutorials on using **PyBroker** and general algorithmic trading concepts that can be found on **https://www.pybroker.com**.

Thanks for reading!