How to Build a Backtesting Engine in Python Using Pandas

Jachowski
Apr 22, 2022


A simple way to build an easily scalable backtesting engine to test your Trading Systems in Python using only Pandas and Numpy

Backtesting is a crucial step in designing your Trading Systems. I would even say it is the crucial step, since it assesses the viability of your strategies.

Just imagine: Earth, 2050. The first flying car ever is released on the market, but it has never been tested. Would you buy it? I think (and hope) not.

This simple analogy highlights the importance of backtesting: before investing through any algorithmic model, test it again and again, even if your favourite financial guru on YouTube says that a certain strategy will deliver a 100% return in less than a year.

Believe in what you see, not in what they tell you to see.

In this sense, using a pre-built backtesting engine from a library such as Backtrader is not the best idea, for many reasons: you can neither see properly what is going on inside it nor modify it as much as you want.

Remember, the second principle of the Zen of Python states that “Explicit is better than implicit”. If you can build explicit functions on your own instead of using pre-built black boxes, go for it.

Oh, and the third principle says that “Simple is better than complex”. Let’s see how easily you can backtest your strategies with Pandas.

The Idea

This is what we’re going to do:

  1. Import the libraries
  2. Import stock data
  3. Define a trading strategy
  4. Define a market position function
  5. Define a backtesting function

Let’s get into code stuff!

1. Import the Libraries

Let’s import the three libraries we need. Said and done:

import numpy as np
import pandas as pd
import yfinance as yf

2. Import Stock Data

Let’s download 20 years of Amazon (ticker AMZN) stock data.

amzn = yf.download('AMZN', '2000-01-01', '2020-01-01')

3. Define a Trading Strategy

In this case, we’re going to test one of the most popular strategies: the Double Moving Averages Crossover.

First of all, we have to define two Simple Moving Averages. Here’s how:

def SMA(array, period):
    return array.rolling(period).mean()

That is, this function has two arguments:

  • array is the series we will apply the function on (Close Prices) and
  • period is the length of our moving averages (e.g. 14 and 200 days).

The function applies a sliding window (.rolling()) of the desired length (period) to the series (array) and computes the arithmetic mean (.mean()) of each window. The first period - 1 values are NaN, because the window is not yet full.
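A quick way to see what SMA returns is to run it on a short price series. The prices below are made up for illustration; note the NaN values at the start, which is also why sma200 has no value for the first 199 trading days of the AMZN data.

```python
import pandas as pd


def SMA(array, period):
    # Rolling window of `period` observations, arithmetic mean of each window
    return array.rolling(period).mean()


# Made-up prices, for illustration only
prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])
sma3 = SMA(prices, 3)
print(sma3.tolist())  # [nan, nan, 11.0, 12.0, 13.0]
```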

Let’s define the two moving averages we will use. The first is the shorter-term (14 days), while the second is the longer-term (200 days):

sma14 = SMA(amzn['Close'], 14)    
sma200 = SMA(amzn['Close'], 200)

This is what we get:

Now, we need to define the entry rules and exit rules of our strategy, which are the crossover and the crossunder, respectively.

In other words, we get:

  • entry (buy) signal when the shorter-term moving average (14 days) crosses above the longer-term moving average (200 days)
  • exit (sell) signal when the shorter-term moving average (14 days) crosses below the longer-term one (200 days).

def crossover(array1, array2):
    return array1 > array2

def crossunder(array1, array2):
    return array1 < array2

After that, we assign crossover to the entry rules and crossunder to the exit rules:

enter_rules = crossover(sma14, sma200)
exit_rules = crossunder(sma14, sma200)

Basically, we obtain two boolean series (True or False):

  • enter_rules is True whenever sma14 > sma200 while
  • exit_rules is True whenever sma14 < sma200.
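Although the name suggests an event, crossover as written returns a state: it is True on every bar where the short average sits above the long one, not only on the bar where the cross happens. That works here because marketposition_generator only reacts when its switch flips. For comparison, here is a sketch of an event-style version; crossover_event is a hypothetical helper, not part of the article, and the input values are made up.

```python
import pandas as pd


def crossover(array1, array2):
    # Article's version: True on every bar where array1 is above array2
    return array1 > array2


def crossover_event(array1, array2):
    # Hypothetical helper: True only on the bar where array1 moves
    # from at-or-below array2 to above it
    above = array1 > array2
    return above & ~above.shift(1, fill_value=False)


fast = pd.Series([1.0, 2.0, 4.0, 5.0, 3.0, 6.0])
slow = pd.Series([3.0, 3.0, 3.0, 3.0, 3.0, 3.0])
print(crossover(fast, slow).tolist())        # [False, False, True, True, False, True]
print(crossover_event(fast, slow).tolist())  # [False, False, True, False, False, True]
```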

Looking at the images of the sma14 and sma200 series above, we therefore expect enter_rules to be False on the 13th of October, 2000, since 33.5714 < 51.9385, i.e. sma14 < sma200.

Let’s check for it:

check = enter_rules[enter_rules.index == '2000-10-13']
print(check)

This is the starting point.

Now we fly. But not with that never tested flying car.

4. Define a Market Position Function

Here, we’re going to create a function that defines the ongoing trades: to achieve this, we will create a switch that:

  • turns on if enter_rules is True and exit_rules is False and
  • turns off if exit_rules is True.

Here is the function:

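def marketposition_generator(dataset, enter_rules, exit_rules):

    dataset['enter_rules'] = enter_rules
    dataset['exit_rules'] = exit_rules

    status = 0  # the switch: 1 = in a trade, 0 = flat
    mp = []
    for (i, j) in zip(enter_rules, exit_rules):
        if status == 0:
            # Flat: open a position on an entry signal with no exit signal
            if i == 1 and j != -1:
                status = 1
        else:
            # In a trade: close the position on an exit signal (coded as -1)
            if j == -1:
                status = 0
        mp.append(status)

    dataset['mp'] = mp
    dataset['mp'] = dataset['mp'].shift(1)  # the trade starts the day after the signal
    dataset.loc[dataset.index[0], 'mp'] = 0  # replace the NaN left by the shift

    return dataset['mp']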

It takes three arguments:

  • dataset is the dataframe that contains the stock data we previously imported (AMZN stock data),
  • enter_rules is the boolean series containing the entry signals and
  • exit_rules is the boolean series containing the exit signals.

In the first two lines, we copy the entry and exit rules into our dataset. status is the switch, and mp is an empty list that will be populated with the successive values of status.

Then we loop over zip(enter_rules, exit_rules). zip works like a… yes, a zipper: it pairs up the elements of enter_rules and exit_rules so that we can iterate over both series in parallel. At each step, the current value of status is appended to mp, which therefore ends up being:

  • mp = 1 (on) from the bar where enter_rules is True and exit_rules is False, until an exit signal arrives, and
  • mp = 0 (off) from the bar where exit_rules is True, until the next entry signal.

Note: in Python, True corresponds to 1, but here, in the if j == -1 statement related to exit_rules, True is encoded as -1. The reason for that will become clear later on.

In the last three lines, we add mp to our dataset, shift its values forward by one period so that each trade starts the day after the signal is received, and replace with 0 the NaN that the shift leaves in the first row. The function returns the mp series.
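To see the switch at work without downloading any data, the loop can be run on hand-made signal lists. The helper switch below is hypothetical: a sketch that reproduces the loop and the one-period shift of marketposition_generator without the DataFrame bookkeeping. The second call shows why the -1 encoding matters: with plain booleans, j == -1 is never true, so the position is never closed.

```python
import pandas as pd


def switch(enter_rules, exit_rules):
    # Same state machine as marketposition_generator: turn on at an entry
    # signal (1) without an exit signal (-1), turn off at an exit signal
    status = 0
    mp = []
    for i, j in zip(enter_rules, exit_rules):
        if status == 0:
            if i == 1 and j != -1:
                status = 1
        elif j == -1:
            status = 0
        mp.append(status)
    # Shift by one bar so the position starts the day after the signal
    return pd.Series(mp).shift(1).fillna(0).astype(int)


enter = [0, 1, 1, 0, 0, 0]
mp_coded = switch(enter, [0, 0, 0, 0, -1, 0])
mp_bool = switch(enter, [False, False, False, False, True, False])
print(mp_coded.tolist())  # [0, 0, 1, 1, 1, 0]
print(mp_bool.tolist())   # [0, 0, 1, 1, 1, 1]  (never exits)
```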

5. Define a Backtesting Function

Last step. We’re close to the end, hang on!

First of all, we have to define some parameters such as:

  • COSTS: fixed costs per trade (i.e. transaction fees)
  • INSTRUMENT: type of instrument (1 for stocks, 2 for futures, etc.)
  • OPERATION_MONEY: initial investment
  • DIRECTION: long or short
  • ORDER_TYPE: type of order (market, limit, stop, etc.)
  • ENTER_LEVEL: entry price

COSTS = 0.50
INSTRUMENT = 1
OPERATION_MONEY = 10000
DIRECTION = "long"
ORDER_TYPE = "market"
ENTER_LEVEL = amzn['Open']

We’re assuming that:

  • COSTS: every trade will cost us 50 cents, 25 cents to buy and 25 cents to sell
  • INSTRUMENT: the system will be tested on a stock (AMZN)
  • OPERATION_MONEY: the initial capital is 10k dollars
  • DIRECTION: the strategy will be tested for long trades
  • ORDER_TYPE: the strategy will process market orders
  • ENTER_LEVEL: the entry price corresponds to the open price

And here is the best part:

Let’s analyze the function line by line.

In lines 3–5 we add the two boolean series and the market position returned by marketposition_generator to the dataset.

Note: in the previous note, I promised that everything would become clear. In the lambda function applied to exit_rules, every True value is mapped to -1 and every False value to 0. Thanks to that, the if j == -1 check in marketposition_generator recognizes the exit signals, and the function runs as intended.
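The mapping described in this note can be sketched on a toy series. The lambda follows the description above; the vectorized line is an equivalent alternative, not taken from the article, that gives the same result without calling a Python function for every element.

```python
import pandas as pd

exit_rules = pd.Series([False, True, False, True])

# Mapping described in the text: True -> -1, False -> 0
exit_coded = exit_rules.apply(lambda x: -1 if x else 0)

# Vectorized alternative (not the article's code): same result
exit_vectorized = -exit_rules.astype(int)

print(exit_coded.tolist())       # [0, -1, 0, -1]
print(exit_vectorized.tolist())  # [0, -1, 0, -1]
```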

From line 7 to line 12 we define market orders for stocks:

  • In lines 7–9 we define the entry_price: if the previous value of mp was 0 and the present value is 1, i.e. a signal was received on the previous day (remember that mp is shifted by one period), we open the trade at the open price of the day after the signal;
  • In lines 10–12 we define number_of_stocks, that is the amount of shares we buy, as the ratio between the initial capital (10k) and the entry_price;

In line 14 we forward propagate the value of entry_price;

In lines 16–17 we round number_of_stocks to an integer and forward propagate its value as well;

In line 20 we assign the label 'entry' to events_in every time mp moves from 0 to 1;

From line 22 to line 27 we define the long trades:

  • In line 24 we compute open_operations, i.e. the profit (or loss) of the open trade;
  • In line 25 we adjust open_operations on the bar where we exit the trade: whenever we receive an exit signal, the trade is closed the next day at the open price, and this is where the round-turn costs are deducted;
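To make the long-trade arithmetic concrete, here is a worked example with made-up prices, using the OPERATION_MONEY and COSTS values defined above. It assumes, as stated earlier, that COSTS = 0.50 already covers the full round turn (25 cents in, 25 cents out), so it is deducted once per trade. For a short trade the price differences are simply reversed (entry minus exit).

```python
OPERATION_MONEY = 10000
COSTS = 0.50  # round-turn cost per trade

# Hypothetical prices, for illustration only
entry_price = 40.0   # open of the day after the entry signal
close_today = 45.0   # a close while the trade is still open
exit_price = 52.5    # open of the day after the exit signal

number_of_stocks = round(OPERATION_MONEY / entry_price)  # 250 shares

# Profit of the open trade, marked to today's close
open_profit = (close_today - entry_price) * number_of_stocks
# Profit of the closed trade, net of round-turn costs
closed_profit = (exit_price - entry_price) * number_of_stocks - COSTS

print(number_of_stocks, open_profit, closed_profit)  # 250 1250.0 3124.5
```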

From line 28 to line 33 we replicate for short trades what was said for long trades: to test short trades, just set DIRECTION = 'short';

In line 35 we set open_operations to 0 whenever there is no trade in progress;

In line 36 we assign the label 'exit' to events_out every time mp moves from 1 to 0, i.e. when we receive an exit signal;

In lines 37–38 we copy the value of open_operations into operations only on the bars where we exit a trade, leaving NaN everywhere else: this makes it very easy to aggregate the results;

In line 39 we define the equity_line for closed operations, and in line 40 the equity_line for open operations;

In line 42 we save the resulting dataset in a csv file.
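The walkthrough above can be turned into code. Below is a minimal sketch of apply_trading_system reconstructed from that description; it is not the author's original listing, so its line numbers do not match the ones referenced above. Assumptions: COSTS, INSTRUMENT and OPERATION_MONEY are taken as keyword arguments (costs, instrument, operation_money) whose defaults equal the values set above, instead of global variables; only market orders on stocks are implemented; the exit is booked on the last bar of each trade at the next day's Open, with costs deducted once as a round-turn cost; closed_equity starts at operation_money, consistent with the net-profit check below; export_path is a hypothetical parameter for the CSV export. The dataset needs Open and Close columns.

```python
import numpy as np
import pandas as pd


def marketposition_generator(dataset, enter_rules, exit_rules):
    # Same switch as in section 4
    dataset['enter_rules'] = enter_rules
    dataset['exit_rules'] = exit_rules
    status = 0
    mp = []
    for (i, j) in zip(enter_rules, exit_rules):
        if status == 0:
            if i == 1 and j != -1:
                status = 1
        elif j == -1:
            status = 0
        mp.append(status)
    dataset['mp'] = mp
    dataset['mp'] = dataset['mp'].shift(1)
    dataset.loc[dataset.index[0], 'mp'] = 0
    return dataset['mp']


def apply_trading_system(dataset, direction, order_type, enter_level,
                         enter_rules, exit_rules, costs=0.50, instrument=1,
                         operation_money=10000, export_path=None):
    if instrument != 1 or order_type != "market":
        raise NotImplementedError("sketch covers market orders on stocks only")

    # Signals: exits coded as -1 so that marketposition_generator detects them
    dataset['enter_rules'] = enter_rules.apply(lambda x: 1 if x else 0)
    dataset['exit_rules'] = exit_rules.apply(lambda x: -1 if x else 0)
    dataset['mp'] = marketposition_generator(dataset, dataset['enter_rules'],
                                             dataset['exit_rules'])

    # First bar of each trade (mp 0 -> 1) and last bar of each trade (mp 1 -> 0 next)
    entry_bar = (dataset['mp'] == 1) & (dataset['mp'].shift(1) == 0)
    exit_bar = (dataset['mp'] == 1) & (dataset['mp'].shift(-1) == 0)

    # Entry price and number of shares, set on entry bars and carried forward
    dataset['entry_price'] = np.where(entry_bar, enter_level, np.nan)
    dataset['number_of_stocks'] = np.where(entry_bar, operation_money / enter_level, np.nan)
    dataset['entry_price'] = dataset['entry_price'].ffill()
    dataset['number_of_stocks'] = dataset['number_of_stocks'].round().ffill()
    dataset['events_in'] = np.where(entry_bar, 'entry', '')

    # Open trade marked to the close; on the exit bar, closed at the next open minus costs
    shares, price = dataset['number_of_stocks'], dataset['entry_price']
    if direction == "long":
        open_pnl = (dataset['Close'] - price) * shares
        exit_pnl = (dataset['Open'].shift(-1) - price) * shares - costs
    elif direction == "short":
        open_pnl = (price - dataset['Close']) * shares
        exit_pnl = (price - dataset['Open'].shift(-1)) * shares - costs
    else:
        raise ValueError("direction must be 'long' or 'short'")
    dataset['open_operations'] = np.where(exit_bar, exit_pnl, open_pnl)
    dataset['open_operations'] = np.where(dataset['mp'] == 1, dataset['open_operations'], 0)

    # Realized profit only on exit bars, then the two equity lines
    dataset['events_out'] = np.where(exit_bar, 'exit', '')
    dataset['operations'] = np.where(exit_bar, dataset['open_operations'], np.nan)
    closed = dataset['operations'].fillna(0)
    dataset['closed_equity'] = operation_money + closed.cumsum()
    dataset['open_equity'] = dataset['closed_equity'] + dataset['open_operations'] - closed

    if export_path is not None:
        dataset.to_csv(export_path)
    return dataset


# Synthetic example with made-up prices: one long trade opened and closed
idx = pd.date_range('2021-01-04', periods=8, freq='D')
toy = pd.DataFrame({'Open': [10.0, 11, 12, 13, 14, 15, 16, 17],
                    'Close': [10.5, 11.5, 12.5, 13.5, 14.5, 15.5, 16.5, 17.5]}, index=idx)
enter = pd.Series([False, True] + [False] * 6, index=idx)
exit_ = pd.Series([False] * 3 + [True] + [False] * 4, index=idx)
toy_result = apply_trading_system(toy, "long", "market", toy['Open'], enter, exit_)
print(toy_result['closed_equity'].iloc[-1] - 10000)  # prints 1665.5
```

In the toy run, the entry signal on the second day opens the trade at the third day's open (12), buying round(10000 / 12) = 833 shares; the exit signal on the fourth day closes it at the fifth day's open (14), for (14 - 12) × 833 - 0.50 = 1665.5.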

Let’s call the function and inspect the results.

COSTS = 0.50
INSTRUMENT = 1
OPERATION_MONEY = 10000
DIRECTION = "long"
ORDER_TYPE = "market"
ENTER_LEVEL = amzn['Open']
trading_system = apply_trading_system(amzn, DIRECTION, ORDER_TYPE, ENTER_LEVEL, enter_rules, exit_rules)

These are two long trades registered by the Trading System:

To check whether the Trading Strategy (the Double Moving Averages Crossover) produced a net profit on long trades for that stock over the period considered, you can simply type:

net_profit = trading_system['closed_equity'].iloc[-1] - OPERATION_MONEY
print(round(net_profit, 2))

A return of almost 500% in 20 years. Not surprising, considering that Amazon stock increased by 2400% over those 20 years and we used a trend-following strategy.
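Both percentages in the comparison above can be computed directly from the resulting DataFrame. The helper below is hypothetical (not part of the article): it returns the net profit, the strategy's percentage return on OPERATION_MONEY, and the percentage change of the Close price over the same period. The toy frame is made up for illustration.

```python
import pandas as pd


def summarize(trading_system, operation_money=10000):
    # Hypothetical helper: net profit and % return of the strategy,
    # plus the % change of the Close price over the same period
    net_profit = trading_system['closed_equity'].iloc[-1] - operation_money
    strategy_return = net_profit / operation_money * 100
    close = trading_system['Close']
    price_change = (close.iloc[-1] / close.iloc[0] - 1) * 100
    return (float(round(net_profit, 2)),
            float(round(strategy_return, 2)),
            float(round(price_change, 2)))


# Made-up values, for illustration only
toy = pd.DataFrame({'Close': [20.0, 30.0, 50.0],
                    'closed_equity': [10000.0, 10000.0, 15000.0]})
result = summarize(toy)
print(result)  # (5000.0, 50.0, 150.0)
```

Called as summarize(trading_system, OPERATION_MONEY), it gives the strategy's return and the price change of AMZN side by side.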

That’s all for this article. I hope you find it helpful.

Let me know if you would be interested in seeing extensions of this backtesting engine, for example how to implement limit orders.

If you need clarification or have any advice, feel free to contact me on Telegram:

Cheers 🍻

Reference:

Trombetta, Giovanni, Strategie di Trading con Python, Hoepli, 2020
