What Everybody Ought to Know About Backtesting
Not all backtests are created equal
Running a quick, vectorized backtest is a great way to get your head around a trading strategy or test a new idea. But you should never trade off such a model if you want to make money. Instead, you should opt for an event-driven system.
Vectorized? Event-Driven? What’s the difference?
Most backtests fall into one of two categories: event-driven and vectorized. Vectorized tests are very fast, and basic strategies can usually be fully implemented in a few lines of code (see here for example). They’re called vectorized because they get their speed from vectorized computation on NumPy or pandas arrays (if you’re in Python) rather than running through a loop and calculating each value one at a time.
If you’re building a mean reversion strategy, for example, you might calculate a moving average over your data, look at the divergence of the price from that baseline, then encode an array of positions indicating whether you would be long, short, or neutral (e.g. 1, -1, or 0) on each bar. Then take those values, multiply them by the returns on your bars, sum it all up, and voila! You’ve just run a backtest.
It’s a quick and dirty way of doing things.
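The recipe above can be sketched in a few lines of pandas. This is a minimal illustration, not a recommended strategy: the function name, the 20-bar window, and the 2% divergence threshold are all assumptions made for the example. The one-bar shift is also an addition to the recipe, so that a position decided on one bar earns the return of the next bar.

```python
import numpy as np
import pandas as pd

def vectorized_mean_reversion(prices: pd.Series, window: int = 20,
                              threshold: float = 0.02) -> pd.Series:
    """Per-bar strategy returns for a toy mean reversion rule (hypothetical parameters)."""
    # Baseline: simple moving average of the price series.
    baseline = prices.rolling(window).mean()
    # Divergence of price from the baseline, as a fraction.
    divergence = prices / baseline - 1.0
    # Encode positions: -1 (short) above the band, 1 (long) below it, 0 otherwise.
    # Bars without a full window compare as False and stay neutral.
    positions = pd.Series(
        np.where(divergence > threshold, -1.0,
                 np.where(divergence < -threshold, 1.0, 0.0)),
        index=prices.index,
    )
    bar_returns = prices.pct_change()
    # A position decided at bar t is held over bar t + 1.
    return (positions.shift(1) * bar_returns).fillna(0.0)
```

Summing the returned series gives the simple (non-compounded) total return of the test.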
Event-driven backtests are more involved and typically require some type of simulation engine. At its most basic, you run that long, slow loop, feeding your model one tick at a time and letting it make a decision on each. In the mean reversion example, you now have to keep the last N ticks of price data so you can recalculate your moving average at each new tick and measure how far the price has diverged from it. On top of that, you need to keep close tabs on your position and the prices at which you opened or closed positions, which often requires its own data structure (or two) to manage. Handling all of the cash and portfolio issues can quickly become a complicated mess, plus these extra operations make your event-driven model much, much slower than a comparable vectorized system.
So why do I say you should always opt for the event-driven approach?
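To make the bookkeeping concrete, here is a bare-bones sketch of the same mean reversion rule processed one tick at a time. The class name, the one-unit position size, and the starting cash are hypothetical choices for illustration. A real engine would also handle orders, fills, and multiple instruments.

```python
from collections import deque

class TickMeanReversion:
    """Toy event-driven backtest: holds -1, 0, or 1 unit of a single instrument."""

    def __init__(self, window=20, threshold=0.02, cash=10_000.0):
        self.window = window
        self.threshold = threshold
        self.cash = cash
        self.units = 0                       # signed position: >0 long, <0 short
        self.history = deque(maxlen=window)  # last N ticks only
        self.trades = []                     # (price, signed quantity) per fill

    def on_tick(self, price):
        self.history.append(price)
        if len(self.history) < self.window:
            return  # not enough data for the moving average yet
        baseline = sum(self.history) / self.window
        divergence = price / baseline - 1.0
        if divergence < -self.threshold:
            target = 1
        elif divergence > self.threshold:
            target = -1
        else:
            target = 0
        self._rebalance(target, price)

    def _rebalance(self, target, price):
        quantity = target - self.units
        if quantity == 0:
            return
        self.cash -= quantity * price  # buying spends cash, selling adds it
        self.units = target
        self.trades.append((price, quantity))

    def equity(self, price):
        """Cash plus the mark-to-market value of the open position."""
        return self.cash + self.units * price

def run(ticks, **kwargs):
    backtest = TickMeanReversion(**kwargs)
    for price in ticks:
        backtest.on_tick(price)
    return backtest
```

Even in this stripped-down form, the model carries state (price history, position, cash, trade log) that the vectorized version never had to think about.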
Capture the Movements of the Market
There are simply some ideas that cannot be adequately implemented in a vectorized fashion.
I’m going to go out on a limb here and assume that most of us traders have a limited amount of cash to work with. If we have two securities and put all of our money into security A, then want to put some into security B on the next tick, a typical vectorized backtest system would not be able to simulate that constraint, and would report unrealistic returns as a result.¹
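A small sketch of the kind of cash check an event-driven system can enforce on every order. The `CashAccount` class and its partial-fill behavior are assumptions made for illustration, not a description of any particular framework.

```python
class CashAccount:
    """Hypothetical long-only account that never spends more cash than it has."""

    def __init__(self, cash):
        self.cash = cash
        self.holdings = {}  # symbol -> units held

    def buy(self, symbol, price, quantity):
        """Fill as many whole units as cash allows and return the filled quantity."""
        affordable = int(self.cash // price)
        filled = min(quantity, affordable)
        if filled > 0:
            self.cash -= filled * price
            self.holdings[symbol] = self.holdings.get(symbol, 0) + filled
        return filled
```

With 1,000 in cash, buying 100 units of A at 10 uses everything, so a later order for B fills zero units. A plain array of positions would have no way to know that.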
One of the most pernicious issues with vectorized systems is their propensity to suffer from look-ahead bias. In short, this occurs when future data gets incorporated into previous decisions; it’s like running a trading system with a crystal ball. Clearly, this has no place in any simulation worth your time.
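Here is a hedged illustration of how easily this creeps into vectorized code. The signal below is deliberately leaky (it goes long on bars that closed up and short on bars that closed down), and the function name is made up for the example. The only difference between the two totals is a one-bar shift.

```python
import numpy as np
import pandas as pd

def compare_lookahead(prices: pd.Series) -> tuple[float, float]:
    """Return (biased_total, honest_total) for a deliberately leaky toy signal."""
    returns = prices.pct_change().fillna(0.0)
    # Signal computed from bar t's own return: only known once bar t has closed.
    signal = np.sign(returns)
    # Biased: trades bar t using information from bar t's close (the crystal ball).
    biased = float((signal * returns).sum())
    # Honest: the signal from bar t can only be traded on bar t + 1.
    honest = float((signal.shift(1).fillna(0.0) * returns).sum())
    return biased, honest
```

The biased total is the sum of absolute returns, which no real trader could capture. An event-driven loop makes this mistake harder to commit because the model only ever sees ticks that have already happened.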
Further, event-driven systems allow more of the richness of the market to be incorporated into your testing. You can build separate models to account for liquidity and slippage issues, build more robust risk management and portfolio optimization strategies, and much more. These models are designed to be tested like you trade — one tick at a time — and can more accurately match that without a lot of ad hoc coding that is likely to lead to mistakes.
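As one example of such a separate model, a slippage adjustment can be a small function that the engine calls on every fill. The fixed basis-point cost below is an assumption chosen for simplicity. A more realistic model might depend on order size, volume, or spread.

```python
def fill_price(mid_price: float, quantity: int, slippage_bps: float = 5.0) -> float:
    """Hypothetical fixed-slippage model: buys fill above mid, sells fill below it."""
    direction = 1 if quantity > 0 else -1
    return mid_price * (1 + direction * slippage_bps / 10_000)
```

Because the engine processes one order at a time, swapping this for a richer model does not require touching the strategy code.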
Trading what you test is absolutely key to developing a good track record, because it ensures you’re relying on the same statistical signals you found during your backtesting. Using an event-driven model is the best and easiest way to plug your live data feed into your system and make sure your live trading setup matches your research.
Building an Event-Driven System
There are a host of solid backtest systems available online. It just requires you to download them from GitHub, spend hours learning all of the ins-and-outs of the particular framework you’re working with, buy the data, and code up whatever strategies you have in mind. There are lots of great posts out there to help you out (like this classic series). However, if you can’t code in C, Python, MATLAB, or some other scientific language, then add another step to your process: learn a programming language. This isn’t an insurmountable list — it’s where I started myself — but it does take time and commitment.
An Easier Way to High Quality Trading Systems
I think more time should be spent researching your ideas and trading them, not fiddling with code and data. High quality code and high quality data ought to be your starting point.
At Raposa, we’re building exactly that: a complete, cloud-based backtest system so you can go from idea to execution in no time and without any code. Sign up for early access below and let our pros handle the messy details for you and make sure you have a backtest that is done right.
1. I do realize there are ways around this, but they become increasingly complicated with more sophisticated risk/money management strategies and become a surefire way to introduce bugs into your system.