Algorithmic Trading 101 — Lesson 2: Data, Strategy Design, and Mean Reversion

Published in

The Ocean

9 min read · May 11, 2018

Last week, we jumpstarted our Algorithmic Trading 101 series with a lesson in time series analysis. Now we’re taking it one step further by introducing mean reversion as a model for trading a single asset. As we walk you through the fundamentals, remember that there’s never one correct model. Building a profitable strategy takes good data management, fine-tuned parameters, and optimized execution. A strategy that works today might not be viable tomorrow.

Don’t forget: We’re always here to help! If you get stuck or have your own strategy that you’d like to port over to The Ocean X, ping us on Telegram or email us at hello@theoceantrade.com. We’ll also be awarding five prizes of $1,000 in crypto to students or mentors during our series, and it’s not too late to sign up! Discuss, submit, or help out, and we’ll consider you in the running for our little contest.

A Quick Comment on Data

Data is the lifeblood of algorithmic trading — it’s how you identify initial patterns, backtest your strategies, and make sure your models respond in real time. But more data is not necessarily better.

It takes time to organize and clean data into a format that’s easy to use, e.g. changing values from integers to floating-point numbers or converting timestamps that are in minutes or seconds to milliseconds.
Testing on large datasets can eat up server space and slow down the performance of your machine.
Too many variables in your models can lead to multicollinearity, where correlated variables make parameter estimates unstable and make it difficult to attribute explanatory power to any one of them.
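The cleaning steps in the first point can be sketched in a few lines of Python. The record layout here (a `ts` field in whole seconds and a `price` field stored as an integer number of cents) is a hypothetical example, not the format of any particular exchange feed.

```python
# Hypothetical raw records: timestamps in seconds, prices as integer cents.
raw_ticks = [
    {"ts": 1526000000, "price": 100},
    {"ts": 1526000060, "price": 102},
    {"ts": 1526000120, "price": 99},
]

def clean_tick(tick):
    """Return a copy with the timestamp in milliseconds and the price as a float."""
    return {
        "ts_ms": tick["ts"] * 1000,      # seconds -> milliseconds
        "price": tick["price"] / 100.0,  # integer cents -> floating-point units
    }

clean_ticks = [clean_tick(t) for t in raw_ticks]
print(clean_ticks[0])
```

Doing this once, when the data is loaded, keeps unit conversions out of the strategy code itself.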

There’s a balance to strike: how much data do I need to explore, versus how much data can I manage effectively and efficiently? As we progress through the course, we’ll include additional references for data management and processing. SQL and relational databases are a good place to start: Silota provides a great SQL query example with cryptocurrency data, and CoinAPI lets you collect information from different cryptocurrency exchanges. (And once we launch, we’ll show you how to grab data from The Ocean API.)

Creating a Strategy Revisited

Last time we outlined the steps you need to take when building an algorithmic trading strategy. Let’s discuss them in more detail:

1. Determine the type of strategy

We’ll show you a few types of models, but there are endless opportunities for experimentation depending on your objective: alpha generation, spread capture, arbitrage, market making, event driven, etc. And there is no type of strategy that is better than any other. It really depends on your risk/reward profile, and how well you implement and test. Algorithmic trading websites and quantitative finance papers can be good sources to find ideas. Some of our favorites include Quantstart, Quantopian, QuantConnect, Wilmott, arXiv, and SSRN.

2. Generate the signals for position taking

It’s also important to have entry and exit signals. If you’re holding only one position, this can be fairly straightforward. With multiple positions, make sure that your signals do not overlap or contradict each other (e.g. your model tells you to enter and exit positions on the same asset at the same time). Entry signals are generally easier to create: once your model’s trigger ‘hits,’ you trade. Exit signals can be a bit trickier. You could take small, quick wins, but you might also think to yourself, ‘if I wait just a bit longer, maybe I can get a bigger score’. So your decisions around exit (as defined by a profit target or a stop loss) depend on your risk tolerance: the longer you hold your position, the more uncertainty you face.
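As an illustration of the exit decision for a single long position, the sketch below checks a take-profit level and a stop-loss level. The function name and the thresholds are hypothetical, chosen only for this example.

```python
def exit_signal(entry_price, current_price, take_profit, stop_loss):
    """Decide whether to close a long position.

    take_profit and stop_loss are fractional moves relative to the entry
    price, e.g. 0.10 for +10% and 0.05 for -5% (illustrative values).
    """
    change = (current_price - entry_price) / entry_price
    if change >= take_profit:
        return "take_profit"
    if change <= -stop_loss:
        return "stop_loss"
    return "hold"

for price in (1.02, 1.12, 0.94):
    print(price, exit_signal(1.00, price, take_profit=0.10, stop_loss=0.05))
```

Raising `take_profit` keeps the position open longer in the hope of a bigger gain, which is exactly the trade-off against uncertainty described above.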

3. Backtest and optimize the model’s parameters

Backtesting means testing your model against historical data before trading with it live. You can simulate results in a controlled environment, over many different time periods or scenarios, to see how your model performs in a variety of conditions. And you can play with the ‘parameters’ (e.g., should I use one or five lag periods?) to find a set that produces consistent, positive results — optimized for your risk/reward profile, of course. You may need to run many, many, many iterations and model configurations to find a profitable strategy. And small changes to parameters can have a large and sometimes unexpected impact. That’s why it’s important to collect data, test your models against it, and continuously update/fine tune with new data. You can read more about backtesting here.
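A parameter sweep can be sketched as a loop over candidate values. Everything below is an assumption made for illustration: the price history is made up, and the trading rule (hold one unit for the next period whenever the price is below its recent average) is a toy rule, not a recommended strategy.

```python
# Hypothetical price history, made up for illustration only.
prices = [1.00, 0.97, 1.01, 1.05, 1.02, 0.98, 0.95, 0.99, 1.03, 1.00]

def backtest(prices, lookback):
    """Total P&L of a toy rule: hold one unit over the next period whenever
    the current price is below the average of the previous `lookback` prices."""
    pnl = 0.0
    for t in range(lookback, len(prices) - 1):
        window = prices[t - lookback:t]
        average = sum(window) / lookback
        if prices[t] < average:
            pnl += prices[t + 1] - prices[t]
    return pnl

# Try several lookback lengths and keep the one with the highest P&L.
results = {lookback: backtest(prices, lookback) for lookback in (1, 2, 3, 5)}
best = max(results, key=results.get)
print(results, "best lookback:", best)
```

A parameter that looks best on one stretch of history may not hold up on another, which is one reason to rerun the sweep as new data arrives. This sketch also ignores transaction costs.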

4. Consider outside risk factors like execution

No model can predict the future. A model, by nature, is a simplified representation of the world. In the volatile cryptocurrency market, this rings especially true around execution. For example, say the current price of an asset is 1.00, and you’re in a long position with your stop loss set to sell at 0.95. Suppose prices aren’t updated continuously, but instead at intervals, and the next price you can actually see and trade at is 0.90. Then you’re selling for a loss of 0.10, even though your supposed ‘max loss’ was 0.05! This issue arises from limited liquidity and latency in price updates (it’s common on cryptocurrency exchanges, especially DEXs).
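The stop-loss example above can be reproduced with a small sketch. It assumes the order fills at the first observed price at or below the stop, which is a simplification. The function name is hypothetical.

```python
def stop_loss_fill(entry_price, stop_price, observed_prices):
    """Return (fill_price, loss) for a long position with a stop.

    The stop triggers at the first observed price at or below stop_price,
    and the sketch assumes the fill happens at that observed price,
    not at the stop level itself.
    """
    for price in observed_prices:
        if price <= stop_price:
            return price, entry_price - price
    return None, 0.0

fill, loss = stop_loss_fill(entry_price=1.00, stop_price=0.95, observed_prices=[0.90])
intended_max_loss = 1.00 - 0.95
print(f"filled at {fill:.2f}, loss {loss:.2f}, intended max loss {intended_max_loss:.2f}")
# prints: filled at 0.90, loss 0.10, intended max loss 0.05
```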

Another outside factor that can mute strategy profitability is transaction costs. Even a model that backtests perfectly can be unprofitable if the modeler didn’t properly account for transaction costs (measured not only by fees but also by slippage and other factors). If your live strategy performance deviates from your backtests, you may need to dig into some of these issues. We’ll highlight how some of them apply to cryptocurrency in later lessons.
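To see how costs can flip the sign of a result, here is a minimal sketch using a simple additive cost model. The fee and slippage rates and the gross return are hypothetical numbers, not figures from any exchange.

```python
def net_return(gross_return, fee_rate, slippage_rate, trades=2):
    """Gross return minus per-trade costs (simple additive approximation).

    A round trip is two trades (entry and exit); fee_rate and slippage_rate
    are fractions of traded value per trade. All values are illustrative.
    """
    return gross_return - trades * (fee_rate + slippage_rate)

gross = 0.004  # hypothetical 0.4% gross return per round trip
net = net_return(gross, fee_rate=0.002, slippage_rate=0.001)
print(f"gross {gross:.4f}, net {net:.4f}")
```

With these numbers, a strategy that looks profitable before costs loses money on every round trip after them.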

5. Evaluate the benchmark returns

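Looking only at the magnitude of returns can be misleading because profits are naturally higher if you invest in larger positions. The most common benchmark that people use is the Sharpe Ratio:

Sharpe Ratio = (R_p - R_f) / \sigma_p

where R_p is the average return of the strategy, R_f is the risk-free rate, and \sigma_p is the standard deviation of the strategy’s returns.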

Sharpe Ratios put your strategy performance into the larger context of how much risk, or volatility, you took on to achieve those returns. A bigger Sharpe Ratio is better: it means either higher returns or lower volatility in achieving those returns. Other factors to consider include drawdowns for tail-end risk, average returns per trade, and average holding period.
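A minimal sketch of the calculation, using the standard definition (mean excess return divided by the standard deviation of returns). The two return series are hypothetical and were chosen to have the same average return but different volatility.

```python
import math
import statistics

def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=1):
    """Mean excess return divided by the sample standard deviation of returns,
    optionally annualized by sqrt(periods_per_year)."""
    excess = [r - risk_free_rate for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)

# Hypothetical per-period returns: both series average 1% per period.
steady = [0.01, 0.012, 0.008, 0.011, 0.009]
choppy = [0.05, -0.03, 0.04, -0.02, 0.01]

print("steady:", round(sharpe_ratio(steady), 2))
print("choppy:", round(sharpe_ratio(choppy), 2))
```

Although both series earn the same on average, the steadier one has the much higher ratio because it took less volatility to get there.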

Mean Reversion Model

Mean reversion models operate on the assumption that if the price of an asset deviates from its average, it will eventually revert to that average. This can be a fair assumption to make in many markets: if the price falls too ‘low,’ many market participants view it as cheap. When many traders then go to buy, the price increases back toward its average level. The reverse is also true: if the asset is viewed as ‘expensive,’ traders will sell to capture the (perhaps temporary) gains. Thus mean-reverting models employ a ‘buy low, sell high’ mentality, and it’s up to the modeler to figure out what the appropriate ‘mean’ level might be.

What may cause the price to deviate from its average? News events often cause the largest variation. There could be a new product release, earnings call, or lawsuits that could drive the price in either direction. Even tweets or the comments of prominent participants can have drastic impacts on the price. Note that the general belief (at least in the mean reversion context) is that these events, or other factors causing price deviations, are one-offs and the price will eventually revert, but it’s possible that there may be a true breakout — when there’s a permanent, not temporary, change in the ‘average’ price level. That’s why it’s important to include a stop loss parameter — a max amount you’re willing to lose — in your model.

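Let’s dive into the mathematics behind the model. In a discrete-time mean-reverting model, we can model the price movements as:

P_{t+1} - P_t = K (\mu - P_t) + \epsilon_{t+1}

Here P_t is the price at time t, \mu is the mean price level, and \epsilon_{t+1} is a random noise term.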

So we can see that our price prediction is a function of how much our price at time t deviates from the mean, multiplied by some constant K. K can be thought of as the ‘speed of mean reversion’: a bigger K means that we expect the price to revert to its mean faster. This model is closely related to the autoregressive model with one lag period, since the equation uses no outside regressor and the next price depends only on the current price.
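The effect of K can be seen in a short sketch. It assumes the discrete model takes the form "next price = current price + K × (mean − current price)" with the noise term left out, so it shows only the expected path. The function names and the two K values are hypothetical.

```python
def expected_next_price(price, mean, k):
    """One-step expected price: the current price moves a fraction k of
    the way back toward the mean (noise term omitted)."""
    return price + k * (mean - price)

def reversion_path(start, mean, k, steps):
    """Expected prices over several steps for a given reversion speed k."""
    path = [start]
    for _ in range(steps):
        path.append(expected_next_price(path[-1], mean, k))
    return path

slow = reversion_path(start=1.18, mean=1.0059, k=0.2, steps=5)
fast = reversion_path(start=1.18, mean=1.0059, k=0.8, steps=5)
print("slow:", [round(p, 4) for p in slow])
print("fast:", [round(p, 4) for p in fast])
```

Under this assumed form, the update can also be rearranged as K × mean + (1 − K) × current price, which is the shape of a one-lag autoregressive model.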

Technical note: in continuous time, using stochastic calculus, another common form of the mean-reverting model is the Ornstein-Uhlenbeck process.
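For reference, the Ornstein-Uhlenbeck process is usually written as the stochastic differential equation below. The symbol names follow a common convention rather than anything defined in this lesson: \theta is the speed of mean reversion (the continuous-time counterpart of K), \mu is the long-run mean, \sigma scales the random noise, and W_t is a Brownian motion.

```latex
dX_t = \theta \, (\mu - X_t) \, dt + \sigma \, dW_t
```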

For more information on this model, see Ornstein-Uhlenbeck Process as a Model of Volatility.

Putting it all together

Now let’s use the same sample data from last lesson to explore this model in detail. In practice, we must determine how much the price needs to deviate from the mean for it to be statistically significant enough to take a position. Common methods are using confidence intervals, relative strength indicators, Bollinger bands, or even fixed standard deviations away from the moving average.

Using the data table, we see that the simple average of the close prices is 1.0059, with a standard deviation of 0.0893. Thus, the band of one standard deviation around the mean is (0.9166, 1.0952). Assuming we take positions only if the price breaks outside this band, and if we were starting on day 1, our first position would be taken on day 4, since the close of 1.18 is above the upper band. We would take a short position here with the belief that the price will revert to the mean of 1.0059. Since the price does revert on day 5, this is a profitable position.
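The band arithmetic and the day-4 decision can be checked with a short sketch. It uses only the figures quoted above (mean 1.0059, standard deviation 0.0893, day-4 close 1.18). The function names are hypothetical.

```python
def bands(mean, std, width=1.0):
    """Lower and upper band, `width` standard deviations around the mean."""
    return mean - width * std, mean + width * std

def position_signal(price, lower, upper):
    """Short above the upper band, long below the lower band, else no trade."""
    if price > upper:
        return "short"
    if price < lower:
        return "long"
    return "none"

# Figures quoted in the text above.
mean_close, std_close = 1.0059, 0.0893
lower, upper = bands(mean_close, std_close)
print(f"band: ({lower:.4f}, {upper:.4f})")
print("day 4 signal:", position_signal(1.18, lower, upper))
# prints: band: (0.9166, 1.0952)
# prints: day 4 signal: short
```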

This example ignores the mean reversion speed parameter and does not set a stop loss. Both can be optimized through backtesting, to explore how to maximize profit with minimal risk. For example, you could set your stop loss such that you exit the position if the price moves more than 3 standard deviations away from the mean, which might indicate a new mean price level. Or maybe it should be 2 standard deviations away: you might have smaller profits, but face less risk. As with all algorithmic trading, your own objectives and risk tolerance will drive many of your modeling decisions.
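To make the trade-off concrete, the sketch below compares 2 and 3 standard deviation stops for the short position from the example, entered at 1.18 with the mean at 1.0059 and the standard deviation at 0.0893. It assumes the stop fills exactly at the stop level, which, as the execution discussion earlier shows, is not guaranteed. The function name is hypothetical.

```python
mean_close, std_close = 1.0059, 0.0893  # figures quoted earlier in the lesson
entry = 1.18                            # short entered on day 4

def short_stop_level(mean, std, width):
    """Price at which a short position is closed: `width` std devs above the mean."""
    return mean + width * std

target_profit = entry - mean_close  # gain per unit if the price reverts to the mean
for width in (2, 3):
    stop = short_stop_level(mean_close, std_close, width)
    max_loss = stop - entry
    print(f"{width} sd stop at {stop:.4f}: max loss {max_loss:.4f}, "
          f"target profit {target_profit:.4f}")
```

With these figures the 2 standard deviation stop sits at 1.1845, very close to the 1.18 entry, so it caps the loss tightly but is also more likely to be hit before the price has a chance to revert.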

Here’s some more info on mean reversion in the cryptocurrency context:

From a day-trading perspective (Dr. Philipp Kallerhoff)
A Python implementation (Catalyst)
Backtesting (Andrew Bannerman)
Impact of transaction fees (Bart Michalczuk)

Disclaimer: As always, we’re not advocating for one strategy over another, and we’re not responsible for any gains or losses you may experience when trading in a live environment using these techniques.

Challenge #2 — Mean Reverting Model

Create your own mean reverting model using The Ocean API
Bonus: Backtest on data available at CoinAPI

Remember to send your answers to hello@theoceanx.com if you’d like us to review your solution. Otherwise, our solution is now available on GitHub. Check it out! 👍

*Since we’re not live (yet!), feel free to learn from and tweak this code to meet your current needs. This course is designed to give you the fundamental knowledge and base code to craft your own strategies, both on and off The Ocean.

Answer to Challenge #2 — Mean Reversion Model

Last week, we asked you to write a mean reversion model. Our solution is now available on GitHub. 👍

*Remember, anyone who participates on Telegram or sends us a solution at any time during our Algorithmic Trading 101 series is eligible to receive part of $5,000 in cryptocurrency prizes.

__________________________

🤖 Links to Lessons 🤖
The Syllabus & How to Win
Lesson 1: Time Series Analysis
Lesson 2: Data, Strategy Design, and Mean Reversion
Lesson 3: Intro to Arbitrage Strategies
Lesson 4: Portfolio Management and Machine Learning in Python
Lesson 5: More Machine Learning

Having trouble or just want to discuss your strategy? Join our Telegram to get real-time feedback and answers.

Follow us on Twitter at @TheOceanTrade or subscribe to our newsletter to stay in the loop.