Optimize a Trading Strategy: An Introduction


Welcome to our new series on how to optimize a trading strategy. This introductory article was prepared specifically for algorithmic traders who possess ample trading experience but are sometimes confused, and possibly misled, by mathematics and statistics. The general audience can also benefit from this article by learning which statistical approaches are likely NOT to work in real life. In particular, we'll examine, in the context of algorithmic trading, what "optimization" really means and what it really means to say that one set of parameters is "better" than another. The methodology discussed here is generally applicable to all kinds of trading algorithms. Let's get started.

Mathematical optimization is the selection of a best element, with regard to some criterion, from some set of available alternatives.

Trading is about making the "best" decisions given the available information. At its core, the decision engine of a trading system is a function parameterized by a set of constants C that, after being executed on a market condition dataset D, maps the initial account balance B_0 into the final account balance B. Here comes the critical part: the dataset D, which represents the market conditions, is a random variable. Market conditions are unpredictable, volatile, ferocious, and random. Therefore the final account balance B is also a random variable. Deep down, most traders want to select the best set of constants C with the criterion that it maximizes the expected value of B.

Here comes the problem: the historical market condition dataset is fixed and frozen, because for each and every moment we know the bids, we know the asks, we know the trades; it is already immutable history rather than the unknown future. Here comes the pitfall: what a sizable number of algorithmic traders, and even professional trading tool vendors (especially in the crypto community), end up doing is this: given the frozen historical market condition dataset (which is no longer a random variable), find the set of constants C that maximizes the value of B (which is no longer a random variable either). This is the "statistical" approach that is likely not to work in real life.

A vivid analogy: if each trader is a ninja and trading is fighting among ninjas, this approach assumes that all other ninjas stand still, frozen, and it selects your "ideal" attack motions based on that frozen picture of all the other ninjas. When such "ideal" motions are deployed to production, you'll find that the other ninjas suddenly make incredibly unexpected moves, while your strategy, with its "optimized" moves, behaves like a dummy robot and is doomed to get punched relentlessly. This is a classical textbook problem called "overfitting".
Here comes the solution: we have to use the frozen historical market condition dataset to generate a random-variable dataset that simulates the randomness, unpredictability, volatility, and ferociousness of the unknown future market conditions. In short, we need to somehow mobilize those ninjas. ^_^ Among many possible approaches, a relatively straightforward one is to take sub-samples from the frozen historical dataset to form the random-variable dataset D. Given a set of constants C, we can compute a final account balance for each sub-sample; the average A of those values is an estimate of the expected value of the random variable B under the constants C, and we select the constants C_best that give the maximum A.
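The sub-sampling procedure above can be sketched in a few lines of Python. Note that `run_backtest` here is a hypothetical toy stand-in, not a real backtest engine: a real implementation would replay the recorded bids/asks/trades through the strategy.

```python
import statistics

def run_backtest(constants, sub_sample, initial_usd=10000.0):
    # Placeholder: a real backtest would replay the bids/asks/trades in
    # `sub_sample` through the strategy parameterized by `constants`.
    return initial_usd * (1.0 + constants["edge"] * len(sub_sample) / 1000.0)

def make_sub_samples(dataset, window=2):
    # Overlapping windows of `window` consecutive elements (e.g. days).
    return [dataset[i:i + window] for i in range(len(dataset) - window + 1)]

def estimate_expected_balance(constants, dataset, window=2):
    # Average the final balances across sub-samples: an estimate of E[B]
    # under the given constants C.
    balances = [run_backtest(constants, s) for s in make_sub_samples(dataset, window)]
    return statistics.mean(balances)

days = list(range(31))  # placeholder for 31 days of frozen historical data
print(round(estimate_expected_balance({"edge": 0.001}, days), 2))  # 10000.02
```

To pick C_best, run `estimate_expected_balance` once per candidate constants set and keep the candidate with the largest result.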

Cool. Sounds too complicated? Let's look at a concrete example: optimizing a simplified Avellaneda & Stoikov market making strategy. The constants we'd like to optimize are the minimum spread and the maximum spread, called SPREAD_PROPORTION_MINIMUM and SPREAD_PROPORTION_MAXIMUM, respectively, in that model. Assume we have 3 candidates: (a) minimum spread = 0.001, maximum spread = 0.01; (b) minimum spread = 0.002, maximum spread = 0.02; (c) minimum spread = 0.003, maximum spread = 0.03. The frozen historical market dataset is Coinbase's BTC-USD best bids/asks and trades from 2021-07-01T00:00:00Z to 2021-08-01T00:00:00Z. The historical data can be quickly downloaded with a script: https://github.com/crypto-chassis/ccapi/blob/v5.10.0/app/src/spot_market_making/config.env.example#L31.

python3 download_historical_market_data.py --exchange coinbase --base-asset btc --quote-asset usd --start-date 2021-07-01 --end-date 2021-08-01 --historical-market-data-directory <any-location-you-like>

Now let us draw sub-samples from it such that each sub-sample contains 2 days' worth of data:

sub-sample-1: 2021-07-01T00:00:00Z to 2021-07-03T00:00:00Z
sub-sample-2: 2021-07-02T00:00:00Z to 2021-07-04T00:00:00Z
…
sub-sample-30: 2021-07-30T00:00:00Z to 2021-08-01T00:00:00Z
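These 30 date ranges can be generated programmatically; a short sketch, assuming overlapping 2-day windows stepped forward one day at a time:

```python
from datetime import date, timedelta

start, end = date(2021, 7, 1), date(2021, 8, 1)
window, step = timedelta(days=2), timedelta(days=1)

# Slide a 2-day window across July 2021, one day at a time.
sub_sample_ranges = []
d = start
while d + window <= end:
    sub_sample_ranges.append((d, d + window))
    d += step

for i, (s, e) in enumerate(sub_sample_ranges, 1):
    print(f"sub-sample-{i}: {s}T00:00:00Z to {e}T00:00:00Z")
# sub-sample-1: 2021-07-01T00:00:00Z to 2021-07-03T00:00:00Z
# ...
# sub-sample-30: 2021-07-30T00:00:00Z to 2021-08-01T00:00:00Z
```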

For constants candidate a, let us run our market making program's executable spot_market_making on each sub-sample and record the corresponding final account balances. For 30 sub-samples, we get 30 such balances: $7786, $8725, …, $8565 (here we have converted the final BTC to USD using the mid price at the final moment). Their average is $8658; this is our estimate of the expected value of B for candidate a. Now repeat the same for the other two candidates. We get that the estimated expected value of B is $9737 for candidate b and $9959 for candidate c. Because our criterion is maximization of the expected value of B, we select candidate c as the best one.
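The selection step in code. Only the three averages quoted above ($8658, $9737, $9959) are from the actual runs; the per-sub-sample balances below are made-up placeholders chosen so that they average to those values (a real run would have 30 balances per candidate):

```python
import statistics

final_balances = {
    "a": [7786, 8725, 9556, 8565],   # placeholders averaging to 8658
    "b": [9600, 9874, 9700, 9774],   # placeholders averaging to 9737
    "c": [9800, 10118, 9900, 10018], # placeholders averaging to 9959
}
# Estimate E[B] per candidate as the mean of its sub-sample balances.
estimated_expected_B = {c: statistics.mean(b) for c, b in final_balances.items()}
best = max(estimated_expected_B, key=estimated_expected_B.get)
print(estimated_expected_B, "-> best:", best)  # best: c
```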

Cool. Before we conclude this article, here are some interesting questions to think about:

  • What is the appropriate amount of historical data to use?
  • What might be other ways to create a random-variable dataset from the frozen historical dataset?
  • In those backtests, we used an initial BTC balance of 0 and an initial USD balance of 10000. What if we change those initial balances to some other values?
  • If the sub-samples are changed to contain 3 days of data rather than 2 days, does that change our selection on the best candidate?
  • If our optimization target is not the maximization of the expected value of B but rather the minimization of the standard deviation of B, what’s the consequence and interpretation of that? What other optimization targets can be used?
  • Grid search is exhaustive but very time-consuming. What other search algorithms can be used to accelerate the process of finding the best candidate?
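As one possible answer to the last question, here is a hedged sketch of random search over the two spread parameters instead of an exhaustive grid; `estimate_expected_balance` is a hypothetical placeholder for the sub-sample backtest average, not a real library function:

```python
import random

random.seed(0)  # reproducible sampling

def estimate_expected_balance(min_spread, max_spread):
    # Toy objective standing in for the real averaged backtest.
    return 10000 * (1 + max_spread - abs(min_spread - 0.002))

# Sample 20 random (min_spread, max_spread) pairs from the search ranges.
candidates = [
    (random.uniform(0.001, 0.003), random.uniform(0.01, 0.03))
    for _ in range(20)
]
best = max(candidates, key=lambda c: estimate_expected_balance(*c))
print(best)
```

Other alternatives to try include Bayesian optimization and coordinate-wise refinement; the key design choice is trading off search cost against how close the found candidate is to the true optimum.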

^_^ If you are interested in our work or collaborating with us, join us on Discord https://discord.gg/b5EKcp9s8T. 🎉

Disclaimer: This is an educational post rather than investment/financial advice.


