Step Four - Optimizing Trading Strategies: Stages of the Backtesting Pipeline

Artem Stepanenko
5 min read · Jun 12, 2024

In this article, we explore the design of a pipeline for backtesting trading ideas. Leveraging our own Japanese candlestick library, pre-processed data, and trading system design principles developed in our previous articles, we add a backtesting pipeline step to our story. The discussion encompasses the process of identifying viable trading ideas, their subsequent comparison and optimization, and the application of portfolio management techniques to curate an effective ensemble of trading robots.

Defining the Objective

To begin with, we outline the overall task. Our aim is to evaluate the signal strength of various Japanese candlestick patterns across all Binance futures trading pairs and all available timeframes, seeking the parameters that yield the most stable and highest-return results. We illustrate the process using the Marubozu pattern:

General Backtesting Pipeline

Parameter Sweep

At the time of our study, the futures market featured only 250 trading pairs with no more than five years of historical data. Given the selected timeframes of [1d, 12h, 8h, 6h, 4h], our dataset encompasses approximately 4 million rows (since not all pairs were available from the beginning). With this manageable data volume, a parameter sweep method appears viable for identifying the most effective parameters for Japanese candlestick patterns based on historical data.
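The order of magnitude can be checked with quick arithmetic. The sketch below computes an upper bound under the simplifying assumption that every pair has the full five years of history (counted as 365-day years); the actual figure is lower because many pairs were listed later.

```python
# Candles produced per day for each selected timeframe.
candles_per_day = {"1d": 1, "12h": 2, "8h": 3, "6h": 4, "4h": 6}

pairs = 250          # trading pairs, from the text
days = 5 * 365       # five years of history, ignoring leap days (assumption)

rows_per_pair = days * sum(candles_per_day.values())
upper_bound_rows = pairs * rows_per_pair

print(rows_per_pair)      # 29200
print(upper_bound_rows)   # 7300000
```

The roughly 4 million rows actually available are a little over half of this bound, which is consistent with pairs being added to the market gradually.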

However, if your data volume is significantly larger, or the frequency of backtesting is high, or you are constrained by hardware limitations, alternative methods can be considered. These can include:

  • coarse grid search, which systematically evaluates a sparser, regularly spaced subset of the parameter space,
  • random search, which samples the parameter space randomly,
  • and Bayesian optimization, which builds a probabilistic model of the objective function and uses it to select the most promising parameters to evaluate.

These techniques can reduce computational complexity and improve efficiency.
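As an illustration of random search, the sketch below draws a fixed number of random parameter combinations instead of enumerating all of them. The parameter names, the candidate values, and the `sample_combinations` helper are placeholders chosen for this example.

```python
import random

# Placeholder grid: names and values are assumptions for illustration only.
param_grid = {
    "upper_shadow": [0.00, 0.01, 0.02, 0.05, 0.10],
    "lower_shadow": [0.00, 0.01, 0.02, 0.05, 0.10],
    "take_profit":  [0.01, 0.02, 0.03, 0.05, 0.08],
    "stop_loss":    [0.01, 0.02, 0.03, 0.05, 0.08],
}


def sample_combinations(grid, n_samples, seed=0):
    """Draw n_samples random parameter combinations (with replacement)."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    return [
        {name: rng.choice(values) for name, values in grid.items()}
        for _ in range(n_samples)
    ]


candidates = sample_combinations(param_grid, n_samples=50)
print(len(candidates))   # 50 combinations instead of 5**4 = 625
print(candidates[0])
```

Each sampled dictionary would then be passed to the backtest in place of a full enumeration; the sample size becomes the knob that trades coverage for run time.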

Moving on to our specific case, the core backtesting process exhaustively tests the parameters of the Marubozu pattern (upper shadow, lower shadow, body length, ref_period, periods, take profit, stop loss) across all trading pairs within the selected timeframes. Evaluating 5 values for each of the 7 parameters yields 5⁷ = 78,125 parameter combinations, and therefore 78,125 equity curves, per timeframe. Across the 5 timeframes, that is 78,125 × 5 = 390,625 equity curves to evaluate.
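The combination count can be reproduced with `itertools.product`. In the sketch below, only the parameter names and the count of five values per parameter come from the text; the candidate values themselves are placeholders.

```python
from itertools import product

# Five placeholder values per parameter; the real tested values are not listed here.
param_grid = {
    "upper_shadow": [0.00, 0.01, 0.02, 0.05, 0.10],
    "lower_shadow": [0.00, 0.01, 0.02, 0.05, 0.10],
    "body_length":  [0.5, 1.0, 1.5, 2.0, 3.0],
    "ref_period":   [5, 10, 20, 50, 100],
    "periods":      [1, 2, 3, 5, 8],
    "take_profit":  [0.01, 0.02, 0.03, 0.05, 0.08],
    "stop_loss":    [0.01, 0.02, 0.03, 0.05, 0.08],
}
timeframes = ["1d", "12h", "8h", "6h", "4h"]

# Lazily enumerate every combination as a dict of parameter name -> value.
names = list(param_grid)
combinations = (dict(zip(names, values)) for values in product(*param_grid.values()))

per_timeframe = sum(1 for _ in combinations)
print(per_timeframe)                     # 78125
print(per_timeframe * len(timeframes))   # 390625
```

Using a generator keeps memory use flat: combinations are produced one at a time rather than materialized as a list.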

Note that processing time varies significantly between timeframes. For instance, the 4-hour timeframe contains six times as many candles as the daily timeframe over the same period and therefore requires correspondingly more computational resources and backtesting time. Account for these differences in data volume when planning your backtesting process.

At the same time, the number of combinations grows very quickly as the sweep expands: with n values for each of k parameters there are nᵏ combinations, so every additional parameter multiplies the workload by n. This substantially impacts hardware requirements. Parameter testing must therefore be optimized using efficient data processing techniques, including:

  • vectorization of computations to leverage efficient array operations,
  • parallel processing to distribute the workload across multiple CPU cores,
  • and utilizing high-performance libraries such as NumPy and Pandas.
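To make the vectorization point concrete, the sketch below flags Marubozu-like candles (candles whose body spans almost the whole high-low range) for an entire price series at once with NumPy array operations rather than a Python loop. The function `marubozu_mask`, the 5% shadow threshold, and the sample prices are illustrative assumptions, not the definitions used in our candlestick library.

```python
import numpy as np


def marubozu_mask(open_, high, low, close, max_shadow_ratio=0.05):
    """Boolean array: True where both shadows are small relative to the range."""
    candle_range = high - low
    upper_shadow = high - np.maximum(open_, close)
    lower_shadow = np.minimum(open_, close) - low
    # Compare by multiplication to avoid dividing by a zero range.
    limit = max_shadow_ratio * candle_range
    return (candle_range > 0) & (upper_shadow <= limit) & (lower_shadow <= limit)


# Three made-up candles: full body, long upper shadow, flat.
open_ = np.array([100.0, 100.0, 50.0])
high = np.array([110.0, 110.0, 50.0])
low = np.array([99.8, 99.8, 50.0])
close = np.array([109.9, 103.0, 50.0])

print(marubozu_mask(open_, high, low, close))
```

The same function works unchanged on arrays with millions of candles, which is where the gain over a row-by-row loop becomes noticeable.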

Additionally, caching intermediate results and employing memory-efficient data structures can further improve the performance of the parameter sweep. The technical execution of these aspects will be discussed in upcoming articles.
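One place where caching typically pays off: pattern signals depend on the pattern parameters but not on take profit or stop loss, so they can be computed once and reused for every exit setting. A minimal sketch with `functools.lru_cache` follows; `detect_signals` is a hypothetical stand-in for the real detection step, and the parameter values are made up.

```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def detect_signals(upper_shadow, lower_shadow):
    """Hypothetical stand-in for an expensive pattern-detection pass."""
    return (upper_shadow, lower_shadow)  # placeholder for a signal series


shadows = [0.00, 0.01, 0.02]
exits = [0.01, 0.02, 0.03, 0.05]  # used for both take profit and stop loss

evaluated = 0
for up, lo, tp, sl in product(shadows, shadows, exits, exits):
    signals = detect_signals(up, lo)  # cache hit after the first (up, lo) call
    evaluated += 1                    # a real backtest would apply tp/sl here

info = detect_signals.cache_info()
print(evaluated)    # 144 parameter combinations
print(info.misses)  # 9 detection passes actually computed
print(info.hits)    # 135 served from the cache
```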

Ranking Results

The next step involves ranking the 78,125 equity curves for each selected timeframe to identify the optimal ones. The choice of metrics for the ranking system depends on the investor’s preferences and risk tolerance. While several thousand combinations might be profitable, each will exhibit a unique equity curve shape. Each investor, observing the equity curve, will react differently to its form: some may find sharp drawdowns unacceptable, while others may view prolonged flat periods as wasted time. Therefore, selecting the optimal equity curve must consider how the investor perceives their desired optimum.

It is also advisable not to overload the metric set and to focus on the metrics that best describe how close each equity curve is to the ideal: a straight line with a steep upward slope.

For this project, we identified the following key metrics:

Key metrics to rank equity curves

A. Result — the final equity value.

B. Drawdown — the maximum and average observed loss from a peak to a trough.

C. Time to recover from a drawdown — the duration required to return to a peak after a drawdown.
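These three metrics can be computed directly from an equity curve. The sketch below is one possible formulation under stated assumptions: drawdown is measured as a fraction of the running peak, the average is taken over the deepest point of each drawdown episode, and recovery time is counted in bars from the peak until that peak is regained. The function name and the sample curve are illustrative.

```python
def equity_metrics(equity):
    """Final value, max/average drawdown depth, and longest recovery (in bars)."""
    peak = equity[0]
    peak_index = 0
    depth = 0.0          # deepest drawdown of the current episode
    start = None         # index of the peak that began the current episode
    depths, recoveries = [], []

    for i, value in enumerate(equity):
        if value >= peak:
            if start is not None:          # episode ends: old peak regained
                depths.append(depth)
                recoveries.append(i - start)
                depth, start = 0.0, None
            peak, peak_index = value, i
        else:
            if start is None:
                start = peak_index
            depth = max(depth, (peak - value) / peak)

    if start is not None:                  # still below the peak at the end
        depths.append(depth)

    return {
        "result": equity[-1],
        "max_drawdown": max(depths, default=0.0),
        "avg_drawdown": sum(depths) / len(depths) if depths else 0.0,
        "max_recovery_bars": max(recoveries, default=0),
    }


curve = [100, 110, 99, 104.5, 110, 120, 114, 126]
print(equity_metrics(curve))
```

For the sample curve there are two drawdown episodes, of 10% and 5%, so the maximum drawdown is 10%, the average is about 7.5%, and the longest recovery takes 3 bars.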

These metrics can also be represented schematically in a three-dimensional coordinate system whose axes correspond to the final equity result, drawdown size, and recovery time, respectively.

Additionally, characteristics of individual trades can be considered, such as:

  • Standard Deviation — a measure of the volatility of returns,
  • Coefficient of Variation — a normalized measure of the dispersion of returns.
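Both trade-level measures are straightforward to compute with Python's `statistics` module. The per-trade returns below are made up for illustration, and the coefficient of variation is taken as the sample standard deviation divided by the mean.

```python
from statistics import mean, stdev

# Made-up per-trade returns, expressed as fractions of capital.
trade_returns = [0.02, -0.01, 0.03, 0.01, -0.02, 0.04]

avg = mean(trade_returns)
std = stdev(trade_returns)   # sample standard deviation
cv = std / avg               # coefficient of variation; undefined when avg == 0

print(round(avg, 4), round(std, 4), round(cv, 2))
```

Because the coefficient of variation divides by the mean return, it becomes unstable for strategies whose average trade is close to zero, which is worth keeping in mind when using it for ranking.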

This list of comparison metrics is not exhaustive. In a subsequent article on developing a system of ranking coefficients, we will discuss the metrics most widely mentioned in the literature.

Important! To compare results with varying returns on a risk-adjusted basis, it is necessary to normalize the metrics against a baseline scenario. In this project, the baseline scenario is the parameter combination that yields the highest return.

Normalizing the other scenarios provides comparable risk and return metrics, allowing us to calculate a coefficient for each metric and derive a unified ranking indicator for each equity curve. The technical implementation of this process and the design nuances of the ranking coefficient system are detailed in the following article.

Ultimately, by averaging the coefficients of these metrics, we can rank the equity curves and select the optimal parameter combination.
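The ranking idea can be sketched as follows. This is only an illustration under simplifying assumptions, not the full coefficient system covered in the following article: each scenario is rescaled so that its return matches the baseline's (as if position size were scaled linearly), its drawdown is scaled by the same factor, each metric is turned into a coefficient between 0 and 1 relative to the best scenario, and the coefficients are averaged. All names and numbers are made up, and every scenario is assumed to have a positive return.

```python
# Made-up scenarios: total return and max drawdown as fractions, recovery in bars.
scenarios = {
    "A": {"ret": 0.80, "max_dd": 0.40, "recovery": 120},
    "B": {"ret": 0.40, "max_dd": 0.10, "recovery": 60},
    "C": {"ret": 0.20, "max_dd": 0.08, "recovery": 200},
}

# Baseline: the scenario with the highest return, as in the text.
baseline_ret = max(s["ret"] for s in scenarios.values())

normalized = {}
for name, s in scenarios.items():
    scale = baseline_ret / s["ret"]          # assumed linear leverage scaling
    normalized[name] = {
        "max_dd": s["max_dd"] * scale,       # drawdown at the baseline return
        "recovery": s["recovery"],           # time is unaffected by scaling
    }

# Coefficient per metric: best (smallest) value divided by the scenario's value.
scores = {}
for name, m in normalized.items():
    coeffs = [
        min(x[key] for x in normalized.values()) / m[key]
        for key in ("max_dd", "recovery")
    ]
    scores[name] = sum(coeffs) / len(coeffs)

ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)   # ['B', 'A', 'C']
```

In this toy example, scenario A has the highest raw return, yet B ranks first: once B is scaled up to A's return, its drawdown is still only half of A's and it recovers twice as fast.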

Visualizing Results

To evaluate the selected parameters and make further portfolio management decisions, it is crucial to visualize the results, presenting the information necessary for an investor to decide on a trading strategy.

The pipeline results in the following graph, based on backtesting and optimization outcomes:

Example of Backtest Result Visualization

Below is the Python code used to generate the above visualization:

Conclusion

In summary, the process of optimizing trading strategies through a backtesting pipeline involves several critical stages. We began by defining the objective, aiming to evaluate the signal strength of various Japanese candlestick patterns across all Binance futures trading pairs and timeframes. We then discussed the parameter sweep, highlighting the importance of efficient data processing techniques to handle the computational complexity. The ranking of results was elaborated with a focus on key metrics and the importance of normalizing these metrics to enable risk-adjusted comparisons. Finally, we emphasized the necessity of visualizing results to make informed portfolio management decisions.

Having formed the main pipeline of our backtesting system, we move on to the next steps in designing the trading robot:

  • Choosing the best: a system for ranking trading metrics.
  • Technical implementation of the backtesting system.
  • Analyzing options for filtering false signals in the trading system.
  • Selecting an optimal portfolio of trading strategies.
  • Analyzing the performance of the trading robot: types of discrepancies.
  • Adapting trading strategies to the capabilities and limitations of Binance.

Stay tuned as we tackle these challenges and move towards launching our trading robot.

Thank you for reading! Your comments and feedback are greatly appreciated.
