Step Five — Choosing the Best: A System for Ranking Trading Metrics

Jul 4, 2024

In this article, we build a system of ranking coefficients for ordering the results obtained during optimization. Building on our Japanese candlestick library, pre-processed data, trading system design principles, and the backtesting pipeline described in our previous articles, we add a ranking step to the project. The discussion covers identifying key metrics, comparing and normalizing them, and applying portfolio management techniques to curate an effective ensemble of trading strategies.

Objective — Ranking Backtesting Results

In the previous step, we generated 78,125 * 5 = 390,625 equity curves by applying various strategy parameters across different timeframes. The obvious next step is to select the best curve and thus determine the parameters for deploying the trading system in the live futures market. One way to find that optimum is to compute a coefficient for each metric of each curve and rank the curves by the sum of those coefficients, with the smallest sum ranked first. Let’s see how this works in practice.

The chart below shows two equity curves resulting from historical data using two Japanese candlestick patterns: Piercing Line and Morning Star, both applied on the 4-hour timeframe across the same number of trading pairs.

From the chart, it’s clear that both curves end at nearly the same result after five years, but are they equally good? Visually, we can quickly form an intuitive opinion on which curve is preferable. However, when evaluating hundreds of thousands or millions of parameter combinations, pairwise visual comparison isn’t feasible. Instead, we can objectively assess each curve by summing the coefficients of key metrics.

Metrics for Evaluating Equity Curves

As mentioned earlier, every investor should define their own set of metrics for choosing the strategy they are most comfortable trading. Below is a list of the most popular metrics for objectively evaluating and ranking equity curves of various shapes:

1. Final Capital: This represents the total amount of capital accumulated at the end of the trading period. It reflects the overall success of the trading strategy in terms of capital growth.

2. Max Drawdown: The maximum drawdown is the largest peak-to-trough decline in the value of a portfolio or trading strategy. It measures the most significant loss experienced before a new peak is achieved and is crucial for understanding the potential risk.

3. Average Drawdown: This metric calculates the average depth of drawdown over a specific period or across multiple drawdown events. It shows the typical decline experienced during drawdowns, complementing Max Drawdown, which captures only the single worst case.

4. Sharpe Ratio: The Sharpe ratio assesses the risk-adjusted return of an investment by dividing the excess return (over the risk-free rate) by its standard deviation. It helps investors understand how much additional return is received for the additional volatility endured.

5. Sortino Ratio: Like the Sharpe ratio, the Sortino ratio measures risk-adjusted return, but it counts only downside volatility as risk, so only negative deviations are penalized. It is better suited to strategies whose returns are skewed towards the positive side.

6. Calmar Ratio: The Calmar ratio is calculated by dividing the average annual compounded return by the maximum drawdown. It evaluates the risk-adjusted performance of a strategy by considering the trade-off between return and drawdown, often used for hedge funds and trading strategies.

7. Profit Factor: This metric is the ratio of gross profits to gross losses. A higher profit factor indicates that the strategy generates significantly more profit compared to losses, making it a key indicator of overall trading effectiveness.

8. Win Rate: The win rate is the proportion of trades that result in a profit. It is a straightforward measure of how often a trading strategy is successful, reflecting the consistency and reliability of the strategy.

9. VaR (95%): Value at Risk (VaR) at the 95% confidence level estimates the loss that should not be exceeded over a given time period in 95% of cases; in the remaining 5% of cases the loss can be larger. It is used to quantify the risk of loss in a portfolio.

10. Standard Deviation: This metric measures the amount of variability or volatility in the returns of a trading strategy. It indicates how much the returns can deviate from the average return, providing insight into the risk and consistency of the strategy.

11. Coefficient of Variation: The coefficient of variation is the ratio of the standard deviation to the mean. It allows comparison of the risk per unit of return across different strategies or investments, highlighting the relative riskiness of each strategy.
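To make these definitions concrete, here is a minimal sketch of how several of the metrics could be computed from a single equity curve using only the Python standard library. The function names, the toy equity values, and the choice to treat each period's change as one "trade" are our own assumptions for illustration; the ratios are not annualized, and edge cases such as a curve with no losing trades are not handled.

```python
import math
import statistics


def period_returns(equity):
    """Simple per-period returns of an equity curve."""
    return [b / a - 1.0 for a, b in zip(equity, equity[1:])]


def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by its standard deviation (not annualized)."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)


def sortino_ratio(returns, target=0.0):
    """Mean excess return divided by the downside deviation below the target."""
    downside = math.sqrt(sum(min(r - target, 0.0) ** 2 for r in returns) / len(returns))
    return (statistics.mean(returns) - target) / downside


def profit_factor(trade_pnls):
    """Gross profit divided by gross loss (assumes at least one losing trade)."""
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    return gross_profit / gross_loss


def win_rate(trade_pnls):
    """Share of trades that closed with a profit."""
    return sum(1 for p in trade_pnls if p > 0) / len(trade_pnls)


def historical_var(returns, confidence=0.95):
    """Historical VaR: the loss at the (1 - confidence) empirical quantile."""
    ordered = sorted(returns)
    index = int((1.0 - confidence) * len(ordered))
    return -ordered[index]


def coefficient_of_variation(returns):
    """Standard deviation of returns divided by their mean."""
    return statistics.stdev(returns) / statistics.mean(returns)


# Toy data, invented for illustration only (not the article's backtest results).
equity = [100.0, 110.0, 99.0, 120.0, 108.0, 130.0]
returns = period_returns(equity)
trade_pnls = [b - a for a, b in zip(equity, equity[1:])]  # one "trade" per period

print("Final capital:", equity[-1])
print("Max drawdown:", round(max_drawdown(equity), 4))
print("Sharpe:", round(sharpe_ratio(returns), 4))
print("Sortino:", round(sortino_ratio(returns), 4))
print("Profit factor:", round(profit_factor(trade_pnls), 4))
print("Win rate:", win_rate(trade_pnls))
print("VaR (95%):", round(historical_var(returns), 4))
```

In a real pipeline the trade list would come from the backtester rather than from equity differences, and the Sharpe, Sortino, and Calmar ratios would normally be annualized according to the timeframe.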

Let’s evaluate the two equity curves using this set of metrics. To make the evaluation comparable, we need to normalize each metric by a single parameter. In our case, this parameter will be Max Drawdown (marked in red in the table below). We proportionally adjust the results by the Max Drawdown ratio of both curves to obtain the following results:

From the table, we observe the differences in metrics when normalized by Max Drawdown. This normalization allows us to compare potential equity curves using a wide range of metrics against a common benchmark. Each metric provides insights into different aspects of our trading strategy, enabling traders to focus on the metrics that align with their trading preferences.
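One possible reading of this proportional adjustment can be sketched as follows. We assume here that profit and loss is additive (fixed position size), so multiplying a curve's PnL by a factor multiplies its absolute drawdown by the same factor. The helper names and both curves are hypothetical, and the exact adjustment behind the table may differ.

```python
def absolute_max_drawdown(equity):
    """Largest peak-to-trough decline in currency units."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, peak - value)
    return worst


def rescale_to_drawdown(equity, target_drawdown):
    """Scale the PnL around the starting capital so that the curve's
    absolute max drawdown equals target_drawdown (assumes additive PnL)."""
    start = equity[0]
    factor = target_drawdown / absolute_max_drawdown(equity)
    return [start + factor * (value - start) for value in equity]


# Two invented curves that end at the same capital but take different paths.
curve_a = [1000.0, 1100.0, 1050.0, 1300.0, 1250.0, 1500.0]
curve_b = [1000.0, 1200.0, 900.0, 1400.0, 1100.0, 1500.0]

# Bring curve_b to the same max drawdown as curve_a, then compare outcomes.
target = absolute_max_drawdown(curve_a)
curve_b_scaled = rescale_to_drawdown(curve_b, target)

print("Max drawdown A:", absolute_max_drawdown(curve_a))
print("Max drawdown B (scaled):", round(absolute_max_drawdown(curve_b_scaled), 2))
print("Final capital A:", curve_a[-1])
print("Final capital B (scaled):", round(curve_b_scaled[-1], 2))
```

Once both toy curves carry the same drawdown, the second one ends with far less capital, which is the kind of difference that identical end points on a chart can hide.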

Metric Coefficients

But is calculating metrics enough? How do we choose the best equity curve if one metric favors one strategy (e.g., Piercing Line’s Sortino Ratio: 0.07 vs. Morning Star’s 0.04), and another metric favors the other strategy (e.g., Morning Star’s Win Rate: 0.13 vs. Piercing Line’s 0.05)? To rank all 78,125 * 5 = 390,625 equity curves, we need a system that can compare and consolidate all metrics into a single indicator, preferably a single number.

One solution is to introduce metric coefficients that assign weights to each metric in the final aggregate evaluation. This can be achieved by proportioning each metric’s values relative to a chosen base indicator. The formulas for calculating these coefficients are provided in the example code at the end of the article.

Previously, we normalized metrics using Max Drawdown as the base indicator. However, traders are free to choose any metric they find appropriate. For instance, in the example below, Final Capital is selected as the base indicator (marked in red) to make the coefficients comparable. By taking Final Capital as the base indicator, we obtain the following metric coefficients, which we can now sum:

Since we normalize coefficients using Final Capital, the metric coefficients of the curve with the highest capital growth are set to 1.0, while the coefficients for other curves are calculated accordingly. After calculating the metric coefficients, we can determine their average value for each analyzed equity curve. In our example, the average coefficients are 1.0 and 6.3, respectively, leading to the conclusion:

Although the equity curves of the Piercing Line 4h and Morning Star 4h strategies yield almost identical results over a 5-year backtest, the Morning Star strategy is more than six times worse than the Piercing Line strategy based on aggregate criteria.

The specific reasons are highlighted by the coefficients of our chosen metrics. For a more detailed analysis, we can examine each metric’s coefficients and the components used in their calculation. However, the conclusion is clear: what was initially understood intuitively is now detailed with comparable metrics and transformed into a numerical comparison of an aggregated indicator — the average value of normalized metric coefficients. This mechanism allows us to compare hundreds of thousands of parameter combinations in seconds and determine the parameters that make working with a strategy comfortable for the trader.
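The mechanism described in this section could look roughly like the sketch below. It rests on our own assumptions: each coefficient is a plain ratio to the reference curve's metric, oriented so that 1.0 means "as good as the reference" and larger values mean worse; the reference is the curve with the best base indicator; all metric values are positive. The metric names, the `HIGHER_IS_BETTER` table, and the sample numbers are hypothetical and are not the article's formulas or results.

```python
# Direction of each metric: True when a larger value is better.
HIGHER_IS_BETTER = {
    "final_capital": True,
    "max_drawdown": False,
    "sharpe": True,
    "win_rate": True,
}


def metric_coefficients(metrics, reference):
    """Ratio of each metric to the reference curve's value, oriented so that
    1.0 matches the reference and values above 1.0 are worse.
    Assumes all metric values are positive."""
    coefficients = {}
    for name, higher_is_better in HIGHER_IS_BETTER.items():
        if higher_is_better:
            coefficients[name] = reference[name] / metrics[name]
        else:
            coefficients[name] = metrics[name] / reference[name]
    return coefficients


def rank_curves(curves, base="final_capital"):
    """Return (name, average coefficient) pairs, best (smallest) first.
    The reference curve is the one with the highest base indicator."""
    reference = max(curves.values(), key=lambda m: m[base])
    scores = {}
    for name, metrics in curves.items():
        coefficients = metric_coefficients(metrics, reference)
        scores[name] = sum(coefficients.values()) / len(coefficients)
    return sorted(scores.items(), key=lambda item: item[1])


# Invented metrics for two strategies with almost the same final capital.
curves = {
    "strategy_a": {"final_capital": 1510.0, "max_drawdown": 0.05, "sharpe": 1.2, "win_rate": 0.6},
    "strategy_b": {"final_capital": 1500.0, "max_drawdown": 0.30, "sharpe": 0.4, "win_rate": 0.5},
}

ranking = rank_curves(curves)
for name, score in ranking:
    print(name, round(score, 2))
```

Because the reference is chosen by the base indicator alone, a curve that beats it on the other metrics can end up with an average below 1.0. Ranking by the smallest average still works in that case.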

How Many Metrics to Include in the Evaluation System?

When selecting metrics for your own trading strategy evaluation system, include indicators that allow you to rank backtesting results and identify the most suitable strategy. However, avoid overloading the system with too many metrics: many indicators are built from the same components, so adding them counts the same property several times and distorts the aggregate score. Conversely, minimalism can also mislead, depriving the trader of important information needed to understand backtesting results. A balanced methodological approach enables traders to build more reliable and tailored trading systems.
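One way to spot overlapping metrics, offered here as a suggestion rather than as part of the original system, is to check how strongly each pair of metrics correlates across the backtested curves. The sketch below assumes a hypothetical table with one list of values per metric (one value per equity curve) and an arbitrary threshold of 0.95; the numbers are invented.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def redundant_pairs(metric_table, threshold=0.95):
    """List metric pairs whose absolute correlation reaches the threshold."""
    pairs = []
    for first, second in combinations(metric_table, 2):
        r = pearson(metric_table[first], metric_table[second])
        if abs(r) >= threshold:
            pairs.append((first, second, r))
    return pairs


# Invented values: one entry per backtested equity curve.
metric_table = {
    "sharpe": [0.5, 1.0, 1.5, 2.0, 2.5],
    "sortino": [0.7, 1.4, 2.1, 2.8, 3.5],  # moves in lockstep with sharpe
    "win_rate": [0.60, 0.40, 0.55, 0.45, 0.50],
}

pairs = redundant_pairs(metric_table)
for first, second, r in pairs:
    print(first, second, round(r, 3))
```

A pair flagged this way is a candidate for keeping only one of the two metrics, or for giving each a smaller weight in the aggregate score.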

The following Python code provides a comprehensive example of how to calculate all the coefficients and metrics discussed above. This code can be used to implement the described evaluation system, enabling efficient processing and comparison of trading strategy results.

Python code snippet for calculating trading strategy metrics and their coefficients.

Conclusion

Developing a metric coefficient system for evaluating backtesting results allows for the efficient processing and comparison of vast data arrays. This approach provides a more objective and precise assessment of strategies, which is crucial when working with numerous parameters and timeframes. For maximum efficiency, maintain a balanced approach to metric selection, avoiding both excess and insufficiency. This methodology helps traders create more reliable and customized trading systems.

Having developed a metric coefficient system for evaluating backtesting results, we move on to the next steps of designing the trading robot:

Technical implementation of the backtesting system.
Analyzing options for filtering false signals in the trading system.
Selecting an optimal portfolio of trading strategies.
Analyzing the performance of the trading robot: types of discrepancies.
Adapting trading strategies to the capabilities and limitations of Binance.

Stay tuned as we tackle these challenges and move towards launching our trading robot.

Thank you for reading! Your comments and feedback are greatly appreciated.

Written by Artem Stepanenko