The training dilemma: loss vs profit function?

Tuning machine learning models for stock prediction applications

Haris (Chariton) Chalvatzis
Analytics Vidhya
6 min read · Aug 22, 2020


Introduction

In the world of machine learning, models are trained by minimizing some variation of a loss function. For example, when we try to predict the median house value of a specific zip code, we are dealing with a regression-type problem, and prime suspects for the loss function used to train our model are Mean Squared Error (MSE), Mean Absolute Error (MAE), or Mean Absolute Percentage Error (MAPE).

Continuing our example, let’s assume we train our model using MAE and choose to perform a grid search over N total parameter combinations. In order to pick the best model, it’s common practice to look at which combination of parameters resulted in the lowest MAE.

The same error metric is used to both train and evaluate our model.

This creates an alignment between training, evaluation, and how we actually use the model in practice. Saying that the mean absolute error is $50K makes sense and has a clear interpretation: we anticipate the real value to be within +/- $50K of the predicted value.
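As a minimal sketch of that standard workflow (the dataset, estimator, and parameter grid below are purely illustrative and not from this article), a scikit-learn grid search scored by MAE might look like this:

    # Illustrative only: grid-search a regressor and keep the parameters
    # that give the lowest cross-validated MAE.
    from sklearn.datasets import fetch_california_housing
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import GridSearchCV

    X, y = fetch_california_housing(return_X_y=True)  # median house values (illustrative dataset)

    param_grid = {"n_estimators": [100, 300], "max_depth": [5, 10]}
    search = GridSearchCV(
        RandomForestRegressor(random_state=0),
        param_grid,
        scoring="neg_mean_absolute_error",  # scikit-learn maximizes scores, hence the negated MAE
        cv=5,
    )
    search.fit(X, y)
    print(search.best_params_, -search.best_score_)  # best parameters and their (lowest) MAE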

Stock Prediction models are different

However, in stock prediction things are a little more involved, and by now you are probably wondering why. For starters, in stock trading (or investments in general) we are interested in profit, which requires far more than just using the prediction as is. Hence, the error information becomes less relevant, as we will demonstrate below. Think for a moment: which of the following statements would make you feel more comfortable, or resonate with you the most, when speaking about stock prediction models?

  • The best model achieves an MAE of 12.5
  • The best model achieves cumulative return of 23%

It is clear that the second statement captures what we are actually trying to achieve. It’s more informative. What does it actually mean for a model to have an MAE of 12.5? We are not sure how that information translates into profits. More importantly, there is no guarantee that the model with the lowest error will be the one with the highest profit.

Why?

When we train models using a loss function and then pick the best model using the same metric, the process is internally consistent; however, it does not necessarily align with our actual goal, which (usually) is maximizing profit!

Minimizing loss does not necessarily imply maximizing profit.

There is a very important reason why that’s the case. The loss functions mainly used, such as MAE, MSE, and MAPE, are symmetric error functions, meaning they are “blind” to whether you approach the true value from “above” or “below”. For instance, when using MAE, a price prediction of 102 is equally good as 98 if the true price is 100, as both predictions give MAE = 2. However, when trading, only one of the two yields a profit while the other yields a loss!

To make the above idea more concrete, consider the following example of a four-day price series

  • Y = [101, 100, 98, 101],

and two possible predictions of its last three samples,

  • X = [95, 92, 105], and
  • W = [102, 103, 97],

both generated on the first day, when the price was 101. X has a mean absolute error (MAE) of 5 when measured against the last three samples of Y (i.e. (|100–95| + |98–92| + |101–105|)/3), while W’s error is 3.67. However, X is always correct about the direction of Y’s price movement, i.e. its predictions are always higher (resp. lower) than the previous price of Y when the price will indeed rise (resp. fall) in the next sample, while W is always wrong. Figure 1 below visualizes the above.

Fig 1. Showcasing the predictions of two models. Model X has a higher mean absolute error (MAE) than model W; however, model X always predicts the correct direction (i.e. the same direction as Y). In contrast, W is always wrong (i.e. when Y goes up (down), W predicts down (up)).
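To double-check the arithmetic, here is a minimal sketch (plain NumPy; the variable names are mine) that computes each model’s MAE against the last three samples of Y and counts how often it calls the direction of the next move correctly:

    import numpy as np

    y = np.array([101, 100, 98, 101], dtype=float)  # actual four-day price series Y
    x = np.array([95, 92, 105], dtype=float)        # model X's predictions for days 2-4
    w = np.array([102, 103, 97], dtype=float)       # model W's predictions for days 2-4

    previous, actual = y[:-1], y[1:]

    for name, pred in [("X", x), ("W", w)]:
        mae = np.mean(np.abs(actual - pred))
        # the direction is correct when the prediction and the actual price
        # end up on the same side of the previous day's price
        hits = np.sign(pred - previous) == np.sign(actual - previous)
        print(f"{name}: MAE = {mae:.2f}, correct direction on {hits.sum()} of {len(hits)} days")

    # X: MAE = 5.00, correct direction on 3 of 3 days
    # W: MAE = 3.67, correct direction on 0 of 3 days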

Hence, by using MAE for both training and evaluation we would have picked model W (which has the lower error but loses money when traded), despite the fact that X’s predictions, although less accurate, are more profitable.

The model with the lowest error is not the one with the highest profit!

Trading requires direction

So what is happening here? How come the model with the lowest error is not the one with the highest profit? Let’s dig deeper.

If you recall, we used the word symmetric to describe the loss function, and this is exactly the point of failure. Symmetric error functions fail to consider direction, i.e. they treat positive and negative deviations the same (as demonstrated by the example above), which is of the utmost importance when making trading decisions.

Which brings us back to our original question: should we actually use a loss function to both train and evaluate our models? This is a rather interesting question but difficult to answer. Given the above discussion, let’s split it into two parts: (a) training and (b) evaluation.

  • (a) Unfortunately, there is no proven metric that accounts for both maximizing profit and minimizing error while training models. So here there is no dilemma. For now, we are stuck with using any (symmetric) loss function for the training part (this is actually an active research area, trying to create a loss function that incorporates direction as well as magnitude).
  • (b) However, during evaluation, we can actually change the metric. Instead of choosing the model with the lowest error, we should instead consider the model with the highest profit (which could be defined in various ways, average or cumulative returns, Sharpe Ratio or some other form).

By doing so, in the example above we would have picked the most profitable model (X), which is exactly what we want after all.
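For instance, feeding both prediction series into a naive long/short rule (go long for the next day when the prediction is above the last observed price, otherwise go short; this is a toy strategy of my own, purely for illustration) turns X’s larger error into a gain and W’s smaller error into a loss:

    import numpy as np

    y = np.array([101, 100, 98, 101], dtype=float)

    def cumulative_return(pred, prices):
        # naive long/short rule: long if the prediction exceeds the last
        # observed price, short otherwise (illustrative, not a prescribed strategy)
        previous, actual = prices[:-1], prices[1:]
        position = np.where(pred > previous, 1.0, -1.0)   # +1 = long, -1 = short
        daily = position * (actual - previous) / previous
        return np.prod(1.0 + daily) - 1.0

    print(f"X: {cumulative_return(np.array([95., 92., 105.]), y):+.1%}")   # roughly +6.2%
    print(f"W: {cumulative_return(np.array([102., 103., 97.]), y):+.1%}")  # roughly -5.9%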

What have we achieved?

Well, we are aligning our selection criterion with our goal. If our goal is to maximize profit by utilizing our predictions, then we might as well select the model that does so best (in sample at least). It is far clearer to state that my model generated X% return in sample than to state that my model had an MAE of 12.5 (which sounds great, but what do I do with that?).

Selection Framework

The proposed framework is straightforward.

  • train models using any loss function, and then,
  • pick the best model using any profit-based metric.

Fig 2. The selection framework.

By using the loss function at training, we make sure our predictions are as close as possible to the actual values. Then, at the evaluation phase, we put those predictions to work through our favorite trading strategy (or even an optimization model). Only then will we know whether the predictions are actually good or not.

This requires incorporating your trading model into the evaluation phase in order to calculate profit. Whether you intend to use the predictions through a basic up-down directional trading strategy or a more elaborate distribution-based strategy, that strategy has to be part of the evaluation.
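As a rough sketch of how that might look in code (the candidate models, the helper names, and the long/short rule are all my own assumptions, not a prescribed implementation), the evaluation loop simply replaces the error metric with a strategy-based return:

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    def strategy_return(pred, prices):
        # same naive long/short rule as in the earlier sketch
        previous, actual = prices[:-1], prices[1:]
        position = np.where(pred > previous, 1.0, -1.0)
        return np.prod(1.0 + position * (actual - previous) / previous) - 1.0

    def select_by_profit(candidates, X_train, y_train, X_val, prices_val):
        # training: every candidate still minimizes an ordinary (symmetric) loss
        # evaluation: the winner is the candidate whose predictions trade best
        # prices_val: price series whose last len(X_val) points the predictions target
        scores = {}
        for name, model in candidates.items():
            model.fit(X_train, y_train)
            scores[name] = strategy_return(model.predict(X_val), prices_val)
        best = max(scores, key=scores.get)
        return best, scores

    # hypothetical candidates; any regressors or hyper-parameter grid would do
    candidates = {
        "shallow": GradientBoostingRegressor(loss="absolute_error", max_depth=2),  # MAE-style loss
        "deep": GradientBoostingRegressor(loss="absolute_error", max_depth=5),
    }
    # best, scores = select_by_profit(candidates, X_train, y_train, X_val, prices_val)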

The above process at least ensures we are using the predictions as intended. If you would like to read more, have a look at this paper, which applies the aforementioned framework and reports strong results.

Stock prediction models should be tuned to optimize profitability instead of accuracy

Conclusion

Hopefully we have shed some light on the pitfalls of using symmetric error functions, such as MSE or MAE, to both train and evaluate stock prediction models, and suggested a framework that aligns better with our end goal. Until we come up with a loss function that also promotes profitability, we should find ways to pick the model that best serves our intended use: making a profit.
