Issues to address while comparing forecast results

Aditya Kumar · Published in tech-that-works · 3 min read · Feb 5, 2019
Photo by Markus Spiske on Unsplash

This post describes the issues that typically arise when you compare two forecasts against actual results. It also illustrates the complexity involved and how difficult it can be to come up with a single metric, one number, for comparing two forecasts over a period.

Consider a hypothetical situation in your organization: every day you need to forecast the cash withdrawn from an ATM so that, based on the demand and the available cash, you can schedule a trip to refill the machine and avoid a cash-out.

This is a general problem that every bank faces, and everyone has a solution of their own. But your organization uses a proprietary solution from a vendor that charges $X per year in licensing fees, which is too much for that product.

To avoid the heavy charges and the dependency on this product, the company decided to replace the system using open source tools and libraries.

So far so good.

You started developing a solution for all ATMs using open source libraries and tools, and after some time you have a solution that scales well in a distributed environment and produces forecasts.

Now the time has come to compare the forecasts against the actuals, and also against the existing solution, let's say on one month of data. These are some of the points that need to be considered:

  1. Which metric will you compare the results on? Will it be RMSE (root mean square error), MAE (mean absolute error), or MAPE (mean absolute percentage error)? (The first sketch after this list shows how these could be computed.)
  2. On some days your open source model will give better results in terms of the metric you have chosen, while on other days the existing solution will do better. How can you collectively say which one is better?
  3. To check the robustness of a model, compare the results of both models on special events like public holidays, New Year, US public holidays, Diwali, Christmas, etc. These are the days when abnormal patterns are generally observed, so they are a good chance to see how your model behaves under such extreme events.
  4. Can you divide the days into peak days, i.e. those where your problem has the most impact on the business, and non-peak days, and check the performance of each model on both? (See the second sketch after this list.)
  5. How many under-predictions and how many over-predictions are there for each model? Are over-predictions acceptable to the business, or are under-predictions acceptable for your problem?
  6. If under-predictions or over-predictions are acceptable, then by what magnitude?
  7. What about the case where, because of one or two very high or very low predictions, the whole month's metric (e.g. MAPE) comes out very high, but if you remove those outliers from the comparison the forecast is close to the actuals?
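
As a concrete illustration of points 1, 5, and 7, here is a minimal sketch in Python with NumPy. The numbers are made up and the function is only for illustration:

```python
import numpy as np

def compare_forecasts(actual, forecast):
    """Error metrics for one forecast series over one period.

    `actual` and `forecast` are 1-D arrays of daily withdrawal amounts.
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    error = forecast - actual

    ape = np.abs(error) / actual  # per-day absolute percentage error
    return {
        "rmse": float(np.sqrt(np.mean(error ** 2))),  # penalizes large misses heavily
        "mae": float(np.mean(np.abs(error))),         # average miss in cash units
        "mape": float(np.mean(ape) * 100),            # sensitive to one or two bad days
        "median_ape": float(np.median(ape) * 100),    # robust to those outlier days (point 7)
        "over_predictions": int(np.sum(error > 0)),   # cash sits idle in the machine (point 5)
        "under_predictions": int(np.sum(error < 0)),  # risk of a cash-out (point 5)
    }

# A made-up week for one ATM; the sixth day spikes on a holiday.
actual    = [120, 135, 90, 200, 150, 410, 95]
new_model = [118, 140, 85, 190, 160, 300, 100]
vendor    = [125, 130, 100, 210, 140, 380, 90]

print(compare_forecasts(actual, new_model))
print(compare_forecasts(actual, vendor))
```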
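
For points 3 and 4, the same errors can be broken out by day type. Another sketch; how a day gets flagged as a special or peak day is a business decision, and the flags below are invented:

```python
import numpy as np

def segmented_mape(actual, forecast, is_special):
    """MAPE split by day type, so extreme events (point 3) and peak
    days (point 4) are judged separately from ordinary days."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    is_special = np.asarray(is_special, dtype=bool)
    ape = np.abs(forecast - actual) / actual * 100
    return {
        "special_day_mape": float(ape[is_special].mean()),
        "normal_day_mape": float(ape[~is_special].mean()),
    }

# Same made-up week as above; the sixth day is flagged as a holiday.
actual     = [120, 135, 90, 200, 150, 410, 95]
new_model  = [118, 140, 85, 190, 160, 300, 100]
is_special = [False, False, False, False, False, True, False]

print(segmented_mape(actual, new_model, is_special))
```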

Suppose you come up with answers to all of the above questions; how do you then arrive at one metric that combines all of these points?
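
There is no standard answer. One option, sketched below, is a weighted score over the individual checks, with the weights set by the business. Every name, weight, and number here is an assumption chosen to illustrate the idea, not a recommendation:

```python
# Hypothetical composite score. Inputs are per-criterion scores already
# normalised to [0, 1], where lower is better.
weights = {
    "overall_mape": 0.35,
    "special_day_mape": 0.25,       # points 3-4: holidays and peak days
    "under_prediction_rate": 0.30,  # points 5-6: cash-outs hurt more than idle cash
    "outlier_robust_mape": 0.10,    # point 7
}

def composite_score(scores):
    return sum(weights[name] * value for name, value in scores.items())

new_model = {"overall_mape": 0.20, "special_day_mape": 0.40,
             "under_prediction_rate": 0.25, "outlier_robust_mape": 0.15}
vendor    = {"overall_mape": 0.25, "special_day_mape": 0.30,
             "under_prediction_rate": 0.35, "outlier_robust_mape": 0.20}

print("new model:", composite_score(new_model))
print("vendor:   ", composite_score(vendor))
```

The lower score would win, but notice that the weights themselves encode business judgment, which is exactly where the difficulty lies.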

Remember, we have been talking about only one ATM. What happens once we consider, let's say, thousands of machines? And what about the different denominations, like $5, $10, and $100, that a machine can hold?
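
At that scale the comparison usually has to be rolled up per machine and per denomination before anything can be said about the fleet as a whole. A sketch with pandas, where the table layout and column names are assumptions:

```python
import pandas as pd

# Hypothetical long-format results: one row per ATM, denomination, and day.
df = pd.DataFrame({
    "atm_id":       ["A1", "A1", "A2", "A2"],
    "denomination": [10, 100, 10, 100],
    "actual":       [500, 1200, 300, 900],
    "forecast":     [450, 1300, 320, 850],
})

df["ape"] = (df["forecast"] - df["actual"]).abs() / df["actual"] * 100

# Per-machine, per-denomination MAPE, then a summary of its distribution,
# so one badly served machine cannot hide behind a good fleet average.
per_unit = df.groupby(["atm_id", "denomination"])["ape"].mean()
print(per_unit)
print(per_unit.describe())
```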

Now consider yourself the business person who has the authority to decommission the existing solution and switch to the one you developed. Before taking that decision, you want to check how the new solution behaves overall compared to the actuals and to the existing solution, because a very high risk is involved if you decide without considering all these factors.

So the point is this: in such a scenario all of these issues need to be considered, and it is very hard to judge the quality of a forecast from only a few of the points above. Yet it becomes even more difficult to reach a verdict once you start considering all of them.
