Best Practices For Bulletproof Time Series Modeling

Thomas Oriol
Datapred
May 13, 2019

Our goal in this post is to discuss Datapred’s standard strategy (beyond respecting basic time series modeling principles) for building accurate predictive models. We will use the example of commodity procurement optimization.

Tip 1: Formalize your operational objective and target it directly

Machine learning works better when it targets the real industrial objective rather than a proxy.

If you are managing a grain mill, your real operational question is not “What will be the price of wheat in four weeks?” It is probably closer to “How should I plan my wheat purchase orders over the next four weeks?”

To answer the first question, backtesting a standard L1 or L2 regression error will be fine. To answer the second question, you must first formalize the relevant loss function, then backtest it over the corresponding period.
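To make the difference concrete, here is a minimal sketch contrasting the two kinds of loss. The function names, the prices and the "extra cost versus the cheapest day" loss are illustrative assumptions, not Datapred's actual formulation; a real operational loss would also include storage, logistics and contractual constraints defined with business experts.

```python
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Standard L1 regression error on price forecasts (the proxy objective)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def procurement_cost(prices: Sequence[float], orders: Sequence[float]) -> float:
    """Total amount paid when buying orders[t] tonnes at prices[t]."""
    return sum(p * q for p, q in zip(prices, orders))


def procurement_regret(prices: Sequence[float], orders: Sequence[float]) -> float:
    """Hypothetical operational loss: extra cost of a purchase plan compared
    with buying the whole volume at the lowest price of the period."""
    total_volume = sum(orders)
    return procurement_cost(prices, orders) - min(prices) * total_volume


# Illustrative weekly wheat prices (made-up numbers) and two purchase plans
# for the same total volume of 100 tonnes.
prices = [200.0, 190.0, 210.0, 205.0]
even_plan = [25.0, 25.0, 25.0, 25.0]    # buy the same amount every week
timed_plan = [0.0, 100.0, 0.0, 0.0]     # buy everything in week 2

forecast_error = mean_absolute_error(prices, [198.0, 195.0, 205.0, 205.0])
even_regret = procurement_regret(prices, even_plan)
timed_regret = procurement_regret(prices, timed_plan)
```

The regression error scores a price forecast, while the regret scores a purchase plan. Backtesting the second quantity is what targets the operational question directly.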

Those are very different machine learning problems, yielding different solutions, and the second solution is operationally superior to the first. Implementing it requires extended talks with business experts, which is one of the reasons why auto-ML doesn’t work for real industrial applications.

Tip 2: Compute the relevance of each explanatory variable and model parameter

Good modeling lets you display how the influence of explanatory variables and model parameters (e.g. training window size, prediction horizon) changes over time.

Your commodity procurement optimization solution could use a sequential and linear combination of multiple predictive models, where each model is specific to: (i) a group of homogeneous variables (e.g. commodity prices, weather forecasts), and (ii) a structuring parameter value (e.g. training window = 1 day, 1 week, 2 weeks).

The relative weight of each model in the combination thus stands for the influence of the corresponding group of variables or parameter value, with the following benefits:

  • Domain experts can check that your solution matches their operational reality (always good for adoption).
  • If your solution underperforms, understanding why is easier, and iterating on potential remedies faster.
  • It helps you reduce the number of variables and parameters in your solution, which increases its reactivity and robustness and accelerates the modeling cycle.

For example, you might find that a short rolling training window is best for optimizing your loss function, meaning that for those variables, recent observations are more relevant.
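As a rough illustration of how such weights can be tracked over time, the sketch below combines two models sequentially with exponentially weighted averaging. This aggregation rule, the squared-error loss, the learning rate, the model names and the numbers are all assumptions made for the example; the post does not specify which combination rule Datapred uses.

```python
import math
from typing import Dict, List


def update_weights(weights: List[float], losses: List[float],
                   learning_rate: float = 0.5) -> List[float]:
    """One aggregation step: models with a higher loss lose relative weight."""
    scaled = [w * math.exp(-learning_rate * l) for w, l in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


def run_aggregation(model_predictions: Dict[str, List[float]],
                    actuals: List[float],
                    learning_rate: float = 0.5) -> List[Dict[str, float]]:
    """Return the weight of each model before every step and after the last one."""
    names = list(model_predictions)
    weights = [1.0 / len(names)] * len(names)   # start from uniform weights
    history = [dict(zip(names, weights))]
    for t, actual in enumerate(actuals):
        preds = [model_predictions[name][t] for name in names]
        # The combined forecast would be sum(w * p); here we only track weights.
        losses = [(p - actual) ** 2 for p in preds]   # assumed squared-error loss
        weights = update_weights(weights, losses, learning_rate)
        history.append(dict(zip(names, weights)))
    return history


# Two hypothetical models that differ only by their training window.
predictions = {
    "window_1_week": [1.0, 1.1, 1.0],
    "window_2_weeks": [0.5, 0.6, 0.4],
}
actuals = [1.0, 1.2, 0.9]

weight_history = run_aggregation(predictions, actuals)
final_weights = weight_history[-1]
```

Plotting `weight_history` over time gives the kind of view described above: if the weight of `window_1_week` keeps growing, recent observations are the more relevant ones for this target.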

Tip 3: Put pressure on your model

You know the famous quote about unknown unknowns:

https://youtu.be/GiPe1OiKQuk

Industrial life is full of unknown unknowns. By definition, they are not in your historical data, so the only way to prepare for them is to watch how your model reacts to extreme values or totally new circumstances.

Practically, this means you should backtest your model with varying variables, parameters and operational costs/constraints.

Datapred data scientists use two types of tests for unknown unknowns:

  • Robustness tests, where they measure model performance in deliberately adverse or plain wrong conditions.
  • Simulation tests, where they assess model performance under neutral but new conditions.

For example, you could assume that purchase orders based on your commodity procurement solution are executed very slowly, and check if model performance holds up (robustness test). You could also enter new values for a key explanatory variable, and ask domain experts if the corresponding results are realistic (simulation test).
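A robustness test of this kind can be sketched in a few lines. The example below assumes, for illustration only, that a slow order decided on day t is executed at the price of day t + delay; the function names, prices and purchase plan are hypothetical.

```python
from typing import Dict, Sequence


def executed_cost(prices: Sequence[float], orders: Sequence[float],
                  delay: int = 0) -> float:
    """Cost of a purchase plan when each order decided on day t is executed
    at the price of day t + delay (capped at the last available day)."""
    last_day = len(prices) - 1
    return sum(q * prices[min(t + delay, last_day)] for t, q in enumerate(orders))


def delay_stress_test(prices: Sequence[float], orders: Sequence[float],
                      delays: Sequence[int] = (0, 1, 2, 3)) -> Dict[int, float]:
    """Replay the same plan under increasingly slow execution."""
    return {d: executed_cost(prices, orders, d) for d in delays}


# Made-up daily prices and a plan that buys 100 tonnes on the cheapest day.
prices = [200.0, 190.0, 210.0, 205.0, 215.0]
orders = [0.0, 100.0, 0.0, 0.0, 0.0]

results = delay_stress_test(prices, orders)
for delay, cost in results.items():
    print(f"delay={delay} day(s): cost={cost:.0f}")
```

If the cost degrades sharply as soon as execution slips by a day, the plan depends on perfect timing, which is worth knowing before it meets real operational constraints.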

***

Visit the Datapred Blog for more posts on machine learning, streaming data and continuous intelligence applications, and this page for an extensive list of online resources on machine learning and time series.

Originally published at https://www.datapred.com.
