Time series forecasting is the use of statistical methods to predict future behavior based on historical data.
This is similar to other statistical learning approaches, such as supervised or unsupervised learning. However, time series forecasting has many nuances that make it different from regular machine learning. From data processing all the way to model validation, time series forecasting is a different beast.
Many companies are exploring time series forecasting as a way of making better business decisions. Take a hotel as an example. If a manager has a good idea of how many guests to expect next summer, they can use these insights to plan staffing, budgets, or even a facility expansion. Likewise, reliable forecasts of future events can benefit a wide range of industries and problems, from traditional agriculture to on-demand transportation and more.
In this article, we explore the fundamentals of time series data. We talk about how very simple forecasting methods work. Plus, we describe the most common patterns found in time series data.
Time Series Data
A time series is a sequence of data points recorded through time.
Thus, when dealing with time series data, order matters. Specifically, values in a time series express a dependency on time. Consequently, if we change the order of a time series, we may change the meaning of the data.
Usually, time series data have two important properties.
- Data is measured sequentially and equally spaced in time.
- Each time unit has at most one data measurement.
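As a rough illustration of these two properties, the sketch below checks a list of timestamps. It assumes daily observations stored as Python `date` objects. The function name `is_regular_series` is hypothetical and not part of any library.

```python
from datetime import date, timedelta

def is_regular_series(timestamps: list[date], step: timedelta) -> bool:
    """Check equal spacing and at most one measurement per time unit.

    Consecutive timestamps must differ by exactly `step`. This also rules out
    duplicates (a gap of zero) and out-of-order entries (a negative gap).
    """
    return all(later - earlier == step for earlier, later in zip(timestamps, timestamps[1:]))

days = [date(2021, 6, 1) + timedelta(days=i) for i in range(5)]
print(is_regular_series(days, timedelta(days=1)))               # True
print(is_regular_series(days + [days[-1]], timedelta(days=1)))  # False: duplicate day
```

A fixed `timedelta` does not work for calendar months, whose lengths vary. Monthly data would need a calendar-aware comparison instead.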
In addition, when doing time series forecasting, we usually have two goals.
- First, we want to identify patterns that explain the behavior of the time series.
- Second, we want to use these patterns to forecast (predict) new values.
Simple Forecast Methods
Time series forecasting has a rich family of algorithms. Some of the most basic ones include:
- Average Method
- Moving Average Method
- Naive Method
These algorithms are simple to understand. Each makes a different assumption about how to predict new values.
The Average Method assumes that a future event is best described by the average of all past events.
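Here is a minimal Python sketch of the Average Method, assuming the history is a plain list of numbers. The function name `average_forecast` and the `sales` figures are hypothetical and serve only as an illustration.

```python
from statistics import mean

def average_forecast(history: list[float], horizon: int) -> list[float]:
    """Average Method: every future value is forecast as the mean of all past values."""
    if not history:
        raise ValueError("history must contain at least one observation")
    level = mean(history)
    return [level] * horizon

# Hypothetical monthly sales figures (illustrative data, not from a real source).
sales = [120.0, 135.0, 128.0, 150.0, 142.0, 160.0]
print(average_forecast(sales, horizon=3))  # three copies of the mean, about 139.17
```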
The Moving Average Method builds on the simple Average Method. Instead of using the average of all past events, it predicts a new event as the average of a predefined number of the most recent values.
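The following minimal sketch of the Moving Average Method uses a plain Python list. Here `window` stands for the predefined number of recent values. The function name and data are hypothetical. For simplicity, the sketch repeats the mean of the last `window` observations across the whole forecast horizon rather than feeding its own forecasts back in.

```python
def moving_average_forecast(history: list[float], window: int, horizon: int) -> list[float]:
    """Moving Average Method: forecast with the mean of the last `window` observations."""
    if not 1 <= window <= len(history):
        raise ValueError("window must be between 1 and len(history)")
    recent = history[-window:]  # only the most recent values count
    level = sum(recent) / window
    return [level] * horizon

# Hypothetical monthly sales figures (illustrative data).
sales = [120.0, 135.0, 128.0, 150.0, 142.0, 160.0]
print(moving_average_forecast(sales, window=3, horizon=2))  # mean of 150, 142 and 160
```

A small window reacts quickly to recent changes, while a large window smooths out more noise. With `window=len(history)`, the method reduces to the Average Method.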
Lastly, the Naive Method assumes that the next event will be equal to the most recent one.
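Below is a minimal sketch of the Naive Method, assuming the history is a plain Python list. The function name `naive_forecast` and the sales figures are hypothetical.

```python
def naive_forecast(history: list[float], horizon: int) -> list[float]:
    """Naive Method: every future value equals the most recent observation."""
    if not history:
        raise ValueError("history must contain at least one observation")
    return [history[-1]] * horizon

# Hypothetical monthly sales figures (illustrative data).
sales = [120.0, 135.0, 128.0, 150.0, 142.0, 160.0]
print(naive_forecast(sales, horizon=3))  # [160.0, 160.0, 160.0]
```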
However, such simple methods rarely produce accurate forecasts. If you rely on them alone, your business decisions may turn out badly more often than you expect.
These methods don’t account for many of the fluctuations usually present in time series data. So the real question is: how can we do better? To answer it, we first need to understand the main patterns we usually find in time series.
Time Series Patterns
Most time series exhibit at least one of three kinds of patterns: trend, seasonality, and cycles. Let’s briefly describe each one.
The trend describes the general, long-term behavior of a time series. If a time series shows a positive long-term slope, it has an upward trend. If it instead shows a general negative slope, it has a downward trend.
The overall trend may also change direction, giving an up-to-down or a down-to-up trend. Lastly, a horizontal (or stationary) trend describes a time series with neither a positive nor a negative long-term slope.
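One simple way to quantify the trend is to fit a straight line to the series by least squares and look at the sign of its slope. The sketch below does this in plain Python. The function names, the `tolerance` parameter, and the data are hypothetical. A single line cannot capture a change in direction, so for up-to-down or down-to-up trends the sketch compares slopes over different parts of the series.

```python
def linear_trend_slope(series: list[float]) -> float:
    """Least-squares slope of the series against its time index 0, 1, ..., n-1."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    var = sum((t - t_mean) ** 2 for t in range(n))
    return cov / var

def describe_trend(series: list[float], tolerance: float = 1e-6) -> str:
    """Label the overall trend as upward, downward, or horizontal from the slope's sign."""
    slope = linear_trend_slope(series)
    if slope > tolerance:
        return "upward"
    if slope < -tolerance:
        return "downward"
    return "horizontal"

# Hypothetical series: it rises in the first half and falls in the second.
series = [10.0, 12.0, 15.0, 17.0, 16.0, 13.0, 11.0, 9.0]
half = len(series) // 2
print(describe_trend(series[:half]), describe_trend(series[half:]))  # upward downward
```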
A seasonal pattern is any kind of fluctuation (change) in a time series that is caused by calendar-related events.
These events can be the time of year (like winter or summer), the time of day, or the day of the week. Seasonality always has a fixed, known period. That is, a seasonal pattern repeats over the same interval of a day, week, year, and so on.
Take a data center as an example. If we consider the cooling system as the primary source of energy consumption, it is easy to imagine that consumption, and therefore energy costs, goes up in the summer and goes down in the winter.
Similarly, a clothing store that sells heavy coats is likely to see higher sales in winter than in summer.
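Because a seasonal pattern has a fixed period, averaging the values that fall at the same position in each cycle gives a rough seasonal profile. The sketch below assumes the period is known (for example, 4 for quarterly data or 12 for monthly data) and that the series starts at the first position of a cycle. The function name and the quarterly data-center figures are hypothetical. In a series with a strong trend, this naive profile also absorbs part of the trend, so in practice the trend is usually removed first.

```python
def seasonal_profile(series: list[float], period: int) -> list[float]:
    """Average value at each position of the season (position = index % period)."""
    if period < 1:
        raise ValueError("period must be positive")
    sums = [0.0] * period
    counts = [0] * period
    for i, value in enumerate(series):
        sums[i % period] += value
        counts[i % period] += 1
    # Positions never observed (series shorter than one period) get NaN.
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]

# Hypothetical quarterly energy use of a data center over two years (Q1..Q4, Q1..Q4).
energy = [80.0, 95.0, 120.0, 90.0, 82.0, 97.0, 125.0, 88.0]
print(seasonal_profile(energy, period=4))  # [81.0, 96.0, 122.5, 89.0]
```

In this made-up data, the third quarter stands out, which matches the cooling-driven data center example for a facility in the northern hemisphere.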
Lastly, a cyclical pattern in a time series is a rise and fall that is not related to seasonal factors. Cycles do not have a fixed period, and they often last longer than a calendar year. Unlike seasonal patterns, they do not repeat at regular intervals. They usually result from external factors, such as economic conditions, which makes them much harder to predict.
Forecasting methods usually take advantage of these patterns to produce reliable predictions.
Below, you can see time series data for sales of new single-family houses in the USA. Note the strong seasonality: house sales are normally slow at the beginning of the year and peak around June and July.
There are also strong cycles, with lengths ranging from roughly six to ten years. Remember, cycles do not have fixed periods.
It is important to note that not all time series are predictable. More specifically, some of them present no predictable patterns in the long term. Such time series are difficult, if not impossible, to forecast since future movements are equally likely to be up or down.
To forecast this kind of data, we usually use the random walk model. This model assumes that each change from one observation to the next is random and uncorrelated with past changes. In other words, the next value equals the previous value plus random noise. Hence, the point forecasts from a random walk model are simply equal to the last observation, exactly like the Naive Method. Random walk models are typically used with financial and economic data.
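To make the random walk concrete, the sketch below simulates one with Python's standard `random` module. It then produces the random walk point forecast, which, for a walk without drift, is the last observation. The sketch assumes Gaussian steps and no drift. The function names, starting value, and step size are hypothetical choices for illustration.

```python
import random
from typing import Optional

def simulate_random_walk(n_points: int, start: float = 0.0, step_std: float = 1.0,
                         seed: Optional[int] = None) -> list[float]:
    """Each value is the previous value plus independent Gaussian noise."""
    if n_points < 1:
        raise ValueError("n_points must be at least 1")
    rng = random.Random(seed)  # seeded generator for reproducible examples
    values = [start]
    for _ in range(n_points - 1):
        values.append(values[-1] + rng.gauss(0.0, step_std))
    return values

def random_walk_forecast(history: list[float], horizon: int) -> list[float]:
    """Without drift, the expected future value is the last observation at every horizon."""
    return [history[-1]] * horizon

walk = simulate_random_walk(100, start=50.0, step_std=2.0, seed=42)
print(random_walk_forecast(walk, horizon=5))
```

The point forecast stays flat, but uncertainty grows with the horizon. With independent steps, the variance of the h-step-ahead forecast error is h times the variance of a single step, so prediction intervals widen the further ahead we look.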
Trend, seasonality, and cycles are by far the most common patterns in time series data. Knowing what they are and how they behave is essential for any analyst.
When faced with a time series analysis task, the next step is to identify how each of these patterns behaves. Indeed, most classic forecasting methods require the analyst to specify how trend and seasonality should be modeled. To find out, one usually performs a time series decomposition analysis. In our next post, we go over, step by step, how different kinds of decomposition work.
By Thalles Silva, as part of Daitan’s Research on multiple aspects of AI&ML. Thanks to the team working on time series forecasting PoCs and demos: Bruno Schionato, Diego Domingos, Fernando Moraes, Gustavo Rozato, Isac Souza, Marcelo Mergulhão and Marciano Nardi.