“Forecasting is the art of saying what will happen, and then explaining why it didn’t!”
Every day we find ourselves predicting future outcomes, or wishing we had known something earlier. Forecasting is nothing new: it is the old idea of estimating the future from what we have learned in the past. In this blog series, I will try to capture some of the basic concepts of Time Series Analysis and Modelling. Because forecasting and past behaviour both revolve around time, this field is also known as Time Series Analytics.
In the first blog of this series, I will discuss some of the basic concepts and techniques used in Time Series (TS) analysis. Let us start with the use cases. Below are some of the most common use cases of TS forecasting.
- Economic Outlook
- Sales Forecasting
- Inventory planning
- Workforce planning
- Weather forecasting
- Traffic/crowd forecasting
There are two main types of forecasting.
a. Qualitative Forecasting: Used when data is not available or historical patterns do not repeat. It generally relies on expert opinion and is therefore prone to bias. An example is the Delphi method, in which a panel of experts works towards a consensus through rounds of discussion and feedback.
b. Quantitative Forecasting: This is based on data and on patterns that repeat in historical data. This type of forecasting can capture complex patterns that may not be obvious to the eye. Because it is driven by data rather than opinion, it is less prone to subjective bias, although it can still inherit any bias present in the data. An example of such forecasting is Time Series Forecasting.
In this blog series, we are going to discuss quantitative forecasting. Let’s start with a few basic concepts related to it.
- Time Series Data: Any data that involves a time component, e.g. the temperature on each day or a stock price each hour.
- Time Series Analysis: The analysis performed on TS data to get meaningful insights from it.
- Time Series Forecasting: The process of forecasting the future by looking at the behaviour of past data.
Let's also go through some terminology used in forecasting.
Goal: It is the objective set by the business for forecasting. For example: maximize profit, optimize resources etc.
Plan: It is the set of activities that the business undertakes to achieve its goal.
Forecast: A prediction of the future, made with the business goal in mind.
Defining clear goals is the key to forecasting. There are some caveats associated with time series forecasting, which revolve around the choices you make while defining the problem statement.
- Granularity: The more aggregated your forecast is, the more accurate it tends to be. This is simply because aggregated data has less relative variance and thus less noise. For example, if we want to predict airline passengers for next month, forecasting the total number of travellers will be more accurate than forecasting travellers on a specific route. Again, the right level is driven by the business requirement.
- Frequency: How often you update your forecasts to keep them relevant. As time passes, new information arrives and needs to be incorporated into the predictions. Let's say we forecast the number of TV views and update the forecast every 3 months. Due to COVID-19, people were confined to their homes for 2–3 months, and TV views increased significantly during that time. We might miss this opportunity because the interval between forecast updates is longer than the event itself.
- Horizon: Forecasts for the near future are more accurate than forecasts for the far future. Say we are forecasting sales for the next 6 months: the forecast will be more accurate for the first few months than for the later ones.
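To see why granularity matters, here is a minimal sketch using made-up data. It assumes 20 hypothetical airline routes, each with about 100 passengers a month and independent random noise, and compares the coefficient of variation (standard deviation divided by mean) of a single route with that of the monthly total. The route count, means, and noise level are all assumptions chosen for illustration, and real routes are rarely fully independent.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

N_ROUTES, N_MONTHS = 20, 24

# Hypothetical data: each route carries about 100 passengers a month,
# with independent random noise (standard deviation 20).
routes = [
    [random.gauss(100, 20) for _ in range(N_MONTHS)]
    for _ in range(N_ROUTES)
]

# Aggregated series: total passengers across all routes for each month.
total = [sum(month) for month in zip(*routes)]


def coefficient_of_variation(series):
    """Standard deviation relative to the mean (a scale-free noise measure)."""
    return statistics.stdev(series) / statistics.mean(series)


cv_route_avg = statistics.mean(coefficient_of_variation(r) for r in routes)
cv_total = coefficient_of_variation(total)

print(f"average single-route CV: {cv_route_avg:.3f}")
print(f"aggregated total CV:     {cv_total:.3f}")
```

The total fluctuates more in absolute terms, but much less relative to its size, which is what makes the aggregated series easier to forecast.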
Time Series Analysis
Time series data is analysed to extract meaningful insight from it. This data exhibits a few characteristics:
- Level: Also known as the baseline value. This is the value to which all other components are added.
- Trend: The long-term upward or downward movement of the series. There may be local fluctuations, but the overall movement is in one direction.
- Seasonality: A pattern in the data that repeats over a fixed time period. For example, sales of mobile phones are higher every year during the Christmas holidays.
- Cyclicity: This is also a recurring pattern, but unlike seasonality it does not have a fixed period (business cycles are a typical example).
- Noise: The completely random fluctuations present in the data. We cannot use this component to forecast into the future.
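To make these components concrete, here is a minimal sketch that builds a synthetic monthly series by adding them together. It assumes an additive model, and every number in it (the level of 200, the slope of 2 per month, the 12-month period, the noise scale) is made up for illustration. Cyclicity is left out to keep the sketch short.

```python
import math
import random

random.seed(1)  # fixed seed so the sketch is repeatable

N_MONTHS = 36
PERIOD = 12  # assumed seasonal period: one year of monthly data

# Level: the baseline value that every observation starts from.
level = 200.0

# Trend: a steady increase of 2 units per month.
trend = [2.0 * t for t in range(N_MONTHS)]

# Seasonality: a pattern that repeats exactly every PERIOD months.
seasonality = [
    30.0 * math.sin(2 * math.pi * t / PERIOD) for t in range(N_MONTHS)
]

# Noise: random fluctuations that carry no forecastable information.
noise = [random.gauss(0, 5) for _ in range(N_MONTHS)]

# Additive model: observation = level + trend + seasonality + noise.
series = [
    level + trend[t] + seasonality[t] + noise[t] for t in range(N_MONTHS)
]

for t in range(0, N_MONTHS, PERIOD):
    print(f"month {t:2d}: value = {series[t]:.1f}")
```

Some series are better described by a multiplicative model, where the components are multiplied instead of added; the additive form is used here only because it matches the description of the level above.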
Handling Missing Values in Time Series Data:
Like any other data, TS data can also have missing values. Below are some of the techniques used for TS missing data imputation.
1. Mean Value Imputation: Fill missing values with the mean of the data. The problem is that mean imputation does not consider the temporal nature of the data.
2. Last value forward: Fill the missing value with the last known observation.
3. Linear Interpolation: Draw a straight line between the last known observation before the gap and the next known observation after it, and fill the missing values along that line.
4. Seasonal and Linear Interpolation: Consider trend and seasonality while doing imputation.
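The first three techniques can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function names and the `temps` data are made up, missing values are represented by `None`, and observations are assumed to be equally spaced in time. Seasonal interpolation is not covered here.

```python
def mean_impute(values):
    """Replace every missing value with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def forward_fill(values):
    """Replace every missing value with the last known observation."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)  # stays None if there is no earlier observation
    return filled


def linear_interpolate(values):
    """Fill each gap along a straight line between its known neighbours."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled  # gaps at the very start or end are left as None


# Hypothetical daily temperatures with three missing readings.
temps = [20.0, None, None, 26.0, 25.0, None, 21.0]

print(mean_impute(temps))
print(forward_fill(temps))
print(linear_interpolate(temps))
```

Notice that mean imputation puts the same value into every gap regardless of where the gap sits in time, while the other two methods use the neighbouring observations. If you work with pandas, `Series.fillna`, `Series.ffill`, and `Series.interpolate` cover the same ground.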
Outlier Detection and Treatment:
Time series data may also contain outliers. These observations need to be detected and treated. Common methods for outlier detection are:
- Extreme value Detection and Removal
- Use the box plot or histogram to identify outliers.
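A box plot flags points that fall far outside the interquartile range (IQR). Here is a minimal sketch of that rule using Python's `statistics.quantiles`. The `items_sold` data is made up, and the multiplier of 1.5 is the usual box plot convention, not something specific to this series. Note that this simple version ignores trend and seasonality, so on a strongly trending series it may be more sensible to apply it to detrended values.

```python
import statistics

# Hypothetical daily sales with two suspicious readings (95 and -4).
items_sold = [12, 15, 14, 13, 16, 15, 14, 95, 13, -4, 15, 14]

# Quartiles of the data (n=4 splits the data into four equal parts).
q1, _median, q3 = statistics.quantiles(items_sold, n=4)
iqr = q3 - q1

# Box plot rule: anything beyond 1.5 * IQR from the quartiles is an outlier.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in items_sold if x < lower_fence or x > upper_fence]

print(f"fences: [{lower_fence}, {upper_fence}]")
print(f"outliers: {outliers}")
```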
There are many strategies for treating outliers. Some are listed below:
- Replace outliers with the mean, median, or mode
- Lower and upper-value capping
- Zero capping: Some observations can never be negative, e.g. items sold, so negative values are capped at zero
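Here is a minimal sketch of the three treatments on a made-up series. The bounds `LOWER` and `UPPER` are assumed values chosen for illustration; in practice they would come from whichever detection method you use.

```python
import statistics

# Hypothetical daily sales with two suspicious readings (95 and -4).
items_sold = [12, 15, 14, 13, 16, 15, 14, 95, 13, -4, 15, 14]

# Assumed bounds for what counts as a normal value (illustrative only).
LOWER, UPPER = 10, 18


def is_outlier(x):
    return x < LOWER or x > UPPER


# 1. Imputation: replace each outlier with the median of the normal values.
inliers = [x for x in items_sold if not is_outlier(x)]
median = statistics.median(inliers)
median_imputed = [median if is_outlier(x) else x for x in items_sold]

# 2. Lower and upper capping: pull extreme values back to the bounds.
capped = [min(max(x, LOWER), UPPER) for x in items_sold]

# 3. Zero capping: items sold can never be negative.
zero_capped = [max(x, 0) for x in items_sold]

print(median_imputed)
print(capped)
print(zero_capped)
```

Zero capping only fixes values that are impossible, so the reading of 95 survives it. Which treatment fits best depends on whether the extreme value is an error or a real event you want the forecast to learn from.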
In this article so far, we have understood what time series data is and how we can perform basic exploratory data analysis (EDA) on it. In the next few blogs in this series, we will look at the topics below:
- Topic 2: Time Series Decomposition and Error measurement
- Topic 3: Basic Forecast methods
- Topic 4: Autoregressive models