Introduction to feature engineering for time series forecasting
By Francesca Lazzeri. This article is an extract from the book Machine Learning for Time Series Forecasting with Python, also by Lazzeri, published by Wiley.
Applying Python packages and Machine Learning to accelerate forecasts enables the scalability, performance, and accuracy of intelligent solutions that can improve business operations. At the same time, building Machine Learning (ML) models is often time-consuming and complex, with many factors to consider, such as iterating through algorithms, tuning ML hyperparameters, and applying feature engineering techniques. These options multiply further with time series data, because data scientists must also consider additional factors such as trends, seasonality, holidays, and external economic variables.
Each ML algorithm expects data as input that must be formatted in a specific way, and so time series datasets generally require some cleaning and feature engineering processes before they can generate useful insights. Time series datasets may have values that are missing or may contain outliers, hence the essential need for the data preparation and cleaning phase. Good time series data preparation produces clean and well-curated data, which leads to more practical and accurate predictions.
Data preparation is the practice of transforming raw data so that data scientists can run it through ML algorithms to discover insights and, eventually, make predictions. Additionally, because time series data has a temporal property, only specific statistical methodologies are appropriate for data prepared in this way. In this article, I walk you through the most important steps to prepare your time series data for forecasting models.
When working with time series, data scientists must construct the output of their model by identifying the variable that they need to predict at a future date (e.g., the number of sales next Monday) and then leverage historical data and feature engineering to create the input variables used to make predictions for that future date. Feature engineering efforts have two main goals:
- Creating the correct input dataset to feed the ML algorithm: In this case, the purpose of feature engineering in time series forecasting is to create input features from historical raw data and shape the dataset as a supervised learning problem.
- Increasing the performance of ML models: The second goal of feature engineering is to generate valid relationships between input features and the output feature or target variable to be predicted. In this way, the performance of ML models can be improved.
In the sections to follow, I cover four categories of time features that are extremely helpful in time series scenarios:
- Date time features
- Lag features and window features
- Rolling window statistics
- Expanding window statistics
I discuss each of these time features in more detail and illustrate them with real-world examples in Python.
Date time features
Date time features are features created from the time stamp value of each observation. A few examples are the integer hour, month, and day of week for each observation. Using pandas, data scientists can extract this information from the time stamp of each observation and add it to the original dataset as new columns (hour, month, and day of week).
Below is some sample Python code to perform this with the ts_data set:
# extract date time features from the DatetimeIndex of ts_data
ts_data['hour'] = ts_data.index.hour
ts_data['month'] = ts_data.index.month
# dayofweek runs from 0 (Monday) to 6 (Sunday)
ts_data['dayofweek'] = ts_data.index.dayofweek
print(ts_data.head(5))
Running this example prints the first five rows of the transformed dataset:
                     load      temp   hour  month  dayofweek
2012-01-01 00:00:00  2,698.00  32.00  0     1      6
2012-01-01 01:00:00  2,558.00  32.67  1     1      6
2012-01-01 02:00:00  2,444.00  30.00  2     1      6
2012-01-01 03:00:00  2,402.00  31.00  3     1      6
2012-01-01 04:00:00  2,403.00  32.00  4     1      6
By leveraging this additional knowledge, such as hour, month, and day of week values, data scientists can gain additional insights on their data and on the relationship between the input features and the output feature and eventually build a better model for their time series forecasting solutions. Here are some other examples of features that can be built and generate additional and important information:
- Weekend or not
- Minutes in a day
- Daylight savings or not
- Public holiday or not
- Quarter of the year
- Hour of day
- Season of the year
As you can observe from the examples above, date time features are not limited to integer values. Data scientists can also build binary features, such as a feature that equals 1 if the time stamp falls before business hours and 0 if it falls after business hours. Finally, when dealing with time series data, it is important to remember all the date and time properties that you can access from Timestamp or DatetimeIndex.
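A minimal sketch of a few of the features listed above, built from DatetimeIndex properties. The index is hypothetical (the ts_data set is not reproduced here), and the business-hours window of 09:00 to 16:59 is an assumption chosen only for illustration.

```python
import pandas as pd

# Hypothetical hourly index covering a Friday morning through a Monday morning.
index = pd.date_range("2012-01-06 06:00", periods=72, freq="60min")
features = pd.DataFrame(index=index)

# Binary feature: 1 on Saturday (5) or Sunday (6), otherwise 0.
features["is_weekend"] = (index.dayofweek >= 5).astype(int)

# Integer features taken directly from DatetimeIndex properties.
features["quarter"] = index.quarter
features["minute_of_day"] = index.hour * 60 + index.minute

# Binary feature: 1 inside the assumed business hours (09:00 to 16:59), otherwise 0.
features["is_business_hour"] = ((index.hour >= 9) & (index.hour < 17)).astype(int)

print(features.head())
```

Features such as public holidays need an external calendar, so they are left out of this sketch.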
Date time features represent a useful way for data scientists to start their feature engineering work with time series data. In the next section, I introduce an additional approach to build input features for your dataset: lag and window features. In order to build these features, data scientists must leverage and extract the values of a series in previous or future periods.
Lag features and window features
Lag features are values at prior timesteps that are considered useful because they are created on the assumption that what happened in the past can influence or contain a sort of intrinsic information about the future. For example, it can be beneficial to generate features for sales that happened in previous days at 4:00 p.m. if you want to predict similar sales at 4:00 p.m. the next day.
An interesting category of lag features is called nested lag features. In order to create nested lag features, data scientists must identify a fixed time period in the past and then group feature values by that time period — for example, the number of items sold in the previous two hours, previous three days, and previous week.
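One possible reading of nested lag features, sketched on a hypothetical hourly sales series (one item sold every hour, so the totals are easy to check by eye). The column names and the use of fixed row counts for "two hours", "three days", and "one week" are assumptions that hold only for regularly spaced hourly data.

```python
import pandas as pd

# Hypothetical hourly sales counts: one item sold every hour for ten days.
index = pd.date_range("2012-01-01", periods=240, freq="60min")
sales = pd.Series(1, index=index, name="items_sold")

# Move values one step forward in time so each feature uses only past hours.
past = sales.shift(1)

nested = pd.DataFrame({
    "sold_prev_2h": past.rolling(window=2).sum(),
    "sold_prev_3d": past.rolling(window=3 * 24).sum(),
    "sold_prev_1w": past.rolling(window=7 * 24).sum(),
})

# Rows without a full week of history contain NaN and are dropped here.
print(nested.dropna().head())
```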
The pandas library provides the shift() function to help create these shifted or lag features from a time series data set: This function shifts an index by the desired number of periods with an optional time frequency. The shift method accepts a freq argument, which can be a DateOffset object, a timedelta-like object, or an offset alias.
Offset aliases are an important concept that data scientists can leverage when they deal with time series data: They are string aliases given to useful common time series frequencies, as summarized in Table 1.
The operation of adding lag features is called the sliding window method or window features. The example above shows how to apply a sliding window method with a window width of eight. Window features are a summary of values over a fixed window of prior timesteps.
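A short sketch of the two ways shift() behaves, using a hypothetical four-day series with made-up values. Without freq, the values move and the index stays fixed, which is what a lag feature needs. With freq, the index itself moves and the values stay attached to their rows.

```python
import pandas as pd

# Hypothetical daily series; the values are made up for illustration.
index = pd.date_range("2012-01-01", periods=4, freq="D")
load = pd.Series([10.0, 20.0, 30.0, 40.0], index=index, name="load")

# periods only: values move down one row, so the first row becomes NaN.
lag_1 = load.shift(1)

# periods plus an offset alias: the index moves forward by one day instead.
shifted_alias = load.shift(1, freq="D")

# A timedelta-like object is accepted for freq as well.
shifted_timedelta = load.shift(1, freq=pd.Timedelta(days=1))

print(pd.concat([load, lag_1.rename("load_t-1")], axis=1))
print(shifted_alias)
```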
Depending on your time series scenario, you can expand the window width and include more lagged features. A common question that data scientists ask before performing the operation of adding lag features is how large to make the window. A good approach would be to build a series of different window widths and alternatively add and remove them from the dataset to see which one has a more evident positive effect on model performance.
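A minimal sketch of the sliding window method with a configurable width, assuming a regularly spaced series. The helper add_lag_features and its column naming are hypothetical, and the series values are placeholders rather than the ts_data set.

```python
import pandas as pd

def add_lag_features(series, width):
    """Return a frame with `width` lagged copies of `series` plus the target."""
    columns = {
        f"{series.name}_t-{lag}": series.shift(lag)
        for lag in range(width, 0, -1)
    }
    columns[f"{series.name}_t"] = series
    return pd.DataFrame(columns)

# Hypothetical hourly series with placeholder values 0.0, 1.0, ..., 11.0.
index = pd.date_range("2012-01-01", periods=12, freq="60min")
load = pd.Series(range(12), index=index, name="load", dtype="float64")

# The first `width` rows lack a full history and are dropped.
windowed = add_lag_features(load, width=3).dropna()
print(windowed.head())
```

Calling the helper with several values of width gives the candidate datasets that can then be compared on model performance.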
Understanding the sliding window method is very helpful for building an additional type of feature called rolling window statistics, which I discuss in the next section.
Rolling window statistics
The main goal of building and using rolling window statistics in a time series dataset is to compute statistics on the values from a given data sample by defining a range that includes the sample itself as well as some specified number of samples before and after the sample used.
A crucial step when data scientists need to compute rolling statistics is to define a rolling window of observations: At each time point, data scientists must obtain the observations in the rolling window and use them to compute the statistic they have decided to use. In the second step, they must move on to the next time point and repeat the same computation on the next window of observations.
One of the more popular rolling statistics is the moving average. This takes a moving window of time and calculates the average or the mean of that time period as the current value. The pandas library provides a rolling() function for rolling window calculations, which creates a new data structure with the window of values at each timestep. It is then possible to perform statistical functions on the window of values collected for each timestep, such as calculating the mean.
Data scientists can use the concat() function in pandas to construct a new data set with only new columns. This function concatenates pandas objects along a particular axis with an optional set logic along the other axes. It can also add a layer of hierarchical indexing on the concatenation axis, which may be useful if the labels are the same (or overlapping) on the passed axis number.
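A minimal sketch that combines rolling() and concat(), using the first six hourly load values that appear in this article's outputs. The window width of three and the shift(1) step are assumptions: shifting first keeps the current value out of its own window, a common choice when the statistics are used as inputs to a forecast.

```python
import pandas as pd

# First six hourly load values taken from the outputs shown in this article.
index = pd.date_range("2012-01-01", periods=6, freq="60min")
load = pd.DataFrame(
    {"load": [2698.0, 2558.0, 2444.0, 2402.0, 2403.0, 2453.0]},
    index=index,
)

# Shift by one step so the window at time t covers t-3 through t-1 only.
window = load.shift(1).rolling(window=3)

# Assemble the rolling statistics and the current load into one new dataset.
rolled = pd.concat([window.min(), window.mean(), window.max(), load], axis=1)
rolled.columns = ["min", "mean", "max", "load"]
print(rolled)
```

The first three rows contain NaN because a full window of three prior values is not yet available.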
Another type of window feature that may be useful in time series forecasting scenarios is the expanding window statistics feature, which includes all previous data in the series. I discuss and show how to build it in the next section.
Expanding window statistics
Expanding window statistics consist of features that include all previous data. The pandas library offers an expanding() function that provides expanding transformations and assembles sets of all prior values for each timestep. In pandas, the rolling() and expanding() functions share the same interface and capabilities.
Below is an example of calculating the minimum, mean, and maximum values of the expanding window on the ts_data set. The ts_data set is an hourly energy demand data set that consists of three years of hourly electricity load and temperature values between 2012 and 2014.
# create expanding window features
from pandas import concat
load_val = ts_data[['load']]
window = load_val.expanding()
new_dataframe = concat([window.min(), window.mean(), window.max(), load_val.shift(-1)], axis=1)
new_dataframe.columns = ['min', 'mean', 'max', 'load+1']
print(new_dataframe.head(10))
Running the example prints the first ten rows of the new dataset with the additional expanding window features:
min mean max load+1
2012-01-01 00:00:00 2,698.00 2,698.00 2,698.00 2,558.00
2012-01-01 01:00:00 2,558.00 2,628.00 2,698.00 2,444.00
2012-01-01 02:00:00 2,444.00 2,566.67 2,698.00 2,402.00
2012-01-01 03:00:00 2,402.00 2,525.50 2,698.00 2,403.00
2012-01-01 04:00:00 2,402.00 2,501.00 2,698.00 2,453.00
2012-01-01 05:00:00 2,402.00 2,493.00 2,698.00 2,560.00
2012-01-01 06:00:00 2,402.00 2,502.57 2,698.00 2,719.00
2012-01-01 07:00:00 2,402.00 2,529.62 2,719.00 2,916.00
2012-01-01 08:00:00 2,402.00 2,572.56 2,916.00 3,105.00
2012-01-01 09:00:00 2,402.00 2,625.80 3,105.00 3,174.00
In this article, I have shown how to use feature engineering to transform a time series dataset into a supervised learning dataset for use with Machine Learning and to improve the performance of ML models. Feature engineering is the process of using historical raw data to create additional variables and features for the final dataset used for training a model.
In my next article, “Automated Machine Learning for time series forecasting,” I demonstrate a suite of automated methods for time series forecasting that you can test on cloud-based time series solutions.
- Francesca Lazzeri, Machine Learning for Time Series Forecasting with Python, Wiley, December 2020.
- Francesca Lazzeri, Automated Machine Learning for time series forecasting, Data Science at Microsoft on Medium, October 2021.
- Francesca Lazzeri, Python open source libraries for scaling time series forecasting solutions, Data Science at Microsoft on Medium, November 2021.