The future of the S&P 500 — Forecasts with time series modeling in Alteryx
What do finance, economics, meteorology and polemology have in common? Stock market prices, unemployment rates, temperature developments and dyadic conflicts can all be analyzed with the help of time series. One goal of time series analysis is to use historical developments to forecast the future course of a numerical quantity.
In this blog, I perform a time series analysis using the S&P500 data set. I will start with a visual analysis of the time series. Then I will create an ARIMA and an ETS model with the appropriate tools and compare which model provides more accurate forecasts. After deciding on a model, I will show how to create forecasts.
For my analysis I use the development of the stock market index S&P500. This can be downloaded from the website of the Federal Reserve Bank of St. Louis. See the references at the end of the blog for a more specific citation. I also use Alteryx exclusively for my analysis.
The data set looks like this:
The “SP500” column contains the average monthly value of the S&P 500. In total, data is available for 121 months, from April 2012 to April 2022.
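The row count can be checked with a little date arithmetic. The following is a minimal sketch; the helper name months_inclusive is my own and has nothing to do with Alteryx.

```python
from datetime import date

def months_inclusive(start: date, end: date) -> int:
    """Count calendar months from start to end, including both endpoints."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1

# April 2012 through April 2022, both months included
n_months = months_inclusive(date(2012, 4, 1), date(2022, 4, 1))
print(n_months)  # 121
```

Ten full years contribute 120 months, and counting April 2022 as well gives the 121 rows mentioned above.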
The Time Series Tool Palette from Alteryx
The Time Series Tool Palette from Alteryx is composed of the following tools:
The ARIMA and the ETS tool create the corresponding models, which can be compared using precision measures with the TS Compare Tool. For model estimation, I do not include covariates, so I will not discuss the TS Covariate Forecast Tool. Since the TS Filler tool is used for data preparation and the data is already prepared, we will not discuss this tool in detail either. After estimating and comparing the models, we will use the TS Forecast Tool to create our forecasts for the S&P500. But before we create a model, we should get a visual overview of the time series. The TS Plot Tool helps with that.
Visual analysis of a time series — The TS Plot Tool
In the TS Plot Tool configuration window, we select the target variable and the frequency at which it was measured, and we optionally specify when our time series begins. If we skip the latter, Alteryx labels the time axis in periods instead of dates. Under plot type we can specify which plot is shown in the R output of the tool. All plot types are shown in the I output.
Below, I look at some of these plots and analyze their progress:
The S&P 500 has been rising steadily since 2012. Although there were downward outliers in 2016, 2018 and 2020, a rising trend is clearly visible. Since you can obviously see a trend, the time series is not stationary.
A Seasonality Plot is similar to a Time Series Plot except that in a Seasonality Plot, the S&P 500 is plotted for the individual months. Each year in the data set represents a line. A Seasonality Plot can be used to find seasonal patterns and regularities in the data, as well as to identify years that do not follow this pattern.
We can glean two key pieces of information from this Seasonality Plot:
1. Seasonal patterns
There are many years where seasonal patterns can be seen. Particularly clear are the slight peaks in March, June and August.
2. Years with deviating pattern
We could already see in the Time Series Plot that the years 2016, 2018 and 2020 had downward outliers. Only in the Seasonality Plot can we see that 2016 was a year characterized by growth and that the outlier can be explained by the poor finish in 2015. If you want to learn what caused these fluctuations in 2015, just click here.
The year 2018 was stable until October; you could even say that it followed the seasonal pattern. But then the index fell below the closing value of the previous year. The year 2020 also seemed to follow a stable rising course after a short-term crash from March to May.
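For readers who want to see what a seasonality plot does with the data, here is a minimal sketch of the regrouping step: every year becomes one line of twelve monthly values. The function name seasonal_lines and the sample values are made up for illustration; they are not real S&P 500 figures and not what Alteryx does internally.

```python
from collections import defaultdict

def seasonal_lines(observations):
    """Group (year, month, value) records into one line per year.

    Returns {year: [Jan, Feb, ..., Dec]} with None for months without data.
    """
    lines = defaultdict(lambda: [None] * 12)
    for year, month, value in observations:
        lines[year][month - 1] = value
    return dict(lines)

# Made-up placeholder values, NOT real S&P 500 data
sample = [
    (2020, 11, 100.0),
    (2020, 12, 102.0),
    (2021, 1, 101.0),
    (2021, 2, 105.0),
]
lines = seasonal_lines(sample)
for year in sorted(lines):
    print(year, lines[year])
```

Plotting each of these lists against the months January to December yields one line per year, which is what makes years with a deviating pattern easy to spot.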
The ACF and PACF show an interesting pattern: while the ACF decays slowly and roughly geometrically, the PACF drops abruptly after the first lag to values that are no longer significantly different from 0. The slow decay of the ACF points to a non-stationary time series, and the cut-off in the PACF suggests that an AR(1) model could represent it.
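The link between a geometrically decaying ACF and a PACF that cuts off after lag 1 can be reproduced in a few lines. The sketch below is my own illustration, not Alteryx code: sample_acf computes the autocorrelations of a series, and pacf_from_acf derives the partial autocorrelations with the Durbin-Levinson recursion. As input I use the theoretical ACF of an AR(1) process with an assumed coefficient of 0.8.

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations of x for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]

def pacf_from_acf(rho):
    """Durbin-Levinson recursion. rho[0] must be 1.

    Returns the partial autocorrelations for lags 1..len(rho)-1.
    """
    pacf = []
    prev = []  # coefficients of the AR(k-1) fit
    for k in range(1, len(rho)):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

# Theoretical ACF of an AR(1) process with coefficient 0.8 (assumed value)
phi = 0.8
rho = [phi ** k for k in range(6)]
print([round(r, 3) for r in rho])
print([round(p, 3) for p in pacf_from_acf(rho)])
```

The ACF shrinks by the factor 0.8 at every lag, while the PACF equals 0.8 at lag 1 and is zero afterwards. On real data, sample_acf would supply the input, and the values beyond lag 1 would only be approximately zero.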
The decomposition of the time series into a seasonal component, a trend component and a remainder clearly shows that cyclical and trend phenomena are observable. In addition, events that influence the development of the S&P 500, which fall out of the previous pattern, seem to accumulate from 2018 onward.
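To make the idea of a decomposition concrete, here is a minimal sketch of a classical additive decomposition: a centered moving average estimates the trend, the average detrended value per calendar position estimates the seasonal component, and whatever is left is the remainder. This is a simplified stand-in that I chose for illustration; the TS Plot Tool may use a different method. The function assumes an even period such as 12 and at least two full cycles of data, and the test series is synthetic.

```python
def decompose_additive(x, period=12):
    """Classical additive decomposition for an even period (e.g. 12 months)."""
    n, half = len(x), period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = x[t - half:t + half + 1]  # period + 1 values
        # centered moving average: the two end points get half weight
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period
    # average detrended value for each position in the cycle
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(x[t] - trend[t])
    means = [sum(b) / len(b) for b in buckets]
    centre = sum(means) / period
    seasonal = [m - centre for m in means]  # forced to sum to zero
    remainder = [
        None if trend[t] is None else x[t] - trend[t] - seasonal[t % period]
        for t in range(n)
    ]
    return trend, seasonal, remainder

# Synthetic series: linear trend plus a zero-sum seasonal pattern (made up)
pattern = [3, -1, 2, -2, 1, -3, 0, 1, -1, 2, -2, 0]
series = [10 + 0.5 * t + pattern[t % 12] for t in range(48)]
trend, seasonal, remainder = decompose_additive(series)
print([round(s, 2) for s in seasonal])
```

Because the synthetic series contains nothing but a trend and a seasonal pattern, the remainder is zero here. In real data such as the S&P 500, unusual events show up as large values in the remainder.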
From the visual analysis we can summarize the following conclusions:
· It is a non-stationary time series. This is indicated by the time series plot, the ACF and the PACF.
· The seasonality plot and the time series decomposition plot indicate the influence of seasonal effects.
· The seasonality plot and the time series decomposition plot allow us to infer events that have been accumulating since 2018 and have fallen out of previous patterns, yet still have an impact on the performance of the S&P 500.
After the initial insights and findings from the visual analysis, we can now move on to the creation of our first models.
Modeling — ARIMA and ETS
First, we need to split the dataset into a training dataset and a test dataset. While the training dataset is used to create the model, we need the test dataset to evaluate the created model. For the modeling, I use the prepared data set of the S&P 500 as input.
I split the data set with the filter tool. The training data contains all observations up to and including the year 2021. I summarize all observations that occurred in 2022 under the test data set.
Of the 121 rows in our dataset, the four rows for the months of 2022 (January to April) go into the test dataset, leaving 117 rows for the training dataset. This is a small test data set, but my main point is to show how to set up and test models in Alteryx.
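Outside of Alteryx, the same split is a simple filter on the year. The sketch below rebuilds only the monthly dates, not the index values, and the helper month_starts is my own.

```python
from datetime import date

def month_starts(start, n):
    """First day of n consecutive months, beginning with start's month."""
    out = []
    year, month = start.year, start.month
    for _ in range(n):
        out.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return out

dates = month_starts(date(2012, 4, 1), 121)

# Same condition as in the filter tool: everything up to and including 2021
train = [d for d in dates if d.year <= 2021]
test = [d for d in dates if d.year > 2021]

print(len(train), len(test))  # 117 4
```

The rows that satisfy the condition correspond to the T-output of the filter tool, the remaining rows to the F-output.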
In the next step, we set up the ARIMA model. To do this, we drag the ARIMA tool onto the canvas and connect it to the T-output of the filter tool to pass the training data.
The configuration window of the ARIMA tool has four tabs where you can make changes to the settings: Required parameters, Model customization (optional), Other options, Graphics Options. However, only the “Required parameters” and “Other options” tabs are really important. Model customization is useful if you want to apply constraints to the estimated model or define a model manually. “Graphic options” only provides the option to make graphical adjustments to the output plots.
In “Required parameters” you first give the model a name and select a target variable. Then you select the frequency in which the data are available. In “Other Options”, as in the TS Plot Tool, you define the start of the time series and also the number of periods to be included in the forecast plot.
The ARIMA tool has three output anchors: The O-Output contains the model, which we need later for the TS Compare tool. The R-Output contains the estimated model, estimated coefficients, various information criteria, and accuracy measures to compare the model to other models down the road. The I-Output contains the following dashboard:
On this dashboard, one can interactively select the precision measures, compare the actual data with the values fitted by the model, and see the predicted values together with their confidence interval. In addition, the ACF and the PACF are shown on the right.
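To give a feeling for what an ARIMA model does mechanically, here is a heavily simplified sketch of an ARIMA(1,1,0): the series is differenced once, an AR(1) with intercept is fitted to the differences by ordinary least squares, and the forecasted differences are added back to the last observed level. This is not the estimation routine of the ARIMA tool, and the model order is simply assumed. All function names and the sample numbers are mine.

```python
def fit_ar1(d):
    """Least-squares fit of d[t] = c + phi * d[t-1] + error."""
    x, y = d[:-1], d[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi

def forecast_arima110(series, horizon):
    """Point forecasts from an ARIMA(1,1,0) fitted as described above."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    c, phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(horizon):
        last_diff = c + phi * last_diff  # forecast the next change
        level += last_diff               # integrate back to the level
        forecasts.append(level)
    return forecasts

# Made-up series whose changes follow d[t] = 1 + 0.5 * d[t-1] exactly
series = [100, 104, 107, 109.5, 111.75, 113.875]
print(forecast_arima110(series, 2))
```

Differencing takes care of the trend that we saw in the time series plot, and the AR(1) part reflects the pattern in the ACF and PACF.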
We have created our first time series model. Now we want to add a second model to our workflow. To do this, we drag the ETS tool onto the canvas and connect the T-Output of the Filter tool to the ETS tool.
While the “Required parameters” tab has the same settings as in the ARIMA tool, under “Other options” we can select the information criterion and perform a Box-Cox transformation.
Since the output of the ETS tool is analogous to the output of the ARIMA tool and I will discuss some components of the output in the next section, I will refrain from a detailed analysis of the output here.
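For readers who have not come across the Box-Cox transformation mentioned above: it is a power transformation for positive values that is often used to stabilize the variance of a series before modeling. Below is a minimal sketch of the standard formula and its inverse. The function names are mine, and I make no claim about how the ETS tool chooses the parameter lambda.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a positive value y with parameter lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive values")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam

def inv_box_cox(z, lam):
    """Back-transform a Box-Cox value z to the original scale."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1) ** (1 / lam)

z = box_cox(9.0, 0.5)
print(z, inv_box_cox(z, 0.5))
```

With lambda = 1 the series is only shifted by one, and with lambda = 0 the transformation is the natural logarithm. Forecasts made on the transformed scale have to be transformed back, which is what inv_box_cox does.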
Comparison of two models — The TS Compare Tool
In contrast to the other time series tools, the TS Compare Tool does not require any major settings. If you want to make graphical adjustments to the plots, you can do this in the configuration window.
This tool has two input anchors: The L-Input takes all models that you want to compare. It is important to combine the O-Output of the ARIMA and the ETS tool via a union tool.
The O-Output of the ARIMA and the ETS tool can be thought of as the two two-column tables on the left. The content of the “Object” column represents the model. By connecting it to the Union Tool, a table like the one on the right comes out. This then forms the L-Input of the TS Compare Tool.
To check how good the two models are, we have to compare how the forecasts of the models differ from the actual values. This is where our test data set comes into play! We connect the F-output of our filter tool with the R-input of the TS Compare tool and get the following workflow:
The O-Output contains a table with the precision measures, which allows us to check how accurate the forecasts of both models are. The I-Output gives the forecast plot of both models. Both the content from the O-Output and the content from the I-Output are combined in the R-Output. Let’s take a closer look at the results of the R-Output.
We can clearly see that the absolute values of the accuracy measures of the ARIMA model are all smaller than those of the ETS model. Also, when comparing the predicted values of the two models with the actual values, we can see that the ARIMA model also captures declines, while the ETS model predicts steadily higher values. These indicators suggest that the ARIMA model is a more appropriate model for forecasting the S&P 500 over this period.
Now that we have decided on a model, all tools that are no longer needed can be removed from the workflow. The workflow we use for the forecast is as follows:
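Accuracy measures of this kind are easy to compute by hand. The sketch below shows five common ones (ME, RMSE, MAE, MPE, MAPE) with the error defined as actual minus forecast. Which measures the TS Compare Tool reports exactly is not something I reproduce here, and the numbers are made up.

```python
import math

def accuracy(actual, forecast):
    """Common forecast accuracy measures, with error = actual - forecast."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    return {
        "ME": sum(errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MPE": 100 * sum(e / a for e, a in zip(errors, actual)) / n,
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }

# Made-up numbers, NOT the values from the workflow
actual = [100, 200, 400, 500]
forecast = [110, 190, 400, 520]
for name, value in accuracy(actual, forecast).items():
    print(name, round(value, 3))
```

ME and MPE can be negative because positive and negative errors cancel out, which is why the comparison above looks at absolute values. For all five measures, values closer to zero mean more accurate forecasts.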
Forecast — TS Forecast Tool
To create a forecast, we add the TS Forecast Tool to the workflow and connect the O-Output of the ARIMA Tool to the Input-anchor of the TS Forecast Tool.
First, we give the forecast a name and then determine two confidence intervals, which are forecasted in addition to the point estimate. Finally, the number of periods to be forecast is selected.
The O- and I-output are again combined in the R-output. The following values are the forecasts of the ARIMA model for the 7 periods following the training data set.
These values give us a concrete picture of how the S&P 500 could develop. Furthermore, we could compare the first four periods with the values of the test data set and check whether the actual values fall within the forecasted confidence intervals.
The darker area shows the 95% confidence interval forecast, while the light gray area shows the 80% confidence interval.
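How the two intervals relate to each other can be sketched under the assumption of normally distributed forecast errors: the interval is the point forecast plus and minus a multiple of the forecast standard error, and the multiple grows with the confidence level. The function name and the numbers are made up, and I am not claiming that the TS Forecast Tool computes its intervals in exactly this way.

```python
from statistics import NormalDist

def interval(point, std_error, level):
    """Symmetric interval around a point forecast, assuming normal errors."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return point - z * std_error, point + z * std_error

# Made-up point forecast and standard error
low80, high80 = interval(100.0, 10.0, 0.80)
low95, high95 = interval(100.0, 10.0, 0.95)
print(round(low80, 2), round(high80, 2))
print(round(low95, 2), round(high95, 2))
```

The 95% interval always encloses the 80% interval. The further a forecast reaches into the future, the larger the standard error usually becomes, so both bands widen with the forecast horizon.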
I hope I was able to give you a good insight into time series analysis with Alteryx. If you liked my blog, feel free to subscribe. If you have ideas for more blogs around data analytics or feedback, feel free to write me on LinkedIn.
S&P Dow Jones Indices LLC, S&P 500 [SP500], retrieved from FRED, Federal Reserve Bank of St. Louis; https://fred.stlouisfed.org/series/SP500, April 21, 2022.