Interpreting ACF or Auto-correlation plot
A time series can be linearly related to a lagged version of itself.
What is an ACF plot?
A time series is a sequence of measurements of the same variable(s) made over time, usually at evenly spaced intervals (for example, monthly or yearly). The correlation coefficient between values of the series that are k steps apart is called the autocorrelation at lag k, and the autocorrelation function (ACF) collects these coefficients over a range of lags. In other words,
>Autocorrelation represents the degree of similarity between a given time series and a lagged version of itself over successive time intervals.
>Autocorrelation measures the relationship between a variable’s current value and its past values.
>An autocorrelation of +1 represents a perfect positive correlation, while an autocorrelation of −1 represents a perfect negative correlation.
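As a quick illustration (a toy series, not this article's dataset), pandas can compute the lag-k autocorrelation directly with `Series.autocorr`:

```python
import numpy as np
import pandas as pd

# toy monthly-style series with a 12-step cycle plus a little noise
rng = np.random.default_rng(0)
t = np.arange(120)
s = pd.Series(np.sin(2 * np.pi * t / 12) + 0.1 * rng.standard_normal(t.size))

print(s.autocorr(lag=12))  # strongly positive: the series repeats every 12 steps
print(s.autocorr(lag=6))   # strongly negative: half a cycle out of phase
```

The signs match the intuition above: at a lag equal to the cycle length the series lines up with itself, and at half the cycle length it is mirrored.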
Why is it useful?
- Helps us uncover hidden patterns in the data and select an appropriate forecasting method.
- Helps identify seasonality in our time series data.
- Analyzing the autocorrelation function (ACF) and the partial autocorrelation function (PACF) together is necessary for selecting an appropriate ARIMA model for time series forecasting.
Does the ACF make any assumptions?
Weak stationarity, meaning no systematic change in the mean or variance and no systematic fluctuations.
So before computing the ACF it is advisable to remove any trend present in the data and to make sure the series is stationary.
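First-differencing (which `diff()` does later in this post) is one common way to remove a trend. A minimal sketch on synthetic data:

```python
import numpy as np
import pandas as pd

# series with a linear upward trend: its mean changes systematically,
# so it is not stationary
rng = np.random.default_rng(2)
y = pd.Series(np.arange(200, dtype=float) + rng.standard_normal(200))

# first difference: y_t - y_{t-1}
diffed = y.diff().dropna()

# the original halves have very different means; the differenced
# series hovers around a constant mean (roughly the slope, 1)
print(y.iloc[:100].mean(), y.iloc[100:].mean())
print(diffed.iloc[:100].mean(), diffed.iloc[100:].mean())
```

After differencing, the mean no longer drifts with time, which is the kind of stationarity the ACF assumes.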
Want to try it out on a real dataset?
import pandas as pd
import matplotlib.pyplot as plt

def parser(x):
    # custom date parser for the file's timestamp format; adjust as needed
    return pd.to_datetime(x)

data = pd.read_csv('data.csv',
                   engine='python',
                   parse_dates=[0],
                   index_col='Time',
                   date_parser=parser)

# keep only observations from 2008 onwards
st_date = pd.to_datetime("2008-01-01")
data = data[st_date:]
The plot of the data looks like this:
Now, before computing the ACF, let's remove the trend and see how the series looks:
# ACF prep: remove the trend by first-differencing the value column
data["diff"] = data.iloc[:, 0].diff()

ax = data.plot()
ax.legend(ncol=5,
          loc='upper center',
          bbox_to_anchor=(0.5, 1.0),
          bbox_transform=plt.gcf().transFigure)

# mark each year boundary for reference
for yr in range(2008, 2018):
    ax.axvline(pd.to_datetime(str(yr) + "-01-01"), color="red", linestyle="--", alpha=0.2)
Now let’s apply the ACF:
from statsmodels.graphics.tsaplots import plot_acf

# differencing leaves a NaN in the first row; replace it with 0
data.loc[data.index[0], "diff"] = 0

plot_acf(data["diff"])
plt.show()
Can you see the seasonality present?
Notice how the coefficient is high at lags 3, 6, 9, and 12. In monthly terms, March, June, September, and December show high positive correlations, while January, February, and April show negative correlations that fade at larger lags. We focus on the points that lie outside the blue shaded region, as those are statistically significant.
Important note: make sure your data contains no NA values, otherwise the ACF computation will fail.
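Differencing is one way such NAs sneak in: `diff()` always leaves a NaN in the first position, which is why the snippet above sets it to 0 (dropping it works just as well). A tiny illustration:

```python
import pandas as pd

s = pd.Series([10.0, 12.0, 11.0, 13.0])
d = s.diff()
print(d.isna().sum())        # 1 (the first element has no predecessor)
print(d.dropna().tolist())   # [2.0, -1.0, 2.0]
print(d.fillna(0).tolist())  # [0.0, 2.0, -1.0, 2.0]
```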
Can we look at the trend and seasonality separately to dive deep into the data?
Yes, let's decompose the data. I am going to use the statsmodels API for this, but one can use NumPy and pandas as well to separate the three parts of a time series: trend, seasonality, and residual.
from statsmodels.tsa.seasonal import seasonal_decompose

# decompose the original value column (not the diff column)
# into trend + seasonal + residual components
res = seasonal_decompose(data.iloc[:, 0], model="additive", period=30)

fig, (ax1, ax2, ax3) = plt.subplots(3, 1, figsize=(15, 8))
res.trend.plot(ax=ax1, ylabel="trend")
res.seasonal.plot(ax=ax2, ylabel="seasonality")
res.resid.plot(ax=ax3, ylabel="residual")
plt.show()
Notice how I chose additive instead of multiplicative, since the amplitude of the seasonal swings does not grow over time.
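When the seasonal amplitude does grow with the level, a common trick is to take logarithms first: a multiplicative structure (trend × seasonal) becomes additive, log(trend) + log(seasonal). A toy sketch:

```python
import numpy as np

# multiplicative toy series: the seasonal swing scales with the trend level
t = np.arange(1.0, 121.0)
y = t * (1.0 + 0.3 * np.sin(2 * np.pi * t / 12))

# on the log scale the seasonal part has constant amplitude
seasonal_log = np.log(y) - np.log(t)

# the raw swing grows over time, but the log-scale swing stays the same
print(y[:24].std(), y[-24:].std())
print(seasonal_log[:24].std(), seasonal_log[-24:].std())
```

If a log transform (or a similar variance-stabilizing transform) flattens the amplitude like this, the additive model becomes appropriate again.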
Now, if we run the same ACF plot on the res.seasonal component returned by the API, we get the same coefficients as before.
I hope this is helpful. Time series analysis can be confusing and time-consuming, so it's imperative to have the fundamental concepts clear. I myself am still learning, so before you go, do leave a comment or your valuable feedback. :)