Interpreting an ACF or Auto-correlation Plot

Dipanwita Mallick
Published in Analytics Vidhya · 4 min read · Nov 25, 2020

A time series is linearly related to a lagged version of itself.


What is an ACF plot?

A time series is a sequence of measurements of the same variable(s) made over time. Usually, the measurements are made at evenly spaced times, for example monthly or yearly. The correlation between values of the series at different points in time, viewed as a function of the lag between them, is called the autocorrelation function (ACF). In other words,

>Autocorrelation represents the degree of similarity between a given time series and a lagged version of itself over successive time intervals.

>Autocorrelation measures the relationship between a variable’s current value and its past values.

>An autocorrelation of +1 represents a perfect positive correlation, while an autocorrelation of -1 represents a perfect negative correlation.
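
To make the definition concrete, here is a minimal sketch of how the sample autocorrelation at a given lag is typically computed (a generic illustration, not code from this article):

import numpy as np

def sample_acf(y, k):
    # sample autocorrelation at lag k: covariance between the series and its
    # lag-k shifted copy, normalized by the overall variance of the series
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    if k == 0:
        return 1.0
    return np.sum(y[k:] * y[:-k]) / np.sum(y ** 2)

In practice you rarely compute this by hand; statsmodels does it for you, as we will see below.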

Why is it useful?

  1. It helps us uncover hidden patterns in our data and select the correct forecasting methods.
  2. It helps identify seasonality in our time series data.
  3. Analyzing the autocorrelation function (ACF) and the partial autocorrelation function (PACF) together is necessary for selecting an appropriate ARIMA model for time series prediction (see the sketch right after this list).
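
To give a rough feel for point 3, the ACF and PACF are usually inspected side by side. Below is a minimal sketch, assuming the series of interest is already stored in a pandas Series called series (a placeholder name, not something defined in this article):

import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# 'series' is a placeholder for any (stationary) series you want to fit an ARIMA model to
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 6))
plot_acf(series, lags=24, ax=ax1)    # the ACF pattern hints at the MA order
plot_pacf(series, lags=24, ax=ax2)   # the PACF pattern hints at the AR order
plt.show()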

Does the ACF make any assumptions?

Weak stationarity, meaning there is no systematic change in the mean (no trend), no systematic change in the variance, and no systematic fluctuations.

So before computing the ACF it is advisable to remove any trend present in the data and make sure the series is stationary.
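
One common way to check this assumption (not something this article relies on, just a suggestion) is the augmented Dickey-Fuller test from statsmodels; series below is again a placeholder name:

from statsmodels.tsa.stattools import adfuller

# null hypothesis of the ADF test: the series has a unit root, i.e. is non-stationary
adf_stat, p_value = adfuller(series.dropna())[:2]
print("ADF statistic:", adf_stat, "p-value:", p_value)
# a small p-value (say < 0.05) suggests stationarity; otherwise difference or detrend first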

Want to try it out on a real dataset?

import pandas as pd
import matplotlib.pyplot as plt

# parse the first column as dates and use the 'Time' column as the index
# (parse_dates handles the dates, so no custom date_parser is needed)
data = pd.read_csv('data.csv',
                   engine='python',
                   parse_dates=[0],
                   index_col='Time')

# keep only the observations from 2008 onwards
st_date = pd.to_datetime("2008-01-01")
data = data[st_date:]
data.head()

The plot of the data looks like this:

[Figure: the raw series plotted from 2008 to 2018]

Now, before computing the ACF, let's remove the trend and see how the series looks:

# ACF preparation -> remove the trend by differencing
data["diff"] = data.diff()

# plot the original and differenced series, with a dashed marker at every January
ax = data.plot()
ax.legend(ncol=5,
          loc='upper center',
          bbox_to_anchor=(0.5, 1.0),
          bbox_transform=plt.gcf().transFigure)
for yr in range(2008, 2018):
    ax.axvline(pd.to_datetime(str(yr) + "-01-01"), color="red", linestyle="--", alpha=0.2)
The trend is removed using the diff() method from pandas, which computes the difference between the current month's value and the previous month's value.
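
As a tiny toy example of what diff() does (made-up numbers, not the article's data):

import pandas as pd

s = pd.Series([10, 12, 15, 14])
print(s.diff().tolist())   # [nan, 2.0, 3.0, -1.0]: each value minus the one before it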

Now let’s apply the ACF:

from statsmodels.graphics.tsaplots import plot_acf

# the first value of the differenced column is NaN; set it to 0 so plot_acf does not fail
data.loc[data.index[0], "diff"] = 0
plot_acf(data["diff"])
plt.show()
[Figure: ACF plot; x axis is the lag in months, y axis is the correlation coefficient]

Can you see the seasonality present?

Notice how the coefficient is high at lags 3, 6, 9, and 12. In terms of months, that means high positive correlations for March, June, September, and December, whereas January, February, and April show negative correlations, which also fade as the lag increases. We focus on the points that lie outside the blue shaded region, as those are the statistically significant ones.
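
If you prefer numbers to eyeballing the shaded region, the coefficients and their confidence intervals can also be pulled out directly; a sketch using statsmodels on the same differenced column:

from statsmodels.tsa.stattools import acf

# autocorrelation coefficients for lags 0 to 12, plus 95% confidence intervals
coeffs, conf_int = acf(data["diff"], nlags=12, alpha=0.05)
for lag, (c, (lo, hi)) in enumerate(zip(coeffs, conf_int)):
    print(f"lag {lag:2d}: acf={c:+.2f}, 95% CI=({lo:+.2f}, {hi:+.2f})")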

Important note: make sure your data doesn’t have NA values, otherwise the ACF will fail.

Can we look at the trend and seasonality separately to dive deep into the data?

Yes, let's decompose the data. I am going to use the statsmodels API for this purpose, but one can use NumPy and pandas as well to decompose a time series into its three parts: trend, seasonality, and residual.

from statsmodels.tsa.seasonal import seasonal_decompose

# decompose the original series (the first column, not the differenced one)
res = seasonal_decompose(data.iloc[:, 0], model="additive", period=30)

fig, (ax1, ax2, ax3) = plt.subplots(3, 1, figsize=(15, 8))
res.trend.plot(ax=ax1, ylabel="trend")
res.seasonal.plot(ax=ax2, ylabel="seasonality")
res.resid.plot(ax=ax3, ylabel="residual")
plt.show()

Notice how I chose additive instead of multiplicative since there is no exponential increase in the amplitudes over time.
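
If the seasonal swings did grow with the level of the series, the multiplicative variant would be the natural choice instead; shown here only for illustration (it requires strictly positive values):

# multiplicative decomposition: the seasonal effect scales with the level of the series
res_mult = seasonal_decompose(data.iloc[:, 0], model="multiplicative", period=30)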

Now, if I run the same ACF plot on the res.seasonal component generated by the API, we will get the same coefficients as before.
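
For completeness, that check is just one more call, reusing the plot_acf import from earlier:

# ACF of the seasonal component extracted above
plot_acf(res.seasonal)
plt.show()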

