Sample size and time series models — A case study on ARIMA() processes.

Konrad Hoppe
Nerd For Tech
Published in
6 min readApr 24, 2021


Goals and contents

ARIMA timeseries models are often taught in econometrics courses as part of the regular business science curriculum and are thus put to use by sometimes inexperienced data scientists.

The intention of this case study is to understand the data generating process behind simple MA(1) models and illustrate weakness of the estimators at small sample sizes.


For the tested MA(1) model with coefficient beta=0.3, a time series length of at least 5000 observations is necessary to reach a narrower confidence interval.

The impact on the goodness of forecasts is evaluated and depends critically on the estimated coefficient.

The case

Install some libraries first:

#for confidence intervals and testing
#for plotting

Now generate some data from the process. This is better than using real-word data since we know in this laboratory setting what we are aiming at: 50 observations, simple MA(1) model with coefficient β=0.3:

n <- 50
beta <- 0.3
eps <- rnorm(n)
x <- vector(length = n)
x[1] <- eps[1]
for(i in 2:n){
x[i] <- eps[i]+beta*eps[i-1]

Now, lets do the fit and see whether the coefficient that was used to generate the data, is uncovered:

#do the fitting
fit <- arima(x,order=c(0,0,1),include.mean=FALSE)
#confidence interval around it:
confint(fit, parm=c("ma1"), level=0.95)

The output is for the coefficient test is:

z test of coefficients:Estimate Std. Error z value Pr(>|z|)
ma1 0.22259 0.15500 1.436 0.151

and the confidence interval:

2.5 %    97.5 %
ma1 -0.08121368 0.5263858

The confidence interval is rather wide after 50 observations and the estimator slightly off. Let us investigate how fast this improves: How long does the series need to be to wrap the confidence interval tight around the input parameter β=0.3.

# generate a grid for different process lengths:
n.grid <- round(1.3^(2:40))
n.length <- length(n.grid)
#reserve some space for the results
uppers <- vector(length = n.length)
lowers <- vector(length = n.length)
estimates <- vector(length = n.length)
counter <- 1
#iterate through the different process lengths and save the estimator
#as well as the confidence interval boundaries:
for(n in n.grid){
#generate the data:
eps <- rnorm(n)
x <- vector(length = n)
x[1] <- eps[1]
for(i in 2:n){
x[i] <- eps[i]+beta*eps[i-1]
fit <- arima(x,order=c(0,0,1),include.mean = FALSE)
conf <- confint(fit, parm=c("ma1"), level=0.95)
estimates[counter] <- fit$coef[1]
uppers[counter] <- conf[2]
lowers[counter] <- conf[1]

counter <- counter+1

Plotting now the development of the estimator for different timeseries lengths shows:

Illustration of estimator development for different time series lengths

Curiously, the “true” value is not even reached for large number of observations. Let’s see whether this is spurious or whether there is an estimation bias by averaging over a number of estimates with the same number of observations:

n <- 5000
n.reps <- 2000
coeffs <- vector(length=n.reps)
for(j in 1:n.reps){
#generate the data:
eps <- rnorm(n)
x <- vector(length = n)
x[1] <- eps[1]
for(i in 2:n){
x[i] <- eps[i]+beta*eps[i-1]
fit <- arima(x,order=c(0,0,1), include.mean = FALSE)
coeffs[j] <- fit$coef[1]

with output:

> summary(coeffs)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.2471 0.2909 0.3002 0.3003 0.3095 0.3441

So as expected, estimator is unbiased for large number of repititions.


Does it matter that the the number of observations need to be large to get sufficiently tight confidence intervals? What is the difference of a coefficient of β=0.3 and β=0.35, since it anyway refers to unobserved random shocks in the past?

Let us investigate the forecasting ability under these circumstances. Generate a 50,000 long time series, fit an MA(1) process to it and create a 5 step forecast:

n <- 50000
eps <- rnorm(n)
x <- vector(length = n)
x[1] <- eps[1]
for(i in 2:n){
x[i] <- eps[i]+beta*eps[i-1]
#fit the MA(1) model to the process:
fit <- arima(x,order=c(0,0,1),include.mean=FALSE)
#generate the 5 step prediction:
predict(fit, n.ahead=5)


Time Series:
Start = 50001
End = 50005
Frequency = 1
[1] 0.1125964 0.0000000 0.0000000 0.0000000 0.0000000
Time Series:
Start = 50001
End = 50005
Frequency = 1
[1] 0.9961955 1.0415099 1.0415099 1.0415099 1.0415099

Excursus: Forecasting MA models

It’s curious that the forecast of aboves model breaks off after one period. Let us quickly investigate what is going on here.

The MA(1) model in its simplest form as we are using it, is given by

Thus, the one period forecast is given by

All forecasts are conditional on the past

Basic assumption of the MA(p) process is that the error terms are iid. with


So the remaining term in above equation,

can be calculated recursively. Therefore, rewrite the process description Xt using Lag-operator notation(

and thus

Notice now that the series expansion of 1/(1−x)=1+x+x2+x3+⋯ can be transformed by replacing x with −x to

thus, we can calculate ϵt|It:

Since we know the series of {Xt}, we can also calcuate the unobserved series of {ϵt}. Notice further that an MA(1) process can only be forecasted one step, since

This is because the random error terms are independently distributed.

Let us come back to the starting point. The reason we went down this excursus was to understand whether the predictor quality is actually an issue in forecasting MA(1) models. In our concrete case, the difference between β=0.3 and β=0.35 can be calculated:

forecast.errors <- vector(length=1000)
#construct the vector of betas
beta03.vec <- (-0.3)^(0:(n-1))
beta035.vec <- (-0.35)^(0:(n-1))
for(i in 1:1000){
#one step forecast (as described above):
X03 <- sum(beta03.vec*rev(x))*0.3
X035 <- sum(beta035.vec*rev(x))*0.35
#percentage error:
forecast.errors[i] <- (X035/X03-1)^2

The result is 0.32 — So the forecast deviates by ∼32% (RMSE) when the β is slightly differently estimated.


  • MA(p) models can only be forecasted for p periods.
  • Although only one parameter is estimated, these models need sufficient sample sizes for narrow confidence intervals
  • It is important to review confidence intervals width and investigate the impact on forecast quality
  • For classical operations research tasks, there are better fitting models: often volume curves can be predicted better by investigating the underlying drivers of the curve, rather than following a purely univariate time series approach.



Konrad Hoppe
Nerd For Tech

applied AI / strategy consultant / aspiring XC rider on weekends //

Recommended from Medium


See more recommendations