Published in Nerd For Tech
Sample size and time series models — A case study on ARIMA() processes.

Goals and contents

ARIMA time series models are often taught in econometrics courses as part of the regular business science curriculum and are thus put to use by sometimes inexperienced data scientists.

The intention of this case study is to understand the data generating process behind simple MA(1) models and to illustrate the weakness of the estimator at small sample sizes.


For the tested MA(1) model with coefficient β=0.3, a time series length of at least 5000 observations is necessary to reach a reasonably tight confidence interval.

The impact on the goodness of forecasts is evaluated and depends critically on the estimated coefficient.

The case

Install some libraries first (the coefficient test below uses the lmtest package; any plotting package works, ggplot2 is used here):

#for confidence intervals and testing
library(lmtest)
#for plotting
library(ggplot2)

Now generate some data from the process. This is better than using real-world data since, in this laboratory setting, we know what we are aiming at: 50 observations of a simple MA(1) model with coefficient β=0.3:

n <- 50
beta <- 0.3
eps <- rnorm(n)
x <- vector(length = n)
x[1] <- eps[1]
for(i in 2:n){
x[i] <- eps[i]+beta*eps[i-1]
}

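As a quick sanity check of this data generating process, one can compare the sample autocorrelation of a longer simulated series with the theoretical lag-1 autocorrelation of an MA(1) process, β/(1+β²) ≈ 0.275 for β=0.3 (a sketch added here, not part of the original fit):

```r
set.seed(1)
n <- 50000
beta <- 0.3
eps <- rnorm(n)
# vectorized version of the loop above: x[t] = eps[t] + beta*eps[t-1]
x <- eps + beta * c(0, eps[-n])
# theoretical lag-1 autocorrelation of an MA(1) process:
beta / (1 + beta^2)
# sample counterpart (element 2 is lag 1; element 1 is lag 0):
acf(x, lag.max = 1, plot = FALSE)$acf[2]
```

For a series of this length, the two values agree to about two decimal places.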
Now, let's do the fit and see whether the coefficient that was used to generate the data is recovered:

#do the fitting
fit <- arima(x,order=c(0,0,1),include.mean=FALSE)
#test the coefficient:
coeftest(fit)
#confidence interval around it:
confint(fit, parm=c("ma1"), level=0.95)

The output of the coefficient test is:

z test of coefficients:

    Estimate Std. Error z value Pr(>|z|)
ma1  0.22259    0.15500   1.436    0.151

and the confidence interval:

          2.5 %    97.5 %
ma1 -0.08121368 0.5263858
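This interval can be reproduced by hand from the reported estimate and standard error, since it is the usual Wald interval β̂ ± z(0.975)·SE:

```r
est <- 0.22259
se <- 0.15500
# 95% Wald interval: estimate +/- 1.96 * standard error
est + c(-1, 1) * qnorm(0.975) * se
```

which matches the confint() output above up to rounding.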

The confidence interval is rather wide after 50 observations, and the estimator is slightly off. Let us investigate how fast this improves: how long does the series need to be to wrap the confidence interval tightly around the input parameter β=0.3?

# generate a grid for different process lengths:
n.grid <- round(1.3^(2:40))
n.length <- length(n.grid)
#reserve some space for the results
uppers <- vector(length = n.length)
lowers <- vector(length = n.length)
estimates <- vector(length = n.length)
counter <- 1
#iterate through the different process lengths and save the estimator
#as well as the confidence interval boundaries:
for(n in n.grid){
#generate the data:
eps <- rnorm(n)
x <- vector(length = n)
x[1] <- eps[1]
for(i in 2:n){
x[i] <- eps[i]+beta*eps[i-1]
}
#fit and store the estimate and its confidence interval:
fit <- arima(x,order=c(0,0,1),include.mean = FALSE)
conf <- confint(fit, parm=c("ma1"), level=0.95)
estimates[counter] <- fit$coef[1]
uppers[counter] <- conf[2]
lowers[counter] <- conf[1]
counter <- counter+1
}

Plotting the development of the estimator for different time series lengths shows:

Illustration of estimator development for different time series lengths
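The figure can be reproduced with base R graphics; the following is a self-contained sketch (the grid starts at 1.3^10 rather than 1.3^2 here, so that arima() has enough observations to fit on every run):

```r
set.seed(7)
beta <- 0.3
n.grid <- round(1.3^(10:40))
estimates <- uppers <- lowers <- numeric(length(n.grid))
for (k in seq_along(n.grid)) {
  n <- n.grid[k]
  eps <- rnorm(n)
  x <- eps + beta * c(0, eps[-n])   # MA(1) data, vectorized
  fit <- arima(x, order = c(0, 0, 1), include.mean = FALSE)
  ci <- confint(fit, parm = "ma1", level = 0.95)
  estimates[k] <- fit$coef[1]
  lowers[k] <- ci[1]
  uppers[k] <- ci[2]
}
plot(n.grid, estimates, log = "x", type = "b",
     xlab = "time series length (log scale)",
     ylab = "estimated ma1 coefficient",
     ylim = range(c(lowers, uppers), na.rm = TRUE))
lines(n.grid, uppers, lty = 2)   # upper 95% bound
lines(n.grid, lowers, lty = 2)   # lower 95% bound
abline(h = beta, col = "red")    # true value
```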

Curiously, the “true” value is not even reached for a large number of observations. Let’s see whether this is spurious or whether there is an estimation bias, by averaging over a number of estimates with the same number of observations:

n <- 5000
n.reps <- 2000
coeffs <- vector(length=n.reps)
for(j in 1:n.reps){
#generate the data:
eps <- rnorm(n)
x <- vector(length = n)
x[1] <- eps[1]
for(i in 2:n){
x[i] <- eps[i]+beta*eps[i-1]
}
#fit and store the estimated coefficient:
fit <- arima(x,order=c(0,0,1), include.mean = FALSE)
coeffs[j] <- fit$coef[1]
}

with output:

> summary(coeffs)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.2471  0.2909  0.3002  0.3003  0.3095  0.3441

So, as expected, the estimator is unbiased: averaged over a large number of repetitions, the estimates center on β=0.3.
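The spread of the estimates also matches the known asymptotic result for the MA(1) maximum likelihood estimator, Var(β̂) ≈ (1−β²)/n:

```r
beta <- 0.3
n <- 5000
# asymptotic standard deviation of the ma1 estimate:
sqrt((1 - beta^2) / n)
```

This gives ≈ 0.0135, consistent with the interquartile range of the summary above: (0.3095 − 0.2909)/1.349 ≈ 0.0138.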


Does it matter that the number of observations needs to be large to get sufficiently tight confidence intervals? What is the difference between a coefficient of β=0.3 and β=0.35, since it refers to unobserved random shocks in the past anyway?

Let us investigate the forecasting ability under these circumstances. Generate a 50,000 long time series, fit an MA(1) process to it and create a 5 step forecast:

n <- 50000
eps <- rnorm(n)
x <- vector(length = n)
x[1] <- eps[1]
for(i in 2:n){
x[i] <- eps[i]+beta*eps[i-1]
}
#fit the MA(1) model to the process:
fit <- arima(x,order=c(0,0,1),include.mean=FALSE)
#generate the 5 step prediction:
predict(fit, n.ahead=5)


$pred
Time Series:
Start = 50001
End = 50005
Frequency = 1
[1] 0.1125964 0.0000000 0.0000000 0.0000000 0.0000000

$se
Time Series:
Start = 50001
End = 50005
Frequency = 1
[1] 0.9961955 1.0415099 1.0415099 1.0415099 1.0415099
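The standard errors in the $se part already reflect a known property of MA(1) forecasts: the one-step forecast error variance is approximately σ², while for all horizons h ≥ 2 it is approximately σ²·(1+β²). The ratio of the two reported standard errors therefore recovers the fitted coefficient:

```r
se1 <- 0.9961955   # one-step forecast s.e., approximately the residual sd
seh <- 1.0415099   # s.e. for horizons h >= 2, approx. sd * sqrt(1 + beta^2)
# back out the fitted ma1 coefficient from the ratio:
sqrt((seh / se1)^2 - 1)
```

which gives ≈ 0.305, close to the true β=0.3.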

Excursus: Forecasting MA models

It’s curious that the forecast of the model above breaks off after one period. Let us quickly investigate what is going on here.

The MA(1) model, in the simplest form we are using here, is given by

Xt = ϵt + β·ϵt−1

Thus, the one-period forecast is given by

E[Xt+1|It] = E[ϵt+1|It] + β·E[ϵt|It] = β·ϵt|It

All forecasts are conditional on the past, i.e. on the information set It = {Xt, Xt−1, …}.

The basic assumption of the MA(p) process is that the error terms are iid. with

E[ϵt] = 0 and Var(ϵt) = σ²

so that in particular E[ϵt+1|It] = 0.

So the remaining term in the equation above,

ϵt|It

can be calculated recursively. Therefore, rewrite the process description of Xt using lag-operator notation (where L·Xt = Xt−1):

Xt = (1 + βL)·ϵt

and thus

ϵt = Xt/(1 + βL)

Notice now that the series expansion 1/(1−x) = 1 + x + x² + x³ + ⋯ can be transformed by replacing x with −x to

1/(1+x) = 1 − x + x² − x³ + ⋯

thus, we can calculate ϵt|It:

ϵt|It = Xt − β·Xt−1 + β²·Xt−2 − β³·Xt−3 + ⋯

Since we know the series of {Xt}, we can also calculate the unobserved series of {ϵt}. Notice further that an MA(1) process can only be forecasted one step ahead, since

E[Xt+h|It] = E[ϵt+h|It] + β·E[ϵt+h−1|It] = 0 for h ≥ 2

This is because the random error terms are independently distributed.
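The recursion above can be verified numerically: recover the innovations via ϵt = Xt − β̂·ϵt−1 and compare the manual one-step forecast β̂·ϵn with arima's own prediction. A sketch on fresh simulated data (for a long series the two agree closely, up to the different initialization of arima's filter):

```r
set.seed(42)
n <- 5000
beta <- 0.3
eps <- rnorm(n)
x <- eps + beta * c(0, eps[-n])   # simulate an MA(1) series
fit <- arima(x, order = c(0, 0, 1), include.mean = FALSE)
b <- unname(fit$coef["ma1"])
# recover the innovations recursively: e[t] = x[t] - b * e[t-1]
e <- numeric(n)
e[1] <- x[1]
for (t in 2:n) e[t] <- x[t] - b * e[t - 1]
# manual one-step forecast vs. the one from predict():
c(manual = b * e[n], arima = predict(fit, n.ahead = 1)$pred[1])
```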

Let us come back to the starting point. The reason for this excursus was to understand whether estimator quality is actually an issue in forecasting MA(1) models. In our concrete case, the difference between β=0.3 and β=0.35 can be calculated:

forecast.errors <- vector(length=1000)
#construct the vectors of beta powers (see the series expansion above)
beta03.vec <- (-0.3)^(0:(n-1))
beta035.vec <- (-0.35)^(0:(n-1))
for(i in 1:1000){
#generate a fresh series for each repetition:
eps <- rnorm(n)
x <- vector(length = n)
x[1] <- eps[1]
for(j in 2:n){
x[j] <- eps[j]+beta*eps[j-1]
}
#one step forecast (as described above):
X03 <- sum(beta03.vec*rev(x))*0.3
X035 <- sum(beta035.vec*rev(x))*0.35
#squared percentage error:
forecast.errors[i] <- (X035/X03-1)^2
}
sqrt(mean(forecast.errors))

The result is 0.32, so the one-step forecast deviates by roughly 32% (in root mean squared terms) when β is estimated slightly differently.


Conclusions

  • MA(p) models can only be forecasted for p periods.
  • Although only one parameter is estimated, these models need sufficient sample sizes for narrow confidence intervals.
  • It is important to review confidence interval widths and investigate the impact on forecast quality.
  • For classical operations research tasks, there are better-fitting models: often volume curves can be predicted better by investigating the underlying drivers of the curve, rather than following a purely univariate time series approach.





Konrad Hoppe

applied AI / strategy consultant / aspiring XC rider on weekends //
