Visualization of Time Series in R: 3

Manish Gyawali
6 min read · Jul 31, 2023

Let’s recap some of the theory. A white noise series is a sequence of random values, observed over time, in which there is no recognizable pattern: the values are uncorrelated with one another. A random walk is also randomly generated temporal data, but each value builds on the one before it, so the best prediction of the value of a data point at time period t+1 is simply the value of the data point at time period t.

v(t+1) = v(t) + e, where v(t+1) is the value of the data point at time t+1, v(t) is the value of the data point at time t, and e is an unobserved error term. Hence, e = v(t+1) - v(t).

Assume that each e is drawn independently from a standard normal distribution. It therefore has an expected value of 0 and a standard deviation of 1, and the sequence of draws is white noise. Hence, the difference v(t+1) - v(t) is distributed as white noise.
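The link between a random walk and its white noise increments can be checked numerically. The following is a minimal sketch, assuming a starting value of 0 and 100 steps (both arbitrary choices); the names e and v simply follow the notation above, and the seed is there only for reproducibility.

```r
set.seed(1)           # arbitrary seed, for reproducibility only
e <- rnorm(100)       # white noise: 100 independent standard normal draws
v <- c(0, cumsum(e))  # random walk: each value is the previous value plus a draw

# Differencing the random walk, v(t+1) - v(t), recovers the white noise draws
isTRUE(all.equal(diff(v), e))
```

Here diff() computes v(t+1) - v(t) for every t, and all.equal() compares the result with e up to floating-point rounding.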

Using a for loop to create a random walk

With that definition in mind, let us create a random walk series from first principles. We use a for loop for this because it allows us to link the previous and current values of the data point, that is, v(t) and v(t+1) (or, equivalently, v(t-1) and v(t)).

If you haven’t used a loop before, there are three basic steps: first, you need to initialize a value. Then, set the loop to run a prespecified number of times. Then, when inside the loop, update the value by incrementing it by a specified amount. In our case, each time you run the loop, you get a value for a new time period that depends on the value for the time period immediately preceding it incremented by a single draw from a standard normal distribution.

The code is below.

#INPUT
v <- c(0) # initialize the series with a starting value of 0
for (t in 1:100) { # run the loop 100 times
  v[t + 1] <- v[t] + rnorm(1) # the new value equals the old value plus
  ## a random draw from a standard normal distribution (SND)
}
#OUTPUT
head(v, 10) # v is now a vector holding the values of the random walk

[1] 0.000000000 0.725411850 0.984181088 1.742253387 3.713209656 4.125795166 4.860516082
[8] 3.546741507 4.034720766 3.476251666

Now let’s plot it. If you are using xyplot() from the lattice package, remember to first convert the series to a time series with the ts() command. If you are using plot() from base R, you can pass the vector in directly.

library(lattice) # provides xyplot()
xyplot(ts(v),
       main = list(label = "Random Walk obtained from 1st Principles", cex = 0.75))

Figure: Random walk sequence generated using a for loop
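For comparison, here is a hedged sketch of the base R route mentioned above. It regenerates a random walk so that it runs on its own; the seed, the preallocated vector, and the axis labels are illustrative choices rather than part of the original code.

```r
set.seed(1)        # arbitrary seed, for reproducibility only
v <- numeric(101)  # preallocate 101 zeros; v[1] is the starting value 0
for (t in 1:100) {
  v[t + 1] <- v[t] + rnorm(1)
}

# type = "l" joins the values with a line; the default draws separate points
plot(v, type = "l", xlab = "Time", ylab = "v",
     main = "Random Walk obtained from 1st Principles")
```

No ts() conversion is needed here because plot() uses the vector's index as the horizontal axis.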

Notice that this does look like a random walk: there are stretches in which the series appears to trend up or down, even though every step is random.

Using cumsum() to create a random walk

Instead of using a for loop, you can use the function cumsum() to create a random walk. It is a more elegant method, but it essentially does the same thing. cumsum() returns the running (cumulative) sum of a vector, so if we give it a vector of random draws from any probability distribution, each element of the result is the sum of all the draws up to that point.

v <- c(0, cumsum(rnorm(100))) # cumulative sum of 100 standard normal draws,
## with the initial value 0 placed in front
xyplot(ts(v))

Figure: Random walk plot generated using the command cumsum()
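To see that the two approaches really do the same thing, one can reset the seed and compare the results. This is a small sketch; the names v_loop and v_cumsum are made up for the illustration, and it assumes R's default random number generator, which produces the same stream of draws whether they are requested one at a time or all at once.

```r
set.seed(42) # arbitrary seed
v_loop <- c(0)
for (t in 1:100) {
  v_loop[t + 1] <- v_loop[t] + rnorm(1)
}

set.seed(42) # reset to the same seed so that the same draws are produced
v_cumsum <- c(0, cumsum(rnorm(100)))

# TRUE if the two series agree up to floating-point rounding
isTRUE(all.equal(v_loop, v_cumsum))
```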

Testing for white noise and random walk series

An easy, visual way to check whether a series is white noise or a random walk is to plot its autocorrelation function (ACF). Autocorrelation is, as the name suggests, self-correlation, i.e., the degree of correlation between current and past values of a variable.

Lagged values

If, at time t, we consider the value of a variable at time t-n, then the latter value is said to be lagged by n periods. In most series, the degree of autocorrelation diminishes with increasing lags.
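As a rough illustration of what a lag means in practice, the sketch below pairs each value of a series with the value n periods earlier and correlates the two. The variable names (current, lagged, r_manual, r_acf) are invented for this example, and the series is a simulated random walk.

```r
set.seed(1)              # arbitrary seed
x <- cumsum(rnorm(100))  # a simulated random walk of length 100
n <- 1                   # number of periods to lag by

current <- x[(n + 1):length(x)]  # values at times n+1, ..., 100
lagged  <- x[1:(length(x) - n)]  # the same series, n periods earlier

# Ordinary correlation between the paired values
r_manual <- cor(current, lagged)

# acf() uses a slightly different estimator (one overall mean and variance),
# so the two numbers are usually close but not identical
r_acf <- acf(x, lag.max = n, plot = FALSE)$acf[n + 1]

c(manual = r_manual, acf = r_acf)
```

Changing n shows how the correlation behaves at longer lags.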

In a white noise series, there is no autocorrelation at any nonzero lag. In a random walk series, by contrast, there is autocorrelation, because each value is built from the one before it. To check this, we can use a figure called a correlogram. An autocorrelation function plot (ACF plot) is a kind of correlogram in which the correlations between a series and its lagged values are shown for a number of lags.

If the series is white noise, there are no significant lags (except for lag 0, where the series is correlated with itself). If the series is non-stationary, as a random walk is, there is correlation among the lagged values, and the correlations diminish only slowly as the lag increases.

We can use the acf() function, which ships with base R in the stats package, to obtain the ACF plot. It accepts a plain numeric vector, so there is no need to convert the series to a time series.

acf(cumsum(rnorm(100)), main = "ACF Plot of a random walk series")

The figure above shows the ACF plot for a random walk series. Twenty lags are shown in the figure, which is the default for a series of length 100. Notice that the autocorrelations at all lags lie above the upper dashed blue line. That shows that they are significant.

By contrast, very few lags of a white noise series are significant.

acf(rnorm(100), main = "ACF Plot of a white noise series") # white noise

Note that there are no significant lags in this plot. (Lag 0 does not count, because the autocorrelation at lag 0 is always 1!)
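The same comparison can be made without reading values off a plot. The sketch below is only illustrative: count_significant() is a helper invented for this example, and the bound used is the approximate 95% limit that the ACF plot draws as dashed lines, qnorm(0.975) / sqrt(n).

```r
set.seed(1)             # arbitrary seed
n  <- 100
wn <- rnorm(n)          # white noise
rw <- cumsum(rnorm(n))  # random walk

# Approximate 95% bound, the height of the dashed lines in the ACF plot
bound <- qnorm(0.975) / sqrt(n)

# Hypothetical helper: number of lags (excluding lag 0) outside the bound
count_significant <- function(x) {
  r <- acf(x, plot = FALSE)$acf[-1]  # drop lag 0, which is always 1
  sum(abs(r) > bound)
}

c(white_noise = count_significant(wn), random_walk = count_significant(rw))
```

With plot = FALSE, acf() returns the autocorrelations instead of drawing them, so they can be counted or inspected directly.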

Another way to check whether a series is white noise is to use the Ljung-Box test. This is a statistical test in which the null hypothesis is that there is no autocorrelation in the series. Hence, if the test gives us a large p-value, there is no reason to reject the null hypothesis of no autocorrelation. Only if the p-value is close to zero can we reject it.

In R, we use the Box.test() function, setting type = "Ljung-Box" to specify that we want the Ljung-Box test. Further, we append $p.value to the call (see code) so that only the p-value is returned.

Let’s try first with a series we know to be white noise.

# test whether a series is WN
# INPUT
# set a seed, so that the same sequence, and hence the same value
## is generated every time
set.seed(123)
Box.test(rnorm(100), type = "Ljung-Box")$p.value

# OUTPUT
0.7950082

Notice the large p-value. That means we cannot reject the null hypothesis of zero autocorrelation.
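For readers curious about what Box.test() does internally, here is a sketch that computes the Ljung-Box statistic by hand and compares the resulting p-value with the one from Box.test(). The names h, Q, p_manual and p_box are chosen for this example; h = 1 matches the default lag of Box.test().

```r
set.seed(123)
x <- rnorm(100)
n <- length(x)
h <- 1  # number of lags tested; Box.test() uses lag = 1 by default

# Sample autocorrelations at lags 1, ..., h (lag 0 is dropped)
r <- acf(x, lag.max = h, plot = FALSE)$acf[-1]

# Ljung-Box statistic: Q = n * (n + 2) * sum over k of r_k^2 / (n - k)
Q <- n * (n + 2) * sum(r^2 / (n - seq_len(h)))

# Under the null hypothesis, Q is approximately chi-squared with h
# degrees of freedom
p_manual <- 1 - pchisq(Q, df = h)
p_box <- Box.test(x, lag = h, type = "Ljung-Box")$p.value

c(manual = p_manual, Box.test = p_box)
```

A larger h tests several lags jointly; the default of 1 looks only at the first lag.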

Note: for certain seeds, you may get a small p-value, but that is unlikely: when the null hypothesis is true, only about 5% of p-values fall below 0.05. To convince yourself, select ten different seed numbers at random and then perform the Ljung-Box test for each. The average p-value will be large.

If you are not convinced, look at the following code and its output. In Part I of the code we have a for loop that runs over 50 different seed numbers (the integers 1 to 50, in random order). For each seed, a white noise series is generated, the Ljung-Box test is performed on it, and the p-value is extracted as before.

Then, in Part II of the code, we obtain the number of series for which the p-value is less than 0.05.

# Verifying the Ljung-Box test for different seed values. Can be omitted without
## loss of continuity

# INPUT

# PART I
## For loop
# Initialize an empty vector to hold the p-values
p_val <- c()
# sample.int(50) returns the integers 1 to 50 in random order,
## i.e., 50 DIFFERENT seed values
for (x in sample.int(50)) {
  # seed corresponding to x
  set.seed(x)
  # p-value corresponding to the Ljung-Box test
  p_val[x] <- Box.test(rnorm(100), type = "Ljung-Box")$p.value
}

# PART II
## How many p_values out of 50 are less than 0.05?
sum(p_val < 0.05)

# OUTPUT
3 # Hence, only 3/50 or 6% of p-values are 'small'.
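The same experiment can be written more compactly with sapply(). This is an alternative sketch rather than the code used above; since the order in which the seeds are visited does not affect the result, it simply walks through the seeds 1 to 50 in order. It also reports the average p-value.

```r
# One p-value per seed, for seeds 1 to 50
p_val <- sapply(1:50, function(seed) {
  set.seed(seed)
  Box.test(rnorm(100), type = "Ljung-Box")$p.value
})

mean(p_val < 0.05)  # share of 'small' p-values
mean(p_val)         # average p-value across the 50 seeds
```

When the null hypothesis is true, roughly 5% of p-values are expected to fall below 0.05, so a small share of 'small' p-values is not a sign that the test has failed.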

Now let’s try the Ljung-Box test with a series that we know is a random walk.

# INPUT
set.seed(123)
Box.test(cumsum(rnorm(100)), type = "Ljung-Box")$p.value

# OUTPUT
0

The p-value is effectively zero (so small that R prints it as 0), and so we can safely reject the null hypothesis of zero autocorrelation.
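As a closing check that ties back to the theory at the start, the differences of a random walk should behave like white noise. The sketch below applies the same test to the differenced series; the names rw and p_diff are chosen for this example, and the p-value you get depends on the seed, although it should typically be large.

```r
set.seed(123)
rw <- cumsum(rnorm(100))

# diff() turns the random walk back into its white noise increments
p_diff <- Box.test(diff(rw), type = "Ljung-Box")$p.value
p_diff
```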
