Fractional differencing as viewed from the frequency domain

With applications in Fintech

NTTP
30 min read · Dec 20, 2023
Discrete steps, instead of a continuous hill. Photo by Scott Webb on Unsplash

Fractional differencing (fracdiff) of time series has attracted attention in time series pre-processing and modeling. There are many good write-ups and videos online about this concept (some links are provided in this article), but they often seem to lack treatment of the fractional differencing procedure as a digital filter, along with the frequency domain characteristics thereof. Analyzing fracdiff in a filter-focused fashion may give us other filtering ideas to help improve model quality.

The Excel file to go along with this article:

https://github.com/diffent/excel/blob/fracdiffArticle/fracdiffC.xlsm

Updated Dec 22, 2023: Be sure to check out the Excel example follow-up to this article!

https://medium.com/@nttp/fractional-differencing-example-in-excel-226cb52f3523

Since fracdiff is known to be a method of making a time series stationary while keeping as much historical information as possible (versus ordinary differencing), it seems as if this filtering method should be a “go to” approach for achieving price series stationarity, rather than the oft-default method of taking first (and maybe second or even third) differences of a time series to make that series stationary in a time window of interest.

Both ordinary differencing and fractional differencing can be thought of as digital filters that are applied to raw time series data, with ordinary differencing being a special case of the fractional kind.

Here is a good quote from another fine article on this topic:

It is highly surprising that, about a generation after the introduction of Arfima [see our Note 1] models, the concept of fractional differencing has seemingly not gained wide-spread traction in Finance. To an extent that what many may attribute to the ‘efficiency of markets’ might not be but an artefact [sic, see our Note 6] of the voluntary cancellation of information by unsuitable data preprocessing.

S. Kuttruf, https://towardsdatascience.com/preserving-memory-in-stationary-time-series-6842f7581800

As Kuttruf notes, such attention on fracdiff is warranted because this method keeps more historical information in the resulting time series than ordinary differencing does (or, equivalently, than computing returns on a price series does; returns are just dynamically normalized differences), while still attempting to make a non-stationary time series stationary (enough) for a robust model to be built from it. In other words, fracdiff is a filter that lets more low frequency information pass through than ordinary differencing does.

Another detailed discussion on the derivation of fractional differences can be found here:

http://kidquant.blogspot.com/2019/02/welcome-and-introduction-to-fractional.html

And, there are nice animated graphics in the following video for understanding the continuous variable fractional derivative, which is directly related to fractional differencing of discrete-time data that we study here. Though this is a rather extensive diversion from the topic at hand, it may provide some insights into the concept. You may want to bookmark it for later:

https://www.youtube.com/watch?v=2dwQUUDt5Is

Ordinary differencing

When we take an ordinary first difference of, let’s say, a price series (the discrete-time analog of the first derivative that we all know from Calculus 1), we normally think of the following simple relation:

D(t) = P(t) - P(t-1) [Equation 1]

Note that an ordinary continuous derivative is defined as the limit of ∆Y/∆X as ∆X (and with it, the functionally linked ∆Y) approaches zero. In discrete time, however, the tiniest possible change in time is one time step, no matter what the time step represents physically (in our example, one trading day), as captured in Equation 1. Hence, for our discrete time case, we just set the denominator ∆X to 1.0, implying one time step, and the ∆Y/∆X ratio simplifies to just the numerator ∆Y.
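In code, Equation 1 is a one-liner. A minimal numpy sketch (the price values here are made up for illustration):

```python
import numpy as np

# Hypothetical price series, oldest point first.
p = np.array([100.0, 101.5, 99.8, 102.3])

# Equation 1: D(t) = P(t) - P(t-1), with the denominator (one time step) fixed at 1.
print(np.diff(p))  # [ 1.5 -1.7  2.5]
```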

Fractional differencing

We won’t get into the mathematical derivation of the concepts of “half derivatives” or “fractional derivatives,” since the other references linked here do a fine job of that. Rather, we will focus on some implications of fracdiff and maybe uncover some intuitions related to the method.

We do find it curious that the fractional derivative video linked above claims that, while there is an easy geometric interpretation of an ordinary derivative (a slope at a point on a curve, or a tangent hyperplane to a hypersurface in the multi-dimensional case; the “rise over run” or “tilt of a plane” idea that we learn in Calculus 1 or even algebra), there is no similar physical or geometric interpretation of a fractional derivative.

Hence, we propose this interpretation, at least for the discrete time cases studied in this article:

1) Fracdiff is a quantity that shows you differences from the near past as well as the far past, all rolled up into one weighted sum, with less influence the farther back you go in time. Some of the comments on the linked video seem to hint at this interpretation. The lack of a colloquial description in the video may be because its examples take fractional derivatives of smooth, low order polynomials (gradually changing curves with primarily low frequency components; Y changing gradually over X), rather than the complicated asset price series that we deal with, which have information embedded at many different frequencies.

2) Fracdiff is a particular type of digital filter that applies weights, computed in a specified manner, to lags of a time series in order to filter it: either reducing low frequency, slowly changing behavior, or augmenting it, all in a smoothly varying manner as we adjust d, the fractional difference order.

Vocabulary clarification

Note that a “fractional derivative” is not the same thing as a “partial derivative,” which deals with finding a derivative in one or more directions of a multi-dimensional function. Mix those up and you’ll be in a heap of confusion. Having said that, I’ll bet that someone has defined a partial fractional derivative… hmm… Or would it be a fractional partial derivative? Yes, probably the latter. These seem like they might be useful in image processing or 3D/ND data processing if we view them as digital filters as we do here (digital filters being valuable for processing images and video motion, with prime examples being some of the filters that your Instagram or phone camera allows you to apply to images or videos). And wow: if we have two partial derivatives 𝜕z/𝜕x and 𝜕z/𝜕y, we could potentially have a different fractional order on each. Okay, okay… we are skiing ahead of our skills again and are in danger of rolling down the mountain of ideas in a puff of snow-like flakes illuminated by the bright winter sun of reality; so let’s slow down a bit. Fractional partial derivatives are an investigation for another time.

Let us begin…

To start with fractional differencing, we extend the above [Equation 1] to:

D(t) = A*P(t) + B*P(t-1) + C*P(t-2) + …

(with A B C etc as constants)

Then for ordinary differencing,

A = 1
B = -1

with C and all the remaining coefficients set to 0:

D(t) = 1*P(t) + (-1)*P(t-1)

What fractional differencing does is assign other specific non-zero values to B, C, and the subsequent coefficients. To make working with the coefficients easier, we will write them as an array w[.] (w standing for “weight,” to match a diagram which we will show shortly):

D(t) = w[0]*P(t) + w[1]*P(t-1) + w[2]*P(t-2) + … [Equation 2]

Hence, we recognize that the fractional difference at each time step is a dot product of two vectors: 1) all the discrete lags of P (as far back in time as we have available at that time step), “dotted with” 2) the coefficient array w[ ].

Let’s say that again but slightly differently: Each time step of the output is the result of a different vector dot product. And recall that the result of a vector dot product is a scalar (single floating point value in our computerized case).

Figure 0: Example of computing the first two elements in a fracdiff output
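In code, the computation depicted in Figure 0 is an expanding-window dot product. A minimal Python (numpy) sketch, storing the series oldest point first (note that the Excel sheet discussed later stores the newest point on top, so mind the orientation); the weight array w is taken as given here, with its generating formula coming shortly:

```python
import numpy as np

def fracdiff_apply(p, w):
    """Equation 2: each output D(t) is a dot product of the weights with
    the lags of p available at time t (an expanding window)."""
    p = np.asarray(p, dtype=float)
    out = np.empty(len(p))
    for t in range(len(p)):
        lags = p[t::-1]                   # P(t), P(t-1), ..., back to P(t0)
        out[t] = np.dot(w[:t + 1], lags)  # truncate w when history runs out
    return out

# Sanity check with the ordinary-difference weights [1, -1, 0, 0, ...]:
p = np.array([100.0, 101.5, 99.8, 102.3])
w = np.array([1.0, -1.0, 0.0, 0.0])
print(fracdiff_apply(p, w))  # [100.   1.5  -1.7   2.5]
```

Note the very first output: it is just w[0] times the oldest price, a partial sum. This is exactly the end-condition issue discussed shortly below.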

In the following, symbols representing vectors are shown in bold, instead of with the little vector indicator arrows on top of the symbols. w is a constant vector, but at every output point D(t), we refer to P(t) (and values back in time, implied in the dot operation).

Ah ah, but we have left out one thing: as we go back in time in our time-limited series, the remaining length of P(t) decreases, and so we end up having to discard the tail end of the w weight array, since there are fewer and fewer coordinates to multiply before the final dot product sum-up. This is an “implementation detail,” as the common theoretical dismissal goes; but it also results in some possibly model-disrupting end conditions as we approach the old end of P(t).

Figure 0B: Fractional difference order 0.5 computed from some TSLA price data using the GRETL application. Beginning of data series at left, “start of data” outliers from the fracdiff function.

After computing the fracdiff on some price data, we may need to throw away some points at the old end of the resultant array (towards the left of the above chart, “old data”) before we do any further modeling on them, because those left-most points only get the first few elements (or a single element, in the case of the ultimate point [Note 5]) of the weight array applied to them. Those few points on the left of the chart are only a portion of a fractional difference [Note 7], and should probably be treated as discardable outliers… but we need to add them back in when we re-convert a fractional difference back to the original (price) units, so we may not want to discard them entirely.

D(t) = w • P(t… backwards to t0) [Equation 3] (Equation 2 rewritten in vector notation; the left side of the equal sign is a scalar)

All of the fracdiff coefficients aside from w[0] are less than 0 when 0 &lt; d ≤ 1 (the range we mostly work in here). The ordinary difference discussed above is this same dot product, but with coefficients w[2] and older set to 0. So to simplify, we just say that D(t) = P(t)-P(t-1) for an ordinary difference. Why bother doing all those multiplies by zero and then summing up a bunch of zeros, right?

We borrow this coefficient chart from the above noted article by Kuttruf:

Figure 1: Lag coefficients for fracdiff at selected difference orders

The coefficient chart makes everything clear and easy to visualize. Instead of just taking the full P(t-1) term of the series and

1) subtracting it from P(t), [this is ordinary differencing]

we instead:

2) subtract a smaller fraction of P(t-1),

and then:

subtract successively smaller amounts of the older lags.

In this manner, we retain information from the older lags in our new “difference-like” quantity.

Qualitative view of filter coefficients

Though those coefficients [in Figure 1] are graphed versus discrete lags (discrete X points on the graph), one can visualize a downward spike from 1.0 at lag 0 to -d at lag 1, which then gradually decays asymptotically back up toward zero, all coefficients staying below 0 during the decay, with no oscillation above zero on the way back up. In other words, we only subtract lags older than P(t); we never add back in any of the older lags in the fracdiff filter. This is not a universal characteristic of lag-based digital filters, however. There are filters that do add back in older values (rolling window / moving average filters are a good example).

Filter coefficient formula

The formulation of the factors to subtract (the filter coefficients, that is) is shown by Kuttruf, and is a simple(?) recurrence relation. This relation may be “clear as mud” until you run some examples.

Figure 1A: Fracdiff k-th coefficient (weight per lag) formula, verbatim from Kuttruf (d = difference order).

We posted some C code online to generate these coefficients if you want to experiment:

https://github.com/diffent/fracdiff

And there are other codes online to generate these coefficients and perform the full fracdiff operation [Note 3]. Additionally, we have translated our C code for fracdiff — which originated from some Python code someone posted online (references in our C code comments) — into an Excel user defined array function, which we will demonstrate shortly.
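The Figure 1A recurrence is only a few lines in code. Here is a minimal Python sketch (our hedged translation of the same recurrence that the C code and the Excel macro implement; those implementations may differ in detail):

```python
def fracdiff_weights(d, n):
    """First n fracdiff filter weights for difference order d.

    Figure 1A recurrence: w[0] = 1, then w[k] = -w[k-1] * (d - k + 1) / k.
    """
    w = [1.0]
    for k in range(1, n):
        w.append(-w[-1] * (d - k + 1) / k)
    return w

print([round(x, 4) for x in fracdiff_weights(0.8, 4)])  # [1.0, -0.8, -0.08, -0.032]
print([round(x, 4) for x in fracdiff_weights(0.5, 4)])  # [1.0, -0.5, -0.125, -0.0625]
print([round(x, 4) for x in fracdiff_weights(1.0, 4)])  # [1.0, -1.0, 0.0, 0.0]  (ordinary difference)
```

The third value in each printed list is the w[2] weight compared just below.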

Here are a couple of examples of fracdiff filter coefficients, first with the difference level d set to 0.8 (close to an ordinary difference at d = 1, but letting more of the older data in, i.e., letting a bit more low frequency information through the filter than the ordinary difference does), and then compared with d set to 0.5 (a “half difference,” equivalent to a “half derivative” in continuous time). d = 0.5 lets even more of the older data into the result than d = 0.8 does (and hence more low frequencies). You can see this from the coefficient magnitudes:

Figure 2: Fractional difference weight comparisons for d = 0.8 versus d = 0.5

Let’s compare the weight w[2] for both cases:

When d (difference level) = 0.8, w[2] = -0.08
When d (difference level) = 0.5, w[2] = -0.125

That is: in the d = 0.5 case, we give more weight (-0.125) to the 2nd lag, and to all older lags, than we do in the d = 0.8 case (weight -0.08 at the 2nd lag).

Again we remind: The fractional difference is a measure of how the current value changed versus:

a lot of the immediately prior value,
and somewhat less of the next prior value,
and even less of the next-next prior value,

etc…

How much more or less we take of the older values depends on the setting of d.

Just a weighted sum

But hold on a moment: this is just a weighted sum of prior price values. And what is a run-of-the-mill moving average, or even the fancier exponentially weighted moving average? Why… they are also weighted sums of prior values. It just so happens that for moving averages, all coefficients on lags are greater than 0 (and in the case of simple moving averages, all the coefficients are the same within the rolling window; they are zero outside the window, hence throwing out older information that fracdiff keeps). With fracdiff, by contrast, only the first coefficient is greater than 0. Hence, moving averages of all types, along with fractional differencing, are in the same family of operations that can be applied to time series: a specific class of digital filters that weight prior values according to some formula, then sum them up.
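To see the family resemblance side by side, here are the first ten weights of three members of this family; a quick sketch (window length and EWMA decay λ chosen arbitrarily for illustration):

```python
import numpy as np

n = 10

sma_w = np.ones(n) / n                    # simple moving average: flat, positive weights
lam = 0.8
ewma_w = (1 - lam) * lam ** np.arange(n)  # exponentially weighted MA: geometric, positive

fd_w = [1.0]                              # fracdiff (d = 0.5): one positive weight,
for k in range(1, n):                     # then a slow decay of negative weights
    fd_w.append(-fd_w[-1] * (0.5 - k + 1) / k)

print(np.round(sma_w, 4))
print(np.round(ewma_w, 4))
print(np.round(fd_w, 4))
```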

Econometricians will notice that these terms which are a weight multiplied by a lag of the original series are just AR terms (Auto Regressive terms). And since I was confused by this phrase when I first learned it way-back-when, I’ll try to help the newbies here: The term “auto” does not mean “automatic” or anything like that, but it means “same” or “self.” “Auto-regression” in this case means “regression based on prior values of the same data series.”

But instead of computing the coefficients of these terms to fit our data like we do in a true auto-regression model, we are specifying the coefficients a priori in fracdiff, and then maybe later fitting another AR model on top of the output from the first AR computation when we build models from this fractionally differenced data. Wow! So many AR terms! Again, it’s not really auto-regression in the pre-filtering case, because there is no regression involved; but they are AR-style terms. Could these two AR modeling steps be merged? We won’t discard that idea, since it seems feasible at first glance (famous last words!), and maybe this is how some of the packaged ARFIMA codes work (FI = Fractionally Integrated); but in this article, we leave them as two separate steps.

Fracdiff filter properties

Those familiar with digital filters will see that the coefficients specified by the recurrence relation [Figure 1A] are just one particular type of digital filter that we can apply to any time series data set (price or otherwise). We could put different numbers into the w[.] array to get different filters. Some filters will enhance our model and make it forecast better; most filters will not.

We won’t get into it in this article because it is far too long already, but who is to say that the particular decay of the fracdiff coefficients back to zero is the ideal decay rate? There may be other filter coefficients, similar to fracdiff’s or not, that can be applied to give better models. We hope to get into this soon, and just leave you with the hint: “lindiff.”

In fact, Gretl, one of our favorite free econometric software packages, puts the fracdiff operation in its “filter” menu:

Figure 3: Gretl software, menu for Fractional difference

The Gretl user interface only lets the difference order be set > 0, but if you use the functional form of fracdiff in the command line window of Gretl, you can use negative values of d (which give fractional or integer sums / discrete time definite integrals). This allows you to “undo” fractional differencing, much like integration undoes differentiation, or summing undoes ordinary differencing.
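Here is a quick Python check of this “undo” behavior, using the expanding-window formulation sketched earlier (synthetic random-walk prices; we restate the fracdiff function to keep the snippet self-contained):

```python
import numpy as np

def fracdiff(p, d):
    """Expanding-window fracdiff of series p (oldest point first), order d."""
    p = np.asarray(p, dtype=float)
    w = [1.0]
    for k in range(1, len(p)):
        w.append(-w[-1] * (d - k + 1) / k)
    w = np.asarray(w)
    return np.array([np.dot(w[:t + 1], p[t::-1]) for t in range(len(p))])

rng = np.random.default_rng(0)
prices = 100.0 + np.cumsum(rng.normal(size=50))  # synthetic random walk

d = 0.8
roundtrip = fracdiff(fracdiff(prices, d), -d)    # difference, then "un-difference"
print(np.max(np.abs(roundtrip - prices)))        # tiny: floating point round-off only
```

The weight series for orders d and -d convolve back to a pass-thru, so the round trip recovers the original prices (up to round-off), including the start-of-data points.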

The fracdiff family of fine filters

fracdiff itself is not just one function, but a family of digital filters, depending on what we set the (floating point) difference level to. If we set d = 0, we get a pass-thru filter: output = input. If we set d = 1, we get the ordinary difference filter P(t)-P(t-1), as noted earlier and in Kuttruf’s article. If we set d = -1, we get a pure integrator or cumulative sum in discrete time: this “undoes” a fracdiff with d set to 1. It is the inverse of the d = 1 filter, much like differentiation and integration are inverses (up to a constant, of course, but that constant gets dealt with in our particular configuration of definite integrals / cumulative sums).

Let’s look at this latter case (cumulative sum). For this case, all weights get computed to be 1.0 from the recurrence relation. You can either trust us on this, or run one of the coefficient generator codes yourself.
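Or, to save you the trouble, the check takes four lines:

```python
# d = -1 in the Figure 1A recurrence: every weight comes out to exactly 1.0.
w = [1.0]
for k in range(1, 6):
    w.append(-w[-1] * (-1 - k + 1) / k)
print(w)  # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```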

So then we get:

D(t) = 1*P(t) + 1*P(t-1) + 1*P(t-2) + … all terms are multiplied by 1 and summed. This is merely a cumulative sum, or a definite integral [Note 8] if our time variable happened to be continuous (which it (almost?) never is in “real world data” code, but… one can imagine).

Figure 4: Fracdiff of order -1 (cumulative sum) of some TSLA price data. Where did all the noise go? Hint: It’s still there, but you can’t see it at this Y scale.

Sign reminder

Be sure to notice the somewhat counter-intuitive situation with the signs: if d = -1, all coefficients become 1. That is how fracdiff is traditionally formulated, so… We are pretty sure that this relation could be formulated the reverse way, as a fractional integral or fractional sum (fracsum?), so that a positive d = 1 yielded all 1.0 filter weights. However, we don’t want to go counter to traditional formulations here, which would confuse us and our readers even more than we are already confused.

But back to fracdiff: if we examine the fracdiff results a little closer, when d, the difference order, starts to move above zero, we get more of a high pass filter effect (high frequencies / fast changes get through the filter, while low frequency long-term behavior gets attenuated). This is exemplified by the localization-in-time effect of the basic simple difference at d = 1. Since the simple difference only accounts for the current value and the immediately prior value, all “slow change trends” due to older values are lost. Longer term, low frequency behavior that manifests as gradual changes over time gets thrown out.

And if we instead set d to less than zero, we get low-pass filters that reduce high frequency information (some of which might be considered noise, or otherwise un-modeled, rapidly changing behavior). We see an example of this in Figure 4. With d less than zero, we get smoothing filters similar in concept to moving averages, but with more of a stack-up effect (that is, the cumulative sum does not divide by the number of points in the filter window). Moving averages get rid of noise and jitter in raw data, as our readers probably know, even if that jitter might contain useful, future-predictive information.

The naming convention in filters seems to be to emphasize which frequencies are (partially or wholly) passed through the filter, rather than which frequencies are stopped or attenuated, with the possible exception of the term “band stop” filter.

Low pass: get rid of / attenuate high frequencies
High pass: get rid of / attenuate low frequencies

In fracdiff, the more d goes below zero, the more high frequency information we attenuate. Integrating a series (fractionally or not) tends to make it less stationary (highlighting or amplifying long-term trends as we see in Figure 4), while differentiating a series (d > 0) tends to make it more stationary, because differentiating removes long-term slow-moving trends.

More specific filter frequency characteristics

We claim that the fractional difference operation is a filter… what are its frequency characteristics, then? Which frequencies are attenuated and which amplified? We can visualize this from Fourier transform theory. In reality, we should use discrete Fourier transform (DFT) methods for discrete time series instead of the continuous time Fourier methods that we show here, but we can transliterate dY/dX into ∆Y/∆X in our minds for now. There are of course details specific to the DFT that need to be handled, which is why a discrete treatment is needed in the first place; but that leads us down too divergent a path for this article.

Looking at a handy continuous time Fourier transform table:

https://ethz.ch/content/dam/ethz/special-interest/baug/ibk/structural-mechanics-dam/education/identmeth/fourier.pdf

… we see that differentiation and integration are called out there. Excellent. We don’t have to do the Fourier transform math ourselves.

Or do we? When working in discrete time, there may be “issues,” as the loaded one-word saying goes:

https://math.stackexchange.com/questions/1657756/is-there-in-discrete-fourier-transform-a-theorem-that-corresponds-to-transforms

Your conclusions were all wrong, Ryan…
— Cap’n Marko Aleksandrovich Ramius, Krasnyy Oktyabr

But all is not lost: The rocket scientists had something to say about this also!

https://ntrs.nasa.gov/citations/19860001358

Nonetheless, we forge ahead with the continuous time transform table, if for nothing more than inspiration. We often find that theoretical concepts can give ideas for code that are useful, even if not fully corresponding to the constraints of the theories. We probably should dig into that heated Stack Exchange conversation and do the math soon enough, though.

The following two Figures are snapshots from the above transform table link.

Figure 5A: Continuous time Fourier transform of differentiation (transform on the right). Omega ω is frequency. The (jω)^n piece on the right is what we are interested in.
Figure 5B: Continuous time Fourier transform of definite integration to a time t (transform on the right). Omega ω is frequency. The 1/jω piece on the right is what we are interested in. Notice how, if n = -1 in Figure 5A, the (jω)^n portion in 5A is identical to 1/jω in 5B. As we change n from positive to negative, the above formulations seem to transition smoothly from differentiation to integration, aside from the “pesky” added term at the end of 5B (right). It seems to be a constant term, since π*F(0) is a constant and a Dirac delta at all frequencies would be the constant function 1. Or is it? Why would they write that as a delta function? Hmm… maybe we need to think about that constant a bit more… There are always constants involved in this integration business. As the great Yogi Berra apocryphally said:

In theory, theory is the same as practice. In practice, it ain’t.

In these formulas, let’s just use a floating point value instead of an integer for the differentiation or integration order n. Is this allowed, mathematically?

The best we can say for now is: Perhaps. When writing code in discrete time environments, sometimes the best we can do is get inspiration from continuous theoretical math, without necessarily meeting all the assumptions of that math.

Okay so now we are blowing two assumptions out of the water: 1) we are in discrete time, not continuous time, and 2) we are putting real numbers (or floating point values) where the Fourier transform table showed integers, for difference and integration level.

But as a potential corrective (and something the symbol pushers sometimes have a more difficult time doing), we can run experiments and tests of our ideas in code to see if they make sense.

In actual use, the magnitude versus frequency response curves seem to represent what we are seeing experimentally, at least directionally. But theoretically? “Needs more work.”
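Here is one such experiment. The fracdiff filter can be written as (1 - B)^d, where B is the one-step backshift (lag) operator, and the magnitude of its transfer function (1 - e^(-jω))^d works out to (2 sin(ω/2))^d in digital frequency. The sketch below (numpy assumed; truncation length of 512 chosen arbitrarily) compares the DFT magnitude of a truncated weight vector against that closed-form curve:

```python
import numpy as np

d = 0.5
n = 512                                  # truncation length for the weight vector

w = [1.0]                                # Figure 1A recurrence
for k in range(1, n):
    w.append(-w[-1] * (d - k + 1) / k)

# Empirical magnitude response: |DFT| of the truncated weight vector.
H = np.abs(np.fft.rfft(w))
freqs = np.linspace(0.0, np.pi, len(H))  # digital frequency, 0 .. pi rad/sample

# Closed-form magnitude of (1 - e^{-jw})^d, namely (2 sin(w/2))^d.
ref = (2.0 * np.sin(freqs / 2.0)) ** d

for i in (8, 64, 256):                   # low, mid, and high frequency samples
    print(f"w = {freqs[i]:.3f}   |H| = {H[i]:.4f}   theory = {ref[i]:.4f}")
```

The two agree well at mid and high frequencies and deviate most near ω = 0, where the discarded weight tail matters; the overall rising shape for d > 0 matches the high-pass behavior sketched in Figure 6A.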

Figure 6A: For case fracdiff d > 0
Figure 6B: For case fracdiff d < 0

In both Figures 6A and 6B, the curves go “flat” if we set d to 0 (pass-thru filter), which is expected: no change from input to output of fracdiff when we set order d = 0.

The second chart Figure 6B shows similar frequency response information, but for the case when the d differencing order is set to less than zero, to yield fractional integration, or, in our discrete time cases, fractional summing. In this latter case, lower frequencies (to the left on the horizontal axis, toward the origin of the graph) are amplified and higher frequencies (to the right) are attenuated. This is exactly what we see when we do cumulative summing. This is also useful when we want to turn a fractionally differenced series back into its original units. It is probably useful for other things too, but maybe not for the cases that we are discussing in this article.

Note also that, aside from the case of frequency = 0 in the differentiator [Figure 6A] (freq = 0 = constant average value of, say, a price series, “DC” or direct current in engineering terms), all frequencies are merely magnified or reduced. They are not eliminated entirely. This means that they can be retrieved, and the original series reconstructed by an inverse operation.

Magnitude or phase?

The alert reader will notice that these are magnitude response charts [Note 4], indicating how much the amplitude (value) of a function is magnified or attenuated at all frequencies of note.

On the X axes of Figures 6A and 6B, we don’t mark specific frequency values in Hertz or any other concrete frequency unit, as the charts are conceptual for now, just to give you a feel for the idea. Seeing that we are working with daily data in this example, and there are 60*60*24 = 86,400 seconds in a day, we would probably be looking at millihertz or microhertz frequency units, with 1 Hertz defined as one cycle per second.

In the charts, we show frequency on the horizontal axis as an Anglicized w (normally written as the lower case Greek omega, ω); we probably did this because we were lazy when making the slides and also didn’t want to bother the viewer with too many Greek symbols. This frequency w in the charts should not be confused with our code-defined w[.] filter weight array from earlier in this article… an unfortunate notational overlap, which we will correct in due time.

Phase / time shifts

Filters often shift values in time in addition to changing magnitudes at different frequencies (shifting where peaks and valleys occur, for example).

A simple example of phase shifting can be seen when applying the commonly used “simple moving average” filter that you often see on stock market charting sites. Since these filters take historical values and average them, it takes a while (in the time series stepping domain) for any spikes in the data to get reflected in the filtered value. When a spike occurs, the spike has to “overcome” the weight of the historical data in its sliding data window. Hence, spikes in the filtered data get shifted in time in the output to be later than where they actually happened. Not such a great thing for analysis, but that is reality [Note 2].

Vox 1: “Hey man… something unusual and possibly important happened.”

Vox 2: “Dude, no way!”

Vox 1: “Let’s move the data for that event to a different time! Yeah!”

Vox 2: “Okay! Wait… what?”

That’s what ordinary moving averages do, so let the filterer beware.

As we mentioned earlier in this article, a simple moving average filter is merely:

output(t) = A*P(t) + B*P(t-1) + C*P(t-2) + …

The number of terms = ntimesteps in the filter window, with A = B = C = … = 1/ntimesteps.

As we noted earlier, a simple moving average is close in form to our fracdiff filter, but with all coefficients the same (and often fewer coefficients or steps than our input time series; that is, we might have 1000 points but compute a 10 day moving average… 10 << 1000). So fracdiff is itself a moving average filter of sorts, but with the averaging window going all the way back to the start of the data series (only the more recent time end of the window is moving), and non-constant weights [Note 9]. In short, fracdiff will also shift important peaks in time.
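Here is a tiny numpy demo of that peak shift, on a synthetic bump rather than market data; a causal n-point simple moving average delays a symmetric peak by roughly (n-1)/2 steps:

```python
import numpy as np

t = np.arange(200)
x = np.exp(-0.5 * ((t - 50) / 5.0) ** 2)     # smooth peak centered at t = 50

n = 10
y = np.convolve(x, np.ones(n) / n)[:t.size]  # causal 10-step simple moving average

print(np.argmax(x), np.argmax(y))  # 50 versus ~54: the peak arrives about 4.5 steps late
```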

This shifting of peaks when a filter is applied is called the “phase response” of a filter, where a “phase shift” is related to a shift in time. It is not as simple as all that, as filters can shift different frequencies by different amounts, just as they can attenuate or magnify different frequencies by various amounts. But again, reality can be complicated, so: We try to deal with this complexity as necessary, but we will not dwell on phase or peak shifting in these examples.

Practitioners have most likely noticed that this happens, but may not know that there is a whole body of knowledge related to this concept (“phase shifting due to digital filters”). Here is a whole book chapter on the topic:

https://link.springer.com/chapter/10.1007/978-3-030-49256-4_11

As the link states, sometimes phase shifting is what you want to do. It is not always a side effect of filtering that annoys us.

As an illustration, we show a graphic from another site that discusses these moving average filters (different of course from our fracdiff filter that is our topic here), and we mark with blue arrows one example where a peak shift occurs:

Figure 7: Snapshot from finance.yahoo.com with a simple moving average filter on top of raw data. Note how peaks are shifted to the right (later in time) versus where they actually occurred. The highest peak’s shift is marked approximately. One can also see that the overall (smoothed) shape of the original data is shifted to the right.

Phase shifting due to filtering is a complicated topic in itself, and so we leave further details out of this introductory discussion; we just have to remember that such shifts do occur, manifesting back in our data’s time domain as shifts in time of peaks and valleys [and other intermediate peak-to-valley transition behavior] of the output versus the input.

A peek at an Excel example:

How might we apply fractional difference filtering to achieve better forecasts?

We provide an Excel sheet and macros to demonstrate 1) interactive use of the fracdiff filter, and 2) how we can adjust the fractional difference level to get varied out-of-sample results: Some better, some worse.

This sheet is similar in concept to our extensive explanation of using the Excel LINEST ordinary linear regression estimator for forecasting returns via prior return moments. So if you understand that, you will understand this.

In a future article, we will walk new users through this sheet more thoroughly, but here are the important parts in summary form if you want to experiment with the sheet now, in columns left to right:

Column B: Raw TSLA price data, most recent value on top.

(You can put your own price data in column B for other assets; just make sure that the most recent point is at the top; and if the data length is longer than we have here, you will need to modify the formulas that reference this column and fill in blank cells where needed.)

Column D: fracdiff computed by our own user defined macro (array function)

=fracdiff(B2:B1257,$T$2)

The difference level or order is set in T2. Set it to 0 to get the original data back (no filtering); set it to 1 to get the ordinary difference; set it to -1 to get a pure cumulative sum (equivalent to a definite integral operation in discrete time, as described in [Note 8]).

Column E: fracdiff inverse check (should get back values to match original price data column B, within some small tolerance); this serves as one check on our math in the macros. You can clear this column to speed up sheet evaluation, since it is mainly for a check.

Column F: Ordinary daily price difference computed the normal Excel formula way. If we set fracdiff order (cell T2) to 1, fracdiff column D should match this column F. This serves as another check on our fracdiff macro code.

Columns G thru K: 1 through 5 day lags of the fracdiff column for later regression model construction (described below), with mask flags in W11 to AA11 (for mask ordering, recall the issues with LINEST reversing coefficients versus input columns).

Column M: Linear AR(5) model evaluated (from the LINEST estimates discussed later in this section). These are in the same units as our fracdiff column D. This is just an ordinary least squares approximation to our data in column D, so column M values will typically not match column D values exactly, unless we set the number of points in the regression too small (and having the model match our data exactly is a bad thing; avoiding overfit is a goal of modeling).

Column N: Model estimates from Column M fractionally differenced additionally to get it to order 1 (= daily difference). In other words, if we had the main fractional order set to 0.8 in T2, this column would apply fracdiff again at an order of 0.2 to convert the estimated data to an ordinary daily difference. Ordinary daily difference is what we want to forecast in this sheet, so we have to get out of the somewhat non-intuitive fractional difference space and back to daily differences.

“I need to estimate what’s gonna happen tomorrow, not some weird fraction of tomorrow and portions of various yesterdays.”

— some day trader, probably

Note: When we convert back to daily differences for out-of-sample checks, it may be better to use the actual data for the in-sample (training) chunk of data, instead of using the in-sample model forecasts. We leave this for future enhancement.

Column O: Direction check of our model fits and forecasts versus reality, referring to the “converted back to” daily difference results in column N.

Column S: Two parameter settings followed by some outputs

Parameter settings:

S2: How far to push down the block of data to fit (that is, the training data) so that we can reserve some near-term data as out-of-sample test data. If we set this to 0, we fit to the most recent data we have, and there is no test data. Hence, test data metrics show as errors in this case. This is fine. We color code the data blocks sent to the regression with a conditional background color format to indicate this push-down, but we don’t bother to stop the color coding at the end of the regression window. This can be added later.

S14: Number of points to send to the LINEST regression.

Outputs:

S18: Of the withheld recent test points, how many of them gave a directionally correct result for daily change in price?

S21: Percent directionally correct in the test point window.

Column T

T2: Fractional difference order to use. 1 should yield ordinary differencing in column D. 0 should yield the price data itself in column D (pass-thru filter). You can go outside the [0,1] range for this value if you want. For this TSLA data, we see from experimenting that fracdiff levels above 1 might be apropos.

T8 and T9: R2 of fit (training data) in fractional difference space, and a different R2 of fit after converting the data back to daily difference space. Take care which columns are referenced in these: T8 references columns D and M, and T9 references columns F and N.

T10 and T11: Similar to above but for the withheld recent test data set (n points withheld given by S2).

Yellow rectangular range of cells surrounded by some annotations:

W2:AB6 are the LINEST linear estimator coefficient outputs for a 5 lag AR model, with manually maskable coefficients. For more info on how we set this block up, and the issue with coefficient reversals, check out our prior article, since the concepts are the same.

W9:AB9 are computed T scores for each variable, estimating the statistical significance of each lag. For abs(T) &lt; 1.5, the associated variable (annotated in row 1) is probably not significant and can be discarded from the model by setting its related mask to 0 (see our article on the importance of T scores):

https://medium.com/@nttp/return-forecasting-with-moments-part-3-the-importance-of-t-scores-6a766ee4cffd

This is less of an issue in the current model (versus our return moments study), since we are just trying to show the effects of fracdiff filtering on a simple model, rather than building a complete model.

W11:AA11 Variable masks. The user can manually mask out particular lags (if they show low T scores, or for any other reason) by setting one or more of these masks to 0. The constant coefficient is always in the model and is not controlled by a mask flag, but you can remove it by changing the penultimate (2nd to last) [Note 5] argument of LINEST in cell W2 to FALSE (then pressing ctrl-shift-enter to enter it into the sheet).

=LINEST(OFFSET(D2,$S$2,0,$S$14),OFFSET(G2:K2,$S$2,0,$S$14),TRUE,TRUE)

In this sheet, we allow you to move the regression window back in time and withhold more recent data to test, but we do not have an automated backtester built in, nor an automated variable eliminator as in our prior moments forecasting experiment. This is because our AR(5) model is built just to see if we can get better OOS results by adjusting the fracdiff order, and is probably leaving a lot of information “on the table” (e.g. it is leaving out the MA terms of an ARMA model, which often capture important behavior).

Excel goals

The point of this sheet is to let the user experiment interactively with the fracdiff level, along with the number of points sent to the linear regression, the number of points withheld for testing, and the particular lags masked out, to see whether tuning the fracdiff level gives better out-of-sample forecasts than standard differencing does, as measured by OOS R2 or OOS percent directionally correct. The baseline model can be made by setting T2 to 1, yielding ordinary differencing. Our final variable lag5 often shows as significant in our modeling of this particular TSLA data set, so there may be lags farther back than 5 days that also have significance. Since we provide the Excel file, you can add more lags to the model if you want, though it will require some “insert sheet column” operations and other adjustments.
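For readers who would rather poke at the same experiment outside of Excel, here is a hedged Python sketch of the sheet’s flow: fracdiff at order d, a 5-lag AR fit on a training window, re-conversion to daily differences, then a directional score on withheld points. The synthetic price series and parameter values are made up for illustration, and the cell and column references in the comments point back to the sheet description above:

```python
import numpy as np

def fracdiff(p, d):
    """Expanding-window fracdiff of series p (oldest point first), order d."""
    p = np.asarray(p, dtype=float)
    w = [1.0]
    for k in range(1, len(p)):
        w.append(-w[-1] * (d - k + 1) / k)
    w = np.asarray(w)
    return np.array([np.dot(w[:t + 1], p[t::-1]) for t in range(len(p))])

# Stand-in price series (substitute real prices here, oldest first).
rng = np.random.default_rng(7)
prices = 100.0 * np.exp(np.cumsum(rng.normal(0.0005, 0.02, 600)))

d, nlags, ntest = 0.8, 5, 60    # d plays the role of cell T2

f = fracdiff(prices, d)

# Lag matrix: predict f[t] from its previous nlags values (columns G..K).
y = f[nlags:]
X = np.column_stack([f[nlags - 1 - k : len(f) - 1 - k] for k in range(nlags)])

# Fit OLS (LINEST's job) on all but the last ntest rows (the S2 push-down).
A = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(A[:-ntest], y[:-ntest], rcond=None)
yhat = A @ beta

# Convert order-d estimates back to ordinary daily differences (column N) by
# applying fracdiff again at order 1 - d, since (1-B)^(1-d) (1-B)^d = (1-B)^1.
# (Start-of-data effects apply to this re-conversion too; see the column N note.)
daily_hat = fracdiff(yhat, 1.0 - d)

# Directional check on the withheld points (cells S18 / S21); np.diff aligns
# so that the last ntest entries match daily_hat's last ntest entries.
daily_actual = np.diff(prices)
hits = np.sign(daily_hat[-ntest:]) == np.sign(daily_actual[-ntest:])
print(f"{hits.mean():.1%} directionally correct out of sample")
```

On a pure random walk the directional score should hover near 50%; the interesting question, as with the sheet, is whether tuning d moves that number on real data.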

Excel macro overview

There are 2 macros of importance: findWeights_ffd (internal, not sheet callable) and fracdiff (sheet callable as =fracdiff(range, order)). In addition, there is one test macro to remind you how to make a user defined function that outputs a column array (which we needed to do for fracdiff), and one dummy macro defined as a subroutine (Sub in Visual Basic), because the View Macros option in Excel doesn’t seem to show functions; it only lists subroutines. Without this dummy macro, you might get a little lost trying to find the macro editor. We certainly did.

See cell D2, where this fracdiff formula is entered as an array formula all down the column range D2:D1257 (remember to press ctrl-shift-enter to update the formula if you want to change it).

=fracdiff(B2:B1257,$T$2)

This is our user defined function, not a built-in Excel function.

Note that this is an array formula, so you cannot merely copy the formula down column D, or it will only return the first output element of the result in every cell (and it will take a long time to re-compute the sheet, because fracdiff is compute-intensive even when running one iteration of it).

We have not runtime-optimized the fracdiff code; this could be done by using more specific variable types instead of the Visual Basic Variant type in the macros, and (probably more importantly), truncating the fracdiff weight array early as noted above along with possibly other Excel VB tricks.

Summary

From the GPU-related link in [Note 9]:

Integer Differencing Causes Unnecessary Memory Loss

We try to make the point in this article that this long-term memory loss is directly related to the over-attenuation of low frequency information from the source data when we use integer differencing. We use continuous time Fourier transform tables as an inspiration to try to figure out what type of filtering we are actually doing with this fracdiff operation, but acknowledge that an analyst may need to dig deeper into Discrete Fourier Transforms to get better estimates of what precisely is going on in the frequency domain with fracdiff.

Further reading

A nice write-up I just saw on our very own Medium platform, on this topic but in the continuous X or time domain. Caution: a lot of math is involved in this one. But go ahead, you can do it!

https://medium.com/intuition/what-exactly-is-a-fractional-derivative-14d450106221

Notes

[Note 1] https://www.stata.com/features/overview/arfima/

[Note 2] A trick that we learned from some old NASA filter code: if you are not so concerned with real-time operation and are just looking at a snapshot of a time series, you can run the same filter (at least for some filter types; this may not work for all) forwards and then backwards over the same series. The peaks first get shifted one way, then get re-shifted back to where they belong. Voila! Peaks of the filtered data are in the same place as in the raw data. Adjustments need to be made to the filter to get results equivalent to a single pass filter.
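This forward-backward trick is packaged in SciPy as scipy.signal.filtfilt. A minimal sketch on a synthetic noisy bump, using the 10-step simple moving average from earlier as the filter:

```python
import numpy as np
from scipy.signal import filtfilt

rng = np.random.default_rng(2)
t = np.arange(200)
x = np.exp(-0.5 * ((t - 50) / 5.0) ** 2) + 0.1 * rng.normal(size=t.size)

n = 10
b = np.ones(n) / n         # 10-step simple moving average as an FIR filter
y = filtfilt(b, [1.0], x)  # forward pass, then backward pass: zero net phase shift

print(np.argmax(x), np.argmax(y))  # both near 50: the smoothed peak stays put
```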

[Note 3] https://gretl.sourceforge.net/gretl-help/funcref.html#fracdiff

[Note 4] The Fourier transform of a non-trivial real-valued function (“real” in the mathematical sense; represented as a single floating point number in the computer) is a complex-valued function (x + iy, with i being the imaginary unit sqrt(-1); often notated x + jy in engineering, because i is traditionally reserved to represent electrical current). This complex value can instead be, and often is, represented as a magnitude (floating point) and a phase angle (floating point):

https://gubner.ece.wisc.edu/notes/MagnitudeAndPhaseOfComplexNumbers.pdf

This mag/phase equivalent representation assists in better understanding the effects of applying a Fourier transform, versus looking at the rectangular coordinates (x, iy). The above link contains more than you need to know. Just think of (mag, phase) as the polar coordinates of (x, y) and that will get you going.

[Note 5] My engineering colleague sometimes suggests using the words: ultimate, penultimate, parapenultimate, and anteparapenultimate (when apropos) for the final item in a list and the prior three items in that list, respectively. We find this amusing, yet keep such definitions contained in the safe magnetic bottle of a Note so that our readers can know of these terms, yet still allow us to maintain control over the narrative without it “going critical.” There are some of us who find it engaging to have a vocabulary lesson when doing engineering, and vice-versa.

Vox 1: “You cannot write anteparapenultimate; no one will understand you.”
Vox 2: “If they take care to learn English properly, they would.”
Vox 1: “Would what?”
Vox 2: “Understand me. I will not spoon feed.”

Point taken.

[Note 6] I found it interesting to learn that “artefact” is the British spelling of what us Yanks spell as “artifact.” Maybe you will too.

[Note 7] Every point older than the most recent point in the resultant fracdiff array also has this issue of being only a partial dot product of the weights with the input data points, if we compute a weight array the same length as the full input array. This becomes more of an issue as the difference level d moves down from 1 toward 0 and then into negative values (negative d implying a fractional sum), because the coefficients in the weight array are then non-trivial (far from zero) in absolute value.

[Note 8] In engineering and finance, the terms “integrator” and “integral” are often used in discussions in a “fast and loose” manner, which can be confusing at first for one who has studied symbol-based integration at university (Calc 2 type of thing).

What integration often means in these informal discussions involving data at discrete and often equal time steps is the definite integral, often from zero to a particular time (let’s say t):

definite integral from 0 to t of f(x) dx

At equal discrete time steps t0, t1, …, tN, this is equivalent to f(t0) + f(t1) + f(t2) + … + f(tN), because we assume a time step dx of 1. So the “little rectangles” that we sum up in an integral all have widths of 1.

That is, we just evaluate our function at each time step and then sum those results up, to a specified time step.

[Note 9] In practice, the coefficients of the fracdiff method decay to very close to zero for values of d closer to 1 (or higher) than to 0. Hence, one can truncate the vector dot product window after some number of steps, once the weight values get close enough to zero (within some tolerance), yielding an approximated fractional difference filter. Here is a nice article on this early truncation with respect to computing fracdiff on GPUs:

https://www.deeplearningwizard.com/machine_learning/gpu/gpu_fractional_differencing/

For computational efficiency, we want to stop the calculation of weights when the weights are too small (below a certain small predetermined small value).

(quote is from the above link)

We have the beginnings of this code (caution: not tested) for early truncation of the fracdiff calc in our Excel User Defined Function (VB macro code), but we have not exposed this to the spreadsheet function yet in this first release.
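For reference, the early truncation amounts to one extra stopping condition in the weight-generating loop. A minimal Python sketch (the tolerance value is arbitrary, and the function name is ours, not from the linked article or our macro):

```python
def fracdiff_weights_trunc(d, tol=1e-5, max_n=100_000):
    """Fracdiff weights, stopping once |w[k]| drops below tol (the Note 9 idea)."""
    w = [1.0]
    k = 1
    while k < max_n and abs(w[-1]) > tol:
        w.append(-w[-1] * (d - k + 1) / k)
        k += 1
    return w

print(len(fracdiff_weights_trunc(0.8)))  # roughly 200+ weights, versus a history of thousands
```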
