Uncovering dynamic information exchange between financial assets: transfer entropy introduction

Neuri Research (neuri-ai) · 9 min read · Sep 9, 2019

In financial markets, assets can be co-dependent and influence each other directly or indirectly, driven by a range of factors. These interactions may be nonlinear, unstable and long-ranged. Inter-asset relationships play a critical role in determining risk since, generally speaking, a portfolio of uncorrelated assets has a lower risk profile than a portfolio of correlated ones. Correlation measures, however, cannot fully capture these dependencies: they describe only linear relationships, and they do not specify the direction in which one asset influences another. In this blog post, we explore transfer entropy (TE) as a tool for identifying which parts of the market drive others, and analyse financial time series data using an effective network model built on multivariate transfer entropy.

Transfer entropy as a directional measure of information dynamics

Transfer entropy belongs to a wider framework for studying distributed intrinsic computation in complex systems [1]. A multivariate time series of states is treated as a complex system, and the next state of each variable is regarded as the output of a local computation within the system at a specific time. From the perspective of the target variable's dynamics, transfer entropy quantifies the information that the source's past adds to a prediction of the target's next state, once the target's own past has been taken into account. The transfer entropy thus quantifies the amount of information about the next observation Xₙ₊₁ of process 𝑋 that can be found in observation Yₙ of process 𝑌, in the context of the past state 𝑿ₙ(𝑘) = {𝑋ₙ₋ₖ₊₁, … , Xₙ₋₁, Xₙ}.

Transfer entropy from source Y to target X, with target history length k. (Figure source: [7])
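Written out as a formula, the average transfer entropy and its local counterpart take the following standard form. The notation follows [1] and [7]: lowercase letters denote realisations, x_{n+1} is the next observation of the target, and the sum runs over all observed combinations of values.

```latex
T_{Y \to X}(k)
  = I\bigl(Y_n ; X_{n+1} \mid \mathbf{X}_n^{(k)}\bigr)
  = \sum_{x_{n+1},\, \mathbf{x}_n^{(k)},\, y_n}
      p\bigl(x_{n+1}, \mathbf{x}_n^{(k)}, y_n\bigr)
      \log \frac{p\bigl(x_{n+1} \mid \mathbf{x}_n^{(k)}, y_n\bigr)}
                {p\bigl(x_{n+1} \mid \mathbf{x}_n^{(k)}\bigr)}

t_{Y \to X}(n+1, k)
  = \log \frac{p\bigl(x_{n+1} \mid \mathbf{x}_n^{(k)}, y_n\bigr)}
              {p\bigl(x_{n+1} \mid \mathbf{x}_n^{(k)}\bigr)},
\qquad
T_{Y \to X}(k) = \bigl\langle t_{Y \to X}(n+1, k) \bigr\rangle_n
```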

Conditioning on the target past does two things: it accounts for information that is redundant with the target's own history, and it allows the transfer entropy to include complementary information that the source and the target past provide together. The information transfer quantified by the transfer entropy is asymmetric in source and target. It can hence be used to detect such asymmetries in a system, and to distinguish between driving and responding elements [2].

While T(k) is the average information transfer from source to target that helps predict the next value in the context of some finite past, we can also define the local transfer entropy t(k) (as shown in the above figure): the information transferred from a given source to the target at one specific observation, relevant to predicting the immediate next value in the context of the past. This local information-theoretic measure is constructed by extracting the log term from the globally averaged measure, so the transfer entropy is the average of the local transfer entropy over all observations [3]. The local transfer entropy may be either positive or negative (with the source being informative or misinformative, respectively), and it traces the dynamics of information transfer in time.
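The relation between local and average transfer entropy can be illustrated with a minimal sketch for discrete data. It uses a simple plug-in (frequency-count) estimator, not the KSG estimator applied later in this post, and the function name, the synthetic series and the 10% noise level are assumptions made purely for illustration.

```python
from collections import Counter
from math import log2
import random


def local_transfer_entropy(source, target, k=1):
    """Plug-in local TE values (in bits) from source to target, history length k."""
    # One sample per time step: (next target value, target past state, source value)
    samples = [
        (target[n + 1], tuple(target[n - k + 1 : n + 1]), source[n])
        for n in range(k - 1, len(target) - 1)
    ]
    c_xpy = Counter(samples)                       # counts of (x_next, past, y)
    c_py = Counter((p, y) for _, p, y in samples)  # counts of (past, y)
    c_xp = Counter((x, p) for x, p, _ in samples)  # counts of (x_next, past)
    c_p = Counter(p for _, p, _ in samples)        # counts of past
    local = []
    for x, p, y in samples:
        with_source = c_xpy[(x, p, y)] / c_py[(p, y)]  # p(x_next | past, y)
        without_source = c_xp[(x, p)] / c_p[p]         # p(x_next | past)
        local.append(log2(with_source / without_source))
    return local


# Synthetic example: the target copies the source one step later, with 10% bit flips.
rng = random.Random(0)
source = [rng.randint(0, 1) for _ in range(5000)]
target = [0] + [y ^ (rng.random() < 0.1) for y in source[:-1]]

local = local_transfer_entropy(source, target, k=1)
avg_forward = sum(local) / len(local)   # average TE, source -> target
reverse = local_transfer_entropy(target, source, k=1)
avg_reverse = sum(reverse) / len(reverse)  # average TE, target -> source

print(f"min local TE: {min(local):.3f} bits, max local TE: {max(local):.3f} bits")
print(f"average TE source->target: {avg_forward:.3f} bits")
print(f"average TE target->source: {avg_reverse:.3f} bits")
```

Local values are negative at the time steps where the noise flips the copied bit (the source misinforms about the next target value), while the average over all samples stays non-negative and is clearly larger in the driving direction than in the reverse one.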

Pairwise transfer entropy can be used to understand statistical coherence between different time series in finance. It has previously been applied to financial time series to examine interactions between large markets, such as between the Dow Jones and DAX stock indices [4], and to analyse information flow between individual stocks, for example stocks on the NYSE [5] or the companies with the largest market capitalization worldwide [6].

Conditional transfer entropy, multivariate transfer entropy and effective network inference

In this post we use a multivariate source approach, which goes beyond the pairwise approach of considering a single source at a time.

To define conditional transfer entropy, we consider how much information about the next observation Xₙ₊₁ of process X can be found in observation Yₙ of process Y, in the context of both the past state 𝑿ₙ(𝑘) = {𝑋ₙ₋ₖ₊₁, … , Xₙ₋₁, Xₙ} and observation Zₙ of process Z. Mathematically, conditional transfer entropy has the same form as the transfer entropy, except that it additionally conditions on Z.
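In the same notation as the pairwise case, the conditional transfer entropy can be written as a conditional mutual information. This is the standard form from [1] and [7], with z_n added to the conditioning set in both numerator and denominator:

```latex
T_{Y \to X \mid Z}(k)
  = I\bigl(Y_n ; X_{n+1} \mid \mathbf{X}_n^{(k)}, Z_n\bigr)
  = \sum_{x_{n+1},\, \mathbf{x}_n^{(k)},\, y_n,\, z_n}
      p\bigl(x_{n+1}, \mathbf{x}_n^{(k)}, y_n, z_n\bigr)
      \log \frac{p\bigl(x_{n+1} \mid \mathbf{x}_n^{(k)}, y_n, z_n\bigr)}
                {p\bigl(x_{n+1} \mid \mathbf{x}_n^{(k)}, z_n\bigr)}
```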

The additional conditioning removes redundancies between the source Y and the conditional Z, such as common driver effects (Z being a common driver of X and Y) or pathway effects (such as a transfer from Y to X via Z). The conditional transfer entropy also includes synergies between Y and Z, such as the synergistic transfer involved in an XOR operation Xₜ = Yₜ₋₁ XOR Zₜ₋₁, which pairwise transfer entropy measures would not capture.
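The XOR case can be checked numerically. The sketch below is an illustration under assumptions: it uses a plug-in estimator for discrete data, a target history length of k = 1, and synthetic random bits; the helper name `conditional_mi` is ours and is not part of any toolkit.

```python
from collections import Counter
from math import log2
import random


def conditional_mi(a, b, cond):
    """Plug-in estimate of I(A; B | C) in bits for equally long discrete sequences."""
    n = len(a)
    c_abc = Counter(zip(a, b, cond))
    c_ac = Counter(zip(a, cond))
    c_bc = Counter(zip(b, cond))
    c_c = Counter(cond)
    return sum(
        count / n * log2(count * c_c[c] / (c_ac[(x, c)] * c_bc[(y, c)]))
        for (x, y, c), count in c_abc.items()
    )


# Synthetic system: X_t = Y_{t-1} XOR Z_{t-1}, with Y and Z independent fair bits.
rng = random.Random(1)
T = 4000
y = [rng.randint(0, 1) for _ in range(T)]
z = [rng.randint(0, 1) for _ in range(T)]
x = [0] + [y[t] ^ z[t] for t in range(T - 1)]

x_next, x_past = x[1:], x[:-1]   # next target value and its past (k = 1)
y_src, z_src = y[:-1], z[:-1]    # source values one step earlier

# Pairwise TE Y -> X: condition on the target past only.
pairwise = conditional_mi(y_src, x_next, x_past)
# Conditional TE Y -> X | Z: condition on the target past and on Z.
conditional = conditional_mi(y_src, x_next, list(zip(x_past, z_src)))

print(f"pairwise TE Y->X:        {pairwise:.4f} bits")
print(f"conditional TE Y->X | Z: {conditional:.4f} bits")
```

On its own, Y says essentially nothing about the next value of X, so the pairwise estimate is close to zero. Once Z is known, Y determines the next value completely, and the conditional estimate is close to one bit.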

Modelling information processing in X, considering two sources (Figure source: [7])

The information in the next state of the destination X decomposes into a sum (as described in the figure above) of three parts: the information stored in its own past, the incremental contribution of each source in the set of causal information contributors to X, and the uncertainty that remains after all these sources have been considered.
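For the two-source case this decomposition is an instance of the chain rule for mutual information. In the notation of [1] and [7], with A_X(k) denoting the information stored in the target's past and U the remaining uncertainty (both labels are added here for readability), it reads:

```latex
H(X_{n+1})
  = \underbrace{I\bigl(\mathbf{X}_n^{(k)} ; X_{n+1}\bigr)}_{A_X(k):\ \text{stored information}}
  + \underbrace{I\bigl(Y_n ; X_{n+1} \mid \mathbf{X}_n^{(k)}\bigr)}_{T_{Y \to X}(k)}
  + \underbrace{I\bigl(Z_n ; X_{n+1} \mid \mathbf{X}_n^{(k)}, Y_n\bigr)}_{T_{Z \to X \mid Y}(k)}
  + \underbrace{H\bigl(X_{n+1} \mid \mathbf{X}_n^{(k)}, Y_n, Z_n\bigr)}_{U:\ \text{remaining uncertainty}}
```

The order in which the sources are added is arbitrary: swapping Y and Z gives T_{Z→X}(k) + T_{Y→X|Z}(k) for the two middle terms, with the same total.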

Given the time series for each of a set of variables, we seek to represent the inferred directed relationships between these variables with a minimal network model for the time series of the nodes. This approach of effective network inference can reflect dynamic changes in the regime of the system. Iterative or greedy approaches based on conditional transfer entropy can capture synergies and remove unwanted redundancies, while still capturing non-linear interactions. In our task of effective network analysis, the goal is to identify the set of source variables for X that maximizes the collective transfer entropy from that set to X, subject to the condition that each source in the set incrementally adds information about X in a statistically significant fashion. In the example with two sources (as in the figure above), our goal is to infer the parent set {Y, Z} for X.

We now make use of an open-source package, the IDTxl toolkit [8], which enables this kind of information dynamics analysis. IDTxl implements a greedy approach to inferring the parent set, using the following steps:

  1. Embed the target past.
  2. Set the parent set P to the empty set.
  3. Evaluate TE (source -> target | P), i.e. the TE from each remaining source to the target, conditioned on P.
  4. Add the source with the maximum conditional transfer entropy to P, if it is statistically significant.
  5. Go to step 3 if a new parent was added, else go to step 6.
  6. Prune redundant links in the final parent set.
  7. Perform a statistical test on the entire parent set.

General non-uniform source and target embedding depicted for effective network inference while considering a single target. The max time horizon is also known as the max lag window in the context of IDTxl, and may be different for the source and target. (Figure source: [7])
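The greedy selection loop at the core of this procedure (evaluate, add the best significant source, repeat) can be sketched in a few lines. This is a simplified illustration and not IDTxl's implementation: it assumes discrete data, a plug-in estimator, a single lag of one step for sources and target, and a plain permutation test on the best candidate only; it omits the pruning step and the final test on the whole parent set. All function and variable names are hypothetical.

```python
from collections import Counter
from math import log2
import random


def cmi(a, b, cond):
    """Plug-in estimate of I(A; B | C) in bits for discrete sequences."""
    n = len(a)
    c_abc, c_c = Counter(zip(a, b, cond)), Counter(cond)
    c_ac, c_bc = Counter(zip(a, cond)), Counter(zip(b, cond))
    return sum(
        count / n * log2(count * c_c[c] / (c_ac[(x, c)] * c_bc[(y, c)]))
        for (x, y, c), count in c_abc.items()
    )


def greedy_parents(sources, target, alpha=0.05, n_perm=100, seed=0):
    """Greedily select sources that add significant information about the target."""
    rng = random.Random(seed)
    x_next, x_past = target[1:], target[:-1]            # embed target past (k = 1)
    lagged = {name: list(s[:-1]) for name, s in sources.items()}
    parents = []                                         # parent set P starts empty
    while len(parents) < len(lagged):
        # Condition on the target past and on all parents selected so far.
        cond = list(zip(x_past, *(lagged[p] for p in parents)))
        scores = {
            name: cmi(series, x_next, cond)
            for name, series in lagged.items()
            if name not in parents
        }
        best = max(scores, key=scores.get)
        # Permutation test: shuffling the candidate destroys its link to the target.
        surrogate = list(lagged[best])
        exceed = 0
        for _ in range(n_perm):
            rng.shuffle(surrogate)
            if cmi(surrogate, x_next, cond) >= scores[best]:
                exceed += 1
        p_value = (exceed + 1) / (n_perm + 1)
        if p_value >= alpha:
            break                                        # no significant candidate left
        parents.append(best)
    return parents


# Synthetic example: X_t = Y_{t-1} OR Z_{t-1}; W is unrelated noise.
rng = random.Random(2)
T = 2000
y = [rng.randint(0, 1) for _ in range(T)]
z = [rng.randint(0, 1) for _ in range(T)]
w = [rng.randint(0, 1) for _ in range(T)]
x = [0] + [y[t] | z[t] for t in range(T - 1)]

parents = greedy_parents({"Y": y, "Z": z, "W": w}, x)
print("inferred parents of X:", sorted(parents))
```

A purely greedy search of this kind needs each selected source to be informative given the parents chosen before it. A purely synergistic relationship such as the XOR example would not be picked up in the first iteration of this simplified loop, because neither source is informative on its own.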

Multivariate transfer entropy analysis on selected US Equity data

Let’s consider the following set of 20 stocks included in the S&P 500 index, from the NYSE (.N) and NASDAQ (.OQ) exchanges: Adobe Systems Incorporated (ADBE.OQ), Air Products and Chemicals Inc (APD.N), CME Group Inc (CME.OQ), ConocoPhillips (COP.N), Walt Disney Company (The) (DIS.N), Duke Energy Corporation (DUK.N), Intuit Inc (INTU.OQ), Mosaic Company (The) (MOS.N), Microsoft Corporation (MSFT.OQ), NRG Energy Inc (NRG.N), ONEOK Inc (OKE.N), O’Reilly Automotive Inc (ORLY.OQ), Red Hat Inc (RHT.N), Starbucks Corporation (SBUX.OQ), Sherwin-Williams Company (The) (SHW.N), Sempra Energy (SRE.N), Union Pacific Corporation (UNP.N), VeriSign Inc (VRSN.OQ), Xilinx Inc (XLNX.OQ), Exxon Mobil Corporation (XOM.N).

The data used in this analysis is tick-level (trade-by-trade) data for these U.S. equities, aggregated at an hourly frequency, from Jul 2008 to Dec 2018. The TE is estimated using the model-free KSG (Kraskov) estimator for non-linear continuous data. We use the log-returns of the closing prices, defined by Rₜ = ln(Pₜ) − ln(Pₜ₋₁), where Pₜ is the closing price of the stock at time t and Pₜ₋₁ is the closing price of the same stock at time t−1. For non-stationary processes, TE can in principle be estimated over an ensemble of realisations, but this is not possible for financial time series, since there is only one recorded price history for a given share.
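As a small worked example of the log-return definition, the sketch below converts a list of closing prices into log-returns. The prices are made-up numbers for illustration, not data from this analysis.

```python
from math import log


def log_returns(prices):
    """R_t = ln(P_t) - ln(P_{t-1}) for consecutive closing prices."""
    return [log(p_now) - log(p_prev) for p_prev, p_now in zip(prices, prices[1:])]


# Hypothetical hourly closing prices.
closing_prices = [100.0, 110.0, 99.0]
returns = log_returns(closing_prices)
print(returns)
```

A convenient property of log-returns is that they add up over time: the sum of the two returns above equals ln(99/100), the log-return over the whole period.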

Choosing the size of the time window over which local transfer entropy values are averaged involves a trade-off: the window should be small enough that the statistics are approximately stationary within it, but not so small that the TE estimate becomes unreliable. In addition, the averaging window is kept large enough that the TE values obtained by averaging the local TE values over the window are positive.

In this example the maximum time horizon, or the maximum lag window over source and target, is fixed at 10 time steps. The statistical tests use a significance level (critical p-value) of 0.005.

Local transfer entropy (changing over time) for the set of assets. Note that local transfer entropy can take on negative as well as positive values (informative flow vs misinformative flow), indicating local dynamics of information transfer.

Averaging the local transfer entropy values, using a cumulative time window starting from 2009 until 2018, can enable us to see the dynamic interactions between the assets alongside the evolution of the financial market, with the information flow between nodes given by average TE.
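The averaging step can be sketched as follows, assuming that "cumulative window" means a window with a fixed start whose end point moves forward in time. The function name, the `min_window` parameter and the sample values are hypothetical.

```python
def cumulative_average(local_te, min_window):
    """Average local TE over windows [0, end), for every end >= min_window."""
    averages = []
    running_sum = 0.0
    for count, value in enumerate(local_te, start=1):
        running_sum += value
        if count >= min_window:  # skip windows too short for a reliable average
            averages.append(running_sum / count)
    return averages


# Made-up local TE values for one source-target pair (not results from this post).
local_values = [0.5, -0.25, 0.25, 0.5]
window_averages = cumulative_average(local_values, min_window=2)
print(window_averages)
```

Each entry of `window_averages` is the average TE over a progressively longer window, so individual negative local values are absorbed as the window grows.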

TE values (averaged over a cumulative moving window) above a threshold of 0.015 are plotted as a directed graph, with the values changing over time. The size of the shaded purple circle around the asset name indicates the total degree (in + out) of the asset. The darker the color of an edge, the stronger the interaction in terms of average TE. The direction of information flow is shown by the direction of the arrow.
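The construction of such a graph from average TE values can be sketched as a threshold filter followed by a degree count. The asset labels and TE values below are placeholders, not results from this analysis, and the function name is hypothetical.

```python
def build_te_graph(average_te, threshold=0.015):
    """Keep directed edges whose average TE exceeds the threshold; count total degree."""
    edges = {pair: te for pair, te in average_te.items() if te > threshold}
    degree = {}
    for source, target in edges:
        degree[source] = degree.get(source, 0) + 1  # outgoing edge
        degree[target] = degree.get(target, 0) + 1  # incoming edge
    return edges, degree


# Placeholder values keyed by (source, target).
average_te = {
    ("A", "B"): 0.030,
    ("B", "C"): 0.016,
    ("C", "A"): 0.010,  # below the threshold, so no edge is drawn
}
edges, degree = build_te_graph(average_te)
print(edges)
print(degree)
```

In a plot, `degree` would control the size of the circle around each asset and the edge value would control the edge color.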

Across the whole time span, we see persistent source -> target information transfer in the directions SRE.N -> NRG.N and XLNX.OQ -> DIS.N, among others. For the first pair, it is interesting to note that both Sempra Energy and NRG belong to the energy sector and, additionally, have a common investment management firm associated with them. However, the directed flow between Xilinx and Walt Disney, which belong to very different sectors, is surprising. We also note from the analysis above that, as the size of the time window over which we average increases, the mean and maximum of the transfer entropy values show a steadily decreasing trend from 2009 to 2013, after which they stabilize. Using a moving window of variable length over time gives an insight into the evolving dynamics of the relationships among the assets.

Calculating multivariate TE is computationally very expensive, especially as the number of assets increases. Here we handled this with parallel processing (check out SCOOP [10]): the network analysis is split by target, and the per-target analyses are assigned to workers over a GPU cluster. With the GPU estimator, the number of TE calculations scales cubically with the number of assets in the worst case of a fully connected network, and each calculation scales as O(N²), where N is the number of samples in the time series.
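Because the parent set of each target is inferred independently of the other targets, the work splits naturally into one job per target. The sketch below shows this map-style pattern using Python's standard `concurrent.futures` module as a stand-in; it is not the SCOOP or GPU setup used for this analysis, and `infer_parents` is a hypothetical placeholder for the real per-target inference.

```python
from concurrent.futures import ThreadPoolExecutor


def infer_parents(target, assets):
    """Placeholder per-target job: a real job would run the parent-set inference."""
    candidates = [asset for asset in assets if asset != target]
    return target, candidates  # here we simply return every candidate source


def analyse_network(assets, max_workers=4):
    """Run one independent job per target and collect the results by target."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        jobs = pool.map(lambda target: infer_parents(target, assets), assets)
        return dict(jobs)


network = analyse_network(["A", "B", "C"])
print(network)
```

For CPU-bound estimators, a process pool or distributed workers would take the place of the thread pool; the structure of one job per target stays the same.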

Performing a similar analysis on a larger asset universe could potentially reveal clusters of assets that strongly influence each other as market conditions change, as well as other clusters that are relatively insulated in terms of information flow. This ability of transfer entropy to identify sources and sinks of information transfer between entities in a multi-asset financial universe can make it an important tool for portfolio selection. We will delve into this further in a later article.

About the author

Arun is a research engineer at Neuri Pte Ltd.

References:

  1. Bossomaier, T., Barnett, L., Harré, M., Lizier, J.T., An Introduction to Transfer Entropy: Information Flow in Complex Systems, Springer, 2016
  2. Kwon, O. and Oh, G. (2012), Asymmetric Information Flow between Market Index and Individual Stocks in Several Stock Markets; Europhysics Letters, 97(2), 28007–28007
  3. Joseph T. Lizier, Mikhail Prokopenko, and Albert Y. Zomaya (2008), Local information transfer as a spatiotemporal filter for complex systems, Phys. Rev. E 77, 026110
  4. R. Marschinski, H. Kantz (2002), Analysing the information flow between financial time series, European Physical Journal B 30, 275–281
  5. Seung Ki Baek, Woo-Sung Jung, Okyu Kwon, Hie-Tae Moon (2005), Transfer Entropy Analysis of the Stock Market, arXiv:physics/0509014v2
  6. L Sandoval (2014), Structure of a global network of financial companies based on transfer entropy, Entropy 16 (8), 4443–4482
  7. https://github.com/jlizier/jidt/blob/master/course/
  8. P. Wollstadt, J. T. Lizier, R. Vicente, C. Finn, M. Martinez-Zarzuela, P. Mediano, L. Novelli, M. Wibral (2018). IDTxl: The Information Dynamics Toolkit xl: a Python package for the efficient analysis of multivariate information dynamics in networks. ArXiv preprint: https://arxiv.org/abs/1807.10459.
  9. Joseph T. Lizier and Mikail Rubinov. Multivariate construction of effective computational networks from observational data. Max Planck Institute: Preprint, Preprint no. 25, 2012
  10. Hold-Geoffroy, Yannick and Gagnon, Olivier and Parizeau, Marc (2014), Once you SCOOP, no need to fork, Proceedings of the 2014 Annual Conference on Extreme Science and Engineering Discovery Environment, Article No. 60, ACM
