Using Information Theory to uncover nonlinear relationships between cryptocurrencies and sentiment

Nov 26, 2021

Reference: Keskin Z, Aste T. 2020, Information-theoretic measures for nonlinear causality detection: application to social media sentiment and cryptocurrency prices. R. Soc. Open Sci. 7: 200863, https://arxiv.org/abs/1906.05740.

Causality due to sentiment information propagation

In financial time-series analysis, we can think about relationships as being based on a certain network structure and dynamics within this network. There is a multitude of linear and nonlinear methodologies to uncover these relationships: regressions, information theory, dynamic systems, etc. Today we will look at a point of contact between regression-based and information-theoretic approaches to linear causality, and one potential extension to a nonlinear case.

Causality in price discovery is a fascinating topic both from the academic point of view, for understanding information propagation, and from a financial practitioner’s perspective, for risk and portfolio management.

In the broader literature on financial price discovery, the causality between sentiment and returns is widely studied and is of interest to those trading and investing based on market dynamics. In this week’s paper review, we narrow this down and look at these market dynamics in information-theoretic terms. If you want a primer on the topic, check out this post on the Thermodynamics of Financial Markets, as well as this other one on Transfer Entropy in Financial Assets.

Transfer Entropy as a causality measure

In the reviewed paper, the concept of Transfer Entropy (TE) is used to study the flow of information in digital asset markets. The authors look at both a parametric view of linear information transfer and a non-parametric, information-theoretic version for non-linear causality. The paper answers two questions: does knowledge of sentiment reduce uncertainty in explaining returns? And does taking non-linearity into account have any impact? To explain what non-linearity means, I’ll use a very clear description from the authors: “If the response in the dependent series scales as a linear multiple of the driving signal, this relationship is described as a linear coupling. If instead, the response follows some other function of the signal, the relationship is nonlinear.” Why are we interested in linear vs non-linear? Because if we assume linear dynamics in price series, they are easier to model, but we limit ourselves to a more restricted class of market dynamic behavior (i.e. we get a much more limited picture).

But what exactly is TE? Through the conditional mutual information, TE measures the extent to which a driving variable and its past values reduce uncertainty in a dependent variable. It’s a model-free statistic, which is great especially when we don’t know what the variable distribution looks like. The reason to use TE is that the authors want to uncover the bi-directional effects of cryptocurrency sentiment on returns. Note that in the whole paper “causality” means “statistical causality” in the Granger Causality (GC) sense, i.e. we are answering the question of whether a certain driving time-series and its lagged values are useful in forecasting another, dependent, time-series, and not whether one is causing the other.
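For readers who prefer a formula, here is one common way to write TE as a conditional mutual information. The notation is my own assumption rather than the paper’s: X is the driving series, Y the dependent series, k the number of lags, and a superscript (k) denotes the last k values of a series.

```latex
TE_{X \to Y}
  = I\left(Y_t \,;\, X_{t-1}^{(k)} \,\middle|\, Y_{t-1}^{(k)}\right)
  = H\left(Y_t \,\middle|\, Y_{t-1}^{(k)}\right)
  - H\left(Y_t \,\middle|\, Y_{t-1}^{(k)},\, X_{t-1}^{(k)}\right)
```

Read it as: how much does the uncertainty (conditional entropy H) about the next value of Y drop once the past of X is added to the past of Y?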

In other words, we are looking for directional (and possibly mutual) statistical causality between two or more coupled variables.

For the linear parametric case, the authors use a VAR (Vector Auto-Regressive) model. Note that there are non-parametric and non-linear versions of GC, and as a further research direction, it would be interesting to see a comparison between those and the non-parametric information-theoretic measures presented here. Another interesting extension of the VAR model, still parametric, is VECM, which helps to separate short- and long-term dynamics of the price transmission mechanisms, and which can be either linear or non-linear based on the linearity of the error correction term. Again, the paper is a great source of information and can lead to lots of interesting additional questions.

Connecting variance and uncertainty

In the linear GC case, causality is detected by modeling the autoregressive data generating process for the dependent variable with and without the driving variable. If the residual variances of these two models are statistically different, then there’s a linear statistical causality between the two variables. Earlier I mentioned that we would talk about a point of contact between the parametric regression-based and the non-parametric information-theoretic approaches to linear causality:

In the case of variables following a Gaussian distribution, TE and GC are actually equivalent (up to a constant factor), and both reduce to a logarithm of the ratio of residual variances.
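To make the Gaussian case concrete, here is a minimal sketch of a linear TE computed from two least-squares fits: one predicting the dependent series from its own past, and one that also includes the past of the driver. The function names, the lag of 1, and the synthetic coupling coefficients are all my own assumptions for illustration; this is not the authors’ PyCausality code.

```python
import numpy as np

def lagged_matrix(series, lag):
    """Columns are series[t-1], ..., series[t-lag] for t = lag .. n-1."""
    n = len(series)
    return np.column_stack([series[lag - k: n - k] for k in range(1, lag + 1)])

def residual_variance(target, regressors):
    """Variance of the OLS residuals of target on regressors plus an intercept."""
    design = np.column_stack([np.ones(len(target)), regressors])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return np.var(target - design @ coef)

def linear_te(driver, dependent, lag=1):
    """Gaussian (linear) transfer entropy driver -> dependent, in nats."""
    driver = np.asarray(driver, dtype=float)
    dependent = np.asarray(dependent, dtype=float)
    y_now = dependent[lag:]
    y_past = lagged_matrix(dependent, lag)
    x_past = lagged_matrix(driver, lag)
    var_restricted = residual_variance(y_now, y_past)  # own past only
    var_full = residual_variance(y_now, np.column_stack([y_past, x_past]))
    return 0.5 * np.log(var_restricted / var_full)

# Hypothetical synthetic example: x drives y with a one-step lag.
rng = np.random.default_rng(0)
n = 5000
x = rng.standard_normal(n)
noise = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + noise[t]

te_xy = linear_te(x, y)  # information flow in the true direction
te_yx = linear_te(y, x)  # reverse direction, should be close to zero
print(te_xy, te_yx)
```

The restricted and full models are nested, so the ratio inside the logarithm is at least one in-sample and the estimate is non-negative up to rounding. A Granger-style test would compare the same two residual variances through an F-statistic.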

Next, the authors extend the causality to the non-linear case by considering uncertainty rather than just variability. In finance we usually conflate these two terms because, as mentioned, they can be represented by the same measure in the case of the Gaussian distribution. So what’s the difference between the two? Variability is quantified by the spread of the distribution of observed values of a certain quantity (think about standard deviation). In simple terms, there are multiple instances of some quantity, and these instances can take different values depending on the conditions under which they are observed. Uncertainty, meanwhile, is represented by a variable’s entropy, which depends on the whole probability distribution, i.e. on the information we have about how likely each outcome is. This is the non-parametric case, where we don’t have the true variance.

The authors represent the nonlinear TE through a non-parametric estimation of the probability density function; I’ll leave the details to the paper linked at the top. In brief, they use a multidimensional histogram approach to estimate the density functions. They also highlight drawbacks due to the bias-variance trade-off in the choice of bin size: too fine a partition with a limited sample size introduces bias, because a finer-grained partition appears to carry more information and thus yields higher entropy values. To mitigate this problem, the authors apply a quantization method.

Next, as TE in absolute terms has limited interpretability for comparisons, the authors analyze Effective TE (ETE), i.e. the TE value normalized against an average TE calculated on shuffled time-series. Shuffling the data eliminates causality in the statistical (Granger) sense, because it breaks the temporal link between the driving and the dependent variables.
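The histogram idea can be sketched in a few lines. The version below is a simplified illustration under my own assumptions, not the paper’s implementation: a single lag, equal-frequency (quantile) bins as the quantization step, a plug-in entropy estimate, and “effective” taken to mean TE minus the average TE over shuffled drivers, which is one common convention and may differ from the paper’s exact normalization. All function names and parameter values are hypothetical.

```python
import numpy as np

def quantile_bins(series, n_bins):
    """Map each value to an equal-frequency bin index in 0 .. n_bins-1."""
    edges = np.quantile(series, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.searchsorted(edges, series, side="right")

def entropy(*columns):
    """Plug-in Shannon entropy (nats) of the joint histogram of the columns."""
    joint = np.column_stack(columns)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def histogram_te(driver, dependent, n_bins=4):
    """Lag-1 transfer entropy driver -> dependent from binned data."""
    x = quantile_bins(np.asarray(driver, dtype=float), n_bins)
    y = quantile_bins(np.asarray(dependent, dtype=float), n_bins)
    y_now, y_past, x_past = y[1:], y[:-1], x[:-1]
    # TE = H(Y_t | Y_past) - H(Y_t | Y_past, X_past), written with joint entropies
    return (entropy(y_now, y_past) - entropy(y_past)
            - entropy(y_now, y_past, x_past) + entropy(y_past, x_past))

def effective_te(driver, dependent, n_bins=4, n_shuffles=50, seed=0):
    """TE minus the mean TE obtained after shuffling the driver in time."""
    rng = np.random.default_rng(seed)
    te = histogram_te(driver, dependent, n_bins)
    shuffled = [histogram_te(rng.permutation(driver), dependent, n_bins)
                for _ in range(n_shuffles)]
    return te - np.mean(shuffled)

# Hypothetical nonlinear coupling: y responds to the square of lagged x,
# so the linear correlation between y[t] and x[t-1] is close to zero.
rng = np.random.default_rng(1)
n = 4000
x = rng.standard_normal(n)
y = 0.5 * rng.standard_normal(n)
y[1:] += x[:-1] ** 2

ete_xy = effective_te(x, y)  # true direction
ete_yx = effective_te(y, x)  # reverse direction, should be near zero
print(ete_xy, ete_yx)
```

Subtracting the shuffled average removes most of the small-sample bias that a finite histogram adds to every TE estimate, which is why the reverse direction lands near zero rather than at a small positive value.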

To validate their technique, the authors use synthetic data, showing that both the linear and nonlinear information-theoretic approaches are able to detect synthetic causal relationships.

Data and code

The price data is a collection of hourly prices for four cryptocurrencies (BTC, ETH, XRP, LTC) from August 2014 to September 2018 (source). The data is then aggregated in rolling windows of 24 months, rolled forward by 1 month at a time. Sentiment data is extracted with NLP on tweets aggregated over the hour preceding the price sampling and mapped to the cryptos through certain tags. And the code (thank you, authors!): https://github.com/ZacKeskin/PyCausality
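The windowing scheme is easy to reproduce. The sketch below builds 24-month windows that advance by one month over an hourly timestamp grid. The exact start and end days, the alignment of windows to calendar-month starts, and the function name are my assumptions; the paper only states the months covered and the window and step lengths.

```python
import numpy as np

def rolling_month_windows(timestamps, window_months=24, step_months=1):
    """Return (start, end, mask) triples for windows [start, end) on month starts."""
    stamps = np.asarray(timestamps, dtype="datetime64[h]")
    start = stamps.min().astype("datetime64[M]")  # first calendar month in the data
    last = stamps.max()
    width = np.timedelta64(window_months, "M")
    step = np.timedelta64(step_months, "M")
    windows = []
    # Keep only windows that fit entirely inside the sample.
    while (start + width).astype("datetime64[h]") <= last:
        end = start + width
        mask = ((stamps >= start.astype("datetime64[h]"))
                & (stamps < end.astype("datetime64[h]")))
        windows.append((start, end, mask))
        start = start + step
    return windows

# Hypothetical hourly grid spanning August 2014 to the start of September 2018.
hours = np.arange("2014-08-01T00", "2018-09-01T01", dtype="datetime64[h]")
windows = rolling_month_windows(hours)
for start, end, mask in windows[:3]:
    print(start, end, int(mask.sum()))
```

Each mask can then be used to slice the aligned return and sentiment arrays before computing TE on that window, which gives one causality estimate per month.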

Results

The authors detect significant information transfer from sentiment to price for XRP and LTC, and from price to sentiment for BTC and ETH. That by itself is already a very interesting result and helps us understand where sentiment is the driving force. There’s also evidence that non-linear causality is stronger than linear causality. Why is that? A simple answer would be that return series are not Gaussian, so the equivalence between Granger (prediction-based) and information-theoretic (divergence-based) causalities falls apart. But as we’ve seen in our previous paper review (on an overlapping dataset), crypto returns seem to be more Gaussian than traditional asset returns. So what explains that result? I don’t know, but I’ll share our findings once we get to it. It would have helped to have some distributional analysis of the dataset used (with some stationarity considerations thrown in). Feel free to send me a note if you have a hunch and want to explore this together.

Going a bit deeper into the results, for BTC there seems to be a strong causality between sentiment and returns in both directions, with net information transfer greater from returns to sentiment. For LTC, there’s a similar result in terms of causality strength but the opposite in terms of direction (i.e. sentiment seems to drive LTC returns). LTC also has the strongest causality among all the cryptocurrencies analyzed here, and it has a significant linear component in the information transfer. XRP is more geared towards sentiment statistically causing returns. This sentiment-to-price and price-to-sentiment impact grows over time, becoming much stronger in the last part of the dataset (2018), especially for LTC. Interestingly, for ETH the opposite is true, with the causality significance dropping in the latter part of the dataset. Why? The question remains unanswered, but it might show that ETH is not just a BTC proxy and that it is driven by its own factors, which seem to depend less on sentiment.

For practitioners, the authors don’t offer an estimate of the causality strength, though they suggest that the TE significance levels might be used as a proxy for it. Note also that the discovered causality relationships are dynamic and change over time, so they need to be monitored if such measures are used for trading and portfolio selection.

What’s next?

The paper shows some interesting results and leads to lots of questions. On our side, we are interested in looking at the mutual information, entropy, and transfer entropy measures to assess causality in risk and volatility, and at how to use them for portfolio construction.

Further research questions inspired by this paper:

We would also like to see a comparison of this paper’s results with the non-parametric and non-linear versions of GC, which would be an interesting MSc thesis topic.
Another question we’d like to look into is how we can apply the same causality considerations to a further decomposition of returns into risk factors.

Disclaimer

Not financial advice, solicitation, or sale of any investment product. The information provided to you is for illustrative purposes and is not binding on Cloudwall Capital. This does not constitute financial advice or form any recommendation, or solicitation to purchase any financial product. The information should not be relied upon as a replacement for your financial advisor. You should seek advice from your independent financial advisor at all times. We do not assume any fiduciary responsibility or liability for any consequences financial or otherwise arising from the reliance on such information.

You may view this for information purposes only. Copying, distribution, or reproduction of all or any portion of this article without explicit written consent from Cloudwall is not allowed.