Data Analysis on Crypto Events

By Fabio Tiriticco on ALTCOIN MAGAZINE · Feb 21, 2019

Since July 2018 I have been collecting data about events in the crypto world (conferences, releases, forks and the like) together with the price fluctuations of the related cryptocurrencies, with the purpose of finding predictable patterns. This work is the Caterina project. This final article presents some of the conclusions I was able to draw.

TL;DR: There is not enough evidence to isolate reproducible patterns beyond coincidental behavior. Still, I had lots of fun setting up the entire project from both the data and the tech standpoints: goal setting, event storming, domain modeling and crafting the architecture. Data analysis was definitely out of my comfort zone and a great learning experience.


The previous chapter in this project, published in August 2018, was about running my services on the cloud. I then left the machine running and gathering data: crypto events and corresponding price fluctuations. I would check on the tech side of things every now and then, but I only took a deeper look at the data 5 months later, in December 2018. I wish I had done it earlier.

A mismatch between event data and price data had prevented the system from pairing events with price fluctuations in some cases. Coins usually have both a name and a symbol (like ‘Bitcoin’ and ‘BTC’). Unfortunately, these identifiers are not standardized, and the two external services I use for events and prices don’t always agree on them. For instance, look at these two pseudo-JSON objects:
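(The payloads below are an illustrative reconstruction: the field names and values are made up, and the exact symbol used by the rates API is hypothetical. The mismatch itself is the real one I hit.)

// From the events service: name plus symbol
{ "name": "IOTA", "symbol": "MIOTA", "category": "Release" }

// From the rates service: the same coin under a different symbol
{ "symbol": "IOT", "price_eur": 0.27 }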

I assumed that the symbol would be used everywhere. I quickly realized that I had missed out on a lot of data, as the rates API wouldn’t return anything corresponding to “MIOTA” 😒.

I first looked at data after 5 months. I wish I had done it earlier.

Fortunately, the majority of events weren’t affected by this. I fixed the matching and redeployed everything. A few months later, it was finally time to look again and try to extract information from my data set.

What are we looking for?

The main goal of this project is to make safe predictions about price fluctuations based on event-related patterns. Timing-wise, I limited my research to the two days before and after the day of the event. The only two inputs are the currency (BTC, ETH…) and the event category (conference, release…). Ideally, I would be looking for a function f such that

f(coin, event) = p

where p is the price variation over the 96 hours around the event. This time interval was chosen to limit the scope of our measurements to the immediate effect of the event itself, just the “before and after”.

In reality, the outcome is more of a confidence interval. Something like, “given an event about this coin, the price variation will be in this range with this confidence”.

f(coin, event) = (price range, confidence)
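To make the target concrete, here is a minimal Scala sketch of the shape such a function would have. All names are illustrative, and the body is exactly what this project tries to approximate:

case class Coin(symbol: String)                      // e.g. Coin("BTC")
case class EventCategory(name: String)               // e.g. EventCategory("Conference")
case class Prediction(low: Double, high: Double, confidence: Double)

// f(coin, event) = (price range, confidence)
def f(coin: Coin, event: EventCategory): Prediction = ???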

Down the rabbit hole

A coarse starting point for this could be checking the mean value of prices before and after the event. For example, if on the left we have the price variation, the mean values are indicated by the straight lines on the right.

For a real example, this is the fluctuation of the æternity coin price in the days before and after a release event, with the mean values before and after shown: the mean of the after-event prices is lower than that of the before-event values.
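As a sketch, assuming we have the 96 hourly rates around an event (48 before, 48 after; the names here are illustrative), the per-event check boils down to comparing two means:

def mean(xs: Seq[Double]): Double = xs.sum / xs.size

// prices: the 96 hourly rates around one event (48 before, 48 after)
def meanShift(prices: Seq[Double]): Double = {
  val (before, after) = prices.splitAt(48)
  mean(after) - mean(before)  // positive: the average price went up after the event
}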

How do the mean values behave across the entirety of the events we collected? The following image tells us something.

It looks like we might have spotted a positive trend: the mean of the after-event values is higher than the before-event mean 75% of the time. Let’s narrow the scope and look at the mean variation for a couple of specific event categories (conferences and exchange listings) and for the coins that boast the largest market caps (Bitcoin and Ethereum).

The charts above show that even if the size of the negative chunk varies substantially, the mean price from before to after the event goes up in most cases. We can be reasonably confident about that now.

The mean price of the coin goes up around the event time in most cases.

Would this be enough information for us to just throw our money at an asset right before an event? Probably not.

Variation we can believe in

Even though we know that the average price over the 48 hours during and after the event went up compared to the previous 48 hours, the mean value doesn’t give us any information as to when the price is up or down within those windows.

Variance (V) and Standard Deviation (STD) are two measures that indicate how much the values in a set differ from one another. STD is simply the square root of V. Check out this article for a visual explanation of the two.
In reality I wasn’t satisfied with these indicators, because they are independent of the original magnitude of the data. For instance, an STD of 1 signals wild swings for a coin trading around 0.10 EUR, but is barely noticeable for one trading around 1000 EUR.

To solve this issue, we can derive another measure: the STD expressed as a percentage of the mean of the original data set (statisticians know it as the coefficient of variation). Let’s call it PSM:

PSM = Standard Deviation * 100 / Mean

I find the PSM more insightful than Variance or STD, as it reflects the variation relative to the mean value.
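Here is a sketch of the three indicators in Scala (PSM being the name coined above), with a pair of toy data sets showing the point: the two sets have the same STD, yet very different PSM.

def mean(xs: Seq[Double]): Double = xs.sum / xs.size

def variance(xs: Seq[Double]): Double = {
  val m = mean(xs)
  xs.map(x => math.pow(x - m, 2)).sum / xs.size
}

def std(xs: Seq[Double]): Double = math.sqrt(variance(xs))

def psm(xs: Seq[Double]): Double = std(xs) * 100 / mean(xs)

// Same STD (about 0.82) in both cases, yet very different PSM:
psm(Seq(9.0, 10.0, 11.0))        // about 8.2%: noticeable relative variation
psm(Seq(999.0, 1000.0, 1001.0))  // about 0.08%: negligible relative variation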

The following histogram shows the distribution of the PSM for all sections, i.e. the 48-hour windows before and after each event, across all events and coins (about 3300 sections in total).

PSM values seem quite contained overall: less than 3% in the majority of cases. So far we have little variation and a probable increase of the average value.
Are we ready to throw our money at the crypto market?

What am I risking exactly?

There is an important factor that we haven’t taken into account yet: what does it mean to buy and sell? Let’s start with the simple case: we might buy an asset once before the event and sell it in its entirety the day after the event. If we perform this strategy of “buy once, sell once”, we need to consider the fluctuations within our buying and selling windows.

In its most basic form, profit (or loss) is measured from the moment we acquire an asset to the moment we sell it. The best and worst scenarios are depicted in the image below:

We should aim for the scenario on the left and avoid the one on the right like the plague, but alas, these fluctuations are downright unpredictable. Maybe we can resort to data again to assess the average safety of such an operation; we might even reveal some pattern.

For each event, we can check the best and worst possible buy/sell combinations. Imagine that for a given event the worst case would be a loss of 20 EUR and the best case a gain of 40 EUR, for a total spread of ABS(-20) + 40 = 60 EUR. Another parameter we can look at is what I called Positiveness, which tells us how much of the spread is in the positive: 40 / 60 ≈ 67% in this example.
The spread’s lower bound is zero, while positiveness can vary from zero (meaning we’ll have a loss no matter what) to 100 (we’ll have a profit in any case).
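A sketch of both parameters for a single event, assuming prices holds the rates of the tradable window around it (names are illustrative, and this reading of “how much of the spread is in the positive” is my own):

// All possible "buy once, sell once" outcomes: buy at one moment,
// sell at any later moment.
def outcomes(prices: Seq[Double]): Seq[Double] =
  for {
    (buy, i) <- prices.zipWithIndex
    sell     <- prices.drop(i + 1)
  } yield sell - buy

def spreadAndPositiveness(prices: Seq[Double]): (Double, Double) = {
  val os     = outcomes(prices)
  val worst  = os.min                 // e.g. -20 EUR
  val best   = os.max                 // e.g. +40 EUR
  val spread = best - worst           // e.g. ABS(-20) + 40 = 60 EUR
  // Share of the [worst, best] range sitting above zero:
  // 0 = a loss no matter what, 100 = a profit in any case.
  val positiveness =
    if (spread == 0) 0.0
    else (math.max(best, 0.0) - math.max(worst, 0.0)) / spread * 100
  (spread, positiveness)
}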

It looks like these parameters could give us some tangible insight into our quest for reproducible patterns. Let me just evaluate spread and positiveness for each event and check their distributions.

The spread chart doesn’t add much to the picture, but the positiveness chart is quite interesting. It shows a quasi-normal distribution with the most frequent values occurring around 50%. Maybe we can venture towards a first data-driven conclusion: it seems that events related to cryptocurrencies don’t affect their price all that much, or at least not for the majority of the combinations [coin + event].

It seems that events related to cryptocurrencies don’t affect their price all that much.

At the same time, we see that a small desirable group does exist: a sizeable chunk of events displays a positiveness greater than 90%.
Can we find a revealing pattern that reliably links events to such positiveness?

Data is never too much

The previous histograms were drawn over the entire data set, with no filters applied to coins or event categories. To get at least a vague feeling of reliability, I tried selecting only the events that satisfy all of the following properties (a sketch of the filter follows the list):

  • positiveness > 85%
  • event category appearing at least 5 times in the entire data set
  • related coin appearing at least 5 times in the entire data set
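
A sketch of that filter over a flat collection of analysed events, assuming illustrative field names:

case class AnalysedEvent(coin: String, category: String, positiveness: Double)

def promising(events: Seq[AnalysedEvent]): Seq[AnalysedEvent] = {
  val perCategory = events.groupBy(_.category).map { case (k, v) => k -> v.size }
  val perCoin     = events.groupBy(_.coin).map { case (k, v) => k -> v.size }
  events.filter { e =>
    e.positiveness > 85 &&
    perCategory(e.category) >= 5 &&
    perCoin(e.coin) >= 5
  }
}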

Which narrowed down the event list to a whopping … 32 items.
It might still be worth an experiment, but these numbers are far too small to support any kind of reliable pattern. Moreover, the few coins left are cryptocurrencies with minuscule value in euros, a huge spread and a minimal market cap (as in, very little trade), which reinforces the feeling that those events are just coincidental.

Conclusions

The main issue in this project is the very, very slow growth rate of the data set: in about 6 months, I only collected ~2000 events, spread over ~500 different coins and ~20 event categories. Definitely not a lot of data. There is no reason to expect an acceleration in data growth anytime soon. I will leave the services running and maybe give it another look in a few more months.

In the previous section, I reasoned about spread and positiveness in terms of ‘best’ and ‘worst’ combinations. We could refine this thinking with some further statistics and reason instead in terms of, say, a “70% chance of profit”, or of “limiting the chance of loss to below 15%”.
These approaches sound promising but, again, I feel they require a lot more data than what I could gather.

Learnings

Data-based decisions are the result of very complex, and utterly fascinating, processes.

  • Data analysis is HARD. While being deeply familiar with statistical theory is paramount, mastering analytics and visualization tools is equally important. If both tool sets at your disposal are poor (like mine), they won’t get you far.
  • Even with such basic experience secured, the translation of data into information, then into knowledge and finally into wisdom, is a creative art.
  • Data analysis takes TIME! I used to work as a researcher just out of college, and admittedly I had forgotten just how long it takes to format the charts right, to clean data from noise and all that.
  • Data is never too much. In the beginning, I decided to collect only a few things (coin price and event category) because I wanted to keep things simple. I regretted this later, during data analysis: I wish I had more data to try and find more correlations. Even if it had never been used, the cost of collecting it would have been very low anyway.
  • On the tech side, the processes and tools I used, from event-first domain driven design to the entire implementation cycle, proved to be great choices. I used Spark a little at the end but didn’t get to the ML part, and that’s a pity.

Project conclusion

This post officially concludes my Caterina side-project. I wish I were able to run all my side-projects to a conclusion, any conclusion! 😄
I am really happy with what I have learned along the way. I have done things ‘the proper way’, and that is a luxury that often only happens in your own little world. On to the next side project!

Tools used:

  • Event-first DDD / Event Storming
  • Scala + Akka
  • Docker + Kubernetes
  • Spark + Zeppelin + Google Sheets
