Is cryptoasset categorisation market driven?

Amir Elayyan
Protos Asset Management
7 min read · Nov 5, 2018


To better understand coin correlations, we ran a clustering analysis on over 100 cryptocurrencies and identified clusters of crypto assets that move in tandem. The motivation behind this study was to see whether cryptocurrencies are independent assets whose movements reflect their individual value propositions, or whether they are correlated with one another around certain market metrics. We found more than one cluster, which implies that crypto assets do not exclusively follow Bitcoin’s movement. The top-level clustering also revealed that comparable architecture and token economics do not explain comparable market movement.

Ever since cryptocurrencies began finding their way to market, the world has asked: how do we place a value on them? A clear and unbiased benchmark, clustering, could help answer that question when evaluating new decentralized projects in the crypto economy. Coins are commonly grouped by token type, such as platform tokens, utility tokens, brand tokens, and security tokens. One common categorization is the following:

OnChainFX Categorization

Yet these are not the only clusters that may appear. Looking closer, clustering shows which of the cryptocurrencies with the largest market caps move in tandem, and it is worth investigating what similarities in movement might tell us.

Since clusters seem to be used to formulate trading strategies, it is important to understand whether groupings based on fundamental similarities are backed by market metrics. Identifying and understanding the relationships among assets is critical to devising a successful value strategy and could help make the cryptocurrency space far more viable for investors. Examining natural clusters of coins that move in tandem can clarify their relationships and serve as a base for strategy.

Developing a clustering method

Time series clustering can be considered as finding a function that assigns each time series to a cluster, where T is the timeline length and K is a particular cluster. This should be done by representing each time series as a set of selected features of fixed size D, independent of T.

Figure 1: tsfresh-based clustering visualisation as a graph

Each node of the graph represents a coin. When two coins belong to the same cluster, they are connected by an edge.

One can apply standard clustering algorithms to the fixed-size feature representation. For the purpose of this study, we identified multiple time series describing each coin and constructed derived parameters to describe these series.

Next, we devised a method that moves from simple to complex in identifying clusters:

  1. We used common, standard features for each series (parameter): mean, median, standard deviation, skewness, and kurtosis.
  2. We used the tsfresh library to automate feature extraction.
  3. We applied both approaches to series split by the state of BTC.
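The first step above, turning series of any length into a fixed-size feature vector, can be sketched roughly as follows. This is a minimal illustration and not the code used in the study: the function names `basic_features` and `coin_profile`, the example series names, and the choice of population (rather than sample) moments are all assumptions. The tsfresh library automates the same idea with a much larger feature set through its `extract_features` function.

```python
import numpy as np

def basic_features(series):
    """Return [mean, median, std, skewness, excess kurtosis] for one series."""
    x = np.asarray(series, dtype=float)
    mean = x.mean()
    std = x.std()  # population standard deviation (assumption)
    if std == 0:
        skew, kurt = 0.0, 0.0  # constant series: higher moments set to 0
    else:
        z = (x - mean) / std
        skew = np.mean(z ** 3)
        kurt = np.mean(z ** 4) - 3.0
    return np.array([mean, np.median(x), std, skew, kurt])

def coin_profile(series_by_name):
    """Concatenate the basic features of every series describing one coin."""
    ordered = sorted(series_by_name.items())  # fixed order of parameters
    return np.concatenate([basic_features(s) for _, s in ordered])

# Hypothetical coin described by two series of different lengths.
coin = {
    "returns": [0.01, -0.02, 0.03, 0.00, -0.01],
    "volume": [10.0, 12.0, 9.0, 11.0],
}
profile = coin_profile(coin)  # length D = 5 features x 2 series, whatever T is
```

Because every coin ends up with a vector of the same length D, coins with different amounts of history can be compared by a standard clustering algorithm.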

Density-Based Spatial Clustering of Applications with Noise (DBSCAN)

The purpose of this study was to identify clusters without applying presupposed views of what they should look like. DBSCAN, a clustering algorithm that views clusters simply as areas of high density separated by areas of low density, allowed us to accomplish this. DBSCAN can be applied quickly enough to facilitate trading and scales easily, and because its notion of a cluster is so generic, the clusters it finds can take any shape. This is in contrast to methods such as k-means, which assumes that clusters are convex. Choosing DBSCAN allowed the clusters obtained in our study to reflect the data quite accurately.

The central component of DBSCAN is the concept of core samples, which are samples that lie in areas of high density. A cluster is therefore a set of core samples, each close to one another (as measured by a distance measure), together with a set of non-core samples that are close to a core sample (but are not themselves core samples). There are two parameters to the algorithm, min_samples and eps, which formally define what is meant by density. A higher min_samples or a lower eps means that a higher density is necessary to form a cluster.
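To make the two parameters concrete, here is a small sketch using scikit-learn's `DBSCAN` on synthetic data. The toy points, the seed, and the values `eps=0.5` and `min_samples=5` are illustrative assumptions, not the settings used in the study.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Toy feature matrix: two dense groups of "coins" plus one isolated point.
group_a = rng.normal(loc=0.0, scale=0.1, size=(20, 2))
group_b = rng.normal(loc=3.0, scale=0.1, size=(20, 2))
isolated = np.array([[10.0, 10.0]])
X = np.vstack([group_a, group_b, isolated])

# eps: neighbourhood radius; min_samples: number of points (including the
# point itself) required within eps for a point to count as a core sample.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# Points that belong to no cluster are labelled -1 (noise).
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
```

The number of clusters is not passed in anywhere: it falls out of the density settings, which is why it can be read off the labels afterwards.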

An additional advantage of DBSCAN is that it lets us estimate the number of clusters from the data. Using DBSCAN, top-level clusters could be obtained using data across all given coins. When the basic feature extractor was then applied to each large cluster identified, clustering became possible inside the top-level clusters.

This quite basic method of feature extraction for building coin profiles can be scaled: for example, features can be extracted for different periods of time, forming wider sets of features for each coin. Alternatively, a measure of similarity can be computed for different periods and combined into unified metrics across all periods. In short, clustering can be conducted with multiple variables as inputs. The next step in determining the best clustering results was to perform clustering on the extracted features and, as an alternative approach, to use the tsfresh library.
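As a rough illustration of extracting features for different periods and forming a wider feature set, the sketch below splits one series into consecutive chunks and concatenates per-chunk statistics. The helper names, the equal-length split, and the three statistics chosen are assumptions made for brevity.

```python
import numpy as np

def window_features(x):
    """Mean, median and standard deviation of one chunk."""
    x = np.asarray(x, dtype=float)
    return np.array([x.mean(), np.median(x), x.std()])

def multi_period_profile(series, n_periods):
    """Split a series into n_periods consecutive chunks and
    concatenate the features of each chunk into one wider vector."""
    chunks = np.array_split(np.asarray(series, dtype=float), n_periods)
    return np.concatenate([window_features(c) for c in chunks])

prices = np.arange(12, dtype=float)            # hypothetical series
profile = multi_period_profile(prices, n_periods=3)  # 3 periods x 3 features
```

In practice the periods would more likely be calendar ranges or market states than equal-length chunks, but the resulting vector is built the same way.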

Clustering Inside Top-Level Clusters

To cluster inside the top-level clusters identified using DBSCAN, we used Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN), which performs DBSCAN over varying epsilon values and integrates the results to find the clustering that is most stable over epsilon. This allowed us to find clusters of varying densities (unlike DBSCAN) and to be more robust to parameter selection.

The limitation of this approach is that it is highly general. It does not detect mutual dependencies, trend following, or other complex arrangements. Essentially, it is clustering by basic time series characteristics. Top-level clustering is a necessary step, and the list of features describing each time series is short; however, it can easily be extended with additional features.

As an extension of the DBSCAN method, HDBSCAN groups together points that are closely packed (points with many nearby neighbors) and marks as outliers points that lie alone in low-density regions (whose nearest neighbors are too far away). This allows “unclustered coins” to appear: they are outliers categorized under the -1 label. This group consists of two types of coins:

  1. Coins for which little data has been collected
  2. Actual outliers: each unclustered coin should be considered special, including the crypto-headliners with the largest market caps, which appeared in this group. Across several different sets of parameters, such as mean, median, deviation, kurtosis, and skewness, these headliners are genuine outliers.

Market States and Clustering

To ascertain market states, it is necessary to identify periods corresponding to bullish, bearish, or stable BTC trends. Before clustering with the basic feature extractor, it was first necessary to perform top-level clustering, separating coins into groups according to the availability of data.

The result of the top-level clustering revealed three separate market states with corresponding time periods, which required creating three sets of clusters, one per state. The combination of the algorithm used and the parameters passed to it allowed us to identify one brief bearish period (early 2018, following the steep fall in BTC).

Graphic representation of bull, bear and stable Bitcoin trend periods
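The article does not spell out how the BTC timeline was split into states. One simple possibility, shown here purely as an assumption, is to threshold the trailing-window return and then merge consecutive points with the same label into periods. The function names, the window length, and the threshold are all hypothetical.

```python
from itertools import groupby

import numpy as np

def label_states(prices, window, threshold):
    """Label each point 'bull', 'bear' or 'stable' from its trailing return.

    Points without a full trailing window are left as 'stable'.
    """
    p = np.asarray(prices, dtype=float)
    states = np.full(p.shape, "stable", dtype=object)
    ret = p[window:] / p[:-window] - 1.0  # return over the last `window` steps
    tail = states[window:]                # view into `states`
    tail[ret > threshold] = "bull"
    tail[ret < -threshold] = "bear"
    return states

def to_periods(states):
    """Merge runs of identical labels into (state, first_index, last_index)."""
    periods, start = [], 0
    for state, run in groupby(states):
        length = len(list(run))
        periods.append((state, start, start + length - 1))
        start += length
    return periods

# Hypothetical price path: flat, rising, flat, falling, flat.
btc = [100, 100, 100, 120, 150, 150, 150, 120, 100, 100, 100]
periods = to_periods(label_states(btc, window=2, threshold=0.1))
```

Each coin's series can then be cut along these periods so that features are extracted separately for bullish, bearish, and stable stretches.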

Clustering also appeared in bullish periods, with more clusters appearing in these periods overall. In stable periods (sideways trading), clustering appeared as well. This was significant for the following reasons:

  • Several stable periods were detected on the BTC timeline. This part of the sample, therefore, was by far the most representative.
  • In contrast to bullish and bearish BTC states, the stable or sideways trading state can be identified intuitively. The terms bullish and bearish are blurred and lack strong definitions, because such periods are often not recognized as such until years later, looking back, whereas everyone can see when BTC is in a flat or sideways trading phase. In this context, the stable trading market offers the most predictive value. This is based on the following observation: while the market (and especially the BTC price) is stable, people are ready to invest in altcoins. Previous observations also reveal that certain altcoins correlate well with the stable BTC state. Conversely, periods of strongly marked BTC growth or decline often lead to poor predictive value.

As stated before, several top-level clusters were built according to the availability of data. Within each of these big clusters, our own feature extraction algorithm produced three different clusterings, one each for bullish, bearish, and stable periods. The aim of these illustrations is to show which coins are most strongly connected. As we can see above, a coin may belong to one cluster under one approach and to a different cluster under another.
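The graph illustrations, in which each node is a coin and an edge joins two coins assigned to the same cluster, could be derived from cluster labels along these lines. This is a sketch only: the ticker names are placeholders, and treating coins labelled -1 as isolated nodes is an assumption.

```python
from itertools import combinations

def cluster_edges(labels_by_coin):
    """Return sorted (coin, coin) edges joining coins that share a cluster.

    Coins labelled -1 (unclustered) get no edges.
    """
    groups = {}
    for coin, label in labels_by_coin.items():
        if label == -1:
            continue
        groups.setdefault(label, []).append(coin)
    edges = []
    for members in groups.values():
        edges.extend(combinations(sorted(members), 2))
    return sorted(edges)

# Placeholder tickers with hypothetical cluster labels.
labels = {"AAA": 0, "BBB": 0, "CCC": 0, "DDD": 1, "EEE": 1, "FFF": -1}
edges = cluster_edges(labels)
```

Running this once per approach or per market state and comparing the edge lists shows which pairs of coins stay connected and which change cluster.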

Conclusion

  • Fundamentals do not appear to explain crypto price dynamics or the resulting clusters.
  • Similarities in the architecture of certain coins cannot conclusively imply contrasting market behavior between groups of clusters.
  • Coins’ actual use differs quite significantly from the advertised use of most projects.

What we’ve discovered is that, at this point in time, coins with comparable software architecture and token economics were not shown to have comparable market movement.

Coins are truly investment vehicles that are largely speculative, and it will be fascinating to look back at these results in the future and see whether market behavior has come to align more closely with the categories suggested by fundamental architecture analysis.

Entirely new metrics might have to be considered in order to understand cryptocurrency market behavior more accurately.

For additional details regarding this analysis, please visit GitHub
