Ethereum Address Ranking

cyber•Drop

Published in

cyber•Drop

7 min readNov 11, 2019

Anatoli Belchikov, Serge Nedashkovsky, Dima Starodubcev

For us, ranking addresses by significance in the Ethereum ecosystem has become an interesting and fascinating experiment. By the time our customer’s network was launched, it became clear that the method we had chosen was far from perfect, and a different distribution method was chosen.

Task description

We were tasked with identifying “influencers” of the Ethereum network. “Influencers” are addresses that most often interact with a group of tightly coupled addresses. In mathematical terms: addresses associated with the largest number of addresses in their own community (subgraph with dense internal connections and sparser connections between other subgraphs).

The address ranking algorithm for identifying “influencers” should have the following properties:

Speed
Ranks of vertices need to be calculated for a large number of them, at the time of 01.03.2019 in the Ethereum network there were about 931M transactions and more than 66M addresses. Such amounts of data limit the use of many graph algorithms.
Repeatability
When the algorithm is executed for the same data, the same result will be obtained.
Predictability
Based on ranks, it can be assumed that there are connections between vertices which connections were not known.

SpringRank was proposed as such an algorithm because it has all of the above properties. This makes SpringRank an interesting tool for finding “influencers” on the Ethereum network, as will be described later.

SpringRank algorithm is described in Appendix 1.

Search for “influencers”

The results of ranking by the SpringRank algorithm have two properties:

Between vertices with similar ranks are likely to have greater number of connections.
It is more likely to have connections in the pair coming from the vertex with a higher rank.

Because “influencers” addresses have the highest number of transactions, their rank will be close to the ranks of other related addresses. Our method of searching for “influencers” involves searching for clusters of ranks and selecting vertices that are in the centers of these clusters.

The method of density clustering adapted to a significant number of clusters (more than 100K) was used to search for clusters of addresses. From each cluster, a group of addresses was selected which ranks are as close as possible to each other, and its size is proportional to the number of addresses of the corresponding cluster.

For example, the process of processing the address 0xab5801a7d398351b8be11c439e05c5b3259aec9b, allegedly owned by Vitaly Buterin, is shown. The figure below shows this address and its corresponding cluster.

Below is a graph of transactions among some addresses whose ranks are located in the center of this cluster. You can see that addresses often interact with each other, i.e. form a community. You can also see that the address in question sends and receives transactions from other addresses, being the “influencer” in this community.

Community leaders, which are dominated by outbound transactions, are the addresses sought. This condition was chosen because clusters with a predominance of outgoing transactions have a higher rank than clusters with a predominance of incoming transactions.

The dependence of the share of outgoing transactions on the rank of the vertex is shown in the figure below. Similarly, we can imagine the dependence of the share of outgoing transactions on the rank of the cluster center.

The share of distributed reputation for the address was calculated by the formula:

Where r — the rank of the vertex,

maximum and minimum ranks among all addresses.

Implementation

We decided that the final list should contain no more than 1M addresses of “influencers”.

From 66M Ethereum addresses, we selected addresses that meet the conditions:

Non-zero volume of ETH sent during the last year;
Balance more than 0.001 ETH;
It is a wallet.
We made the assumption that only wallet owners will be able to take advantage of this reputation.

We got a list of 4M addresses.

Next, the ranks were calculated by the SpringRank algorithm for all Ethereum addresses. The existing implementation was accelerated on the GPU using the AMGX library, which allowed us to calculate the ranks for all Ethereum addresses in less than 20 minutes of the algorithm. The implementation can be found in the repositories:

A GPU implementation of the KDE algorithm was used to find rank clusters. The density estimation results were interpolated to make it easier to find local minima and local maxima by sliding window. Implemented in this way clustering turned out to be orders of magnitude faster than the existing algorithm MeanShift, which allowed to find hundreds of thousands of clusters in an acceptable time. Implementations of KDE and clustering algorithms can be found in repositories:

github.com/dchao34/bbrcit_kde — KDE for CUDA;
github.com/belya/ethereum-rank-airdrop/blob/master/ranks-filtering-process.ipynb — KDE-based clustering.

Distribution of addresses by significance

Among the addresses, the first 20k addresses received more than 80% of the distributed significance.

The table below shows the top 10 addresses in the Ethereum network by distributed significance.

Possible tasks

What can be done to rank addresses by significance.

Other methods using SpringRank:

Strengthening the predictive ability of the SpringRank method;
Implementation of regularization (L2) to obtain a more stable result;
Implementation of hybrid algorithms for identifying influencers (SpringRank & Graph analysis algorithms).

Address ranking based on identifying behavior patterns:

investor;
developer;
experimenter;
financial center;
trader;
stock exchange;
and others.

Measurement of activity addresses:

creation and calls of smart contracts;
total investment;
turnover in ETH and ERC20 tokens;
activity when receiving airdrop;
and others.

Appendix 1. Description of SpringRank algorithm

The SpringRank algorithm solves the problem of ranking the vertices of oriented multigraph. The algorithm assumes finding of ranks through energy minimization for the system of springs with identical length. Each spring corresponds to an arc and is stretched between points whose only coordinate corresponds to the rank of the beginning and end of the arc. The spring coefficients are proportional to the arc weights between the respective vertices. The description above can be represented by the total energy of all springs. :

The minima of this expression are at the points where the derivative is zero, i.e. they are solutions of the following system of equations:

Where

- the diagonal matrices of the in- and outdegree of the vertices, respectively, 1 — a vector that consists of ones, and A — the adjacency matrix of the graph.

It is worth noting that, based on the definition, the matrix

has the rank

(the inequality is strict if the graph is not connected), and there is a line on which all solutions of the system of equations lie (1). The article proposes to choose the solution in which one of the ranks

Thus, the algorithm has one local minimum.

It should be noted that the generative model described in the article about Spring Rank allows us to use rank values to calculate the number and direction of arcs between vertices by formulas:

I.e. the number of arcs between vertices

increases as their rank difference decreases, and the number of arcs from a vertex with a lower rank is less than the number of arcs in the opposite direction (these values are equal to

and