Empirical Velocity Estimates and Artificial Volumes in Ethereum

Published in

Logos Network

Feb 26, 2019

Eliminating self-churn from Ethereum transactions.

In a previous article, we discussed token velocity and its impact on token valuation. In particular, I argued qualitatively that there are many costs associated with swapping tokens that make it unrealistic to expect velocities above 100. This can be quantified with a demand-for-money model such as Baumol-Tobin, but ultimately we end up relying on several murky assumptions. Furthermore, there are key considerations around the correlation or dependence between velocity and token GDP that complicate an analytical approach.
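Here is a minimal sketch of the textbook Baumol-Tobin square-root rule, to show how conversion costs translate into a velocity figure. It assumes a user spends Y evenly over a period, pays a fixed cost F each time they convert into the money asset, and forgoes an interest rate i on money held. Under those assumptions, average holdings are sqrt(FY/(2i)) and velocity is sqrt(2iY/F). The parameter values are hypothetical and chosen only for illustration.

```python
import math


def baumol_tobin_velocity(spending, conversion_cost, interest_rate):
    """Velocity implied by the Baumol-Tobin square-root rule.

    spending        -- total spending Y over the period
    conversion_cost -- fixed cost F per conversion into the money asset
    interest_rate   -- opportunity cost i of holding money over the period
    """
    # Optimal number of conversions per period: N* = sqrt(i * Y / (2 * F)).
    conversions = math.sqrt(interest_rate * spending / (2 * conversion_cost))
    # Holdings are drawn down linearly between conversions, so the average is Y / (2N).
    avg_money = spending / (2 * conversions)
    # Velocity is spending over average holdings, i.e. sqrt(2 * i * Y / F).
    return spending / avg_money


# Hypothetical inputs: $50,000 of yearly spending and a 5% opportunity cost.
for cost in (25.0, 5.0, 1.0):
    print(f"F = ${cost:>5.2f}  ->  V = {baumol_tobin_velocity(50_000, cost, 0.05):.1f}")
```

Velocity scales with the square root of Y/F. Cutting the conversion cost 25-fold therefore raises velocity only 5-fold, and the result is only as good as the assumed F and i.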

A more revealing, if less intellectually satisfying, approach is to empirically observe velocities of different assets. The easiest point of comparison is the US dollar, for which velocity data is readily available. Recently, M1 USD velocity has hovered around 5.5. Since the Federal Reserve has pretty good insight into USD transactions, these estimates are likely accurate.

Similarly, we can naively estimate cryptoasset velocity in a relatively straightforward manner. It is easy to observe total Bitcoin transaction volumes and total Bitcoin supply and take their ratio, arriving at a velocity of around 2.1. However, there are a few issues with this calculation. For example, large numbers of Bitcoins are “hodl’d”, meaning they are effectively removed from circulation. This tends to artificially depress velocity. On the other hand, many Bitcoin transactions are not real transactions where one person is sending value to another person (more on this in a bit). This tends to artificially increase naive velocity.

Can we do better? Yes! Many of the issues arising from using Bitcoin as a reference arise because it is quite a poor payment rail (for target transaction use cases) right now — fees are high, finality times are slow, the currency is volatile, and so on. Luckily, we now have many better assets to reference — stablecoins. Since there is no possibility of their price increasing, there are no rational “hodl’ers” to account for.¹ Furthermore, they are more directly akin to true money, where holders are plausibly using it in real transactions.

To eliminate any further noise, it makes sense to set aside Tether — recognizing that there is some controversy around some of Tether’s funds, without commenting on the validity of these claims — and algorithmic stablecoins like DAI — where there are complicating factors around creation, redemption, and loans. We are then left with the audited full reserve stablecoins. Now that they all have existed for a few months, we have good, but still limited, data for USDC, GUSD, PAX, and TUSD.

But artificial transactions remain a problem. If a user sends money between two addresses they own, those transactions shouldn’t be counted in our volume numbers. Before diving into the data, then, it is worth first clearly defining our goal for calculating velocity.

The Goal: Real Economic Transactions

What do we want in our estimate of velocity? Most superficially, we are looking for the value that makes the basic M*V = P*Q true.² At a deeper level, P*Q represents the real economic transactions that are denominated in a money asset, such as the stablecoin. Regular users of the money asset will hold on to some inventory, and a marginal user that wishes to acquire the asset to transact will increase the value of M (whether through additional token issuance or value appreciation). Given that we can observe M, what we want is an accurate estimate of P*Q — the bona fide economic transaction volume on the network.
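Rearranging the equation of exchange makes the target explicit. Let A denote artificial on-chain volume (self churn, mints and burns, exchange shuffling). This is our own notation, not a standard symbol. A naive estimate divides all observed volume by supply, so it overstates velocity by exactly A/M:

```latex
M V = P Q \;\Longrightarrow\; V = \frac{P Q}{M},
\qquad
V_{\text{naive}} = \frac{P Q + A}{M} = V + \frac{A}{M}
```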

The Problem: Self Churn and Artificial Volume

As mentioned previously, the naive analysis is complicated by the fact that many of the transactions observable on a blockchain are not real economic transactions.

The most common example of a transaction that we want to exclude is what has been dubbed “self churn”. In a UTXO network like Bitcoin, most transactions include a change output that returns funds to the sender. Similarly, users will often consolidate or sweep their outstanding UTXOs into a single UTXO to clean up their wallet. Obviously, none of these transactions should be counted as part of P*Q.

Self churn was perhaps first formalized by Meiklejohn et al. in 2013. Since then, it has become common for analysts to use heuristics to strip self churn out of their volume estimates. For example, BlockSci (section 3.4) proposes to eliminate (1) outputs controlled by an address linked to an input address (which removes change outputs) and (2) outputs that are spent within k blocks. Similar heuristics have been used by Blockchain.com and Coinmetrics.

Note that the heuristics are just that, and they are subject to false positives and false negatives; nevertheless, they are very useful in deriving order-of-magnitude estimates for transaction volume and velocity. BlockSci, for example, found that the naive method overstates BTC velocity by a factor of 4!

There is an issue when trying to apply these same heuristics to our selected stablecoins — the heuristics only work for a UTXO token, but the stablecoins are all ERC-20 tokens running on Ethereum, which uses an account system. This means there is no way to trace individual coins through the system, and we can’t know if a particular send/receive within k blocks used the same tokens.

There has been very little work on how to identify self churn and other artificial volume in Ethereum or other account-based tokens. Coinmetrics proposed a “virtual UTXO” mapping, but this is not sufficiently deterministic to apply the UTXO heuristics. Coinmetrics also published a case study on eliminating transactions attributable to the mixer that dominated transactions in 2017, but none of the stablecoins we are interested in were active during that period, and, besides, it does not actually eliminate many other sources of artificial transactions.

The goal of this article is to lay out a methodology for identifying the artificial volumes in ERC-20 stablecoins that can give us better estimates of P*Q and V.

The Data: BigQuery

All transaction data for decentralized networks can be obtained by running a full node, but syncing one can take days and is expensive. A better alternative is Google BigQuery, which hosts Ethereum data as SQL tables updated on a daily basis.

Here’s an example of how you can scrape all PAX transactions:

```sql
SELECT
  DATE(block_timestamp) AS TxDate,
  token_transfers.*
FROM
  `bigquery-public-data.ethereum_blockchain.token_transfers` AS token_transfers
WHERE
  token_transfers.token_address = '0x8e870d67f660d95d5be530380d0ec0bd388289e1'
ORDER BY
  block_number

# Token addresses:
# GUSD: 0x056fd409e1d7a124bd7017459dfea2f387b6d5cd
# PAX:  0x8e870d67f660d95d5be530380d0ec0bd388289e1
# TUSD: 0x8dd5fbce2f6a956c3022ba3663759011dd51e73e
# USDC: 0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48
```

You can find the token addresses on Etherscan.

From there, it’s easy to download the data into a CSV or pull it directly into your preferred data analysis environment (I like Python).
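The sketch below reads a CSV export of the query results using only the Python standard library and converts raw amounts into whole tokens. It makes several assumptions:

- The export keeps BigQuery's `token_transfers` column names (`block_number`, `from_address`, `to_address`, `value`).
- `value` holds the integer amount in the token's smallest unit.
- The decimals table is hardcoded (GUSD 2, USDC 6, PAX and TUSD 18) and should be double-checked against each contract on Etherscan.

`Transfer` and `load_transfers` are hypothetical names used for illustration.

```python
import csv
import io
from dataclasses import dataclass
from decimal import Decimal

# Token decimals; verify against each contract on Etherscan before relying on them.
DECIMALS = {"GUSD": 2, "USDC": 6, "PAX": 18, "TUSD": 18}


@dataclass(frozen=True)
class Transfer:
    block_number: int
    from_address: str
    to_address: str
    amount: Decimal  # whole tokens (USD), not base units


def load_transfers(csv_file, token):
    """Parse a CSV export of BigQuery token_transfers rows for one token."""
    scale = Decimal(10) ** DECIMALS[token]
    transfers = []
    for row in csv.DictReader(csv_file):  # extra columns in the export are ignored
        transfers.append(
            Transfer(
                block_number=int(row["block_number"]),
                # Normalize case so later address comparisons are reliable.
                from_address=row["from_address"].lower(),
                to_address=row["to_address"].lower(),
                # Decimal avoids float rounding on 18-decimal integer amounts.
                amount=Decimal(row["value"]) / scale,
            )
        )
    # Chronological order, matching the ORDER BY in the query.
    transfers.sort(key=lambda t: t.block_number)
    return transfers


# Tiny in-memory sample with made-up, shortened addresses.
sample = io.StringIO(
    "block_number,from_address,to_address,value\n"
    "7000001,0xAAA,0xbbb,2500000\n"
    "7000000,0x0000000000000000000000000000000000000000,0xaaa,10000000\n"
)
for t in load_transfers(sample, "USDC"):
    print(t.block_number, t.from_address[:6], "->", t.to_address[:6], t.amount)
```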

Identifying Artificial Volume In Ethereum Stablecoins

While none of the UTXO heuristics work directly with Ethereum’s account system, we can adapt several of them to filter out many of the artificial transactions. Furthermore, we can dig into the specific stablecoin ecosystem to identify some additional sources of inflated volume.

I have uploaded a Python script to my GitHub that shows how these filters can be applied in practice. It is also handy for exploring stablecoin data generally. There are many cool insights that we may write about later, including inventory churn, transaction sizes, daily active users, and more.

Filter 1: Remove self-sends

There are a small number of transactions whose sending and receiving addresses are the same. By definition, no funds have changed hands, so we cut these transactions out.
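Here is a minimal sketch of this filter in Python. It assumes each transfer has been reduced to a sender, a receiver, and an amount in whole tokens. The `Transfer` tuple and `drop_self_sends` are hypothetical names, not taken from the author's script.

```python
from typing import NamedTuple


class Transfer(NamedTuple):
    sender: str
    receiver: str
    amount: float  # whole tokens


def drop_self_sends(transfers):
    """Filter 1: drop transfers whose sender and receiver are the same address."""
    # Compare case-insensitively, since hex addresses may appear in mixed-case checksum form.
    return [t for t in transfers if t.sender.lower() != t.receiver.lower()]


txs = [
    Transfer("0xabc", "0xABC", 100.0),  # self-send: removed
    Transfer("0xabc", "0xdef", 40.0),   # transfer between different addresses: kept
]
print(drop_self_sends(txs))
```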

Filter 2: Remove Minting and Burning Activity

All the stablecoins of interest have minting and burning features. When a user deposits fiat at the stablecoin issuer, an equivalent number of stablecoins are minted and delivered to the user. Similarly, when a user sends stablecoins to the issuer, the stablecoins are burned and an equivalent value of fiat is credited to their account.

This minting and burning activity appears as a real transaction on the blockchain, but it should not be counted for our purposes. In reality, it is just an exchange of one form of money for another, and ownership never changes. As an extreme example, imagine a user who mints and then immediately burns $1mm, 1,000 times in a row. That apparent $2bn in volume (both the mints and the burns are counted) obviously has nothing to do with P*Q.

We can easily identify minting and burning transactions as those sent from or to the zero (null) address (0x0000000000000000000000000000000000000000), respectively.
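This filter amounts to two comparisons against the zero address. The sketch below assumes a hypothetical `Transfer` tuple of sender, receiver, and amount. Its usage section replays the $1mm mint-and-burn example from the text.

```python
from typing import NamedTuple

ZERO_ADDRESS = "0x" + "0" * 40


class Transfer(NamedTuple):
    sender: str
    receiver: str
    amount: float


def drop_mints_and_burns(transfers):
    """Filter 2: drop transfers from the zero address (mints) or to it (burns)."""
    return [
        t for t in transfers
        if t.sender.lower() != ZERO_ADDRESS and t.receiver.lower() != ZERO_ADDRESS
    ]


# The extreme example from the text: mint and burn $1mm, 1,000 times.
churn = [Transfer(ZERO_ADDRESS, "0xuser", 1e6), Transfer("0xuser", ZERO_ADDRESS, 1e6)] * 1000
naive_volume = sum(t.amount for t in churn)
filtered_volume = sum(t.amount for t in drop_mints_and_burns(churn))
print(f"naive: ${naive_volume:,.0f}  filtered: ${filtered_volume:,.0f}")
```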

Filter 3: Identifying Intermediate Mints and Burns

Stablecoin issuers typically keep a “working capital” supply of stablecoins to make creation and redemption smoother. Thus, the lifecycle of a new coin involves minting by the token issuer to an intermediate address controlled by the issuer, followed by a send to the user. As a result, the direct minting and burning filter captures only half of the artificial volume. While the token issuer effectively “finances” the stablecoins initially, this is still just a user switching from one form of USD to another rather than a real economic transaction.

Minting and burning addresses for various stablecoins

As shown in the table, the number of unique minting and burning addresses for all four stablecoins is minuscule compared to the total number of unique addresses involved in all transactions.

We can remove this artificial volume by removing any transactions sent from the intermediate minting addresses and any sent to the intermediate burning addresses.
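The text does not say how the intermediate addresses were identified. This sketch assumes a simple rule: any address that receives freshly minted tokens is an issuer minting address, and any address that sends tokens to be burned is an issuer burning address. That rule would misfire if an issuer minted directly to a customer. `issuer_addresses` and `drop_intermediate_mints_burns` are hypothetical helpers.

```python
from typing import NamedTuple

ZERO_ADDRESS = "0x" + "0" * 40


class Transfer(NamedTuple):
    sender: str
    receiver: str
    amount: float


def issuer_addresses(transfers):
    """Guess issuer working-capital addresses from mint and burn activity (an assumption)."""
    minting = {t.receiver.lower() for t in transfers if t.sender.lower() == ZERO_ADDRESS}
    burning = {t.sender.lower() for t in transfers if t.receiver.lower() == ZERO_ADDRESS}
    return minting, burning


def drop_intermediate_mints_burns(transfers, minting, burning):
    """Filter 3: drop sends out of minting addresses and sends into burning addresses."""
    return [
        t for t in transfers
        if t.sender.lower() not in minting and t.receiver.lower() not in burning
    ]


txs = [
    Transfer(ZERO_ADDRESS, "0xissuer_mint", 500.0),  # mint itself: left for the zero-address filter
    Transfer("0xissuer_mint", "0xalice", 500.0),     # issuer -> user: removed here
    Transfer("0xalice", "0xbob", 200.0),             # user -> user: kept
    Transfer("0xbob", "0xissuer_burn", 200.0),       # user -> issuer: removed here
    Transfer("0xissuer_burn", ZERO_ADDRESS, 200.0),  # burn itself: left for the zero-address filter
]
minting, burning = issuer_addresses(txs)
kept = drop_intermediate_mints_burns(txs, minting, burning)
print(kept)
```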

Filter 4: Exclude Known Exchanges

Deposits into and withdrawals from exchanges cause multiple sources of artificial volume that do not reflect actual economic transactions.

First, a send into or withdrawal from an exchange is just moving money from one account to another controlled by the same user. While the stablecoin is then typically used to trade against other tokens, none of these transactions reflect those we are interested in — payment for a real good or service. Similarly, we don’t count brokerage account deposits or stock trading volumes in estimates of the USD M1 velocity or US GDP. There is some potential for false positives when people use the exchange as their liquid transactional wallet, but on a volume weighted basis these should be small compared to normal exchange usage.

Second, exchanges typically generate a unique deposit address for each user (to make it easy for the exchange to track user activity), and funds sent there are immediately swept into a consolidated, commingled account. See, for example, these deposits of USDC into Huobi. These sweeps clearly are spurious transactions for the purposes of calculating P*Q.

Third, there has been an idiosyncratic arbitrage opportunity related to how exchanges handle stablecoins and competitive practices of stablecoin issuers that has led to additional artificial volumes, particularly for GUSD and PAX. You can find a summary of these issues here and here.

Given these issues, we can justify eliminating all transactions involving known exchange addresses. Since exchanges generate unique addresses for specific users, it’s difficult to identify all of them, though the temporary-address filter below will catch some. Even some of the commingled addresses are not correctly identified. We can therefore conclude that our volume estimates are generous and the resulting velocity estimates are conservative (biased high).

TokenAnalyst gives some background on identifying exchange addresses, but I decided to rely on Etherscan’s exchange mapping.
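A minimal sketch of the exchange filter follows. It assumes the known exchange addresses have already been collected into a set of lowercase strings, for example by hand from Etherscan's labels. No particular Etherscan export or API is implied, and the addresses below are made up.

```python
from typing import NamedTuple


class Transfer(NamedTuple):
    sender: str
    receiver: str
    amount: float


def drop_exchange_transfers(transfers, exchange_addrs):
    """Filter 4: drop any transfer with a known exchange address on either side.

    exchange_addrs is a set of lowercase addresses; it will inevitably be incomplete,
    since per-user deposit addresses are rarely labeled.
    """
    return [
        t for t in transfers
        if t.sender.lower() not in exchange_addrs and t.receiver.lower() not in exchange_addrs
    ]


exchanges = {"0xexchange_hot_wallet"}  # hypothetical label
txs = [
    Transfer("0xuser_deposit", "0xexchange_hot_wallet", 1_000.0),  # sweep into exchange: removed
    Transfer("0xexchange_hot_wallet", "0xalice", 300.0),           # withdrawal: removed
    Transfer("0xalice", "0xcoffee_shop", 4.5),                      # kept
]
print(drop_exchange_transfers(txs, exchanges))
```

The user's own send into the unlabeled deposit address would survive this filter, which is why the estimates stay biased high.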

Filter 5: Remove Temporary Addresses

The last heuristic aims to replicate the standard UTXO assumption of removing temporary UTXOs.

We do so by flagging all addresses that currently have a balance of 0 and whose first receive and last send are less than k blocks apart. Here k must be larger than in Bitcoin because Ethereum’s block time is much shorter. I chose 60 blocks, or about 15 minutes, which is a standard confirmation wait time in Ethereum.

This heuristic will have more false negatives than its UTXO equivalent because account addresses are reused. For example, consider an exchange deposit address (unique to the user) that is used twice, one day apart, with its balance immediately swept into a commingled address. That address will not be flagged. Unfortunately, it is difficult to distinguish these false negatives from legitimate addresses. Further work could explore matching up IN and OUT transactions, or the fraction of an address’s lifetime spent at a zero balance.
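The text specifies how addresses are flagged but not exactly which transfers are then dropped. The sketch below assumes the UTXO analogy and drops only transfers *into* a flagged address, so a payment routed through a throwaway address is still counted once on its way out. Other assumptions:

- Amounts are integer base units, so balances net to exactly zero.
- Addresses are already lowercase.
- The input contains the full transfer history, including mints.

`temporary_addresses` and `drop_temporary` are hypothetical helpers.

```python
from collections import defaultdict
from typing import NamedTuple

ZERO_ADDRESS = "0x" + "0" * 40


class Transfer(NamedTuple):
    block: int
    sender: str
    receiver: str
    amount: int  # base units


def temporary_addresses(transfers, k=60):
    """Flag addresses now at zero balance whose first receive and last send are < k blocks apart."""
    balance = defaultdict(int)
    first_receive, last_send = {}, {}
    for t in transfers:
        balance[t.sender] -= t.amount
        balance[t.receiver] += t.amount
        first_receive[t.receiver] = min(t.block, first_receive.get(t.receiver, t.block))
        last_send[t.sender] = max(t.block, last_send.get(t.sender, t.block))
    return {
        addr
        for addr in first_receive
        if addr != ZERO_ADDRESS
        and balance[addr] == 0
        and addr in last_send
        and last_send[addr] - first_receive[addr] < k
    }


def drop_temporary(transfers, k=60):
    """Filter 5 (one possible rule): drop transfers into temporary addresses."""
    temp = temporary_addresses(transfers, k)
    return [t for t in transfers if t.receiver not in temp]


txs = [
    Transfer(100, "0xalice", "0xdeposit", 1_000),  # into a throwaway address
    Transfer(110, "0xdeposit", "0xpool", 1_000),   # swept 10 blocks later
    Transfer(100, "0xalice", "0xbob", 500),        # bob keeps a balance: not temporary
]
print(temporary_addresses(txs))
print(drop_temporary(txs))
```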

Result: Adjusted Stablecoin Velocity Estimates

The charts below show rolling 30-day velocity estimates (annualized). They are computed first with the naive methodology (total volume / outstanding supply) and second with the self-churn filters applied.
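A rolling, annualized estimate of this kind can be sketched as follows. The window volume is divided by supply and scaled by 365/30. The article does not say which day's supply is used, so this sketch assumes supply on the last day of each window. The input series are made-up numbers, not the article's data.

```python
def rolling_annualized_velocity(daily_volume, daily_supply, window=30):
    """Annualized rolling velocity: (volume over the last `window` days) / supply * 365 / window.

    daily_volume[i] -- transfer volume on day i (naive, or after the churn filters)
    daily_supply[i] -- outstanding supply at the end of day i
    Returns one estimate per day once a full window is available.
    """
    if len(daily_volume) != len(daily_supply):
        raise ValueError("volume and supply series must be the same length")
    out = []
    for end in range(window, len(daily_volume) + 1):
        window_volume = sum(daily_volume[end - window:end])
        supply = daily_supply[end - 1]  # assumption: supply on the window's last day
        out.append(window_volume / supply * 365 / window)
    return out


days = 40
supply = [100e6] * days    # hypothetical constant $100mm supply
naive = [10e6] * days      # hypothetical unfiltered daily volume
filtered = [1.5e6] * days  # hypothetical daily volume left after the filters
print(f"naive: {rolling_annualized_velocity(naive, supply)[-1]:.1f}")
print(f"filtered: {rolling_annualized_velocity(filtered, supply)[-1]:.1f}")
```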

Velocities without removing self-churn appear to be ~10x USD velocity.

Velocities with self-churn removed are in line with or lower than USD velocity.

The difference is quite stark. While the naive method indicates velocities ranging from 25 to 60, removing the artificial volumes shows a range of 2 to 10, in line with historically observed values for USD.

Why are velocities high early on?

One interesting feature on both charts is the tendency for velocity to spike soon after the stablecoin debuts. There are a number of reasons for this unrelated to any real economic velocity.

Once the token is launched, significant quantities are minted and sent to early partners. While our minting filter can eliminate the initial transaction, it is very possible that those early partners then moved the tokens around or tested their functionality.

Additionally, token supply has historically been lowest at launch. Any transfer has an outsized impact during this period versus later periods. Partners that are testing integrations or even the stablecoin issuer itself sending test transactions will have their transactions magnified, and our filters cannot catch most of this activity.

In our internal models, we mitigate this issue by weighting by outstanding supply. Alternatively, the measurements can be smoothed by calculating total volume divided by average supply over the window. For simplicity, though, we stick with the basic calculation in this article and instead focus on the filters.
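The sketch below shows why the denominator matters near launch. It compares the smoothed estimator named above (total volume divided by average supply over the window) with a simple mean of daily volume/supply ratios. The mean of daily ratios appears here only as a contrasting calculation that low-supply days can dominate; it is not necessarily the basic calculation used in the charts. The launch scenario is hypothetical.

```python
def velocity_mean_of_daily_ratios(daily_volume, daily_supply):
    """Annualized mean of day-by-day volume/supply ratios (low-supply days dominate)."""
    ratios = [v / s for v, s in zip(daily_volume, daily_supply)]
    return sum(ratios) / len(ratios) * 365


def velocity_average_supply(daily_volume, daily_supply):
    """Annualized total window volume divided by average supply over the window."""
    days = len(daily_volume)
    avg_supply = sum(daily_supply) / days
    return sum(daily_volume) / avg_supply * 365 / days


# Hypothetical launch month: 5 days at $1mm supply, then 25 days at $50mm,
# with a steady $0.5mm of test and partner transfers every day.
supply = [1e6] * 5 + [50e6] * 25
volume = [0.5e6] * 30
print(f"mean of daily ratios: {velocity_mean_of_daily_ratios(volume, supply):.1f}")
print(f"total / average supply: {velocity_average_supply(volume, supply):.1f}")
```

With constant supply and volume the two estimators agree. They diverge exactly when supply is small and changing, as it is right after launch.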

Conclusion

Velocity and token GDP are critical inputs to valuation models, and analytical models have led to wildly variable and unrealistic estimates. While data is limited, we can look at empirical evidence to get a better idea of what values velocity might take. Stablecoins are the closest asset we have to real-world use and eliminate many of the confounding factors like “hodl’ing”, and so they are the best source of empirical estimates.

Nevertheless, most transactions are not of interest in a velocity estimate for an economic valuation model. While some work has been done to identify such artificial volumes in Bitcoin and other UTXO systems, little research has been done on account-based systems like Ethereum and ERC-20 stablecoins. We present a set of heuristic filters that can eliminate many, but not all, “self churn” transactions. We further argue that these heuristics are conservative (more false negatives than false positives), meaning that velocities are overestimated. In spite of this, the filters remove up to 90% of transaction volume from the estimate.

There is a caveat to drawing definitive conclusions from these velocity estimates of 2 to 10. Ethereum and, by extension, its stablecoins have very limited usefulness for most real economic transactions. High fees, slow confirmation times, user experience frictions, and relative immaturity are all significant barriers.

Indeed, our conservative heuristics reveal that almost all transactions are not real economic transfers, and I would not be surprised if bona fide volumes are close to zero. This meshes with what we hear from the stablecoin issuers — right now, they are targeting traders and the nascent (but basically non-existent) smart contract payout economy rather than real payments until better platforms are found.

Going forward, ecosystem infrastructure will continue to improve, while networks like Logos will make transfers vastly more efficient, cost-effective, and rapid, opening up many real-world payments use cases. These advances will plausibly reduce user frictions and increase velocity: as the cost of converting falls, so do the demand for working capital and the average holding period.

Nevertheless, we can say with confidence that the empirical evidence to date shows decentralized stablecoins do not have meaningfully higher velocities than fiat currencies. To the extent that conclusions can be drawn, the data supports the low-velocity camp rather than the high-velocity camp. The primary contribution of this article is a framework for eliminating artificial volumes in account-based systems, which will be useful for monitoring changes as the ecosystem matures.

[1] This is a generalization; near stablecoin creation, there are potentially unanticipated nuances resulting from the issuer bootstrapping the network.

[2] Even if you are skeptical of projecting velocity values, accurately estimating P*Q should be of interest independent of model choice.
