Good Whale Hunting

“Ignorance is the parent of fear.” — Herman Melville, Moby-Dick

In the heart of the sea (*)

This is the first part of a series of 3 articles (part 2, part 3) presenting the results of an analysis of the summer 2015 spam attack (a.k.a. Moby Dick). This analysis was performed in collaboration with Antoine Le Calvez from p2sh.info.

Beyond confirming known facts already described in the academic literature or reported by users on social networks, we hope you’ll find in these articles a few new insights and maybe a new perspective on this event.

Note that this study was carried out on a best-effort basis and is by no means exhaustive. Needless to say, all errors or omissions are mine. More than ever, we encourage everybody (especially academics) to investigate the subject on their own.

A bit of historical background

Between June and September 2015, the Bitcoin network was the victim of several spam attacks (sometimes called “stress tests”) that generated hundreds of thousands of transactions and several mempool backlogs.

As a starting point, we’ll describe these attacks as a set of 4 periods of intense activity during Summer 2015:

  • wave 1: between 16/06/2015 and 01/07/2015,
  • wave 2: between 06/07/2015 and 17/07/2015,
  • wave 3: between 25/07/2015 and 09/08/2015,
  • wave 4: between 01/09/2015 and 07/09/2015

Waves 1 and 4 were “officially” claimed by a startup named coinwallet.eu.

Wave 2 has been documented on the Bitcoin wiki, but its perpetrator remains unknown.

Wave 3 seems the most mysterious. To our knowledge, little has been written about it, with the exception of an (excellent) paper published by the Data Science Institute of Imperial College London:

This second attack occurred in two phases as shown by the change in gradient of the number of records in the UTXO set in Figure 9. The attack had a limited impact on the backlog of transactions in the mempool, but a very pernicious effect on the number of UTXOs. By studying the block visualizations over this period, we can see that a very different algorithm was used, generating a “cancerous tumor” structure. This attack is very much one of data density rather than transaction rate and probably conducted by an entirely separate second party. It is also obvious to note the point at which a simple constant parameter in the algorithm was amended to increase the data density of this attack in its second phase, shown in Figure 10.

How to draw the first map of unknown territories?

Needless to say, a sailor needs a map before taking to the open sea. In our case, it means that we need a (rough) idea of the set of transactions that made up these spam attacks.

A first manual inspection of the Bitcoin blockchain teaches us that these attacks were mostly performed using long chains of fan-out transactions splitting a few inputs into many dust outputs. Moreover, we can notice that the outputs generated by these transactions share the same dust denomination (usually 1,000 or 10,000 satoshis), with the exception of one output, which is used as the input of the next transaction in the chain.

But as I wrote in a previous tweet, fan-out transactions aren’t the whole story. This kind of attack has 2 phases, the second phase consisting of fan-in transactions that gather the dust outputs. So we clearly need to add these fan-in transactions to our map.

Practically speaking, we decided to look for all transactions included in blocks between 08/06/2015 and 07/09/2015 for which all outputs but one have the same denomination (1,000 or 10,000 satoshis), the remaining output having a different amount. We consider the matching transactions as the set of fan-out transactions composing the first phase of these attacks.
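The phase 1 heuristic can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for this analysis: each transaction is reduced to its block date and the list of its output values in satoshis, and `MIN_DUST_OUTPUTS` is a hypothetical threshold (not part of the original heuristic) that keeps ordinary two-output payments from matching.

```python
from collections import Counter
from datetime import date
from typing import Sequence

DUST_DENOMINATIONS = (1_000, 10_000)                  # satoshis, as described above
SCAN_START, SCAN_END = date(2015, 6, 8), date(2015, 9, 7)
MIN_DUST_OUTPUTS = 2                                  # hypothetical threshold (assumption)


def is_phase1_fanout(block_date: date, output_values: Sequence[int]) -> bool:
    """Flag a transaction whose outputs all share one dust denomination,
    except for a single output carrying a different amount."""
    if not SCAN_START <= block_date <= SCAN_END:
        return False
    counts = Counter(output_values)
    n_dust_expected = len(output_values) - 1          # all outputs but one
    return any(
        counts[denom] == n_dust_expected and n_dust_expected >= MIN_DUST_OUTPUTS
        for denom in DUST_DENOMINATIONS
    )


print(is_phase1_fanout(date(2015, 7, 28), [1_000] * 50 + [4_321_000]))  # True
print(is_phase1_fanout(date(2015, 7, 28), [25_000, 4_321_000]))         # False
print(is_phase1_fanout(date(2016, 3, 1), [1_000] * 50 + [4_321_000]))   # False (outside window)
```

Because the count of dust outputs must equal the number of outputs minus one, the remaining output is necessarily the one with a different amount, i.e. the link to the next transaction in the chain.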

Then, we define phase 2 transactions as the set of all transactions consuming dust outputs generated by phase 1 transactions.
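Phase 2 detection then boils down to a set lookup on outpoints. The sketch below assumes a hypothetical `Tx` record holding a transaction id, the outpoints it spends and its output values. It treats the most frequent output value of each phase 1 transaction as its dust denomination, so the single output that feeds the next transaction in the chain is not counted as dust.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple

OutPoint = Tuple[str, int]  # (txid, output index)


@dataclass
class Tx:
    """Hypothetical, minimal transaction record used for illustration."""
    txid: str
    inputs: List[OutPoint]   # outpoints spent by this transaction
    outputs: List[int]       # output values in satoshis


def dust_outpoints(phase1_txs: Iterable[Tx]) -> Set[OutPoint]:
    """Collect the dust outputs created by phase 1 transactions."""
    dust: Set[OutPoint] = set()
    for tx in phase1_txs:
        denom, _ = Counter(tx.outputs).most_common(1)[0]
        dust.update((tx.txid, i) for i, v in enumerate(tx.outputs) if v == denom)
    return dust


def phase2_txids(txs: Iterable[Tx], dust: Set[OutPoint]) -> List[str]:
    """Return the ids of transactions spending at least one dust output."""
    return [tx.txid for tx in txs if any(op in dust for op in tx.inputs)]


fanout = Tx("a", [("x", 0)], [1_000, 1_000, 1_000, 50_000])
sweep = Tx("b", [("a", 0), ("a", 2)], [1_900])
next_link = Tx("c", [("a", 3)], [1_000, 1_000, 40_000])
print(phase2_txids([sweep, next_link], dust_outpoints([fanout])))  # ['b']
```

Here `next_link` spends only the non-dust output of `fanout`, so it is correctly left out of phase 2 (it would be caught by the phase 1 heuristic instead).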

These are 2 very simple (and certainly imperfect) heuristics, but they should be enough to give us a first view of the extent of Moby Dick and a lower bound of its impact on the Bitcoin blockchain.

Note: Taxonomy used in these articles
We’ll classify the transactions composing these attacks along 3 dimensions:
1. the phase
* Phase 1 = fan-out transactions creating dust outputs
* Phase 2 = fan-in transactions gathering dust outputs
2. the associated pattern of spam
* Attack A = dust outputs of 10,000 satoshis
* Attack B = dust outputs of 1,000 satoshis
3. the associated wave
We define a wave as a time period associated with a noticeable increase in the space consumed by spam transactions.
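The three dimensions of this taxonomy map naturally onto small lookups. The sketch below is illustrative only: it hard-codes the denomination-to-attack mapping and the four dated summer 2015 waves listed earlier. Later waves have no explicit date bounds in this article, so they are left out, and `label` is a hypothetical helper.

```python
from datetime import date
from typing import Optional, Tuple

ATTACK_BY_DENOMINATION = {10_000: "A", 1_000: "B"}  # dust value (satoshis) -> attack

# Inclusive bounds of the four summer 2015 waves listed above.
SUMMER_2015_WAVES = {
    1: (date(2015, 6, 16), date(2015, 7, 1)),
    2: (date(2015, 7, 6), date(2015, 7, 17)),
    3: (date(2015, 7, 25), date(2015, 8, 9)),
    4: (date(2015, 9, 1), date(2015, 9, 7)),
}


def summer_wave(block_date: date) -> Optional[int]:
    """Return the summer 2015 wave (1-4) containing block_date, if any."""
    for wave, (start, end) in SUMMER_2015_WAVES.items():
        if start <= block_date <= end:
            return wave
    return None


def label(is_fan_in: bool, dust_value: int, block_date: date) -> Tuple[int, Optional[str], Optional[int]]:
    """Return (phase, attack, wave) for a spam transaction."""
    phase = 2 if is_fan_in else 1
    return phase, ATTACK_BY_DENOMINATION.get(dust_value), summer_wave(block_date)


print(label(False, 1_000, date(2015, 7, 30)))  # (1, 'B', 3)
print(label(True, 10_000, date(2015, 7, 3)))   # (2, 'A', None)
```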

Captain Ahab’s map

Running the 2 heuristics mentioned above gives us the following results:

Main figures describing the 2 phases of the 2 attacks

The first insight is an estimate of the space consumed in the blockchain by the spam attacks as a whole: around 2.9GB.

If this figure of 2.9GB is Greek to you, here are a few points of comparison:

  • it represents around 2.28% of the size of the complete blockchain to date (measured on 09/08/2017) or 7.68% of its size on 01/07/2015 (beginning of the spam attacks),
  • it’s equivalent to the size of the bitcoin blocks mined during December 2015,
  • it’s equivalent to the size of the Bitcoin blocks mined during the first 20 days of May 2017 (the busiest month ever in terms of on-chain activity).
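These comparisons can be sanity-checked with a few lines of arithmetic. The sketch derives the blockchain sizes implied by the two percentages (decimal gigabytes assumed) and, under the rough assumption of 144 blocks per day with blocks close to the 1MB limit in May 2017, the size of 20 days of blocks.

```python
SPAM_GB = 2.9  # estimated size of the spam transactions

# Blockchain sizes implied by the percentages quoted above.
implied_2017_gb = SPAM_GB / 0.0228   # share of the chain on 09/08/2017
implied_2015_gb = SPAM_GB / 0.0768   # share of the chain on 01/07/2015

# Assumption: ~144 blocks/day, each close to the 1MB limit (1,000 MB = 1 GB).
may_2017_20_days_gb = 20 * 144 * 1.0 / 1_000

print(f"{implied_2017_gb:.1f} GB")      # 127.2 GB
print(f"{implied_2015_gb:.1f} GB")      # 37.8 GB
print(f"{may_2017_20_days_gb:.2f} GB")  # 2.88 GB
```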

Not so bad for an attack performed by an individual or a private entity!

Note that this figure is a lower bound. It doesn’t take into account additional transactions used to gather dust outputs (more on this in the third part), nor other patterns of fan-out transactions that we may have missed. For the record, we detected (unfortunately too late to include them in this analysis) long chains of transactions with dust outputs of 10,073 satoshis, which seem to qualify as another attack performed in September 2015.

As expected, phase 2 (gathering of dust outputs) consumed more space than phase 1 (creation of dust outputs), but a more interesting insight is certainly that Attack B was, by far, the most damaging in terms of space consumed, number of dust outputs generated, and cost efficiency.

Finally, we estimate the amount of fees paid to miners at around 274BTC. It’s important to note here that not all of these fees were paid by the attackers. Indeed, part of the fees associated with the second phase were paid by the services and users targeted by the dust outputs. In any case, the most surprising insight is certainly the very low fees paid for second-phase transactions (1 or 2 orders of magnitude below the average fee rate over the same period).

As a complement, we estimate that 1.87 million dust outputs remain unspent to date (around 3.6% of the UTXO set).
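As a quick cross-check, the 3.6% share implies a UTXO set of roughly 52 million entries at the time of measurement, and the 11.6% unspent ratio quoted in the conclusion, applied to 16 million outputs, lands close to the 1.87 million figure (the exact total is only given as “more than 16 million”, so this is an approximation).

```python
UNSPENT_DUST = 1_870_000
SHARE_OF_UTXO_SET = 0.036

implied_utxo_set = UNSPENT_DUST / SHARE_OF_UTXO_SET
print(f"{implied_utxo_set / 1e6:.1f} million UTXOs")  # 51.9 million UTXOs

# Cross-check with the conclusion: ~11.6% of (more than) 16 million outputs.
estimated_unspent = 16_000_000 * 0.116
print(f"{estimated_unspent / 1e6:.2f} million")  # 1.86 million
```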

A good story needs a good timeline

To get a better understanding of the dynamics of these attacks, we’re going to plot 2 charts displaying the cumulative size of the fan-out transactions as they were included in blocks.

Timeline of Attack A (phase 1)
Timeline of Attack B (phase 1)

These charts suggest that Attack A may have been used as a first attempt but later abandoned for the more efficient model provided by Attack B, the latter reaching its maximal impact during the third wave.
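The cumulative curves in these charts can be computed with a simple running sum. The sketch below assumes the transactions of one attack and one phase are available as hypothetical (block date, size in bytes) pairs; it aggregates them per day and accumulates the totals, leaving the plotting itself to any charting library.

```python
from collections import defaultdict
from datetime import date
from itertools import accumulate
from typing import Dict, Iterable, List, Tuple


def cumulative_size_by_day(txs: Iterable[Tuple[date, int]]) -> List[Tuple[date, int]]:
    """Return (day, cumulative bytes) pairs sorted by day."""
    daily: Dict[date, int] = defaultdict(int)
    for day, size in txs:
        daily[day] += size
    days = sorted(daily)
    return list(zip(days, accumulate(daily[d] for d in days)))


sample = [(date(2015, 7, 2), 300), (date(2015, 7, 1), 200), (date(2015, 7, 2), 100)]
print(cumulative_size_by_day(sample))
# [(datetime.date(2015, 7, 1), 200), (datetime.date(2015, 7, 2), 600)]
```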

Since we’ve also identified the transactions collecting the dust outputs, we can plot two similar charts for the second phase:

Timeline of Attack A (phase 2)
Timeline of Attack B (phase 2)

There are a few observations to be made here:

  • some phase 2 transactions were created during waves 1–4, amplifying the effects on the backlog during this period. Moreover, as we’ll see in the third part of this series, some of the collected dust outputs were reused to fund the next waves of fan-out transactions,
  • phase 2 of wave 4 alone (claimed by coinwallet.eu) added around 400MB in September 2015, followed by 2 additional “minor” waves adding around 200MB in October and November,
  • the last 2 waves of fan-in transactions occurred much later (March 2016 and the second half of 2016).

Conclusion

In this first part, we’ve computed an estimate of the extent of the spam attack initiated during Summer 2015:

  • around 2.9GB of space consumed (2.28% of the size of the Bitcoin blockchain to date) by 1.3 million transactions,
  • more than 16 million transaction outputs generated, around 11.6% of which are still unspent,
  • around 274BTC spent in fees, for an average fee rate of 9.4 sat/byte.
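The average fee rate follows directly from the headline figures. A minimal check, assuming decimal units (1GB = 10^9 bytes, 1BTC = 10^8 satoshis); with binary gigabytes the result would drop to roughly 8.8 sat/byte. The average transaction size is an extra derived figure, not one stated in the analysis.

```python
FEES_BTC = 274
SPACE_BYTES = 2.9e9          # 2.9GB, decimal gigabytes assumed
TX_COUNT = 1.3e6
SATS_PER_BTC = 100_000_000

fee_rate = FEES_BTC * SATS_PER_BTC / SPACE_BYTES
avg_tx_size = SPACE_BYTES / TX_COUNT

print(f"{fee_rate:.1f} sat/byte")          # 9.4 sat/byte
print(f"{avg_tx_size:.0f} bytes per tx")   # 2231 bytes per tx
```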

We observed multiple attack patterns repeated over several waves, but the most “surprising” observation is certainly the existence of the late 7th and 8th waves, which added around 1GB of fan-in transactions during 2016 (around 700-800MB during the second half of 2016). Understanding why these last waves happened so late will be the subject of the next post.

(*) Thanks to @desantis for this perfect cover picture