A Very Dusty Yom-Kippur (dust attack post-mortem)

Shai (Deshe) Wyborski
18 min read · Sep 28, 2023


I was very tired this Sunday morning. I had just arrived home after flying halfway around the world to Hong Kong, participating in two and a half days of an extremely intense conference, and catching a flight back home on the very night of the second day.

“You know,” I told my significant other, “I think I deserve a little break. This Yom-Kippur will be all about spending time with ourselves and our friends, no Kaspa for the next 24 hours.” So, in a sense, I feel like I jinxed the following into happening.

For those not in the know, Jewish holy days do not follow the calendar day but run from sunset to sunset. So on Sunday, around 18:00, all religious folk (including, but not limited to, Yonatan Sompolinsky and Michael Sutton) signed off for around 25 hours. My girlfriend and I took the more secular route and had some friends over for 24 hours of drinks, snacks, and unsavory activities such as putting together a 2000-piece puzzle.

Ori Newman messaged me around 00:15, but I only noticed around 00:45. “Are you following the ongoing attack?”, “no, what is it, another DDoS?”, “no, someone is spamming the blockchain. They are trying to increase the UTXO set”, “what, like a dust attack?”, “yes”, “oh shit”.

But What Is a “Dust Attack”?

In UTXO coins (such as Bitcoin, Dogecoin, BCH, BSV, Litecoin, Cardano, Nervos, Ergo, etc.), ownership of coins is recorded in entries called unspent transaction outputs (UTXOs). Each transaction consumes some UTXOs and creates some UTXOs. The list of all UTXOs is called the UTXO set, and it represents how the circulating supply is distributed among addresses.
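
To make the bookkeeping concrete, here is a toy sketch (in Python, with made-up names, nothing like the actual node code) of a UTXO set and how a transaction updates it:

```python
# A toy illustration (hypothetical, not Kaspa's actual data structures):
# the UTXO set maps an "outpoint" (tx_id, output_index) to an amount.
utxo_set = {
    ("tx_a", 0): 1000,  # 1000 KAS spendable by whoever owns this output
    ("tx_b", 1): 250,
}

def apply_transaction(utxo_set, tx_id, inputs, outputs):
    """Consume the spent outpoints and add one new entry per created output."""
    for outpoint in inputs:
        del utxo_set[outpoint]             # spent entries leave the set
    for index, amount in enumerate(outputs):
        utxo_set[(tx_id, index)] = amount  # new entries join the set

# Spending ("tx_a", 0) into two new outputs grows the set by one entry.
apply_transaction(utxo_set, "tx_c", [("tx_a", 0)], [600, 399])
print(len(utxo_set))  # 3
```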

The term dust essentially means a UTXO with a very small balance (typically, smaller than the fee required to spend it). Such balances could be created organically, or maliciously.

Two common attacks leverage dust. One such attack is called dusting and is carried out by placing trace amounts of coins into existing wallets to track them and harm their privacy. This is not the type of attack we are talking about today.

The attack we consider is a spamming attack. Essentially, the attacker abuses the fact that transaction fees are very low to create many, many new balances. The motivation for the attack is that the UTXO model requires all nodes to maintain a copy of the entire UTXO set. Hence, filling the UTXO set with crap increases the storage requirements for running a node, indefinitely.

The attack on Kaspa populated each block with five transactions; each transaction consumed a single UTXO and produced 238 UTXOs. That is, each block increased the UTXO set by 1185 entries. Each UTXO is about 85 bytes long, so each block contributed roughly 100KB to the UTXO set, which aggregates to around 5GB a day. So if we had let the attack run undisturbed for, say, a week, the size of the UTXO set would have permanently increased by 35GB.
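
For the record, here is the per-block arithmetic, using the figures above (a rough sketch; the exact UTXO size varies with the script):

```python
# Back-of-the-envelope arithmetic for the attack, using the figures above.
TXS_PER_BLOCK = 5
OUTPUTS_PER_TX = 238
INPUTS_PER_TX = 1
UTXO_SIZE_BYTES = 85  # approximate

new_entries_per_block = TXS_PER_BLOCK * (OUTPUTS_PER_TX - INPUTS_PER_TX)
bytes_per_block = new_entries_per_block * UTXO_SIZE_BYTES
print(new_entries_per_block)  # 1185 new UTXO set entries per spammed block
print(bytes_per_block)        # ~100,000 bytes, i.e. roughly 100KB per block
# Sustained over a day's worth of spammed blocks, this adds up to the ~5GB/day above.
```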

In other words, when fighting dust attacks, time is of the essence. The longer it takes to negate the attack, the more lasting damage it does.

The attacker was wise to choose Yom Kippur for the attack. Plausibly, she was aware that Yoni and Michael would not be available for at least 18 hours after the attack started. She was also aware that combating the attack would require making tough calls with consequences for the network, and that making such calls might be difficult without the entire circle of leading contributors in attendance. Maybe she hoped that having fewer hands on deck would increase the chance of a wrong call, or that it would altogether delay a decision, greatly increasing the permanent damage incurred by the attack.

Combatting a Spam Attack

What we had to do at this point was to come up with a new version that would prevent the attack ASAP and hope that the miners on the network would quickly update their nodes. But what is the correct way to prevent the attack?

The first thing to understand is that a solution must not be consensus-breaking. That is, we are not allowed, in any way, to change the rules under which a block (and, consequently, a transaction) is considered valid. Consensus-breaking changes are often called forks (of which there are two flavors: soft forks and hard forks). Forks are not something to be deployed overnight; they are typically planned meticulously for months or even years and are released with a generous heads-up, giving all ecosystem constituents sufficient time to prepare. Forks are rare occasions that usually aggregate many important-yet-consensus-breaking changes into a single update.

So what can we do? What tools are at our disposal? Well, we can suggest to miners what transactions they should and should not include in their blocks. We can also suggest to node operators in general what transactions they should accept and retransmit and what transactions they should reject and drop. We suggest new policies by releasing a version that implements them, and node operators accept our suggestion by installing the new version.

The key point here is that not everyone has to follow our suggestions. Since the update is not a fork, blocks created by unpatched nodes are still considered valid, and they are not excluded in any way. This is very important, as it prevents a network split. However, this comes at a cost: our solution is only as good as the fraction of hashrate pointed at patched nodes. If only 90% of the miners (in terms of hashrate) update their nodes, then the attack is only reduced by 90%, and the required storage still grows by 500MB a day.

Another key point is that the changes we make must avoid breaking things. That is, we want to avoid circumstances where transactions generated by parts of the ecosystem are suddenly rejected. Imagine, for example, that we decided to reject any transaction that creates more UTXOs than it consumes. This might sound reasonable, but it is actually a terrible solution: typically, when a user creates a transaction, they don't pay exactly the amount sitting in the UTXO they are spending. This is handled by creating a change UTXO. If Alice has a 1000 Kaspa UTXO and she sends 500 thereof to Bob, Alice's wallet will create a transaction with one input (consuming Alice's UTXO) and two outputs: one paying 500 Kaspa to Bob, and another paying the remaining 500 Kaspa back to Alice (actually a little less: in the UTXO model, the fee is paid by having the transaction consume more than it pays out, and the miner is allowed to claim the difference, along with the block reward, in the coinbase transaction. That's another clever way to prevent dust: all of the profits from the block are concentrated into a single UTXO, regardless of how many transactions were included in the block). The bottom line is that while this solution would stop the attack, it would also break most wallets. And there are other crucial activities to keep in mind besides wallets, such as exchange withdrawals (which are often batched) or payouts from mining pools.
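
To illustrate the change mechanism, here is a hypothetical wallet-side sketch (made-up names, arbitrary fee):

```python
# Hypothetical sketch of how a wallet builds Alice's payment with change.
# Amounts are in KAS; the names and the fee value are illustrative only.
def build_outputs(input_amount, pay_amount, fee):
    """One input in, two outputs out: the payment and the change."""
    change = input_amount - pay_amount - fee  # the fee is simply what is not re-spent
    return {"bob": pay_amount, "alice_change": change}

# Alice spends her 1000 KAS UTXO to pay Bob 500 KAS.
print(build_outputs(input_amount=1000, pay_amount=500, fee=0.0001))
# {'bob': 500, 'alice_change': 499.9999}: one input, two outputs, the set grows by one.
```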

A third key point is that our fix must actually fix the problem. This might sound obvious, but it isn't. We can't just roll out one update after another until we get it right. We only have one chance. Maybe two. Anything beyond that would delay the resolution of the attack and rightfully portray us in a negative light.

To understand how easy it is to come up with a solution that seems reasonable but doesn't actually solve anything, consider another (bad) solution: allow transactions that have one more output than they have inputs. That is, allow increasing the size of the UTXO set, but only by one entry per transaction. Would that work? No. No dice. The computation is a bit involved, but the bottom line is that a block can contain about 150 1-to-2 transactions, so the attack would still increase the UTXO set by 150 entries per block. This is about 15% of the original attack. It might sound “good enough”, but it isn't: it still allows the attacker to indefinitely grow the UTXO set by 750MB a day, for free.
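
Sanity-checking that intuition against the numbers already quoted in this post (a rough sketch, nothing more):

```python
# How much of the attack survives the "one excess output per transaction" rule,
# using the figures quoted earlier in this post.
original_entries_per_block = 1185   # the ongoing attack
capped_entries_per_block = 150      # ~150 one-to-two transactions fit in a block

fraction = capped_entries_per_block / original_entries_per_block
print(f"{fraction:.0%} of the original attack")  # ~13%, i.e. the ~15% quoted above

# ~15% of the original ~5GB/day still means roughly 750MB of junk per day, for free.
print(0.15 * 5 * 1024, "MB/day")                 # 768.0 MB/day
```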

The fourth key point is that you can't prevent the attack, you can only make it costly. At the end of the day, you can't forbid blocks packed with transactions that increase the UTXO set, because such transactions are characteristic of everyday use. What you can do is make them costly for a spammer. That is, set a fee policy that charges for such transactions in a way that does not affect most users, but does affect spammers. It is hard to estimate exactly how much an attack should cost, but a good guideline is the million-dollar rule: the damage that an attacker with a million dollars' worth of resources could cause should be inconsequential. When designing the solution, we aimed to make a 1GB increase in the UTXO set cost around 20 million Kaspa.

So to recap, our goal is to come up with new policies for transaction inclusion such that: most (hopefully, all) transactions currently created by the ecosystem will still go through without requiring any adaptation, yet deliberately spamming the UTXO set requires paying huge fees.

From an Initial Idea to a Full Solution

In this section, I want to walk you through the back-and-forth Ori and I had, both in a private conversation and in the #development channel on the Discord server, from when we started discussing a solution around 1 AM until we finally agreed on one around 5:30 AM. If you just want to see the final solution, you can skip to the bottom of this section.

The initial idea was simple: charge an increased fee of 1 Kaspa per excess UTXO. That is, if a transaction consumes, say, 4 UTXOs but produces 7, it must pay a fee of at least 3 Kaspa.

The problem: this solution breaks everything! For example, all 1-to-2 transactions generated by wallets would be rejected, since they don't pay enough fees. Wallets could be adjusted later, but that's not good enough, since: 1. we want to avoid breaking stuff, and 2. a 1 Kaspa fee is way too high for everyday use, yet a lower fee would make spam attacks too cheap.

For brevity, let us call a transaction irregular if it has more outputs than inputs but does not pay a 1 Kaspa fee per excess output. For example, a transaction with 4 inputs and 7 outputs is irregular if it pays a fee of less than 3 Kaspa.

Solution: still allow posting irregular transactions, but include only one such transaction per block. This means that, for example, 1-to-2 transactions created by wallets will still be accepted, but only one such transaction will be included per block. All other transactions in the block must still pay the fee for excess UTXOs.

Problem: this still allows spamming! An attacker can still post irregular transactions with one input and many outputs, and once in a while, one of these transactions will be included in a block. If the attacker manages to create 1000 UTXOs once every two or three (or ten, or twenty) blocks, then we are almost back to square one!

Solution: Limit irregular transactions to at most two excess outputs. This still allows most of the network activity, including the 1-to-2 transactions created by wallets. Any logic that creates transactions with more than two excess outputs could either pay the fee or break the single transaction into a chain of transactions, each containing only some of the outputs. This means that the number of new UTXOs created “for free” in each block is at most two, which is well within the realm of the natural growth of the UTXO set.

Problem: This opens a denial-of-service attack. An attacker can post thousands of irregular transactions, making the wait times for including irregular transactions very high.

Solution: Prioritize spending older money. To delay a legitimate transaction, an attacker must post many transactions, and each of these transactions must spend a UTXO older than the one spent by the legitimate transaction. Since only one irregular transaction makes it into each block, and Kaspa produces one block per second, delaying a transaction that spends a year-old UTXO by a single day requires the attacker to own at least 86,400 UTXOs created more than a year ago.

Problem: But the dust attacker now has access to 57 million UTXOs created before 28/9/23. Couldn't she use them to delay every irregular transaction created after that date by a few weeks?

Solution: Prioritize the irregular transaction whose newest input is the oldest. The UTXOs created by the attacker have such small values that to actually spend them (even at the low pre-patch fees) she must also use an additional UTXO of greater value. Always looking at the newest input means these minuscule UTXOs do not actually let her prolong the attack.

Problem: But what about pools that split the block reward UTXO directly to many users? Wouldn’t this change harm their ability to pay their users?

Solution: Provide an exemption from all of the above to any transaction that has at least one coinbase input (a coinbase input is a UTXO created by a block reward rather than by a regular transaction; fortunately, the UTXO set already keeps track of whether a UTXO is coinbase, since coinbase UTXOs have slightly different rules). This actually provides a lot of leeway to pools: they can do whatever they want without paying excess fees, as long as they include at least one coinbase UTXO (of which they have plenty) in the transaction.

So to sum it up, we have three types of transactions:

A transaction is called regular if one of the following holds:

  1. It has no excess outputs, and pays the same fees as it would have before the update, or
  2. It has excess outputs and pays a fee of at least one Kaspa per excess output, or
  3. It pays the same fees as it would have before the update, and at least one of its inputs is a coinbase UTXO.

A transaction is called irregular if all of the following hold:

  1. It has at least one and at most two excess outputs,
  2. It pays at least as much fee as it would have paid before the update, but less than one Kaspa per excess output, and
  3. It has no coinbase input.

A transaction is rejected if it is neither regular nor irregular.
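
To pin the policy down, here is a minimal sketch of the classification in Python (hypothetical field names and a simplified fee model; the actual node works with transaction mass and is written quite differently):

```python
from dataclasses import dataclass

EXCESS_FEE_PER_OUTPUT = 1.0   # 1 Kaspa per excess output, as described above

@dataclass
class Tx:               # hypothetical stand-in for a real transaction
    num_inputs: int
    num_outputs: int
    fee: float          # fee actually paid, in KAS
    base_fee: float     # the minimum fee this tx would have needed pre-update
    has_coinbase_input: bool

def classify(tx: Tx) -> str:
    """Classify a transaction per the policy above: regular / irregular / rejected."""
    excess = max(tx.num_outputs - tx.num_inputs, 0)
    if tx.fee >= tx.base_fee:                          # never accept below the old minimum
        if excess == 0 or tx.has_coinbase_input:       # "regular" rules 1 and 3
            return "regular"
        if tx.fee >= excess * EXCESS_FEE_PER_OUTPUT:   # rule 2: pays for its excess outputs
            return "regular"
        if excess <= 2:                                # at most two "free" excess outputs
            return "irregular"
    return "rejected"

# A typical wallet payment (1 input, 2 outputs, tiny fee) is irregular but allowed;
# the attacker's 1-in-238-out transaction is rejected unless it pays ~237 KAS.
print(classify(Tx(1, 2, 0.0001, 0.0001, False)))    # irregular
print(classify(Tx(1, 238, 0.0001, 0.0001, False)))  # rejected
```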

The change to the gossip rules (that is, the rules by which nodes relay transactions to each other): only accept and retransmit regular and irregular transactions, and drop rejected ones.

The change to the block construction rules: include at most one irregular transaction per block; if there is more than one candidate, choose the one whose newest input is the oldest. Fill the rest of the block under the same rules as before the update, except only allowing regular transactions.
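
And a sketch of the tie-breaking rule for the single irregular slot per block (again with made-up structures):

```python
def pick_irregular(candidates):
    """Pick the one irregular transaction to include in the next block:
    the transaction whose newest input is the oldest. `candidates` is a
    hypothetical list of (tx_id, input_creation_times) pairs, where the
    times are, say, the DAA scores at which each spent UTXO was created."""
    def newest_input(item):
        _tx_id, input_creation_times = item
        return max(input_creation_times)      # creation time of the newest input
    return min(candidates, key=newest_input)  # oldest "newest input" wins

# A year-old payment beats a spammer's freshly minted dust, even if the spammer
# also throws in an ancient helper UTXO, because only the newest input counts.
mempool = [("legit", [1_000]), ("spam", [900, 30_000_000])]
print(pick_irregular(mempool)[0])  # legit
```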

Note that these are not consensus rules; blocks that do not follow these rules are still valid. The best we can hope for is that many mining nodes will update fast, decreasing the fraction of miners who mine spammy blocks, and many nodes, in general, will also update fast, making it more difficult for such transactions to propagate to miners.

Rolling it Out

Around 5:30 AM, we decided that we were satisfied with this solution, but also that we were extremely tired and that the desire to put this behind us might be affecting our judgment. To mitigate this risk, we decided to sit on it for half an hour. We also described the solution in #development to see if anyone had reservations or spotted a flaw we had missed. By 6 AM we were content that our solution was adequate and started rolling it out. Ori set out to actually code it while I started writing a public announcement. He then reviewed my announcement while I reviewed his code (a rite I hadn't practiced since 2015, and certainly not after an all-nighter).

A few technical complications later, we finally managed to roll out a version. At 8:30 AM, we posted an announcement calling all users, and especially miners, to update their nodes ASAP.

C̶o̶n̶t̶e̶n̶t̶ ̶w̶i̶t̶h̶ ̶h̶a̶n̶d̶l̶i̶n̶g̶ ̶t̶h̶i̶s̶ ̶a̶t̶t̶a̶c̶k̶ ̶w̶e̶l̶l̶,̶ ̶w̶e̶ ̶f̶i̶n̶a̶l̶l̶y̶ ̶w̶e̶n̶t̶ ̶t̶o̶ ̶s̶l̶e̶e̶p However, the attack at this point was far from over. It remained to monitor whether our solution was sufficiently adopted, whether it actually solved the problem and whether it broke stuff.

Too impatient to sync a node of my own, I located a public node with an open RPC port. I wrote a script that samples blocks from that node and tracks the percentage of blocks containing spam (my definition of a “spam block” was a block with more than 200 outputs, but in practice there was a very clear dichotomy between spam blocks with 1190 outputs and standard blocks with at most 10). This gave me a running estimate of adoption coverage, and watching it was intense.
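
The monitoring logic was roughly the following (a sketch with dummy data standing in for the actual RPC call, which I won't reproduce here; the node address is a placeholder):

```python
import random

SPAM_OUTPUT_THRESHOLD = 200   # a block with more than 200 outputs counts as spam

def fetch_recent_blocks(node_address):
    """Placeholder for the RPC query against the public node; here each block is
    represented only by its total number of transaction outputs, and dummy data
    is returned so that the sketch runs as-is."""
    return [1190] * 35 + [4] * 65   # pretend 35% of recent blocks are spammy

def spam_fraction(node_address, sample_size=100):
    blocks = fetch_recent_blocks(node_address)
    sample = random.sample(blocks, min(sample_size, len(blocks)))
    spam = sum(1 for total_outputs in sample if total_outputs > SPAM_OUTPUT_THRESHOLD)
    return spam / len(sample)

print(f"{spam_fraction('some-public-node:16110'):.0%} of sampled blocks are spam")
```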

By 15:00, it seemed that the share of spam blocks had already dropped below 50%. At this point, I realized I could improve the accuracy of my script in several ways, and found that the number was actually closer to 35%. That's a start, but it still means the attack was creating 1.5GB of UTXO junk daily. However, there wasn't much to do but sit and wait: it had been only 6.5 hours since the patch was posted, a fraction of a business day.

On the other hand, it seemed that most of the ecosystem remained intact. The sole exception was the logic KuCoin uses to process withdrawals, which relies crucially on few-to-many transactions.

Eighteen hours after the announcement, the attack had already dropped below 10%, and by midnight it was around 2%. By this time, Michael and Yoni had emerged from their Jewish slumber and were brought up to speed. For me, after working on the attack for 36 hours straight, it seemed like a good time to go to sleep.

The next morning, the attack was down to 1.5% of blocks. Very good progress, but not quite enough, as it still meant 75MB of spam per day. It isn't much, yeah, but it is something. Besides, by this point, it was a matter of principle.

Most of my efforts the next day (and the day after) were dedicated to locating sources of spam blocks and having them update their versions. Right off the bat, I knew that locating solo miners would be nearly impossible. Fortunately, it seemed that around 97% of the spam blocks were created either by publicly known pools or by hashrate rented on NiceHash.

Honestly, the fact that several pools failed to update their nodes kind of pissed me off. Some of them were running versions several months old. Getting in touch with the operators of these pools was extremely difficult. I tried approaching them via Telegram, Discord, and any other means at my disposal. When that didn't work, I publicly called them out for running outdated nodes, pointing out that they were not only spamming our network but also decreasing the expected profit of their users (since spam blocks are much heavier than typical blocks and thus take longer to propagate).

Nevertheless, some pools remained outdated for over 48 hours after the patch was announced. But with some patience, we eventually managed to get all of these pools to update.

The spam originating from NiceHash was a whole different story. NiceHash does not run Kaspa nodes; they provide a renting platform for people who run complete mining operations and, in particular, run their own nodes. I was able to contact their system administration team, and together, we were able to cross-reference the data I accumulated about spam blocks with their customers. Obviously, they did not disclose any customer information to me, but they were more than willing to relay requests to update nodes and even checked the possibility of blocking service for outdated nodes if they were observed to spam the network. Though it took a while, all the largest renters on NiceHash eventually upgraded their nodes. As of now, my measurements show that around 99.9% of blocks are produced by updated nodes.

On the other front, Michael brought in his network-forensics skills (I didn't know he had any, but at this point, I am no longer surprised to learn of new Michael skills) and managed to trace the spam route back to a particular node that seemed to generate most of it. The nice thing about finding the node is that it lets us see how well our solution is deployed. On my end, I have been seeing 0% spam blocks for a while now, but is that because all users updated their nodes, or just because the attacker turned the spam off? Will we suddenly see 3% or more spam blocks when she decides to turn it back on? Tracking the spamming node, we see that the 0% holds even though the node is still pumping out spam transactions like crazy. We know for a fact that not all miners have updated, so the only conclusion is that sufficiently many (not necessarily mining) nodes have updated to prevent the transactions from propagating, even though there are technically still miners who would include them.

Besides that, having the IP of the spamming node is not particularly useful. For good measure, I e-mailed the abuse department of the entity in charge of the address pool, informing them of a spam attack. Maybe they’ll do something about it. Most likely they won’t.

The Tidal Wave Following The Earthquake

Kaspa has a pruning mechanism. Every 24 hours, it processes 24 hours' worth of information, stores whatever needs to be stored, and erases the rest. This happens around 5 PM Israel time.

You'd think this means that at every 5 PM the data of the preceding 24 hours is processed, but that's not the case. For reasons I will not go into here, this happens with a delay of two days. In particular, on Wednesday at 5 PM, the pruning mechanism was scheduled to process all the information accumulated between Sunday 5 PM and Monday 5 PM.

Part of this processing is to store all UTXOs created during this period into the database. For other reasons I won’t go into right now, the Go implementation does this entire write in a single atomic query.

This is a bit heavy on nodes even in everyday use, and since this time the query was insanely large, we expected it to make many nodes freeze for several minutes. What we didn't expect was that the command would be so large that it would actually cause nodes to run out of memory.

I think what happened next is best described with a picture: many, but not all, of the network's nodes crashed. It seems that 16GB of RAM was just short of handling the DB call, while nodes with 32GB handled it just fine.

The Rust implementation avoids this problem: it amortizes the DB write into many small commands, one for each block. Hence, Rust nodes, even those running on weak hardware, neither froze nor crashed.
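
For intuition, here is a toy contrast between the two write patterns (hypothetical storage API, not the actual Go or Rust code):

```python
class FakeDB:
    """Stand-in for the node's key-value store (illustration only)."""
    def write(self, batch):
        pass  # pretend to persist the batch atomically

def store_utxo_diff_atomic(db, utxos):
    """The problematic pattern: materialize one enormous batch and commit it at
    once. With tens of millions of spam entries, the batch itself can exhaust RAM."""
    db.write(list(utxos))

def store_utxo_diff_chunked(db, utxos, chunk_size=10_000):
    """The amortized pattern (the Rust node does roughly this, one small write per
    block): peak memory stays bounded and the node keeps responding in between."""
    chunk = []
    for entry in utxos:
        chunk.append(entry)
        if len(chunk) >= chunk_size:
            db.write(chunk)
            chunk = []
    if chunk:
        db.write(chunk)

# Both store the same data; only the peak memory footprint differs.
store_utxo_diff_chunked(FakeDB(), (("utxo", i) for i in range(1_000_000)))
```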

That being said, it was pleasant to see how the entire ecosystem came back online within less than two hours.

But wait! After the network started recovering, we were suddenly bombarded with spammy blocks again, at a rate of 3%! Looking into the matter, it seemed that these blocks originated from a pool that had not been problematic so far. Moreover, they appeared to be running a very old version. I reached out to them and they responded immediately, but insisted that all their nodes were up to date, sending me screenshots to prove it. What was going on? Was someone impersonating them to throw me off? Well, it turned out that the 5 PM crash had somehow caused their stratum to reroute to an old backup server. They fixed it immediately, and the spam disappeared.

Twenty-four hours later, another, smaller tidal wave was expected. According to Michael's computations, it would be at most a third of the size of the first one. We concluded that most, if not all, nodes should withstand it without crashing, though they might freeze for a little while. In any case, a new Rust version of the node had also been released to mainnet by then. In the end, the impact of the second write was marginal.

Aftermath, Conclusion, Afterthoughts

At this point, I feel comfortable saying that the attack has been completely blocked. Only a negligible fraction of nodes remains outdated, and propagating spam blocks to them seems very difficult. The attack's damage has been contained and boils down to about 5GB of junk UTXO data we will have to carry permanently (at least until someone solves the problem of storing the UTXO set). While annoying, that's not too bad. At high throughputs, we expect storage requirements to be at least around 200GB, so a few gigs of extra weight is not what's going to make the difference.

This stupid attack gave us a run for our money. It stress-tested not only the network but the devs as well, and I think we took it in stride. In such situations, it is easy to make the wrong call and harm the network, or to stall while the damage accumulates. I take pride in the fact that we came up with an adequate, far-from-obvious solution that required making tough calls, that we came up with it fast, and that we followed it through.

The engagement of the community was, as always, impressive. Not only did many people participate in the discussion, but people were lining up to help in any way possible: pushing miners to update ASAP, explaining what’s going on to concerned users, communicating with pools and exchanges that needed support adjusting to the changes, and so on.

The “tidal wave” gave us an opportunity to test the much-improved pruning processing of the Rust implementation under a huge load in a real-world environment and see it passing the test with flying colors.

Going forward, we still have to come up with a more permanent solution to dust: one that does not rely on miner cooperation outside of consensus and does not limit users who want to create few-to-many transactions. The current solution affords us the time to do so at our leisure, since we know it provides good protection against future dust attacks.

Personally, this was probably the most action-packed and exciting night of my entire professional life. It is a privilege to be on Kaspa's first line of defense, and I hope I proved myself worthy of it.
