Dust protection on the IOTA Network — an ELI12

Werner der Champ
12 min read · Nov 10, 2021

Since it was introduced in the Chrysalis part 2 network upgrade, dust protection has been a central point of discussion, and this didn’t change at all when the new dust protection RFC was released. In this blog post I will try to explain why dust protection is critical for the IOTA network, how the new proposal works, give some examples for better understanding, and describe how it will affect usage of the ledger.

Edit: This article is outdated; the new version can be found here

Scarce resources in a DLT

In order to stay up and running, a node has to constantly provide the CPU power, network bandwidth, RAM and disk storage that is currently required by the network. For example, if a node has insufficient CPU power during a spam attack, it will start to fall behind. Limiting the maximum messages per second (MPS) via congestion control is one important key to this. It is very effective in preventing the network from consuming too much CPU, bandwidth and RAM on the nodes. For a more detailed description of congestion control, I recommend reading Luigi’s blog post on this topic. However, congestion control is not very effective at preventing excessive disk usage, so we need additional measures here. There are two different databases on each node that we have to take a look at.

The tangle database

The tangle database stores all messages as they arrive. If another node requests a message in order to sync up with the network, it is read from this database and sent. It also tracks some additional data like the confirmation status of a message and is used to validate token transactions.
Since this database can be pruned, its size can be adjusted pretty freely, depending on how much history you want to keep. It is possible to run a node with just ~10 minutes of tangle history, in which case you would have an extremely small tangle database. The other extreme would be a permanode that stores the entire history of the tangle, but might require terabytes of disk space in the future (currently, the requirement is ~500GB). With the latest Hornet version, users can simply set the maximum size they want their tangle database to have; once that size is reached, the node prunes some messages to get below the threshold. As a result, this database isn’t too big of a problem.

The ledger state database

Just like Bitcoin, IOTA is a UTXO-based ledger that works with outputs. Think of an output like a check, written to your IOTA address with a specific amount of tokens. Of course, you can have multiple checks to the same address; your net worth is the sum of all checks written to all of your addresses. If you want to make a value transaction, you cash in one or multiple checks and create new ones. As you have to create checks with exactly the balance of the checks you just cashed in, this usually results in one check to the user you want to send tokens to and one back to yourself with the remainder. For example, in the graphic below, transaction D spends a 15Mi output that was created in transaction B, then creates two new outputs, one with 10Mi to blue and one with 5Mi back to itself. Of course, just like you can only cash a check once, you can only spend an output once. Transactions that try to use an output that was already spent are rejected (see transaction F2).

Explaining the UTXO flow in the tangle (Graphic by JSto)

To maintain all balances, a node must therefore keep track of all unspent transaction outputs (UTXOs). This list of outputs is called the ledger state, as it states all token balances at a specific time. This state is also downloaded when you first set up your node and backed up to your disk when your node performs a snapshot. Whenever a value transaction is confirmed, all outputs it spent are deleted from the database and the newly created ones are added. If more outputs are created than are spent, the ledger state grows in size. However, we obviously cannot prune the ledger state database, as that would remove tokens from wallets and could also lead to different nodes having different balances for the same address.
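
To make this concrete, here is a minimal sketch of a ledger state and how it changes when a transaction confirms. The structure (a plain dict keyed by output ID) is illustrative only; real nodes keep this in an on-disk key-value store:

```python
# Minimal sketch of a ledger state: a map from output ID to (address, amount).
ledger_state = {
    "B:0": ("red", 15_000_000),  # the 15Mi output created by transaction B
}

def apply_transaction(tx_id, spent_ids, new_outputs):
    """Apply a confirmed value transaction: delete spent outputs, add new ones."""
    # An output can only be spent once; a second spend attempt is rejected.
    for output_id in spent_ids:
        if output_id not in ledger_state:
            raise ValueError(f"{output_id} already spent or unknown (double spend)")
    for output_id in spent_ids:
        del ledger_state[output_id]
    for index, (address, amount) in enumerate(new_outputs):
        ledger_state[f"{tx_id}:{index}"] = (address, amount)

# Transaction D spends the 15Mi output from B: 10Mi to blue, 5Mi back to red.
apply_transaction("D", ["B:0"], [("blue", 10_000_000), ("red", 5_000_000)])
# Transaction F2 tries to spend "B:0" again and would be rejected:
# apply_transaction("F2", ["B:0"], [("green", 15_000_000)])  # raises ValueError
```

Note that transaction D spent one output but created two, so the ledger state grew by one entry. That is exactly the growth the dust protection has to keep in check.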

As a result, if this database grows too large, we really have a problem. Not only because we can’t prune it, but also because the only way to reduce its size is for token holders to actively spend multiple outputs into a single one. This issue exists on all current DLTs, with Bitcoin having a state of about 4.3 GB with 7.6 million unspent outputs. On more utilized DLTs like Ethereum or Solana, the state goes well into the hundreds of gigabytes. The current ledger state of the Chrysalis network is just about 10 MB. However, with more utilization and additional data like NFTs or native tokens, it is just a matter of time until this increases. As IOTA also does not have any fees, it is pretty cheap to add new outputs. With just about $50 of investment, you could create over 30 million 1i outputs to random addresses. Depending on the resources you have available for PoW this might take you a few months, but nobody could ever clean it up. Therefore, with the migration to Chrysalis, a dust protection was enforced to mitigate this problem, which increases the cost of creating the same amount of outputs by a factor of about 90,000.

Short recap

  • We need to ensure that nodes do not get overloaded by excessive resource requirements, in this case too much disk space required for the ledger state. This is also important because every new node has to download the state first before it can synchronize
  • The ledger state can neither be pruned nor easily cleaned up, and IOTA’s feelessness makes it easy to inflate
  • The current dust protection mitigates this by requiring a deposit for accepting dust transactions, thus making such attacks costly

Also note that only value transactions use the ledger state database. Plain data messages just carry data, which can easily be pruned.
Therefore, if you just send data, you are not affected by dust protection!

How does the dust protection work?

The current system

In the current IOTA network, the minimum amount that can be sent is 1Mi. If you want to receive microtransactions, you have to create a dust allowance output, which functions the same as a regular output but has a flag set indicating that it permits dust to be sent to this address. For each Mi this dust allowance output holds, 10 outputs of less than 1Mi can be accepted on this address, up to a maximum of 100. This limit can easily be restocked by making a transaction that spends all dust outputs into a single new output, creating space for further micro-outputs.
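
As a small sketch of this rule (the function is mine for illustration, not from the node software):

```python
def allowed_dust_outputs(dust_allowance_iotas: int) -> int:
    """Current Chrysalis rule: 10 dust outputs (<1Mi) allowed per full Mi
    of dust allowance on the address, capped at 100."""
    return min(10 * (dust_allowance_iotas // 1_000_000), 100)

print(allowed_dust_outputs(1_000_000))   # 10
print(allowed_dust_outputs(5_000_000))   # 50
print(allowed_dust_outputs(20_000_000))  # 100 (cap reached)
```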

However, this system has a few problems. First, a spammer could constantly fill up your allowance with 1i outputs. While you can clean them up, it can be hard to actually receive the microtransaction you want, because any allowance is instantly filled. Furthermore, the network has to constantly check whether you are allowed to receive dust. This requires total ordering of the network and therefore does not work without the coordinator.

The newly proposed system

With the new system, whoever creates an output has to pay a deposit in IOTA tokens. As outputs can become more complex in the new system, the deposit is based on how much storage the output uses. Output sizes differ: it takes a lot more disk space to store an NFT than a simple IOTA transaction, so NFTs require a higher deposit. When you spend the output, your deposit is fully refunded. Therefore, IOTA stays a feeless protocol; every token you spend on storage is refunded when you free the storage again.

The deposit is made by simply adding some IOTA tokens into the output. So if you want to send me an NFT, you also have to send me some of your IOTAs to pay for the storage the output consumes in the ledger state database. The same goes for native token transfers. If you just send around IOTAs, those outputs might already have enough balance to back themselves.

To express it differently, you basically stake your IOTA tokens whenever you send them; the reward is disk space in the ledger state database (remember, we can’t prune this database) for as long as the output remains unspent. This also adds the use case of permanently storing data in the ledger state, for example for DIDs. But in order to do so, you have to add some IOTA tokens as a stake for that storage. Sounds like a fair deal, doesn’t it?

Bytes and virtual bytes

The storage requirement of an output is calculated in virtual bytes (v_bytes). While one could assume that the output could just be written to disk, we also need to be able to find any output quickly without looking through the entire database. Like in a scientific book, we therefore keep an index that shows us on which page the output is. However, maintaining this index is much more costly, since it constantly has to be updated and adjusted to maintain high read and write speeds. A larger index also slows down the database, just like you would need longer to find what you are looking for if the index were 50 pages instead of 10. Therefore, writing an index entry (aka a database key) is weighted ten times more expensive than just writing data.

Weight for key and data (extract from the Dust Protection RFC)

Because keys have a much higher weight, they play a major role in the v_byte size of an output. This becomes very visible with the simple output, which just has the outputID, the address it goes to, and the amount you are sending. The outputID is needed to quickly find an output when somebody tries to spend it, as this is how you specify which output you want to spend. Because users want to see their balance, we also need a separate index so we can find all outputs that belong to a specific address. As this index is stored in addition to the data, the effective weight of the address field becomes 11.

v_byte costs for a simple Output that just sends IOTA (modified extract from the Dust Protection RFC)
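
To illustrate how the weighting adds up, here is a small sketch. The weights (key = 10, data = 1, indexed data = 11) are the ones described above; the field byte sizes are my own rough assumptions for illustration, not exact values from the RFC:

```python
# Illustrative v_byte calculation for a simple output.
# Weights: key = 10, data = 1; a field that is stored as data AND indexed
# gets an effective weight of 1 + 10 = 11 (like the address field above).
WEIGHT_KEY = 10
WEIGHT_DATA = 1
WEIGHT_INDEXED_DATA = WEIGHT_DATA + WEIGHT_KEY  # 11

# (name, assumed size in bytes, weight) -- sizes are assumptions, not RFC values
fields = [
    ("output_id", 34, WEIGHT_KEY),           # lookup key when the output is spent
    ("address",   33, WEIGHT_INDEXED_DATA),  # indexed so balances can be queried
    ("amount",     8, WEIGHT_DATA),
]

v_bytes = sum(size * weight for _, size, weight in fields)
print(v_bytes)  # 711 with these assumed sizes, i.e. roughly the order of a simple output
```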

Examples for the dust protection

Assumptions

At the moment, it is not decided how many IOTAs you will have to deposit per virtual byte consumed. This value will be a protocol value, which means it can be adjusted by simply changing it in the config of every node during a protocol update. To still give some kind of example of how high the deposit might be, I have chosen three possible byte costs and calculated the required deposits. We also look at how big the ledger state would get if 20% of all IOTA tokens were used to pay for space in the ledger state. The costs for the outputs are calculated without any extra features, and we are using regular addresses where aliases would also be possible. Alias addresses are actually smaller, so you would have to make a smaller deposit for those (around -185Ki). Here are my scenarios:

  • A cost of 1400i/v_byte. This would result in a ledger size of 360GB at 20% of funds utilized. With this value, the minimum deposit of a simple output is around 1Mi, similar to how the current system works, which also makes it easy to compare the different outputs. That said, it is more of a worst-case scenario and in my opinion the highest acceptable value.
  • A cost of 500i/v_byte. This would result in a ledger size of 1TB at 20% of funds utilized. Maybe a good compromise between a big database and small deposits?
  • A cost of 125i/v_byte. This is a rather low example: it would result in a ledger size of 4TB at 20% of funds utilized and thus fairly high requirements for nodes. I don’t think we should go much lower than this, so take it as a possible lower bound.
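
These scenario figures follow directly from the total IOTA supply (2,779,530,283,277,761 i). As a back-of-the-envelope check (treating one v_byte as roughly one byte on disk, which is an approximation; small deviations from the figures above come down to rounding):

```python
# Rough check of the three scenarios above.
TOTAL_SUPPLY_I = 2_779_530_283_277_761  # total IOTA supply in iotas
UTILIZED = 0.20                          # 20% of all tokens locked as deposits

for cost_per_vbyte in (1400, 500, 125):
    v_bytes = TOTAL_SUPPLY_I * UTILIZED / cost_per_vbyte
    print(f"{cost_per_vbyte:>4} i/v_byte -> ~{v_bytes / 1e12:.2f} TB ledger state")

# 1400 i/v_byte -> ~0.40 TB ledger state
#  500 i/v_byte -> ~1.11 TB ledger state
#  125 i/v_byte -> ~4.45 TB ledger state
```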

Let’s take an example: I want to send you a native token, but I don’t want you to be able to spend it in the next 30 days. So I choose an extended output (716 v_bytes), add the native token (+70) and the timelock block (+5). The size of the output would be 716+70+5=791 v_bytes. At a cost of 500i/v_byte I would have to put 395.5Ki into the output to fulfil the dust protection requirement.
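
The same calculation in code (output sizes as given above, which are draft RFC values and may still change):

```python
v_bytes = 716 + 70 + 5  # extended output + native token + timelock = 791
cost_per_vbyte = 500    # the middle scenario from above
deposit_i = v_bytes * cost_per_vbyte
print(f"{deposit_i / 1000} Ki")  # 395.5 Ki
```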

Microtransactions using the new system

With the new system, native token transfers and NFTs are microtransactions as well; they all need to carry some IOTA around to pay their cost in the ledger state. If you want to send somebody a large NFT, you might have to add quite a few Miotas. So how can you get these tokens back?

The answer is simple: you add a return block. This block requires the recipient to pay your deposit (or a part of it) back. As he creates a new output, he now has to pay the deposit for the NFT himself, possibly repeating this process if he wants to send the NFT again. In reality, you might also add an expiration time, so you get your money back within a reasonable time if your recipient does not respond.

Let’s take the example below. Alice wants to send Bob a single IOTA. Instead of that token, Alice could also send a native token or an NFT; the procedure is the same. Let’s assume the output requires a 1Mi deposit. Alice adds a return block to receive her 1Mi back and an expiration block so that Bob has to claim the IOTA within 3 days. If Bob spends the output, he has to make a transaction that refunds Alice the 1Mi. Bob also has to ensure that he now fulfils the dust requirements himself, so he might have to add some IOTA tokens out of his own pocket.

If Bob does not respond (let’s say he was on vacation), the output expires and Alice regains ownership of the 1Mi+1i output and can spend it as if it had always been her own. After Bob returns, she could simply resend the tokens.

Image from the dust protection RFC
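
The spending rules of such an output could be sketched like this (field names and structure are mine for illustration; the exact unlock conditions are specified in the RFC):

```python
from dataclasses import dataclass

@dataclass
class ConditionalOutput:
    """Sketch of an output with a storage deposit return and an expiration.
    Field names are illustrative, not the RFC's wire format."""
    recipient: str      # Bob
    sender: str         # Alice
    amount: int         # 1Mi deposit + 1i payload
    return_amount: int  # the deposit Alice wants back (1Mi)
    expires_at: int     # unix timestamp after which Alice regains control

def who_can_spend(output: ConditionalOutput, now: int) -> str:
    # Before expiration, only Bob can spend, and only with a transaction that
    # creates a return output of `return_amount` back to Alice.
    # After expiration, the output is treated as if it were Alice's.
    return output.recipient if now < output.expires_at else output.sender

out = ConditionalOutput("Bob", "Alice", 1_000_001, 1_000_000, expires_at=1_700_000_000)
print(who_can_spend(out, now=1_699_000_000))  # Bob (must refund Alice her 1Mi)
print(who_can_spend(out, now=1_700_000_001))  # Alice (Bob was on vacation)
```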

If the two are in regular exchange, they could of course also move to layer 2. Besides the obvious way of using a smart contract, Alice could also set up a payment channel to Bob by making use of the timelocks. I expect a lot of microtransaction use cases to utilize layer 2, especially if they have to happen fast.

Conclusion

The new dust protection offers a well-regulated cost for writing data into the ledger state. As the deposit is refundable, users have an incentive to clean up the ledger database to have their tokens available again. It also creates a great use case for the IOTA token, making it mandatory to own if you want to keep permanent data inside the ledger.

But what if the value of the token rises?

This is a major concern in the community, and not without reason. A higher token price would cause deposits to become more expensive, as the cost of writing data into the ledger state increases. The good thing is, we can always decide to decrease the cost per byte in the future, for example if SSDs get even cheaper and faster. While increasing it is theoretically possible, doing so creates some issues; for example, already-created outputs would not be affected by the change, creating an incentive not to move those tokens in order to keep the lower price.

Ultimately, sharding is the only permanent solution to this problem. While the current system on currently used hardware could easily deal with 10 million users, it gets really tricky when reaching the three-digit millions. Also, not even 10,000 MPS will be enough in such a system unless you impose Solana-style requirements on validators (and even that might not be enough). It is just not possible to maintain a database of billions of users/devices with millions of messages per second on a single machine in a decentralized way. While IOTA’s technology does not have a fixed MPS limit, your hardware surely does.

With this, we are back at the reason for the dust protection: the limited resources of nodes. But what should the requirements to run a node actually be? Higher requirements allow IOTA to reach more messages per second and a lower deposit to write into the ledger state. But the higher the requirements, the fewer people will set up nodes, which makes the network more centralized. Even worse, on IOTA there are no rewards for validators, so there is no monetary incentive generated by the protocol. This is the classic case of the blockchain trilemma: decentralization versus scalability. You cannot demand 5000 MPS and at the same time expect to run a node on your Raspberry Pi.

The blockchain trilemma — choose 2 out of 3

Several members of the X-Teams, the IF, and I have shared the same opinion for a while now: the community has to decide how high we should set the requirements. So with the new RFCs out, I think it is time to kick off this discussion. How much would you pay per month for a VPS? How many MPS should IOTA be able to handle? Should we rather work with lower MPS so that nodes stay cheap, or do we want high MPS and accept fewer nodes? I am looking forward to seeing your responses on Discord and on Reddit (link to post)
