Utreexo: Compressing Fully Validating Bitcoin Nodes

Published in

Interdax Blog

10 min readDec 1, 2020

In this article, we examine a scaling solution for Bitcoin known as Utreexo, which drastically reduces the storage requirements for running a fully validating node.

With the price of bitcoin nearing all-time highs, an influx of new participants are set to enter the Bitcoin network for the first time. While there are already millions of entities using Bitcoin, the number of operational nodes is far less than you’d think. Ideally, each user should run their own to ensure the network is as secure and private as possible.

Figure 1 shows that the number of Bitcoin Core nodes is just above the level seen towards the end of 2017, and until recently, stagnated. As the price of bitcoin continues to rise and sets the stage for another bullish run, ideally we’d want to see the number of nodes increase in tandem.

Figure 1: Historical Number of Bitcoin Core Nodes

But to run your own Bitcoin node, you have to download the entire transaction history from the genesis block until the present, verify all the signatures/transactions and store all the blocks ever produced. This process is known as the initial block download.

Because every single on-chain transaction is etched into the Bitcoin blockchain forever, the initial block download can take a few days, or even longer, depending on the device you are using. With the total size of the blockchain greater than 300GB (and always growing), this represents a significant barrier to setting up a node for much of the global population.

The total size of the blockchain includes the full transaction history (which always increases) and the current state (also known as the UTXO set). As shown by Figure 2, the size of the ledger has increased from less than 50GB in 2015 to more than 300GB in 2020.

Figure 2: Total Size of the Bitcoin Blockchain

The transaction history accounts for most of the blockchain data, while the chain state (also known as the UTXO set) is currently around 4GB. However, keeping track of the chain state can be more difficult than the history.

As Figure 3 below shows, the number of UTXOs has grown rapidly in the past as bitcoin experienced waves of adoption. As more people and organisations use Bitcoin, all of them will use at least one UTXO, so being able to manage and store this data will become increasingly important.

Figure 3: Total Number of UTXOs

Pruned nodes go some way towards solving the growing size of the blockchain, where only a certain number of the most recent blocks are stored. But if every node became a pruned node, the blockchain history would be lost forever and new nodes would not be able to join the network.

Along with pruned nodes, there are also full nodes and light nodes:

Full nodes: are the most secure way to join the Bitcoin network, which independently validate every transaction/block after the initial block download. Full nodes also help bootstrap new nodes joining the network, keeping the network secure and decentralised. But as alluded to earlier, full nodes are heavy in terms of storage and computation costs.

Light nodes: Light nodes, or Simple Payment Verification (SPV) nodes, are the most economical way to join the network. SPV nodes only download and store the block headers. They do not validate signatures/transactions, rely on full nodes to know if an input exists, assume miners are honest and are vulnerable to several security and privacy attacks.

One of the most demanding tasks for a full node is verifying transactions, which involves checking whether the UTXOs actually exist, if the signatures are correct and so on. To validate the chain, it requires a lot of disk space to keep track of the set of coins that are currently spendable. As Figure 3 shows, there are over 60 million UTXOs at present and this number will increase as bitcoin enters a new bullish phase.

There’s no way to export the UTXO set since you cannot verify it, instead you have to share actual blocks and actual history. As the UTXO set grows over time, the storage requirements for full nodes (and even pruned nodes) will become larger and larger. Therefore, there’s a need for a node type that can fully download and validate the chain state but in a compact way to reduce the barriers posed by blockchain storage, without affecting security or decentralisation. One such proposal is Utreexo.

Utreexo: Running A Full Node with A Few Kilobytes

One potential solution to address Bitcoin’s growing storage requirements of the blockchain, without sacrificing decentralisation, is Utreexo. It is as secure as running a full node but only uses a few kilobytes, and the syncing time for a Hard Disk Drive is the same as for a Solid State Drive.

In essence, Utreexo does to the state/UTXO set what a pruned node does to the blockchain history: it goes through it and deletes things that are not needed. By pruning the UTXOs so you are not tracking anyone else’s UTXOs and only ones that belong to you, it eliminates the need for a database to store the chain state. You no longer have to look up coins on the local disk, where the database of UTXOs is usually stored.

This frees up disk space and also has the added benefit of strengthening the security of Bitcoin, since consensus would not rely on Google’s levelDB database implementation. Currently, Bitcoin’s consensus depends on levelDB working correctly as we currently need the database to retrieve stored transactions.

The original paper outlining Utreexo was released on June 3, 2019, by Tadge Dryja, a research scientist at the MIT Digital Currency Initiative and co-author of the Lightning Network paper. In June 2020, ademonstration release put the idea of lighter nodes into working code. Other contributors to the Utreexo project include Calvin Kim, Janus Troelsen, and Niklas Goggë.

Utreexo is considered a full node since it fully verifies, however it does not contain the full state of the system. Spenders prove their coins exist and they keep another hash for all other UTXOs, which is a tiny amount of data. When someone wants to spend their coins, they must provide a proof that their coins are in this hash. Under the hood of Utreexo is a cryptographic construction known as a hash-based accumulator.

Applying Merkle Trees to UTXOs

What the hash-based accumulator does is apply the concept of a Merkle tree to UTXOs. There’s no database to track the state, as Utreexo merkelises the UTXO set and retains a single Merkle root. UTXOs can be added to the accumulator, and we can prove that they are part of the set. We can also verify and delete things that are part of the set as well.

When there is an incoming transaction to be validated, the spender can provide a Merkle proof to show that their coins are contained in the Merkle root. Their UTXO is verified and then deleted. A new UTXO is created for the recipient of the transaction, which is added as a new leaf to the Merkle tree. The leaf is then hashed, combined with all other UTXOs, and the Merkle root is updated.

Consider the following simplified example with a Merkle Tree of 8 elements (or UTXOs). Numbers 1–8 represent UTXOs, while h1-h7 represent hashes. Utreexo nodes would only keep h7 (the accumulator root) and they discard all other hashes after validation.

Figure 4: Merkelised UTXOs

A Merkle Tree with 8 leaves. Each UTXO is hashed and concatenated. The hashes are combined together until we get a single root, the accumulator root. The accumulator root (and wallet information) are the only things stored by Utreexo nodes. As a result, a full node can be less than one kilobyte of data.

Let’s say a user wants to spend UTXO 7. They need to prove that the transaction exists by providing leafs: 7 and 8, and hashes: h3 and h5. The validating node would then create a separate tree with the received hashes. The verifying node can then calculate the empty spaces h6 and h7. If the tree’s root, h7, matches the root stored, then the transaction is genuine.

Now let’s say that 7 is deleted from the Merkle tree in Figure 4, we would end up with a forest of Merkle trees and three Merkle roots which would be stored by the accumulator (as shown in Figure 5):

Figure 5: A forest of Merkle Trees

Once element 7 is deleted, the Merkle tree becomes a forest of Merkle trees, with three accumulator roots to be stored by Utreexo nodes.

In practice, there are over 60 million UTXOs which would result in a forest of Merkle trees that move around as new blocks are produced, where new leaves are added and others are deleted as UTXOs are created/spent, but all of this can be computed fairly quickly by computers.

What are the Benefits of Utreexo?

With a Bitcoin full node, all the UTXOs in existence have to be stored which amounts to around 4GB. But with Utreexo, it reduces the storage requirements from gigabytes to kilobytes since you can run a full node only by storing the root of the UTXOs. An advantage is that the initial block download happens solely on RAM since the Utreexo state is less than kilobyte and there is no querying the disk during this process.

Another advantage of Utreexo is that the state size can fit onto a QR code when it is that small. You could sync a full node on a computer and then copy the entire state to a mobile phone. Instead of the mobile phone downloading the entire blockchain and syncing for few days, you can simply scan QR code and it will sync your phone up.

Finally, no soft forks are required, in contrast to other accumulator constructions such as the RSA accumulator as proposed by Boneh and others.

Getting Utreexo Off the Ground: Bridge Nodes

So far, we’ve seen how Merkle trees are applied to UTXOs to enable spenders to provide proofs and reduce the storage requirements for a fully validating node. However, to get Utreexo off the ground, the nodes need to validate transactions that do not submit any proofs derived from the forest of Merkle trees (i.e., transactions validated by existing full nodes).

Since users will have to opt-in, most transactions will provide none of the required proofs for Utreexo once it is available on Bitcoin’s mainnet. The upside of this is that there is no soft fork required to make this improvement to Bitcoin. However, bridge nodes are needed to maintain a proof for each and every UTXO to ensure that Utreexo nodes can validate all transactions, irrespective of whether a proof is provided.

What bridge nodes do is store the entire forest of Merkle trees and hence all the proofs. They can provide any proof instantly and give it to the node that needs it. For Utreexo to be feasible, not that many bridge nodes are necessary. Figure 6 illustrates how bridge nodes function:

Figure 6: Bridge nodes

Bridge nodes appear as full nodes to other full nodes, while they appear as Utreexo nodes to other Utreexo nodes. By providing proofs for transactions on the old network, bridge nodes enable Utreexo nodes can validate all transactions, regardless if they have provided a proof or not.

Bridge nodes prove a UTXO spent in a transaction corresponds to a leaf in the forest of Merkle trees, bridging the new nodes using the hash-based accumulator with regular transactions on the old network. By talking to full nodes, bridge nodes can store the entire forest of Merkle trees so that Utreexo nodes can update their accumulator roots as new blocks come in.

A bridge node cannot be dishonest; if it provides a false proof, the hashes will not match. Because they have to store every single proof, the storage overhead is larger than for a full node, and could be as much as double the size of the current blockchain (~600GB). However, this can be optimised by caching and batching proofs.

Most of the tree will not change that much since the lifespan of most UTXOs is relatively short (around 40% live less than 20 blocks). The scalability of Bitcoin and the speed of syncing can both be improved by removing dormant UTXOs.

Therefore, parts of the tree that do not change can be cached and parts of the tree that recently changed can be remembered. The more memory you dedicate to caching these proofs, the fewer proofs you have to download. With a few hundred MB of memory dedicated to caching, it reduces significantly the amount of data that needs to be downloaded.

Current Status of Utreexo

Currently, the Utreexo code is implemented into the btcd node software, with work being done to optimise it and ironing out any bugs. Developers will eventually incorporate it into Bitcoin Core, the most popular node software, so that users can start running nodes on older devices with a tiny storage overhead.

However, it’s important to note that Utreexo comes with a trade-off between bandwidth and storage requirements, meaning it is only worthwhile for individuals who have access to fast internet speeds, but have low storage capacity. When doing the initial block download with Utreexo, it requires around 20% more bandwidth to download the proofs and bridge nodes require additional storage to store the proofs.

The promise is that full nodes will be easier for people to run with the drastically lower storage requirements. Although, there is a lack of any financial incentives, so it remains an open question whether the reduction in storage requirements will be enough of a push for more users to run their own node. But by reducing the barriers involved in setting up and running a node, the hope is that Utreexo will incentivise more individuals to use Bitcoin in a trustless manner, making the network more decentralised and robust.

If you have any feedback or questions, please leave a comment below.

About Interdax:

Interdax — level up your trading in our next-gen digital asset derivatives exchange.

Sign up and register today to receive a 5% margin bonus. Join our referral program and earn 20% commission.