CodeChain’s Merkleized UTXO Set

Kwang Yul Seo
CodeChain
Published in
3 min readMay 2, 2018

Bitcoin UTXO Set

Bitcoin makes use of the Unspent Transaction Output (UTXO) set to keep track of output transactions that have not been yet spent. The UTXO stores all the required information to validate a new transaction without having to inspect the full blockchain. When a new transaction is created, UTXOs are used to claim the funds they are holding, and new UTXOs are created. Therefore, the data structure of this set directly impacts the performance and storage requirements of Bitcoin.

Bitcoin Core has constantly been improving the internal format of the UTXO set in favor of better performance both in reading time and memory usage. Analysis of the Bitcoin UTXO set succinctly describes the Bitcoin Core 0.15 format of the UTXO set as follows:

This new format uses a per-output model in contrast to the previously defined per-transaction model, that is, every entry in the chainstate now represents a single UTXO, instead of a collection of all the UTXOs available for a give transaction. To achieve this, the key-value (known as outpoint-coin in the source code) structure has been modified. Keys encode both the 32-byte transaction hash and the index of the unspent output, preceded by the prefix “C”. Regarding coins, each one encodes a code, that contains metadata about the block height and whether the transaction is coinbase or not (notice that the transaction version has been dropped), a compressed amount of bitcoins, and the output type and script encoded in the same way as the version v0.14.

In other words, an entry in the UTXO set represents a single UTXO which contains the information required to validate a new transaction without access to the blockchain. However, this is not the end of the story. One common argument against UTXOs is that the UTXOs are unnecessarily complicated and there is some truth in it.

Because Bitcoin uses Proof of Work consensus, there are occasional chain reorganizations. When a fork becomes the canonical chain, some blocks are retracted. The UTXO set must be able to undo all the changes made by the transactions included in the retraced blocks. Thus, Bitcoin Core keeps track of undo data for each block and applies undo data to revert changes made to the UTXO set by the retracted blocks. This complicates the implementation of the Bitcoin Core UTXO Set.

Merkleized UTXO Set

CodeChain implements the UTXO set on top of Merkle Patricia Trie. MPT has been adopted by Ripple and Ethereum and is used to store all account state, as well as transactions receipts in each block. The key properties of MPT are as follows:

  1. Every unique set of key/value pairs maps uniquely to a root hash, and it is not possible to spoof membership of a key/value pair in a trie (unless an attacker has ~2¹²⁸ computing power)
  2. It is possible to change, add or delete key/value pairs in logarithmic time

MPT is a key-value pairs map which has some nice properties. One property that is particularly useful in the context of UTXO set implementation is that we can safely go back to the previous state without keeping track of any undo data. For example, Ethereum can quickly go back to the previous state by switching the state root to the state root of the previous block as depicted below:

CodeChain stores UTXOs as well as account state and asset scheme in MPT. The keys of UTXO entry encode both the 32-byte transaction hash and the index of the unspent output, preceded by the prefix “A”. Values contain the information required to validate a transaction such as an asset type, lock script hash, lock script parameters and amount. Thanks to the nice property of MPT described above, CodeChain does not need to keep track of separate undo data. MPT already contains all the necessary information required to go back to the previous state including account state, asset scheme, and UTXOs.

In summary, CodeChain implemented the UTXO set on top of MPT so that it has become easy to restore state when chain reorganization occurs. We will share performance and memory usage data in an upcoming post. Stay tuned!

Check out the source code at our github repo. If you would like to chat with the CodeChain developers’ community, join our gitter channel.

--

--