Recently there has been a proposal to adopt a UTXO-based architecture as the fabric of the Hyperledger project. This piece describes what UTXOs are and weighs the costs and benefits of a UTXO architecture.
Thoughts on UTXOs by Vitalik Buterin, Co-Founder of Ethereum
What are UTXOs?
In Bitcoin, a transaction works “under the hood” by consuming a collection of objects called unspent transaction outputs (“UTXOs”) created by one or more previous transactions, and producing one or more new UTXOs, which can then be consumed by future transactions. Each UTXO can be thought of as a “coin”: it has a denomination and an owner. The two primary rules that a transaction must satisfy to be valid are that (i) the transaction must contain a valid signature from the owner of each UTXO that it consumes, and (ii) the total denomination of the UTXOs consumed must be greater than or equal to the total denomination of the UTXOs that it produces. A user’s balance is thus not stored as a number; rather, it is computed as the sum of the denominations of the UTXOs that they own.
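The two validity rules and the balance computation can be sketched in a few lines of Python. This is a minimal illustration rather than Bitcoin's actual data structures: the `UTXO` class, `is_valid` and `balance` are hypothetical names, and signature checking is replaced by a plain set of owners assumed to have signed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UTXO:
    txid: str    # hypothetical id of the transaction that created this output
    index: int   # position of the output within that transaction
    owner: str   # stand-in for an address or public key
    value: int   # the denomination


def is_valid(inputs, outputs, signed_by):
    """Check rules (i) and (ii) for a transaction.

    signed_by is the set of owners assumed to have supplied a valid
    signature; real signature verification is out of scope here.
    """
    every_input_signed = all(u.owner in signed_by for u in inputs)
    enough_value = sum(u.value for u in inputs) >= sum(o.value for o in outputs)
    return every_input_signed and enough_value


def balance(utxo_set, owner):
    """A balance is not stored anywhere; it is summed over owned UTXOs."""
    return sum(u.value for u in utxo_set if u.owner == owner)
```

For example, with UTXOs of 5 and 7 owned by `alice`, `balance` returns 12, and a transaction spending both to create outputs of 10 and 2 passes `is_valid` only if `alice` is in `signed_by`.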
If a user wants to send a transaction sending X coins to a particular address, it may sometimes be the case that some subset of their UTXOs has a combined denomination of exactly X, in which case they can create a transaction that consumes those UTXOs and creates a new UTXO of value X owned by the destination address. When no such perfect match is possible, the user must include input UTXOs with a combined denomination greater than X, and add a second destination UTXO called a “change output” that assigns the excess coins to an address controlled by themselves.
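The exact-match and change-output cases can be sketched as follows. This assumes UTXOs are represented as `(utxo_id, value)` pairs and uses a brute-force subset search that is only practical for tiny wallets; `build_send` and its parameters are hypothetical names chosen for illustration.

```python
from itertools import combinations


def build_send(my_utxos, amount, dest, change_addr):
    """Return (inputs, outputs) for sending `amount` to `dest`.

    my_utxos is a list of (utxo_id, value) pairs.
    outputs is a list of (address, value) pairs.
    """
    # Case 1: some subset sums to exactly `amount`, so no change is needed.
    # (Exponential search; for illustration only.)
    for size in range(1, len(my_utxos) + 1):
        for subset in combinations(my_utxos, size):
            if sum(value for _, value in subset) == amount:
                return list(subset), [(dest, amount)]

    # Case 2: no perfect match, so gather inputs until they exceed
    # `amount` and send the excess back to ourselves as a change output.
    chosen, total = [], 0
    for utxo in my_utxos:
        chosen.append(utxo)
        total += utxo[1]
        if total >= amount:
            break
    if total < amount:
        raise ValueError("insufficient funds")
    return chosen, [(dest, amount), (change_addr, total - amount)]
```

With UTXOs worth 5, 7 and 3, sending 8 consumes the 5 and the 3 with no change; with UTXOs worth 5 and 7, sending 6 consumes both and returns 6 as change.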
Benefits of UTXOs
The UTXO model has been popularized recently because of its adoption in Bitcoin, and some private blockchain users are also using it; Hyperledger’s reasoning to switch to UTXOs was as follows:
We are also switching from our simplistic notion of accounts and balances to adopt to de facto standard of the Bitcoin UTXO model, lightly modified. While Hyperledger does not use Bitcoin in any way, the Bitcoin system is still extremely large and innovative, with hundreds of millions of dollars invested. By adopting the Bitcoin transaction model as standard, users of Hyperledger will benefit from innovation in Bitcoin and vice versa, as well as making Hyperledger more interoperable.
Aside from “Bitcoin network effects”, one can make some technical arguments for the UTXO model. One particular argument is that it allows transactions to be processed in parallel: if a sender creates two independent transactions, they can take care to spend separate UTXOs, and so those transactions can be processed in any order. This order invariance and parallelizability may also lead to scalability benefits. There are also some privacy benefits to having one’s coins split up, particularly if each UTXO that a user receives uses a different address whose private key the owner can deterministically generate from a master seed. However, these privacy gains are easily broken if the user is not careful about keeping their funds separate. It is this author’s position that if privacy is strongly desired, then the separation provided by UTXOs is vastly insufficient for the task, and more complex constructions such as ring signatures, additively homomorphic value encryption and ZK-SNARKs are needed.
Why not UTXOs?
The core of the argument against UTXOs has two parts:
1. UTXOs are unnecessarily complicated, and the complexity gets even greater in the implementation than in the theory.
2. UTXOs are stateless, and so are not well-suited to applications more complex than asset issuance and transfer that are generally stateful, such as various kinds of smart contracts.
To see the first argument, consider how you would implement a wallet in a UTXO model, and in particular the function that generates a send transaction. This function requires as input not just the private key to an account and some trivial data such as a sequence number, but the entire set of current UTXOs that belong to that account. The function must then determine a subset of that set whose total value is at least the desired output amount, and use it as the inputs. If multiple minimal subsets exist, then there is the sometimes complex task of deciding which one to use.
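To make the "multiple minimal subsets" problem concrete, the sketch below enumerates every smallest-size subset that covers a target and then applies one possible tie-break, least change. Both function names are hypothetical, "minimal" is interpreted here as fewest inputs, and the least-change policy is only one of many a wallet might choose.

```python
from itertools import combinations


def minimal_covering_subsets(values, target):
    """Return all subsets with the fewest inputs whose sum is >= target.

    Subsets are tuples of indices into `values`. Brute force, so only
    suitable for small wallets.
    """
    for size in range(1, len(values) + 1):
        found = [
            combo
            for combo in combinations(range(len(values)), size)
            if sum(values[i] for i in combo) >= target
        ]
        if found:
            return found
    return []


def pick_least_change(values, target):
    """Tie-break among minimal subsets by choosing the smallest excess."""
    candidates = minimal_covering_subsets(values, target)
    if not candidates:
        raise ValueError("insufficient funds")
    return min(candidates, key=lambda c: sum(values[i] for i in c) - target)
```

With UTXO values `[5, 7, 3, 10]` and a target of 11, no single UTXO suffices and four different pairs do, so the wallet has to pick a policy; least change selects the 5 and the 7.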
Additionally, if a wallet wants to actually benefit from the parallel transaction inclusion property of UTXOs mentioned above, that wallet must take care to split up change outputs so that the wallet always has multiple change outputs which it can use as a source for funds; if a wallet only controls one large change output from which a small amount is always being drained to make the next payment, then the scheme becomes sequential. This is not purely theoretical; a majority of Bitcoin wallets still fail to make this optimization, essentially nullifying the parallelizability gains of UTXOs versus an account and sequence number model.
In the case of Bitcoin (and realistically any public blockchain), where transactions must pay a per-kilobyte transaction fee, the UTXO selection algorithm must additionally take care to optimize average long-run UTXO-per-transaction consumption, and there even arises a denial of service vulnerability as an attacker can spam a wallet with small UTXOs whose value is smaller than the marginal fee that is needed to spend them. Even outside of this, the presence of per-kilobyte fees introduces a wrinkle into the UTXO selection algorithm: it may be the case that UTXO subset S is sufficient to pay a desired amount X, but then the size of S requires a transaction fee F, and S is not sufficient to pay X + F, so then S needs to be expanded to S’, but then the size of S’ requires a transaction fee F’, requiring a UTXO subset S’’, etc. In short, with accounts and sequence numbers, creating a wallet is a high school problem, whereas with UTXOs it becomes closer to an undergraduate research level challenge.
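The fee feedback loop (S, then S’, then S’’) can be sketched with a simplified fee model. The assumption here is that the fee is a fixed base plus a constant amount per input, standing in for a per-kilobyte fee; `select_with_fee`, `base_fee` and `fee_per_input` are hypothetical names, and the largest-first ordering is just one possible heuristic.

```python
def select_with_fee(values, amount, base_fee, fee_per_input):
    """Pick inputs covering `amount` plus a fee that grows with each input.

    Returns (chosen_values, fee). The fee model (base + per-input cost)
    is an assumption used as a proxy for per-kilobyte pricing.
    """
    # Dust filter: an input worth no more than its own marginal fee can
    # never help, which is what makes the small-UTXO spam attack work.
    spendable = sorted((v for v in values if v > fee_per_input), reverse=True)

    chosen = []
    for value in spendable:
        chosen.append(value)
        # Every added input raises the fee, so the target moves as we go.
        fee = base_fee + fee_per_input * len(chosen)
        if sum(chosen) >= amount + fee:
            return chosen, fee
    raise ValueError("insufficient funds once fees are included")
```

With values `[50, 30, 1, 1, 1]`, an amount of 48, a base fee of 1 and a per-input fee of 2, the single 50 covers the amount but not the amount plus its fee of 3, so a second input is pulled in and the fee rises to 5. The three UTXOs of value 1 are never worth spending.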
The second argument, that UTXOs do not mesh well with stateful smart contracts, is easy to see. Suppose there is a need to create a contract with multiple phases, e.g. one where multiple parties must provide some form of input, then after some period of time those parties must perform some additional operation, and finally the contract disburses funds as a function of those operations. It is difficult to see how to fit that model into fundamentally stateless objects that can only be spent or not spent. In an account-based model, however, this is easy: one can instantiate a contract which has the desired code, and this contract can then be called by its static address.
To give another example, one potentially desirable use case is the ability to prevent asset theft by introducing a “recovery key” which would be stored in a secure location and could reverse transactions from your main account within some particular period of time. In a UTXO model, even an expanded one where transaction outputs can impose requirements on the destination addresses that they are sent to, this is a challenge worthy of an academic research paper. In an account-based model, this can be implemented in 20 minutes of coding time via a smart contract that simply implements the rules directly. Restricting asset ownership to a specific set of parties (e.g. KYC’d users) is another example which can be managed with some complexity with covenants, but which is a simple code-writing exercise in a smart contract account-based model. It is worth noting that these are advantages of a stateful scripting language more than they are benefits of accounts vs UTXOs; without a stateful scripting language (e.g. in NXT), account-based models also cannot easily do any of this. However, the notion of a statically addressable object makes the logic around implementing these stateful systems in practice much simpler and more developer-friendly.
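As a rough illustration of why the recovery-key case is simple once state lives in an addressable account, here is a Python stand-in for such a contract. It is one possible design (outgoing payments are held as pending for a delay, during which the recovery key may cancel them), not a description of any deployed contract; the class, its methods and the explicit `now` parameter are all hypothetical.

```python
class RecoverableAccount:
    """Account whose outgoing payments can be reversed for `delay` time units."""

    def __init__(self, owner, recovery_key, delay):
        self.owner = owner
        self.recovery_key = recovery_key
        self.delay = delay
        self.balance = 0
        self.pending = {}   # payment id -> (dest, amount, time it becomes final)
        self.next_id = 0

    def deposit(self, amount):
        self.balance += amount

    def send(self, caller, dest, amount, now):
        """Owner starts a payment; funds are held until the delay passes."""
        if caller != self.owner:
            raise PermissionError("only the owner can send")
        if amount > self.balance:
            raise ValueError("insufficient balance")
        self.balance -= amount
        payment_id = self.next_id
        self.next_id += 1
        self.pending[payment_id] = (dest, amount, now + self.delay)
        return payment_id

    def reverse(self, caller, payment_id, now):
        """Recovery key cancels a payment that is still inside its window."""
        if caller != self.recovery_key:
            raise PermissionError("only the recovery key can reverse")
        dest, amount, final_at = self.pending[payment_id]
        if now >= final_at:
            raise ValueError("reversal window has closed")
        del self.pending[payment_id]
        self.balance += amount

    def finalize(self, payment_id, now):
        """Anyone can release a payment once its window has passed."""
        dest, amount, final_at = self.pending[payment_id]
        if now < final_at:
            raise ValueError("payment is still reversible")
        del self.pending[payment_id]
        return dest, amount
```

The whole rule set fits in a single stateful object with a fixed address, which is the property the paragraph above relies on.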
Can we have both?
In the current Ethereum implementation, we have an explicit protocol-level notion of accounts and sequence numbers on transactions; hence, we have made the choice for our users that this model is the one that is used to secure accounts. In the next major release, Serenity, we are planning an abstraction model that moves this choice down from the protocol level into the EVM; essentially, each user will be free to decide for themselves what mechanism is used to secure their account. This opens the door to innovations such as k-parallelizable nonces (essentially a scheme that combines a nonce with a k-bit binary filter that preserves the property that nonces are single-use but allows users to use future nonces ahead of time, allowing up to k transactions to be processed in any order), and even allows users, if they wish, to build schemes based on UTXOs.
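One way to read the k-parallelizable nonce idea is as a sliding window: the account stores a base nonce plus a k-bit filter recording which of the next k nonces have been used. The sketch below follows that interpretation; it is an assumption about how such a scheme could work, not the Serenity design, and `ParallelNonceAccount` is a hypothetical name.

```python
class ParallelNonceAccount:
    """Accepts each nonce once, in any order within a window of size k."""

    def __init__(self, k):
        self.k = k
        self.base = 0   # lowest nonce that has not been used yet
        self.used = 0   # bit i set means nonce (base + i) was used

    def accept(self, nonce):
        offset = nonce - self.base
        # Reject nonces behind the window (already used) or beyond it.
        if offset < 0 or offset >= self.k:
            return False
        # Reject a nonce inside the window that was already consumed.
        if (self.used >> offset) & 1:
            return False
        self.used |= 1 << offset
        # Slide the window past any run of used nonces at its start.
        while self.used & 1:
            self.used >>= 1
            self.base += 1
        return True
```

With `k = 4`, nonces 2, 0 and 1 are accepted in that order, a replay of 2 is rejected, and nonce 7 is rejected until the window has advanced far enough to include it.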
In later versions of Ethereum, where we target scalability through sharding, there is a cross-shard asynchronous calling scheme: if a contract (the Ethereum terminology for an account controlled by a piece of code) wishes to call a contract in another shard, that contract creates a “receipt” in its shard, which can then be verified through Merkle tree branches by contracts in other shards. This receipt concept in fact essentially merges the notion of an asynchronous function call in progress and a UTXO: if the function call in question is a value transfer, then the function call in progress literally is a UTXO, albeit a vastly more generalizable version of one. Hence, once all of these protocol changes are implemented, Ethereum will support both the account model and a UTXO model in multiple forms, allowing users to get the benefits of whichever one they think is best for any given application.