There’s been some discussion and concern recently around BSV data permanence. One of BSV’s main features is the ability to store large amounts of data in transactions. The appeal is that once a transaction is paid for, it will be in the blockchain forever. However, it is becoming apparent that miners may not keep the data, as it isn’t required for them to mine blocks.
In this article I describe some of the issues and perspectives surrounding data permanence and propose a potentially simple change to the mining algorithm that could solve the problem.
- Non-permanence of the current data storage approach
- Spendable data carrier solution
- Which node software?
- Economic forces against storing data
- Economic forces for storing data
- Potential solution: change the mining algorithm
Non-permanence of the current data storage approach
To validate new blocks, a Bitcoin node only needs to know about the currently unspent transaction outputs (UTXOs), not the entire blockchain. Shadders (CTO of nChain) tweeted recently that two miners are currently operating with pruned nodes.
This essentially means that these two miners are not storing the data held in older BSV blocks!
Pruned miners don’t keep this data because the current approach to storing it (OP_RETURN) doesn’t put it in the UTXO set. An OP_RETURN output is provably unspendable, so any coins sent to it could never be spent. The node therefore leaves it out of the UTXO set, and the data disappears once the block containing it is pruned.
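To make the mechanism concrete, here is a minimal Python sketch of a node building its UTXO set while skipping provably unspendable outputs. It assumes, as a simplification, that a locking script starting with OP_RETURN (0x6a), or with OP_FALSE OP_RETURN (0x00 0x6a), is treated as unspendable; the `Output` type and the `apply_transaction` helper are hypothetical and not taken from any node’s source code.

```python
from dataclasses import dataclass

OP_FALSE = 0x00
OP_RETURN = 0x6A


@dataclass(frozen=True)
class Output:
    value: int    # amount in satoshis
    script: bytes # locking script


def is_provably_unspendable(script: bytes) -> bool:
    # Assumed rule: the script begins with OP_RETURN or OP_FALSE OP_RETURN.
    if script[:1] == bytes([OP_RETURN]):
        return True
    return script[:2] == bytes([OP_FALSE, OP_RETURN])


def apply_transaction(utxo_set: dict, txid: str, outputs: list) -> None:
    """Add a transaction's spendable outputs to the UTXO set, keyed by (txid, index)."""
    for index, out in enumerate(outputs):
        if is_provably_unspendable(out.script):
            # Never enters the UTXO set: the data survives only in the block file,
            # which a pruned node eventually deletes.
            continue
        utxo_set[(txid, index)] = out


# Example: one OP_FALSE OP_RETURN data output and one ordinary P2PKH payment.
p2pkh_script = bytes([0x76, 0xA9, 0x14]) + bytes(20) + bytes([0x88, 0xAC])
data_script = bytes([OP_FALSE, OP_RETURN, 0x05]) + b"hello"

utxos: dict = {}
apply_transaction(utxos, "aa" * 32, [Output(0, data_script), Output(1000, p2pkh_script)])
print(sorted(index for _, index in utxos))  # only the payment output (index 1) is kept
```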
Spendable data carrier solution
At the CoinGeek Seoul conference, Shadders gave an example of spendable data carriers which will be supported in the Genesis node upgrade coming in January 2020.
A spendable data carrier combines a typical Bitcoin output (i.e. sending coins to an address) with OP_RETURN-style data. Two approaches were presented:
- Append OP_RETURN data to the end of the payment script
- Include the data in the payment script but OP_DROP it immediately, so it has no effect on the script
Because the outputs have coins associated with them, they are now part of the UTXO set and won’t be pruned by the node software.
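The two layouts can be sketched in Python as raw script bytes. This assumes a standard pay-to-public-key-hash (P2PKH) payment script and standard push-data encoding; the function names are illustrative, and whether a node accepts either layout depends on its post-Genesis script rules.

```python
OP_DUP, OP_HASH160, OP_EQUALVERIFY, OP_CHECKSIG = 0x76, 0xA9, 0x88, 0xAC
OP_DROP, OP_RETURN = 0x75, 0x6A
OP_PUSHDATA1, OP_PUSHDATA2, OP_PUSHDATA4 = 0x4C, 0x4D, 0x4E


def push_data(data: bytes) -> bytes:
    """Encode a data push using the smallest standard push opcode."""
    n = len(data)
    if n < OP_PUSHDATA1:
        return bytes([n]) + data
    if n <= 0xFF:
        return bytes([OP_PUSHDATA1, n]) + data
    if n <= 0xFFFF:
        return bytes([OP_PUSHDATA2]) + n.to_bytes(2, "little") + data
    return bytes([OP_PUSHDATA4]) + n.to_bytes(4, "little") + data


def p2pkh(pubkey_hash: bytes) -> bytes:
    """Ordinary payment script: OP_DUP OP_HASH160 <hash> OP_EQUALVERIFY OP_CHECKSIG."""
    if len(pubkey_hash) != 20:
        raise ValueError("a public key hash is 20 bytes")
    return (bytes([OP_DUP, OP_HASH160]) + push_data(pubkey_hash)
            + bytes([OP_EQUALVERIFY, OP_CHECKSIG]))


def carrier_appended(pubkey_hash: bytes, data: bytes) -> bytes:
    # Approach 1: the payment script, followed by OP_RETURN and the data.
    return p2pkh(pubkey_hash) + bytes([OP_RETURN]) + push_data(data)


def carrier_dropped(pubkey_hash: bytes, data: bytes) -> bytes:
    # Approach 2: push the data, drop it straight away, then the payment script.
    return push_data(data) + bytes([OP_DROP]) + p2pkh(pubkey_hash)


print(carrier_appended(bytes(20), b"hi").hex())
print(carrier_dropped(bytes(20), b"hi").hex())
```

In both cases the output still pays to an address, so it carries a value and sits in the UTXO set alongside ordinary payments.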
Which node software?
Bitcoin is a protocol, not a piece of software. However, in the case of BSV, all miners currently run nChain’s BSV node software. There are good reasons for this. nChain is making radical changes to restore Bitcoin functionality, and it is primarily nChain that has the resources to test these changes appropriately. Miners, for the most part, are not in the business of software development and would most likely use nChain’s software unmodified, apart from configuration options.
One configuration option is to run in pruned mode, which is relatively easy for miners to do. If we store data in spendable outputs, pruned nodes will retain the data, as there is no configuration option to remove it.
Economic forces against storing data
At the moment, the BSV blockchain is still relatively small (under 200GB). The entire BSV blockchain could be stored on a USB drive or a laptop. Currently, there isn’t really any economic rationale for miners to run in pruned mode.
However, nChain is talking about teranodes, or terabyte-sized blocks. From a storage perspective, a new 1TB drive would be required every 10 minutes to store the chain. With 52,560 blocks produced each year, that is about 52.6 PB of new storage per year.
Amazon’s S3 service charges $0.021 per GB per month at its highest-volume tier. Keeping a year’s worth of teranode storage (about 52.6 million GB) would therefore cost about $1.1m every month. And the cost increases cumulatively: after the second year the monthly bill would be about $2.2m, even assuming no growth in block size.
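The arithmetic can be checked with a short Python calculation. It assumes 1TB blocks every 10 minutes, decimal storage units, and a price of $0.021 per GB per month (S3’s listed rate at its highest-volume tier, which may change); it prints the monthly bill for holding the whole chain at the end of each year.

```python
BLOCKS_PER_YEAR = 6 * 24 * 365  # one block roughly every 10 minutes = 52,560
TB_PER_BLOCK = 1                # assumed teranode block size
GB_PER_TB = 1_000               # decimal units, as cloud storage is billed
GB_PER_PB = 1_000_000
PRICE_PER_GB_MONTH = 0.021      # assumed S3 rate in USD; check current pricing


def chain_size_gb(years: int) -> int:
    """Size of the chain after `years` of 1TB blocks, with no growth in block size."""
    return years * BLOCKS_PER_YEAR * TB_PER_BLOCK * GB_PER_TB


def monthly_bill(years: int) -> float:
    """Monthly storage bill for holding the whole chain at the end of `years`."""
    return chain_size_gb(years) * PRICE_PER_GB_MONTH


for year in (1, 2, 3):
    print(f"end of year {year}: {chain_size_gb(year) / GB_PER_PB:.2f} PB, "
          f"${monthly_bill(year):,.0f} per month")
```

After one year this gives roughly $1.1m per month, and the bill grows linearly with the age of the chain even before any growth in block size is considered.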
Mining is an economic activity. Money spent on storage is money not spent on hash power, which reduces the miner’s share of block rewards. At some point, the economic forces against storing the data outweigh the forces for storing it.
Can a miner prune unspent outputs? The answer is that the miner can, for the most part, do whatever they like. Take the following example:
- Someone stores 1GB of data in a transaction with 1 satoshi of BSV to ensure that it is in the UTXO set.
- Attaching a value to the data signals that it is intended to be stored permanently, which in turn suggests that the output will likely never be spent. Additionally, the value is so small that there is effectively nothing worth spending.
- The miner is aware of these factors and simply doesn’t store the large transaction, knowing that it will likely never appear as an input in a future transaction and that, even if it did, losing access to such a small amount of BSV would be inconsequential.
So miners could simply not store large transactions. If one ever appeared as an input in a future transaction (which is likely to be rare), the miner would simply not mine that block.
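The policy described above could be expressed in a few lines. This Python sketch is purely hypothetical: `SIZE_THRESHOLD`, `VALUE_THRESHOLD`, `should_store` and `can_validate` are illustrative names, not options or functions of any real node software.

```python
from dataclasses import dataclass

SIZE_THRESHOLD = 100 * 1024 * 1024  # hypothetical: outputs larger than 100 MiB...
VALUE_THRESHOLD = 1_000             # hypothetical: ...carrying under 1,000 satoshis


@dataclass(frozen=True)
class Output:
    value: int  # satoshis
    size: int   # serialized size in bytes


def should_store(out: Output) -> bool:
    """Skip outputs that are large but carry almost no value."""
    return not (out.size > SIZE_THRESHOLD and out.value < VALUE_THRESHOLD)


def can_validate(spent_outpoints: list, stored: dict) -> bool:
    """A transaction can only be checked if every output it spends was kept."""
    return all(outpoint in stored for outpoint in spent_outpoints)


seen = {
    ("tx1", 0): Output(value=1, size=1_000_000_000),  # 1GB data output, 1 satoshi
    ("tx2", 0): Output(value=50_000, size=34),        # ordinary payment
}
stored = {op: out for op, out in seen.items() if should_store(out)}

print(can_validate([("tx2", 0)], stored))  # ordinary spend: can be validated
print(can_validate([("tx1", 0)], stored))  # spends the dropped output: miner skips it
```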
Klimenos has suggested locking up increasingly large amounts of BSV (e.g. 100 BSV), proportional to the size of the data and how long it needs to be stored. My concern with this approach is that it unnecessarily locks up BSV and provides no value to the miner storing the data. It is also imprecise and still risks the data not being stored.
In the medium term, it is unlikely that miners would customise the node software to prune large transactions. However, assuming BSV succeeds, there will come a time when doing so is economically viable.
Economic forces for storing data
Derek Moore has argued that there are implicit economic forces which will encourage miners to retain the data. In essence, the end-users’ perceived value of BSV is intrinsically tied to the fact that the data that they are paying to be stored will be stored permanently. If the system does not preserve the data, the value of BSV diminishes, which in turn devalues the miners’ reward.
However, this assumes miners act collectively in their long-term interest. It only takes one miner choosing not to store the entire chain to gain an economic advantage: the savings can be allocated to greater hash power, earning a greater share of block rewards. The value of BSV may not diminish, since good-faith miners might still store the entire chain, but the result is an inequitable mining environment.
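A toy model makes the advantage concrete. It assumes, hypothetically, that every miner has the same budget, that hash power is proportional to spend, and that a miner’s expected share of block rewards equals its share of total hash power; the budget and storage figures are arbitrary illustrative numbers, not estimates.

```python
def reward_shares(num_miners: int, budget: float, storage_cost: float,
                  num_pruning: int) -> tuple:
    """Return (share of one pruning miner, share of one storing miner)."""
    pruning_hash = budget                 # spends the whole budget on hash power
    storing_hash = budget - storage_cost  # part of the budget goes to storage
    total = num_pruning * pruning_hash + (num_miners - num_pruning) * storing_hash
    return pruning_hash / total, storing_hash / total


# Hypothetical: 10 equal miners, storage eats 20% of each budget, one miner prunes.
pruner, storer = reward_shares(num_miners=10, budget=100.0, storage_cost=20.0,
                               num_pruning=1)
print(f"pruning miner: {pruner:.1%}, storing miner: {storer:.1%}")
```

With these made-up numbers the pruning miner’s share rises from 10% to about 12.2%, while each storing miner’s share falls to about 9.8%.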
Potential solution: change the mining algorithm
One way to force miners to store all of the data is to change the mining algorithm, which involves hashing the new block to be mined. To ensure the entire chain data is stored, the mining algorithm could be modified to hash the entire blockchain for each new block. This, however, would be unwieldy and probably unnecessary.
An optimisation would be to incorporate only some, or even just one, prior block with the new block. The prior block would be chosen pseudo-randomly, but deterministically.
A possible selection algorithm could depend on the previous block hash:
SelectedBlockIndex = PrevBlockHash modulo NumberOfBlocks
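Here is a minimal Python sketch of this selection rule. It assumes the previous block hash is read as a 256-bit little-endian integer (the way Bitcoin node software treats its uint256 hashes) and that NumberOfBlocks counts the blocks already in the chain, so the result is a valid height from 0 to NumberOfBlocks - 1. The function name is illustrative.

```python
import hashlib


def select_block_index(prev_block_hash: bytes, number_of_blocks: int) -> int:
    """SelectedBlockIndex = PrevBlockHash modulo NumberOfBlocks."""
    if len(prev_block_hash) != 32:
        raise ValueError("expected a 32-byte block hash")
    if number_of_blocks <= 0:
        raise ValueError("the chain must contain at least one block")
    return int.from_bytes(prev_block_hash, "little") % number_of_blocks


# Every node that sees the same previous block hash selects the same prior block.
prev = hashlib.sha256(b"example previous header").digest()
index = select_block_index(prev, 600_000)
print(index)
```

Because the hash is 256 bits and the chain height is tiny by comparison, the bias introduced by the modulo is negligible, so every prior block is about equally likely to be selected.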
The new block hash then becomes a hash of the new block data concatenated with the selected block’s data:
BlockHash = hash( NewBlock || SelectedBlock )
This would be relatively simple to implement, possibly requiring no more than a few extra lines of code in the current CBlockHeader::GetHash() function.
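The following Python sketch models the idea only; it is not the original C++ change. It assumes the existing block hash is double SHA-256 over the 80-byte serialized header (as computed by CBlockHeader::GetHash()) and simply appends the selected block’s raw serialized bytes before hashing. Since the header already commits to the new block’s transactions through its Merkle root, the sketch hashes the header rather than the full new block. The name `block_hash_with_selected` is hypothetical.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def block_hash_with_selected(new_header: bytes, selected_block: bytes) -> bytes:
    """BlockHash = hash(NewBlock || SelectedBlock), using double SHA-256."""
    if len(new_header) != 80:
        raise ValueError("a serialized block header is 80 bytes")
    return double_sha256(new_header + selected_block)


header = bytes(80)  # placeholder header for illustration
selected = b"raw serialized bytes of the selected block"

h = block_hash_with_selected(header, selected)
tampered = block_hash_with_selected(header, selected[:-1] + b"?")
print(h != tampered)  # changing any byte of the selected block changes the hash
```

A miner searching for a valid nonce would have to recompute this hash for every attempt, so the selected block must be available locally in full, not just its header.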
This would force all miners to store the entire chain, as they can’t know in advance which blocks they will need. Additionally, miners can’t selectively discard a transaction or a transaction output, as the entire block is required for the mining hash.
One of my main concerns with spendable data outputs is that they lock up coins. Additionally, there is no guarantee that future miners won’t prune the data regardless of how many coins are stored there.
I think there is a high likelihood that economic forces currently incentivise the storage of data. However, as the chain grows, the economic forces pushing in the other direction will only strengthen.
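A rough calculation shows why discarding even a small fraction of blocks is risky under this scheme. Assuming the selected index is effectively uniform over the chain, a miner that has discarded a fraction f of blocks cannot compute the hash for any given new block with probability f, so the chance of remaining able to mine through n consecutive blocks is (1 - f)^n. The function below is an illustrative sketch of that formula.

```python
def prob_never_blocked(discarded_fraction: float, num_blocks: int) -> float:
    """Probability that none of the next num_blocks selects a discarded block."""
    if not 0.0 <= discarded_fraction <= 1.0:
        raise ValueError("discarded_fraction must be between 0 and 1")
    return (1.0 - discarded_fraction) ** num_blocks


for f in (0.001, 0.01, 0.1):
    day = prob_never_blocked(f, 144)  # roughly one day of 10-minute blocks
    print(f"discard {f:.1%} of blocks: {day:.1%} chance of a full day unaffected")
```

Discarding just 1% of blocks leaves only about a one-in-four chance of getting through a single day without needing a block the miner no longer has.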
The mining algorithm proposed here is a relatively simple solution which avoids needing to lock up coins permanently in data carrier outputs and doesn’t require any modification to how data is currently being stored. It provides an algorithmic guarantee of data storage.
However, such a change could have quite widespread implications and would need to be thoroughly investigated. Nonetheless, some kind of algorithmic, cryptographic guarantee would benefit BSV, given the widespread expectation that stored data is permanent.