Encoding Adjustment Algorithm

A Market Driven Script Scaling Solution

Harry Barber · Published in Coinmonks · Nov 12, 2018 · 7 min read

Bitcoin Cash is about to implement the powerful opcode OP_CHECKDATASIG, opening the door to a wide range of brilliant new applications that could foster adoption.

Some of the arguments against its implementation fell flat, whereas others raised valid questions concerning the underlying philosophy of opcode implementation:

  • Should opcodes be small atomic blocks of computation or should they include large complex operations?
  • What criteria should a specific function meet before it is implemented as an opcode?
  • Is indefinitely hardforking to add successively more opcodes viable?

RISC vs CISC

In a perfect world, opcodes would act like a processor instruction set and high-level languages would be developed on top of this instruction set to create applications. Alas, the real world is not so beautiful — opcodes have a persistence that processor instructions are not burdened by; an opcode must not only be executed, but broadcast across vast distances and stored indefinitely on hard drives. They are subject to the overheads of bandwidth and storage.

If we agree that applications like signature verification should be cheap and accessible, then we cannot burden them with the overhead of a megabyte-sized script unrolling a complex program. Not only would that impose immense expense on such applications; given the current network bottlenecks, it would prevent them from happening at all for quite some time.

The complex instruction ethos wins outright here — one should implement an opcode to remove this ridiculous overhead acting as a barrier to signature verification.

However, the same demon lurks around the next corner: what happens when the next application, for example SNARK verification, starts to become sought after? What happens when users are enjoying the new SNARK opcodes but lust for the next 10,000-opcode script? As many have rightly pointed out, the network will become increasingly hard to hardfork as it grows in size. It is not viable to continually introduce new opcodes; one must draw the line somewhere, or at least taper off the frequency of their introduction.

Encoding and Compression

Encoding is a way of rewriting data; compression is an encoding that reduces the size of the data. Decoding and decompression are their inverses.

In many compression schemes there are two separable operations:

  1. Create the code “dictionary”.
  2. Apply the dictionary to data to encode it.

Typically, step one is computationally expensive while step two is fast.
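As a minimal sketch of this two-step split, here is what it looks like with the python-zstandard bindings (pip install zstandard); the fake transactions, sample count and dictionary size are all illustrative and may need tuning:

    import os
    import zstandard as zstd

    # Fake "transactions": a little shared boilerplate plus unique random bytes,
    # standing in for real serialized transactions.
    BOILERPLATE = b"\x01\x00\x00\x00" * 2

    def fake_tx() -> bytes:
        return BOILERPLATE + os.urandom(36) + BOILERPLATE + os.urandom(72)

    samples = [fake_tx() for _ in range(5000)]

    # Step 1: create the dictionary (computationally expensive, done rarely).
    dictionary = zstd.train_dictionary(4096, samples)

    # Step 2: apply the dictionary to encode new data (cheap, done per item).
    cctx = zstd.ZstdCompressor(dict_data=dictionary)
    encoded = cctx.compress(fake_tx())

    # Decoding requires the same dictionary.
    dctx = zstd.ZstdDecompressor(dict_data=dictionary)
    decoded = dctx.decompress(encoded)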

Many compression schemes create the dictionary on the fly as the algorithm traverses the data; this is a “dynamic” dictionary. Alternatively, compression schemes can use a precomputed dictionary. Many use a combination of the two.

Without a precomputed dictionary, compression on small items is feeble to say the least.

7-Zip on its highest settings “compressing” a 518-byte transaction to 634 bytes
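This is easy to reproduce yourself (a quick illustration, not the author’s 7-Zip benchmark): general-purpose compression of a single small, high-entropy blob typically makes it slightly larger:

    import os
    import zlib

    fake_tx = os.urandom(518)                   # stand-in for a 518-byte transaction
    compressed = zlib.compress(fake_tx, 9)      # no shared context or dictionary available
    print(len(fake_tx), "->", len(compressed))  # the output is usually a little larger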

In the circumstances where they are applicable, precomputed dictionaries have advantages over entirely dynamic dictionaries:

  1. Compression of smaller sized data (e.g. transactions) improves.
  2. Reduction in run time during compression/decompression.

Example from Zstandard

For those of you technically inclined and interested in powerful techniques for dictionary construction, I recommend Effective Construction of Relative Lempel-Ziv Dictionaries by Kewen Liao et al.

Encoding Adjustment Algorithm

In BTC, the Difficulty Adjustment Algorithm (DAA) grabs the time taken to mine the last 2016 blocks and uses it to calculate the difficulty (and target) for the next 2016 blocks. As nodes have consensus on blocks, they come to consensus on the difficulty also. More generally, any value deterministically generated from blocks can additionally be in consensus across nodes.

An Encoding Adjustment Algorithm (EAA) would work in the same manner. Every N blocks, nodes would train a compression dictionary on some deterministic sample of the last N blocks, and hence agree upon said dictionary.

Alice uses her dictionary to compress a transaction/block and she then sends the data across the network to Bob. Bob decodes it using his identical dictionary.

By leveraging the fact that Alice and Bob have a common precomputed dictionary, we have ensured faster compression and decompression with a higher compression ratio.
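A rough sketch of that flow, assuming the python-zstandard bindings; get_block_txs is a hypothetical helper returning the serialized transactions of a block, and N and the dictionary size are illustrative:

    import zstandard as zstd

    N = 2016              # retraining period, by analogy with the DAA window
    DICT_SIZE = 64 * 1024

    def train_epoch_dictionary(tip_height: int, get_block_txs):
        # Deterministically sample the last N blocks. Any node running the same
        # code over the same blocks (and the same zstd version) derives the same
        # dictionary, so no extra consensus machinery is required.
        samples = []
        for height in range(tip_height - N + 1, tip_height + 1):
            samples.extend(get_block_txs(height))   # serialized transactions
        return zstd.train_dictionary(DICT_SIZE, samples)

    def encode_tx(raw_tx: bytes, dictionary) -> bytes:
        return zstd.ZstdCompressor(dict_data=dictionary).compress(raw_tx)

    def decode_tx(payload: bytes, dictionary) -> bytes:
        return zstd.ZstdDecompressor(dict_data=dictionary).decompress(payload)

    # Alice and Bob train over the same window and so hold identical dictionaries;
    # Bob can decode whatever Alice encodes:
    #   payload = encode_tx(raw_tx, alice_dictionary)
    #   assert decode_tx(payload, bob_dictionary) == raw_tx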

2nd-Layer Consensus

Such a system need not (and should not) be implemented at the base protocol. Compression and decompression can occur right at the edges of the node software. It can be entirely opt-in and compressed transactions may be flagged as such.
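One hypothetical way to keep it opt-in at the edges: prefix each relayed transaction with a flag byte, so peers that never opted in can still be served the raw bytes (the flag values and layout below are illustrative, not any existing wire format):

    import zstandard as zstd

    RAW, DICT_COMPRESSED = 0x00, 0x01

    def wrap_tx(raw_tx: bytes, dictionary=None) -> bytes:
        if dictionary is None:
            return bytes([RAW]) + raw_tx
        body = zstd.ZstdCompressor(dict_data=dictionary).compress(raw_tx)
        return bytes([DICT_COMPRESSED]) + body

    def unwrap_tx(payload: bytes, dictionary=None) -> bytes:
        flag, body = payload[0], payload[1:]
        if flag == RAW:
            return body
        return zstd.ZstdDecompressor(dict_data=dictionary).decompress(body)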

Supply, Demand and Proxycodes

Currently, nearly every transaction contains a P2PKH script in every output. Any compression dictionary would certainly pick up this pattern, and the compression would replace the 5 bytes of opcodes with a single proxycode. In the future, the same principle would apply to more complicated patterns of opcodes, with massive scripts being reduced to a handful of proxycodes encoding frequently occurring patterns.
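For the curious, those 5 fixed bytes are OP_DUP, OP_HASH160, the 20-byte push prefix, OP_EQUALVERIFY and OP_CHECKSIG; only the hash in the middle varies between outputs, which is exactly the kind of repetition a trained dictionary folds into a single short reference:

    import os

    P2PKH_PREFIX = bytes([0x76, 0xA9, 0x14])   # OP_DUP OP_HASH160 <push 20 bytes>
    P2PKH_SUFFIX = bytes([0x88, 0xAC])         # OP_EQUALVERIFY OP_CHECKSIG

    def p2pkh_script(pubkey_hash: bytes) -> bytes:
        assert len(pubkey_hash) == 20
        return P2PKH_PREFIX + pubkey_hash + P2PKH_SUFFIX

    # Two unrelated outputs still share the same 5 framing bytes.
    a = p2pkh_script(os.urandom(20))
    b = p2pkh_script(os.urandom(20))
    assert a[:3] == b[:3] and a[-2:] == b[-2:]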

These proxycodes have no need for central governance; they are implemented completely autonomously, based on demand, via the training algorithm. That is, if a new application is invented which requires a specific pattern of opcodes, a proxycode will be supplied by the dictionary to meet this demand. Similarly, if the demand for a specific pattern of opcodes falls, then the dictionary will cease to supply the associated proxycodes.

If a high-level language for creating scripts were developed and became popular, it too would be complemented by the dictionary.

Short Cryptographics

Not just patterns of opcodes will enjoy shortened representations. Considered in isolation, cryptographic strings are very high entropy (hashes being maximally so by construction), but across blocks cryptographic strings will repeat. For example, where a specific oracle repeatedly signs scripts with the same pubkey, or a large company tags their OP_RETURNs with a specific identifier, these strings will warrant a “speed dial” in the dictionary.

Supplemented by Governance

This model is still compatible with the introduction of opcodes by developer teams, if one’s roadmap insists on it. Suppose an immediate demand for a specific function arose: the EAA would allow the market to begin implementing the function, without massive overhead, before the release date of the consensus-level opcode.

Supplementary to Pricing Heuristics

As script sizes grow, the satoshi/byte method of pricing scripts will become increasingly poor at estimating the true price of execution. Pricing heuristics can be designed with respect to the dictionary of proxycodes. An adjustment to the encoding dictionary can be matched with a predictable recalculation of the pricing (instantiated by some sort of functor for you functional programming/category theory fans). This would enable pricing heuristics to evolve naturally while minimizing overhead and maximizing applicability.
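One hypothetical shape such a heuristic could take: price a compressed script by the execution cost of what it expands to, rather than by the bytes it happens to occupy on the wire (the cost table and expansion function here are illustrative, and treating every expanded byte as an opcode is a simplification that ignores push data):

    from typing import Callable, Dict

    def price_script(compressed_script: bytes,
                     expand: Callable[[bytes], bytes],
                     opcode_cost: Dict[int, int],
                     default_cost: int = 1) -> int:
        # Expand through the current dictionary, then sum per-opcode costs.
        expanded = expand(compressed_script)
        return sum(opcode_cost.get(op, default_cost) for op in expanded)

    # When the dictionary is retrained, only `expand` changes; the pricing rule
    # is recomputed mechanically from the new dictionary rather than by decree.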

Orphans and ReOrgs

If two nodes fail to agree on the blocks within the N block sampling window then they will fail to agree on a dictionary and hence send garbled transactions to one another.

This is easily rectified by preventing the training set from being sampled from the most recent blocks in the N-block window. This padding should be chosen large enough that the probability of failing to reach consensus on the dictionary is negligible.
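A sketch of the padded window, with illustrative numbers (the padding of 100 simply echoes the familiar coinbase-maturity depth):

    N = 2016     # training window length (illustrative)
    PAD = 100    # depth safety margin (illustrative)

    def training_heights(tip_height: int) -> range:
        # Train only on blocks at least PAD deep, so a shallow reorg near the tip
        # cannot change the training set and desynchronise dictionaries.
        end = tip_height - PAD
        return range(end - N + 1, end + 1)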

Compress Only Once

Only the originator need compress the transaction/block; everyone else need only decompress it. This brings considerations of data compression symmetry into play. As a point of reference, Zstandard’s decompression speeds are roughly three times its compression speeds.

SPV Compatibility

The scheme is SPV compatible: wallets send requests for the latest dictionary when their dictionary has expired. Unlike block headers, the entire chain of dictionaries need not be transferred.

Encoding of a transaction should be fast enough for even mobile wallets to perform in a matter of milliseconds. Dictionaries should be sufficiently sized to ensure good compression, but small enough to transfer to wallets every N blocks without significant overhead. Dictionaries can be made very small (bytes/kilobytes) while still ensuring great compression.
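A sketch of how an SPV wallet might keep its dictionary fresh, assuming dictionaries change only at epoch boundaries; fetch_dictionary is a hypothetical call to the wallet’s server:

    N = 2016

    class DictionaryCache:
        def __init__(self, fetch_dictionary):
            self.fetch_dictionary = fetch_dictionary
            self.epoch = None
            self.dictionary = None

        def get(self, tip_height: int):
            epoch = tip_height // N
            if epoch != self.epoch:                  # our dictionary has expired
                self.dictionary = self.fetch_dictionary(epoch)
                self.epoch = epoch
            return self.dictionary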

Different dictionaries can be used to suit specific needs. One could imagine a node serving a specific dictionary to an SPV user based on that wallet’s activity. Again, considerations of data compression symmetry must be taken into account.

Blocks are Filters and a Few Transactions

With set reconciliation techniques such as Compact Blocks, Xthin and Graphene, nodes no longer need to relay the entire block, instead relaying the few transactions missing from the receiver’s mempool along with order information.

Suppose Alice is trying to send Bob a block via Graphene. Upon “opening” the IBLT, it is revealed that Bob is missing very few transactions from his mempool (the likely case). These few transactions need to be relayed from Alice to Bob and will likely make up a considerable percentage of all data relayed. Given the high entropy of these transactions, compression without a precomputed dictionary would be feeble, and hence the EAA may help considerably with block propagation too.
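A sketch of that last step, assuming the shared consensus dictionary from earlier; the length-prefixed framing is illustrative:

    import zstandard as zstd

    def pack_missing_txs(missing_txs, dictionary) -> bytes:
        # Concatenate the missing transactions with simple length prefixes, then
        # compress the batch against the shared precomputed dictionary.
        framed = b"".join(len(tx).to_bytes(4, "little") + tx for tx in missing_txs)
        return zstd.ZstdCompressor(dict_data=dictionary).compress(framed)

    def unpack_missing_txs(payload: bytes, dictionary) -> list:
        framed = zstd.ZstdDecompressor(dict_data=dictionary).decompress(payload)
        txs, i = [], 0
        while i < len(framed):
            size = int.from_bytes(framed[i:i + 4], "little")
            txs.append(framed[i + 4:i + 4 + size])
            i += 4 + size
        return txs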

Concluding Remarks

We have laid out the basis for a scheme to reduce network/storage overhead guided by the market demand for specific patterns within transactions and blocks.

The next step is to model such a system and gauge its compression ratio and its compression and decompression speeds. We aim to perform these tests on a variety of blockchains to gauge performance. Anyone willing to donate data from their full node would be greatly appreciated.
