Member preview

Toccata & Fugue in the Key of Blockchain

Variations of Blockchain Based Protocols

This article assumes some basic knowledge about blockchain and is best read after reading my last article on how Bitcoin creates trust. This gets into some computer science-y stuff — if you feel it’s particularly opaque, please let me know.

Blockchain is probably most familiar from the role it plays in cryptocurrencies, where it stores the distributed ledger. The term blockchain came into use after Bitcoin called its main data structure a chain of blocks. However, the general form of the data structure existed since 1992 [1]. The data structure was cemented when Haber & Stornetta wrote “Secure Names for Bit-Strings” in 1997 [2]. Eleven years later, the first modification to the structure was made, adding the nonce for Bitcoin’s proof of work scheme [3].

The origins of the underlying data structure has everything to do with ensuring a document or set of documents can be proved published in a particular form at a certain time. In this context, a document could be an image, audio file, movie, or other computer file. This is incredibly general and adaptable to many different circumstances.

The structure is really simple, though the nomenclature makes it sound scary. It is a directed acyclic graph (DAG) with a Merkle tree at each node [see note 1 for a verbal explanation]. For a visual reference, below is the visual representation of the structure. The node IH(t-1) has the same tree structure below it. The connections between each IH(t) to IH(t-1) define the directed acyclic graph. Only the leaf nodes, which a labeled with single letters, have data — the rest of the nodes are just hashes which verify the content in the layers below. In reality, the Merkle tree is always modified to help gain some efficiency.

The basic structure for blockchain. It shows the link from the current node to the previous one along with the Merkle tree contained in the current node. This is from Secure Names for Bit-Strings [2].

Cryptocurrency

There are plenty of articles detailing how Bitcoin generates trust. The gist is all nodes use the relationships between the tokens, transactions, and blockchain to verify the transactions. If there is ever a branching of the train the nodes keep working until one branch of the chain gets longer and then all nodes use that branch as the reality of the network. When the next branch happens, rise and repeat. Proof of work is only a small part of why the chain is trustworthy, as there are many other constraints. As has been demonstrated by other tokens, there are other methods of proving trustworthiness to write to the blockchain, such as proof of stake [4], proof of space [5], verifiable delay functions [6].

Git SCM

There is a project called Git, invented by Linus Torvalds in 2006 [7][8]. Git is used by many programmers for managing changes to source code. It actually uses a lot of the same structures. Git is a super set of blockchain. It has more structures in it, including a second DAG for notes. Each branch of a repo is a pointer to a particular commit (node in a DAG). While in normal use a branch will often share much of the DAG from another branch, it doesn’t have to share any part of the DAG.

I had this idea earlier this month and posted a thread on Twitter about it.

Git stores all of the files in a particular commit as Merkle tree like structure. This means its primary data structure is a DAG of trees— just like Bitcoin’s primary data structure. It does differ in that while Bitcoin requires the entire history of the DAG to be downloaded, Git can technically only download a snapshot of a specific point in the DAG and incremental updates are possible, because it doesn’t store the files in the tree — just a hash of the blob of content. In this particular case, using a Merkle tree inspired structure allows for quickly determining changes between each tree structure [see note 2].

The green blocks represent the DAG. The yellow blocks are the trees. The blue blobs are the file contents.
Trust is manually negotiated in Github.

Git relies on trust being built between people and people engaging in dialog to reach consensus on which branch should be used. Typically, this is the longest branch, but there are many ways of using Git which do not require the longest branch to be used. If someone wanted to do so, they could make automatic merge tool by incorporating tools like acceptance tests which tie into criteria and tools like logic and theorem provers, but many times merging or rebasing code requires manual intervention. The basic mode of trust as trust by negotiation — which can’t be automated.

Obviously, this model doesn’t fit with every use case. It works well in cases where adding or changing something needs many eyes on it. It is the exact opposite of the automated scheme used in cryptocurrencies.

Web of Trust Models

PGP is already the basis of trust used for negotiating cryptocurrency transactions. The basic way of establishing trust when using PGP is by establishing a web of trust [9]. In a web of trust situation, people sign each others’ keys in order to attest to the veracity of their identity. This removes the reliance on a trusted central certificate authority commonly found in Public Key Infrastructure. An analogous solution for HTTPS is FOAF+SSL [10], where the centralized certificate authorities are replaced by individuals and their connections.

In a block chain application where there is no risk of double spending, such a blockchain based social network, a form of this web of trust itself can determine who can read from or write to a blockchain. This means any proof of X — work, stake, space, etc— is completely abandoned in favor of manually managed trust. One might think this is ripe for abuse, but one could easily build in a penalty for actors which abuse the system. Perhaps actors which spam or abuse the system could be given some proof of work as a punishment. Perhaps there will be no way of sending messages between people who aren’t trusted. Additionally, trust could completely withdrawn from repeat offenders.

Non-global Blockchain

In a case where not everyone needs a copy of a global ledger, many individual blockchains could exist. Applications which manage financial transactions can use a distinct blockchain as an audit trail for each entity. Anyone who wanted to audit the transactions across multiple organizations within a blockchain could become nodes in the network, validating each transaction and making sure no one can rewrite the blockchain removing a transaction.

Imagine non-profits keeping themselves publicly accountable through integrating all financial applications with a public verifiable audit trail for expenditures and fund raising through a non-profit specific blockchain. The bank for a non-profit would have to write all transactions to the blockchain. Any fundraising platform would do the same. Any party withholding transactions which balance the ledger can be assumed to be dishonest. As long as the accounting is completely transparent through receipt level data, honesty can be determined. So long as anyone can manually aggregate transactions, there is a chance for dishonesty.

A social network could create group and individual blockchains. Individuals would have their own blockchain to which only they could write. Groups could have their own blockchain to which only trusted members of the groups could write. Each pair of trusted friends would also have a blockchain which would represent the history of their relationship. Though, perhaps the blockchain structure is a bit ridiculous in that specific context of trusted pairs unless a verifiable audit trail is desired.

This of course brings up the interesting counterpoint to shrinking the number of copies of a blockchain and known participants on the blockchain: it decreases anonymity. Instead of duplicating a blockchain and all messages across a broad swath of nodes, limiting it to a small subset makes it easier to track who is communicating to whom — which could be highly undesirable in certain circumstances.

Summary

  • The blockchain data structure, is flexible and can be adapted to many applications and trust models.
  • Ever since Bitcoin first came out, there was already an example of an application with similar data structures with the exact opposite trust model.
  • Exerting energy through proof of work is not good for all models, even though it is the more trustworthy of the programmatic methods of generating trust (PoS, PoStorage, PoDelay, etc.).
  • A middle road is a proof of trust model where nodes that are trusted by a group of people are able to write to a particular blockchain.
  • Blockchains do not have to be globally distributed, and in some cases should not be globally distributed.
  • Blockchain is perfect for creating an audit trail which can be verified across multiple concerned parties/nodes.
  • Decreasing the number of nodes with a certain blockchain and participants on a blockchain decreases anonymity.

Sources

  1. Bayer D., Haber S., Stornetta W.S. (1993) Improving the Efficiency and Reliability of Digital Time-Stamping. In: Capocelli R., De Santis A., Vaccaro U. (eds) Sequences II. Springer, New York, NY DOI=https://doi.org/10.1007/978-1-4613-9323-8_24
  2. Stuart Haber and W. Scott Stornetta. 1997. Secure names for bit-strings. In Proceedings of the 4th ACM conference on Computer and communications security (CCS ‘97). ACM, New York, NY, USA, 28–35. DOI=http://dx.doi.org/10.1145/266420.266430
  3. http://article.gmane.org/gmane.comp.encryption.general/12588/
  4. https://en.wikipedia.org/wiki/Proof-of-stake
  5. https://en.wikipedia.org/wiki/Proof-of-space
  6. https://link.springer.com/chapter/10.1007/BFb0055485
  7. https://marc.info/?l=git&m=114685143200012
  8. https://en.wikipedia.org/wiki/Git
  9. https://en.wikipedia.org/wiki/Web_of_trust
  10. https://www.w3.org/wiki/Foaf+ssl

Notes

  1. To unpack that:
    • graph —a bunch of dots connected by lines, it’s a mathematical term for a structure of vertices linked by edges.
    • directed — the lines(links or edges) between dots(nodes or vertices) in the top level data structure flow in a particular direction.
    • acyclic — the links between nodes can never form a circle.
    • tree — a graph with a root and one or more child nodes; child nodes with no children are call leaves.
    • Merkle tree — a tree where only the leaves contain the actual data and each of the other nodes contain hashes of the data.
  2. When the two root nodes are identical, we can be reasonably sure the content is the same and hasn’t changed. If they are different doing a simple breadth first traversal of both trees will allow finding the particular child nodes which have changed. For a simple example, while not being exactly a Merkle tree, one can see the rough sketch for a distributed profiling system I was writing (please do not use this in production, it will kill your service — it needs to be rewritten in C and might need a kernel module). It stores parts of ProcFS in a Merkle tree like structure. If it were truly a Merkle tree, it would put all of the properties in leaf nodes in a balanced tree.